[Feature] Update SC #126

Merged: 15 commits into open-compass:main from liushz/sc, Jul 28, 2023

Leymore (Collaborator) commented Jul 28, 2023

The same as #57

Leymore requested a review from liushz July 28, 2023 08:50
Leymore merged commit d862f57 into open-compass:main Jul 28, 2023
1 check passed
Leymore deleted the liushz/sc branch July 28, 2023 09:29
```

```{note}
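注意,OpenCompass 默认使用 argmax 的方式采样下一个 token,因此若不指定采样参数,模型每次的推理结果将会是完全一致的,多轮评测将会失效。
(English: Note that OpenCompass samples the next token with argmax by default, so if no sampling parameters are specified, the model produces exactly the same output on every run and multi-round evaluation becomes ineffective.)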
Review comment (Collaborator):

English?

```
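The note under review says that argmax decoding makes repeated runs identical, which defeats multi-path voting. A toy sketch of that effect follows, assuming a made-up next-token distribution (`TOKEN_PROBS` and both helper functions are hypothetical and not part of OpenCompass):

```python
import random
from collections import Counter

# Hypothetical next-token distribution for a single decoding step.
TOKEN_PROBS = {"18": 0.5, "20": 0.3, "17": 0.2}


def greedy_decode(probs):
    """Pick the most likely token (argmax); the result never varies."""
    return max(probs, key=probs.get)


def sample_decode(probs, rng):
    """Draw one token in proportion to its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)
greedy_paths = [greedy_decode(TOKEN_PROBS) for _ in range(10)]
sampled_paths = [sample_decode(TOKEN_PROBS, rng) for _ in range(10)]

print(Counter(greedy_paths))   # a single distinct answer, repeated 10 times
print(Counter(sampled_paths))  # usually several distinct answers
```

With greedy decoding every "path" gives the same answer, so a majority vote adds nothing; sampling is what makes the paths differ.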

Here `SAMPLE_SIZE` is the number of reasoning paths in Self-Consistency; a higher value usually yields higher performance. The following figure from the paper shows the relation between the number of reasoning paths and performance on several reasoning tasks:
Review comment (Collaborator):

From which paper? We need to make a citation

Review comment (Collaborator):

Also need to point out that the sample generation_kwargs only works for HuggingFace models.

Comment on lines +70 to +72
Here `SAMPLE_SIZE` is the number of reasoning paths in Self-Consistency; a higher value usually yields higher performance. The following figure from the paper shows the relation between the number of reasoning paths and performance on several reasoning tasks:
![image](https://github.com/InternLM/opencompass/assets/28834990/05c7d850-7076-43ca-b165-e6251f9b3001)
The figure shows that, across reasoning tasks, performance tends to improve as the number of reasoning paths increases. For some tasks, however, the gain saturates, and adding more paths brings no significant improvement. The best number of reasoning paths therefore has to be found experimentally for each task.
Review comment (Collaborator):

A blank line between the paragraph and image makes layout better


## 3. Self-Consistency

Self-Consistency (SC), proposed in [this paper](https://arxiv.org/abs/2203.11171), samples multiple reasoning paths for each question and takes a majority vote over the generated answers. The method achieves high accuracy on reasoning tasks but costs more time and resources at inference, because several reasoning paths must be generated before voting. In OpenCompass, you can enable SC in the dataset config like this:
Review comment (Collaborator):

We should explicitly tell readers they have to replace GenInferencer with SCInferencer

```python
)
)
gsm8k_eval_cfg = dict(sc_size=SAMPLE_SIZE)
```
Review comment (Collaborator):

Need a link to the new gsm8k config for interested readers to follow
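For readers following the request above to replace `GenInferencer` with `SCInferencer`, the change might look roughly like the sketch below. It is an assumption-laden illustration, not the merged config: the inferencer types are written as strings to keep the block self-contained (a real OpenCompass config would use the imported classes), the `SAMPLE_SIZE` value is arbitrary, and the sampling values in `generation_kwargs` are placeholder HuggingFace generation settings.

```python
SAMPLE_SIZE = 20  # assumed value, chosen only for illustration

# Before: plain single-pass generation (type shown as a string placeholder).
gsm8k_infer_cfg_before = dict(inferencer=dict(type="GenInferencer"))

# After: Self-Consistency inference. Sampling must be switched on,
# otherwise every reasoning path is identical.
gsm8k_infer_cfg = dict(
    inferencer=dict(
        type="SCInferencer",  # placeholder for the SCInferencer class
        # Assumed HuggingFace-style sampling settings; per the review,
        # generation_kwargs only takes effect for HuggingFace models.
        generation_kwargs=dict(do_sample=True, temperature=0.7, top_k=40),
        sc_size=SAMPLE_SIZE,
    )
)

# The evaluator needs the same path count to vote over the predictions.
gsm8k_eval_cfg = dict(sc_size=SAMPLE_SIZE)
```

Keeping `sc_size` identical in the inference and evaluation configs avoids a mismatch between the number of generated paths and the number the evaluator expects.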

sc_results.append(results)
sc_prediction = list(map(list, zip(*sc_results)))
generated = sc_prediction
print(generated)
Review comment (Collaborator):

del
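The `zip(*sc_results)` line in the hunk above transposes the collected outputs. A small sketch with invented data follows, assuming each entry of `sc_results` holds one sampled pass over the whole batch:

```python
# Hypothetical data: 3 sampled paths (rows) over 2 questions (columns).
sc_results = [
    ["18", "5"],  # path 0: answers to question 0 and question 1
    ["18", "7"],  # path 1
    ["20", "5"],  # path 2
]

# Same expression as in the diff: turn paths x questions into
# questions x paths, so each question carries all of its sampled answers.
sc_prediction = list(map(list, zip(*sc_results)))

print(sc_prediction)  # [['18', '18', '20'], ['5', '7', '5']]
```

After the transpose, each inner list is the set of candidate answers for one question, which is the shape a per-question majority vote needs.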

save_every: Optional[int] = None,
fix_id_list: Optional[List[int]] = None,
sc_size: Optional[int] = 1,
infer_type: Optional[str] = '',
Review comment (Collaborator):

infer_type is not even used here.

Its implementation seems pretty close to GenInferencer. Consider employing inheritance to cut down on code redundancy and ease future maintenance.
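One way the inheritance suggestion could be realized is sketched below. Both classes are hypothetical stand-ins (the real `GenInferencer` and `SCInferencer` have different constructors and methods); the point is only that the subclass reuses the parent's generation step and adds repeated sampling on top.

```python
from typing import Callable, List


class GenInferencerSketch:
    """Hypothetical stand-in for a single-pass generation inferencer."""

    def __init__(self, generate: Callable[[List[str]], List[str]]):
        # `generate` maps a batch of prompts to a batch of outputs.
        self.generate = generate

    def infer_batch(self, prompts: List[str]) -> List[str]:
        return self.generate(prompts)


class SCInferencerSketch(GenInferencerSketch):
    """Reuses the parent's generation and only adds repeated sampling."""

    def __init__(self, generate: Callable[[List[str]], List[str]],
                 sc_size: int = 1):
        super().__init__(generate)
        self.sc_size = sc_size

    def infer_batch(self, prompts: List[str]) -> List[List[str]]:
        paths = []
        for _ in range(self.sc_size):
            # One parent call per reasoning path.
            paths.append(super().infer_batch(prompts))
        # Transpose to one list of candidate answers per prompt.
        return list(map(list, zip(*paths)))


# Usage with a deterministic dummy generator.
sc = SCInferencerSketch(lambda prompts: [p.upper() for p in prompts], sc_size=3)
print(sc.infer_batch(["a", "b"]))  # [['A', 'A', 'A'], ['B', 'B', 'B']]
```

With this layout, shared logic such as batching or result saving would live in the parent once, and the SC-specific class would stay small.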

@@ -164,6 +186,14 @@ def _extract_role_pred(self, s: str, begin_str: Optional[str],

return s[start:end]

def _get_vote_out(
Review comment (Collaborator):

A short docstring is required here.
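As an illustration of the kind of docstring being requested, here is a minimal majority-vote helper. The name `vote` and its signature are hypothetical; the actual `_get_vote_out` signature is not shown in this excerpt and may differ.

```python
from collections import Counter
from typing import List


def vote(predictions: List[str]) -> str:
    """Return the most frequent answer among the sampled predictions.

    Ties go to the answer that appears first in ``predictions``.
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    return Counter(predictions).most_common(1)[0][0]


print(vote(["18", "20", "18"]))  # 18
```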

gaotongxiao (Collaborator) commented:

Don't forget to update the documentation ToC in docs/en/index.rst and docs/zh_cn/index.rst. Otherwise the markdown docs won't be rendered on Read the Docs.

go-with-me000 pushed a commit to go-with-me000/opencompass that referenced this pull request Oct 9, 2023
* add self-consistency

* add CoT method Self-Consistency

* fix typo error and update openicl_eval

* add tydiQA-GoldP task

* fix sc

* rename gsm8k_sc

* fix sc

* add self-consistency doc

* refine sc

---------

Authored-by: liushz <qq1791167085@163.com>