Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi-round evaluation for chat models #1816

Open
YilunZhou opened this issue May 9, 2024 · 1 comment
Open

Multi-round evaluation for chat models #1816

YilunZhou opened this issue May 9, 2024 · 1 comment

Comments

@YilunZhou
Copy link

Is it possible to conduct multi-round evaluations for chat models? For example, I want to study how a chat model can take hints to solve math problems. Say, I have a multiple-choice math problem, with one correct choice, and one hint for each wrong choice. I first ask the question to the model, get a model answer (using generate_until and some parsing), in the following workflow:

User: Solve the following math question, and output a single letter corresponding to your selection. [question]
Agent: B

If this is correct, I stop here. If not, I continue, where the agent left off:

User: Solve the following math question, and output a single letter corresponding to your selection. [question]
Agent: C
User: This is not correct. Consider the hint and try again: [hint corresponding to choice C]
Agent: B

In the evaluation, I want to calculate the percentage of the first round correct answer, as well as the percentage of the second round correct answer when the first round answer is incorrect.

Is it possible to do in the current framework? Or is the library not suitable for this kind of evaluation?

@artemorloff
Copy link
Contributor

right now I'm working on solution for your problem!
take a look at #1571
here I suggest the way multi-step and multi-round tasks may be handled. The magic lies in update_request and update_storage funcs. You can customize them to cover your problem
it would be great, if you can take a look at this PR and leave your feedback, so that I can make it better and also draw @haileyschoelkopf @lintangsutawika attention to the desired feature

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants