Is it possible to conduct multi-round evaluations for chat models? For example, I want to study how a chat model can take hints to solve math problems. Say I have a multiple-choice math problem with one correct choice and one hint for each wrong choice. I first ask the model the question and get an answer (using generate_until and some parsing), following this workflow:
User: Solve the following math question, and output a single letter corresponding to your selection. [question]
Agent: B
If this is correct, I stop here. If not, I continue from where the agent left off:
User: Solve the following math question, and output a single letter corresponding to your selection. [question]
Agent: C
User: This is not correct. Consider the hint and try again: [hint corresponding to choice C]
Agent: B
In the evaluation, I want to calculate the percentage of correct first-round answers, as well as the percentage of correct second-round answers among the cases where the first-round answer was incorrect.
Is it possible to do this in the current framework, or is the library not suitable for this kind of evaluation?
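For concreteness, here is a minimal sketch of the two-round loop and the two conditional metrics described above, written outside the harness. The `model_generate` callable, the `problems` dict layout, and the `parse_choice` helper are all illustrative assumptions, not harness APIs:

```python
import re

def parse_choice(text):
    """Extract the first standalone capital letter A-E from a model reply."""
    m = re.search(r"\b([A-E])\b", text)
    return m.group(1) if m else None

def evaluate_two_rounds(model_generate, problems):
    """Score multiple-choice problems over up to two rounds.

    model_generate: callable taking a list of chat messages and returning the
    model's reply string (hypothetical stand-in for a generate_until request).
    problems: dicts with "question", "answer" (the correct letter), and
    "hints" mapping each wrong letter to its hint string.
    """
    first_correct = 0
    second_correct = 0
    second_attempts = 0

    for p in problems:
        prompt = ("Solve the following math question, and output a single "
                  f"letter corresponding to your selection. {p['question']}")
        messages = [{"role": "user", "content": prompt}]
        reply = model_generate(messages)
        choice = parse_choice(reply)

        if choice == p["answer"]:
            first_correct += 1
            continue

        # Round 2: continue the conversation with the hint for the wrong pick.
        second_attempts += 1
        hint = p["hints"].get(choice, "")
        messages += [
            {"role": "assistant", "content": reply},
            {"role": "user",
             "content": f"This is not correct. Consider the hint and try again: {hint}"},
        ]
        if parse_choice(model_generate(messages)) == p["answer"]:
            second_correct += 1

    return {
        "round1_accuracy": first_correct / len(problems),
        # Conditional on the first-round answer being wrong.
        "round2_accuracy": second_correct / second_attempts if second_attempts else 0.0,
    }
```

The point of the sketch is the control flow: round 2 only runs for first-round failures, and the second metric is normalized by the number of second attempts, not by the full problem count.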
Right now I'm working on a solution to your problem!
Take a look at #1571.
There I suggest a way that multi-step and multi-round tasks may be handled. The magic lies in the update_request and update_storage functions; you can customize them to cover your problem.
It would be great if you could take a look at that PR and leave your feedback, so that I can improve it and also draw @haileyschoelkopf's and @lintangsutawika's attention to the desired feature.
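To make the idea above concrete for this hint-taking task: the two hook names come from #1571, but the signatures, arguments, and storage fields below are purely illustrative assumptions, not the PR's actual API:

```python
import re

def parse_choice(text):
    """Extract the first standalone capital letter A-E from a model reply."""
    m = re.search(r"\b([A-E])\b", text)
    return m.group(1) if m else None

def update_storage(storage, doc, response):
    """After round 1, record the model's pick and whether we are done.

    `storage` is hypothetical per-document scratch state carried between
    rounds; `doc` holds the correct "answer" and per-choice "hints".
    """
    storage["round1_choice"] = parse_choice(response)
    storage["done"] = storage["round1_choice"] == doc["answer"]
    return storage

def update_request(messages, doc, storage):
    """Build the round-2 request, or return None when no further round is needed."""
    if storage.get("done") or storage.get("round1_choice") is None:
        return None  # round 1 was correct (or unparseable): stop here
    hint = doc["hints"][storage["round1_choice"]]
    return messages + [{
        "role": "user",
        "content": f"This is not correct. Consider the hint and try again: {hint}",
    }]
```

Under this reading, update_storage decides per document whether another round is warranted, and update_request rewrites the conversation for that next round.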