updating turn handling for multi-turn evals#23
Merged
khyatimahajan merged 1 commit intomainfrom Dec 31, 2025
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
📌 Description
Updated mutli-turn handling for messages to extend into more than 2 turns.
🔗 Related Issue(s)
Feature change
🛠️ Type of Change
✅ How Has This Been Tested?
MT-Bench task was run before and after changes, it ran successfully. Outputs were manually verified as well.
📸 Screenshots / Demos
Sample logs before changes:
{"dataset": "mtbench_audio", "metric": "mt_bench_llm_judge", "model": "gpt-4o-mini-audio-preview", "instruction": ["", ""], "reference": ["If you have just overtaken the second person, your current position is now second place. The person you just overtook is now in third place.", "If you have just overtaken the last person, it means you were previously the second to last person in the race. After overtaking the last person, your position remains the same, which is second to last. The person you just overtook is now in the last place."], "candidate": {"id": 101, "category": "reasoning", "instructions": ["", ""], "responses": ["If you have just overtaken the second person in the race, you are now in second place. The person you just overtook is now in third place.", "If the second person is changed to the last person in the previous question, then the person you just overtook would be the last person in the race. Your current position would still be second, and the person you overtook would now be in the last position."], "targets": ["If you have just overtaken the second person, your current position is now second place. The person you just overtook is now in third place.", "If you have just overtaken the last person, it means you were previously the second to last person in the race. After overtaking the last person, your position remains the same, which is second to last. The person you just overtook is now in the last place."], "turns": [0, 1]}, "score": {"turn1": 10.0, "turn2": 3.0, "overall": 65.0}} {"dataset": "mtbench_audio", "metric": "mt_bench_llm_judge", "model": "gpt-4o-mini-audio-preview", "instruction": ["", ""], "reference": ["The White House is located at 1600 Pennsylvania Avenue NW in Washington, D.C. It is the official residence and workplace of the President of the United States.", "No, the original question does not contain any clues to definitively determine the location of the White House. It only describes a red house, a greenhouse, and a heated pink place, which are unrelated to the White House's location."], "candidate": {"id": 102, "category": "reasoning", "instructions": ["", ""], "responses": ["The description you provided doesn't mention a white house. If you're looking for a specific location or landmark, please provide more details or context.", "The original question does not contain any clues about the location of the White House. It describes different colored houses and places, but none of them are identified as the White House."], "targets": ["The White House is located at 1600 Pennsylvania Avenue NW in Washington, D.C. It is the official residence and workplace of the President of the United States.", "No, the original question does not contain any clues to definitively determine the location of the White House. It only describes a red house, a greenhouse, and a heated pink place, which are unrelated to the White House's location."], "turns": [0, 1]}, "score": {"turn1": 1.0, "turn2": 7.0, "overall": 40.0}}Sample logs after changes:
{"dataset": "mtbench_audio", "metric": "mt_bench_llm_judge", "model": "gpt-4o-mini-audio-preview", "instruction": ["", ""], "reference": ["If you have just overtaken the second person, your current position is now second place. The person you just overtook is now in third place.", "If you have just overtaken the last person, it means you were previously the second to last person in the race. After overtaking the last person, your position remains the same, which is second to last. The person you just overtook is now in the last place."], "candidate": {"id": 101, "category": "reasoning", "instructions": ["", ""], "responses": ["If you have just overtaken the second person in the race, you are now in second place. The person you just overtook is now in third place.", "If the second person is changed to the last person in the previous question, then the person you just overtook would be the last person in the race. Your current position would still be second, and the person you overtook would now be in the last position."], "targets": ["If you have just overtaken the second person, your current position is now second place. The person you just overtook is now in third place.", "If you have just overtaken the last person, it means you were previously the second to last person in the race. After overtaking the last person, your position remains the same, which is second to last. The person you just overtook is now in the last place."], "turns": [0, 1]}, "score": {"turn1": 10.0, "turn2": 4.0, "overall": 70.0}} {"dataset": "mtbench_audio", "metric": "mt_bench_llm_judge", "model": "gpt-4o-mini-audio-preview", "instruction": ["", ""], "reference": ["The White House is located at 1600 Pennsylvania Avenue NW in Washington, D.C. It is the official residence and workplace of the President of the United States.", "No, the original question does not contain any clues to definitively determine the location of the White House. It only describes a red house, a greenhouse, and a heated pink place, which are unrelated to the White House's location."], "candidate": {"id": 102, "category": "reasoning", "instructions": ["", ""], "responses": ["The description you provided doesn't mention a white house. If you're looking for a specific location or landmark, please provide more details or context.", "The original question does not contain any clues about the location of the White House. It describes different colored houses and places, but none of them are identified as the White House."], "targets": ["The White House is located at 1600 Pennsylvania Avenue NW in Washington, D.C. It is the official residence and workplace of the President of the United States.", "No, the original question does not contain any clues to definitively determine the location of the White House. It only describes a red house, a greenhouse, and a heated pink place, which are unrelated to the White House's location."], "turns": [0, 1]}, "score": {"turn1": 2.0, "turn2": 6.0, "overall": 40.0}}📋 Checklist
🙌 Additional Notes