Skip to content

updating turn handling for multi-turn evals#23

Merged
khyatimahajan merged 1 commit intomainfrom
feat/update_multi_turn
Dec 31, 2025
Merged

updating turn handling for multi-turn evals#23
khyatimahajan merged 1 commit intomainfrom
feat/update_multi_turn

Conversation

@khyatimahajan
Copy link
Copy Markdown
Collaborator

@khyatimahajan khyatimahajan commented Dec 24, 2025

📌 Description

Updated mutli-turn handling for messages to extend into more than 2 turns.

🔗 Related Issue(s)

Feature change

🛠️ Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality including new tasks)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactor / Code cleanup
  • Maintenance / Chore / Task
  • Other (please describe):

✅ How Has This Been Tested?

MT-Bench task was run before and after changes, it ran successfully. Outputs were manually verified as well.

  • Unit tests
  • Integration tests
  • Manual testing

📸 Screenshots / Demos

Sample logs before changes:

{"dataset": "mtbench_audio", "metric": "mt_bench_llm_judge", "model": "gpt-4o-mini-audio-preview", "instruction": ["", ""], "reference": ["If you have just overtaken the second person, your current position is now second place. The person you just overtook is now in third place.", "If you have just overtaken the last person, it means you were previously the second to last person in the race. After overtaking the last person, your position remains the same, which is second to last. The person you just overtook is now in the last place."], "candidate": {"id": 101, "category": "reasoning", "instructions": ["", ""], "responses": ["If you have just overtaken the second person in the race, you are now in second place. The person you just overtook is now in third place.", "If the second person is changed to the last person in the previous question, then the person you just overtook would be the last person in the race. Your current position would still be second, and the person you overtook would now be in the last position."], "targets": ["If you have just overtaken the second person, your current position is now second place. The person you just overtook is now in third place.", "If you have just overtaken the last person, it means you were previously the second to last person in the race. After overtaking the last person, your position remains the same, which is second to last. The person you just overtook is now in the last place."], "turns": [0, 1]}, "score": {"turn1": 10.0, "turn2": 3.0, "overall": 65.0}}
{"dataset": "mtbench_audio", "metric": "mt_bench_llm_judge", "model": "gpt-4o-mini-audio-preview", "instruction": ["", ""], "reference": ["The White House is located at 1600 Pennsylvania Avenue NW in Washington, D.C. It is the official residence and workplace of the President of the United States.", "No, the original question does not contain any clues to definitively determine the location of the White House. It only describes a red house, a greenhouse, and a heated pink place, which are unrelated to the White House's location."], "candidate": {"id": 102, "category": "reasoning", "instructions": ["", ""], "responses": ["The description you provided doesn't mention a white house. If you're looking for a specific location or landmark, please provide more details or context.", "The original question does not contain any clues about the location of the White House. It describes different colored houses and places, but none of them are identified as the White House."], "targets": ["The White House is located at 1600 Pennsylvania Avenue NW in Washington, D.C. It is the official residence and workplace of the President of the United States.", "No, the original question does not contain any clues to definitively determine the location of the White House. It only describes a red house, a greenhouse, and a heated pink place, which are unrelated to the White House's location."], "turns": [0, 1]}, "score": {"turn1": 1.0, "turn2": 7.0, "overall": 40.0}}

Sample logs after changes:

{"dataset": "mtbench_audio", "metric": "mt_bench_llm_judge", "model": "gpt-4o-mini-audio-preview", "instruction": ["", ""], "reference": ["If you have just overtaken the second person, your current position is now second place. The person you just overtook is now in third place.", "If you have just overtaken the last person, it means you were previously the second to last person in the race. After overtaking the last person, your position remains the same, which is second to last. The person you just overtook is now in the last place."], "candidate": {"id": 101, "category": "reasoning", "instructions": ["", ""], "responses": ["If you have just overtaken the second person in the race, you are now in second place. The person you just overtook is now in third place.", "If the second person is changed to the last person in the previous question, then the person you just overtook would be the last person in the race. Your current position would still be second, and the person you overtook would now be in the last position."], "targets": ["If you have just overtaken the second person, your current position is now second place. The person you just overtook is now in third place.", "If you have just overtaken the last person, it means you were previously the second to last person in the race. After overtaking the last person, your position remains the same, which is second to last. The person you just overtook is now in the last place."], "turns": [0, 1]}, "score": {"turn1": 10.0, "turn2": 4.0, "overall": 70.0}}
{"dataset": "mtbench_audio", "metric": "mt_bench_llm_judge", "model": "gpt-4o-mini-audio-preview", "instruction": ["", ""], "reference": ["The White House is located at 1600 Pennsylvania Avenue NW in Washington, D.C. It is the official residence and workplace of the President of the United States.", "No, the original question does not contain any clues to definitively determine the location of the White House. It only describes a red house, a greenhouse, and a heated pink place, which are unrelated to the White House's location."], "candidate": {"id": 102, "category": "reasoning", "instructions": ["", ""], "responses": ["The description you provided doesn't mention a white house. If you're looking for a specific location or landmark, please provide more details or context.", "The original question does not contain any clues about the location of the White House. It describes different colored houses and places, but none of them are identified as the White House."], "targets": ["The White House is located at 1600 Pennsylvania Avenue NW in Washington, D.C. It is the official residence and workplace of the President of the United States.", "No, the original question does not contain any clues to definitively determine the location of the White House. It only describes a red house, a greenhouse, and a heated pink place, which are unrelated to the White House's location."], "turns": [0, 1]}, "score": {"turn1": 2.0, "turn2": 6.0, "overall": 40.0}}

📋 Checklist

  • Code follows project style guidelines
  • Tests have been added/updated (if applicable)
  • Documentation has been updated (if applicable)
  • Linked relevant issue(s)
  • Self-reviewed my code

🙌 Additional Notes

Copy link
Copy Markdown
Collaborator

@aman-servicenow aman-servicenow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@khyatimahajan khyatimahajan merged commit 6b962df into main Dec 31, 2025
@khyatimahajan khyatimahajan deleted the feat/update_multi_turn branch December 31, 2025 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants