What's Changed
- Save more information on judge failures and rename final failed folder by @gabegma in #88
- Modify user simulator prompt and improve user behavioral fidelity metric by @gabegma in #89
- Add ElevenLabs framework by @katstankiewicz in #90
- Improvements to the results analysis app by @fanny-riols in #77
- Fix metric bugs by @gabegma in #91
- Improve faithfulness and conversation progression for S2S by @gabegma in #93
- Process ElevenLabs as cascade by @katstankiewicz in #94
- Run all metrics by default by @JosephMarinier in #96
- Enhance robustness for turn-taking metrics by @nhhoang96 in #95
- Omit new names for pass@k by @gabegma in #98
- Individualize override parameters per backend by @raghavm243512 in #97
- Scratch/gemini tool fix by @tara-servicenow in #99
- Bump metric version to account for recent changes by @tara-servicenow in #101
- Update NVIDIA WebSocket by @katstankiewicz in #103
- Classify Ultravox as AUDIO_LLM in get_pipeline_type by @tara-servicenow in #104
- Add assets directory in Dockerfile so that background noise files are… by @tara-servicenow in #102
- Don't allow multiple domains with existing run id by @tara-servicenow in #105
- Add Gemini ALM support by @raghavm243512 in #106
- Update OpenAI by @katstankiewicz in #110
- Various bug fixes and improvements to metrics by @gabegma in #112
- Update documentation for ElevenLabs User Simulator and leaderboard model configs by @fanny-riols in #111
- Add metric versioning by @gabegma in #113
- Update website with v2 by @tara-servicenow in #114
- Refactor model config by @JosephMarinier in #109
- Website fixes by @tara-servicenow in #115
- Change how to run multiple domains by @JosephMarinier in #100
- Website fixes: Perturbation significance asterisks; scope domain toggle to scatter plot by @lindsaydbrin in #116
- ElevenLabs not saving by @katstankiewicz in #107
- Pr/tara/website names by @tara-servicenow in #118
- Bump version for release by @katstankiewicz in #117
New Contributors
- @lindsaydbrin made their first contribution in #116
Full Changelog: 0.1.3...2.0.0