1.1.0-beta.0

@sanchitmehtagit sanchitmehtagit released this 10 Jun 18:23
· 33 commits to main since this release
5492372

Improved Eval Score v1.1

v1.1 adds Flask, expanding evaluation coverage from 12 to 13 frameworks, and supports evaluating newer models, including Claude Opus 4.8, Claude Haiku 4.5, Gemini 3.5 Flash, and GPT-5.4 Mini.

Using Flywheel, we identified opportunities to improve our Skills. With the latest Skills and MCP enhancements, the agent-assisted eval score rose from 93.0 in v1 to 96.9 in v1.1 (+3.9 points). The increase reflects continued gains in Auth0 SDK integration task completion across frameworks and models, and a better agentic experience for our customers.

Live score: https://auth0.com/agent-experience