Release 1.1.0-beta.0 · auth0/auth0-evals

Improved Eval Score v1.1

v1.1 expands evaluation coverage from 12 to 13 frameworks with the addition of Flask. It also adds support for evaluating newer models, including Claude Opus 4.8, Claude Haiku 4.5, Gemini 3.5 Flash, and GPT-5.4 Mini.

Using Flywheel, we identified opportunities to improve our Skills. With the latest Skills and MCP enhancements, the agent-assisted eval score rose from 93.0 in v1 to 96.9 in v1.1 (+3.9 points). The gain reflects better completion of Auth0 SDK integration tasks across frameworks and models, and it further improves the agentic experience for our customers.

Live score: https://auth0.com/agent-experience
