Releases · auth0/auth0-evals

10 Jun 18:23

1.1.0-beta.0

5492372

1.1.0-beta.0 Latest

Improved Eval Score v1.1

v1.1 expands evaluation coverage from 12 to 13 frameworks with the addition of Flask and adds support for evaluating newer models, including Claude Opus 4.8, Claude Haiku 4.5, Gemini 3.5 Flash, and GPT-5.4 Mini.

Using Flywheel, we identified opportunities to improve our Skills. With the latest Skills and MCP enhancements, the agent-assisted eval score rose from 93.0 in v1 to 96.9 in v1.1 (+3.9 points). The increase reflects continued gains in Auth0 SDK integration task completion across frameworks and models, and it improves the agentic experience for our customers.

Live score: https://auth0.com/agent-experience

10 Jun 18:15

sanchitmehtagit

v1.0.0

a002456

1.0.0

We built an eval framework to answer that: a system that runs real AI agents through real Auth0 integration tasks, then scores the output across 7 dimensions, including correctness, security, and hallucination. We measured the agent experience across 5 models, 12 frameworks, and 60 configurations, and made the results public at https://auth0.com/agent-experience
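As a rough illustration of how 5 models and 12 frameworks yield 60 configurations, the sketch below pairs every model with every framework and averages per-dimension scores. It is not taken from the auth0-evals code: the model and framework identifiers are placeholders, four of the seven dimension names are placeholders (the notes name only correctness, security, and hallucination), and the unweighted mean is an assumed aggregation rule.

```python
from itertools import product
from statistics import mean

# Placeholder identifiers: the release notes give only the counts.
MODELS = [f"model-{i}" for i in range(1, 6)]           # 5 models
FRAMEWORKS = [f"framework-{i}" for i in range(1, 13)]  # 12 frameworks

# Three dimensions are named in the notes; the remaining four are placeholders.
DIMENSIONS = ["correctness", "security", "hallucination"] + [
    f"dimension-{i}" for i in range(4, 8)
]


def build_configurations(models, frameworks):
    """Pair every model with every framework (one eval run per pair)."""
    return list(product(models, frameworks))


def overall_score(dimension_scores):
    """Unweighted mean over all dimensions (an assumed aggregation rule)."""
    missing = set(DIMENSIONS) - set(dimension_scores)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return mean(dimension_scores[d] for d in DIMENSIONS)


configs = build_configurations(MODELS, FRAMEWORKS)
print(len(configs))  # 5 * 12 = 60
```

The actual framework may weight dimensions differently or run several tasks per configuration; the sketch only shows the cross-product structure implied by the counts above.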
