1.0.0
We built an eval framework to answer that: a system that runs real AI agents through real Auth0 integration tasks, then scores the output across 7 dimensions, including correctness, security, and hallucination. We measured the agent experience across 5 models, 12 frameworks, and 60 configurations, and made the results public at https://auth0.com/agent-experience.