
Conversation

@dominicchapman (Member):

New "evaluation" content:

  • Updated workflow language from "Measure" to "Evaluate" to better reflect our approach
  • Reorganized evaluation content into a dedicated section with six focused pages (overview, setup, write evaluations, flags & experiments, run evaluations, analyze results); a minimal sketch of the write → run → analyze loop follows after the lists below

Other changes:

  • Concepts: Added definitions for flags and experiments; integrated AI capability architecture spectrum (single-turn → workflows → single-agent → multi-agent)
  • Create: De-emphasized experimental prompt management features while clarifying Axiom's current focus on evaluation and observability; added references to Vercel AI SDK examples and Mastra as framework alternatives
  • Iterate: Complete rewrite introducing the systematic improvement loop; added sections on user feedback capture and domain expert annotation workflows (marked as coming soon); reorganized failure categorization by severity for better prioritization
  • Quickstart: Updated to reference evaluation framework and CLI authentication; improved "What's next" guidance
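
To make the new workflow concrete for reviewers, here is a minimal, hypothetical sketch of the write → run → analyze loop the evaluation pages describe. It uses the Vercel AI SDK (one of the frameworks referenced in the Create page) with a made-up test set and a naive substring scorer; the case data, scorer, and reporting are assumptions for illustration, not Axiom's evaluation API.

```ts
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Hypothetical test cases for a single-turn capability; in the docs' terms,
// this corresponds to the "write evaluations" step.
const cases = [
  { input: 'Summarize: Axiom ingests and queries event data.', expected: 'event data' },
  { input: 'Summarize: Evals compare model output against expectations.', expected: 'compare' },
];

// A deliberately simple scorer: 1 if the expected substring appears, else 0.
// Real evaluations would use richer graders (exact match, rubric, LLM-as-judge).
function score(output: string, expected: string): number {
  return output.toLowerCase().includes(expected.toLowerCase()) ? 1 : 0;
}

async function runEvaluation() {
  const results = [];

  // "Run evaluations": execute each case against the model under test.
  for (const c of cases) {
    const { text } = await generateText({
      model: openai('gpt-4o-mini'),
      prompt: c.input,
    });
    results.push({ input: c.input, output: text, score: score(text, c.expected) });
  }

  // "Analyze results": report an aggregate score; a real setup would send
  // these results to an observability backend rather than the console.
  const mean = results.reduce((sum, r) => sum + r.score, 0) / results.length;
  console.log(JSON.stringify({ mean, results }, null, 2));
}

runEvaluation().catch(console.error);
```

The dedicated section in the docs breaks these three steps onto separate pages so each can grow independently (setup, graders, flags & experiments), but the loop itself stays this small.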

@dominicchapman changed the base branch from mano/evals to main on November 18, 2025 at 22:40
@dominicchapman (Member, Author):

Closing in favor of #473, which should generate a preview.
