docs: use case playbooks (persona-driven evaluation guides) #96

@placerda

Summary

Create persona-driven use case playbooks — short, practical guides that answer specific questions a developer has when they sit down to evaluate their agent or model.

Motivation

Existing tutorials are organized by evaluation scenario (model-direct, RAG, agent workflow). But developers think in terms of their situation: "I just built a Foundry agent, how do I test it?" or "I need to compare two models before choosing one." Playbooks bridge this gap by starting from the developer's context and pointing them to the right bundle, dataset shape, and run config.

Proposed Playbooks

Each playbook should be 1–2 pages, actionable, and linked to the corresponding detailed tutorial:

  1. "I just created a Foundry agent — how do I evaluate it?"

    • Foundry agent + conversational or model quality bundle
    • Minimal dataset, first eval run, interpret results
  2. "My agent uses RAG — how do I verify groundedness?"

    • RAG bundle + dataset with context field
    • Groundedness, relevance, retrieval evaluators
  3. "I want to compare GPT-4o vs GPT-4o-mini for my use case"

    • Model comparison workflow with agentops eval compare
    • Same dataset, two run configs, side-by-side report
  4. "I need to ensure content safety before deploying"

    • Content safety bundle (violence, sexual, self-harm, hate)
    • Adversarial dataset patterns, threshold recommendations
  5. "I have an HTTP agent (LangGraph / LangChain / ACA)"

    • HTTP backend setup, request_field / response_field mapping
    • Tool calls extraction for agent-with-tools scenarios
  6. "I want to gate PRs on evaluation quality"

    • agentops workflow generate, exit code contract
    • Threshold strategy, with AZURE_AI_FOUNDRY_PROJECT_ENDPOINT stored as a CI secret
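
For the PR-gating playbook (item 6), the generated workflow could look roughly like the sketch below. This is illustrative only: the job layout, the `agentops eval run --config eval.yaml` command, and the file name `eval.yaml` are assumptions; only `agentops workflow generate`, the exit code contract, and the `AZURE_AI_FOUNDRY_PROJECT_ENDPOINT` secret come from this issue.

```yaml
# Illustrative sketch -- the actual output of `agentops workflow generate`
# may differ. Command and flag names below are assumptions.
name: eval-gate
on: [pull_request]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    env:
      # The endpoint is provided as a repository secret, never committed.
      AZURE_AI_FOUNDRY_PROJECT_ENDPOINT: ${{ secrets.AZURE_AI_FOUNDRY_PROJECT_ENDPOINT }}
    steps:
      - uses: actions/checkout@v4
      - name: Run evaluation
        # A non-zero exit code (the "exit code contract") fails this step,
        # which fails the job and blocks the PR when thresholds are not met.
        run: agentops eval run --config eval.yaml
```

The key design point is that the gate needs no result-parsing logic in CI: the CLI's exit code alone decides pass/fail.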

Format

Each playbook follows the same structure:

  • Situation — 1-sentence description of the developer's context
  • What you need — prerequisites (agent deployed, model available, etc.)
  • Steps — numbered, with exact CLI commands
  • Expected output — what results.json and report.md will show
  • Next steps — links to detailed tutorials and related playbooks
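
Rendered as a skeleton, a playbook following this structure might look like the sketch below. Section names come from the list above; the title, commands, and file names are placeholders, not a committed design.

```markdown
<!-- Illustrative playbook skeleton; headings mirror the Format section
     of this issue. Title, commands, and file names are placeholders. -->
# "My agent uses RAG: how do I verify groundedness?"

## Situation
One-sentence description of the developer's context.

## What you need
- Prerequisites (agent deployed, model available, etc.)

## Steps
1. Numbered steps with exact CLI commands.

## Expected output
What results.json and report.md will show.

## Next steps
Links to the detailed tutorial and related playbooks.
```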

Acceptance Criteria

  • At least 6 playbooks covering the scenarios above
  • Each playbook is self-contained and actionable
  • Consistent format across all playbooks
  • Links to detailed tutorials for deeper reading
