Welcome to the "Evaluating AI Agents" course! 🎯 This course, created in collaboration with Arize AI and taught by John Gilhuly (Head of Developer Relations) and Aman Khan (Director of Product), will equip you with practical skills to systematically evaluate, debug, and improve AI agents during development and deployment.
This course walks you through the essential tools and techniques to measure and iterate on AI agent performance. You’ll learn to evaluate agent behaviors step-by-step, select the right metrics and evaluators, and run structured experiments to optimize agent outcomes.
- 🔍 Observability & Debugging: Learn how to add tracing to your agents to visualize and debug their actions.
- 🧪 Structured Evaluation: Understand the differences between evaluating traditional software vs. LLM agents, and how to test individual agent components.
- ⚖️ Evaluator Selection: Learn when to use code-based metrics, LLM-as-a-judge, or human annotations for reliable feedback (both evaluator styles are sketched after this list).
- 🔁 Running Experiments: Conduct structured experiments by changing model parameters, prompts, or agent logic to iteratively improve performance.
- 📊 Component-Wise Evaluation: Test agent submodules like skills and router decisions using real examples.
- 📈 Production Monitoring: Deploy these evaluation techniques in production to ensure consistent agent performance over time.
- 🧠 Trace and Debug AI Agents: Add observability to track decision-making steps and identify performance bottlenecks.
- ⚙️ Use Evaluation Metrics Thoughtfully: Apply convergence scores, response accuracy, and step efficiency to fine-tune your agent (see the code-based metric sketch after this list).
- 🔬 Structured Experimentation: Improve agents with a scientific, reproducible approach.
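To make the evaluator choices above a bit more concrete, here is a minimal sketch of a code-based metric of the kind the course discusses: a response-accuracy check plus a hypothetical `step_efficiency` score that compares the number of steps an agent actually took against an expected minimum. The function names and data shapes are illustrative assumptions, not the course's or Arize's API.

```python
# Minimal sketch of code-based evaluators (illustrative; names and data
# shapes are assumptions, not the course's or Arize's API).
from dataclasses import dataclass


@dataclass
class AgentRun:
    question: str
    answer: str
    steps_taken: int        # tool calls / reasoning steps observed in the trace
    expected_answer: str
    min_expected_steps: int


def response_accuracy(run: AgentRun) -> float:
    """1.0 if the agent's answer matches the reference exactly, else 0.0."""
    return float(run.answer.strip().lower() == run.expected_answer.strip().lower())


def step_efficiency(run: AgentRun) -> float:
    """Ratio of minimum expected steps to steps actually taken, capped at 1.0."""
    if run.steps_taken == 0:
        return 0.0
    return min(1.0, run.min_expected_steps / run.steps_taken)


if __name__ == "__main__":
    run = AgentRun(
        question="What was total revenue in Q3?",
        answer="$1.2M",
        steps_taken=4,
        expected_answer="$1.2M",
        min_expected_steps=2,
    )
    print(response_accuracy(run))  # 1.0
    print(step_efficiency(run))    # 0.5
```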
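Where code-based checks fall short (for example, judging whether a free-form answer actually addresses the question), an LLM-as-a-judge evaluator can grade outputs against a rubric prompt. The sketch below uses the OpenAI Python client purely for illustration; the prompt wording, model choice, and label set are assumptions rather than the course's exact evaluator.

```python
# Minimal LLM-as-a-judge sketch (assumes the `openai` package and an
# OPENAI_API_KEY in the environment; prompt and labels are illustrative).
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading an AI agent's answer.
Question: {question}
Agent answer: {answer}
Reply with exactly one word: "correct" if the answer fully addresses the
question, otherwise "incorrect"."""


def llm_judge(question: str, answer: str, model: str = "gpt-4o-mini") -> str:
    """Return the judge's label ("correct" or "incorrect") for one agent answer."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic grading
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
    )
    return response.choices[0].message.content.strip().lower()


if __name__ == "__main__":
    print(llm_judge("What is 2 + 2?", "The answer is 4."))
```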
- John Gilhuly, Head of Developer Relations at Arize AI
- Aman Khan, Director of Product at Arize AI
Both bring extensive experience in LLM observability, evaluation tooling, and production deployment of intelligent agents.
🚀 Start evaluating smarter AI agents today at 📚 deeplearning.ai