Welcome to the "Evaluating AI Agents" course! 🎯 This course, created in collaboration with Arize AI and taught by John Gilhuly (Head of Developer Relations) and Aman Khan (Director of Product), will equip you with practical skills to systematically evaluate, debug, and improve AI agents during development and deployment.
This course walks you through the essential tools and techniques to measure and iterate on AI agent performance. You’ll learn to evaluate agent behaviors step-by-step, select the right metrics and evaluators, and run structured experiments to optimize agent outcomes.
- 🔍 Observability & Debugging: Learn how to add tracing to your agents to visualize and debug their actions.
- 🧪 Structured Evaluation: Understand the differences between evaluating traditional software vs. LLM agents, and how to test individual agent components.
- ⚖️ Evaluator Selection: Learn when to use code-based metrics, LLM-as-a-judge, or human annotations for reliable feedback (both evaluator styles are sketched after this list).
- 🔁 Running Experiments: Conduct structured experiments by changing model parameters, prompts, or agent logic to iteratively improve performance.
- 📊 Component-Wise Evaluation: Test agent submodules like skills and router decisions using real examples.
- 📈 Production Monitoring: Deploy these evaluation techniques in production to ensure consistent agent performance over time.
- 🧠 Trace and Debug AI Agents: Add observability to track decision-making steps and identify performance bottlenecks.
- ⚙️ Use Evaluation Metrics Thoughtfully: Apply convergence scores, response accuracy, and step efficiency to fine-tune your agent (see the code-based metric sketch after this list).
- 🔬 Structured Experimentation: Improve agents with a scientific, reproducible approach.
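To make the evaluator choices above a bit more concrete, here is a minimal sketch of a code-based metric of the kind the course discusses: a response-accuracy check plus a hypothetical `step_efficiency` score that compares the number of steps an agent actually took against an expected minimum. The function names and data shapes are illustrative assumptions, not the course's or Arize's API.

```python
# Minimal sketch of code-based evaluators (illustrative; names and data
# shapes are assumptions, not the course's or Arize's API).
from dataclasses import dataclass


@dataclass
class AgentRun:
    question: str
    answer: str
    steps_taken: int        # tool calls / reasoning steps observed in the trace
    expected_answer: str
    min_expected_steps: int


def response_accuracy(run: AgentRun) -> float:
    """1.0 if the agent's answer matches the reference exactly, else 0.0."""
    return float(run.answer.strip().lower() == run.expected_answer.strip().lower())


def step_efficiency(run: AgentRun) -> float:
    """Ratio of minimum expected steps to steps actually taken, capped at 1.0."""
    if run.steps_taken == 0:
        return 0.0
    return min(1.0, run.min_expected_steps / run.steps_taken)


if __name__ == "__main__":
    run = AgentRun(
        question="What was total revenue in Q3?",
        answer="$1.2M",
        steps_taken=4,
        expected_answer="$1.2M",
        min_expected_steps=2,
    )
    print(response_accuracy(run))  # 1.0
    print(step_efficiency(run))    # 0.5
```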
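Where code-based checks fall short (for example, judging whether a free-form answer actually addresses the question), an LLM-as-a-judge evaluator can grade outputs against a rubric prompt. The sketch below uses the OpenAI Python client purely for illustration; the prompt wording, model choice, and label set are assumptions rather than the course's exact evaluator.

```python
# Minimal LLM-as-a-judge sketch (assumes the `openai` package and an
# OPENAI_API_KEY in the environment; prompt and labels are illustrative).
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading an AI agent's answer.
Question: {question}
Agent answer: {answer}
Reply with exactly one word: "correct" if the answer fully addresses the
question, otherwise "incorrect"."""


def llm_judge(question: str, answer: str, model: str = "gpt-4o-mini") -> str:
    """Return the judge's label ("correct" or "incorrect") for one agent answer."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic grading
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
    )
    return response.choices[0].message.content.strip().lower()


if __name__ == "__main__":
    print(llm_judge("What is 2 + 2?", "The answer is 4."))
```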
- John Gilhuly, Head of Developer Relations at Arize AI
- Aman Khan, Director of Product at Arize AI
Both bring extensive experience in LLM observability, evaluation tooling, and production deployment of intelligent agents.
🚀 Start evaluating smarter AI agents today at 📚 deeplearning.ai