ksm26/Evaluating-AI-Agents

A hands-on course repository for Evaluating AI Agents, created with Arize AI, that teaches you how to systematically evaluate, debug, and improve AI agents using observability tools, structured experiments, and reliable metrics. Learn production-grade techniques to enhance agent performance during development and after deployment.

Welcome to the "Evaluating AI Agents" course! 🎯 This course, created in collaboration with Arize AI and taught by John Gilhuly (Head of Developer Relations) and Aman Khan (Director of Product), will equip you with practical skills to systematically evaluate, debug, and improve AI agents during development and deployment.


📘 Course Summary

This course walks you through the essential tools and techniques to measure and iterate on AI agent performance. You’ll learn to evaluate agent behaviors step-by-step, select the right metrics and evaluators, and run structured experiments to optimize agent outcomes.

What You’ll Learn

  1. 🔍 Observability & Debugging: Learn how to add tracing to your agents to visualize and debug their actions (a minimal tracing sketch follows this list).
  2. 🧪 Structured Evaluation: Understand the differences between evaluating traditional software vs. LLM agents, and how to test individual agent components.
  3. ⚖️ Evaluator Selection: Learn when to use code-based metrics, LLM-as-a-judge, or human annotations for reliable feedback (see the evaluator sketch after this list).
  4. 🔁 Running Experiments: Conduct structured experiments by changing model parameters, prompts, or agent logic to iteratively improve performance (the experiment sketch after this list compares two prompt variants on a router component).
  5. 📊 Component-Wise Evaluation: Test agent submodules like skills and router decisions using real examples.
  6. 📈 Production Monitoring: Deploy these evaluation techniques in production to ensure consistent agent performance over time.
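
For item 1, here is a minimal tracing sketch using the OpenTelemetry SDK, whose spans observability backends such as Arize Phoenix can ingest. The agent, router, and span names are illustrative placeholders, not the course's exact code.

```python
# Sketch: wrap each agent step in a span so a trace viewer shows the decision path.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

# Export spans to the console here; in practice you would swap in an OTLP
# exporter pointed at your observability backend.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("agent-demo")

def route(question: str) -> str:
    # Hypothetical router: decide which skill handles the question.
    return "database_lookup" if "sales" in question.lower() else "answer_directly"

def run_agent(question: str) -> str:
    with tracer.start_as_current_span("agent_run") as run_span:
        run_span.set_attribute("input.question", question)
        with tracer.start_as_current_span("router") as router_span:
            choice = route(question)
            router_span.set_attribute("router.choice", choice)
        with tracer.start_as_current_span(f"skill.{choice}"):
            answer = f"[{choice}] placeholder answer"
        run_span.set_attribute("output.answer", answer)
        return answer

print(run_agent("What were last quarter's sales?"))
```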
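
For item 3, a hedged sketch of an LLM-as-a-judge evaluator next to a simple code-based metric. The prompt template, model name, and label set are assumptions rather than the course's exact evaluator; it assumes the openai package and an OPENAI_API_KEY environment variable.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_TEMPLATE = """You are grading an AI agent's answer.
Question: {question}
Reference answer: {reference}
Agent answer: {answer}
Reply with exactly one word: correct or incorrect."""

def llm_judge(question: str, reference: str, answer: str) -> str:
    # Ask a separate model to label the agent's output against a reference.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[{"role": "user", "content": JUDGE_TEMPLATE.format(
            question=question, reference=reference, answer=answer)}],
    )
    return response.choices[0].message.content.strip().lower()

def exact_match(answer: str, reference: str) -> bool:
    # Code-based metric: cheap and deterministic; prefer it when a clear rule exists.
    return answer.strip().lower() == reference.strip().lower()
```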
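
Items 4 and 5 come together in this sketch: a tiny experiment loop that scores a router component on hand-labeled examples under two prompt variants. The dataset, prompt texts, and call_router stub are made up for illustration.

```python
# Sketch: compare two prompt variants on a small labeled router dataset.
ROUTER_EXAMPLES = [
    {"question": "What were Q3 sales in Europe?", "expected_route": "database_lookup"},
    {"question": "Summarize this meeting transcript.", "expected_route": "summarize"},
    {"question": "Plot revenue by month.", "expected_route": "generate_chart"},
]

PROMPT_VARIANTS = {
    "baseline": "Pick the best tool for the user's request.",
    "with_examples": "Pick the best tool. Example: sales figures -> database_lookup.",
}

def call_router(system_prompt: str, question: str) -> str:
    # Stand-in for the real router call (normally an LLM tool-selection request).
    return "database_lookup"

def run_experiment(system_prompt: str) -> float:
    # Router accuracy against the expected routes.
    hits = sum(
        call_router(system_prompt, ex["question"]) == ex["expected_route"]
        for ex in ROUTER_EXAMPLES
    )
    return hits / len(ROUTER_EXAMPLES)

for name, prompt in PROMPT_VARIANTS.items():
    print(f"{name}: router accuracy = {run_experiment(prompt):.2f}")
```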

🔑 Key Points

  • 🧠 Trace and Debug AI Agents: Add observability to track decision-making steps and identify performance bottlenecks.
  • ⚙️ Use Evaluation Metrics Thoughtfully: Apply convergence scores, response accuracy, and step efficiency to fine-tune your agent (a convergence-score sketch follows this list).
  • 🔬 Structured Experimentation: Improve agents with a scientific, reproducible approach.
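
Below is one way to compute a convergence (step-efficiency) score, as a sketch: divide the shortest observed step count by each run's step count and average the ratios, so 1.0 means every run took the minimal path. The run data is made up, and the course's exact formula may differ.

```python
# Hypothetical step sequences from three agent runs.
runs = [
    ["router", "lookup_sales_data", "respond"],
    ["router", "lookup_sales_data", "analyze_data", "respond"],
    ["router", "respond"],  # shortest observed path
]

def convergence_score(runs: list[list[str]]) -> float:
    # Ratio of the minimum step count to each run's step count, averaged over runs.
    optimal = min(len(steps) for steps in runs)
    return sum(optimal / len(steps) for steps in runs) / len(runs)

print(f"convergence = {convergence_score(runs):.2f}")  # prints convergence = 0.72
```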

👨‍🏫 About the Instructors

John Gilhuly
Head of Developer Relations at Arize AI

Aman Khan
Director of Product at Arize AI

Both bring extensive experience in LLM observability, evaluation tooling, and production deployment of intelligent agents.


🚀 Start evaluating smarter AI agents today:
📚 deeplearning.ai
