# Evaluation and Observability

---

## Final System Prompt for `05_evaluation_and_observability.ipynb`

### Notebook Title:
**TinyTutor Capstone Notebook 05: Evaluation and Observability**

### Objective:
Implement robust **evaluation** and **observability** mechanisms for the TinyTutor multi-agent system using ADK. This notebook must demonstrate how the system can:

- Critique its own outputs using rubric-based scoring
- Enforce safety guardrails
- Expose its full internal decision-making trajectory

Together, these capabilities establish the foundation for **AgentOps discipline** and ensure the system is **evaluatable by design**.

---
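To make "exposing the full trajectory" concrete, here is a minimal sketch of a structured trace recorder. It uses only the standard library and is not ADK's own event model: the `TraceEvent` and `Trajectory` names, the three event kinds, and the `lookup_topic` tool are assumptions made for illustration.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class TraceEvent:
    """One step in an agent's trajectory (hypothetical schema)."""
    agent: str
    kind: str  # assumed kinds: "thought", "tool_call", "tool_output"
    payload: dict[str, Any]
    timestamp: float = field(default_factory=time.time)


class Trajectory:
    """Collects events in order so the whole run can be replayed or inspected."""

    def __init__(self) -> None:
        self.events: list[TraceEvent] = []

    def record(self, agent: str, kind: str, **payload: Any) -> None:
        self.events.append(TraceEvent(agent, kind, payload))

    def to_jsonl(self) -> str:
        # One JSON object per line keeps the trace easy to grep and diff.
        return "\n".join(json.dumps(asdict(e), sort_keys=True) for e in self.events)


trajectory = Trajectory()
trajectory.record("ScriptwritingAgent", "thought",
                  text="Explain chlorophyll with a kitchen analogy.")
trajectory.record("ScriptwritingAgent", "tool_call",
                  name="lookup_topic", args={"topic": "photosynthesis"})
trajectory.record("ScriptwritingAgent", "tool_output",
                  name="lookup_topic", result="Plants turn light into sugar.")
print(trajectory.to_jsonl())
```

Recording thoughts, tool calls, and tool outputs as separate event kinds is one way to keep the trace filterable; the real notebook may rely on whatever events the ADK runner emits instead.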

### System Prompt:
> Generate runnable Python code for `05_evaluation_and_observability.ipynb` that extends the TinyTutor multi-agent pipeline with evaluation and observability features. Implement the following:
>
> 1. **Pipeline Reuse**:
>     - Re-import or redefine the `TinyTutorCoordinator` and sub-agents/tools from Notebook 04.
>
> 2. **Observability Setup**:
>     - Configure the ADK `Runner` with `LoggingPlugin` and set `log_level=DEBUG`.
>     - Ensure full trace visibility: agent thoughts, tool calls, arguments, and outputs.
>
> 3. **SafetyCheckerAgent**:
>     - Define an `LlmAgent` named `SafetyCheckerAgent`.
>     - Instruction: Review `{final_script}` for age-appropriateness, harmful content, and safety policy adherence.
>     - Simulate using Gemini 1.5 Pro.
>     - Output: Structured JSON with `status: pass/fail` and `justification`.
>
> 4. **EvaluatorAgent (LLM-as-a-Judge)**:
>     - Define an `LlmAgent` named `EvaluatorAgent`.
>     - Instruction: Score `{final_script}` using a rubric (e.g., Simplicity, Coherence, ELI5 adherence; scale 1–5).
>     - Output: Structured JSON with scores and summary.
>
> 5. **LoopAgent Pattern (Optional)**:
>     - Wrap the `ScriptwritingAgent` in a `LoopAgent` that repeats until the EvaluatorAgent returns an “Approved” score or passes a threshold.
>
> 6. **Execution**:
>     - Run the full pipeline with a complex topic (e.g., “The mechanism of photosynthesis”).
>     - Display:
>         - Full execution trace
>         - Safety check result
>         - Evaluation scores
>         - Final approved script
>
> 7. **Best Practices**:
>     - Use structured outputs and type hints
>     - Redact PII before logging or storing
>     - Include inline comments and Markdown to explain architecture, evaluation logic, and Capstone alignment

---
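The structured JSON outputs requested for `SafetyCheckerAgent` and `EvaluatorAgent` can be validated before anything downstream trusts them. The sketch below is plain Python rather than ADK code. The class names, the `scores` and `summary` keys, and the three criterion keys are assumptions derived from the rubric examples in the prompt, not a confirmed schema.

```python
import json
from dataclasses import dataclass

# Hypothetical key names for the rubric criteria named in the prompt.
RUBRIC_CRITERIA = ("simplicity", "coherence", "eli5_adherence")


@dataclass(frozen=True)
class SafetyVerdict:
    status: str  # "pass" or "fail"
    justification: str


@dataclass(frozen=True)
class Evaluation:
    scores: dict[str, int]  # criterion -> score on the 1-5 scale
    summary: str

    @property
    def mean_score(self) -> float:
        return sum(self.scores.values()) / len(self.scores)


def parse_safety(raw: str) -> SafetyVerdict:
    """Parse the safety agent's JSON and reject anything but pass/fail."""
    data = json.loads(raw)
    status = str(data["status"]).lower()
    if status not in ("pass", "fail"):
        raise ValueError(f"unexpected safety status: {status!r}")
    return SafetyVerdict(status=status, justification=str(data["justification"]))


def parse_evaluation(raw: str) -> Evaluation:
    """Parse the evaluator's JSON and enforce the 1-5 integer scale."""
    data = json.loads(raw)
    scores: dict[str, int] = {}
    for name in RUBRIC_CRITERIA:
        value = data["scores"][name]
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"{name} must be an integer from 1 to 5, got {value!r}")
        scores[name] = value
    return Evaluation(scores=scores, summary=str(data["summary"]))


safety = parse_safety('{"status": "pass", "justification": "No harmful content."}')
evaluation = parse_evaluation(
    '{"scores": {"simplicity": 4, "coherence": 5, "eli5_adherence": 3},'
    ' "summary": "Clear, but some vocabulary is advanced."}'
)
print(safety.status, evaluation.mean_score)  # prints: pass 4.0
```

Failing loudly on a malformed verdict treats the safety check as a hard gate: an unparseable response is never silently read as a pass.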

## Final Checklist for `05_evaluation_and_observability.ipynb`

| **Category**         | **Requirement**                                                                                                                                       | **Source/Justification**                                                                 |
|----------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------|
| **Core Concept**      | AgentOps: Evaluation and Observability as architectural pillars                                                                                      | Capstone delivery requirement                                                             |
| **Goal**              | Integrate safety checks and quality scoring; expose full agent trajectory                                                                            | Ensures transparency, reliability, and trustworthiness                                    |
| **Dependencies**      | Requires full pipeline from Notebook 04                                                                                                              | Builds on multi-agent orchestration and memory logic                                      |
| **Required Tools**    | - `LoggingPlugin` with `log_level=DEBUG` <br> - `SafetyCheckerAgent` <br> - `EvaluatorAgent` <br> - Optional: `LoopAgent`                            | Enables traceability and iterative refinement                                             |
| **Agent Design**      | - SafetyCheckerAgent: non-negotiable guardrail <br> - EvaluatorAgent: rubric-based quality judge                                                     | Mirrors real-world QA and compliance workflows                                            |
| **Evaluation Logic**  | - Safety: pass/fail + justification <br> - Quality: rubric scores (1–5)                                                                              | Validates pedagogical clarity and child-appropriateness                                   |
| **Execution**         | - Run full pipeline with complex topic <br> - Show trace, scores, and final output                                                                  | Demonstrates system maturity and readiness                                                |
| **Architecture**      | - Evaluatable by design <br> - LoopAgent for iterative refinement                                                                                    | Aligns with AgentOps and Capstone rubric                                                  |
| **Good Practices**    | - Structured logs and metrics <br> - Redact sensitive data <br> - Use clear scoring schema                                                           | Ensures compliance, clarity, and reproducibility                                          |
| **Documentation**     | - Inline comments <br> - Markdown explanations                                                                                                       | Supports Capstone reviewers and future collaborators                                      |

---

### What We’ll Have When This Code Is Done

- A fully observable, evaluatable multi-agent pipeline
- Two specialized critique agents: one for safety, one for quality
- A traceable execution log showing agent thoughts, tool calls, and outputs
- A rubric-based scoring system for pedagogical quality
- Optional loop logic for iterative refinement
- Clear documentation and inline logic to support Capstone delivery and debugging

---
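The optional refinement loop can be prototyped in plain Python before wiring it into ADK. The sketch below is a stand-in for the control flow only and does not use the `LoopAgent` API. The function name, the mean-score threshold of 4.0, the iteration cap, and the two stub callables are all assumptions for illustration.

```python
from typing import Callable


def refine_until_approved(
    write_script: Callable[[str, str], str],
    evaluate: Callable[[str], dict[str, int]],
    topic: str,
    threshold: float = 4.0,   # assumed approval bar on the 1-5 scale
    max_iterations: int = 3,  # assumed cap so the loop always terminates
) -> tuple[str, dict[str, int], int]:
    """Rewrite the script until its mean rubric score reaches the threshold."""
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    feedback = ""
    script, scores, attempt = "", {}, 0
    for attempt in range(1, max_iterations + 1):
        script = write_script(topic, feedback)
        scores = evaluate(script)
        if sum(scores.values()) / len(scores) >= threshold:
            break
        # Feed the weakest criterion back to the writer for the next draft.
        weakest = min(scores, key=scores.get)
        feedback = f"Improve {weakest} (scored {scores[weakest]}/5)."
    return script, scores, attempt


# Stubs standing in for the ScriptwritingAgent and EvaluatorAgent calls.
def fake_writer(topic: str, feedback: str) -> str:
    return f"Plants make food from sunlight. {feedback}".strip()


def fake_evaluator(script: str) -> dict[str, int]:
    revised = "Improve" in script  # pretend revised drafts are simpler
    return {"simplicity": 5 if revised else 2, "coherence": 4, "eli5_adherence": 4}


script, scores, attempts = refine_until_approved(
    fake_writer, fake_evaluator, "The mechanism of photosynthesis"
)
print(attempts, scores)
```

With these stubs, the first draft fails on simplicity and the second passes, so the loop stops after two attempts. The iteration cap matters because an LLM judge may never return an approving score, and without a cap the loop could run indefinitely.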

