# Mithril: Advanced Agentic FinnGen Workflow

## Overview
This notebook demonstrates the capabilities of **Mithril**, a multi-agent system built with **Google ADK** for the Kaggle Agents Intensive Capstone. It showcases:

1.  **Multi-Agent Orchestration**: Planner, Researcher, Analyst, Coder, Reviewer.
2.  **Advanced Scenarios**: Solving complex biomedical questions (GLP-1 weight loss, CKD trajectories, comorbidity overlap).
3.  **State Management**: Long-term memory and session persistence.
4.  **Observability**: Structured logging and tracing.
5.  **Evaluation**: Automated assessment of agent performance.

## 1. Setup & Initialization

In [None]:
!pip install google-generativeai beautifulsoup4 pandas python-dotenv matplotlib
import os
import sys
import json
import pandas as pd
from dotenv import load_dotenv
import google.generativeai as genai

# Add src to path
sys.path.append(os.path.abspath("../"))
from src.agents.planner import PlannerAgent
from src.memory import FileBasedMemory

# Configure Environment
load_dotenv()
if os.getenv("VERTEX_API_KEY"):
    genai.configure(api_key=os.getenv("VERTEX_API_KEY"))
    print("API Key Configured")
else:
    print("Warning: VERTEX_API_KEY not found in .env")

## 2. Advanced Research Scenarios

### Scenario A: GLP-1 Agonist Weight Loss (Dynamic Code Execution)
**Query**: "Identify all individuals with GLP1 prescription who lost more than 20% their weight 1 year after initiation of the prescription."

*Capabilities Demonstrated*: 
- **Researcher**: Finds GLP-1 ATC codes.
- **Coder**: Writes R code to join drug purchases with lab measurements (weight) and calculate delta.
- **Reviewer**: Validates the logic.

In [None]:
planner = PlannerAgent()
query_glp1 = "Identify all individuals with GLP1 prescription who lost more than 20% their weight 1 year after initiation of the prescription."
result_glp1 = planner.execute_workflow(query_glp1)
print(result_glp1)
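
The cohort logic that the Coder agent is asked to produce can be sketched in plain Python. The notebook states that the agent writes R, so this is only an illustration of the calculation, not the generated code. The record layouts, the function name `weight_loss_responders`, and the 90-day tolerance windows around baseline and follow-up are all assumptions made for the example.

```python
from datetime import date, timedelta

def weight_loss_responders(first_purchase, weights, threshold=0.20,
                           follow_up_days=365, window_days=90):
    """Return {patient_id: fractional_loss} for patients who lost more than
    `threshold` of their baseline weight about one year after initiation.

    first_purchase: {patient_id: date of first GLP-1 purchase}   (assumed layout)
    weights:        {patient_id: [(measurement_date, kg), ...]}  (assumed layout)
    """
    window = timedelta(days=window_days)
    responders = {}
    for pid, start in first_purchase.items():
        measurements = weights.get(pid, [])
        # Baseline candidates: measured on or before initiation, inside the window.
        baseline = [(d, kg) for d, kg in measurements
                    if timedelta(0) <= start - d <= window]
        # Follow-up candidates: measured within the window around the target date.
        target = start + timedelta(days=follow_up_days)
        follow_up = [(d, kg) for d, kg in measurements if abs(d - target) <= window]
        if not baseline or not follow_up:
            continue  # weight change cannot be computed for this patient
        base_kg = max(baseline)[1]  # latest measurement before initiation
        follow_kg = min(follow_up, key=lambda m: abs(m[0] - target))[1]
        loss = (base_kg - follow_kg) / base_kg
        if loss > threshold:
            responders[pid] = round(loss, 3)
    return responders

# Made-up example records, for illustration only.
first_purchase = {"P1": date(2022, 1, 10), "P2": date(2022, 3, 1), "P3": date(2022, 5, 5)}
weights = {
    "P1": [(date(2022, 1, 5), 100.0), (date(2023, 1, 12), 78.0)],
    "P2": [(date(2022, 2, 20), 90.0), (date(2023, 3, 10), 80.0)],
    "P3": [(date(2022, 5, 1), 110.0)],  # no follow-up measurement
}
responders = weight_loss_responders(first_purchase, weights)
print(responders)
```

Patients without a usable baseline or follow-up measurement are dropped rather than counted as non-responders; a real analysis would need to report how many were excluded for that reason.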

### Scenario B: CKD Trajectories (Standard Analysis Tool)
**Query**: "Calculate the eGFR trajectories for patients with chronic kidney disease upon initiation of ACE inhibitors."

*Capabilities Demonstrated*:
- **Analyst**: Recognizes this as a BLUP analysis task.
- **MCP Tool**: Calls `calculate_blup_slopes` via the MCP server.

In [None]:
query_ckd = "Calculate the eGFR trajectories for patients with chronic kidney disease upon initiation of ACE inhibitors."
result_ckd = planner.execute_workflow(query_ckd)
print(result_ckd)
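
The implementation of `calculate_blup_slopes` is not shown in this notebook. To give a feel for what a BLUP-style slope is, the sketch below fits an ordinary least-squares slope per patient and then shrinks each slope toward the cohort mean with an empirical-Bayes weight. This is a simplified stand-in for the BLUPs of a linear mixed model, not the tool's method; the function names and the input layout are assumptions. It needs at least two patients, each with three or more measurements at distinct times.

```python
from statistics import mean, variance

def fit_line(points):
    """OLS fit for [(years_since_initiation, egfr), ...].
    Returns the slope, Sxx (spread of the time points), and residual sum of squares."""
    t_bar = mean(t for t, _ in points)
    y_bar = mean(y for _, y in points)
    sxx = sum((t - t_bar) ** 2 for t, _ in points)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in points) / sxx
    intercept = y_bar - slope * t_bar
    rss = sum((y - intercept - slope * t) ** 2 for t, y in points)
    return slope, sxx, rss

def shrunken_slopes(series):
    """series: {patient_id: [(years, egfr), ...]}  (assumed layout).
    Returns {patient_id: (raw_slope, shrunken_slope)}."""
    fits = {pid: fit_line(pts) for pid, pts in series.items()}
    # Pooled residual variance across all patients.
    dof = sum(len(pts) - 2 for pts in series.values())
    sigma2 = sum(rss for _, _, rss in fits.values()) / dof
    raw = {pid: slope for pid, (slope, _, _) in fits.items()}
    # Sampling variance of each patient's OLS slope.
    sampling_var = {pid: sigma2 / sxx for pid, (_, sxx, _) in fits.items()}
    grand_mean = mean(list(raw.values()))
    # Method-of-moments estimate of between-patient slope variance, floored at 0.
    between = max(0.0, variance(list(raw.values())) - mean(list(sampling_var.values())))
    result = {}
    for pid, slope in raw.items():
        total = between + sampling_var[pid]
        weight = between / total if total > 0 else 0.0
        result[pid] = (slope, grand_mean + weight * (slope - grand_mean))
    return result

# Made-up eGFR series (mL/min/1.73 m^2), for illustration only.
series = {
    "P1": [(0.0, 60.0), (0.5, 58.5), (1.0, 57.5), (2.0, 54.0)],
    "P2": [(0.0, 45.0), (1.0, 40.5), (2.0, 35.0)],
    "P3": [(0.0, 70.0), (0.5, 69.0), (1.5, 69.5), (2.0, 68.0)],
}
slopes = shrunken_slopes(series)
for pid, (raw_slope, shrunk) in slopes.items():
    print(f"{pid}: raw={raw_slope:.2f}, shrunken={shrunk:.2f} per year")
```

Patients with few or noisy measurements get a smaller weight, so their slopes are pulled more strongly toward the cohort mean. That is the qualitative behaviour a mixed-model BLUP also shows.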

### Scenario C: Comorbidity Overlap (Set Operations)
**Query**: "What is the overlap of patients diagnosed with high blood pressure and on statin with patients prescribed GLP1-RA?"

*Capabilities Demonstrated*:
- **Complex Logic**: Intersection of 3 cohorts (Diagnosis + Drug A + Drug B).

In [None]:
query_overlap = "What is the overlap of patients diagnosed with high blood pressure and on statin with patients prescribed GLP1-RA?"
result_overlap = planner.execute_workflow(query_overlap)
print(result_overlap)
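
Once each cohort is reduced to a set of patient identifiers, the overlap in this scenario is a chain of set intersections. The sketch below uses made-up identifiers; the cohort names are illustrative and do not come from the agent's output.

```python
# Made-up patient identifiers, for illustration only.
hypertension = {"P1", "P2", "P3", "P5"}   # diagnosed with high blood pressure
statin = {"P2", "P3", "P4", "P5"}         # with a statin purchase
glp1_ra = {"P3", "P5", "P6"}              # prescribed a GLP-1 receptor agonist

# First cohort in the query: hypertension AND statin.
htn_on_statin = hypertension & statin
# Overlap of that cohort with the GLP-1 RA cohort.
overlap = htn_on_statin & glp1_ra

summary = {
    "htn_on_statin": len(htn_on_statin),
    "glp1_ra": len(glp1_ra),
    "overlap": len(overlap),
    # Share of each cohort that falls in the overlap.
    "overlap_share_of_htn_on_statin": len(overlap) / len(htn_on_statin),
    "overlap_share_of_glp1_ra": len(overlap) / len(glp1_ra),
}
print(sorted(overlap), summary)
```

Reporting the overlap as a share of both cohorts avoids the ambiguity in the word "overlap", which could mean either proportion.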

## 3. Session Management & Long-Term Memory
The agent persists context across turns using `FileBasedMemory`. This allows us to inspect the state of any session.

In [None]:
# Load memory
memory = FileBasedMemory()
with open("agent_memory.json", "r") as f:
    mem_data = json.load(f)

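# Display the most recent session context.
# Assumes sessions are stored in insertion order, so the last key is the newest.
sessions = mem_data.get("sessions", {})
if sessions:
    last_session_id = list(sessions)[-1]
    print(f"Session ID: {last_session_id}")
    print("Context:", json.dumps(sessions[last_session_id].get("context", {}), indent=2))
else:
    print("No sessions found in agent_memory.json")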
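
For reuse outside this cell, the same inspection can be wrapped in a small helper that tolerates a missing or corrupt memory file. This is a sketch: `latest_session_context` is a hypothetical name, and the only thing assumed about the file is the `{"sessions": {id: {"context": ...}}}` layout used in the cell above.

```python
import json
import os
import tempfile

def latest_session_context(path):
    """Return (session_id, context) for the last session stored in `path`,
    or (None, {}) if the file is missing, unreadable as JSON, or has no sessions."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            data = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None, {}
    sessions = data.get("sessions") or {}
    if not sessions:
        return None, {}
    # Dicts preserve insertion order, so the last key is the last session written.
    session_id = list(sessions)[-1]
    return session_id, sessions[session_id].get("context", {})

# Demonstrate on a temporary file with made-up sessions.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "agent_memory.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump({"sessions": {"s1": {"context": {"query": "glp1"}},
                                "s2": {"context": {"query": "ckd"}}}}, f)
    found = latest_session_context(demo_path)
    missing = latest_session_context(os.path.join(tmp, "does_not_exist.json"))

print(found, missing)
```

"Last" here means the last key written to the file. If sessions carry timestamps, sorting on those would be a more reliable way to find the most recent one.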

## 4. Observability: Logging & Tracing
Every action is written to `agent_trace.log` as a structured entry with a JSON payload. Parsing that file lets us reconstruct the agent's decision path.

In [None]:
logs = []
with open("agent_trace.log", "r") as f:
    for line in f:
        # Expected line format: TIME - NAME - LEVEL - JSON.
        # maxsplit=3 keeps any " - " inside the JSON payload intact.
        json_str = line.strip().split(" - ", 3)[-1]
        try:
            logs.append(json.loads(json_str))
        except json.JSONDecodeError:
            # Skip lines whose payload is not valid JSON
            continue

df_logs = pd.DataFrame(logs)
print(f"Total Actions Logged: {len(df_logs)}")
display(df_logs[["agent", "action"]].head(10))
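
To see how the work was distributed, the parsed records can also be counted per agent. The sketch below repeats the parsing on a few made-up log lines so that it runs on its own. The line format `TIME - NAME - LEVEL - JSON` and the `agent` and `action` keys come from the cell above; the sample values are invented.

```python
import json
from collections import Counter

def parse_trace_lines(lines):
    """Parse 'TIME - NAME - LEVEL - JSON' lines.
    Returns (records, skipped), where skipped counts lines without a JSON object payload."""
    records, skipped = [], 0
    for line in lines:
        # maxsplit=3 so that " - " inside the JSON payload is not split.
        payload = line.strip().split(" - ", 3)[-1]
        try:
            record = json.loads(payload)
        except json.JSONDecodeError:
            skipped += 1
            continue
        if isinstance(record, dict):
            records.append(record)
        else:
            skipped += 1
    return records, skipped

# Made-up sample lines, for illustration only.
sample_lines = [
    '2025-01-01 10:00:00,000 - mithril - INFO - {"agent": "Planner", "action": "plan", "detail": "GLP-1 - weight"}',
    '2025-01-01 10:00:01,000 - mithril - INFO - {"agent": "Researcher", "action": "lookup_atc"}',
    '2025-01-01 10:00:02,000 - mithril - INFO - {"agent": "Planner", "action": "delegate"}',
    'not a structured line',
]
records, skipped = parse_trace_lines(sample_lines)
per_agent = Counter(r.get("agent", "unknown") for r in records)
print(per_agent.most_common(), f"skipped={skipped}")
```

Counting skipped lines, rather than dropping them silently, makes it obvious when the log format has drifted from what the parser expects.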

## 5. Agent Evaluation (ADK Style)
We can use an LLM as a judge to score the agent's responses against a rubric, in the style of the Google ADK evaluation suite.

In [None]:
def evaluate_response(query, result):
    eval_model = genai.GenerativeModel('gemini-1.5-pro-latest')
    prompt = f"""
    Evaluate the following agent response based on the user query.
    
    User Query: {query}
    Agent Response: {result}
    
    Score (1-5) and explain why.
    Criteria:
    1. Accuracy: Did it answer the specific question?
    2. Completeness: Did it handle all constraints?
    3. Clarity: Is the answer easy to understand?
    """
    return eval_model.generate_content(prompt).text

# Evaluate the GLP-1 result
print(evaluate_response(query_glp1, result_glp1))
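
Free-text scores are hard to aggregate across queries. One option is to ask the evaluator to reply with a JSON object and parse it. The sketch below shows only the parsing step and makes no API call. It assumes the prompt has been extended to request integer keys named `accuracy`, `completeness`, and `clarity`; those key names and the helper `parse_scores` are hypothetical and are not part of the prompt above.

```python
import json
import re

CRITERIA = ("accuracy", "completeness", "clarity")  # assumed JSON keys

def parse_scores(text):
    """Extract integer scores (1-5) for each criterion from an evaluator reply.
    Returns a dict of scores, or None if the reply cannot be parsed."""
    # Take the outermost {...} span, allowing prose before or after it.
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    scores = {}
    for name in CRITERIA:
        value = data.get(name)
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
            return None
        scores[name] = value
    return scores

# Made-up evaluator replies, for illustration only.
reply = ('Assessment:\n{"accuracy": 4, "completeness": 3, "clarity": 5, '
         '"rationale": "Missed the 1-year window."}')
scores = parse_scores(reply)
unparseable = parse_scores("Score: 4/5, mostly accurate.")
average = sum(scores.values()) / len(scores) if scores else None
print(scores, average, unparseable)
```

Returning `None` for malformed replies keeps bad evaluator output from being silently averaged into the results.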