# Agno (Phidata) Framework Baseline for Clinical Trial NLP

## Overview

This notebook uses Agno (formerly Phidata) to build a four-agent system for clinical trial natural language inference (NLI). Given a statement and one or two clinical trial reports, the system decides whether the trial evidence entails or contradicts the statement. Agno provides built-in memory, knowledge management, and reasoning capabilities.

### Why Agno?
- **Full-stack platform**: Comprehensive agent development environment
- **Built-in memory**: Persistent context and conversation memory
- **Knowledge integration**: RAG capabilities and knowledge base management
- **High performance**: Optimized for production workloads
- **Rich tooling**: Extensive built-in tools and integrations
- **Enterprise ready**: Designed for large-scale applications

### Agent Architecture
We'll implement a coordinated agent system with:
1. **Clinical Research Assistant**: Main coordinator with medical knowledge
2. **Data Analyst Agent**: Specialized in numerical and statistical analysis
3. **Logic Validator Agent**: Ensures logical consistency
4. **Decision Maker Agent**: Final entailment classification
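The four roles above form a fan-out/fan-in pattern. The three specialists read the same prompt independently, and only the Decision Maker depends on their outputs. The sketch below shows that dependency structure using plain Python callables as hypothetical stand-ins for the agents (no Agno API is involved). It also shows that, in principle, the specialist calls could run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical stand-in type: an "agent" is anything that maps a prompt to a response.
AgentFn = Callable[[str], str]

def fan_out_fan_in(prompt: str, specialists: Dict[str, AgentFn], decider: AgentFn) -> str:
    """Run the independent specialists concurrently, then hand all analyses to the decider."""
    with ThreadPoolExecutor(max_workers=max(1, len(specialists))) as pool:
        futures = {name: pool.submit(agent, prompt) for name, agent in specialists.items()}
        analyses = {name: future.result() for name, future in futures.items()}
    # Fan-in: the decider sees every specialist's output, labelled by role.
    combined = "\n\n".join(f"{name.upper()} ANALYSIS:\n{text}" for name, text in analyses.items())
    return decider(combined)

# Stub agents standing in for LLM calls (placeholders, not real analyses)
def clinical_stub(prompt: str) -> str:
    return "clinically plausible"

def data_stub(prompt: str) -> str:
    return "numbers match"

def logic_stub(prompt: str) -> str:
    return "no inconsistency found"

def decider_stub(analyses: str) -> str:
    label = "Entailment" if "numbers match" in analyses else "Contradiction"
    return f"FINAL_DECISION: {label}"

result = fan_out_fan_in(
    "STATEMENT + TRIAL EVIDENCE",
    {"clinical": clinical_stub, "data": data_stub, "logic": logic_stub},
    decider_stub,
)
print(result)
```

This sketch does not establish whether real `Agent.run` calls are safe to issue from several threads. The pipeline in this notebook runs them sequentially.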

## Setup and Installation

First, let's set up our environment and import the necessary libraries:

In [None]:
# Load environment variables
from dotenv import load_dotenv
import os

load_dotenv()
print("✅ Environment loaded")

In [None]:
# Import required libraries
import json
import pandas as pd
from tqdm import tqdm
from typing import Dict, List, Any, Optional
import warnings
warnings.filterwarnings('ignore')

# Phidata imports: this notebook uses the legacy `phi` namespace of the phidata package.
# The renamed Agno package uses the `agno.*` namespace, and some class names differ there.
from phi.agent import Agent
from phi.model.google import Gemini
from phi.storage.agent.sqlite import SqlAgentStorage
from phi.knowledge.text import TextKnowledgeBase
from phi.vectordb.chroma import ChromaDb

print("✅ All libraries imported successfully")

## Data Loading and Utilities

Let's create utility functions for loading and processing clinical trial data:

In [None]:
def load_clinical_trial(trial_id: str) -> Dict[str, Any]:
    """Load clinical trial data from JSON file.
    
    Args:
        trial_id: The NCT identifier for the clinical trial
        
    Returns:
        Dictionary containing trial data or error information
    """
    try:
        file_path = os.path.join("training_data", "CT json", f"{trial_id}.json")
        with open(file_path, "r", encoding="utf-8") as f:
            data = json.load(f)
        return data
    except FileNotFoundError:
        return {"error": f"Clinical trial {trial_id} not found"}
    except Exception as e:
        return {"error": f"Error loading {trial_id}: {str(e)}"}

def load_dataset(filepath: str) -> Dict[str, Any]:
    """Load training or test dataset.
    
    Args:
        filepath: Path to the JSON dataset file
        
    Returns:
        Dictionary containing the dataset
    """
    try:
        with open(filepath, "r", encoding="utf-8") as f:
            return json.load(f)
    except Exception as e:
        print(f"Error loading dataset: {e}")
        return {}

def format_trial_data(trial_data: Dict[str, Any], focus_section: Optional[str] = None) -> str:
    """Format trial data for agent consumption.
    
    Args:
        trial_data: Clinical trial data dictionary
        focus_section: Optional section to focus on
        
    Returns:
        Formatted string containing trial information
    """
    if "error" in trial_data:
        return f"Error: {trial_data['error']}"
    
    # Extract key sections
    sections = {
        "Trial ID": trial_data.get("Clinical Trial ID", "Unknown"),
        "Eligibility": trial_data.get("Eligibility", []),
        "Intervention": trial_data.get("Intervention", []),
        "Results": trial_data.get("Results", []),
        "Adverse Events": trial_data.get("Adverse_Events", [])
    }
    
    # Format output
    formatted = [f"Clinical Trial: {sections['Trial ID']}"]
    
    # Focus on specific section if requested
    if focus_section and focus_section in ["Eligibility", "Intervention", "Results", "Adverse Events"]:
        section_data = sections[focus_section]
        formatted.append(f"\n{focus_section}:")
        if isinstance(section_data, list):
            for item in section_data:
                formatted.append(f"  - {item}")
        else:
            formatted.append(f"  {section_data}")
    else:
        # Include all sections
        for section_name, section_data in list(sections.items())[1:]:  # Skip Trial ID
            if section_data:
                formatted.append(f"\n{section_name}:")
                if isinstance(section_data, list):
                    for item in section_data[:5]:  # Limit to first 5 items for readability
                        formatted.append(f"  - {item}")
                    if len(section_data) > 5:
                        formatted.append(f"  ... ({len(section_data)-5} more items)")
                else:
                    formatted.append(f"  {section_data}")
    
    return "\n".join(formatted)

# Test utilities
sample_trial = load_clinical_trial("NCT00066573")
print(f"✅ Data utilities ready. Sample trial: {sample_trial.get('Clinical Trial ID', 'Error')}")
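Every call to `agno_analysis_pipeline` reloads its trial JSON from disk, and the same trial can appear in several statements. One option is a memoized loader. The helper `load_trial_cached` below is hypothetical and takes the trial directory as an argument. Unlike `load_clinical_trial`, it raises on a missing file instead of returning an error dict.

```python
import json
import os
import tempfile
from functools import lru_cache
from typing import Any, Dict

@lru_cache(maxsize=256)
def load_trial_cached(trial_dir: str, trial_id: str) -> Dict[str, Any]:
    """Parse <trial_dir>/<trial_id>.json once and serve repeat calls from memory.

    The same dict object is returned on every hit, so callers must not mutate it.
    Exceptions (e.g. FileNotFoundError) are not cached and propagate to the caller.
    """
    path = os.path.join(trial_dir, f"{trial_id}.json")
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

# Demo with a throwaway directory and a hypothetical trial ID
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "NCT_DEMO.json"), "w", encoding="utf-8") as f:
        json.dump({"Clinical Trial ID": "NCT_DEMO"}, f)
    first = load_trial_cached(tmp, "NCT_DEMO")   # cache miss: reads the file
    second = load_trial_cached(tmp, "NCT_DEMO")  # cache hit: no disk access
    info = load_trial_cached.cache_info()

print(info)
```

In the notebook, this would be called as `load_trial_cached(os.path.join("training_data", "CT json"), trial_id)`.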

## Model and Storage Configuration

Set up the Gemini model and storage for agent memory:

In [None]:
# Model configuration
model = Gemini(
    id="gemini-2.5-flash",
    api_key=os.getenv("GEMINI_API_KEY"),
    temperature=0.1  # Low temperature for consistent results
)

# Storage for agent memory (optional)
storage = SqlAgentStorage(
    table_name="clinical_trial_agents",
    db_file="agno_agents.db"
)

print("✅ Model and storage configured")

## Knowledge Base Setup

Create a small knowledge base of clinical trial terminology (label definitions, trial sections, statistical and medical terms) for retrieval-augmented generation (RAG):

In [None]:
# Create knowledge base with clinical trial concepts
clinical_knowledge = TextKnowledgeBase(
    sources=[
        # Add clinical trial domain knowledge
        """
        Clinical Trial Terminology:
        
        Entailment: A statement is entailed by trial data if it is directly supported by the evidence.
        Contradiction: A statement contradicts trial data if it is refuted by the evidence.
        
        Trial Sections:
        - Eligibility: Inclusion and exclusion criteria for participants
        - Intervention: Treatment methods and procedures used
        - Results: Outcome measures and statistical findings
        - Adverse Events: Safety data and side effects reported
        
        Statistical Terms:
        - Percentage: Proportion expressed as parts per hundred
        - Confidence Interval: Range of values likely to contain the true value
        - P-value: Probability of observing results at least as extreme as those seen, assuming the null hypothesis is true
        - Hazard Ratio: Measure of relative risk over time
        
        Medical Concepts:
        - Efficacy: How well a treatment works under ideal conditions
        - Safety: Absence of harmful effects from treatment
        - Primary Endpoint: Main outcome measure of a trial
        - Secondary Endpoint: Additional outcome measures
        """
    ],
    vector_db=ChromaDb(
        collection="clinical_knowledge",
        path="./agno_knowledge_db"
    )
)

# Load the knowledge base
clinical_knowledge.load(recreate=False)  # Set to True to recreate

print("✅ Knowledge base configured")
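`ChromaDb` ranks stored chunks by embedding similarity. The sketch below gives a rough, dependency-free illustration of what "retrieving the relevant glossary entry" means. It deliberately swaps in plain word-overlap scoring with a small hypothetical stop-word list. This is a simplified stand-in, not how Agno or Chroma rank documents.

```python
import re
from typing import List, Set, Tuple

# A few glossary lines copied from the knowledge-base text above
GLOSSARY = [
    "Entailment: A statement is entailed by trial data if it is directly supported by the evidence.",
    "Contradiction: A statement contradicts trial data if it is refuted by the evidence.",
    "Hazard Ratio: Measure of relative risk over time",
    "Primary Endpoint: Main outcome measure of a trial",
]

# Tiny illustrative stop-word list so function words do not dominate the overlap score
STOP_WORDS = {"a", "an", "the", "is", "it", "if", "of", "by", "what", "over"}

def tokenize(text: str) -> Set[str]:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}

def retrieve(query: str, docs: List[str], k: int = 2) -> List[Tuple[int, str]]:
    """Rank docs by the number of content words shared with the query (toy lexical scoring)."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(doc)), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep original order
    return scored[:k]

top = retrieve("What is a hazard ratio?", GLOSSARY, k=1)
print(top)
```

Embedding-based retrieval would also match paraphrases such as "relative risk over time" without any shared words. Exact-word overlap cannot do that, which is the main reason vector stores are used for RAG.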

## Agent Definitions

Now let's define our specialized agents using Agno's capabilities:

In [None]:
# 1. Clinical Research Assistant - Main coordinator with medical expertise
clinical_assistant = Agent(
    name="Clinical Research Assistant",
    model=model,
    storage=storage,
    knowledge=clinical_knowledge,
    description="Expert clinical researcher specializing in trial analysis and medical reasoning",
    instructions=[
        "You are a Clinical Research Assistant with deep expertise in clinical trials.",
        "Your primary role is to analyze clinical trial data and statements from a medical perspective.",
        "Focus on medical accuracy, clinical significance, and evidence-based reasoning.",
        "Consider the clinical context and medical plausibility of statements.",
        "Identify key medical concepts, terminology, and their implications.",
        "Always ground your analysis in the actual trial data provided.",
        "Provide clear medical reasoning for your assessments."
    ],
    show_tool_calls=False,
    markdown=True
)

print("✅ Clinical Research Assistant created")

In [None]:
# 2. Data Analyst Agent - Specialized in numerical and statistical analysis
data_analyst = Agent(
    name="Data Analyst",
    model=model,
    storage=storage,
    description="Statistical analyst specializing in clinical trial data analysis",
    instructions=[
        "You are a Data Analyst expert in statistical analysis of clinical trials.",
        "Your role is to analyze numerical claims, statistics, and quantitative relationships.",
        "Extract and verify all numerical values, percentages, and statistical measures.",
        "Perform calculations to validate numerical relationships and claims.",
        "Assess statistical significance and clinical meaningfulness of findings.",
        "Identify discrepancies between stated numbers and actual trial data.",
        "Be precise in your calculations and clearly show your work.",
        "Consider confidence intervals, error margins, and statistical uncertainty."
    ],
    show_tool_calls=False,
    markdown=True
)

print("✅ Data Analyst Agent created")

In [None]:
# 3. Logic Validator Agent - Ensures logical consistency
logic_validator = Agent(
    name="Logic Validator",
    model=model,
    storage=storage,
    description="Logic expert specializing in reasoning validation and consistency checking",
    instructions=[
        "You are a Logic Validator expert in logical reasoning and consistency.",
        "Your role is to validate the logical structure and coherence of claims.",
        "Analyze cause-and-effect relationships and logical implications.",
        "Check for internal consistency and logical contradictions.",
        "Evaluate the validity of inferences and conclusions.",
        "Identify logical fallacies or reasoning errors.",
        "Ensure that conclusions follow logically from premises.",
        "Focus on the logical soundness rather than medical or numerical details."
    ],
    show_tool_calls=False,
    markdown=True
)

print("✅ Logic Validator Agent created")

In [None]:
# 4. Decision Maker Agent - Final entailment classification
decision_maker = Agent(
    name="Decision Maker",
    model=model,
    storage=storage,
    description="Final decision authority for entailment classification",
    instructions=[
        "You are the Decision Maker responsible for final entailment classification.",
        "Your role is to synthesize analyses from all other agents and make the final decision.",
        "Consider medical, statistical, and logical evidence equally.",
        "Determine if a statement is ENTAILMENT (supported) or CONTRADICTION (refuted).",
        "ENTAILMENT: Statement is directly supported by the trial evidence.",
        "CONTRADICTION: Statement is refuted or contradicted by the trial evidence.",
        "Weigh evidence carefully and be conservative in your decisions.",
        "Provide clear reasoning for your final classification.",
        "Always end with: FINAL_DECISION: [Entailment/Contradiction]"
    ],
    show_tool_calls=False,
    markdown=True
)

print("✅ Decision Maker Agent created")

## Multi-Agent Analysis Pipeline

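Chain the agents into a fixed pipeline. The three specialists analyze the same prompt one after another, and then the Decision Maker reads all three analyses before classifying the statement: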

In [None]:
def agno_analysis_pipeline(statement: str, primary_id: str, secondary_id: Optional[str] = None, 
                          section_id: Optional[str] = None, verbose: bool = False) -> str:
    """
    Run the complete Agno multi-agent analysis pipeline.
    
    Args:
        statement: The natural language statement to analyze
        primary_id: Primary clinical trial ID
        secondary_id: Secondary trial ID for comparison statements
        section_id: Relevant section of the trial
        verbose: Whether to print intermediate results
        
    Returns:
        Final decision: 'Entailment' or 'Contradiction'
    """
    
    try:
        # Step 1: Load and format clinical trial data
        primary_data = load_clinical_trial(primary_id)
        secondary_data = None
        if secondary_id:
            secondary_data = load_clinical_trial(secondary_id)
        
        # Format trial data for analysis
        primary_formatted = format_trial_data(primary_data, section_id)
        secondary_formatted = None
        if secondary_data:
            secondary_formatted = format_trial_data(secondary_data, section_id)
        
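        # Step 2: Create analysis prompt.
        # Build the optional secondary block outside the f-string: a backslash ("\n")
        # inside an f-string expression is a SyntaxError before Python 3.12.
        secondary_block = f"SECONDARY TRIAL EVIDENCE:\n{secondary_formatted}" if secondary_formatted else ""
        analysis_prompt = f"""
Analyze the following statement against the clinical trial evidence:

STATEMENT: "{statement}"

PRIMARY TRIAL EVIDENCE:
{primary_formatted}

{secondary_block}

Please provide your expert analysis from your domain perspective.
        """.strip()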
        
        if verbose:
            print(f"📄 Analyzing: {statement[:100]}...")
            print(f"🏥 Primary Trial: {primary_id}")
            if secondary_id:
                print(f"🏥 Secondary Trial: {secondary_id}")
        
        # Step 3: Clinical Research Assistant Analysis
        clinical_response = clinical_assistant.run(analysis_prompt)
        clinical_analysis = clinical_response.content
        
        if verbose:
            print(f"🩺 Clinical Analysis: Complete")
        
        # Step 4: Data Analyst Analysis
        data_response = data_analyst.run(analysis_prompt)
        data_analysis = data_response.content
        
        if verbose:
            print(f"📊 Data Analysis: Complete")
        
        # Step 5: Logic Validator Analysis
        logic_response = logic_validator.run(analysis_prompt)
        logic_analysis = logic_response.content
        
        if verbose:
            print(f"🧠 Logic Validation: Complete")
        
        # Step 6: Decision Making
        decision_prompt = f"""
Based on the following expert analyses, make the final entailment decision:

ORIGINAL STATEMENT: "{statement}"

CLINICAL RESEARCH ASSISTANT ANALYSIS:
{clinical_analysis}

DATA ANALYST ANALYSIS:
{data_analysis}

LOGIC VALIDATOR ANALYSIS:
{logic_analysis}

Synthesize these analyses and provide your final decision: Entailment or Contradiction?
        """.strip()
        
        decision_response = decision_maker.run(decision_prompt)
        final_analysis = decision_response.content
        
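        # Step 7: Extract final decision.
        # Normalize case and strip Markdown bold/brackets so variants such as
        # "**FINAL_DECISION:** ENTAILMENT" or "FINAL_DECISION: [Entailment]" still parse;
        # the last marker in the response wins.
        marker = "FINAL_DECISION:"
        normalized = final_analysis.upper().replace("*", "").replace("[", "")
        tail = normalized.rsplit(marker, 1)[-1].strip() if marker in normalized else ""
        if tail.startswith("ENTAILMENT"):
            decision = "Entailment"
        elif tail.startswith("CONTRADICTION"):
            decision = "Contradiction"
        else:
            # Fallback: predict Entailment only if "contradiction" never appears
            lowered = final_analysis.lower()
            if "entailment" in lowered and "contradiction" not in lowered:
                decision = "Entailment"
            else:
                decision = "Contradiction"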
        
        if verbose:
            print(f"⚖️ Final Decision: {decision}")
            print("-" * 50)
        
        return decision
        
    except Exception as e:
        if verbose:
            print(f"❌ Error in Agno pipeline: {e}")
        return "Contradiction"  # Conservative fallback

print("✅ Agno analysis pipeline ready")
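The pipeline catches every exception and returns "Contradiction". As a result, a transient API failure (for example, a rate limit during a 25-example run with four model calls each) is silently scored as a prediction. One option is to retry each model call with exponential backoff and let persistent failures surface. The sketch below uses a hypothetical `call_with_retry` helper that is not part of Agno:

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")

def call_with_retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying on any exception; waits base_delay * 2**attempt between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the real error instead of guessing a label
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")

# Demo: a function that fails twice, then succeeds; `delays` records the backoff schedule
calls = {"n": 0}
delays: List[float] = []

def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "FINAL_DECISION: Entailment"

result = call_with_retry(flaky, attempts=3, base_delay=0.5, sleep=delays.append)
print(result, delays)
```

In the pipeline, this would wrap each call, for example `call_with_retry(lambda: clinical_assistant.run(analysis_prompt))`. The `sleep` parameter exists only so the backoff can be tested without actually waiting.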

## Test Example

Let's test our Agno system with a sample case:

In [None]:
# Test with a sample statement
test_statement = "there is a 13.2% difference between the results from the two the primary trial cohorts"
test_primary_id = "NCT00066573"

print(f"Testing Agno system with statement:")
print(f"'{test_statement}'")
print(f"Primary trial: {test_primary_id}")
print("\n" + "="*80)

# Run the analysis with verbose output
result = agno_analysis_pipeline(
    statement=test_statement,
    primary_id=test_primary_id,
    section_id="Results",
    verbose=True
)

print(f"\n🎯 AGNO RESULT: {result}")
print("="*80)
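Statements like this test case hinge on arithmetic the Data Analyst has to get right. In particular, "a 13.2% difference" could mean an absolute gap in percentage points or a relative change. The sketch below computes both readings using made-up cohort rates, not values from NCT00066573:

```python
def absolute_difference_pp(rate_a: float, rate_b: float) -> float:
    """Absolute gap between two percentages, in percentage points."""
    return abs(rate_a - rate_b)

def relative_difference_pct(rate_a: float, rate_b: float) -> float:
    """Gap expressed relative to rate_b, in percent (rate_b must be non-zero)."""
    return abs(rate_a - rate_b) / rate_b * 100

def matches_claim(value: float, claimed: float, tolerance: float = 0.05) -> bool:
    """True if a computed value agrees with the claimed figure within rounding tolerance."""
    return abs(value - claimed) <= tolerance

# Hypothetical cohort outcome rates (made up, not taken from NCT00066573)
cohort_a, cohort_b = 45.2, 32.0

abs_gap = absolute_difference_pp(cohort_a, cohort_b)   # about 13.2 percentage points
rel_gap = relative_difference_pct(cohort_a, cohort_b)  # about 41.25 percent

print(f"absolute: {abs_gap:.2f} pp -> matches 13.2? {matches_claim(abs_gap, 13.2)}")
print(f"relative: {rel_gap:.2f} %  -> matches 13.2? {matches_claim(rel_gap, 13.2)}")
```

The same pair of rates supports the claim under one reading and refutes it under the other. Spelling out which reading the statement uses is therefore worth stating explicitly in the Data Analyst's instructions.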

## Evaluation on Training Data

Let's evaluate our Agno system on training data:

In [None]:
# Load training data
train_data = load_dataset("training_data/train.json")
print(f"Loaded {len(train_data)} training examples")

# Evaluate on a sample (adjust sample_size as needed)
sample_size = 25
examples = list(train_data.items())[:sample_size]

print(f"\nEvaluating Agno system on {len(examples)} examples...")

results = []
correct = 0

for i, (uuid, example) in enumerate(tqdm(examples, desc="Agno Processing")):
    # Read the label before `try` so the error handler below can always reference it
    expected = example.get("Label")
    try:
        statement = example.get("Statement")
        primary_id = example.get("Primary_id")
        secondary_id = example.get("Secondary_id")
        section_id = example.get("Section_id")
        
        if not statement or not primary_id:
            results.append({
                "uuid": uuid,
                "expected": expected,
                "predicted": "SKIPPED",
                "correct": False
            })
            continue
        
        # Get prediction from Agno system
        predicted = agno_analysis_pipeline(
            statement=statement,
            primary_id=primary_id,
            secondary_id=secondary_id,
            section_id=section_id,
            verbose=False
        )
        
        # Check if correct
        is_correct = (predicted.strip() == expected.strip())
        if is_correct:
            correct += 1
            
        results.append({
            "uuid": uuid,
            "statement": statement[:100] + "..." if len(statement) > 100 else statement,
            "expected": expected,
            "predicted": predicted,
            "correct": is_correct
        })
        
        status = "✅" if is_correct else "❌"
        print(f"Example {i+1:2d}: {expected:12} -> {predicted:12} {status}")
        
    except Exception as e:
        print(f"Error processing example {i+1}: {e}")
        results.append({
            "uuid": uuid,
            "expected": expected,
            "predicted": "ERROR",
            "correct": False
        })

# Calculate accuracy
accuracy = correct / len(examples) if examples else 0
print(f"\n📊 Agno Results:")
print(f"Accuracy: {accuracy:.2%} ({correct}/{len(examples)})")

# Store results for comparison
agno_results = results.copy()
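Accuracy on a small sample does not show which label the system errs toward. A common complement is per-class precision, recall, and F1. The sketch below computes them from the same row format as `results` (keys `expected` and `predicted`). It uses synthetic rows for illustration:

```python
from typing import Dict, List

def binary_prf(results: List[Dict], positive: str = "Entailment") -> Dict[str, float]:
    """Precision, recall and F1 for one label.

    Rows predicted as SKIPPED/ERROR never count as positive predictions,
    so when their expected label is `positive` they count as misses.
    """
    tp = sum(1 for r in results if r["predicted"] == positive and r["expected"] == positive)
    fp = sum(1 for r in results if r["predicted"] == positive and r["expected"] != positive)
    fn = sum(1 for r in results if r["predicted"] != positive and r["expected"] == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Synthetic rows in the same shape as `results`
demo = [
    {"expected": "Entailment", "predicted": "Entailment"},
    {"expected": "Entailment", "predicted": "Contradiction"},
    {"expected": "Contradiction", "predicted": "Entailment"},
    {"expected": "Contradiction", "predicted": "Contradiction"},
    {"expected": "Entailment", "predicted": "ERROR"},
]
scores = binary_prf(demo)
print(scores)
```

On the notebook's own output, this would be `binary_prf(agno_results)`, and `binary_prf(agno_results, positive="Contradiction")` gives the other class.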

## Memory and Knowledge Analysis

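Let's test Agno's memory and knowledge features. The recall test below only works if earlier turns are replayed into the next prompt. In phidata, this is controlled by the agent's `add_history_to_messages` setting (off by default). The agents above do not enable it, so the second call may not see the first.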

In [None]:
# Demonstrate memory capabilities
print("🧠 Testing Agno Memory Capabilities:")
print("=" * 50)

# Test conversation memory
memory_test_1 = clinical_assistant.run("Remember that NCT00066573 is about breast cancer treatment comparing exemestane and anastrozole.")
print(f"Memory setup: {memory_test_1.content[:100]}...")

memory_test_2 = clinical_assistant.run("What trial were we just discussing?")
print(f"Memory recall: {memory_test_2.content[:200]}...")

# Test knowledge base usage
print("\n📚 Testing Knowledge Base Integration:")
print("=" * 50)

knowledge_test = clinical_assistant.run("What is the difference between entailment and contradiction in clinical trial analysis?")
print(f"Knowledge response: {knowledge_test.content[:300]}...")

print("\n✅ Memory and knowledge features demonstrated")
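Conversation "memory" in this setting means that earlier exchanges are replayed into the next prompt. The sketch below shows that idea with a plain list and a fixed window of previous exchanges. It is a conceptual illustration with hypothetical names (`WindowedHistory`, `max_turns`), not Agno's implementation, and it ignores storage and summarization.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class WindowedHistory:
    """Keep the last `max_turns` (user, assistant) exchanges and render them into a prompt."""
    max_turns: int = 3
    turns: List[Tuple[str, str]] = field(default_factory=list)

    def add(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))
        # Drop the oldest exchanges; a window of 0 keeps nothing
        self.turns = self.turns[-self.max_turns:] if self.max_turns > 0 else []

    def build_prompt(self, new_message: str) -> str:
        lines = []
        for user, assistant in self.turns:
            lines.append(f"USER: {user}")
            lines.append(f"ASSISTANT: {assistant}")
        lines.append(f"USER: {new_message}")
        return "\n".join(lines)

# With a window of one exchange, the first fact has already been dropped
history = WindowedHistory(max_turns=1)
history.add("Remember trial NCT_DEMO.", "Noted.")
history.add("Second topic.", "Understood.")
prompt = history.build_prompt("What trial were we discussing?")
print(prompt)
```

The window size trades recall against prompt length. If history replay were enabled on an agent that is reused across evaluation examples, earlier statements would also leak into later predictions.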

## Generate Submission File

Let's generate a submission file using our Agno system:

In [None]:
def generate_agno_submission(test_file="test.json", output_file="agno_submission.json", sample_size=None):
    """
    Generate submission file using Agno system.
    
    Args:
        test_file: Path to test data
        output_file: Output submission file
        sample_size: Number of examples to process (None for all)
    """
    
    # Load test data
    test_data = load_dataset(test_file)
    if not test_data:
        print(f"❌ Could not load test data from {test_file}")
        return {}  # empty dict so callers such as len() still work
    
    examples = list(test_data.items())
    if sample_size:
        examples = examples[:sample_size]
        
    print(f"🚀 Generating Agno predictions for {len(examples)} examples...")
    
    submission = {}
    
    for i, (uuid, example) in enumerate(tqdm(examples, desc="Agno Processing")):
        try:
            statement = example.get("Statement")
            primary_id = example.get("Primary_id")
            secondary_id = example.get("Secondary_id")
            section_id = example.get("Section_id")
            
            if not statement or not primary_id:
                submission[uuid] = {"Prediction": "Contradiction"}  # Default fallback
                continue
                
            # Get prediction from Agno system
            prediction = agno_analysis_pipeline(
                statement=statement,
                primary_id=primary_id,
                secondary_id=secondary_id,
                section_id=section_id,
                verbose=False
            )
            
            submission[uuid] = {"Prediction": prediction}
            
        except Exception as e:
            print(f"Error processing {uuid}: {e}")
            submission[uuid] = {"Prediction": "Contradiction"}  # Conservative fallback
    
    # Save submission file
    with open(output_file, "w", encoding="utf-8") as f:
        json.dump(submission, f, indent=2)
    
    print(f"✅ Agno submission saved to {output_file}")
    return submission

# Generate submission for a small sample
agno_submission = generate_agno_submission(
    test_file="test.json", 
    output_file="agno_submission.json",
    sample_size=10  # Adjust as needed
)

print(f"Generated predictions for {len(agno_submission)} examples")
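Before submitting, it helps to check that every test ID received one of the two labels. The sketch below validates the format this notebook writes (`{uuid: {"Prediction": label}}`). `validate_submission` is a hypothetical helper, and the demo uses placeholder IDs.

```python
from typing import Any, Dict, List

VALID_LABELS = {"Entailment", "Contradiction"}

def validate_submission(submission: Dict[str, Any], expected_ids: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the submission looks complete."""
    problems = []
    missing = set(expected_ids) - set(submission)
    if missing:
        problems.append(f"missing IDs: {sorted(missing)}")
    unexpected = set(submission) - set(expected_ids)
    if unexpected:
        problems.append(f"unexpected IDs: {sorted(unexpected)}")
    for uuid, entry in submission.items():
        label = entry.get("Prediction") if isinstance(entry, dict) else None
        if label not in VALID_LABELS:
            problems.append(f"{uuid}: invalid prediction {label!r}")
    return problems

demo_submission = {"id-1": {"Prediction": "Entailment"}, "id-2": {"Prediction": "SKIPPED"}}
issues = validate_submission(demo_submission, ["id-1", "id-2", "id-3"])
for issue in issues:
    print(issue)
```

With `sample_size=10`, `validate_submission(agno_submission, list(load_dataset("test.json")))` will report the unprocessed IDs as missing, which is expected for a partial run.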

## Error Analysis

Let's analyze errors to understand system performance:

In [None]:
# Analyze incorrect predictions
incorrect_results = [r for r in agno_results if not r["correct"] and r["predicted"] not in ["SKIPPED", "ERROR"]]

print(f"\n🔍 Agno Error Analysis ({len(incorrect_results)} incorrect predictions):")
print("=" * 80)

# Group errors by type
entailment_to_contradiction = [r for r in incorrect_results if r["expected"] == "Entailment" and r["predicted"] == "Contradiction"]
contradiction_to_entailment = [r for r in incorrect_results if r["expected"] == "Contradiction" and r["predicted"] == "Entailment"]

print(f"Entailment -> Contradiction errors: {len(entailment_to_contradiction)}")
print(f"Contradiction -> Entailment errors: {len(contradiction_to_entailment)}")

# Show some examples
print("\nSample errors:")
for i, result in enumerate(incorrect_results[:3]):
    print(f"\nError #{i+1}:")
    print(f"Statement: {result['statement']}")
    print(f"Expected: {result['expected']} | Predicted: {result['predicted']}")
    print("-" * 40)
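The two error counts above can be extended into a full confusion table. The table also shows how many rows ended as `SKIPPED` or `ERROR`, which the `incorrect_results` filter excludes. The sketch below works over the same row format, using synthetic rows:

```python
from collections import Counter
from typing import Dict, List

LABELS = ["Entailment", "Contradiction"]

def confusion_table(results: List[Dict]) -> str:
    """Render expected-vs-predicted counts; non-label predictions (SKIPPED/ERROR) get extra columns."""
    counts = Counter((r["expected"], r["predicted"]) for r in results)
    extra_cols = sorted({pred for _, pred in counts if pred not in LABELS})
    predicted_cols = LABELS + extra_cols
    header = "expected \\ predicted".ljust(22) + "".join(c.ljust(15) for c in predicted_cols)
    rows = [header]
    for exp in LABELS:
        rows.append(exp.ljust(22) + "".join(str(counts[(exp, p)]).ljust(15) for p in predicted_cols))
    return "\n".join(rows)

# Synthetic rows in the same shape as `agno_results`
demo = [
    {"expected": "Entailment", "predicted": "Entailment"},
    {"expected": "Entailment", "predicted": "Contradiction"},
    {"expected": "Contradiction", "predicted": "Entailment"},
    {"expected": "Contradiction", "predicted": "Contradiction"},
    {"expected": "Entailment", "predicted": "ERROR"},
]
table = confusion_table(demo)
print(table)
```

On the notebook's data, this would be `print(confusion_table(agno_results))`.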

## Conclusion and Insights

### Agno Framework Strengths:
1. **Full-stack capabilities**: Comprehensive agent development environment
2. **Built-in memory**: Persistent session storage and optional conversation history
3. **Knowledge integration**: RAG capabilities with vector storage
4. **High performance**: Optimized for production workloads
5. **Rich tooling**: Extensive built-in tools and integrations
6. **Enterprise ready**: Designed for large-scale applications

### Key Features Demonstrated:
- **Multi-agent pipeline**: Clinical Assistant, Data Analyst, and Logic Validator analyses feed a Decision Maker
- **Knowledge base**: Domain-specific clinical trial terminology for RAG, attached to the Clinical Research Assistant
- **Memory persistence**: Agent sessions stored in SQLite; recalling earlier turns additionally requires history replay to be enabled
- **Storage integration**: SQLite storage for agent state management
- **Structured analysis**: Systematic approach to complex reasoning tasks

### Architecture Benefits:
- **Scalable design**: Easy to add new agents and capabilities
- **Knowledge management**: Built-in RAG for domain expertise
- **Memory continuity**: Persistent context across conversations
- **Production ready**: Enterprise-grade features and performance
- **Comprehensive tooling**: Rich ecosystem for agent development

### Optimization Opportunities:
1. **Knowledge base expansion**: Add more clinical trial domain knowledge
2. **Memory optimization**: Fine-tune memory retention and retrieval
3. **Agent specialization**: Enhance domain expertise for each agent
4. **Tool integration**: Leverage additional Agno tools and capabilities
5. **Performance tuning**: Optimize for speed vs. accuracy trade-offs

### When to Use Agno:
- Applications requiring persistent memory and learning
- Systems needing knowledge base integration (RAG)
- Enterprise applications with complex agent workflows
- Projects requiring comprehensive tooling and infrastructure
- Use cases where agent state persistence is important

### Agno vs Other Frameworks:
- **vs AutoGen**: Better for memory and knowledge; AutoGen is better for conversations
- **vs Atomic Agents**: More features but heavier; Atomic Agents is better for pure speed
- **vs LangChain**: More agent-focused; LangChain has a broader ecosystem

Agno excels in scenarios requiring persistent memory, knowledge integration, and enterprise-grade agent management, making it ideal for complex clinical analysis applications where context retention and domain expertise are crucial.