# Thyroid Clinical Decision Support System - Interactive Notebook

This notebook demonstrates the multi-agent architecture for thyroid risk assessment.

## System Components

1. **Agent 1**: Risk Scoring (ML-based prediction)
2. **Agent 2**: Medical Knowledge Retriever (RAG)
3. **Agent 3**: Reasoning & Explainability
4. **Agent 4**: Summary Generation
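
The hand-off between the four agents can be pictured with a minimal sketch. Everything here is hypothetical: the `Assessment` container, the `run_pipeline` function, and the stub callables are illustrative stand-ins, not the classes imported later in this notebook.

```python
from dataclasses import dataclass, field


@dataclass
class Assessment:
    """Hypothetical container for what each agent contributes."""
    risk_score: float
    evidence: list = field(default_factory=list)
    reasoning: str = ''
    summary: str = ''


def run_pipeline(patient, score_risk, retrieve_evidence, explain, summarize):
    """Call the four agents in order, passing each one the results so far."""
    result = Assessment(risk_score=score_risk(patient))                     # Agent 1
    result.evidence = retrieve_evidence(patient, result.risk_score)         # Agent 2
    result.reasoning = explain(patient, result.risk_score, result.evidence) # Agent 3
    result.summary = summarize(result)                                      # Agent 4
    return result


# Stub agents, for illustration only
demo = run_pipeline(
    {'TSH': 't'},
    score_risk=lambda p: 0.3 if p.get('TSH') == 't' else 0.0,
    retrieve_evidence=lambda p, s: ['placeholder citation'],
    explain=lambda p, s, ev: f'score {s:.2f} supported by {len(ev)} source(s)',
    summarize=lambda a: f'Risk score {a.risk_score:.2f}; {a.reasoning}',
)
print(demo.summary)
```

Passing the agents in as callables keeps the sketch independent of any particular model or retriever; the real system may wire them together differently.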

## Setup and Imports

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from main_system import ThyroidClinicalDecisionSupport
from agent1_risk_scoring import RiskScoringAgent
from agent2_knowledge_retriever import MedicalKnowledgeRetriever
from agent3_reasoning import ReasoningAgent
from agent4_summary import SummaryAgent

# Set display options
pd.set_option('display.max_columns', None)
sns.set_style('whitegrid')

print("✓ All modules imported successfully")

## Step 1: Create Synthetic Training Data

For demonstration, we'll create synthetic thyroid patient data. Class labels are derived from a weighted sum of a few risk factors plus Gaussian noise.

In [None]:
def create_synthetic_data(n_samples=1000):
    """Create synthetic thyroid patient data"""
    np.random.seed(42)
    
    data = {
        'age': np.random.randint(18, 80, n_samples),
        'sex': np.random.choice(['m', 'f'], n_samples),
        'on_thyroxine': np.random.choice(['t', 'f'], n_samples, p=[0.3, 0.7]),
        'on_antithyroid_medication': np.random.choice(['t', 'f'], n_samples, p=[0.2, 0.8]),
        'sick': np.random.choice(['t', 'f'], n_samples, p=[0.1, 0.9]),
        'pregnant': np.random.choice(['t', 'f'], n_samples, p=[0.05, 0.95]),
        'thyroid_surgery': np.random.choice(['t', 'f'], n_samples, p=[0.15, 0.85]),
        'lithium': np.random.choice(['t', 'f'], n_samples, p=[0.05, 0.95]),
        'goitre': np.random.choice(['t', 'f'], n_samples, p=[0.2, 0.8]),
        'tumor': np.random.choice(['t', 'f'], n_samples, p=[0.1, 0.9]),
        'hypopituitary': np.random.choice(['t', 'f'], n_samples, p=[0.05, 0.95]),
        'psych': np.random.choice(['t', 'f'], n_samples, p=[0.1, 0.9]),
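        # Lab columns: '<test>_measured' flags whether the test was run;
        # the bare test name flags an abnormal result ('t') in this synthetic data
        'TSH_measured': np.random.choice(['t', 'f'], n_samples, p=[0.9, 0.1]),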
        'TSH': np.random.choice(['t', 'f'], n_samples, p=[0.3, 0.7]),
        'T3_measured': np.random.choice(['t', 'f'], n_samples, p=[0.8, 0.2]),
        'T3': np.random.choice(['t', 'f'], n_samples, p=[0.2, 0.8]),
        'TT4_measured': np.random.choice(['t', 'f'], n_samples, p=[0.85, 0.15]),
        'TT4': np.random.choice(['t', 'f'], n_samples, p=[0.25, 0.75]),
        'T4U_measured': np.random.choice(['t', 'f'], n_samples, p=[0.7, 0.3]),
        'T4U': np.random.choice(['t', 'f'], n_samples, p=[0.15, 0.85]),
        'FTI_measured': np.random.choice(['t', 'f'], n_samples, p=[0.75, 0.25]),
        'FTI': np.random.choice(['t', 'f'], n_samples, p=[0.2, 0.8]),
        'referral_source': np.random.choice(['SVI', 'SVHC', 'other'], n_samples)
    }
    
    # Create synthetic target: weighted sum of risk-factor flags plus Gaussian noise
    risk_scores = []
    for i in range(n_samples):
        score = 0.0
        if data['TSH'][i] == 't': score += 0.3
        if data['thyroid_surgery'][i] == 't': score += 0.2
        if data['on_antithyroid_medication'][i] == 't': score += 0.2
        if data['T3'][i] == 't': score += 0.15
        if data['goitre'][i] == 't': score += 0.15
        # Clip to [0, 1] so the noise cannot push the score out of range
        risk_scores.append(float(np.clip(score + np.random.normal(0, 0.1), 0.0, 1.0)))
    
    # Label a patient 'positive' when the noisy score exceeds 0.5
    data['class'] = ['positive' if score > 0.5 else 'negative' for score in risk_scores]
    
    return pd.DataFrame(data)

# Create dataset
df = create_synthetic_data(1000)
df.to_csv('synthetic_thyroid_data.csv', index=False)

print(f"Created dataset with {len(df)} patients")
print("\nClass distribution:")
print(df['class'].value_counts())
print("\nFirst few rows:")
df.head()
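
To see how the label rule behaves, here is a small noise-free sketch. The weights and the 0.5 cutoff are copied from `create_synthetic_data`; the helper names `noiseless_risk_score` and `label` are hypothetical and exist only for this illustration.

```python
# Weights copied from create_synthetic_data; the Gaussian noise is omitted
# here so the result is deterministic.
RISK_WEIGHTS = {
    'TSH': 0.3,
    'thyroid_surgery': 0.2,
    'on_antithyroid_medication': 0.2,
    'T3': 0.15,
    'goitre': 0.15,
}


def noiseless_risk_score(patient):
    """Sum the weights of every risk factor flagged 't'."""
    return sum(w for name, w in RISK_WEIGHTS.items() if patient.get(name) == 't')


def label(score, threshold=0.5):
    """Apply the same cutoff the synthetic data uses."""
    return 'positive' if score > threshold else 'negative'


# Abnormal TSH (0.3) + prior surgery (0.2) + goitre (0.15) = 0.65
example = {'TSH': 't', 'thyroid_surgery': 't', 'goitre': 't'}
score = noiseless_risk_score(example)
print(round(score, 2), label(score))
```

In the actual dataset the added noise means a patient with this profile will usually, but not always, be labeled positive.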

## Step 2: Train the System

Train multiple ML models and select the best one by cross-validated AUC-ROC.

In [None]:
# Initialize system
system = ThyroidClinicalDecisionSupport()

# Train models
print("Training ML models...\n")
results = system.train_system(
    'synthetic_thyroid_data.csv',
    'thyroid_model.pkl'
)

print(f"\n{'='*60}")
print("Training Complete!")
print(f"Best Model: {results['best_model']}")
print(f"CV AUC-ROC: {results['cv_mean']:.4f} (+/- {results['cv_std']:.4f})")
print(f"{'='*60}")
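
The selection step can be sketched as follows, assuming the best model is simply the one with the highest mean cross-validation score. `select_best_model` is a hypothetical helper, the fold scores are made-up placeholders rather than results from this system, and the use of the population standard deviation is also an assumption; `train_system` may compute these values differently.

```python
from statistics import mean, pstdev


def select_best_model(cv_scores):
    """Return a summary for the model with the highest mean CV score.

    cv_scores maps a model name to its list of per-fold scores.
    """
    best = max(cv_scores, key=lambda name: mean(cv_scores[name]))
    return {
        'best_model': best,
        'cv_mean': mean(cv_scores[best]),
        'cv_std': pstdev(cv_scores[best]),  # population std, an assumption
    }


# Placeholder fold scores, for illustration only (not real results)
illustrative_scores = {
    'model_a': [0.80, 0.82, 0.78, 0.81, 0.79],
    'model_b': [0.85, 0.83, 0.86, 0.84, 0.82],
}

summary = select_best_model(illustrative_scores)
print(f"{summary['best_model']}: {summary['cv_mean']:.4f} (+/- {summary['cv_std']:.4f})")
```

The returned keys mirror the ones read from `results` above, so the same print statement works on both.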

## Step 3: Test Agent 2 - Medical Knowledge Retriever

Test the retrieval-augmented generation (RAG) retriever on its own, before running the full pipeline.

In [None]:
# Initialize knowledge retriever
knowledge_agent = MedicalKnowledgeRetriever()

# Test queries
test_queries = [
    "abnormal TSH thyroid surgery",
    "pregnancy thyroid dysfunction",
    "antithyroid medication monitoring"
]

for query in test_queries:
    print(f"\n{'='*60}")
    print(f"Query: {query}")
    print(f"{'='*60}")
    
    evidence = knowledge_agent.retrieve(query, top_k=2)
    
    for i, ev in enumerate(evidence, 1):
        print(f"\n[{i}] {ev.citation_id} - {ev.title}")
        print(f"    Relevance: {ev.relevance_score:.3f}")
        print(f"    Excerpt: {ev.snippet}")
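
For intuition about what a `retrieve(query, top_k=...)` call does, here is a minimal sketch using bag-of-words cosine similarity. This is a stand-in technique chosen for brevity, not necessarily what `MedicalKnowledgeRetriever` does internally. The `Evidence` dataclass mirrors the attributes printed above, while the `retrieve` function and the placeholder corpus are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Evidence:
    citation_id: str
    title: str
    snippet: str
    relevance_score: float


def _bag_of_words(text):
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, documents, top_k=2):
    """Score every document against the query and return the top_k matches."""
    q = _bag_of_words(query)
    scored = [
        Evidence(d['citation_id'], d['title'], d['text'][:80],
                 _cosine(q, _bag_of_words(d['title'] + ' ' + d['text'])))
        for d in documents
    ]
    scored.sort(key=lambda ev: ev.relevance_score, reverse=True)
    return scored[:top_k]


# Placeholder corpus: filler text, not medical guidance
toy_corpus = [
    {'citation_id': 'DOC-1', 'title': 'Placeholder A',
     'text': 'placeholder text about abnormal TSH after thyroid surgery'},
    {'citation_id': 'DOC-2', 'title': 'Placeholder B',
     'text': 'placeholder text about pregnancy and thyroid dysfunction'},
    {'citation_id': 'DOC-3', 'title': 'Placeholder C',
     'text': 'placeholder text about antithyroid medication monitoring'},
]

results = retrieve("abnormal TSH thyroid surgery", toy_corpus, top_k=2)
for ev in results:
    print(ev.citation_id, f"{ev.relevance_score:.3f}")
```

A production retriever would more likely use embeddings or TF-IDF weighting, but the ranking-then-truncating shape is the same.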

## Step 4: Create Test Patients

Define sample patients with different risk profiles.

In [None]:
# High-risk patient
patient_high = {
    'age': 65, 'sex': 'f',
    'on_thyroxine': 'f', 'on_antithyroid_medication': 't',
    'sick': 'f', 'pregnant': 'f', 'thyroid_surgery': 't',
    'lithium': 'f', 'goitre': 't', 'tumor': 'f',
    'hypopituitary': 'f', 'psych': 'f',
    'TSH_measured': 't', 'TSH': 't',
    'T3_measured': 't', 'T3': 't',
    'TT4_measured': 't', 'TT4': 'f',
    'T4U_measured': 't', 'T4U': 'f',
    'FTI_measured': 't', 'FTI': 'f',
    'referral_source': 'SVHC'
}

# Low-risk patient
patient_low = {
    'age': 35, 'sex': 'm',
    'on_thyroxine': 'f', 'on_antithyroid_medication': 'f',
    'sick': 'f', 'pregnant': 'f', 'thyroid_surgery': 'f',
    'lithium': 'f', 'goitre': 'f', 'tumor': 'f',
    'hypopituitary': 'f', 'psych': 'f',
    'TSH_measured': 't', 'TSH': 'f',
    'T3_measured': 't', 'T3': 'f',
    'TT4_measured': 't', 'TT4': 'f',
    'T4U_measured': 't', 'T4U': 'f',
    'FTI_measured': 't', 'FTI': 'f',
    'referral_source': 'other'
}

print("Test patients created:")
print("1. High-risk: 65yo female, thyroid surgery, on medication, abnormal TSH & T3")
print("2. Low-risk: 35yo male, no history, all tests normal")
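
Hand-written patient dictionaries are easy to get wrong, so a quick check before assessment can help. The sketch below is a hypothetical helper, not part of the system's API: it assumes every column of the synthetic training data (except `class`) is required and that flag columns only take `'t'` or `'f'`.

```python
# Column names taken from the synthetic dataset built in Step 1
FLAG_FIELDS = [
    'on_thyroxine', 'on_antithyroid_medication', 'sick', 'pregnant',
    'thyroid_surgery', 'lithium', 'goitre', 'tumor', 'hypopituitary', 'psych',
    'TSH_measured', 'TSH', 'T3_measured', 'T3', 'TT4_measured', 'TT4',
    'T4U_measured', 'T4U', 'FTI_measured', 'FTI',
]
OTHER_FIELDS = ['age', 'sex', 'referral_source']


def validate_patient(patient):
    """Return a list of problems; an empty list means the record looks usable."""
    problems = []
    for name in FLAG_FIELDS + OTHER_FIELDS:
        if name not in patient:
            problems.append(f'missing field: {name}')
    for name in FLAG_FIELDS:
        if name in patient and patient[name] not in ('t', 'f'):
            problems.append(f"{name} must be 't' or 'f', got {patient[name]!r}")
    return problems


# A complete record passes; a damaged copy reports each problem
complete = {name: 'f' for name in FLAG_FIELDS}
complete.update(age=35, sex='m', referral_source='other')

broken = dict(complete, TSH='yes')
del broken['age']

print(validate_patient(complete))
print(validate_patient(broken))
```

Calling `validate_patient(patient_high)` and `validate_patient(patient_low)` on the records above should return empty lists under these assumptions.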

## Step 5: Complete Assessment - High Risk Patient

In [None]:
# Assess high-risk patient
assessment_high = system.assess_patient(patient_high)

# Generate full report
report_high = system.generate_report(assessment_high)

print(report_high)

## Step 6: Complete Assessment - Low Risk Patient

In [None]:
# Assess low-risk patient
assessment_low = system.assess_patient(patient_low)

# Generate full report
report_low = system.generate_report(assessment_low)

print(report_low)

## Step 7: Visualize Risk Scores

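Plot the two patients' risk scores against the 0.4 (moderate) and 0.7 (high) thresholds, then show each score with its confidence interval.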

In [None]:
# Create visualization
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Risk scores comparison
patients = ['High-Risk Patient', 'Low-Risk Patient']
risk_scores = [
    assessment_high.risk_prediction.risk_score,
    assessment_low.risk_prediction.risk_score
]
risk_levels = [
    assessment_high.risk_prediction.risk_level.value,
    assessment_low.risk_prediction.risk_level.value
]

colors = ['red' if level == 'High' else 'orange' if level == 'Moderate' else 'green' 
          for level in risk_levels]

ax1.barh(patients, risk_scores, color=colors, alpha=0.7)
ax1.set_xlabel('Risk Score', fontsize=12)
ax1.set_title('Risk Score Comparison', fontsize=14, fontweight='bold')
ax1.set_xlim(0, 1)
ax1.axvline(0.4, color='orange', linestyle='--', alpha=0.5, label='Moderate threshold')
ax1.axvline(0.7, color='red', linestyle='--', alpha=0.5, label='High threshold')
ax1.legend()

# Confidence intervals: one (lower, upper) pair per patient
ci_bounds = [
    (assessment_high.risk_prediction.confidence_lower,
     assessment_high.risk_prediction.confidence_upper),
    (assessment_low.risk_prediction.confidence_lower,
     assessment_low.risk_prediction.confidence_upper)
]

for i, (score, (lower, upper)) in enumerate(zip(risk_scores, ci_bounds)):
    # Horizontal bar spanning the interval, with a diamond at the point estimate
    ax2.plot([lower, upper], [i, i], 'o-', linewidth=3, markersize=8,
             color=colors[i], alpha=0.7)
    ax2.plot(score, i, 'D', markersize=10, color=colors[i])

ax2.set_yticks(range(len(patients)))
ax2.set_yticklabels(patients)
ax2.set_xlabel('Risk Score', fontsize=12)
ax2.set_title('Risk Scores with Confidence Intervals', fontsize=14, fontweight='bold')
ax2.set_xlim(0, 1)
ax2.grid(axis='x', alpha=0.3)

plt.tight_layout()
plt.savefig('risk_comparison.png', dpi=300, bbox_inches='tight')
plt.show()

print("\n✓ Visualization saved as 'risk_comparison.png'")
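
The dashed lines in the left panel sit at 0.4 and 0.7. A minimal sketch of how a score could map onto a level using those two cutoffs is shown below. The `risk_level` function is hypothetical, the `'Low'` label and the choice to treat each threshold as inclusive are assumptions, and Agent 1 may bin scores differently.

```python
def risk_level(score, moderate_threshold=0.4, high_threshold=0.7):
    """Map a risk score in [0, 1] to a level name.

    Thresholds match the dashed lines in the plot; inclusive lower
    bounds are an assumption.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f'score must be in [0, 1], got {score}')
    if score >= high_threshold:
        return 'High'
    if score >= moderate_threshold:
        return 'Moderate'
    return 'Low'


for s in (0.1, 0.5, 0.85):
    print(s, risk_level(s))
```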

## Step 8: Save Reports

In [None]:
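# Save reports to files (explicit UTF-8 so non-ASCII characters write correctly on any platform)
with open('high_risk_patient_report.txt', 'w', encoding='utf-8') as f:
    f.write(report_high)

with open('low_risk_patient_report.txt', 'w', encoding='utf-8') as f:
    f.write(report_low)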

print("✓ Reports saved:")
print("  - high_risk_patient_report.txt")
print("  - low_risk_patient_report.txt")

## Conclusion

This notebook demonstrated:

1. ✅ **Training**: Multiple ML models compared by cross-validated AUC-ROC
2. ✅ **RAG System**: Context-aware medical guideline retrieval
3. ✅ **Risk Assessment**: Complete patient evaluation pipeline
4. ✅ **Explainability**: Transparent reasoning and evidence linkage
5. ✅ **Clinical Summaries**: Both doctor-facing and patient-friendly outputs

### Key Takeaways:

- The system provides **evidence-based** recommendations with citations
- **Uncertainty** is explicitly quantified and communicated
- Outputs are **tailored** to different audiences (doctors vs patients)
- All predictions include **transparent reasoning chains**
- The system clearly states it is **decision support**, not diagnosis

### Next Steps:

- Validate with real clinical data
- Conduct clinical trials
- Obtain regulatory approvals
- Implement in clinical workflow
- Monitor and improve continuously