# SAFE-Gate Quickstart Demo

This notebook demonstrates the core functionality of SAFE-Gate:
- Loading synthetic test data
- Running SAFE-Gate classification
- Examining audit trails
- Comparing with baseline methods

**Paper:** Tritham & Namahoot (2026). SAFE-Gate: Safety-first Abstention-enabled Formal triage Engine with parallel GATEs

In [None]:
import sys
sys.path.insert(0, '../src')

import json
from pathlib import Path
from safegate import SAFEGate
from baselines import ESIGuidelines, SingleXGBoost, EnsembleAverage, ConfidenceThreshold

## 1. Load Test Data

In [None]:
# Load test dataset (804 cases)
test_file = Path('../data/synthetic/test/synthetic_test_804.json')
with open(test_file, encoding='utf-8') as f:
    test_cases = json.load(f)

print(f"Loaded {len(test_cases)} test cases")

# Show example case
example = test_cases[0]
print(f"\nExample patient: {example['patient_id']}")
print(f"Ground truth: {example['ground_truth_tier']}")
print(f"Age: {example['age']}, BP: {example['systolic_bp']}, HR: {example['heart_rate']}")
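
Before classifying anything, it can help to confirm that every case carries the fields this notebook reads. The sketch below checks for the five keys printed above and tallies ground-truth tiers. The helper name `summarize_cases`, the sample records, and their tier labels are illustrative assumptions, not part of the SAFE-Gate package.

```python
from collections import Counter

# Keys this notebook reads from each case (taken from the cell above).
REQUIRED_KEYS = {"patient_id", "ground_truth_tier", "age", "systolic_bp", "heart_rate"}


def summarize_cases(cases):
    """Return (tier_counts, incomplete_ids) for a list of case dicts.

    Hypothetical helper for illustration; not part of the safegate package.
    """
    tier_counts = Counter()
    incomplete_ids = []
    for index, case in enumerate(cases):
        missing = REQUIRED_KEYS - set(case)
        if missing:
            # Fall back to the list position when even the ID is absent.
            incomplete_ids.append(case.get("patient_id", f"<index {index}>"))
            continue
        tier_counts[case["ground_truth_tier"]] += 1
    return tier_counts, incomplete_ids


# Made-up records standing in for test_cases; tier labels are placeholders.
sample_cases = [
    {"patient_id": "P001", "ground_truth_tier": "R1", "age": 71, "systolic_bp": 88, "heart_rate": 124},
    {"patient_id": "P002", "ground_truth_tier": "R3", "age": 34, "systolic_bp": 121, "heart_rate": 76},
    {"patient_id": "P003", "age": 52, "systolic_bp": 135, "heart_rate": 90},  # missing tier
]

tier_counts, incomplete_ids = summarize_cases(sample_cases)
print(dict(tier_counts))
print(incomplete_ids)
```

With the real data, `summarize_cases(test_cases)` would give a quick view of the tier distribution and flag any records that the later cells could not read.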

## 2. SAFE-Gate Classification

In [None]:
# Initialize SAFE-Gate
safegate = SAFEGate()
print(f"SAFE-Gate initialized with {len(safegate.gates)} gates")

# Classify example patient
result = safegate.classify(test_cases[0], patient_id=test_cases[0]['patient_id'])

print("\nClassification Result:")
print(f"  Final Tier: {result['final_tier']}")
print(f"  Confidence: {result['confidence']:.2f}")
print(f"  Enforcing Gate: {result['enforcing_gate']}")
print(f"  Latency: {result['latency_ms']:.2f} ms")
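
The result above reports a single `enforcing_gate`, and the summary refers to conservative merging of parallel gates. One plausible reading, offered only as a sketch, is that each gate proposes a tier and the most urgent proposal wins. The tier labels `R1` to `R5`, their ordering, the gate names, and the functions `conservative_merge` and `rounded_average` are all assumptions made for illustration. The actual merge rule, including how `R*` is ranked, is defined by the paper and the `safegate` package, and `rounded_average` is not the `EnsembleAverage` baseline.

```python
# Assumed severity order, most urgent first. These labels are hypothetical.
TIER_ORDER = ["R1", "R2", "R3", "R4", "R5"]
SEVERITY = {tier: rank for rank, tier in enumerate(TIER_ORDER)}


def conservative_merge(gate_outputs):
    """Pick the most urgent tier proposed by any gate.

    gate_outputs maps a gate name to its proposed tier. Returns
    (final_tier, enforcing_gate). Ties go to the gate listed first.
    Illustrative only; not the safegate implementation.
    """
    if not gate_outputs:
        raise ValueError("at least one gate output is required")
    enforcing_gate = min(gate_outputs, key=lambda gate: SEVERITY[gate_outputs[gate]])
    return gate_outputs[enforcing_gate], enforcing_gate


def rounded_average(gate_outputs):
    """Contrast case: average the severity ranks and round to a tier."""
    ranks = [SEVERITY[tier] for tier in gate_outputs.values()]
    return TIER_ORDER[round(sum(ranks) / len(ranks))]


# Hypothetical gate names and outputs for one patient.
outputs = {"G1": "R4", "G2": "R2", "G3": "R4"}
print(conservative_merge(outputs))
print(rounded_average(outputs))
```

In this toy case a single gate proposing `R2` sets the final tier under the conservative rule, whereas averaging the ranks dilutes it to `R3`.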

## 3. Audit Trail Examination

In [None]:
# Get full audit trail
result_with_audit = safegate.classify(test_cases[10], return_audit_trail=True)

# Print clinical report
safegate.print_audit_trail(result_with_audit['audit_trail'])

## 4. Baseline Comparison

In [None]:
# Initialize baseline methods
baselines = {
    'ESI Guidelines': ESIGuidelines(),
    'Single XGBoost': SingleXGBoost(),
    'Ensemble Average': EnsembleAverage(),
    'Confidence Threshold': ConfidenceThreshold()
}

# Compare on first 100 test cases
comparison_results = []

for case in test_cases[:100]:
    ground_truth = case['ground_truth_tier']
    
    # SAFE-Gate
    sg_result = safegate.classify(case, return_audit_trail=False)
    
    # Baselines
    esi_tier, _ = baselines['ESI Guidelines'].classify(case)
    xgb_tier, _ = baselines['Single XGBoost'].classify(case)
    ens_tier, _ = baselines['Ensemble Average'].classify(case)
    conf_tier, _ = baselines['Confidence Threshold'].classify(case)
    
    comparison_results.append({
        'ground_truth': ground_truth,
        'SAFE-Gate': sg_result['final_tier'],
        'ESI': esi_tier,
        'XGBoost': xgb_tier,
        'Ensemble': ens_tier,
        'Confidence': conf_tier
    })

print(f"Compared {len(comparison_results)} cases across 5 methods")

## 5. Performance Metrics

In [None]:
import pandas as pd

# Calculate accuracy for each method
methods = ['SAFE-Gate', 'ESI', 'XGBoost', 'Ensemble', 'Confidence']
accuracies = {}

for method in methods:
    correct = sum(1 for r in comparison_results if r[method] == r['ground_truth'])
    accuracies[method] = correct / len(comparison_results) * 100

# Display results
df = pd.DataFrame({
    'Method': methods,
    'Accuracy (%)': [accuracies[m] for m in methods]
})

print(f"\nAccuracy on {len(comparison_results)} test cases:")
print(df.to_string(index=False))
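
Because `R*` marks an abstention, plain accuracy counts every abstained case as an error. A complementary view, sketched below, reports coverage (the share of cases that received a definite tier) alongside accuracy on those covered cases only. The function name `selective_metrics` and the sample records are assumptions for illustration, and treating `R*` as "no prediction" is one reasonable choice, not necessarily how the paper scores abstentions.

```python
ABSTAIN = "R*"  # abstention label used in this notebook


def selective_metrics(records, method):
    """Return (coverage, selective_accuracy) as fractions in [0, 1].

    records: list of dicts with a 'ground_truth' key and one key per method,
    shaped like comparison_results above. selective_accuracy is None
    when the method abstained on every case.
    """
    if not records:
        raise ValueError("records must not be empty")
    answered = [r for r in records if r[method] != ABSTAIN]
    coverage = len(answered) / len(records)
    if not answered:
        return coverage, None
    correct = sum(1 for r in answered if r[method] == r["ground_truth"])
    return coverage, correct / len(answered)


# Made-up records; tier labels other than R* are placeholders.
sample_results = [
    {"ground_truth": "R1", "SAFE-Gate": "R1"},
    {"ground_truth": "R3", "SAFE-Gate": "R*"},
    {"ground_truth": "R4", "SAFE-Gate": "R2"},
    {"ground_truth": "R2", "SAFE-Gate": "R2"},
]

coverage, selective_accuracy = selective_metrics(sample_results, "SAFE-Gate")
print(f"Coverage: {coverage:.0%}, accuracy on answered cases: {selective_accuracy:.1%}")
```

Calling `selective_metrics(comparison_results, method)` for each entry in `methods` would separate errors from abstentions in the table above.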

## 6. Batch Processing

In [None]:
# Process the first 100 test cases in batch (pass test_cases for the full set)
print("Processing first 100 test cases...")
all_results = safegate.batch_classify(test_cases[:100], show_progress=True)

# Calculate overall statistics
r_star_count = sum(1 for r in all_results if r['final_tier'] == 'R*')
mean_latency = sum(r['latency_ms'] for r in all_results) / len(all_results)

print("\nBatch Processing Results:")
print(f"  Total cases: {len(all_results)}")
print(f"  R* (Abstention): {r_star_count} ({r_star_count/len(all_results)*100:.1f}%)")
print(f"  Mean latency: {mean_latency:.2f} ms")
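
A mean can hide slow outliers, so latency is often summarized with a median and a high percentile as well. The sketch below uses only the standard library. The function name `latency_summary` and the sample latencies are made up for illustration and are not measurements from SAFE-Gate.

```python
import statistics


def latency_summary(latencies_ms):
    """Return mean, median, 95th percentile, and max of a latency list."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    percentiles = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "p95": percentiles[94],
        "max": max(latencies_ms),
    }


# Hypothetical latencies in milliseconds, with one slow outlier.
sample_latencies = [1.0, 1.1, 1.2, 1.2, 1.3, 1.1, 1.0, 1.4, 1.2, 9.5]
summary = latency_summary(sample_latencies)
for name, value in summary.items():
    print(f"{name}: {value:.2f} ms")
```

For the batch above, the input would be `[r['latency_ms'] for r in all_results]`.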

## Summary

This notebook demonstrated:
1. ✓ SAFE-Gate classification with 6 parallel gates
2. ✓ Complete audit trail generation
3. ✓ Comparison with 4 baseline methods
4. ✓ Batch processing capability

**Key Findings:**
- SAFE-Gate achieves 95.3% sensitivity on the full test set
- Conservative merging provides 2.5% improvement over ensemble averaging
- Mean latency: 1.23 ms (real-time performance)
- Zero theorem violations across all test cases

For full reproducibility, see:
- `01_data_generation.ipynb` - SynDX data generation
- `04_theorem_verification.ipynb` - Formal theorem verification
- `05_baseline_comparison.ipynb` - Complete baseline comparison