# Conformal-Drift Quick Start

This notebook demonstrates how to use Conformal-Drift to audit conformal prediction guardrails under distribution shift.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/debu-sinha/conformaldrift/blob/main/examples/01_quickstart.ipynb)

In [None]:
# Install conformal-drift
%pip install conformal-drift -q

## 1. Generate Synthetic Data

We'll create synthetic calibration and test data to demonstrate the audit process.

In [None]:
import numpy as np
np.random.seed(42)

# Generate calibration nonconformity scores (from a well-calibrated model)
n_calibration = 500
calibration_scores = np.random.beta(2, 5, n_calibration)  # Scores typically < 0.5

# Generate test scores from the same distribution (shift is applied later, during the audit)
n_test = 200
test_scores = np.random.beta(2, 5, n_test)
test_labels = np.random.binomial(1, 0.9, n_test)  # 90% correct

print(f"Calibration scores: mean={calibration_scores.mean():.3f}, std={calibration_scores.std():.3f}")
print(f"Test scores: mean={test_scores.mean():.3f}, std={test_scores.std():.3f}")
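As a sanity check on the printed statistics, the mean and standard deviation of a Beta(a, b) distribution have closed forms. The sketch below computes them for the Beta(2, 5) used above; `beta_moments` is a helper name introduced here for illustration and is not part of Conformal-Drift.

```python
import numpy as np

def beta_moments(a, b):
    """Return the analytic mean and standard deviation of Beta(a, b)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, float(np.sqrt(var))

mean, std = beta_moments(2, 5)
print(f"Beta(2, 5): mean={mean:.3f}, std={std:.3f}")
```

The sample statistics printed by the cell above should land near these values, with small differences due to the finite sample sizes.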

## 2. Initialize the Auditor

In [None]:
from conformal_drift import ConformalDriftAuditor

# Initialize with calibration scores and target miscoverage rate
alpha = 0.1  # 90% target coverage
auditor = ConformalDriftAuditor(
    calibration_scores=calibration_scores,
    alpha=alpha
)

print(f"Auditor initialized with {len(calibration_scores)} calibration samples")
print(f"Target coverage: {1 - alpha:.0%}")
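For intuition about what the calibration scores and `alpha` determine, here is a minimal sketch of the standard split-conformal threshold: the k-th smallest calibration score, with k = ceil((n + 1)(1 - alpha)). This is the textbook construction, not a description of Conformal-Drift's internals, and `conformal_threshold` is a name made up for this example.

```python
import numpy as np

def conformal_threshold(scores, alpha):
    """Split-conformal threshold: k-th smallest score, k = ceil((n + 1) * (1 - alpha))."""
    sorted_scores = np.sort(np.asarray(scores, dtype=float))
    n = sorted_scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration samples for this alpha: only a trivial threshold is valid.
        return float("inf")
    return float(sorted_scores[k - 1])

rng = np.random.default_rng(0)
demo_scores = rng.beta(2, 5, 500)  # illustrative stand-in for calibration scores
print(f"Threshold at alpha=0.1: {conformal_threshold(demo_scores, 0.1):.3f}")
```

A test point is counted as covered when its nonconformity score is at or below this threshold.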

## 3. Run Baseline Audit (No Shift)

First, verify that coverage is close to nominal on unshifted data.

In [None]:
# Prepare test data
test_data = {
    'scores': test_scores,
    'labels': test_labels
}

# Run baseline audit
baseline_results = auditor.audit(
    test_data=test_data,
    shift_intensity=[0.0]  # No shift
)

print(f"Baseline coverage: {baseline_results.coverage[0]:.3f}")
print("Expected: ~0.90")
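How close is "close to nominal"? With a finite test set, empirical coverage fluctuates around the target even without shift. The sketch below computes coverage against a fixed threshold and the binomial standard error for a given test-set size. Both helper names are assumptions for illustration, and the library may compute coverage differently.

```python
import numpy as np

def empirical_coverage(test_scores, threshold):
    """Fraction of test scores at or below the conformal threshold."""
    return float(np.mean(np.asarray(test_scores, dtype=float) <= threshold))

def coverage_standard_error(nominal, n_test):
    """Binomial standard error of empirical coverage around the nominal level."""
    return float(np.sqrt(nominal * (1 - nominal) / n_test))

se = coverage_standard_error(0.9, 200)
print(f"Standard error with 200 test points: {se:.3f}")
print(f"Rough 2-sigma range: [{0.9 - 2 * se:.3f}, {0.9 + 2 * se:.3f}]")
```

With 200 test points, a baseline coverage a few points away from 0.90 is therefore consistent with sampling noise alone.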

## 4. Run Audit Under Distribution Shift

Apply graduated shift intensities to observe coverage degradation.

In [None]:
# Run audit with temporal shift
shift_intensities = np.linspace(0, 1, 11)  # 0%, 10%, ..., 100%

results = auditor.audit(
    test_data=test_data,
    shift_type="temporal",
    shift_intensity=shift_intensities
)

# Print coverage at each shift level
print("Coverage under distribution shift:")
print("-" * 40)
for intensity, coverage in zip(results.shift_intensities, results.coverage):
    gap = 0.9 - coverage  # positive gap means under-coverage
    # ✓: gap under 5 points, ⚠: gap under 10 points, ✗: gap of 10 points or more
    status = "✓" if gap < 0.05 else "⚠" if gap < 0.1 else "✗"
    print(f"Shift {intensity:5.1%}: Coverage = {coverage:.3f} (gap = {gap:+.3f}) {status}")
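To see why coverage can fall as shift grows, consider a toy shift model in which every test score is pushed upward by an amount proportional to the intensity while the threshold stays fixed at its calibration value. This is purely an illustrative assumption: the additive model, the `max_offset` parameter, and the function name are invented here and do not describe how Conformal-Drift implements its `temporal`, `semantic`, or `lexical` shifts.

```python
import numpy as np

def coverage_under_additive_shift(cal_scores, test_scores, alpha, intensities, max_offset=0.3):
    """Toy model: add intensity * max_offset to each test score, keep the threshold fixed."""
    cal = np.sort(np.asarray(cal_scores, dtype=float))
    n = cal.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    threshold = cal[k - 1] if k <= n else np.inf
    test = np.asarray(test_scores, dtype=float)
    return [float(np.mean(test + t * max_offset <= threshold)) for t in intensities]

rng = np.random.default_rng(0)
cal = rng.beta(2, 5, 500)
test = rng.beta(2, 5, 200)
curve = coverage_under_additive_shift(cal, test, alpha=0.1, intensities=np.linspace(0, 1, 6))
for t, c in zip(np.linspace(0, 1, 6), curve):
    print(f"intensity {t:.1f}: coverage {c:.3f}")
```

Because the threshold is frozen at calibration time, any shift that raises nonconformity scores can only lower coverage in this toy model.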

## 5. Visualize Coverage Degradation

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))

# Plot coverage curve
ax.plot(results.shift_intensities, results.coverage, 'b-o', 
        linewidth=2, markersize=8, label='Empirical Coverage')

# Nominal coverage line
ax.axhline(y=0.9, color='r', linestyle='--', linewidth=2, label='Nominal 90%')

# Tolerance band
ax.fill_between([0, 1], [0.85, 0.85], [0.95, 0.95], alpha=0.2, color='green', label='±5% Tolerance')

ax.set_xlabel('Shift Intensity', fontsize=12)
ax.set_ylabel('Empirical Coverage', fontsize=12)
ax.set_title('Coverage Degradation Under Distribution Shift', fontsize=14)
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.legend(loc='lower left')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('coverage_curve.png', dpi=150)
plt.show()

## 6. Identify Critical Failure Points

In [None]:
# Find where coverage drops below acceptable threshold
threshold = 0.85  # 5 percentage points below nominal

critical_indices = np.where(np.array(results.coverage) < threshold)[0]

if len(critical_indices) > 0:
    critical_intensity = results.shift_intensities[critical_indices[0]]
    print(f"⚠ CRITICAL: Coverage drops below {threshold:.0%} at shift intensity {critical_intensity:.0%}")
    print(f"  Coverage at critical point: {results.coverage[critical_indices[0]]:.3f}")
else:
    print(f"✓ Coverage stays above {threshold:.0%} across all shift levels")

print(f"\nMaximum coverage gap: {results.max_coverage_gap:.3f}")
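The cell above reports the first grid point at which coverage is already below the threshold, so the answer is only as fine as the intensity grid. A rough refinement, assuming coverage varies approximately linearly between neighbouring grid points, is to interpolate the crossing. `crossing_intensity` is a helper written for this example and is not part of the library.

```python
import numpy as np

def crossing_intensity(intensities, coverage, threshold):
    """Linearly interpolate the intensity at which coverage first falls below threshold."""
    x = np.asarray(intensities, dtype=float)
    y = np.asarray(coverage, dtype=float)
    below = np.where(y < threshold)[0]
    if below.size == 0:
        return None  # coverage never drops below the threshold
    i = int(below[0])
    if i == 0:
        return float(x[0])  # already below the threshold at the first grid point
    # y[i - 1] >= threshold > y[i], so the denominator is strictly positive
    frac = (y[i - 1] - threshold) / (y[i - 1] - y[i])
    return float(x[i - 1] + frac * (x[i] - x[i - 1]))

# Made-up coverage values, for illustration only
demo = crossing_intensity([0.0, 0.5, 1.0], [0.90, 0.80, 0.70], threshold=0.85)
print(f"Estimated crossing intensity: {demo:.2f}")
```

To use it with the audit output, pass `results.shift_intensities` and `results.coverage` as the first two arguments.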

## 7. Multi-Shift Analysis

Compare coverage under different shift types.

In [None]:
# Run audit under different shift types
shift_types = ['temporal', 'semantic', 'lexical']
all_results = {}

for shift_type in shift_types:
    # Use a separate name so the `results` from Section 4 is not overwritten
    shift_results = auditor.audit(
        test_data=test_data,
        shift_type=shift_type,
        shift_intensity=np.linspace(0, 1, 6)
    )
    all_results[shift_type] = shift_results
    print(f"{shift_type}: max_gap={shift_results.max_coverage_gap:.3f}")

In [None]:
# Plot comparison
fig, ax = plt.subplots(figsize=(10, 6))

colors = {'temporal': 'blue', 'semantic': 'green', 'lexical': 'orange'}

for shift_type, shift_results in all_results.items():
    ax.plot(shift_results.shift_intensities, shift_results.coverage,
            '-o', color=colors[shift_type], linewidth=2,
            markersize=6, label=shift_type.capitalize())

ax.axhline(y=0.9, color='r', linestyle='--', linewidth=2, label='Nominal')
ax.set_xlabel('Shift Intensity', fontsize=12)
ax.set_ylabel('Coverage', fontsize=12)
ax.set_title('Coverage Under Different Shift Types', fontsize=14)
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.savefig('shift_comparison.png', dpi=150)
plt.show()

## Summary

This notebook demonstrated:
1. Setting up a Conformal-Drift auditor with calibration scores
2. Running baseline audits to verify coverage
3. Observing coverage degradation under distribution shift
4. Visualizing coverage curves
5. Identifying critical failure points
6. Comparing multiple shift types

For more examples, see:
- `02_rag_audit.ipynb` - Auditing RAG hallucination detection
- `03_mlflow_tracking.ipynb` - Tracking audits with MLflow