# SynDX Quick Start Tutorial

**⚠️ IMPORTANT NOTICE:**  
This is **preliminary work without clinical validation**.  
Do **NOT** use for clinical decision-making.  
All metrics are based on synthetic-to-synthetic validation only.

---

## Overview

This notebook demonstrates basic usage of the SynDX framework for generating synthetic vestibular disorder patient data.

### What SynDX Does

1. **Phase 1**: Extract 8,400 clinical archetypes from TiTrATE guidelines
2. **Phase 2**: Generate synthetic patients using XAI-driven synthesis
3. **Phase 3**: Validate statistical realism and diagnostic coherence

### Prerequisites

```bash
pip install -e .
```


In [None]:
# Import required libraries
import sys
sys.path.append('..')  # Add parent directory to path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Import SynDX modules
from syndx import SynDXPipeline
from syndx.phase1_knowledge import ArchetypeGenerator, TiTrATEFormalizer
from syndx.utils import DataLoader

# Set random seed for reproducibility
np.random.seed(42)

# Configure plotting
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

print("✓ Imports successful")
print("⚠️  WARNING: Preliminary work without clinical validation")

## Step 1: Initialize SynDX Pipeline

Create a pipeline with parameters reduced for a quick demo:
- 100 archetypes (reduced from the full 8,400)
- NMF components: 20
- VAE latent dimension: 50
- Differential privacy ε = 1.0
- Random seed: 42

In [None]:
# Initialize pipeline with small dataset for quick demo
pipeline = SynDXPipeline(
    n_archetypes=100,      # Reduced from 8400 for demo
    nmf_components=20,
    vae_latent_dim=50,
    epsilon=1.0,
    random_seed=42
)

print("SynDX Pipeline initialized:")
print(f"  Target archetypes: {pipeline.n_archetypes}")
print(f"  NMF components (r): {pipeline.nmf_components}")
print(f"  VAE latent dim (d): {pipeline.vae_latent_dim}")
print(f"  Privacy budget (ε): {pipeline.epsilon}")
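
How SynDX spends its privacy budget internally is not described in this tutorial. As a generic illustration of what ε controls, the sketch below applies the textbook Laplace mechanism to a simple count query: noise is drawn with scale `sensitivity / epsilon`, so a smaller ε means more noise. The names `laplace_noise` and `private_count` are hypothetical helpers, not part of the SynDX API.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random,
                  sensitivity: float = 1.0) -> float:
    """Release a count with Laplace noise of scale sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)
errors = {}
for eps in (0.1, 1.0, 10.0):
    draws = [private_count(100, eps, rng) for _ in range(5000)]
    # Mean absolute error approximates the noise scale (1 / epsilon here)
    errors[eps] = sum(abs(d - 100) for d in draws) / len(draws)
    print(f"epsilon={eps:>4}: mean absolute error = {errors[eps]:.2f}")
```

With ε = 1.0 and a sensitivity of 1, the typical perturbation of a count is about one unit. Whether SynDX uses this mechanism or another one is an open assumption here.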

## Step 2: Phase 1 - Extract Clinical Archetypes

Generate computational archetypes from the TiTrATE diagnostic framework (the cell below also passes the `barany_icvd_2025` guideline set):

In [None]:
# Extract archetypes from clinical guidelines
archetypes = pipeline.extract_archetypes(
    guidelines=['titrate', 'barany_icvd_2025']
)

print(f"\nGenerated {len(archetypes)} valid archetypes")
print(f"Archetype matrix shape: {pipeline.archetype_matrix.shape}")

### Visualize Archetype Statistics

In [None]:
# Get statistics
stats = pipeline.archetype_generator.get_statistics()

# Plot diagnosis, timing, and age distributions side by side
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Diagnosis distribution
dx_dist = pd.Series(stats['diagnosis_distribution']).sort_values(ascending=False)
axes[0].bar(range(len(dx_dist)), dx_dist.values)
axes[0].set_xticks(range(len(dx_dist)))
axes[0].set_xticklabels(dx_dist.index, rotation=45, ha='right')
axes[0].set_title('Diagnosis Distribution')
axes[0].set_ylabel('Count')

# Timing distribution
timing_dist = pd.Series(stats['timing_distribution'])
axes[1].pie(timing_dist.values, labels=timing_dist.index, autopct='%1.1f%%')
axes[1].set_title('Timing Pattern Distribution')

# Age distribution
ages = [arch.age for arch in archetypes]
axes[2].hist(ages, bins=20, edgecolor='black', alpha=0.7)
axes[2].set_title('Age Distribution')
axes[2].set_xlabel('Age (years)')
axes[2].set_ylabel('Frequency')
axes[2].axvline(stats['age_stats']['mean'], color='red', 
                linestyle='--', label=f"Mean: {stats['age_stats']['mean']:.1f}")
axes[2].legend()

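plt.tight_layout()
# Create the output directory first; savefig raises an error if it is missing
Path('figures').mkdir(parents=True, exist_ok=True)
plt.savefig('figures/archetype_statistics.png', dpi=300, bbox_inches='tight')
plt.show()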

print("\nAge statistics:")
print(f"  Mean: {stats['age_stats']['mean']:.1f} years")
print(f"  Std:  {stats['age_stats']['std']:.1f} years")
print(f"  Range: {stats['age_stats']['min']}-{stats['age_stats']['max']} years")

## Step 3: Phase 2 - Generate Synthetic Patients

Generate 1,000 synthetic patients using XAI-driven synthesis:

In [None]:
# Generate synthetic patients
n_patients = 1000
synthetic_patients = pipeline.generate(
    n_patients=n_patients,
    convergence_threshold=0.05
)

print(f"\nGenerated {len(synthetic_patients)} synthetic patients")
print("\nFirst 5 patients:")
print(synthetic_patients.head())
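
Before validating, it can help to run a few basic integrity checks on the generated table. The sketch below assumes `pipeline.generate` returns a pandas DataFrame (the tutorial calls `.head()` and `.to_csv()` on it). The helper name `summarize_patients` and the toy column names and values are hypothetical stand-ins, not the actual SynDX schema.

```python
import pandas as pd


def summarize_patients(df: pd.DataFrame) -> dict:
    """Return row count, duplicate-row count, and missing values per column."""
    return {
        "n_rows": len(df),
        "n_duplicate_rows": int(df.duplicated().sum()),
        "missing_per_column": {col: int(n) for col, n in df.isna().sum().items()},
    }


# Toy stand-in for the generated table (hypothetical columns and values)
toy = pd.DataFrame({
    "age": [54, 61, 61, None],
    "diagnosis": ["BPPV", "vestibular_neuritis", "vestibular_neuritis", "BPPV"],
})

summary = summarize_patients(toy)
print(summary)
```

In the notebook, the same helper could be called as `summarize_patients(synthetic_patients)`.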

## Step 4: Phase 3 - Validate Synthetic Data

Run the statistical, diagnostic, and XAI validation checks, then print the statistical realism metrics:

In [None]:
# Validate synthetic data
validation_results = pipeline.validate(
    synthetic_patients,
    metrics=['statistical', 'diagnostic', 'xai']
)

print("\nValidation Results:")
print("="*60)
for metric, value in validation_results.get('statistical', {}).items():
    if isinstance(value, float):
        print(f"{metric:20s}: {value:.4f}")
    else:
        print(f"{metric:20s}: {value}")
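
This tutorial does not list the metrics that `pipeline.validate` returns under `'statistical'`. As one generic example of a statistical realism check, the sketch below computes the two-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical CDFs of two samples, using only the standard library. The function name `ks_statistic` and the sample values are illustrative assumptions, not SynDX output.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b) -> float:
    """Largest absolute difference between the two empirical CDFs."""
    a_sorted = sorted(sample_a)
    b_sorted = sorted(sample_b)
    largest_gap = 0.0
    # The gap can only change at observed values, so checking those is enough
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


# Hypothetical age samples: reference archetypes vs. generated patients
reference_ages = [42, 55, 58, 63, 67, 71, 74, 80]
generated_ages = [40, 53, 60, 61, 68, 70, 77, 82]

same = ks_statistic(reference_ages, reference_ages)
shifted = ks_statistic(reference_ages, [age + 50 for age in reference_ages])
close = ks_statistic(reference_ages, generated_ages)
print(f"identical: {same:.3f}, shifted by 50 years: {shifted:.3f}, similar: {close:.3f}")
```

A value near 0 means the two distributions are hard to tell apart, and a value near 1 means they barely overlap. Because both samples here would be synthetic, a low value says nothing about agreement with real patients.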

## Step 5: Export to FHIR Format

Export synthetic patients to HL7 FHIR R4 format:

In [None]:
# Create output directory
output_dir = Path('../outputs/synthetic_patients')
output_dir.mkdir(parents=True, exist_ok=True)

# Export to FHIR
fhir_path = output_dir / 'synthetic_patients_fhir.json'
pipeline.export_fhir(synthetic_patients, str(fhir_path))

# Also save as CSV
csv_path = output_dir / 'synthetic_patients.csv'
synthetic_patients.to_csv(csv_path, index=False)

print("\nSynthetic patients exported:")
print(f"  FHIR: {fhir_path}")
print(f"  CSV:  {csv_path}")
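
The layout of the file written by `pipeline.export_fhir` is not documented in this tutorial. Assuming it is a single FHIR R4 `Bundle` whose entries each wrap one resource, a quick check is to count resources by type. The helper `count_resource_types` and the toy bundle below (IDs, dates, and resource mix) are hypothetical, for illustration only.

```python
import json
from collections import Counter


def count_resource_types(bundle: dict) -> Counter:
    """Count the resources in a FHIR Bundle by their resourceType."""
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("expected a FHIR Bundle")
    return Counter(
        entry["resource"]["resourceType"]
        for entry in bundle.get("entry", [])
        if "resource" in entry
    )


# Toy bundle standing in for the exported file (hypothetical content)
toy_bundle = {
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "syn-0001",
                      "gender": "female", "birthDate": "1961-04-02"}},
        {"resource": {"resourceType": "Condition", "id": "cond-0001",
                      "subject": {"reference": "Patient/syn-0001"}}},
        {"resource": {"resourceType": "Patient", "id": "syn-0002",
                      "gender": "male"}},
    ],
}

# Round-trip through JSON to mimic reading the file back from disk
counts = count_resource_types(json.loads(json.dumps(toy_bundle)))
print(dict(counts))
```

For the real export, the bundle could be loaded with `json.loads(fhir_path.read_text())`, provided the file matches the assumed layout.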

## Summary

This quick start demonstrated:

1. ✓ Initializing the SynDX pipeline
2. ✓ Extracting clinical archetypes from TiTrATE guidelines
3. ✓ Generating synthetic patients
4. ✓ Validating statistical realism
5. ✓ Exporting to FHIR format

### Next Steps

- See `02_Full_Pipeline_Tutorial.ipynb` for the complete 8,400-archetype generation
- See `03_Statistical_Validation.ipynb` for detailed validation analysis
- See `04_Publication_Figures.ipynb` for high-resolution figures

### ⚠️ Important Reminders

- **No clinical validation**: All metrics are synthetic-to-synthetic
- **Not for clinical use**: This is a research tool only
- **Prospective validation required**: Real patient studies needed

---

**Citation:**

```bibtex
@article{tritham2025syndx,
  title={SynDX: Explainable AI-Driven Synthetic Data Generation},
  author={Tritham, Chatchai and Namahoot, Chakkrit Snae},
  journal={IEEE Access},
  year={2025},
  note={Preliminary work without clinical validation}
}
```