# NeuroFetal AI — SOTA Training Pipeline

**Version 4.0** — TimeGAN Augmentation Phase (AUC 0.90+ Target)

This notebook orchestrates the full SOTA pipeline on Google Colab with GPU acceleration.

### V4.0 Upgrade: TimeGAN Replaces SMOTE
Instead of tabular SMOTE (linear interpolation), V4.0 uses a **1D Convolutional WGAN-GP** to generate
synthetic pathological FHR/UC traces that preserve temporal dynamics. The generator was trained in
`TimeGAN_Colab.ipynb` and produced 1,410 synthetic traces saved to `Datasets/synthetic/`.
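
To make the limitation of SMOTE concrete, here is a minimal sketch of its interpolation step applied to two toy traces. The function name `smote_interpolate` and the toy values are illustrative assumptions, not code from the repo. Blending two traces whose decelerations occur at different times yields two shallow dips rather than one realistic deceleration.

```python
import random

def smote_interpolate(x, neighbor, lam=None, rng=random):
    """Return x + lam * (neighbor - x), the SMOTE interpolation step."""
    if lam is None:
        lam = rng.random()  # lam drawn uniformly from [0, 1)
    return [a + lam * (b - a) for a, b in zip(x, neighbor)]

# Two toy FHR traces (bpm), each with one deceleration at a different time step.
trace_a = [140, 140, 100, 140, 140, 140]
trace_b = [140, 140, 140, 140, 100, 140]

synthetic = smote_interpolate(trace_a, trace_b, lam=0.5)
print(synthetic)  # [140.0, 140.0, 120.0, 140.0, 120.0, 140.0]
```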

### Pipeline Steps
| # | Phase | Script | Expected AUC Lift |
|---|-------|--------|-------------------|
| 1 | Setup | Clone repo (`feat/v4.0-timegan`), install deps | — |
| 2 | Data Ingestion | `data_ingestion.py` — 18 features, pH 7.15, quality filter | +5–8 pts |
| 3 | SSL Pretraining | `pretrain.py` — Masked Autoencoder on FHR | +2–3 pts |
| 4 | Primary Training (TimeGAN) | `train.py --augmentation timegan` | +3–5 pts |
| 4b | Primary Training (SMOTE baseline) | `train.py --augmentation smote` | baseline |
| 5 | Ensemble Training | `train_diverse_ensemble.py` — InceptionNet + XGB + Stacking | +3–5 pts |
| 6 | Evaluation | `evaluate_ensemble.py` — Temp scaling, TTA, calibration | +1–2 pts |
| 7 | Deployment | `convert_to_tflite.py` — TFLite & auto-push | — |

## 1. Setup Environment

In [None]:
from google.colab import userdata
import os

# 1. GitHub Authentication
GITHUB_REPO = "Krishna200608/NeuroFetal-AI"

try:
    GITHUB_TOKEN = userdata.get('GITHUB_TOKEN')
    print("✓ GitHub Token loaded from Secrets.")
except Exception as e:
    print(f"⚠️ Could not load GITHUB_TOKEN from Secrets ({e}). Falling back to manual input.")
    from getpass import getpass
    GITHUB_TOKEN = getpass("Enter GitHub Personal Access Token (PAT): ")

os.environ['GITHUB_TOKEN'] = GITHUB_TOKEN
os.environ['GITHUB_REPO'] = GITHUB_REPO

In [None]:
# 2. Clone Repository & Checkout V4.0 Branch
import shutil
import os

# Reset to /content before deleting the repo folder
try:
    os.chdir("/content")
except OSError:
    # /content is missing only outside Colab; nothing to reset in that case
    pass

# Clean up any previous clone
if os.path.exists("/content/NeuroFetal-AI"):
    shutil.rmtree("/content/NeuroFetal-AI")

print("Cloning repository...")
!git clone https://{GITHUB_TOKEN}@github.com/{GITHUB_REPO}.git

os.chdir("/content/NeuroFetal-AI")

# Checkout V4.0 TimeGAN branch
!git checkout feat/v4.0-timegan
!git pull origin feat/v4.0-timegan
print("✓ Cloned and checked out feat/v4.0-timegan!")

### 1.5 Git Credentials

In [None]:
!git config --global user.email "krishnasikheriya001@gmail.com"
!git config --global user.name "Krishna200608"
print("✓ Git credentials set.")

### 1.6 Install Dependencies
Installs all packages required for the full SOTA pipeline (including XGBoost/LightGBM for ensemble).

In [None]:
print("Installing libraries...")
!pip install -q wfdb shap scipy imbalanced-learn pyngrok filterpy \
    scikit-learn matplotlib seaborn pandas numpy tensorflow \
    streamlit plotly python-dotenv xgboost lightgbm
print("✓ Dependencies installed.")

---
## 2. Data Ingestion (Phase 1–2)

Processes raw `.dat`/`.hea` files into clean `.npy` arrays.

**SOTA enhancements:**
- 18 tabular features (13 signal-derived: STV, LTV, accels/decels, baseline, variability…)
- FHR normalization excluding 0-gaps
- pH threshold relaxed to 7.15 (FIGO)
- Signal quality filter (skip >50% loss)
- Feature standardization (Z-score) with saved scaler
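
The gap-aware normalization and the quality filter can be sketched as follows. This is an illustration under stated assumptions, not the code in `data_ingestion.py`: it assumes signal loss is encoded as `0`, that gaps stay at `0` after normalization, and the helper names (`loss_fraction`, `normalize_fhr`, `passes_quality_filter`) are hypothetical.

```python
from statistics import mean, pstdev

MAX_LOSS_FRACTION = 0.5  # records with more than 50% signal loss are skipped

def loss_fraction(fhr):
    """Fraction of samples that are signal-loss gaps (assumed encoded as 0)."""
    return sum(1 for v in fhr if v == 0) / len(fhr)

def passes_quality_filter(fhr):
    """Keep a record only if at most half of its samples are gaps."""
    return loss_fraction(fhr) <= MAX_LOSS_FRACTION

def normalize_fhr(fhr):
    """Z-score the valid samples only, so gaps do not skew mean and std."""
    valid = [v for v in fhr if v != 0]
    mu, sigma = mean(valid), pstdev(valid)
    sigma = sigma or 1.0  # guard against a perfectly flat trace
    # Assumption: gaps are left at 0.0 rather than interpolated.
    return [(v - mu) / sigma if v != 0 else 0.0 for v in fhr]

trace = [140, 0, 150, 130, 0, 140]
if passes_quality_filter(trace):
    print(normalize_fhr(trace))
```

Computing the statistics over valid samples only matters because including the zeros would pull the mean far below the physiological baseline and inflate the standard deviation.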

In [None]:
!python Code/scripts/data_ingestion.py

---
## 3. Self-Supervised Pretraining

Train the Masked Autoencoder (MAE) on unlabelled FHR data to learn robust temporal representations.

Saves encoder weights → `Code/models/pretrained_fhr_encoder.weights.keras`

In [None]:
!python Code/scripts/pretrain.py

---
## 4. Primary Model Training (V4.0 TimeGAN)

Train the **AttentionFusionResNet** using 5-Fold Cross-Validation with **TimeGAN augmentation**.

**V4.0 upgrade:** Replaces tabular SMOTE with pre-generated synthetic pathological traces from WGAN-GP.

**SOTA enhancements (carried from V3.0):**
- Up to 200 epochs with cosine annealing + warmup (the training cell below passes `--epochs 150`)
- Focal Loss (α=0.65, γ=2.0)
- 4x data augmentation (SpecAugment + CutMix + time-warp + jitter + mixup)
- AdamW with weight decay 5e-4
- SSL pretrained backbone
- Early stopping patience = 40
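
For reference, the focal loss and the warmup + cosine schedule listed above can be written out in a few lines. This is a sketch, not the implementation in `train.py`: α and γ come from the list above, while `warmup_epochs`, `base_lr`, and `min_lr` are placeholder assumptions.

```python
import math

def focal_loss(y_true, p, alpha=0.65, gamma=2.0, eps=1e-7):
    """Binary focal loss for one sample: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p = min(max(p, eps), 1.0 - eps)          # clip to avoid log(0)
    p_t = p if y_true == 1 else 1.0 - p      # probability assigned to the true class
    alpha_t = alpha if y_true == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def lr_at_epoch(epoch, total_epochs=200, warmup_epochs=10,
                base_lr=1e-3, min_lr=1e-6):
    """Linear warmup to base_lr, then cosine annealing down to min_lr."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# A confident correct prediction contributes far less loss than a hard one.
print(focal_loss(1, 0.9), focal_loss(1, 0.3))
print([lr_at_epoch(e) for e in (0, 9, 100, 199)])
```

With γ = 2.0, the `(1 - p_t)**gamma` factor down-weights easy examples, and α = 0.65 gives the positive (pathological) class more weight than the negative class.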

In [None]:
# Pull latest changes from V4.0 branch
!git pull origin feat/v4.0-timegan

In [None]:
# V4.0: TimeGAN augmentation (default)
!python Code/scripts/train.py --augmentation timegan --epochs 150

### 4b. SMOTE Baseline Comparison (Optional)

Run this cell to compare TimeGAN vs SMOTE augmentation. Skip if you only need TimeGAN results.

In [None]:
# Optional: Run SMOTE baseline for comparison
# !python Code/scripts/train.py --augmentation smote --epochs 150

In [None]:
# Auto-push trained models to GitHub
import os

for fold in range(1, 6):
    model_path = f"Code/models/enhanced_model_fold_{fold}.keras"
    if os.path.exists(model_path):
        print(f"Pushing model for Fold {fold}...")
        !git add {model_path}
        !git commit -m "Auto-save: Trained SOTA model Fold {fold}"
        # Push the checked-out branch (feat/v4.0-timegan); local `main` does not hold these commits
        !git push origin HEAD
        print(f"✓ Fold {fold} pushed.")
    else:
        print(f"⚠️ Not found: {model_path}")

In [None]:
!git pull origin feat/v4.0-timegan

---
## 5. Diverse Ensemble Training (Phase 5)

Train three diverse model families and combine with a stacking meta-learner:

1. **AttentionFusionResNet** — primary (already trained above)
2. **1D-InceptionNet** — multi-scale temporal patterns (kernel 5/15/40)
3. **XGBoost / LightGBM** — gradient boosting on tabular + CSP + FHR features

Out-of-fold predictions across 5 folds → Logistic Regression stacking

**Expected additional AUC lift: +3–5 pts**
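
The out-of-fold (OOF) idea can be sketched in plain Python. This is an illustration, not the logic of `train_diverse_ensemble.py`: the interleaved fold split, the helper names, and the `prior_model` stand-in are assumptions (the real pipeline presumably uses stratified folds and the trained base models).

```python
def kfold_indices(n_samples, n_folds=5):
    """Yield (train_idx, val_idx); fold k holds out every n_folds-th sample."""
    for k in range(n_folds):
        val = [i for i in range(n_samples) if i % n_folds == k]
        train = [i for i in range(n_samples) if i % n_folds != k]
        yield train, val

def out_of_fold_predictions(fit_predict, X, y, n_folds=5):
    """Score every sample with a model that never saw it during training."""
    oof = [None] * len(X)
    for train, val in kfold_indices(len(X), n_folds):
        preds = fit_predict([X[i] for i in train],
                            [y[i] for i in train],
                            [X[i] for i in val])
        for i, p in zip(val, preds):
            oof[i] = p
    return oof

def prior_model(X_train, y_train, X_val):
    """Hypothetical stand-in base model: predicts the training positive rate."""
    rate = sum(y_train) / len(y_train)
    return [rate] * len(X_val)

X = list(range(10))
y = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
oof = out_of_fold_predictions(prior_model, X, y)

# One column per base model; these rows are what the meta-learner is fit on.
meta_features = [[p] for p in oof]
print(meta_features)
```

Because each OOF prediction comes from a model that did not train on that sample, the meta-learner sees base-model outputs that resemble test-time behaviour rather than memorised training fits.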

In [None]:
!python Code/scripts/train_diverse_ensemble.py

In [None]:
# Push ensemble artifacts
import os

ensemble_files = [
    "Code/models/stacking_meta_learner.pkl",
    "Code/models/xgb_model.pkl",
]

# Also push any InceptionNet fold models
for fold in range(1, 6):
    inception_path = f"Code/models/inception_fold_{fold}.keras"
    if os.path.exists(inception_path):
        ensemble_files.append(inception_path)

pushed = []
for f in ensemble_files:
    if os.path.exists(f):
        !git add {f}
        pushed.append(f)

if pushed:
    !git commit -m "Auto-save: Diverse ensemble models (InceptionNet + XGB + meta-learner)"
    # Push the checked-out branch (feat/v4.0-timegan); local `main` does not hold this commit
    !git push origin HEAD
    print(f"✓ Pushed {len(pushed)} ensemble artifacts.")
else:
    print("⚠️ No ensemble files found to push.")

---
## 6. Evaluation & Calibration (Phase 6)

**Stacking Ensemble Evaluation** with:
- Temperature scaling (Guo et al., 2017)
- Optimal threshold search (Youden's J / F1 / cost-sensitive)
- Enhanced 3-pass TTA (original + flip + noise)
- AUPRC reporting for imbalanced data

**Uncertainty Quantification** via MC Dropout.
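
Two of the steps above, temperature scaling and Youden's J threshold search, can be sketched on toy validation logits. This is not the code in `evaluate_ensemble.py`: the grid search over `T`, the helper names, and the toy data are assumptions chosen to keep the example dependency-free.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nll(logits, labels, temperature):
    """Mean binary cross-entropy of sigmoid(logit / T)."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z / temperature), 1e-7), 1.0 - 1e-7)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)

def fit_temperature(logits, labels, grid=None):
    """Grid-search the temperature that minimises validation NLL."""
    grid = grid or [0.5 + 0.05 * i for i in range(91)]  # 0.5 .. 5.0
    return min(grid, key=lambda t: nll(logits, labels, t))

def youden_threshold(probs, labels):
    """Return (threshold, J) maximising J = sensitivity + specificity - 1."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p < t and y == 0)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:  # ties keep the lowest threshold
            best_t, best_j = t, j
    return best_t, best_j

# Toy validation set (assumed values, for illustration only).
val_logits = [-4.0, -3.0, -1.0, 2.0, 3.5, -2.5, 4.0, 0.5]
val_labels = [0, 0, 1, 1, 1, 0, 1, 0]

T = fit_temperature(val_logits, val_labels)
calibrated = [sigmoid(z / T) for z in val_logits]
threshold, j_stat = youden_threshold(calibrated, val_labels)
print(T, threshold, j_stat)
```

Dividing logits by a single scalar `T` changes confidence but not ranking, so AUC is unaffected; only calibration and the chosen operating threshold move.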

In [None]:
print("\nRunning Stacking Ensemble Evaluation...")
!python Code/scripts/evaluate_ensemble.py

print("\nRunning Uncertainty Quantification (MC Dropout)...")
!python Code/scripts/evaluate_uncertainty.py

---
## 7. Launch Dashboard (Optional)

Run the Streamlit dashboard from Colab via **ngrok** tunnel.

> Requires `NGROK_AUTH_TOKEN` in Colab Secrets.

In [None]:
from google.colab import userdata

try:
    auth_token = userdata.get('NGROK_AUTH_TOKEN')
    print("✓ Ngrok Token loaded from Secrets.")
except Exception as e:
    print(f"⚠️ Could not load NGROK_AUTH_TOKEN from Secrets ({e}). Falling back to manual input.")
    from getpass import getpass
    auth_token = getpass("Enter Ngrok Auth Token manually: ")

if auth_token:
    with open("Code/.env", "w") as f:
        f.write(f"NGROK_AUTH_TOKEN={auth_token}\n")

print("Launching Streamlit App...")
!python Code/run_app.py

---
## 8. Convert to TFLite & Auto-Push

Convert the best trained model to TFLite format and push to GitHub automatically.

In [None]:
!python Code/scripts/convert_to_tflite.py

In [None]:
# Push TFLite model
import os

tflite_path = "Code/models/tflite/neurofetal_model_quant_int8.tflite"
if os.path.exists(tflite_path):
    !git add {tflite_path}
    !git commit -m "Auto-save: TFLite model"
    # Push the checked-out branch (feat/v4.0-timegan); local `main` does not hold this commit
    !git push origin HEAD
    print("✓ TFLite model pushed.")
else:
    print("⚠️ TFLite model not found.")

---
## ✅ Pipeline Complete

All pipeline phases above have been executed. Check the output of the evaluation cell in Section 6 for the final AUC and calibration metrics.