# NeuroFetal AI — SOTA Training Pipeline

**Version 4.0** — TimeGAN Augmentation Phase (AUC 0.90+ Target)

This notebook orchestrates the full SOTA pipeline on Google Colab with GPU acceleration.

### V4.0 Upgrade: TimeGAN Replaces SMOTE
Instead of tabular SMOTE (linear interpolation between minority-class samples), V4.0 uses a **1D Convolutional WGAN-GP** (called "TimeGAN" throughout this notebook) to generate synthetic pathological FHR/UC traces that preserve temporal dynamics. The generator was trained in `TimeGAN_Colab.ipynb` and produced 1,410 synthetic traces, saved to `Datasets/synthetic/`.
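For reference, the two augmentation schemes can be written out as follows. SMOTE builds a synthetic sample by interpolating between a minority-class sample $x_i$ and one of its minority-class nearest neighbours $x_{nn}$. A WGAN-GP (Gulrajani et al., 2017) trains its critic $D$ with a gradient penalty evaluated on random interpolates $\hat{x}$ between real traces $x$ and generated traces $\tilde{x}$. The penalty weight $\lambda$ used in `TimeGAN_Colab.ipynb` is not stated in this notebook.

```latex
\begin{aligned}
\text{SMOTE:}\quad & x_{\text{new}} = x_i + u\,(x_{nn} - x_i), \qquad u \sim \mathcal{U}(0, 1) \\[4pt]
\text{WGAN-GP critic:}\quad & \mathcal{L}_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big] - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
  + \lambda\, \mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big] \\
& \hat{x} = \epsilon\, x + (1 - \epsilon)\, \tilde{x}, \qquad \epsilon \sim \mathcal{U}(0, 1) \\[4pt]
\text{WGAN-GP generator:}\quad & \mathcal{L}_G = -\,\mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
\end{aligned}
```

SMOTE can only produce points on line segments between existing feature vectors. The GAN instead learns a generator distribution $P_g$ over whole traces, which is the motivation given above for switching.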

### Pipeline Steps
| # | Phase | Script | Expected AUC Lift |
|---|-------|--------|-------------------|
| 1 | Setup | Clone repo (`feat/v4.0-timegan`), install deps | — |
| 2 | Data Ingestion | `data_ingestion.py` — 18 features, pH 7.15, quality filter | +5–8 pts |
| 3 | SSL Pretraining | `pretrain.py` — Masked Autoencoder on FHR | +2–3 pts |
| 4 | Primary Training (TimeGAN) | `train.py --augmentation timegan` | +3–5 pts |
| 4b | Primary Training (SMOTE baseline) | `train.py --augmentation smote` | baseline |
| 5 | Ensemble Training | `train_diverse_ensemble.py` — InceptionNet + XGB + Stacking | +3–5 pts |
| 6 | Evaluation | `evaluate_ensemble.py` — Temp scaling, TTA, calibration | +1–2 pts |
| 7 | Dashboard (optional) | `run_app.py` — Streamlit dashboard via ngrok | — |
| 8 | Deployment | `convert_to_tflite.py` — TFLite & auto-push | — |

## 1. Setup Environment

In [2]:
from google.colab import userdata
import os

# 1. GitHub Authentication
GITHUB_REPO = "Krishna200608/NeuroFetal-AI"

try:
    GITHUB_TOKEN = userdata.get('GITHUB_TOKEN')
    print("✓ GitHub Token loaded from Secrets.")
except Exception:
    print("⚠️ Error loading GITHUB_TOKEN from Secrets. Falling back to manual input.")
    from getpass import getpass
    GITHUB_TOKEN = getpass("Enter GitHub Personal Access Token (PAT): ")

os.environ['GITHUB_TOKEN'] = GITHUB_TOKEN
os.environ['GITHUB_REPO'] = GITHUB_REPO

✓ GitHub Token loaded from Secrets.


In [3]:
# 2. Clone Repository & Checkout V4.0 Branch
import shutil
import os

# Reset to /content before deleting the repo folder
try:
    os.chdir("/content")
except OSError:
    pass  # /content only exists on Colab; ignore elsewhere

# Clean up any previous clone
if os.path.exists("/content/NeuroFetal-AI"):
    shutil.rmtree("/content/NeuroFetal-AI")

print("Cloning repository...")
!git clone https://{GITHUB_TOKEN}@github.com/{GITHUB_REPO}.git

os.chdir("/content/NeuroFetal-AI")

# Checkout V4.0 TimeGAN branch
!git checkout feat/v4.0-timegan
!git pull origin feat/v4.0-timegan
print("✓ Cloned and checked out feat/v4.0-timegan!")

Cloning repository...
Cloning into 'NeuroFetal-AI'...
remote: Enumerating objects: 2361, done.
remote: Counting objects: 100% (142/142), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 2361 (delta 107), reused 99 (delta 88), pack-reused 2219 (from 2)
Receiving objects: 100% (2361/2361), 809.40 MiB | 33.73 MiB/s, done.
Resolving deltas: 100% (1381/1381), done.
Updating files: 100% (1220/1220), done.
Branch 'feat/v4.0-timegan' set up to track remote branch 'feat/v4.0-timegan' from 'origin'.
Switched to a new branch 'feat/v4.0-timegan'
From https://github.com/Krishna200608/NeuroFetal-AI
 * branch            feat/v4.0-timegan -> FETCH_HEAD
Already up to date.
✓ Cloned and checked out feat/v4.0-timegan!


### 1.5 Git Credentials

In [4]:
!git config --global user.email "krishnasikheriya001@gmail.com"
!git config --global user.name "Krishna200608"
print("✓ Git credentials set.")

✓ Git credentials set.


### 1.6 Install Dependencies
Installs every package the full SOTA pipeline needs, including XGBoost and LightGBM for the ensemble stage.

In [5]:
print("Installing libraries...")
!pip install -q wfdb shap scipy imbalanced-learn pyngrok filterpy \
    scikit-learn matplotlib seaborn pandas numpy tensorflow \
    streamlit plotly python-dotenv xgboost lightgbm
print("✓ Dependencies installed.")

Installing libraries...
  Preparing metadata (setup.py) ... done
  Building wheel for filterpy (setup.py) ... done
ERROR:

---
## 2. Data Ingestion (Phase 1–2)

Processes raw `.dat`/`.hea` files into clean `.npy` arrays.

**SOTA enhancements:**
- 18 tabular features: 5 clinical (Age, Parity, Gestation, Gravidity, Weight) and 13 signal-derived (STV, LTV, accels/decels, baseline, variability…)
- FHR normalization that excludes zero-valued gaps (signal dropouts)
- pH threshold relaxed to 7.15 (FIGO)
- Signal quality filter (skip >50% loss)
- Feature standardization (Z-score) with saved scaler
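A minimal sketch of the gap-aware normalization and the signal-loss filter listed above. It assumes dropouts are encoded as FHR = 0 and that a window is kept when at most 50% of it is missing. The function names are hypothetical, not taken from `data_ingestion.py`, and filling gaps with 0 after normalization is an assumption. The tabular Z-score step is not shown.

```python
import numpy as np


def fhr_signal_loss_pct(fhr: np.ndarray) -> float:
    """Percentage of samples recorded as 0 (signal dropout)."""
    fhr = np.asarray(fhr, dtype=np.float64)
    return float(100.0 * np.mean(fhr == 0))


def keep_window(fhr: np.ndarray, max_loss_pct: float = 50.0) -> bool:
    """Quality filter (hypothetical helper): keep only if at most max_loss_pct is missing."""
    return fhr_signal_loss_pct(fhr) <= max_loss_pct


def normalize_fhr(fhr: np.ndarray) -> np.ndarray:
    """Z-normalize an FHR window using statistics from non-zero samples only.

    Gap samples (value 0) do not influence the mean/std. Setting them to 0 in
    the output is an assumption about how the script fills gaps.
    """
    fhr = np.asarray(fhr, dtype=np.float64)
    valid = fhr > 0
    out = np.zeros_like(fhr)
    if valid.sum() < 2:
        return out  # not enough valid samples to estimate mean/std
    mu, sigma = fhr[valid].mean(), fhr[valid].std()
    out[valid] = (fhr[valid] - mu) / (sigma if sigma > 0 else 1.0)
    return out


# Example: a short window with two dropouts
window = np.array([120.0, 0.0, 140.0, 0.0, 130.0])
print(fhr_signal_loss_pct(window))  # 40.0
print(keep_window(window))          # True
print(normalize_fhr(window).round(3))
```

Excluding the zeros matters because a window with many dropouts would otherwise have its mean dragged toward 0 and its standard deviation inflated. That would distort every valid sample.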

In [6]:
!python Code/scripts/data_ingestion.py

Found 552 records.
pH threshold: 7.15
Max signal loss: 50%
Processed 100 records...
Processed 200 records...
Processed 300 records...
Processed 400 records...
Processed 500 records...

Processing complete.
  Patients: 552
  Total windows: 2546
  Skipped (quality): 214
  Shapes: X_fhr=(2546, 1200), X_uc=(2546, 1200), X_tabular=(2546, 18), y=(2546,)
  Tabular features (18): ['Age', 'Parity', 'Gestation', 'Gravidity', 'Weight', 'fhr_baseline', 'fhr_stv', 'fhr_ltv', 'fhr_accel_count', 'fhr_decel_count', 'fhr_decel_area', 'fhr_range', 'fhr_iqr', 'fhr_entropy', 'uc_freq', 'uc_intensity_mean', 'fhr_uc_lag', 'signal_loss_pct']
  Class balance: 470.0 compromised / 2546 total (18.5%)


---
## 3. Self-Supervised Pretraining

Train the Masked Autoencoder (MAE) on unlabelled FHR data to learn robust temporal representations.

Saves encoder weights → `Code/models/pretrained_fhr_encoder.weights.keras`
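A minimal sketch of the masking objective behind masked-autoencoder pretraining. It assumes whole contiguous patches of the FHR window are hidden and the loss is computed only on the hidden samples. The patch size (40) and mask ratio (0.75) are hypothetical, not values read from `pretrain.py`. Zeroing the masked samples is also a simplification: an MAE proper drops masked patches from the encoder input or replaces them with a learned mask token.

```python
import numpy as np


def random_patch_mask(length, patch_size, mask_ratio, rng):
    """Boolean mask over time steps (True = hidden), masking whole patches at random."""
    n_patches = length // patch_size
    n_masked = int(round(mask_ratio * n_patches))
    chosen = rng.choice(n_patches, size=n_masked, replace=False)
    mask = np.zeros(length, dtype=bool)
    for p in chosen:
        mask[p * patch_size:(p + 1) * patch_size] = True
    return mask


def masked_mse(x, x_hat, mask):
    """Reconstruction error computed only on the masked positions."""
    diff = (np.asarray(x) - np.asarray(x_hat))[mask]
    return float(np.mean(diff ** 2)) if diff.size else 0.0


rng = np.random.default_rng(0)
fhr = rng.normal(size=1200)                       # stand-in for one normalized FHR window
mask = random_patch_mask(1200, patch_size=40, mask_ratio=0.75, rng=rng)
encoder_input = np.where(mask, 0.0, fhr)          # masked samples zeroed out (simplification)
# A decoder output x_hat would be trained to minimize masked_mse(fhr, x_hat, mask).
```

Scoring only the hidden positions forces the encoder to infer missing FHR structure from its context. Copying the visible input through unchanged would not reduce this loss.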

In [7]:
!python Code/scripts/pretrain.py

2026-02-21 08:31:37.664905: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1771662697.685697    1627 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1771662697.692585    1627 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1771662697.709324    1627 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771662697.709347    1627 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.

---
## 4. Primary Model Training (V4.0 TimeGAN)

Train the **AttentionFusionResNet** using 5-Fold Cross-Validation with **TimeGAN augmentation**.

**V4.0 upgrade:** Replaces tabular SMOTE with pre-generated synthetic pathological traces from WGAN-GP.

**SOTA enhancements (carried from V3.0):**
- 200 epochs with cosine annealing + warmup (the training cell below overrides this with `--epochs 150`)
- Focal Loss (α=0.65, γ=2.0)
- 4x data augmentation (SpecAugment + CutMix + time-warp + jitter + mixup)
- AdamW with weight decay 5e-4
- SSL pretrained backbone
- Early stopping patience = 40
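A minimal NumPy sketch of two items from the list above: binary focal loss with α=0.65, γ=2.0 and a linear-warmup-plus-cosine learning-rate schedule. It assumes α weights the positive (compromised) class and 1−α the negative class, following Lin et al. (2017). Whether `train.py` uses the same convention is not confirmed here. `base_lr` and `warmup_steps` are hypothetical values.

```python
import numpy as np


def binary_focal_loss(y_true, p_pred, alpha=0.65, gamma=2.0, eps=1e-7):
    """Mean binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    y = np.asarray(y_true, dtype=np.float64)
    p = np.clip(np.asarray(p_pred, dtype=np.float64), eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)               # probability assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)   # class weighting (assumed convention)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))


def cosine_warmup_lr(step, total_steps, base_lr=1e-3, warmup_steps=500, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay to min_lr (hypothetical defaults)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = min((step - warmup_steps) / max(1, total_steps - warmup_steps), 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + np.cos(np.pi * progress))


print(binary_focal_loss([1, 0, 1], [0.9, 0.2, 0.4]))
print([round(float(cosine_warmup_lr(s, 10_000)), 6) for s in (0, 499, 5_000, 10_000)])
```

The `(1 - p_t) ** gamma` factor shrinks the loss on windows the model already classifies confidently. Training effort therefore concentrates on hard, ambiguous traces, which complements the class re-weighting by α.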

In [7]:
# Pull latest changes from V4.0 branch
!git pull origin feat/v4.0-timegan

From https://github.com/Krishna200608/NeuroFetal-AI
 * branch            feat/v4.0-timegan -> FETCH_HEAD
Already up to date.


In [8]:
# V4.0: TimeGAN augmentation (default)
!python Code/scripts/train.py --augmentation timegan --epochs 150

2026-02-21 08:34:18.482456: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1771662858.503655    2809 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1771662858.510652    2809 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1771662858.527201    2809 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771662858.527229    2809 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.

### 4b. SMOTE Baseline Comparison (Optional)

Run this cell to compare TimeGAN vs SMOTE augmentation. Skip if you only need TimeGAN results.

In [None]:
# Optional: Run SMOTE baseline for comparison
# !python Code/scripts/train.py --augmentation smote --epochs 150

In [12]:
# Auto-push trained models to GitHub
import os

for fold in range(1, 6):
    model_path = f"Code/models/enhanced_model_fold_{fold}.keras"
    if os.path.exists(model_path):
        print(f"Pushing model for Fold {fold}...")
        !git add {model_path}
        !git commit -m "Auto-save: Trained SOTA model Fold {fold}"
        !git push origin feat/v4.0-timegan
        print(f"✓ Fold {fold} pushed.")
    else:
        print(f"⚠️ Not found: {model_path}")

Pushing model for Fold 1...
On branch feat/v4.0-timegan
Your branch is ahead of 'origin/feat/v4.0-timegan' by 5 commits.
  (use "git push" to publish your local commits)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   Code/models/pretrained_fhr_encoder.weights.keras

Untracked files:
  (use "git add <file>..." to include in what will be committed)
	Reports/training_logs/training_log_20260221_102159.json

no changes added to commit (use "git add" and/or "git commit -a")
Enumerating objects: 33, done.
Counting objects: 100% (33/33), done.
Delta compression using up to 2 threads
Compressing objects: 100% (25/25), done.
Writing objects: 100% (25/25), 117.51 MiB | 10.29 MiB/s, done.
Total 25 (delta 15), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (15/15), completed with 3 local objects.
To https://github.com/Krishna200608/N

In [13]:
!git pull origin feat/v4.0-timegan


remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Total 5 (delta 4), reused 5 (delta 4), pack-reused 0 (from 0)
Unpacking objects: 100% (5/5), 1.35 KiB | 689.00 KiB/s, done.
From https://github.com/Krishna200608/NeuroFetal-AI
 * branch            feat/v4.0-timegan -> FETCH_HEAD
   97ae102..ad2241e  feat/v4.0-timegan -> origin/feat/v4.0-timegan
Updating 97ae102..ad2241e
Fast-forward
 Code/scripts/train_diverse_ensemble.py | 36 ++++++++++++++++++++++++----------
 1 file changed, 26 insertions(+), 10 deletions(-)


---
## 5. Diverse Ensemble Training (Phase 5)

Train three diverse model families and combine with a stacking meta-learner:

1. **AttentionFusionResNet** — primary (already trained above)
2. **1D-InceptionNet** — multi-scale temporal patterns (kernel sizes 5/15/40)
3. **XGBoost / LightGBM** — gradient boosting on tabular + CSP + FHR features

Out-of-fold predictions across 5 folds → Logistic Regression stacking

**Expected additional AUC lift: +3–5 pts**
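The out-of-fold (OOF) stacking scheme described above can be sketched with scikit-learn as follows. Two logistic regressions on different feature subsets stand in for AttentionFusionResNet, InceptionNet and XGBoost. The helper names and the toy data are hypothetical. For brevity the meta-learner is fitted and scored on the same OOF matrix, so an honest estimate of the stack's AUC needs a further held-out split.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def oof_predictions(make_model, X, y, n_splits=5, seed=42):
    """Out-of-fold P(y=1): every sample is scored by a model that never trained on it."""
    oof = np.zeros(len(y), dtype=np.float64)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, val_idx in skf.split(X, y):
        model = make_model()
        model.fit(X[train_idx], y[train_idx])
        oof[val_idx] = model.predict_proba(X[val_idx])[:, 1]
    return oof


def fit_stacker(oof_columns, y):
    """Logistic-regression meta-learner over the base models' OOF probabilities."""
    Z = np.column_stack(oof_columns)  # shape (n_samples, n_base_models)
    return LogisticRegression().fit(Z, y)


# Toy data: two stand-in "base models" that see different feature subsets.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300) > 0.8).astype(int)

oof_a = oof_predictions(LogisticRegression, X[:, :2], y)
oof_b = oof_predictions(LogisticRegression, X[:, 2:], y)
meta = fit_stacker([oof_a, oof_b], y)
stacked = meta.predict_proba(np.column_stack([oof_a, oof_b]))[:, 1]
```

Training the meta-learner on OOF probabilities rather than in-sample ones is what keeps it from simply trusting whichever base model overfits the most.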

In [14]:
!python Code/scripts/train_diverse_ensemble.py

NeuroFetal AI — Diverse Ensemble Training (Phase 5)

Data: FHR=(2546, 1200, 1), Tab=(2546, 18), y=(2546,)
Class balance: 18.5% positive
2026-02-21 10:46:47.377580: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1771670807.398267   45986 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1771670807.404943   45986 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1771670807.420862   45986 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.

In [17]:
# Push ensemble artifacts
import os

ensemble_files = [
    "Code/models/stacking_meta_learner.pkl",
    "Code/models/xgb_model.pkl",
]

# Also push any InceptionNet fold models
for fold in range(1, 6):
    inception_path = f"Code/models/inception_fold_{fold}.keras"
    if os.path.exists(inception_path):
        ensemble_files.append(inception_path)

pushed = []
for f in ensemble_files:
    if os.path.exists(f):
        !git add {f}
        pushed.append(f)

if pushed:
    !git commit -m "Auto-save: Diverse ensemble models (InceptionNet + XGB + meta-learner)"
    !git push origin feat/v4.0-timegan
    print(f"✓ Pushed {len(pushed)} ensemble artifacts.")
else:
    print("⚠️ No ensemble files found to push.")

On branch feat/v4.0-timegan
Your branch is ahead of 'origin/feat/v4.0-timegan' by 1 commit.
  (use "git push" to publish your local commits)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
	modified:   Code/models/pretrained_fhr_encoder.weights.keras
	modified:   Reports/uncertainty_analysis/fold_1/calibration_curve.png
	modified:   Reports/uncertainty_analysis/fold_1/uncertainty_histogram.png
	modified:   Reports/uncertainty_analysis/fold_2/calibration_curve.png
	modified:   Reports/uncertainty_analysis/fold_2/uncertainty_histogram.png
	modified:   Reports/uncertainty_analysis/fold_3/calibration_curve.png
	modified:   Reports/uncertainty_analysis/fold_3/uncertainty_histogram.png
	modified:   Reports/uncertainty_analysis/fold_4/calibration_curve.png
	modified:   Reports/uncertainty_analysis/fold_4/uncer

---
## 6. Evaluation & Calibration (Phase 6)

**Stacking Ensemble Evaluation** with:
- Temperature scaling (Guo et al., 2017)
- Optimal threshold search (Youden's J / F1 / cost-sensitive)
- Enhanced 3-pass TTA (original + flip + noise)
- AUPRC reporting for imbalanced data

**Uncertainty Quantification** via MC Dropout.
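A minimal sketch of MC Dropout on a tiny one-hidden-layer network with hypothetical random weights. Dropout stays active at inference, the network is run `n_passes` times with fresh masks, and the mean and standard deviation of the resulting probabilities serve as the prediction and its uncertainty. In Keras the usual equivalent is calling `model(x, training=True)` in a loop. Whether `evaluate_uncertainty.py` does exactly that, and which dropout rate and pass count it uses, is not shown here.

```python
import numpy as np


def mc_dropout_predict(W1, b1, w2, b2, x, p_drop=0.3, n_passes=50, rng=None):
    """Return (mean, std) of sigmoid outputs over n_passes stochastic forward passes."""
    rng = np.random.default_rng() if rng is None else rng
    h = np.maximum(0.0, x @ W1 + b1)              # ReLU hidden layer, shape (n, hidden)
    probs = []
    for _ in range(n_passes):
        keep = rng.random(h.shape) >= p_drop      # fresh dropout mask on every pass
        h_drop = h * keep / (1.0 - p_drop)        # inverted-dropout rescaling
        logits = h_drop @ w2 + b2
        probs.append(1.0 / (1.0 + np.exp(-logits)))
    probs = np.stack(probs)                       # shape (n_passes, n)
    return probs.mean(axis=0), probs.std(axis=0)


rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 16)), np.zeros(16)
w2, b2 = rng.normal(size=16), 0.0
x = rng.normal(size=(5, 4))
mean, std = mc_dropout_predict(W1, b1, w2, b2, x, n_passes=100, rng=rng)
```

A large `std` flags windows where the sub-networks sampled by dropout disagree. Those are the cases a clinician-facing dashboard might route for manual review rather than trusting the point prediction.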

In [16]:
print("\nRunning Stacking Ensemble Evaluation...")
!python Code/scripts/evaluate_ensemble.py

print("\nRunning Uncertainty Quantification (MC Dropout)...")
!python Code/scripts/evaluate_uncertainty.py


Running Stacking Ensemble Evaluation...
2026-02-21 11:00:43.132845: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1771671643.168453   56912 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1771671643.178397   56912 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1771671643.203490   56912 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771671643.203525   56912 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.

---
## 7. Launch Dashboard (Optional)

Run the Streamlit dashboard from Colab via **ngrok** tunnel.

> Reads `NGROK_AUTH_TOKEN` from Colab Secrets; if the secret is unavailable, the cell prompts for the token manually.

In [None]:
from google.colab import userdata

try:
    auth_token = userdata.get('NGROK_AUTH_TOKEN')
    print("✓ Ngrok Token loaded from Secrets.")
except Exception:
    print("⚠️ Error loading NGROK_AUTH_TOKEN from Secrets. Falling back to manual input.")
    from getpass import getpass
    auth_token = getpass("Enter Ngrok Auth Token manually: ")

if auth_token:
    with open("Code/.env", "w") as f:
        f.write(f"NGROK_AUTH_TOKEN={auth_token}\n")

print("Launching Streamlit App...")
!python Code/run_app.py

✓ Ngrok Token loaded from Secrets.
Launching Streamlit App...
Authenticating with ngrok...
Starting Streamlit Server...
Using system python: /usr/bin/python3
Attempting to open public tunnel...

   DASHBOARD LIVE AT: https://beauteously-uncaped-dario.ngrok-free.dev
   LOCAL ADDRESS:     http://localhost:8501

Press Ctrl+C to stop the server.


---
## 8. Convert to TFLite & Auto-Push

Convert the best trained model to TFLite format and push to GitHub automatically.
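The output file is named `neurofetal_model_quant_int8.tflite`, which suggests full-integer (int8) quantization. In TFLite's int8 scheme a tensor is stored as int8 values q together with a scale and zero point, and real values are recovered as scale × (q − zero_point). The NumPy sketch below shows that affine arithmetic for a single per-tensor asymmetric range. It illustrates the scheme only; it is not what `convert_to_tflite.py` does internally, and the helper names are hypothetical.

```python
import numpy as np


def quant_params(x_min, x_max, qmin=-128, qmax=127):
    """Scale and zero point that map [x_min, x_max] onto the int8 range."""
    x_min, x_max = min(float(x_min), 0.0), max(float(x_max), 0.0)  # range must contain 0
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0:
        scale = 1.0
    zero_point = int(round(qmin - x_min / scale))
    return scale, int(np.clip(zero_point, qmin, qmax))


def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    q = np.round(np.asarray(x, dtype=np.float64) / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)


def dequantize(q, scale, zero_point):
    return scale * (q.astype(np.float64) - zero_point)


x = np.linspace(-1.0, 3.0, 101)
scale, zp = quant_params(x.min(), x.max())
q = quantize(x, scale, zp)
max_err = np.max(np.abs(dequantize(q, scale, zp) - x))  # bounded by scale / 2 inside the range
```

Forcing the range to include 0 guarantees that real zero (e.g. zero padding) is represented exactly. The rounding error on in-range values never exceeds half a quantization step, which is the accuracy cost traded for a roughly 4x smaller model.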

In [18]:
!python Code/scripts/convert_to_tflite.py

2026-02-21 11:03:36.910062: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1771671816.930562   57981 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1771671816.937280   57981 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1771671816.953537   57981 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1771671816.953569   57981 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.

In [19]:
# Push TFLite model
import os

tflite_path = "Code/models/tflite/neurofetal_model_quant_int8.tflite"
if os.path.exists(tflite_path):
    !git add {tflite_path}
    !git commit -m "Auto-save: TFLite model"
    !git push origin feat/v4.0-timegan
    print("✓ TFLite model pushed.")
else:
    print("⚠️ TFLite model not found.")

[feat/v4.0-timegan 4251b54] Auto-save: TFLite model
 1 file changed, 0 insertions(+), 0 deletions(-)
 rewrite Code/models/tflite/neurofetal_model_quant_int8.tflite (73%)
Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 2 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 1.65 MiB | 3.05 MiB/s, done.
Total 6 (delta 4), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (4/4), completed with 4 local objects.
To https://github.com/Krishna200608/NeuroFetal-AI.git
   a65dc13..4251b54  feat/v4.0-timegan -> feat/v4.0-timegan
✓ TFLite model pushed.


---
## ✅ Pipeline Complete

All pipeline phases have been executed. Check the evaluation output in Section 6 for the final AUC, AUPRC, and calibration metrics.