# NeuroFetal AI — SOTA Training Pipeline

**Version 4.0** — 6-Phase SOTA Strategy (AUC 0.84+ Target)

This notebook orchestrates the full SOTA pipeline on Google Colab with GPU acceleration.

### Pipeline Steps
| # | Phase | Script / Details | Expected AUC Lift |
|---|-------|--------|-------------------|
| 1 | Setup | Clone repo, install deps | — |
| 2 | Data Ingestion | `data_ingestion.py` — 18 features, pH 7.15, quality filter | +5–8 pts |
| 3 | SSL Pretraining | `pretrain.py` — Masked Autoencoder on FHR | +2–3 pts |
| 4 | Primary Training | `train.py` — 200 epochs, focal loss, 4x augment | +3–5 pts |
| 5 | Ensemble Training | `train_diverse_ensemble.py` — InceptionNet + XGB + Stacking | +3–5 pts |
| 6 | Evaluation | `evaluate_ensemble.py` — Temp scaling, TTA, calibration | +1–2 pts |
| 7 | Deployment | `convert_to_tflite.py` — TFLite & auto-push | — |

## 1. Setup Environment

In [1]:
from google.colab import userdata
import os

# 1. GitHub Authentication
GITHUB_REPO = "Krishna200608/NeuroFetal-AI"

try:
    GITHUB_TOKEN = userdata.get('GITHUB_TOKEN')
    print("✓ GitHub Token loaded from Secrets.")
except Exception:
    print("⚠️ Error loading GITHUB_TOKEN from Secrets. Falling back to manual input.")
    from getpass import getpass
    GITHUB_TOKEN = getpass("Enter GitHub Personal Access Token (PAT): ")

os.environ['GITHUB_TOKEN'] = GITHUB_TOKEN
os.environ['GITHUB_REPO'] = GITHUB_REPO

✓ GitHub Token loaded from Secrets.


In [2]:
# 2. Clone Repository
import shutil
import os

# Reset to /content before deleting the repo folder
try:
    os.chdir("/content")
except OSError:
    # /content is unavailable (e.g. not running on Colab); stay where we are
    pass

# Clean up any previous clone
if os.path.exists("/content/NeuroFetal-AI"):
    shutil.rmtree("/content/NeuroFetal-AI")

print("Cloning repository...")
!git clone https://{GITHUB_TOKEN}@github.com/{GITHUB_REPO}.git

os.chdir("/content/NeuroFetal-AI")
print("✓ Cloned successfully!")

Cloning repository...
Cloning into 'NeuroFetal-AI'...
remote: Enumerating objects: 2112, done.
remote: Counting objects: 100% (120/120), done.
remote: Compressing objects: 100% (36/36), done.
remote: Total 2112 (delta 97), reused 85 (delta 84), pack-reused 1992 (from 3)
Receiving objects: 100% (2112/2112), 645.05 MiB | 16.80 MiB/s, done.
Resolving deltas: 100% (1204/1204), done.
Updating files: 100% (1211/1211), done.
✓ Cloned successfully!


### 1.5 Git Credentials

In [3]:
!git config --global user.email "krishnasikheriya001@gmail.com"
!git config --global user.name "Krishna200608"
print("✓ Git credentials set.")

✓ Git credentials set.


### 1.6 Install Dependencies
Installs all packages required for the full SOTA pipeline (including XGBoost/LightGBM for ensemble).

In [4]:
print("Installing libraries...")
!pip install -q wfdb shap scipy imbalanced-learn pyngrok filterpy \
    scikit-learn matplotlib seaborn pandas numpy tensorflow \
    streamlit plotly python-dotenv xgboost lightgbm
print("✓ Dependencies installed.")

Installing libraries...
  Preparing metadata (setup.py) ... done
  Building wheel for filterpy (setup.py) ... done
ERROR:

---
## 2. Data Ingestion (Phase 1–2)

Processes raw `.dat`/`.hea` files into clean `.npy` arrays.

**SOTA enhancements:**
- 18 tabular features (13 signal-derived: STV, LTV, accels/decels, baseline, variability…)
- FHR normalization excluding 0-gaps
- pH threshold relaxed to 7.15 (FIGO)
- Signal quality filter (skip >50% loss)
- Feature standardization (Z-score) with saved scaler
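
The quality filter, gap-aware normalization, and pH labelling listed above can be illustrated with a minimal sketch. This is not the code in `data_ingestion.py`: the function names are hypothetical, and it assumes that a sample value of 0 marks a dropout, that normalization is a Z-score over the valid samples, and that a pH below the threshold means "compromised".

```python
from statistics import mean, pstdev

PH_THRESHOLD = 7.15      # value reported by the notebook
MAX_SIGNAL_LOSS = 0.50   # notebook: skip >50% loss


def signal_loss_fraction(fhr):
    """Fraction of samples recorded as 0 (assumed here to be dropouts)."""
    return sum(1 for v in fhr if v == 0) / len(fhr)


def normalize_excluding_gaps(fhr):
    """Z-score the valid samples only; gap samples stay at 0."""
    valid = [v for v in fhr if v != 0]
    if not valid:
        return [0.0] * len(fhr)
    mu = mean(valid)
    sigma = pstdev(valid) or 1.0  # guard against a perfectly flat signal
    return [(v - mu) / sigma if v != 0 else 0.0 for v in fhr]


def label_from_ph(ph):
    """Assumed rule: 1 (compromised) when pH is below the threshold."""
    return 1 if ph < PH_THRESHOLD else 0


def prepare_window(fhr, ph):
    """Return (normalized_fhr, label), or None if the window is too lossy."""
    if signal_loss_fraction(fhr) > MAX_SIGNAL_LOSS:
        return None
    return normalize_excluding_gaps(fhr), label_from_ph(ph)
```

Computing the mean and standard deviation from valid samples only keeps long dropouts from dragging the baseline toward zero, which is the point of excluding 0-gaps.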

In [5]:
!python Code/scripts/data_ingestion.py

Found 552 records.
pH threshold: 7.15
Max signal loss: 50%
Processed 100 records...
Processed 200 records...
Processed 300 records...
Processed 400 records...
Processed 500 records...

Processing complete.
  Patients: 552
  Total windows: 2546
  Skipped (quality): 214
  Shapes: X_fhr=(2546, 1200), X_uc=(2546, 1200), X_tabular=(2546, 18), y=(2546,)
  Tabular features (18): ['Age', 'Parity', 'Gestation', 'Gravidity', 'Weight', 'fhr_baseline', 'fhr_stv', 'fhr_ltv', 'fhr_accel_count', 'fhr_decel_count', 'fhr_decel_area', 'fhr_range', 'fhr_iqr', 'fhr_entropy', 'uc_freq', 'uc_intensity_mean', 'fhr_uc_lag', 'signal_loss_pct']
  Class balance: 470.0 compromised / 2546 total (18.5%)


---
## 3. Self-Supervised Pretraining

Train the Masked Autoencoder (MAE) on unlabelled FHR data to learn robust temporal representations.

Saves encoder weights → `Code/models/pretrained_fhr_encoder.weights.keras`
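
The idea behind masked pretraining can be sketched without a deep-learning framework: hide random patches of an FHR window, then score a reconstruction only on the hidden samples. The patch length, mask ratio, and function names below are hypothetical and are not taken from `pretrain.py`.

```python
import random


def mask_patches(signal, patch_len=50, mask_ratio=0.5, seed=0):
    """Zero out a random subset of non-overlapping patches.

    Returns the masked copy and a 0/1 mask (1 = hidden sample).
    """
    rng = random.Random(seed)
    n_patches = len(signal) // patch_len
    n_masked = int(n_patches * mask_ratio)
    hidden = set(rng.sample(range(n_patches), n_masked))
    masked, mask = list(signal), [0] * len(signal)
    for p in hidden:
        for i in range(p * patch_len, (p + 1) * patch_len):
            masked[i] = 0.0
            mask[i] = 1
    return masked, mask


def masked_mse(target, reconstruction, mask):
    """Mean squared error computed over hidden samples only."""
    n_hidden = sum(mask)
    if n_hidden == 0:
        return 0.0
    squared = (m * (t - r) ** 2 for t, r, m in zip(target, reconstruction, mask))
    return sum(squared) / n_hidden
```

In a real autoencoder the `reconstruction` would come from the decoder. Restricting the loss to hidden samples forces the encoder to infer missing context instead of copying its input.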

In [7]:
!python Code/scripts/pretrain.py

2026-02-12 17:45:18.685272: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1770918318.706202    3003 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1770918318.713749    3003 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1770918318.730595    3003 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1770918318.730621    3003 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1770918318.730664    3003 computation_placer.cc:177] computation placer alr

---
## 4. Primary Model Training (Phase 3–4)

Train the **AttentionFusionResNet** using 5-Fold Cross-Validation.

**SOTA enhancements:**
- 200 epochs with cosine annealing + warmup
- Focal Loss (α=0.65, γ=2.0) — less aggressive for better calibration
- 4x data augmentation (SpecAugment + CutMix + time-warp + jitter + mixup)
- AdamW with weight decay 5e-4
- Backbone right-sized to 192-dim with stochastic depth
- Auxiliary pH regression head for multi-task learning
- Early stopping patience = 40
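
For reference, the focal loss with the α and γ values listed above can be written out directly. This is a sketch of the standard alpha-balanced binary form; the exact variant used in `train.py` is not shown in this notebook, so treat the function names and the clipping constant as assumptions.

```python
import math


def binary_focal_loss(y_true, p_pred, alpha=0.65, gamma=2.0, eps=1e-7):
    """Alpha-balanced focal loss for one binary prediction.

    y_true is 0 or 1; p_pred is the predicted probability of class 1.
    """
    p = min(max(p_pred, eps), 1.0 - eps)  # keep log() finite
    if y_true == 1:
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)


def mean_focal_loss(labels, probs, alpha=0.65, gamma=2.0):
    """Average focal loss over a batch of labels and probabilities."""
    losses = (binary_focal_loss(t, p, alpha, gamma) for t, p in zip(labels, probs))
    return sum(losses) / len(labels)
```

With γ = 2, a confidently correct positive at p = 0.9 has its cross-entropy term scaled by (1 - 0.9)² = 0.01, so training effort shifts toward hard examples, while α = 0.65 gives the minority (positive) class the larger share of the weight.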

In [None]:
!git pull origin main

In [8]:
# Full 5-fold training — approx 2-3 hours on T4 GPU
!python Code/scripts/train.py

2026-02-12 17:48:40.590413: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1770918520.610998    4361 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1770918520.618996    4361 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1770918520.634991    4361 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1770918520.635013    4361 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1770918520.635016    4361 computation_placer.cc:177] computation placer alr

In [10]:
# Auto-push trained models to GitHub
import os

for fold in range(1, 6):
    model_path = f"Code/models/enhanced_model_fold_{fold}.keras"
    if os.path.exists(model_path):
        print(f"Pushing model for Fold {fold}...")
        !git add {model_path}
        !git commit -m "Auto-save: Trained SOTA model Fold {fold}"
        !git push origin main
        print(f"✓ Fold {fold} pushed.")
    else:
        print(f"⚠️ Not found: {model_path}")

Pushing model for Fold 1...
[main 6043d7e] Auto-save: Trained SOTA model Fold 1
 1 file changed, 0 insertions(+), 0 deletions(-)
Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Delta compression using up to 2 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 23.47 MiB | 8.49 MiB/s, done.
Total 5 (delta 3), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
To https://github.com/Krishna200608/NeuroFetal-AI.git
   b215adf..6043d7e  main -> main
✓ Fold 1 pushed.
Pushing model for Fold 2...
[main 261fd2b] Auto-save: Trained SOTA model Fold 2
 1 file changed, 0 insertions(+), 0 deletions(-)
Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Delta compression using up to 2 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 23.48 MiB | 6.46 MiB/s, done.
Total 5 (delta 3), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (3/3), completed wi

In [15]:
!git pull origin main

From https://github.com/Krishna200608/NeuroFetal-AI
 * branch            main       -> FETCH_HEAD
Already up to date.


---
## 5. Diverse Ensemble Training (Phase 5)

Train three diverse model families and combine with a stacking meta-learner:

1. **AttentionFusionResNet** — primary (already trained above)
2. **1D-InceptionNet** — multi-scale temporal patterns (kernel 5/15/40)
3. **XGBoost / LightGBM** — gradient boosting on tabular + CSP + FHR features

Out-of-fold predictions across 5 folds → Logistic Regression stacking

**Expected additional AUC lift: +3–5 pts**
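
The stacking step can be sketched in plain Python to show how out-of-fold (OOF) predictions are assembled before the meta-learner sees them. Everything here is illustrative: the centroid "base learner" is a toy stand-in for the real models, the gradient-descent logistic regression stands in for the meta-learner, and none of the names come from `train_diverse_ensemble.py`.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def stratified_fold_ids(y, k=5, seed=0):
    """Assign each sample a fold id in [0, k), balancing classes across folds."""
    rng = random.Random(seed)
    fold_ids = [0] * len(y)
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            fold_ids[i] = pos % k
    return fold_ids


def out_of_fold_predictions(X, y, fit, predict, k=5, seed=0):
    """Train on k-1 folds, predict the held-out fold, repeat for every fold."""
    fold_ids = stratified_fold_ids(y, k, seed)
    oof = [None] * len(y)
    for fold in range(k):
        train = [i for i, f in enumerate(fold_ids) if f != fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i, f in enumerate(fold_ids):
            if f == fold:
                oof[i] = predict(model, X[i])
    return oof


def make_centroid_learner(feature):
    """Toy base model: scores one feature relative to the class-mean midpoint."""
    def fit(X, y):
        pos = [x[feature] for x, t in zip(X, y) if t == 1]
        neg = [x[feature] for x, t in zip(X, y) if t == 0]
        return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0

    def predict(midpoint, x):
        return sigmoid(x[feature] - midpoint)

    return fit, predict


def fit_logistic(rows, y, lr=0.5, epochs=300):
    """Batch gradient descent on log-loss; stands in for the meta-learner."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, t in zip(rows, y):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - t
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        b -= lr * grad_b / len(y)
        w = [wi - lr * g / len(y) for wi, g in zip(w, grad_w)]
    return w, b
```

Each base model contributes one column of OOF probabilities, and the meta-learner is fitted on those columns. Because every OOF value comes from a model that never saw that sample, the meta-learner is not rewarded for base-model overfitting.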

In [16]:
!python Code/scripts/train_diverse_ensemble.py

NeuroFetal AI — Diverse Ensemble Training (Phase 5)

Data: FHR=(2546, 1200, 1), Tab=(2546, 18), y=(2546,)
Class balance: 18.5% positive
2026-02-12 19:44:49.692368: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1770925489.713556   49678 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1770925489.720456   49678 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1770925489.736540   49678 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1770925489.736566   49678 computation_placer.cc:177] computation placer already registered. Please check linkage a

In [17]:
# Push ensemble artifacts
import os

ensemble_files = [
    "Code/models/stacking_meta_learner.pkl",
    "Code/models/xgb_model.pkl",
]

# Also push any InceptionNet fold models
for fold in range(1, 6):
    inception_path = f"Code/models/inception_fold_{fold}.keras"
    if os.path.exists(inception_path):
        ensemble_files.append(inception_path)

pushed = []
for f in ensemble_files:
    if os.path.exists(f):
        !git add {f}
        pushed.append(f)

if pushed:
    !git commit -m "Auto-save: Diverse ensemble models (InceptionNet + XGB + meta-learner)"
    !git push origin main
    print(f"✓ Pushed {len(pushed)} ensemble artifacts.")
else:
    print("⚠️ No ensemble files found to push.")

[main 71a7b62] Auto-save: Diverse ensemble models (InceptionNet + XGB + meta-learner)
 1 file changed, 0 insertions(+), 0 deletions(-)
Enumerating objects: 9, done.
Counting objects: 100% (9/9), done.
Delta compression using up to 2 threads
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 515 bytes | 515.00 KiB/s, done.
Total 5 (delta 4), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (4/4), completed with 4 local objects.
To https://github.com/Krishna200608/NeuroFetal-AI.git
   4779678..71a7b62  main -> main
✓ Pushed 1 ensemble artifacts.


---
## 6. Evaluation & Calibration (Phase 6)

**Stacking Ensemble Evaluation** with:
- Temperature scaling (Guo et al., 2017)
- Optimal threshold search (Youden's J / F1 / cost-sensitive)
- Enhanced 3-pass TTA (original + flip + noise)
- AUPRC reporting for imbalanced data
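
Temperature scaling divides the model's logits by a single scalar T fitted on held-out data. The sketch below fits T by a simple grid search over the validation negative log-likelihood. The grid range and function names are assumptions, and `evaluate_ensemble.py` may use a different optimizer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def nll(logits, labels, temperature):
    """Mean negative log-likelihood of binary labels at a given temperature."""
    eps = 1e-12
    total = 0.0
    for z, t in zip(logits, labels):
        p = sigmoid(z / temperature)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1.0 - p + eps)
    return total / len(labels)


def fit_temperature(logits, labels, grid=None):
    """Pick the temperature with the lowest validation NLL via grid search."""
    grid = grid or [0.25 + 0.05 * i for i in range(96)]  # 0.25 .. 5.0
    return min(grid, key=lambda temp: nll(logits, labels, temp))


def calibrate(logit, temperature):
    """Calibrated probability for a single logit."""
    return sigmoid(logit / temperature)
```

A fitted T above 1 softens overconfident probabilities, and T below 1 sharpens them. The ranking of predictions is unchanged, so AUC is unaffected and only calibration improves.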

**Uncertainty Quantification** via MC Dropout.
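
MC Dropout keeps dropout active at inference time and repeats the forward pass, so the spread of the predictions acts as an uncertainty estimate. The toy below applies dropout to the inputs of a small logistic model purely to show the mechanics. The model, its weights, and all names are hypothetical and unrelated to `evaluate_uncertainty.py`; with a Keras model the equivalent is to call the model repeatedly with `training=True`.

```python
import math
import random
from statistics import mean, pstdev


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def stochastic_forward(features, weights, bias, rate, rng):
    """One forward pass of a toy logistic model with dropout left on."""
    dropped = dropout(features, rate, rng)
    z = bias + sum(w * x for w, x in zip(weights, dropped))
    return 1.0 / (1.0 + math.exp(-z))


def mc_dropout_predict(features, weights, bias, rate=0.3, passes=50, seed=0):
    """Return (mean probability, standard deviation) over repeated passes."""
    rng = random.Random(seed)
    samples = [
        stochastic_forward(features, weights, bias, rate, rng)
        for _ in range(passes)
    ]
    return mean(samples), pstdev(samples)
```

The mean serves as the prediction and the standard deviation as a per-sample confidence signal; a large spread flags recordings that may deserve clinical review.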

In [18]:
print("\nRunning Stacking Ensemble Evaluation...")
!python Code/scripts/evaluate_ensemble.py

print("\nRunning Uncertainty Quantification (MC Dropout)...")
!python Code/scripts/evaluate_uncertainty.py


Running Stacking Ensemble Evaluation...
2026-02-12 19:56:11.055900: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1770926171.085410   59865 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1770926171.095424   59865 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1770926171.111810   59865 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1770926171.111836   59865 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1770926171.111840   59865 computat

---
## 7. Launch Dashboard (Optional)

Run the Streamlit dashboard from Colab via **ngrok** tunnel.

> Requires `NGROK_AUTH_TOKEN` in Colab Secrets.

In [None]:
from google.colab import userdata

try:
    auth_token = userdata.get('NGROK_AUTH_TOKEN')
    print("✓ Ngrok Token loaded from Secrets.")
except Exception:
    print("⚠️ Error loading NGROK_AUTH_TOKEN from Secrets. Falling back to manual input.")
    from getpass import getpass
    auth_token = getpass("Enter Ngrok Auth Token manually: ")

if auth_token:
    with open("Code/.env", "w") as f:
        f.write(f"NGROK_AUTH_TOKEN={auth_token}\n")

print("Launching Streamlit App...")
!python Code/run_app.py

✓ Ngrok Token loaded from Secrets.
Launching Streamlit App...
Authenticating with ngrok...
Starting Streamlit Server...
Attempting to open public tunnel...

   DASHBOARD LIVE AT: https://beauteously-uncaped-dario.ngrok-free.dev
   LOCAL ADDRESS:     http://localhost:8501

Press Ctrl+C to stop the server.

🛑 Stopping NeuroFetal AI Dashboard...
   -> Terminating Streamlit process...

t=2026-02-12T20:02:34+0000 lvl=warn msg="Stopping forwarder" name=http-8501-bf8927d8-0823-4deb-858b-13443a08d196 acceptErr="failed to accept connection: Listener closed"
t=2026-02-12T20:02:34+0000 lvl=warn msg="Error restarting forwarder" name=http-8501-bf8927d8-0823-4deb-858b-13443a08d196 err="failed to start tunnel: session closed"


---
## 8. Convert to TFLite & Auto-Push

Convert the best trained model to TFLite format and push to GitHub automatically.

In [19]:
!python Code/scripts/convert_to_tflite.py

2026-02-12 19:59:03.848454: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1770926343.871034   60893 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1770926343.882783   60893 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1770926343.906066   60893 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1770926343.906091   60893 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1770926343.906096   60893 computation_placer.cc:177] computation placer alr

In [21]:
# Push TFLite model
import os

tflite_path = "Code/models/tflite/neurofetal_model_quant_int8.tflite"
if os.path.exists(tflite_path):
    !git add {tflite_path}
    !git commit -m "Auto-save: TFLite model"
    !git push origin main
    print("✓ TFLite model pushed.")
else:
    print("⚠️ TFLite model not found.")

[main 5e7a852] Auto-save: TFLite model
 1 file changed, 0 insertions(+), 0 deletions(-)
 rewrite Code/models/tflite/neurofetal_model_quant_int8.tflite (75%)
Enumerating objects: 11, done.
Counting objects: 100% (11/11), done.
Delta compression using up to 2 threads
Compressing objects: 100% (6/6), done.
Writing objects: 100% (6/6), 1.64 MiB | 4.14 MiB/s, done.
Total 6 (delta 4), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (4/4), completed with 4 local objects.
To https://github.com/Krishna200608/NeuroFetal-AI.git
   71a7b62..5e7a852  main -> main
✓ TFLite model pushed.


---
## ✅ Pipeline Complete

All 6 SOTA phases have been executed. Check the evaluation output above for final AUC and calibration metrics.