# NeuroFetal AI — V4.0 TimeGAN Synthetic CTG Generation

**Branch:** `feat/v4.0-timegan`

**Objective:** Train a 1D Convolutional GAN on the minority-class (Pathological) FHR+UC signals to generate realistic synthetic fetal distress traces, replacing SMOTE for class balancing.

### Pipeline
| # | Phase | Description |
|---|-------|-------------|
| 1 | Setup | Clone repo, install deps, authenticate |
| 2 | Data Prep | Load `.npy`, isolate pathological traces, stack FHR+UC |
| 3 | Architecture | Build 1D Conv Generator + Critic |
| 4 | Training | Custom `tf.GradientTape` loop with WGAN-GP |
| 5 | Visualization | Compare real vs synthetic traces |
| 6 | Export | Save synthetic `.npy` files |
| 7 | Push | Commit results back to GitHub |

---
## 1. Setup Environment

In [None]:
from google.colab import userdata
import os

GITHUB_REPO = "Krishna200608/NeuroFetal-AI"

try:
    GITHUB_TOKEN = userdata.get('GITHUB_TOKEN')
    print("✓ GitHub Token loaded from Secrets.")
except Exception:
    print("⚠️ Falling back to manual input.")
    from getpass import getpass
    GITHUB_TOKEN = getpass("Enter GitHub PAT: ")

os.environ['GITHUB_TOKEN'] = GITHUB_TOKEN
os.environ['GITHUB_REPO'] = GITHUB_REPO

In [None]:
import shutil, os

try:
    os.chdir("/content")
except OSError:
    pass

if os.path.exists("/content/NeuroFetal-AI"):
    shutil.rmtree("/content/NeuroFetal-AI")

print("Cloning repository...")
!git clone https://{GITHUB_TOKEN}@github.com/{GITHUB_REPO}.git
os.chdir("/content/NeuroFetal-AI")

!git config --global user.email "krishnasikheriya001@gmail.com"
!git config --global user.name "Krishna200608"

!git fetch origin
!git checkout feat/v4.0-timegan
print("✓ Cloned and checked out feat/v4.0-timegan!")

In [None]:
print("Installing dependencies...")
!pip install -q wfdb scipy imbalanced-learn scikit-learn matplotlib seaborn pandas numpy tensorflow
print("✓ Dependencies installed.")

---
## 2. Data Ingestion & Pathological Isolation

Run the existing data pipeline, then isolate **only** the minority class (`y == 1`, compromised, pH < 7.15). FHR and UC are stacked into a single `(N, 1200, 2)` tensor to preserve their physiological cross-correlation (for example, late decelerations that follow contractions).
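
As a minimal sketch of the stacking step, assuming each signal is stored as `(N, 1200, 1)`: the arrays below are random placeholders (4 traces), not real CTG data.

```python
import numpy as np

# Placeholder arrays standing in for the pathological FHR and UC signals.
rng = np.random.default_rng(0)
fhr = rng.normal(size=(4, 1200, 1))
uc = rng.normal(size=(4, 1200, 1))

# Concatenate along the last (channel) axis so each timestep carries both values.
stacked = np.concatenate([fhr, uc], axis=-1)

print(stacked.shape)  # (4, 1200, 2)
```

Channel 0 then holds FHR and channel 1 holds UC, so a single generated sample always contains a matched pair of signals.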

In [None]:
# Run standard data ingestion first
!python Code/scripts/data_ingestion.py

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# ── Load preprocessed data ──
X_fhr = np.load("Datasets/processed/X_fhr.npy")
X_uc  = np.load("Datasets/processed/X_uc.npy")
X_tab = np.load("Datasets/processed/X_tabular.npy")
y     = np.load("Datasets/processed/y.npy")

print(f"Full dataset — FHR: {X_fhr.shape}, UC: {X_uc.shape}, Tab: {X_tab.shape}, y: {y.shape}")
print(f"Class balance: {y.sum():.0f} pathological / {len(y)} total ({y.mean()*100:.2f}%)")

# ── Isolate minority class ──
patho_idx = np.where(y == 1)[0]
X_fhr_patho = X_fhr[patho_idx]  # (N_patho, 1200, 1)
X_uc_patho  = X_uc[patho_idx]   # (N_patho, 1200, 1)

# ── Stack FHR + UC into 2-channel signal ──
# Shape: (N_patho, 1200, 2) — Channel 0 = FHR, Channel 1 = UC
X_patho_stacked = np.concatenate([X_fhr_patho, X_uc_patho], axis=-1)

print(f"\nPathological samples isolated: {X_patho_stacked.shape[0]}")
print(f"Stacked signal shape: {X_patho_stacked.shape}  (timesteps=1200, channels=2)")

# ── Quick visualization of a real pathological trace ──
fig, axes = plt.subplots(2, 1, figsize=(14, 5), sharex=True)
sample_idx = 0
axes[0].plot(X_patho_stacked[sample_idx, :, 0], color='crimson', linewidth=0.8)
axes[0].set_ylabel('FHR (normalized)')
axes[0].set_title(f'Real Pathological Trace #{sample_idx}')
axes[1].plot(X_patho_stacked[sample_idx, :, 1], color='steelblue', linewidth=0.8)
axes[1].set_ylabel('UC (normalized)')
axes[1].set_xlabel('Timestep (1 Hz, 20 min window)')
plt.tight_layout()
plt.show()

---
## 3. TimeGAN Architecture (1D Convolutional WGAN-GP)

We use a **Wasserstein GAN with Gradient Penalty (WGAN-GP)** built entirely from 1D convolutions. "TimeGAN" is this project's name for the model; it is not the recurrent TimeGAN architecture.

### Design Rationale
- **Why not RNNs?** At sequence length 1200, LSTMs/GRUs are prone to vanishing gradients and train slowly because they process timesteps sequentially.
- **Why WGAN-GP?** Standard GAN training with BCE loss is often unstable. WGAN-GP replaces it with a Wasserstein objective plus a gradient penalty, which tends to converge more stably on small datasets.
- **Why 2-channel?** By generating FHR and UC simultaneously, we preserve their physiological cross-correlation (e.g., late decelerations follow contractions).

### Architecture Summary
| Component | Input | Output | Key Layers |
|-----------|-------|--------|------------|
| **Generator** | `(batch, 128)` noise | `(batch, 1200, 2)` | Dense → Reshape → 4x Conv1DTranspose (upsampling) → Conv1D with tanh |
| **Critic** | `(batch, 1200, 2)` signal | `(batch, 1)` score | 4x Conv1D (downsampling) → Flatten → Dense (linear) |
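
The sequence lengths can be checked by hand before building the models. The sketch below assumes four stride-2 blocks with `padding='same'`, which is what the builder functions use; the helper names are hypothetical and only do the arithmetic, they do not build any layers.

```python
import math

def upsample_lengths(start, n_blocks, stride=2):
    """Lengths after each strided transposed conv with padding='same'."""
    lengths = [start]
    for _ in range(n_blocks):
        lengths.append(lengths[-1] * stride)
    return lengths

def downsample_lengths(start, n_blocks, stride=2):
    """Lengths after each strided conv with padding='same'."""
    lengths = [start]
    for _ in range(n_blocks):
        lengths.append(math.ceil(lengths[-1] / stride))
    return lengths

gen_lengths = upsample_lengths(75, 4)
critic_lengths = downsample_lengths(1200, 4)

print(gen_lengths)     # [75, 150, 300, 600, 1200]
print(critic_lengths)  # [1200, 600, 300, 150, 75]
```

Starting the generator at 75 timesteps works because 75 × 2⁴ = 1200, so no cropping or padding is needed to reach the target length.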

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# ═══════════════════════════════════════════════════════════════════
# Hyperparameters
# ═══════════════════════════════════════════════════════════════════
NOISE_DIM      = 128       # Latent space dimension
SEQ_LEN        = 1200      # 20 min at 1 Hz
N_CHANNELS     = 2         # FHR + UC
BATCH_SIZE     = 16        # Small for Colab T4 VRAM
GP_WEIGHT      = 10.0      # Gradient penalty coefficient
N_CRITIC       = 5         # Critic updates per generator update
EPOCHS         = 500       # Total training epochs
LR_G           = 1e-4      # Generator learning rate
LR_D           = 1e-4      # Critic learning rate

# ═══════════════════════════════════════════════════════════════════
# GENERATOR: Noise → (1200, 2) synthetic CTG signal
# Strategy: Dense → Reshape to (75, 256) → 4 Conv1DTranspose blocks
# to progressively upsample: 75 → 150 → 300 → 600 → 1200
# ═══════════════════════════════════════════════════════════════════
def build_generator():
    model = keras.Sequential(name="Generator")

    # Project noise to a small temporal tensor
    model.add(layers.Dense(75 * 256, input_dim=NOISE_DIM))
    model.add(layers.Reshape((75, 256)))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(0.2))

    # Block 1: 75 → 150
    model.add(layers.Conv1DTranspose(256, kernel_size=5, strides=2, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(0.2))

    # Block 2: 150 → 300
    model.add(layers.Conv1DTranspose(128, kernel_size=5, strides=2, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(0.2))

    # Block 3: 300 → 600
    model.add(layers.Conv1DTranspose(64, kernel_size=5, strides=2, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(0.2))

    # Block 4: 600 → 1200
    model.add(layers.Conv1DTranspose(32, kernel_size=5, strides=2, padding='same'))
    model.add(layers.BatchNormalization())
    model.add(layers.LeakyReLU(0.2))

    # Output: (1200, 2) — tanh for normalized signals
    model.add(layers.Conv1D(N_CHANNELS, kernel_size=7, padding='same', activation='tanh'))

    return model

# ═══════════════════════════════════════════════════════════════════
# CRITIC (Discriminator): (1200, 2) signal → scalar Wasserstein score
# Strategy: 4 Conv1D blocks to downsample: 1200 → 600 → 300 → 150 → 75
# ═══════════════════════════════════════════════════════════════════
def build_critic():
    model = keras.Sequential(name="Critic")

    # Block 1: 1200 → 600
    model.add(layers.Conv1D(32, kernel_size=5, strides=2, padding='same', input_shape=(SEQ_LEN, N_CHANNELS)))
    model.add(layers.LeakyReLU(0.2))
    model.add(layers.Dropout(0.25))

    # Block 2: 600 → 300
    model.add(layers.Conv1D(64, kernel_size=5, strides=2, padding='same'))
    model.add(layers.LeakyReLU(0.2))
    model.add(layers.Dropout(0.25))

    # Block 3: 300 → 150
    model.add(layers.Conv1D(128, kernel_size=5, strides=2, padding='same'))
    model.add(layers.LeakyReLU(0.2))
    model.add(layers.Dropout(0.25))

    # Block 4: 150 → 75
    model.add(layers.Conv1D(256, kernel_size=5, strides=2, padding='same'))
    model.add(layers.LeakyReLU(0.2))
    model.add(layers.Dropout(0.25))

    # Flatten → score (no sigmoid for WGAN)
    model.add(layers.Flatten())
    model.add(layers.Dense(1))  # Linear output for Wasserstein distance

    return model

# ── Build and inspect ──
generator = build_generator()
critic    = build_critic()

print("="*60)
generator.summary()
print("\n" + "="*60)
critic.summary()

# Quick sanity check
test_noise = tf.random.normal((1, NOISE_DIM))
test_output = generator(test_noise)
print(f"\n✓ Generator output shape: {test_output.shape}  (expected: (1, 1200, 2))")
test_score = critic(test_output)
print(f"✓ Critic output shape: {test_score.shape}  (expected: (1, 1))")

---
## 4. WGAN-GP Training Loop

Custom training with:
- **Wasserstein loss** (Critic maximizes `E[D(real)] - E[D(fake)]`)
- **Gradient Penalty** for Lipschitz constraint enforcement
- **5:1 Critic-to-Generator update ratio** for stable convergence
- **Visualization** every 50 epochs to track morphological quality
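
To make the loss terms concrete, here is a NumPy-only sketch that swaps the convolutional critic for a toy linear critic `D(x) = sum(w * x)`. That substitution is an assumption made purely for illustration: the gradient of a linear critic with respect to its input is simply `w`, so the penalty can be computed without automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, channels = 8, 1200, 2
gp_weight = 10.0

# Placeholder "real" and "fake" batches in the [-1, 1] range.
real = rng.uniform(-1, 1, size=(batch, seq_len, channels))
fake = rng.uniform(-1, 1, size=(batch, seq_len, channels))

# Toy linear critic (illustrative stand-in for the Conv1D critic).
w = rng.normal(scale=0.01, size=(seq_len, channels))

def critic(x):
    return np.sum(x * w, axis=(1, 2))

# Wasserstein term minimized by the critic: E[D(fake)] - E[D(real)].
w_loss = critic(fake).mean() - critic(real).mean()

# Random points on the line between each real/fake pair.
alpha = rng.uniform(size=(batch, 1, 1))
interpolated = real + alpha * (fake - real)

# For a linear critic the input gradient equals w at every interpolated point.
grads = np.broadcast_to(w, interpolated.shape)
grad_norm = np.sqrt(np.sum(grads ** 2, axis=(1, 2)))
gp = np.mean((grad_norm - 1.0) ** 2)

critic_loss = w_loss + gp_weight * gp
gen_loss = -critic(fake).mean()
print(critic_loss, gen_loss)
```

In the real training loop the gradient comes from `tf.GradientTape`, but the structure is the same: one norm per sample, penalized for deviating from 1.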

In [None]:
import time
from IPython.display import clear_output

# ── Optimizers ──
gen_optimizer    = keras.optimizers.Adam(LR_G, beta_1=0.0, beta_2=0.9)
critic_optimizer = keras.optimizers.Adam(LR_D, beta_1=0.0, beta_2=0.9)

# ── Prepare dataset ──
# Normalize data to [-1, 1] range for tanh generator output
data_min = X_patho_stacked.min(axis=(0, 1), keepdims=True)
data_max = X_patho_stacked.max(axis=(0, 1), keepdims=True)
data_range = data_max - data_min
data_range[data_range == 0] = 1.0  # Avoid division by zero

X_normalized = 2.0 * (X_patho_stacked - data_min) / data_range - 1.0
X_normalized = X_normalized.astype(np.float32)

dataset = tf.data.Dataset.from_tensor_slices(X_normalized)
dataset = dataset.shuffle(buffer_size=len(X_normalized)).batch(BATCH_SIZE, drop_remainder=True)

print(f"Training data: {X_normalized.shape} normalized to [-1, 1]")
print(f"Batches per epoch: {len(X_normalized) // BATCH_SIZE}")

# ═══════════════════════════════════════════════════════════════════
# Gradient Penalty
# ═══════════════════════════════════════════════════════════════════
@tf.function
def gradient_penalty(real_samples, fake_samples):
    """Computes gradient penalty for WGAN-GP."""
    batch_size = tf.shape(real_samples)[0]
    alpha = tf.random.uniform([batch_size, 1, 1], 0.0, 1.0)
    interpolated = real_samples + alpha * (fake_samples - real_samples)

    with tf.GradientTape() as gp_tape:
        gp_tape.watch(interpolated)
        pred = critic(interpolated, training=True)

    grads = gp_tape.gradient(pred, interpolated)
    norm = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2]) + 1e-8)
    gp = tf.reduce_mean((norm - 1.0) ** 2)
    return gp

# ═══════════════════════════════════════════════════════════════════
# Single Training Step
# ═══════════════════════════════════════════════════════════════════
@tf.function
def train_critic_step(real_batch):
    """One critic update step."""
    noise = tf.random.normal((tf.shape(real_batch)[0], NOISE_DIM))

    with tf.GradientTape() as tape:
        fake_batch = generator(noise, training=True)
        real_score  = critic(real_batch, training=True)
        fake_score  = critic(fake_batch, training=True)

        # Wasserstein loss: maximize E[D(real)] - E[D(fake)]
        # Critic minimizes: E[D(fake)] - E[D(real)] + GP
        w_loss = tf.reduce_mean(fake_score) - tf.reduce_mean(real_score)
        gp = gradient_penalty(real_batch, fake_batch)
        critic_loss = w_loss + GP_WEIGHT * gp

    grads = tape.gradient(critic_loss, critic.trainable_variables)
    critic_optimizer.apply_gradients(zip(grads, critic.trainable_variables))
    return critic_loss, w_loss

@tf.function
def train_generator_step():
    """One generator update step."""
    noise = tf.random.normal((BATCH_SIZE, NOISE_DIM))

    with tf.GradientTape() as tape:
        fake_batch = generator(noise, training=True)
        fake_score = critic(fake_batch, training=True)
        # Generator wants critic to score fakes highly
        gen_loss = -tf.reduce_mean(fake_score)

    grads = tape.gradient(gen_loss, generator.trainable_variables)
    gen_optimizer.apply_gradients(zip(grads, generator.trainable_variables))
    return gen_loss

# ═══════════════════════════════════════════════════════════════════
# Visualization Helper
# ═══════════════════════════════════════════════════════════════════
def visualize_comparison(epoch, real_data, n_samples=3):
    """Plot real vs generated traces side by side."""
    noise = tf.random.normal((n_samples, NOISE_DIM))
    generated = generator(noise, training=False).numpy()

    fig, axes = plt.subplots(n_samples, 2, figsize=(16, 3 * n_samples))
    fig.suptitle(f'Epoch {epoch}: Real (Left) vs Generated (Right)', fontsize=14, fontweight='bold')

    for i in range(n_samples):
        # Pick a random real sample
        real_idx = np.random.randint(0, len(real_data))

        # Real trace
        axes[i, 0].plot(real_data[real_idx, :, 0], color='crimson', linewidth=0.7, label='FHR')
        axes[i, 0].plot(real_data[real_idx, :, 1], color='steelblue', linewidth=0.7, alpha=0.7, label='UC')
        axes[i, 0].set_ylabel(f'Sample {i+1}')
        if i == 0:
            axes[i, 0].set_title('REAL Pathological')
            axes[i, 0].legend(loc='upper right', fontsize=8)

        # Generated trace
        axes[i, 1].plot(generated[i, :, 0], color='crimson', linewidth=0.7, label='FHR')
        axes[i, 1].plot(generated[i, :, 1], color='steelblue', linewidth=0.7, alpha=0.7, label='UC')
        if i == 0:
            axes[i, 1].set_title('GENERATED Synthetic')
            axes[i, 1].legend(loc='upper right', fontsize=8)

    axes[-1, 0].set_xlabel('Timestep (1 Hz)')
    axes[-1, 1].set_xlabel('Timestep (1 Hz)')
    plt.tight_layout()
    plt.savefig(f'Code/models/gan_comparison_epoch_{epoch}.png', dpi=100, bbox_inches='tight')
    plt.show()

# ═══════════════════════════════════════════════════════════════════
# Main Training Loop
# ═══════════════════════════════════════════════════════════════════
print(f"\n{'='*60}")
print(f"Starting WGAN-GP Training — {EPOCHS} epochs")
print(f"Generator LR: {LR_G}, Critic LR: {LR_D}")
print(f"Critic updates per G update: {N_CRITIC}")
print(f"Batch size: {BATCH_SIZE}")
print(f"{'='*60}\n")

history = {'d_loss': [], 'g_loss': [], 'w_dist': []}
start_time = time.time()

for epoch in range(1, EPOCHS + 1):
    epoch_d_loss = []
    epoch_g_loss = []
    epoch_w_dist = []

    for step, real_batch in enumerate(dataset):
        # ── One critic update per batch ──
        d_loss, w_dist = train_critic_step(real_batch)
        epoch_d_loss.append(float(d_loss))
        epoch_w_dist.append(float(w_dist))

        # ── One generator update on every N_CRITIC-th batch (steps 0, N_CRITIC, ...) ──
        # The step counter restarts each epoch, so with fewer than N_CRITIC batches
        # per epoch the generator is updated exactly once per epoch.
        if step % N_CRITIC == 0:
            g_loss = train_generator_step()
            epoch_g_loss.append(float(g_loss))

    # Record epoch averages
    avg_d = np.mean(epoch_d_loss)
    avg_g = np.mean(epoch_g_loss) if epoch_g_loss else 0
    avg_w = np.mean(epoch_w_dist)
    history['d_loss'].append(avg_d)
    history['g_loss'].append(avg_g)
    history['w_dist'].append(avg_w)

    # Print progress
    if epoch % 10 == 0 or epoch == 1:
        elapsed = time.time() - start_time
        print(f"Epoch {epoch:>4d}/{EPOCHS} | D Loss: {avg_d:>8.4f} | G Loss: {avg_g:>8.4f} | W Dist: {avg_w:>8.4f} | Time: {elapsed:.0f}s")

    # Visualize every 50 epochs
    if epoch % 50 == 0:
        visualize_comparison(epoch, X_normalized, n_samples=3)

total_time = time.time() - start_time
print(f"\n✓ Training complete in {total_time/60:.1f} minutes!")

---
## 5. Training Diagnostics

Plot the loss curves to verify stable convergence. For WGAN-GP:
- The logged **W Dist** value is `E[D(fake)] - E[D(real)]`, the negative of the critic's Wasserstein estimate. It is typically negative, and its magnitude should shrink toward zero as the generator improves.
- Loss curves should **not** diverge wildly.
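
Per-epoch curves can be noisy on a small dataset, so a trend is often easier to judge after smoothing. The sketch below is one possible helper (the name `moving_average` and the window size are assumptions), applied to made-up values rather than real training output.

```python
import numpy as np

def moving_average(values, window=5):
    """Sliding-window mean; returns len(values) - window + 1 points."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")

# Made-up per-epoch values, not real training output.
logged = [-4.0, -3.5, -3.8, -2.9, -2.5, -2.7, -1.9, -1.6, -1.8, -1.2]
smoothed = moving_average(logged, window=5)

print(len(smoothed))  # 6
```

The same helper could be applied to any entry of `history` before plotting.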

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(18, 4))

axes[0].plot(history['d_loss'], color='orange', linewidth=0.8)
axes[0].set_title('Critic Loss')
axes[0].set_xlabel('Epoch')
axes[0].grid(alpha=0.3)

axes[1].plot(history['g_loss'], color='green', linewidth=0.8)
axes[1].set_title('Generator Loss')
axes[1].set_xlabel('Epoch')
axes[1].grid(alpha=0.3)

axes[2].plot(history['w_dist'], color='purple', linewidth=0.8)
axes[2].set_title('Wasserstein Distance')
axes[2].set_xlabel('Epoch')
axes[2].grid(alpha=0.3)

plt.suptitle('WGAN-GP Training Diagnostics', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.savefig('Code/models/gan_training_diagnostics.png', dpi=150, bbox_inches='tight')
plt.show()

---
## 6. Generate & Export Synthetic Data

Generate **3x** the original pathological count as new synthetic samples. These will be used to replace SMOTE in the V4.0 training pipeline.

We also perform a statistical sanity check: compare mean/std of real vs synthetic per channel.
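
Mean and standard deviation are computed per channel, so they say nothing about whether FHR and UC move together. One additional check, sketched here as an assumption rather than part of the existing pipeline, is the average zero-lag Pearson correlation between the two channels. The helper name and the placeholder data are hypothetical, and a zero-lag measure will not capture delayed effects such as late decelerations.

```python
import numpy as np

def mean_channel_correlation(x):
    """Mean Pearson correlation between channel 0 and channel 1 across traces.

    x has shape (N, T, 2). Traces with a constant channel are skipped because
    their correlation is undefined.
    """
    corrs = []
    for trace in x:
        a, b = trace[:, 0], trace[:, 1]
        if a.std() == 0 or b.std() == 0:
            continue
        corrs.append(np.corrcoef(a, b)[0, 1])
    return float(np.mean(corrs)) if corrs else float("nan")

# Placeholder data: one set with coupled channels, one with independent channels.
rng = np.random.default_rng(0)
base = rng.normal(size=(6, 1200))
coupled = np.stack([base, base + 0.1 * rng.normal(size=(6, 1200))], axis=-1)
independent = rng.normal(size=(6, 1200, 2))

print(mean_channel_correlation(coupled))
print(mean_channel_correlation(independent))
```

Comparing this value between real and synthetic traces gives a rough indication of whether the generator has kept the two channels coupled.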

In [None]:
# ── Configuration ──
N_SYNTHETIC = len(X_fhr_patho) * 3  # 3x oversampling
print(f"Generating {N_SYNTHETIC} synthetic pathological traces...")

# ── Batch generation to avoid OOM ──
synthetic_batches = []
gen_batch_size = 64
for i in range(0, N_SYNTHETIC, gen_batch_size):
    batch_n = min(gen_batch_size, N_SYNTHETIC - i)
    noise = tf.random.normal((batch_n, NOISE_DIM))
    synthetic_batch = generator(noise, training=False).numpy()
    synthetic_batches.append(synthetic_batch)

X_synthetic_normalized = np.concatenate(synthetic_batches, axis=0)  # (N_SYNTHETIC, 1200, 2)

# ── De-normalize back to original scale ──
X_synthetic = (X_synthetic_normalized + 1.0) / 2.0 * data_range + data_min

# ── Split back into FHR and UC ──
X_fhr_synthetic = X_synthetic[:, :, 0:1]  # (N, 1200, 1)
X_uc_synthetic  = X_synthetic[:, :, 1:2]   # (N, 1200, 1)

print(f"\nSynthetic FHR shape: {X_fhr_synthetic.shape}")
print(f"Synthetic UC  shape: {X_uc_synthetic.shape}")

# ── Statistical Validation ──
print(f"\n{'='*50}")
print(f"Statistical Comparison: Real vs Synthetic")
print(f"{'='*50}")
for ch, name in enumerate(['FHR', 'UC']):
    real_mean = X_patho_stacked[:, :, ch].mean()
    real_std  = X_patho_stacked[:, :, ch].std()
    syn_mean  = X_synthetic[:, :, ch].mean()
    syn_std   = X_synthetic[:, :, ch].std()
    print(f"  {name} — Real: mean={real_mean:.4f} std={real_std:.4f} | Synth: mean={syn_mean:.4f} std={syn_std:.4f}")

# ── Save synthetic arrays ──
os.makedirs("Datasets/synthetic", exist_ok=True)
np.save("Datasets/synthetic/X_fhr_synthetic.npy", X_fhr_synthetic)
np.save("Datasets/synthetic/X_uc_synthetic.npy", X_uc_synthetic)

# ── Save the full generator model (architecture + weights) ──
generator.save("Code/models/generator_v4.keras")

# ── Save normalization params for reproducibility ──
np.save("Datasets/synthetic/data_min.npy", data_min)
np.save("Datasets/synthetic/data_max.npy", data_max)

print(f"\n✓ Saved synthetic data to Datasets/synthetic/")
print(f"✓ Saved generator model to Code/models/generator_v4.keras")

---
## 7. Final Visual Comparison (Publication Quality)

Generate a clean figure comparing 5 real and 5 synthetic traces — suitable for your mid-semester presentation or paper.

In [None]:
n_show = 5
fig, axes = plt.subplots(n_show, 2, figsize=(16, 2.5 * n_show))
fig.suptitle('TimeGAN Output: Real vs Synthetic Pathological CTG Traces',
             fontsize=15, fontweight='bold', y=1.01)

for i in range(n_show):
    real_idx = np.random.randint(0, len(X_fhr_patho))
    syn_idx  = np.random.randint(0, len(X_fhr_synthetic))

    # Real
    axes[i, 0].plot(X_fhr_patho[real_idx, :, 0], color='crimson', linewidth=0.6, label='FHR')
    ax2 = axes[i, 0].twinx()
    ax2.plot(X_uc_patho[real_idx, :, 0], color='steelblue', linewidth=0.6, alpha=0.6, label='UC')
    ax2.set_ylabel('UC', color='steelblue', fontsize=8)
    axes[i, 0].set_ylabel('FHR', color='crimson', fontsize=8)

    # Synthetic
    axes[i, 1].plot(X_fhr_synthetic[syn_idx, :, 0], color='crimson', linewidth=0.6, label='FHR')
    ax2s = axes[i, 1].twinx()
    ax2s.plot(X_uc_synthetic[syn_idx, :, 0], color='steelblue', linewidth=0.6, alpha=0.6, label='UC')
    ax2s.set_ylabel('UC', color='steelblue', fontsize=8)
    axes[i, 1].set_ylabel('FHR', color='crimson', fontsize=8)

axes[0, 0].set_title('REAL Pathological', fontsize=12, fontweight='bold')
axes[0, 1].set_title('SYNTHETIC (TimeGAN)', fontsize=12, fontweight='bold')
axes[-1, 0].set_xlabel('Timestep (1 Hz, 20 min)')
axes[-1, 1].set_xlabel('Timestep (1 Hz, 20 min)')

plt.tight_layout()
plt.savefig('Code/models/timegan_final_comparison.png', dpi=200, bbox_inches='tight')
plt.show()
print("✓ Publication-quality comparison saved to Code/models/timegan_final_comparison.png")

---
## 8. Integration Stub: V4.0 Ensemble Training

The following code shows exactly how to inject the synthetic data into the existing training pipeline. This replaces SMOTE in the K-Fold loop inside `train_diverse_ensemble.py`.
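
The number of synthetic traces needed per fold follows from that fold's class counts. A minimal sketch of the arithmetic, using made-up labels (90 negatives, 10 positives):

```python
import numpy as np

# Made-up training-fold labels, not real data.
y_fold = np.array([0] * 90 + [1] * 10)

n_positive = int(y_fold.sum())
n_negative = len(y_fold) - n_positive
n_needed = max(n_negative - n_positive, 0)  # never negative

# Appending n_needed positive labels balances the fold.
y_balanced = np.concatenate([y_fold, np.ones(n_needed, dtype=y_fold.dtype)])

print(n_needed, y_balanced.mean())  # 80 0.5
```

Only the training fold is augmented; the validation fold keeps its original class balance.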

In [None]:
# ═══════════════════════════════════════════════════════════════════
# Integration Example: Replace SMOTE with TimeGAN Synthetic Data
# Paste this logic into the K-Fold loop of train_diverse_ensemble.py
# ═══════════════════════════════════════════════════════════════════

def load_and_augment_with_timegan(X_fhr_train, X_uc_train, X_tab_train, y_train):
    """
    Replaces SMOTE by injecting TimeGAN-generated synthetic pathological
    traces into the training fold.

    Returns augmented arrays with balanced classes.
    """
    # Load synthetic data
    X_fhr_syn = np.load("Datasets/synthetic/X_fhr_synthetic.npy")
    X_uc_syn  = np.load("Datasets/synthetic/X_uc_synthetic.npy")

    # Number of synthetic positives needed to match the negative count
    n_positive = int(y_train.sum())
    n_negative = len(y_train) - n_positive
    n_needed   = max(n_negative - n_positive, 0)  # Guard against an already-balanced fold

    # Sample from the synthetic pool, with replacement only if the pool is too small
    replace = n_needed > len(X_fhr_syn)
    syn_indices = np.random.choice(len(X_fhr_syn), size=n_needed, replace=replace)

    # Create synthetic tabular features by sampling from existing pathological
    patho_tab_idx = np.where(y_train == 1)[0]
    syn_tab_indices = np.random.choice(patho_tab_idx, size=len(syn_indices), replace=True)
    X_tab_syn = X_tab_train[syn_tab_indices]

    # Concatenate
    X_fhr_aug = np.concatenate([X_fhr_train, X_fhr_syn[syn_indices]], axis=0)
    X_uc_aug  = np.concatenate([X_uc_train, X_uc_syn[syn_indices]], axis=0)
    X_tab_aug = np.concatenate([X_tab_train, X_tab_syn], axis=0)
    y_aug     = np.concatenate([y_train, np.ones(len(syn_indices), dtype=y_train.dtype)], axis=0)

    # Shuffle
    perm = np.random.permutation(len(y_aug))
    X_fhr_aug = X_fhr_aug[perm]
    X_uc_aug  = X_uc_aug[perm]
    X_tab_aug = X_tab_aug[perm]
    y_aug     = y_aug[perm]

    print(f"  TimeGAN Aug: {n_positive} → {int(y_aug.sum())} positives / {len(y_aug)} total ({y_aug.mean()*100:.1f}%)")
    return X_fhr_aug, X_uc_aug, X_tab_aug, y_aug

print("✓ Integration function defined. Ready for V4.0 ensemble training.")
print("  Usage: Replace the SMOTE block in train.py/train_diverse_ensemble.py")
print("  with a call to load_and_augment_with_timegan().")

---
## 9. Push Results to GitHub

In [None]:
!git add Datasets/synthetic/ Code/models/generator_v4.keras Code/models/gan_*.png Code/models/timegan_*.png
!git commit -m "feat(v4.0): TimeGAN synthetic CTG generation — {N_SYNTHETIC} pathological traces"
!git push origin feat/v4.0-timegan
print("\n✓ All results pushed to feat/v4.0-timegan branch!")