# End-to-End DFC-TCN Model (FC Generation + Temporal Convolutional Network)
This notebook implements the **trainable end-to-end pipeline** described in the paper:
1) **Windowed ROI time series** (precomputed)  
2) **FC Generation layer**: CNN-based window feature extraction + bilinear pooling + lower-triangle vectorization (diagonal excluded)  
3) **Temporal model**: Temporal Convolutional Network (TCN) over window-level DFC features  
Unlike preprocessing-only pipelines, this notebook trains the **CNN inside FC Generation jointly**
with the downstream TCN classifier.
## Inputs
- `data/processed/windows_manifest.csv`
- Window tensors saved by the windowing notebook:
  - `data/processed/windows/train/<SITE>/<SUBJECT_ID>.npy`
  - `data/processed/windows/val/<SITE>/<SUBJECT_ID>.npy`
Each `.npy` file stores a tensor of shape `(T, N, L)`:
- `T` = number of non-overlapping windows
- `N = 116` AAL ROIs
- `L = 20` timepoints per window
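
As a minimal sketch of how a tensor with this layout could be produced from a single ROI time series: the windowing notebook itself is not shown here, so the helper name `make_windows` and the choice to drop trailing timepoints that do not fill a whole window are assumptions, not the confirmed preprocessing.

```python
import numpy as np

def make_windows(ts: np.ndarray, win_len: int = 20) -> np.ndarray:
    """Split an ROI time series (N, timepoints) into non-overlapping windows (T, N, L)."""
    n_rois, n_timepoints = ts.shape
    n_windows = n_timepoints // win_len                     # T; incomplete tail dropped (assumption)
    trimmed = ts[:, : n_windows * win_len]                  # (N, T*L)
    windows = trimmed.reshape(n_rois, n_windows, win_len)   # (N, T, L)
    return windows.transpose(1, 0, 2)                       # (T, N, L)

# Toy example: 116 ROIs, 250 timepoints -> 12 windows of length 20
ts = np.random.default_rng(0).standard_normal((116, 250)).astype(np.float32)
windows = make_windows(ts)
print(windows.shape)  # (12, 116, 20)
```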
## Outputs
- Trained model checkpoint (best validation accuracy, weights only):
  - `models/tdnet_T<T_MAX>_best.weights.h5`
- Training history and evaluation outputs:
  - `results/history_tdnet_T<T_MAX>.csv`
  - `results/val_predictions.csv`
  - optional: `results/metrics.json`, plots
## Note on evaluation
The official ADHD-200 test set has `DX="withheld"`, so quantitative evaluation is performed
using the labeled training subjects via a train/validation split.

## 1. Imports and Configuration

In [1]:
import tensorflow as tf
import numpy as np

# Create trivial dataset
simple_data = np.random.rand(10, 5).astype(np.float32)
simple_labels = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1], dtype=np.int32)

ds_test = tf.data.Dataset.from_tensor_slices((simple_data, simple_labels))
ds_test = ds_test.batch(2)

print("Testing basic TF iteration...")
for x, y in ds_test.take(1):
    print("✅ Basic TF works! Shape:", x.shape, y.shape)
    break


Testing basic TF iteration...
✅ Basic TF works! Shape: (2, 5) (2,)


In [4]:
import os
import random
from pathlib import Path
from dataclasses import dataclass
from typing import Dict, Tuple, Optional, List

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers, Model

# ---- Reproducibility ----
SEED = 42
os.environ["PYTHONHASHSEED"] = str(SEED)
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

# ---- Paths ----
PROJECT_ROOT = Path("..").resolve()
DATA_PROCESSED = PROJECT_ROOT / "data" / "processed"
MODELS_DIR = PROJECT_ROOT / "models"
RESULTS_DIR = PROJECT_ROOT / "results"
MODELS_DIR.mkdir(parents=True, exist_ok=True)
RESULTS_DIR.mkdir(parents=True, exist_ok=True)

WINDOWS_MANIFEST = DATA_PROCESSED / "windows_manifest.csv"

# ---- Data constants (paper-faithful) ----
N_ROIS = 116
L_WIN = 20

# ---- Training hyperparams (adjust later during experiments) ----
BATCH_SIZE = 16
EPOCHS = 100
LEARNING_RATE = 1e-3

# ---- Model hyperparams (placeholders; refine when implementing model) ----
DROPOUT = 0.3

# CNN window encoder (example defaults from an earlier draft)
CNN_FILTERS_1 = 4
CNN_FILTERS_2 = 2
CNN_FILTERS_3 = 1
CNN_KERNEL_1 = 3
CNN_KERNEL_2 = 3
CNN_KERNEL_3 = 1

# TCN (placeholder; refine later)
TCN_CHANNELS = 64
TCN_KERNEL_SIZE = 3
TCN_DILATIONS = [1, 2, 4, 8]  # common TCN pattern

print("PROJECT_ROOT:", PROJECT_ROOT)
print("WINDOWS_MANIFEST exists:", WINDOWS_MANIFEST.exists(), "->", WINDOWS_MANIFEST)
print("TensorFlow:", tf.__version__)


PROJECT_ROOT: /Users/mariaborca/Documents/AI_2023-2026/Semestrul_5/KBS/Report_3/adhd-tcn-replication
WINDOWS_MANIFEST exists: True -> /Users/mariaborca/Documents/AI_2023-2026/Semestrul_5/KBS/Report_3/adhd-tcn-replication/data/processed/windows_manifest.csv
TensorFlow: 2.20.0


## 2. Load Window Manifest

In [5]:
df = pd.read_csv(WINDOWS_MANIFEST)
required_cols = {"split", "site", "subject_id", "label", "windows_path", "T", "N", "L"}
missing = required_cols - set(df.columns)
if missing:
    raise ValueError(f"windows_manifest.csv is missing columns: {missing}")

# Basic checks
print("Rows:", len(df))
print("Splits:", df["split"].value_counts().to_dict())
print("Sites :", df["site"].value_counts().to_dict())
print("Labels:", df["label"].value_counts().to_dict())

# Enforce expected ROI/window sizes (paper-faithful)
bad_shape = df[(df["N"] != N_ROIS) | (df["L"] != L_WIN)]
if len(bad_shape) > 0:
    print("[WARN] Found rows with unexpected N or L. Example:")
    display(bad_shape.head())

# Split
df_train = df[df["split"] == "train"].copy()
df_val   = df[df["split"] == "val"].copy()

# Ensure no subject overlap between train/val
overlap = set(df_train["subject_id"]).intersection(set(df_val["subject_id"]))
if overlap:
    raise ValueError(f"Train/val subject_id overlap detected: {len(overlap)} subjects")

# Compute T_max for padding (needed because different sites can have different T)
T_max_train = int(df_train["T"].max())
T_max_val   = int(df_val["T"].max())
T_MAX = max(T_max_train, T_max_val)

print("T_max(train):", T_max_train)
print("T_max(val)  :", T_max_val)
print("T_MAX used  :", T_MAX)

# Show distribution by site + label (useful for report)
print("\nTrain site x label:\n", pd.crosstab(df_train["site"], df_train["label"]))
print("\nVal site x label:\n", pd.crosstab(df_val["site"], df_val["label"]))

# Check that paths exist
def resolve_path(rel_path: str) -> Path:
    p = Path(rel_path)
    return p if p.is_absolute() else (PROJECT_ROOT / p).resolve()

missing_files = []
for rel in df["windows_path"].head(200):  # spot-check first 200 to keep it fast
    if not resolve_path(rel).exists():
        missing_files.append(rel)

if missing_files:
    print("[WARN] Some windows files missing (spot-check):", missing_files[:5])
else:
    print("Windows path spot-check: OK")

Rows: 510
Splits: {'train': 407, 'val': 103}
Sites : {'NYU': 216, 'Peking_1': 85, 'KKI': 83, 'OHSU': 78, 'NeuroIMAGE': 48}
Labels: {0: 285, 1: 225}
T_max(train): 12
T_max(val)  : 12
T_MAX used  : 12

Train site x label:
 label        0   1
site              
KKI         50  18
NYU         78  92
NeuroIMAGE  18  19
OHSU        32  31
Peking_1    50  19

Val site x label:
 label        0   1
site              
KKI         11   4
NYU         20  26
NeuroIMAGE   5   6
OHSU        10   5
Peking_1    11   5
Windows path spot-check: OK


## 3. Input Pipeline

In [10]:
def make_dataset_preloaded(df_split, project_root: Path, T_MAX: int, n_rois: int, l_win: int,
                           batch_size: int, training: bool) -> tf.data.Dataset:
    """
    Load all windows into memory and create dataset.
    Shuffles in numpy (not TensorFlow) to avoid macOS hanging issue.
    """
    print(f"Preloading {len(df_split)} subjects into memory...")
    
    all_X = []
    all_y = []
    
    for idx, row in df_split.iterrows():
        path = project_root / row["windows_path"]
        windows = np.load(path).astype(np.float32)  # (T, N, L)
        windows = windows[..., np.newaxis]  # (T, N, L, 1)
        
        # Pad to T_MAX
        T = windows.shape[0]
        if T < T_MAX:
            pad = np.zeros((T_MAX - T, n_rois, l_win, 1), dtype=np.float32)
            windows = np.concatenate([windows, pad], axis=0)
        else:
            windows = windows[:T_MAX]
        
        all_X.append(windows)
        all_y.append(row["label"])
    
    X_array = np.array(all_X, dtype=np.float32)
    y_array = np.array(all_y, dtype=np.int32)
    
    print(f"Loaded: X={X_array.shape}, y={y_array.shape}")
    
    # SHUFFLE IN NUMPY (not TensorFlow) to avoid hanging
    if training:
        print("Shuffling in numpy...")
        indices = np.arange(len(X_array))
        np.random.shuffle(indices)
        X_array = X_array[indices]
        y_array = y_array[indices]
    
    # Create dataset from shuffled arrays
    ds = tf.data.Dataset.from_tensor_slices((X_array, y_array))
    ds = ds.batch(batch_size, drop_remainder=False)
    ds = ds.prefetch(2)  # Small prefetch is fine
    
    return ds
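
The padding step above fills missing windows with zeros, and the model in this notebook does not distinguish padded windows from real ones. As a hedged sketch only (the helper `pad_windows` is hypothetical and not part of the pipeline), the same padding can be written so that it also returns a validity mask, which could later be used to ignore padded time steps:

```python
import numpy as np

def pad_windows(windows: np.ndarray, t_max: int):
    """Zero-pad (or truncate) a (T, N, L) tensor to (t_max, N, L) and return a validity mask."""
    t = min(windows.shape[0], t_max)
    padded = np.zeros((t_max,) + windows.shape[1:], dtype=np.float32)
    padded[:t] = windows[:t]
    mask = np.zeros(t_max, dtype=bool)
    mask[:t] = True                                  # True for real windows, False for padding
    return padded, mask

short = np.ones((9, 116, 20), dtype=np.float32)      # toy subject with only 9 windows
padded, mask = pad_windows(short, t_max=12)
print(padded.shape, int(mask.sum()))  # (12, 116, 20) 9
```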

In [11]:
print("Creating training dataset...")
train_ds = make_dataset_preloaded(df_train, PROJECT_ROOT, T_MAX, N_ROIS, L_WIN, BATCH_SIZE, training=True)

print("\nCreating validation dataset...")
val_ds = make_dataset_preloaded(df_val, PROJECT_ROOT, T_MAX, N_ROIS, L_WIN, BATCH_SIZE, training=False)

# Test both
print("\nTesting training dataset...")
for xb, yb in train_ds.take(1):
    print("✅ Train DS works! batch:", xb.shape, yb.shape, "labels:", yb.numpy()[:5])
    break

print("\nTesting validation dataset...")
for xb, yb in val_ds.take(1):
    print("✅ Val DS works! batch:", xb.shape, yb.shape)
    break

Creating training dataset...
Preloading 407 subjects into memory...
Loaded: X=(407, 12, 116, 20, 1), y=(407,)
Shuffling in numpy...

Creating validation dataset...
Preloading 103 subjects into memory...
Loaded: X=(103, 12, 116, 20, 1), y=(103,)

Testing training dataset...
✅ Train DS works! batch: (16, 12, 116, 20, 1) (16,) labels: [1 0 1 1 0]

Testing validation dataset...
✅ Val DS works! batch: (16, 12, 116, 20, 1) (16,)


## 4. Model Architecture

### 4.1 FC Generation
#### 4.1.1 CNN Window Encoder

In [12]:
random.seed(SEED)
np.random.seed(SEED)
tf.random.set_seed(SEED)

def tdnet_cnn_block(N=116, L=20, dropout=0.3):
    inputs = layers.Input(shape=(N, L, 1))

    x = layers.Conv2D(filters=CNN_FILTERS_1, kernel_size=(1, CNN_KERNEL_1), padding='valid', use_bias=True)(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Dropout(dropout)(x)

    x = layers.Conv2D(filters=CNN_FILTERS_2, kernel_size=(1, CNN_KERNEL_2), padding='valid', use_bias=True)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Dropout(dropout)(x)

    x = layers.Conv2D(filters=CNN_FILTERS_3, kernel_size=(1, CNN_KERNEL_3), padding='valid', use_bias=True)(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Dropout(dropout)(x)

    return Model(inputs, x, name="TDNet_CNN_Block")
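
Because every convolution uses a `(1, k)` kernel with `padding='valid'`, the ROI axis is untouched and only the time axis shrinks. The short calculation below is a sketch (the helper `valid_conv_width` is introduced here for illustration) that traces the time-axis width through the three layers using the kernel sizes from the configuration cell; with a single final filter the encoder output would then have shape `(116, 16, 1)` per window.

```python
def valid_conv_width(width: int, kernel: int, stride: int = 1) -> int:
    """Output width of a 'valid' (unpadded) convolution along one axis."""
    return (width - kernel) // stride + 1

width = 20                        # L_WIN
for kernel in (3, 3, 1):          # CNN_KERNEL_1, CNN_KERNEL_2, CNN_KERNEL_3
    width = valid_conv_width(width, kernel)
    print(kernel, "->", width)    # prints: 3 -> 18, then 3 -> 16, then 1 -> 16
```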

#### 4.1.2 CNN + Bilinear Layer + FC

In [13]:
def lower_triangle_vectorize(A: tf.Tensor) -> tf.Tensor:
    """
    A: (B, N, N) symmetric FC matrices
    Returns: (B, N*(N-1)/2) lower-triangular (excluding diagonal)
    """
    N = tf.shape(A)[-1]
    # boolean mask for lower triangle excluding diagonal
    mask = tf.linalg.band_part(tf.ones((N, N), dtype=tf.int32), -1, 0)
    mask = tf.cast(mask, tf.bool)
    mask = tf.logical_and(mask, tf.logical_not(tf.eye(N, dtype=tf.bool)))  # exclude diag
    # flatten last 2 dims then boolean mask
    A_flat = tf.reshape(A, (tf.shape(A)[0], -1))           # (B, N*N)
    mask_flat = tf.reshape(mask, (-1,))                    # (N*N,)
    return tf.boolean_mask(A_flat, mask_flat, axis=1)      # (B, N*(N-1)/2)

class FCGenerationLayer(layers.Layer):
    def __init__(self, cnn_model: tf.keras.Model, n_rois=116, **kwargs):
        super().__init__(**kwargs)
        self.cnn_model = cnn_model
        self.n_rois = n_rois

    def call(self, window_batch: tf.Tensor, training=None) -> tf.Tensor:
        """
        window_batch: (B, N, L, 1) where N=116, L=20
        Returns: (B, 6670) vectorized dynamic FC feature
        """
        # CNN features: (B, N, d, 1) if you used filters=1 final conv (as in paper)
        feat = self.cnn_model(window_batch, training=training)

        # squeeze last channel -> (B, N, d)
        if feat.shape.rank == 4 and feat.shape[-1] == 1:
            feat = tf.squeeze(feat, axis=-1)
        elif feat.shape.rank == 4:
            B = tf.shape(feat)[0]
            N = tf.shape(feat)[1]
            feat = tf.reshape(feat, (B, N, -1))

        # Bilinear FC: A = feat @ feat^T -> (B, N, N)
        A = tf.matmul(feat, feat, transpose_b=True)

        # Flatten (lower triangle) -> (B, 6670)
        f = lower_triangle_vectorize(A)

        return f
    
    def compute_output_shape(self, input_shape):
        # input_shape: (batch, N, L, C)
        n = input_shape[1]
        if n is None:
            out_dim = self.n_rois * (self.n_rois - 1) // 2
            return (input_shape[0], out_dim)
        out_dim = n * (n - 1) // 2
        return (input_shape[0], out_dim)

    def compute_output_spec(self, input_spec, **kwargs):
        # Helps Keras build graphs reliably (esp. Keras 3)
        out_shape = self.compute_output_shape(input_spec.shape)
        return tf.TensorSpec(shape=out_shape, dtype=self.compute_dtype)
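
To make the shapes concrete, here is a NumPy sketch of the same two steps (bilinear pooling, then strict lower-triangle vectorization) on random stand-in features. It assumes `d = 16` features per ROI and uses `np.tril_indices` in place of the boolean mask; both select entries in row-major order, so the element ordering should match, but this is an illustration rather than a drop-in replacement for the layer.

```python
import numpy as np

rng = np.random.default_rng(0)
B, N, d = 2, 116, 16
feat = rng.standard_normal((B, N, d)).astype(np.float32)   # stand-in for CNN features

# Bilinear pooling: one N x N matrix per window
A = feat @ feat.transpose(0, 2, 1)                         # (B, N, N)

# Strict lower triangle (k=-1 excludes the diagonal)
rows, cols = np.tril_indices(N, k=-1)
f = A[:, rows, cols]                                       # (B, N*(N-1)/2)

print(A.shape, f.shape)  # (2, 116, 116) (2, 6670)
```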

### 4.2 Temporal Convolutional Network (TCN) Residual Block

In [14]:
def tcn_residual_block(x, filters=3, kernel_size=3, dilation=1, dropout=0.3, name="res"):
    shortcut = x

    # 1st dilated causal conv
    y = layers.Conv1D(filters, kernel_size, padding="causal",
                      dilation_rate=dilation, use_bias=True,
                      name=f"{name}_conv1")(x)
    y = layers.BatchNormalization(name=f"{name}_bn1")(y)
    y = layers.ReLU(name=f"{name}_relu1")(y)
    y = layers.Dropout(dropout, name=f"{name}_drop1")(y)

    # 2nd dilated causal conv (same dilation)
    y = layers.Conv1D(filters, kernel_size, padding="causal",
                      dilation_rate=dilation, use_bias=True,
                      name=f"{name}_conv2")(y)
    y = layers.BatchNormalization(name=f"{name}_bn2")(y)
    y = layers.ReLU(name=f"{name}_relu2")(y)
    y = layers.Dropout(dropout, name=f"{name}_drop2")(y)

    # W3 * H̃q  (Eq. 3): 1×1 conv on residual branch
    y = layers.Conv1D(filters, 1, padding="same", use_bias=True, name=f"{name}_w3")(y)

    # Match channels for residual add if needed
    if shortcut.shape[-1] != filters:
        shortcut = layers.Conv1D(filters, 1, padding="same", use_bias=True,
                                 name=f"{name}_proj")(shortcut)

    out = layers.Add(name=f"{name}_add")([shortcut, y])
    return out
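
A quick way to reason about how far back in time a stack of these blocks can see is to compute its receptive field. The sketch below assumes two dilated causal convolutions per block (as in the function above; the 1×1 convolutions add nothing) and dilations 1, 2, 4 as used in the end-to-end model in this notebook; the helper name `tcn_receptive_field` is introduced here for illustration.

```python
def tcn_receptive_field(kernel_size: int, dilations, convs_per_block: int = 2) -> int:
    """Receptive field (in time steps) of stacked dilated causal convolutions."""
    rf = 1
    for d in dilations:
        # each causal conv with dilation d extends the field by (kernel_size - 1) * d steps
        rf += convs_per_block * (kernel_size - 1) * d
    return rf

print(tcn_receptive_field(3, [1, 2, 4]))  # 29
```

With `T_MAX = 12` windows per subject, a receptive field of 29 steps means the output at the last time step can depend on every window in the sequence.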

### 4.3 TDNet End-to-End

In [15]:
def build_tdnet_end2end(
        T,
        N=116,
        L=20,
        dropout=0.3,
        tcn_filters=3,
        tcn_kernel_size=3,
):
    """
    End-to-end TDNet.
    Input : (T, N, L, 1)
    Output: (2,) softmax
    """
    inputs = layers.Input(shape=(T, N, L, 1), name="windows")

    cnn = tdnet_cnn_block(N=N, L=L, dropout=dropout)
    fc_gen = FCGenerationLayer(cnn_model=cnn, n_rois=N, name="fc_generation")

    H = layers.TimeDistributed(fc_gen, name="H_seq")(inputs)  # (B, T, 6670)

    x = H
    x = tcn_residual_block(x, filters=tcn_filters, kernel_size=tcn_kernel_size, dilation=1, dropout=dropout, name="tcn_b1")
    x = tcn_residual_block(x, filters=tcn_filters, kernel_size=tcn_kernel_size, dilation=2, dropout=dropout, name="tcn_b2")
    x = tcn_residual_block(x, filters=tcn_filters, kernel_size=tcn_kernel_size, dilation=4, dropout=dropout, name="tcn_b3")
    HQ = x  # shape: (B, T, D) where D=tcn_filters

    P = layers.Conv1D(1, 1, padding="same", activation="tanh", name="time_weight_conv")(HQ)  # (B,T,1)

    weighted = layers.Multiply(name="elemwise_mul_PH")([P, H])  # (B,T,6670)
    H_fused = layers.Lambda(lambda z: tf.reduce_sum(z, axis=1), name="sum_over_time")(weighted)  # (B,6670)

    x = layers.Dense(512, activation="relu", name="fc1")(H_fused)
    x = layers.Dense(128, activation="relu", name="fc2")(x)
    out = layers.Dense(2, activation="softmax", name="fc3")(x)

    return Model(inputs, out, name=f"TDNet_End2End_T{T}")
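
The fusion step (`time_weight_conv`, `elemwise_mul_PH`, `sum_over_time`) can be checked in isolation with NumPy. The sketch below uses random stand-ins for `H` and `HQ` and a random weight matrix `w` in place of the trained 1×1 convolution, so only the shapes and broadcasting are meaningful, not the values.

```python
import numpy as np

rng = np.random.default_rng(0)
B, T, F, D = 2, 12, 6670, 3
H = rng.standard_normal((B, T, F)).astype(np.float32)    # window-level FC features
HQ = rng.standard_normal((B, T, D)).astype(np.float32)   # stand-in for the TCN output

# A 1x1 Conv1D with one filter is a per-time-step linear map D -> 1
w = rng.standard_normal((D, 1)).astype(np.float32)
b = np.zeros(1, dtype=np.float32)
P = np.tanh(HQ @ w + b)                                  # (B, T, 1), one weight per window

H_fused = (P * H).sum(axis=1)                            # broadcast over F, sum over T
print(P.shape, H_fused.shape)  # (2, 12, 1) (2, 6670)
```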

## 5. Training Setup

In [16]:
model = build_tdnet_end2end(
    T=T_MAX,
    N=N_ROIS,
    L=L_WIN,
    dropout=DROPOUT,
    tcn_filters=3,
    tcn_kernel_size=3,
)
model.summary()

## 6. Training

In [17]:
opt = tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE)

model.compile(
    optimizer=opt,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=[tf.keras.metrics.SparseCategoricalAccuracy(name="acc")]
)

callbacks = [
    tf.keras.callbacks.ModelCheckpoint(
        filepath=str(MODELS_DIR / f"tdnet_T{T_MAX}_best.weights.h5"),
        save_weights_only=True,
        monitor="val_acc",
        mode="max",
        save_best_only=True,
        verbose=1
    ),
    tf.keras.callbacks.EarlyStopping(
        monitor="val_acc",
        mode="max",
        patience=15,
        restore_best_weights=True,
        verbose=1
    ),
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss",
        factor=0.5,
        patience=5,
        min_lr=1e-6,
        verbose=1
    )
]

n0 = (df_train["label"] == 0).sum()
n1 = (df_train["label"] == 1).sum()
class_weight = {0: (n0+n1)/(2*n0), 1: (n0+n1)/(2*n1)}

history = model.fit(
    train_ds,
    validation_data=val_ds,
    epochs=EPOCHS,
    callbacks=callbacks,
    class_weight=class_weight,
)


Epoch 1/100
Epoch 1: val_acc improved from None to 0.44660, saving model to /Users/mariaborca/Documents/AI_2023-2026/Semestrul_5/KBS/Report_3/adhd-tcn-replication/models/tdnet_T12_best.weights.h5

Epoch 1: finished saving model to /Users/mariaborca/Documents/AI_2023-2026/Semestrul_5/KBS/Report_3/adhd-tcn-replication/models/tdnet_T12_best.weights.h5
26/26 ━━━━━━━━━━━━━━━━━━━━ 8s 58ms/step - acc: 0.5577 - loss: 54.1073 - val_acc: 0.4466 - val_loss: 30.3627 - learning_rate: 0.0010
Epoch 2/100
Epoch 2: val_acc improved from 0.44660 to 0.55340, saving model to /Users/mariaborca/Documents/AI_2023-2026/Semestrul_5/KBS/Report_3/adhd-tcn-replication/models/tdnet_T12_best.weights.h5

Epoch 2: finished saving model to /Users/mariaborca/Documents/AI_2023-2026/Semestrul_5/
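
The `class_weight` dictionary passed to `model.fit` in the training cell is an inverse-frequency weighting. As a worked example, the counts below are the train label totals obtained by summing the "Train site x label" table from Section 2 (228 controls, 179 ADHD); the snippet is a sketch of the arithmetic only.

```python
n0, n1 = 228, 179                 # train label counts summed from the site x label table
total = n0 + n1
class_weight = {0: total / (2 * n0), 1: total / (2 * n1)}
print({k: round(v, 4) for k, v in class_weight.items()})  # {0: 0.8925, 1: 1.1369}
```

With this choice each class contributes the same total weight (`n0 * w0 == n1 * w1`), so the minority ADHD class is not under-weighted in the loss.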

## 7. Validation Evaluation

## 8. Save Model and Results

In [18]:
hist_df = pd.DataFrame(history.history)
hist_path = RESULTS_DIR / f"history_tdnet_T{T_MAX}.csv"
hist_df.to_csv(hist_path, index=False)
print("Saved training history:", hist_path)

Saved training history: /Users/mariaborca/Documents/AI_2023-2026/Semestrul_5/KBS/Report_3/adhd-tcn-replication/results/history_tdnet_T12.csv


In [19]:
# Majority-class baseline on validation
val_counts = df_val["label"].value_counts()
baseline_acc = val_counts.max() / len(df_val)

print("Val label counts:\n", val_counts)
print("Majority baseline acc:", round(baseline_acc, 4))


Val label counts:
 label
0    57
1    46
Name: count, dtype: int64
Majority baseline acc: 0.5534


In [20]:
model.load_weights(
    str(MODELS_DIR / f"tdnet_T{T_MAX}_best.weights.h5")
)


In [22]:
val_ds_noshuf = make_dataset_preloaded(
    df_val,
    PROJECT_ROOT,
    T_MAX,
    N_ROIS,
    L_WIN,
    BATCH_SIZE,
    training=False
)

Preloading 103 subjects into memory...
Loaded: X=(103, 12, 116, 20, 1), y=(103,)


In [23]:
import numpy as np
import pandas as pd

probs = model.predict(val_ds_noshuf, verbose=0)
y_pred = probs.argmax(axis=1)

# Defensive trim: predict() returns one row per subject, so this is normally a no-op
y_true = df_val["label"].to_numpy()
y_pred = y_pred[:len(y_true)]
sites  = df_val["site"].to_numpy()

results = pd.DataFrame({
    "site": sites,
    "y_true": y_true,
    "y_pred": y_pred
})


In [24]:
overall_acc = (results.y_true == results.y_pred).mean()
print("Overall val accuracy:", round(overall_acc, 4))


Overall val accuracy: 0.5922


In [25]:
per_site_acc = (
    results
    .groupby("site")[["y_true", "y_pred"]]  # select columns explicitly to avoid the pandas groupby.apply warning
    .apply(lambda g: (g.y_true == g.y_pred).mean())
)

print("Per-site validation accuracy:")
print(per_site_acc)


Per-site validation accuracy:
site
KKI           0.733333
NYU           0.500000
NeuroIMAGE    0.545455
OHSU          0.733333
Peking_1      0.625000
dtype: float64



In [26]:
from sklearn.metrics import confusion_matrix

print("Confusion matrix (val):")
print(confusion_matrix(y_true, y_pred))


Confusion matrix (val):
[[52  5]
 [37  9]]


In [27]:
from sklearn.metrics import balanced_accuracy_score, f1_score, classification_report

# using y_true and y_pred you already computed
print("Balanced acc:", balanced_accuracy_score(y_true, y_pred))
print("F1 (ADHD=1):", f1_score(y_true, y_pred, pos_label=1))
print(classification_report(y_true, y_pred, digits=4))


Balanced acc: 0.5539664378337147
F1 (ADHD=1): 0.3
              precision    recall  f1-score   support

           0     0.5843    0.9123    0.7123        57
           1     0.6429    0.1957    0.3000        46

    accuracy                         0.5922       103
   macro avg     0.6136    0.5540    0.5062       103
weighted avg     0.6104    0.5922    0.5282       103
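
The headline numbers above follow directly from the validation confusion matrix `[[52, 5], [37, 9]]`. The sketch below recomputes them by hand, which makes it easier to see that the low ADHD recall (9 of 46) is what pulls balanced accuracy and F1 down:

```python
cm = [[52, 5],
      [37, 9]]                    # rows: true class, columns: predicted class
tn, fp = cm[0]
fn, tp = cm[1]

accuracy = (tn + tp) / (tn + fp + fn + tp)
recall_control = tn / (tn + fp)   # fraction of controls predicted as control
recall_adhd = tp / (tp + fn)      # fraction of ADHD subjects predicted as ADHD
balanced_acc = (recall_control + recall_adhd) / 2
precision_adhd = tp / (tp + fp)
f1_adhd = 2 * precision_adhd * recall_adhd / (precision_adhd + recall_adhd)

print(round(accuracy, 4), round(balanced_acc, 4), round(f1_adhd, 4))  # 0.5922 0.554 0.3
```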



In [29]:
SEEDS = [42, 123, 456, 789, 2024]
results_by_seed = []

for seed in SEEDS:
    print(f"\n{'='*50}")
    print(f"Training with seed={seed}")
    print('='*50)
    
    # Set all seeds (random, np, and tf are already imported above)
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)
    
    # Recreate datasets
    train_ds = make_dataset_preloaded(df_train, PROJECT_ROOT, T_MAX, N_ROIS, L_WIN, BATCH_SIZE, training=True)
    val_ds = make_dataset_preloaded(df_val, PROJECT_ROOT, T_MAX, N_ROIS, L_WIN, BATCH_SIZE, training=False)
    
    # Build fresh model
    model = build_tdnet_end2end(T=T_MAX, N=N_ROIS, L=L_WIN, dropout=DROPOUT, tcn_filters=3, tcn_kernel_size=3)
    
    # Compile
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy(name="acc")]
    )
    
    # UNIQUE checkpoint file for each seed
    checkpoint_path = MODELS_DIR / f"tdnet_T{T_MAX}_seed{seed}.weights.h5"
    
    callbacks = [
        tf.keras.callbacks.ModelCheckpoint(
            filepath=str(checkpoint_path),
            save_weights_only=True,
            monitor="val_acc",
            mode="max",
            save_best_only=True,
            verbose=1
        ),
        tf.keras.callbacks.EarlyStopping(
            monitor="val_acc",
            mode="max",
            patience=15,
            restore_best_weights=True,
            verbose=1
        ),
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss",
            factor=0.5,
            patience=5,
            min_lr=1e-6,
            verbose=1
        )
    ]
    
    # Train
    history = model.fit(
        train_ds,
        validation_data=val_ds,
        epochs=EPOCHS,
        callbacks=callbacks,
        class_weight=class_weight,
        verbose=0
    )
    
    # Evaluate
    probs = model.predict(val_ds, verbose=0)
    y_pred = probs.argmax(axis=1)[:len(df_val)]
    y_true = df_val["label"].to_numpy()
    
    acc = (y_true == y_pred).mean()
    bal_acc = balanced_accuracy_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred, pos_label=1)
    
    results_by_seed.append({
        'seed': seed,
        'accuracy': acc,
        'balanced_acc': bal_acc,
        'f1_adhd': f1
    })
    
    print(f"✅ Seed {seed}: Acc={acc:.4f}, Balanced={bal_acc:.4f}, F1={f1:.4f}")

# Summary
results_df = pd.DataFrame(results_by_seed)
print("\n" + "="*50)
print("SUMMARY ACROSS SEEDS:")
print("="*50)
print(results_df)
print(f"\nMean Accuracy: {results_df['accuracy'].mean():.4f} ± {results_df['accuracy'].std():.4f}")
print(f"Best Accuracy: {results_df['accuracy'].max():.4f} (seed {results_df.loc[results_df['accuracy'].idxmax(), 'seed']})")


Training with seed=42
Preloading 407 subjects into memory...
Loaded: X=(407, 12, 116, 20, 1), y=(407,)
Shuffling in numpy...
Preloading 103 subjects into memory...
Loaded: X=(103, 12, 116, 20, 1), y=(103,)

Epoch 1: val_acc improved from None to 0.44660, saving model to /Users/mariaborca/Documents/AI_2023-2026/Semestrul_5/KBS/Report_3/adhd-tcn-replication/models/tdnet_T12_seed42.weights.h5

Epoch 1: finished saving model to /Users/mariaborca/Documents/AI_2023-2026/Semestrul_5/KBS/Report_3/adhd-tcn-replication/models/tdnet_T12_seed42.weights.h5

Epoch 2: val_acc improved from 0.44660 to 0.55340, saving model to /Users/mariaborca/Documents/AI_2023-2026/Semestrul_5/KBS/Report_3/adhd-tcn-replication/models/tdnet_T12_seed42.weights.h5

Epoch 2: finished saving model to /Users/mariaborca/Documents/AI_2023-2026/Semestrul_5/KBS/Report_3/adhd-tcn-replication/models/tdnet_T12_seed42.weights.h5

Epoch 3: val_acc did not improve from 0.55340

Epoch 4: val_acc did not improve from 0.55340

Epoch 5

In [30]:
from sklearn.metrics import confusion_matrix, classification_report, balanced_accuracy_score, f1_score
import pandas as pd
import numpy as np

# Load the best model (seed 123)
best_seed = 123
model.load_weights(str(MODELS_DIR / f"tdnet_T{T_MAX}_seed{best_seed}.weights.h5"))

# Predict on validation set
probs = model.predict(val_ds, verbose=0)
y_pred = probs.argmax(axis=1)[:len(df_val)]
y_true = df_val["label"].to_numpy()
sites = df_val["site"].to_numpy()

print(f"{'='*60}")
print(f"BEST MODEL (Seed {best_seed}) - DETAILED ANALYSIS")
print('='*60)

# Overall Metrics
print(f"\n{'='*60}")
print("OVERALL PERFORMANCE")
print('='*60)
print(f"Accuracy: {(y_true == y_pred).mean():.4f}")
print(f"Balanced Accuracy: {balanced_accuracy_score(y_true, y_pred):.4f}")
print(f"F1 (ADHD): {f1_score(y_true, y_pred, pos_label=1):.4f}")

# Confusion Matrix
print(f"\nConfusion Matrix:")
cm = confusion_matrix(y_true, y_pred)
print(cm)
print(f"\nInterpretation:")
print(f"  True Negatives (Control→Control): {cm[0,0]}")
print(f"  False Positives (Control→ADHD):   {cm[0,1]}")
print(f"  False Negatives (ADHD→Control):   {cm[1,0]}")
print(f"  True Positives (ADHD→ADHD):       {cm[1,1]}")

# Per-Class Metrics
print(f"\n{'='*60}")
print("PER-CLASS PERFORMANCE")
print('='*60)
print(classification_report(y_true, y_pred, 
                          target_names=['Control (0)', 'ADHD (1)'], 
                          digits=4))

# Class-specific metrics
control_acc = cm[0,0] / (cm[0,0] + cm[0,1])  # Recall for Control (specificity w.r.t. ADHD)
adhd_acc = cm[1,1] / (cm[1,0] + cm[1,1])      # Recall for ADHD (sensitivity)
control_precision = cm[0,0] / (cm[0,0] + cm[1,0])
adhd_precision = cm[1,1] / (cm[0,1] + cm[1,1])

print(f"\nClass-Specific Accuracies:")
print(f"  Control: {control_acc:.4f} (correctly identified {cm[0,0]}/{cm[0,0]+cm[0,1]} controls)")
print(f"  ADHD:    {adhd_acc:.4f} (correctly identified {cm[1,1]}/{cm[1,0]+cm[1,1]} ADHD)")

# Per-Site Analysis
print(f"\n{'='*60}")
print("PER-SITE PERFORMANCE")
print('='*60)

results_df = pd.DataFrame({
    "site": sites,
    "y_true": y_true,
    "y_pred": y_pred
})

# Accuracy per site
print("\nAccuracy by Site:")
for site in sorted(results_df["site"].unique()):
    site_data = results_df[results_df["site"] == site]
    acc = (site_data["y_true"] == site_data["y_pred"]).mean()
    n = len(site_data)
    n_control = (site_data["y_true"] == 0).sum()
    n_adhd = (site_data["y_true"] == 1).sum()
    print(f"  {site:12s}: {acc:.4f} (n={n:3d}, Control={n_control:2d}, ADHD={n_adhd:2d})")

# Per-site, per-class breakdown
print(f"\n{'='*60}")
print("PER-SITE, PER-CLASS BREAKDOWN")
print('='*60)

for site in sorted(results_df["site"].unique()):
    site_data = results_df[results_df["site"] == site]
    print(f"\n{site}:")
    
    # Control performance
    control_site = site_data[site_data["y_true"] == 0]
    if len(control_site) > 0:
        control_correct = (control_site["y_pred"] == 0).sum()
        print(f"  Control: {control_correct}/{len(control_site)} correct ({control_correct/len(control_site):.2%})")
    
    # ADHD performance
    adhd_site = site_data[site_data["y_true"] == 1]
    if len(adhd_site) > 0:
        adhd_correct = (adhd_site["y_pred"] == 1).sum()
        print(f"  ADHD:    {adhd_correct}/{len(adhd_site)} correct ({adhd_correct/len(adhd_site):.2%})")

# Prediction confidence analysis
print(f"\n{'='*60}")
print("PREDICTION CONFIDENCE ANALYSIS")
print('='*60)

prob_control = probs[:len(df_val), 0]  # Probability of Control
prob_adhd = probs[:len(df_val), 1]     # Probability of ADHD
confidence = np.max(probs[:len(df_val)], axis=1)  # Max probability

print(f"\nMean confidence: {confidence.mean():.4f}")
print(f"Confidence by correctness:")
correct_mask = (y_true == y_pred)
print(f"  Correct predictions: {confidence[correct_mask].mean():.4f}")
print(f"  Wrong predictions:   {confidence[~correct_mask].mean():.4f}")

print(f"\nConfidence by true class:")
print(f"  Control (true): {confidence[y_true==0].mean():.4f}")
print(f"  ADHD (true):    {confidence[y_true==1].mean():.4f}")

BEST MODEL (Seed 123) - DETAILED ANALYSIS

OVERALL PERFORMANCE
Accuracy: 0.6505
Balanced Accuracy: 0.6590
F1 (ADHD): 0.6538

Confusion Matrix:
[[33 24]
 [12 34]]

Interpretation:
  True Negatives (Control→Control): 33
  False Positives (Control→ADHD):   24
  False Negatives (ADHD→Control):   12
  True Positives (ADHD→ADHD):       34

PER-CLASS PERFORMANCE
              precision    recall  f1-score   support

 Control (0)     0.7333    0.5789    0.6471        57
    ADHD (1)     0.5862    0.7391    0.6538        46

    accuracy                         0.6505       103
   macro avg     0.6598    0.6590    0.6505       103
weighted avg     0.6676    0.6505    0.6501       103


Class-Specific Accuracies:
  Control: 0.5789 (correctly identified 33/57 controls)
  ADHD:    0.7391 (correctly identified 34/46 ADHD)

PER-SITE PERFORMANCE

Accuracy by Site:
  KKI         : 0.6667 (n= 15, Control=11, ADHD= 4)
  NYU         : 0.5652 (n= 46, Control=20, ADHD=26)
  NeuroIMAGE  : 0.7273 (n= 11, Con