# 🧠 Research-Grade Neuroimaging with Machine Learning & Deep Learning

**Instructor:** Ayaz Ali  
**Platform:** Google Colab  
**Domain:** Structural MRI  
**Level:** Research / PhD-ready

---

## Workshop Objectives
- Understand neuroimaging as a **population-level ML problem**
- Build a **proper ML baseline**
- Train a **3D CNN on volumetric MRI**
- Follow **real research best practices**
- Understand how this maps to **ADNI / OpenNeuro**


## 1. Neuroimaging as a Research Problem

Key principles:

- **One subject = one sample**
- Each subject has a **3D brain volume**
- Labels come from **clinical diagnosis**
- Models learn **population-level patterns**

🚫 We never train ML models on a single brain.
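A minimal sketch of "one subject = one sample". The `subjects` records, their field names and the `sub-XXX` IDs are hypothetical. The point is that axis 0 of the final array indexes subjects, not slices or voxels.

```python
import numpy as np

# Hypothetical subject records: one entry per subject, each holding
# one 3D volume and one clinical label (0 = Healthy, 1 = Disease)
rng = np.random.default_rng(0)
subjects = [
    {"subject_id": f"sub-{i:03d}", "volume": rng.random((32, 32, 32)), "diagnosis": i % 2}
    for i in range(6)
]

# Stack into a design array: axis 0 indexes subjects
X_subj = np.stack([s["volume"] for s in subjects])    # (n_subjects, D, H, W)
y_subj = np.array([s["diagnosis"] for s in subjects])  # (n_subjects,)

print(X_subj.shape, y_subj.shape)  # (6, 32, 32, 32) (6,)
```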


In [1]:
# Install required libraries
!pip install nibabel nilearn torch torchvision scikit-learn matplotlib --quiet

## 2. Dataset Design (Research Standard)

For teaching, we **simulate** a dataset:

- 100 subjects
- Each subject has a 3D MRI (32×32×32)
- Binary classification:
  - 0 → Healthy
  - 1 → Disease

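⚠️ The pipeline has the same structure as one for real MRI data, but these volumes are uniform random noise and the labels are assigned independently of them, so any model should score near chance (50%).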

In [2]:
import numpy as np

N_SUBJECTS = 100
IMG_SHAPE = (32, 32, 32)

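# Simulated MRI volumes: uniform noise in [0, 1), seeded for reproducibility
rng = np.random.default_rng(42)
X = rng.random((N_SUBJECTS, *IMG_SHAPE))

# Balanced binary labels (assigned independently of the volumes)
y = np.array([0] * (N_SUBJECTS // 2) + [1] * (N_SUBJECTS // 2))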

print("MRI dataset shape:", X.shape)
print("Label distribution:", np.bincount(y))


MRI dataset shape: (100, 32, 32, 32)
Label distribution: [50 50]
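Because the labels above are independent of the volumes, there is nothing for a model to learn. A sketch of how one might instead plant a detectable group difference. The region, its 8×8×8 size and the `effect` value are arbitrary choices for illustration, not a model of real atrophy.

```python
import numpy as np

rng = np.random.default_rng(42)
N_SUBJECTS, IMG_SHAPE = 100, (32, 32, 32)

# Same kind of noise volumes and balanced labels as above
X_sim = rng.random((N_SUBJECTS, *IMG_SHAPE))
y_sim = np.array([0] * 50 + [1] * 50)

# Hypothetical "disease effect": lower intensity in one 8x8x8 region
# for label-1 subjects (a crude stand-in for focal atrophy)
effect = 0.3
X_sim[y_sim == 1, 12:20, 12:20, 12:20] -= effect

# Mean intensity inside that region now differs between groups
roi_mean = X_sim[:, 12:20, 12:20, 12:20].mean(axis=(1, 2, 3))
print("Healthy ROI mean:", roi_mean[y_sim == 0].mean().round(3))
print("Disease ROI mean:", roi_mean[y_sim == 1].mean().round(3))
```

Feeding `X_sim` and `y_sim` through the rest of the notebook gives the models a real (if simple) signal to find.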


## 3. Train / Validation Split

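This is **mandatory in real research**: no subject may appear in both sets, and the model is never evaluated on data it was trained on.
Here each subject has one volume, so a row-level split is a subject-level split; `stratify=y` keeps the 50/50 class balance in both sets.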


In [3]:
from sklearn.model_selection import train_test_split

X_train, X_val, y_train, y_val = train_test_split(
    X,
    y,
    test_size=0.2,
    stratify=y,
    random_state=42
)

print("Train subjects:", X_train.shape[0])
print("Validation subjects:", X_val.shape[0])


Train subjects: 80
Validation subjects: 20
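Datasets such as ADNI often contain several scans per subject (e.g. follow-up visits). Splitting those rows at random would put the same brain in both sets. A sketch using scikit-learn's `GroupShuffleSplit`, with a hypothetical `subject_ids` array of 3 scans per subject:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)

# Hypothetical longitudinal design: 20 subjects x 3 scans each
n_subjects, scans_per_subject = 20, 3
subject_ids = np.repeat(np.arange(n_subjects), scans_per_subject)  # (60,)
X_long = rng.random((len(subject_ids), 8, 8, 8))  # small volumes keep it light
y_long = np.repeat(np.arange(n_subjects) % 2, scans_per_subject)  # one label per subject

# Split by subject: all scans of a subject land on the same side
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(gss.split(X_long, y_long, groups=subject_ids))

overlap = set(subject_ids[train_idx]) & set(subject_ids[val_idx])
print("Train scans:", len(train_idx), "| Val scans:", len(val_idx))
print("Subjects in both sets:", len(overlap))  # 0 -> no subject leakage
```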


## 4. Classical Machine Learning Baseline

Why we do this:
- Reviewers expect a **baseline**
- Shows whether a deep model adds anything over a simple one
- Establishes a reference point


In [4]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Flatten 3D MRI → feature vector
X_train_flat = X_train.reshape(len(X_train), -1)
X_val_flat = X_val.reshape(len(X_val), -1)

clf = LogisticRegression(max_iter=2000)
clf.fit(X_train_flat, y_train)

val_preds = clf.predict(X_val_flat)
val_acc = accuracy_score(y_val, val_preds)

print("Validation Accuracy (ML Baseline):", val_acc)


Validation Accuracy (ML Baseline): 0.5
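With only 20 validation subjects, each subject moves accuracy by 5 percentage points, so a single split gives a noisy estimate. A common alternative for small cohorts is stratified k-fold cross-validation. The sketch below uses scikit-learn and also puts feature scaling inside a `Pipeline` so it is fit on training folds only. The data are regenerated so the block runs on its own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X_cv = rng.random((100, 32, 32, 32))
y_cv = np.array([0] * 50 + [1] * 50)
X_cv_flat = X_cv.reshape(len(X_cv), -1)  # (100, 32768)

# Scaling lives inside the pipeline, so it is fit on each training fold only
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=2000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(baseline, X_cv_flat, y_cv, cv=cv, scoring="accuracy")

print("Fold accuracies:", scores.round(2))
print(f"Mean ± SD: {scores.mean():.2f} ± {scores.std():.2f}")
```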


## 5. Why Deep Learning?

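Flattening turns each volume into an unordered vector of 32,768 voxels, which discards:
- Spatial locality (which voxels are neighbors)
- Anatomical structure (which voxels belong to the same region)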
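To see what is lost, consider where neighboring voxels end up after `reshape`. With NumPy's default C order (the order `reshape` uses in the baseline cell), voxels that touch in 3D can be up to 1,024 positions apart in the flat vector. A linear model has no notion of feature order, so it treats every position as an unrelated feature.

```python
import numpy as np

shape = (32, 32, 32)

# Three voxels that are direct neighbors of (10, 10, 10) in 3D
center = (10, 10, 10)
neighbors = [(11, 10, 10), (10, 11, 10), (10, 10, 11)]

flat_center = np.ravel_multi_index(center, shape)
gaps = [int(np.ravel_multi_index(nb, shape) - flat_center) for nb in neighbors]
for nb, gap in zip(neighbors, gaps):
    print(nb, "-> flat-index distance:", gap)
# (11, 10, 10) -> 1024, (10, 11, 10) -> 32, (10, 10, 11) -> 1
```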

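CNNs learn:
- Local patterns, using small 3D filters shared across the whole volume
- Hierarchical features, as stacked layers combine local patterns into larger-scale ones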

Next: **3D CNN on volumetric MRI**


In [5]:
import torch
from torch.utils.data import Dataset, DataLoader

class MRIDataset(Dataset):
    def __init__(self, X, y):
        self.X = torch.tensor(X).float().unsqueeze(1)  # (N, 1, D, H, W)
        self.y = torch.tensor(y).long()

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

train_ds = MRIDataset(X_train, y_train)
val_ds = MRIDataset(X_val, y_val)

train_loader = DataLoader(train_ds, batch_size=8, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=8)
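Before training, it is worth checking one batch: `nn.Conv3d` expects input shaped `(batch, channels, depth, height, width)`, which is what `unsqueeze(1)` provides. A sketch using PyTorch's built-in `TensorDataset`, which behaves like `MRIDataset` here; the stand-in arrays are placeholders for `X_train` and `y_train`.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with the same shapes as X_train / y_train above
X_demo = np.random.default_rng(0).random((80, 32, 32, 32))
y_demo = np.array([0, 1] * 40)

# TensorDataset is a built-in alternative to the custom MRIDataset class
ds = TensorDataset(torch.from_numpy(X_demo).float().unsqueeze(1),
                   torch.from_numpy(y_demo).long())
loader = DataLoader(ds, batch_size=8, shuffle=True)

batch_x, batch_y = next(iter(loader))
print(batch_x.shape, batch_x.dtype)  # torch.Size([8, 1, 32, 32, 32]) torch.float32
print(batch_y.shape, batch_y.dtype)  # torch.Size([8]) torch.int64
print("Batches per epoch:", len(loader))  # 10
```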


## 6. 3D Convolutional Neural Network

This model works on the **full 3D volume**:
- 3D convolutions
- Volumetric feature learning


In [6]:
import torch.nn as nn

class CNN3D(nn.Module):
    def __init__(self):
        super().__init__()
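        self.features = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3),   # 32^3 -> 30^3
            nn.ReLU(),
            nn.MaxPool3d(2),                  # 30^3 -> 15^3

            nn.Conv3d(8, 16, kernel_size=3),  # 15^3 -> 13^3
            nn.ReLU(),
            nn.MaxPool3d(2)                   # 13^3 -> 6^3 (floor)
        )

        # 16 channels x 6 x 6 x 6 voxels = 3456 flattened features
        self.classifier = nn.Linear(16 * 6 * 6 * 6, 2)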

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        return self.classifier(x)

model = CNN3D()
print(model)


CNN3D(
  (features): Sequential(
    (0): Conv3d(1, 8, kernel_size=(3, 3, 3), stride=(1, 1, 1))
    (1): ReLU()
    (2): MaxPool3d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv3d(8, 16, kernel_size=(3, 3, 3), stride=(1, 1, 1))
    (4): ReLU()
    (5): MaxPool3d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (classifier): Linear(in_features=3456, out_features=2, bias=True)
)
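The `16 * 6 * 6 * 6` in the classifier is tied to the 32³ input. Each unpadded convolution or pooling layer maps a side of length n to ⌊(n + 2p - k)/s⌋ + 1, so if `IMG_SHAPE` changes, this number must change too. A sketch that computes it both by that formula and by a dummy forward pass (PyTorch's `nn.LazyLinear` is another option that infers `in_features` on the first call):

```python
import torch
import torch.nn as nn

def conv_out(n, k, s=1, p=0):
    """Output length along one axis for a conv/pool layer (no dilation)."""
    return (n + 2 * p - k) // s + 1

# Same layer stack as CNN3D.features
features = nn.Sequential(
    nn.Conv3d(1, 8, kernel_size=3), nn.ReLU(), nn.MaxPool3d(2),
    nn.Conv3d(8, 16, kernel_size=3), nn.ReLU(), nn.MaxPool3d(2),
)

# By hand: 32 -> conv 30 -> pool 15 -> conv 13 -> pool 6
side = conv_out(conv_out(conv_out(conv_out(32, 3), 2, s=2), 3), 2, s=2)
print("Side after features:", side)               # 6
print("Flattened size by hand:", 16 * side ** 3)  # 3456

# Same number from a dummy forward pass, without manual arithmetic
with torch.no_grad():
    n_flat = features(torch.zeros(1, 1, 32, 32, 32)).numel()
print("Flattened size from dummy pass:", n_flat)  # 3456
```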


## 7. Training Loop (Research Practice)

- GPU support
- Separate training & validation
- Cross-entropy loss
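The training loop uses no fixed seeds, so weight initialization and batch order differ on every run. A sketch of a seeding helper; `set_seed` is a hypothetical name, and full GPU determinism can additionally require `torch.use_deterministic_algorithms(True)`.

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> torch.Generator:
    """Seed Python, NumPy and PyTorch; return a generator for DataLoader shuffling."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds all devices, including CUDA
    g = torch.Generator()
    g.manual_seed(seed)
    return g

# Two seeded initializations give identical weights
set_seed(42)
w1 = torch.nn.Conv3d(1, 8, kernel_size=3).weight.clone()
set_seed(42)
w2 = torch.nn.Conv3d(1, 8, kernel_size=3).weight.clone()
print("Identical init:", torch.equal(w1, w2))  # True
```

The returned generator can be passed as `DataLoader(train_ds, batch_size=8, shuffle=True, generator=g)` to fix the shuffling order as well.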


In [7]:
import torch.optim as optim

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

EPOCHS = 5

for epoch in range(EPOCHS):
    model.train()
    total_loss = 0

    for xb, yb in train_loader:
        xb, yb = xb.to(device), yb.to(device)

        optimizer.zero_grad()
        outputs = model(xb)
        loss = criterion(outputs, yb)
        loss.backward()
        optimizer.step()

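        total_loss += loss.item()

    # Sum of per-batch mean losses over 10 batches (80 subjects / batch size 8);
    # divide by len(train_loader) to get the mean batch loss
    print(f"Epoch {epoch+1}/{EPOCHS} | Train Loss: {total_loss:.3f}")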


Epoch 1/5 | Train Loss: 7.244
Epoch 2/5 | Train Loss: 6.940
Epoch 3/5 | Train Loss: 6.925
Epoch 4/5 | Train Loss: 6.883
Epoch 5/5 | Train Loss: 6.855
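The logged loss is a sum over the 10 mini-batches per epoch, so it corresponds to a mean batch loss of about 0.69. For two classes that is ln 2, the loss of a model that assigns probability 0.5 to each class, i.e. the network stays close to chance on this noise-only data. A quick check:

```python
import math

import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()

# A model that outputs equal logits for both classes predicts p = 0.5
logits = torch.zeros(8, 2)          # one batch of 8 subjects
labels = torch.tensor([0, 1] * 4)
chance_loss = ce(logits, labels).item()

print(f"Chance-level batch loss: {chance_loss:.4f}")  # 0.6931
print(f"ln 2:                    {math.log(2):.4f}")  # 0.6931

# Summed over 10 batches, as in the training log
n_batches = 10
print(f"Chance-level epoch sum:  {n_batches * chance_loss:.3f}")  # 6.931
```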


## 8. Validation Evaluation

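Evaluation is done on **unseen subjects** (the 20 validation subjects). `model.eval()` switches the model to inference mode and `torch.no_grad()` turns off gradient tracking.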


In [8]:
model.eval()
correct, total = 0, 0

with torch.no_grad():
    for xb, yb in val_loader:
        xb, yb = xb.to(device), yb.to(device)
        outputs = model(xb)
        preds = torch.argmax(outputs, dim=1)
        correct += (preds == yb).sum().item()
        total += yb.size(0)

print("Validation Accuracy (3D CNN):", correct / total)


Validation Accuracy (3D CNN): 0.5
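Accuracy alone hides which class the errors fall on, and clinical cohorts are rarely balanced. A sketch of common additional metrics from scikit-learn; the six predictions are made up for illustration. With the CNN, `p_disease` would come from `torch.softmax(outputs, dim=1)[:, 1]` collected over `val_loader`.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, confusion_matrix, roc_auc_score

# Hypothetical validation outputs for 6 subjects
y_true = np.array([0, 0, 0, 1, 1, 1])
p_disease = np.array([0.2, 0.4, 0.6, 0.3, 0.7, 0.9])  # predicted prob. of class 1
y_pred = (p_disease >= 0.5).astype(int)

bal = balanced_accuracy_score(y_true, y_pred)
auc = roc_auc_score(y_true, p_disease)
cm = confusion_matrix(y_true, y_pred)

print(f"Balanced accuracy: {bal:.3f}")  # 0.667
print(f"ROC AUC: {auc:.3f}")            # 0.778
print("Confusion matrix (rows = true, cols = predicted):")
print(cm)                               # [[2 1]
                                        #  [1 2]]
```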


## 9. Research Extensions (PhD-Level)

You can now extend this to:

- Real MRI loading (NiBabel + NIfTI)
- Intensity normalization
- Skull stripping
- Transfer learning
- Explainable AI (Grad-CAM)
- fMRI time-series (LSTM / Transformers)
- Graph Neural Networks (connectomes)

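This notebook follows the core workflow of **real neuroimaging research**: one sample per subject, a held-out validation set, a classical baseline and a volumetric deep model.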
