# 03 · CNN Baseline Model
**Purpose.** This notebook builds a baseline 3D convolutional neural network (CNN) for classifying lung nodule patches from the LUNA16 dataset. The aim is not to optimize performance but to establish a reference model that later experiments can improve upon.

Key steps include:
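- Loading preprocessed nodule patches (cropped or padded to 64³ voxels) and building synthetic negative samples
- Adapting a standard 3D CNN backbone (torchvision's r3d_18) to single-channel CT cubes
- Training the model and evaluating it on a scan-grouped validation split
- Reporting per-epoch training loss and AUROC, plus validation AUROC and threshold-based precision, recall, and F1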

This baseline provides a starting point to measure the impact of more advanced architectures, hyperparameter tuning, and the integration of clinical features.

In [1]:
!pip install --quiet fvcore iopath pytorchvideo



In [None]:
import math
import torch
import random
import numpy as np
import pandas as pd
import torch.nn as nn
import torchmetrics as tm

from pathlib import Path
from tqdm.auto import tqdm
from torchvision.models.video import r3d_18
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import GroupShuffleSplit
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score

In [6]:
PATCH_DATA = Path("/kaggle/input/patches/")
PATCH_DIR  = PATCH_DATA / "patches_64mm"
patch_df   = pd.read_csv(PATCH_DATA / "patch_index.csv")
profile_df = pd.read_csv(PATCH_DATA / "synthetic_profiles.csv")

This cell checks that the patch directory exists and contains the expected 1186 .npy patch files, then shows the first two rows of the patch index DataFrame as a quick preview to confirm the metadata has been loaded correctly.

In [7]:
assert PATCH_DIR.exists() and len(list(PATCH_DIR.glob("*.npy"))) == 1186
display(patch_df.head(2))

| | patch_file | seriesuid | diam_mm | center_x | center_y | center_z |
|---|---|---|---|---|---|---|
| 0 | 1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222... | 1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222... | 5.651471 | -128.699421 | -175.319272 | -298.387506 |
| 1 | 1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222... | 1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222... | 4.224708 | 103.783651 | -211.925149 | -227.12125 |


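Next, we build synthetic negative samples to balance the dataset. First, we collect the positive nodule centres for each scan, then define a helper (random_bg_coord) that rejection-samples a background location at least min_dist_vox voxels from every positive nodule, falling back to the last candidate after 1000 attempts. For each scan (seriesuid), one background coordinate is generated, recorded with label = 0, and stored in a new DataFrame (neg_df). The result is a table of negative (non-nodule) examples that can be combined with the positive samples for training.

Two caveats apply to this cell as written. Only the 64³ patches are available here, so scan_shape is the shape of a patch rather than of the full CT volume; random.randint(32, 32) then always returns 32, which is why every centre in the output below is (32, 32, 32). In addition, the positive centres appear to be world coordinates in millimetres while the sampled centres are voxel indices, so the distance check compares different units. The background files named in patch_file ({suid}_bg.npy) are never written to disk either; the LunaPatchDS class below substitutes Gaussian noise for every negative.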

In [8]:
# positive nodule centres per scan, ordered (z, y, x); the center_* columns of
# patch_index.csv appear to be world coordinates in mm, not voxel indices
centres = {}

for _, r in patch_df.iterrows():
    centres.setdefault(r.seriesuid, []).append(
        np.array([r.center_z, r.center_y, r.center_x])
    )

def random_bg_coord(scan_shape, pos_list, min_dist_vox=15):
    """Sample a background centre at least min_dist_vox from all positives."""
    for _ in range(1000):
        z = random.randint(32, scan_shape[0]-32)
        y = random.randint(32, scan_shape[1]-32)
        x = random.randint(32, scan_shape[2]-32)
        c = np.array([z, y, x])
        if all(np.linalg.norm(c - p) >= min_dist_vox for p in pos_list):
            return c
    return c  # fallback: last candidate, even if it failed the distance check

neg_records = []
for suid, rows in patch_df.groupby("seriesuid"):
    # NOTE: only 64^3 patches are available, so this is the patch shape, not the
    # full CT volume shape; random_bg_coord can then only return (32, 32, 32)
    first_patch = np.load(PATCH_DIR / rows.iloc[0].patch_file)
    scan_shape = first_patch.shape
    centre = random_bg_coord(scan_shape, centres[suid])
    neg_records.append({
        "seriesuid": suid,
        "center_z": centre[0], "center_y": centre[1], "center_x": centre[2],
        "diam_mm": 0, "label": 0,
        "patch_file": f"{suid}_bg.npy",  # never written; LunaPatchDS generates noise for label 0
    })

neg_df = pd.DataFrame(neg_records)
neg_df.head()

| | seriesuid | center_z | center_y | center_x | diam_mm | label | patch_file |
|---|---|---|---|---|---|---|---|
| 0 | 1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222... | 32 | 32 | 32 | 0 | 0 | 1.3.6.1.4.1.14519.5.2.1.6279.6001.100225287222... |
| 1 | 1.3.6.1.4.1.14519.5.2.1.6279.6001.100398138793... | 32 | 32 | 32 | 0 | 0 | 1.3.6.1.4.1.14519.5.2.1.6279.6001.100398138793... |
| 2 | 1.3.6.1.4.1.14519.5.2.1.6279.6001.100621383016... | 32 | 32 | 32 | 0 | 0 | 1.3.6.1.4.1.14519.5.2.1.6279.6001.100621383016... |
| 3 | 1.3.6.1.4.1.14519.5.2.1.6279.6001.100953483028... | 32 | 32 | 32 | 0 | 0 | 1.3.6.1.4.1.14519.5.2.1.6279.6001.100953483028... |
| 4 | 1.3.6.1.4.1.14519.5.2.1.6279.6001.102681962408... | 32 | 32 | 32 | 0 | 0 | 1.3.6.1.4.1.14519.5.2.1.6279.6001.102681962408... |
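A corrected background sampler needs the full CT volume shape and positive centres in voxel coordinates. The sketch below shows one way to do this under assumptions not covered by the files loaded in this notebook: each scan's origin and spacing are hypothetical inputs (for example, read from the original .mhd headers), and the direction matrix is assumed to be the identity, so world-to-voxel conversion is just a shift and a scale. world_to_voxel and sample_background are illustrative helpers, and both return (z, y, x) indices to match the ordering used above.

```python
import numpy as np


def world_to_voxel(world_xyz, origin_xyz, spacing_xyz):
    """Map a world (x, y, z) position in mm to voxel (z, y, x) indices.

    ASSUMPTION: identity direction matrix (no rotation or axis flips).
    """
    world = np.asarray(world_xyz, dtype=float)
    origin = np.asarray(origin_xyz, dtype=float)
    spacing = np.asarray(spacing_xyz, dtype=float)
    vox_xyz = (world - origin) / spacing
    return np.rint(vox_xyz[::-1]).astype(int)   # reverse to (z, y, x)


def sample_background(scan_shape, positives_zyx, half=32, min_dist_vox=15,
                      max_tries=1000, rng=None):
    """Rejection-sample a (z, y, x) centre whose (2*half)^3 patch fits inside
    the scan and lies at least min_dist_vox voxels from every positive centre.

    Returns None if no valid centre is found within max_tries draws.
    """
    rng = np.random.default_rng() if rng is None else rng
    lo = np.full(3, half)
    hi = np.asarray(scan_shape) - half + 1       # exclusive: centre <= shape - half
    if np.any(hi <= lo):
        raise ValueError(f"scan shape {scan_shape} too small for a {2 * half}^3 patch")
    pos = np.asarray(positives_zyx, dtype=float).reshape(-1, 3)
    for _ in range(max_tries):
        c = rng.integers(lo, hi)                 # independent draw per axis
        if pos.size == 0 or np.linalg.norm(pos - c, axis=1).min() >= min_dist_vox:
            return c
    return None
```

Per scan, one would convert every annotated nodule with world_to_voxel, call sample_background with the full volume's shape, and crop the 64³ patch around the returned centre. Returning None instead of a rejected candidate makes a failed search explicit rather than silently producing a patch that overlaps a nodule.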


In [None]:
def to_64_cube(cube, size=64):
    """Center-crop or zero-pad each axis of a 3D array to `size` voxels."""
    cube = cube.astype(np.float32, copy=False)

    def fix_axis(a, axis, target=size):
        s = a.shape[axis]
        if s >= target:                         # center-crop
            start = (s - target) // 2
            sl = [slice(None)]*3
            sl[axis] = slice(start, start+target)
            return a[tuple(sl)]
        else:                                   # pad
            before = (target - s)//2
            after  = target - s - before
            pad = [(0,0)]*3
            pad[axis] = (before, after)
            return np.pad(a, pad, mode="constant")

    cube = fix_axis(cube, 0)
    cube = fix_axis(cube, 1)
    cube = fix_axis(cube, 2)
    return np.ascontiguousarray(cube)           # no negative strides


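This cell defines a custom PyTorch Dataset (LunaPatchDS) that merges the positive and negative tables into one shuffled DataFrame, loads each cube (center-cropped or zero-padded to 64³ by to_64_cube), applies random flips and 90° rotations to positives when augment=True, and returns (tensor, label) pairs. Negatives are not read from disk: for label 0, _load_cube returns Gaussian noise (mean 0, standard deviation 0.05), so in this baseline the model learns to tell real CT patches from synthetic noise rather than nodules from real lung tissue.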

In [28]:
class LunaPatchDS(Dataset):
    def __init__(self, pos_df, neg_df, patch_dir, augment=True):
        self.df = pd.concat([pos_df.assign(label=1), neg_df]).sample(frac=1, random_state=0).reset_index(drop=True)
        self.patch_dir, self.augment = patch_dir, augment

    def __len__(self): return len(self.df)

    def _load_cube(self, row):
        if row.label == 1:
            cube = np.load(self.patch_dir / row.patch_file)
        else:
            # synthetic negative: Gaussian noise (mean 0, std 0.05), not a real background patch
            cube = np.random.normal(0, 0.05, (64,64,64)).astype(np.float32)
        return to_64_cube(cube)

    def __getitem__(self, idx):
        row  = self.df.iloc[idx]
        cube = self._load_cube(row)

        if self.augment and row.label == 1:
            if random.random() < .5: cube = cube[::-1]
            if random.random() < .5: cube = np.rot90(cube, 1, (1,2))
            cube = np.ascontiguousarray(cube)

        cube  = torch.from_numpy(cube).float().unsqueeze(0)
        label = torch.tensor(row.label, dtype=torch.float32)
        return cube, label
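The flip-and-rotate augmentation in __getitem__ can be isolated as a standalone function. The sketch below uses a hypothetical helper (augment_cube, not part of the notebook) that takes an explicit NumPy Generator, so the same random choices can be reproduced in tests. It also shows why np.ascontiguousarray is needed: both operations return views with negative strides, which torch.from_numpy refuses to wrap.

```python
import numpy as np


def augment_cube(cube, rng, p_flip=0.5, p_rot=0.5):
    """Randomly flip a (z, y, x) cube along z and rotate it 90 degrees in the
    (y, x) plane, mirroring the augmentation used in LunaPatchDS."""
    if rng.random() < p_flip:
        cube = cube[::-1]                    # view with a negative z stride
    if rng.random() < p_rot:
        cube = np.rot90(cube, 1, (1, 2))     # also a view, not a copy
    return np.ascontiguousarray(cube)        # copy into C order for torch.from_numpy


rng = np.random.default_rng(0)
cube = np.arange(4 * 4 * 4, dtype=np.float32).reshape(4, 4, 4)
out = augment_cube(cube, rng, p_flip=1.0, p_rot=1.0)
print(out.shape, out.flags["C_CONTIGUOUS"])
```

LunaPatchDS augments only positives, which is harmless while negatives are noise; if real background patches replace the noise, applying the same transform to both classes would keep orientation from becoming a label cue.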

This cell instantiates LunaPatchDS with the positive and negative DataFrames and the patch directory. It uses every positive patch; no validation data is held out at this stage.

In [29]:
train_ds = LunaPatchDS(patch_df, neg_df, PATCH_DIR, augment=True)

This cell wraps the dataset in a PyTorch DataLoader that yields shuffled mini-batches of 16 cubes, loaded in parallel by 4 worker processes.

In [30]:
train_dl = DataLoader(train_ds, batch_size=16, shuffle=True, num_workers=4, pin_memory=True)
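Because LunaPatchDS draws from both random and np.random inside __getitem__, worker seeding matters once num_workers > 0. Depending on the PyTorch version, NumPy may or may not already be reseeded in each worker; PyTorch's reproducibility notes recommend an explicit worker_init_fn together with a seeded torch.Generator, and the sketch below follows that pattern. The name seed_worker, the seed value 0, and the toy dataset are illustrative choices, not part of the notebook.

```python
import random

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset


def seed_worker(worker_id):
    # Inside a worker, torch.initial_seed() is already distinct per worker
    # (base_seed + worker_id); reuse it to seed NumPy and Python's random.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)


g = torch.Generator()
g.manual_seed(0)  # fixes the shuffling order and base_seed across runs

# Toy stand-in for train_ds; seed_worker only runs when num_workers > 0.
toy_ds = TensorDataset(torch.randn(32, 1, 4, 4, 4), torch.randint(0, 2, (32,)).float())
toy_dl = DataLoader(toy_ds, batch_size=8, shuffle=True, num_workers=0,
                    worker_init_fn=seed_worker, generator=g)
```

For the notebook's loader, the same two keyword arguments (worker_init_fn=seed_worker, generator=g) would be added to the DataLoader call above, keeping num_workers=4.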

This cell sets up the baseline 3D CNN. Since the notebook runs on Kaggle (for its free GPU compute), we first select the GPU if one is available and fall back to the CPU otherwise. We then load torchvision's 3D ResNet-18 backbone (r3d_18) without pretrained weights and replace its first convolution (model.stem[0]) so that it accepts single-channel CT cubes instead of 3-channel video frames. Finally, we replace the fully connected head with a small classifier (linear layer → ReLU → dropout for regularization → linear layer) that outputs a single logit for binary classification, and move the model to the selected device.

In [31]:
device = "cuda" if torch.cuda.is_available() else "cpu"

model = r3d_18(weights=None)
model.stem[0] = nn.Conv3d(
    1, 64, kernel_size=7, stride=2, padding=3, bias=False
)

model.fc = nn.Sequential(
    nn.Linear(512, 128),
    nn.ReLU(),
    nn.Dropout(0.3),
    nn.Linear(128, 1)
)

model.to(device)

VideoResNet(
  (stem): BasicStem(
    (0): Conv3d(1, 64, kernel_size=(7, 7, 7), stride=(2, 2, 2), padding=(3, 3, 3), bias=False)
    (1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
  )
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Sequential(
        (0): Conv3DSimple(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
        (1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace=True)
      )
      (conv2): Sequential(
        (0): Conv3DSimple(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
        (1): BatchNorm3d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (relu): ReLU(inplace=True)
    )
    (1): BasicBlock(
      (conv1): Sequential(
        (0): Conv3DSimple(64, 64, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1), bias=False)
        (1):
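Why does nn.Linear(512, 128) fit this backbone? The arithmetic below traces a 64³ input through the network with the standard convolution output-size formula, out = ⌊(n + 2p − k) / s⌋ + 1. It assumes, based on torchvision's r3d_18 design, that the first block of layer2, layer3 and layer4 downsamples every axis with a 3×3×3, stride-2, padding-1 convolution, and that adaptive average pooling then reduces the 512 channels of layer4 to a 512-dimensional vector for fc. conv_out and trace_r3d18_spatial are illustrative helpers.

```python
def conv_out(n, k, s, p):
    """Output length of a convolution along one axis."""
    return (n + 2 * p - k) // s + 1


def trace_r3d18_spatial(n=64):
    """Per-stage spatial size of an n^3 cube, assuming the modified 7x7x7,
    stride-2 stem and stride-2, 3x3x3 downsampling in layer2 to layer4."""
    sizes = {"input": n}
    n = conv_out(n, k=7, s=2, p=3)
    sizes["stem"] = n
    sizes["layer1"] = n                      # stride 1, padding 1: size unchanged
    for name in ("layer2", "layer3", "layer4"):
        n = conv_out(n, k=3, s=2, p=1)
        sizes[name] = n
    sizes["avgpool"] = 1                     # adaptive pooling to 1x1x1 -> 512 features
    return sizes


print(trace_r3d18_spatial(64))
# → {'input': 64, 'stem': 32, 'layer1': 32, 'layer2': 16, 'layer3': 8, 'layer4': 4, 'avgpool': 1}
```

Because the adaptive pool collapses whatever spatial size remains, the 512-wide head does not depend on the exact cube size.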

In [32]:
xb, yb = next(iter(DataLoader(train_ds, batch_size=8, num_workers=0)))
print(xb.shape, yb.shape) 

torch.Size([8, 1, 64, 64, 64]) torch.Size([8])


This cell trains the model for 10 epochs with a binary cross-entropy loss on logits (BCEWithLogitsLoss), the AdamW optimizer (learning rate 3e-4), and mixed precision (torch.amp autocast, with GradScaler guarding against fp16 gradient underflow) for speed. Each epoch, we accumulate the average loss and the AUROC (via torchmetrics) and print both. Note that this AUROC is computed on the augmented training batches, not on held-out data; whenever it exceeds the previous best, best_auc is updated and the weights are saved to cnn_baseline.pt.

In [None]:
loss_fn = nn.BCEWithLogitsLoss()
opt     = torch.optim.AdamW(model.parameters(), lr=3e-4)

# loss scaling only matters for fp16 on CUDA; on CPU, autocast uses bfloat16
scaler  = torch.amp.GradScaler(device, enabled=(device == "cuda"))
auroc   = tm.AUROC(task="binary").to(device)

best_auc = 0.0
for epoch in range(10):
    model.train()
    auroc.reset()
    running_loss = 0.0

    for x, y in tqdm(train_dl, leave=False):
        x, y = x.to(device, non_blocking=True), y.to(device, non_blocking=True)
        opt.zero_grad(set_to_none=True)

        with torch.amp.autocast(device_type=device):
            logits = model(x).squeeze(1)   # (B, 1) -> (B,), safe even when B == 1
            loss   = loss_fn(logits, y)
        scaler.scale(loss).backward()
        scaler.step(opt)
        scaler.update()

        running_loss += loss.item() * x.size(0)
        auroc.update(torch.sigmoid(logits.detach()), y)

    epoch_loss = running_loss / len(train_dl.dataset)
    epoch_auc  = auroc.compute().item()   # training AUROC, not validation
    print(f"epoch {epoch:02d} | loss {epoch_loss:.4f} | AUROC {epoch_auc:.3f}")
    if epoch_auc > best_auc:
        best_auc = epoch_auc
        torch.save(model.state_dict(), "cnn_baseline.pt")
        print("  ↳ saved new best model")

  0%|          | 0/112 [00:00<?, ?it/s]

epoch 00 | loss 0.1462 | AUROC 0.979
  ↳ saved new best model



epoch 01 | loss 0.0505 | AUROC 0.994
  ↳ saved new best model



epoch 02 | loss 0.0490 | AUROC 0.994
  ↳ saved new best model



epoch 03 | loss 0.0515 | AUROC 0.994



Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7bbf983ffa60>
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/dataloader.py", line 1618, in __del__
    self._shutdown_workers()
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/data/dataloader.py", line 1601, in _shutdown_workers
    if w.is_alive():
       ^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/process.py", line 160, in is_alive
    assert self._parent_pid == os.getpid(), 'can only test a child process'
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: can only test a child process

epoch 04 | loss 0.0588 | AUROC 0.993



epoch 05 | loss 0.0492 | AUROC 0.993



epoch 06 | loss 0.0423 | AUROC 0.995
  ↳ saved new best model



epoch 07 | loss 0.0744 | AUROC 0.992



epoch 08 | loss 0.0467 | AUROC 0.993



epoch 09 | loss 0.0425 | AUROC 0.993
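The loop above selects checkpoints by training AUROC, which is measured on augmented training batches and tends to be optimistic. A common alternative is to score a held-out loader at the end of every epoch and save on that instead. The sketch below uses a hypothetical helper (evaluate_auroc) built on sklearn's roc_auc_score; using it in the notebook assumes the scan-grouped split is created before training rather than after.

```python
import torch
import torch.nn as nn
from sklearn.metrics import roc_auc_score
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def evaluate_auroc(model, loader, device="cpu"):
    """AUROC of a single-logit binary classifier over a whole loader."""
    was_training = model.training
    model.eval()                                  # BatchNorm/Dropout in inference mode
    probs, labels = [], []
    for x, y in loader:
        logits = model(x.to(device)).squeeze(1)   # (B, 1) -> (B,)
        probs.append(torch.sigmoid(logits).float().cpu())
        labels.append(y.cpu())
    model.train(was_training)                     # restore the previous mode
    return roc_auc_score(torch.cat(labels).numpy(), torch.cat(probs).numpy())


# Toy check: a 1-feature linear "model" whose logit equals the feature.
toy_model = nn.Linear(1, 1)
with torch.no_grad():
    toy_model.weight.fill_(1.0)
    toy_model.bias.zero_()
x = torch.tensor([[-2.0], [-1.0], [1.0], [2.0]])
y = torch.tensor([0.0, 0.0, 1.0, 1.0])
print(evaluate_auroc(toy_model, DataLoader(TensorDataset(x, y), batch_size=2)))
# → 1.0
```

Inside the epoch loop, the checkpoint condition would then compare val_auc = evaluate_auroc(model, val_dl, device) against best_auc instead of the training AUROC.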


In [None]:
import os

ckpt_path = "cnn_baseline.pt"
exists = os.path.exists(ckpt_path)
print("saved?", exists, "size MB:", os.path.getsize(ckpt_path) / 1e6 if exists else 0)

# reload to confirm the state_dict matches the model; this also replaces the
# final-epoch weights with the best checkpoint (highest training AUROC)
state = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(state)

saved? True size MB: 132.984826


<All keys matched successfully>

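This cell creates a scan-grouped train/validation split with GroupShuffleSplit, so that patches from the same seriesuid never appear in both splits. It then rebuilds matching negative samples for each split and constructs LunaPatchDS datasets and DataLoaders (with augmentation for training, none for validation). Note that the model evaluated next is the one trained above on all positive patches, so the validation positives were already seen during training; a genuinely held-out estimate would require retraining on this train_dl before evaluating.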

In [None]:
groups = patch_df['seriesuid'].values
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(gss.split(patch_df, groups=groups))

pos_train = patch_df.iloc[train_idx].reset_index(drop=True)
pos_val   = patch_df.iloc[val_idx].reset_index(drop=True)

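# rebuild negatives for each split: one per positive, drawn from that split's scans
def make_negs(pos_df):
    series = pos_df['seriesuid'].unique()
    # NOTE: patch_file points at a *positive* patch from the same scan and is only a
    # placeholder; LunaPatchDS ignores it and generates Gaussian noise for label 0
    return (pd.DataFrame({'seriesuid': np.random.choice(series, size=len(pos_df))})
            .assign(patch_file=lambda d: d.seriesuid.map(
                lambda s: pos_df[pos_df.seriesuid==s]
                          .sample(1, random_state=0).patch_file.values[0]))
            .assign(label=0))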

neg_train, neg_val = make_negs(pos_train), make_negs(pos_val)

train_ds = LunaPatchDS(pos_train, neg_train, PATCH_DIR, augment=True)
val_ds   = LunaPatchDS(pos_val,   neg_val,   PATCH_DIR, augment=False)

train_dl = DataLoader(train_ds, batch_size=16, shuffle=True,  num_workers=0, pin_memory=True)
val_dl   = DataLoader(val_ds,   batch_size=32, shuffle=False, num_workers=0, pin_memory=True)
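A quick way to confirm that the split really is grouped is to check that no seriesuid appears on both sides. The sketch below runs the same GroupShuffleSplit call on a small synthetic table (the scan IDs are made up) and asserts that the two sets of scans are disjoint; in the notebook, the equivalent check would compare set(pos_train.seriesuid) with set(pos_val.seriesuid).

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Synthetic stand-in for patch_df: 10 scans with 1 to 3 patches each.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"seriesuid": np.repeat([f"scan_{i}" for i in range(10)],
                                           rng.integers(1, 4, size=10))})

gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(gss.split(toy, groups=toy["seriesuid"]))

train_scans = set(toy.iloc[train_idx].seriesuid)
val_scans = set(toy.iloc[val_idx].seriesuid)
assert train_scans.isdisjoint(val_scans), "a scan appears in both splits"
print(len(train_scans), "train scans,", len(val_scans), "val scans")
```

With groups supplied, test_size=0.2 refers to the fraction of scans, not patches, so the patch-level split can deviate from 80/20 when scans contain different numbers of nodules.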

This cell evaluates the model on the validation set: it switches to eval mode, collects sigmoid probabilities under torch.no_grad(), computes the AUROC with torchmetrics, and prints the scan-grouped validation AUROC.

In [None]:
model.eval()
auroc = tm.AUROC(task="binary")
y_true, y_prob = [], []

with torch.no_grad():
    for xb, yb in val_dl:
        # squeeze(1): (B, 1) -> (B,), safe even for a batch of one
        prob = torch.sigmoid(model(xb.to(device)).squeeze(1))
        y_true.append(yb)             # labels stay on the CPU
        y_prob.append(prob.cpu())
y_true = torch.cat(y_true)
y_prob = torch.cat(y_prob)
val_auc = auroc(y_prob, y_true).item()

print(f"VALID AUROC (scan-grouped): {val_auc:.3f}")

VALID AUROC (scan-grouped): 0.986
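A single AUROC value says nothing about its sampling variability. A percentile bootstrap gives a rough interval; the sketch below resamples (label, probability) pairs with replacement and skips resamples that contain only one class. bootstrap_auroc_ci is a hypothetical helper; because patches from the same scan are correlated, resampling whole scans rather than individual patches would give a more honest interval.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def bootstrap_auroc_ci(y_true, y_prob, n_boot=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap (1 - alpha) interval for AUROC over individual samples."""
    y_true, y_prob = np.asarray(y_true), np.asarray(y_prob)
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample with replacement
        if np.unique(y_true[idx]).size < 2:
            continue                              # AUROC undefined for one class
        aucs.append(roc_auc_score(y_true[idx], y_prob[idx]))
    if not aucs:
        raise ValueError("every resample contained a single class")
    lo, hi = np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lo), float(hi)
```

In the notebook this would be called as bootstrap_auroc_ci(y_true.numpy(), y_prob.numpy()).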


This cell reports the validation AUROC together with threshold-based metrics (precision, recall, F1) at thr = 0.50. The AUROC of 0.986 indicates that the model ranks positives above negatives very well, but at the fixed 0.5 threshold it predicted no positives at all, so precision, recall, and F1 are all 0.0 and sklearn warns that precision is ill-defined because there are no predicted samples.

In [None]:
thr = 0.5
y_pred = (y_prob.numpy() >= thr).astype(np.int32)
p, r, f1, _ = precision_recall_fscore_support(y_true.numpy(), y_pred, average='binary')

print(f"thr={thr:.2f}  Precision={p:.3f} Recall={r:.3f} F1={f1:.3f}  AUROC={val_auc:.3f}")

thr=0.50  Precision=0.000 Recall=0.000 F1=0.000  AUROC=0.986


  _warn_prf(average, modifier, msg_start, len(result))
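The conclusion below suggests choosing a decision threshold on the validation set instead of using 0.5. A minimal sketch of two common rules follows, assuming y_true and y_prob are NumPy arrays (e.g. y_true.numpy() and y_prob.numpy() from the cells above): Youden's J picks the ROC point that maximizes TPR − FPR, and the F1 rule picks the precision–recall point with the highest F1. youden_threshold and best_f1_threshold are illustrative helpers. A threshold chosen and scored on the same validation set is optimistic, so ideally it would be confirmed on separate data.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, roc_curve


def youden_threshold(y_true, y_prob):
    """Threshold maximising Youden's J = TPR - FPR."""
    fpr, tpr, thr = roc_curve(y_true, y_prob)
    return float(thr[np.argmax(tpr - fpr)])


def best_f1_threshold(y_true, y_prob):
    """Threshold maximising F1 along the precision-recall curve."""
    prec, rec, thr = precision_recall_curve(y_true, y_prob)
    # prec/rec have one more entry than thr (the final recall=0 point); drop it
    prec, rec = prec[:-1], rec[:-1]
    f1 = 2 * prec * rec / np.clip(prec + rec, 1e-12, None)
    return float(thr[np.argmax(f1)])


# Toy scores that rank perfectly but all sit below 0.5, as in the output above
y_true_toy = np.array([0, 0, 1, 1])
y_prob_toy = np.array([0.10, 0.20, 0.30, 0.40])
t = youden_threshold(y_true_toy, y_prob_toy)
print(t, (y_prob_toy >= t).astype(int))
# → 0.3 [0 0 1 1]
```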


Conclusion. The baseline CNN separates the classes strongly (high AUROC), but its output probabilities are miscalibrated for the 0.5 cutoff, possibly because of class imbalance. Next steps: choose a better decision threshold (e.g., by maximizing F1 or Youden's J on the validation set), inspect the precision–recall curve, and consider class weighting via pos_weight (or focal loss) and probability calibration (temperature scaling or Platt scaling). This should turn the strong ranking into usable operating points for the task. Because the model was trained on all positive patches before the validation split was made, these figures should also be confirmed with a model retrained on the training split only.
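Of the two calibration options, temperature scaling divides every logit by a single T > 0, so it cannot move all-negative logits above zero and would leave the 0.5-threshold predictions unchanged. Platt scaling fits an affine map sigmoid(a·z + b) whose bias b can shift the operating point. A minimal sketch of Platt scaling in PyTorch, fitted with L-BFGS on held-out logits, follows; fit_platt is a hypothetical helper, and in the notebook its inputs would be validation logits and labels, ideally from data not also used to choose the threshold.

```python
import torch
import torch.nn.functional as F


def fit_platt(logits, labels, max_iter=100):
    """Fit scalars a, b so that sigmoid(a * logit + b) is calibrated.

    Minimises binary cross-entropy on held-out (logit, label) pairs.
    """
    logits = logits.detach().float()
    labels = labels.detach().float()
    a = torch.ones(1, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.LBFGS([a, b], lr=1.0, max_iter=max_iter,
                            line_search_fn="strong_wolfe")

    def closure():
        opt.zero_grad()
        loss = F.binary_cross_entropy_with_logits(a * logits + b, labels)
        loss.backward()
        return loss

    opt.step(closure)
    return a.item(), b.item()


# Toy logits that are all negative, so sigmoid(z) < 0.5 everywhere
z = torch.tensor([-4.0, -3.5, -3.0, -2.5, -2.0, -1.5, -1.0, -0.5])
y = torch.tensor([0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0])
a, b = fit_platt(z, y)
p = torch.sigmoid(a * z + b)
print(f"a={a:.2f} b={b:.2f} mean calibrated prob={p.mean().item():.3f}")
```

As long as the fitted a is positive, the ranking of the scores, and hence the AUROC, is unchanged; only the meaning of the 0.5 cutoff moves.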