In [None]:
'''
PyTorch training pipeline — detailed, step-by-step flow

Nice — below is a thorough, practical walk-through of the typical training pipeline when you build and train a neural network with PyTorch. I’ll highlight what to do, why it matters, the common PyTorch API pieces you’ll use, and helpful tips/pitfalls.

High level (one-line)

Prepare data → build model → choose loss & optimizer → training loop (forward → loss → backward → update) → validation & checkpoints → test & deploy.

Detailed steps
1) Problem & data understanding

What: Know the task (classification, regression, segmentation) and inspect the dataset size, labels, imbalance, input shapes.

Why: Architecture, loss, metrics, and augmentation choices depend on this.

Tips: Plot samples, class counts, example inputs/labels.

2) Data preparation & transforms

What: Clean data, split into train/val/test, apply transforms (normalization, augmentation).

PyTorch APIs: torch.utils.data.Dataset (custom) or torchvision.datasets, torchvision.transforms.

Why: Augmentation improves generalization; normalization speeds convergence.

Tips: Keep validation/test transforms deterministic (no random flip in val/test). Use random_split or sklearn's train_test_split.

3) DataLoader & batching

What: Wrap Dataset with DataLoader for batching, shuffling, parallel loading.

PyTorch APIs: DataLoader(dataset, batch_size=..., shuffle=True, num_workers=...)

Why: Efficient I/O and GPU utilization.

Tips: pin_memory=True for CUDA, tune num_workers. For distributed training use DistributedSampler.

4) Define model (nn.Module)

What: Create network by subclassing torch.nn.Module and implementing forward.

PyTorch APIs: torch.nn.Module, torch.nn.Sequential, layers in torch.nn (Conv2d, Linear, etc.)

Why: Encapsulates weights, forward pass, and state_dict() for saving.

Tips: Keep forward pure (no optimizer steps). Use model.to(device).

5) Loss function & metrics

What: Choose an appropriate loss (CrossEntropyLoss for multiclass, BCEWithLogitsLoss, MSELoss, etc.) and metrics (accuracy, F1).

PyTorch APIs: torch.nn.CrossEntropyLoss(), custom metric code.

Why: Loss guides training; metrics evaluate real performance.

Tips: For class imbalance, use weights in loss or focal loss.

6) Optimizer & schedulers

What: Choose an optimizer (SGD, Adam) and optionally a learning rate scheduler (StepLR, CosineAnnealing, ReduceLROnPlateau).

PyTorch APIs: torch.optim.SGD, Adam, torch.optim.lr_scheduler.*

Why: Optimizer updates parameters; scheduler helps converge or escape plateaus.

Tips: Use weight_decay for L2 regularization; try optimizer.zero_grad(set_to_none=True) for slight perf gains.

7) Device setup (CPU/GPU/multiple GPUs)

What: Move model and tensors to the correct device: device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Why: GPU drastically speeds training.

Tips: For multi-GPU: nn.DataParallel (simple) or torch.distributed (scalable); use torch.cuda.amp for mixed precision.

8) Training loop (core)

For each epoch:

model.train()

For each batch:

Move inputs/targets to device.

y_pred = model(inputs) ← forward

loss = criterion(y_pred, targets)

loss.backward() ← autograd computes grads

Optionally clip gradients: torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

optimizer.step() ← update parameters

optimizer.zero_grad() or model.zero_grad() ← clear grads

Track running loss and training metrics.

Why: This is the forward/backward/update cycle that trains weights.

Tips & gotchas:

Always zero gradients BEFORE the next backward pass (or right after optimizer.step()).

Use loss.item() for logging (don’t keep tensors on GPU).

For numerical stability with large models, consider gradient clipping and LR warmup.

For speed & memory: use with torch.cuda.amp.autocast(): and GradScaler().

9) Validation loop

What: Periodically run a validation loop using held-out val set.

Steps: model.eval(), run forward under with torch.no_grad(), compute loss & metrics.

Why: Detect overfitting, tune hyperparameters, decide checkpoint saving.

Tips: Don’t call loss.backward() here. Use torch.no_grad() to save memory.

10) Checkpointing & early stopping

What: Save model weights, optimizer state, epoch, best metric: torch.save({'model':model.state_dict(), 'opt': optimizer.state_dict(), ...}, path)

Why: Resume training, keep best model, protect from crashes.

Tips: Save best on validation metric; use EarlyStopping to stop when no improvement.

11) Testing & inference

What: Load best checkpoint, model.eval(), use torch.no_grad() and perform predictions on test set.

APIs: model.load_state_dict(torch.load(PATH)['model'])

Tips: For production, export torch.jit.trace/script or convert to ONNX/TorchScript.

12) Logging, monitoring & reproducibility

Logging: TensorBoard (torch.utils.tensorboard), Weights & Biases, simple CSV logs.

Reproducibility: torch.manual_seed, np.random.seed, random.seed; set torch.backends.cudnn.deterministic=True (may slow training).

Profiling: torch.profiler or nvprof to find bottlenecks.

Minimal training pseudocode (compact)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = MyModel(...).to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()      # optional, for mixed precision

for epoch in range(num_epochs):
    model.train()
    for X, y in train_loader:
        X, y = X.to(device), y.to(device)
        optimizer.zero_grad(set_to_none=True)

        with torch.cuda.amp.autocast():   # optional
            preds = model(X)
            loss = criterion(preds, y)

        scaler.scale(loss).backward()     # scale grads
        scaler.unscale_(optimizer)        # optionally clip grads
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)
        scaler.update()

    # validation
    model.eval()
    with torch.no_grad():
        for X_val, y_val in val_loader:
            ...
    # save checkpoint if improved

Checklist of common pitfalls

Forgetting optimizer.zero_grad() → gradients accumulate.

Using model.train()/model.eval() wrong (affects dropout, batchnorm).

Doing validation without torch.no_grad() → uses extra memory.

Not moving tensors to device or mixing CPU/GPU tensors.

Saving model vs model.state_dict() incorrectly — prefer state_dict() for portability.

Logging GPU tensors directly (use .item() or .cpu().numpy()).

Advanced topics (briefly)

Mixed precision training: torch.cuda.amp for speed & memory.

Distributed training: torch.distributed + DistributedDataParallel.

Grad accumulation: accumulate gradients to emulate larger batch sizes.

Custom losses/metrics: implement carefully and validate gradients (use gradcheck if needed).

Hyperparameter tuning: use Optuna / Ray Tune.
'''

In [None]:
'''
simple workflow
breast cancer detection
'''

In [2]:
import numpy as np
import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder

In [3]:
df = pd.read_csv('https://raw.githubusercontent.com/gscdit/Breast-Cancer-Detection/refs/heads/master/data.csv')
df.head()

Unnamed: 0,id,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,...,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst,Unnamed: 32
0,842302,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,...,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189,
1,842517,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,...,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902,
2,84300903,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,...,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758,
3,84348301,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,...,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173,
4,84358402,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,...,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678,


In [4]:
df.shape

(569, 33)

In [5]:
df.drop(columns=['id', 'Unnamed: 32'], inplace= True)

In [6]:
df.head()

Unnamed: 0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
0,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,25.38,17.33,184.6,2019.0,0.1622,0.6656,0.7119,0.2654,0.4601,0.1189
1,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,24.99,23.41,158.8,1956.0,0.1238,0.1866,0.2416,0.186,0.275,0.08902
2,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,23.57,25.53,152.5,1709.0,0.1444,0.4245,0.4504,0.243,0.3613,0.08758
3,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,14.91,26.5,98.87,567.7,0.2098,0.8663,0.6869,0.2575,0.6638,0.173
4,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,22.54,16.67,152.2,1575.0,0.1374,0.205,0.4,0.1625,0.2364,0.07678


In [7]:
X_train, X_test, y_train, y_test = train_test_split(df.iloc[:, 1:], df.iloc[:, 0], test_size=0.2)

In [8]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [9]:
X_train

array([[-0.50930538, -0.62750296, -0.51780784, ..., -0.41253165,
        -0.47072385, -0.41678628],
       [ 0.29962615, -0.65784672,  0.34036895, ...,  0.44625691,
        -0.51572762, -0.32289991],
       [ 0.09666842,  1.81633739,  0.03186302, ..., -0.89003042,
        -0.90951057, -1.04913857],
       ...,
       [-0.73255888, -1.11300324, -0.70888791, ..., -0.68111507,
        -0.93522701, -0.12684309],
       [ 0.87370659,  1.85368357,  0.82943291, ...,  0.64977592,
         0.15932888,  0.38456147],
       [ 0.21554366, -1.0523157 ,  0.13329539, ..., -0.5212293 ,
        -0.92719062, -1.22697039]])

In [10]:
y_train

Unnamed: 0,diagnosis
74,B
128,B
462,B
218,M
544,B
...,...
49,B
269,B
540,B
460,M


### Label Encoding

In [11]:
encoder = LabelEncoder()
y_train = encoder.fit_transform(y_train)
y_test = encoder.transform(y_test)

In [12]:
y_train

array([0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0,
       0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0,
       1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0,
       1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
       1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1,
       1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0,
       0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0,
       1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0,
       0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1,
       0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1,
       1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1,
       0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1,

### Numpy arrays to PyTorch tensors

In [13]:
X_train_tensor = torch.from_numpy(X_train)
X_test_tensor = torch.from_numpy(X_test)
y_train_tensor = torch.from_numpy(y_train)
y_test_tensor = torch.from_numpy(y_test)

In [14]:
X_train_tensor.shape

torch.Size([455, 30])

In [15]:
y_train_tensor.shape

torch.Size([455])

### Defining the model

In [16]:
class MySimpleNN():

  def __init__(self, X):

    self.weights = torch.rand(X.shape[1], 1, dtype=torch.float64, requires_grad=True)
    self.bias = torch.zeros(1, dtype=torch.float64, requires_grad=True)

  def forward(self, X):
    z = torch.matmul(X, self.weights) + self.bias
    y_pred = torch.sigmoid(z)
    return y_pred

  def loss_function(self, y_pred, y):
    # Clamp predictions to avoid log(0)
    epsilon = 1e-7
    y_pred = torch.clamp(y_pred, epsilon, 1 - epsilon)

    # Calculate loss
    loss = -(y_train_tensor * torch.log(y_pred) + (1 - y_train_tensor) * torch.log(1 - y_pred)).mean()
    return loss

### Important Parameters

In [17]:
learning_rate = 0.1
epochs = 25

### Training Pipeline

In [18]:
# create model
model = MySimpleNN(X_train_tensor)

# define loop
for epoch in range(epochs):

  # forward pass
  y_pred = model.forward(X_train_tensor)

  # loss calculate
  loss = model.loss_function(y_pred, y_train_tensor)

  # backward pass
  loss.backward()

  # parameters update
  with torch.no_grad():
    model.weights -= learning_rate * model.weights.grad
    model.bias -= learning_rate * model.bias.grad

  # zero gradients
  model.weights.grad.zero_()
  model.bias.grad.zero_()

  # print loss in each epoch
  print(f'Epoch: {epoch + 1}, Loss: {loss.item()}')

Epoch: 1, Loss: 4.38720349610774
Epoch: 2, Loss: 4.3064637965838815
Epoch: 3, Loss: 4.219234484340734
Epoch: 4, Loss: 4.127262637117025
Epoch: 5, Loss: 4.029632605264835
Epoch: 6, Loss: 3.925269493199016
Epoch: 7, Loss: 3.8172892820948428
Epoch: 8, Loss: 3.7059938579153586
Epoch: 9, Loss: 3.59126098533363
Epoch: 10, Loss: 3.4724763648630006
Epoch: 11, Loss: 3.351182831927663
Epoch: 12, Loss: 3.2227265382505816
Epoch: 13, Loss: 3.089629329690727
Epoch: 14, Loss: 2.9496049594353906
Epoch: 15, Loss: 2.8063874492780365
Epoch: 16, Loss: 2.6580680387925293
Epoch: 17, Loss: 2.50245246815276
Epoch: 18, Loss: 2.3477335130658012
Epoch: 19, Loss: 2.1936534483779986
Epoch: 20, Loss: 2.040964512114853
Epoch: 21, Loss: 1.8916123178384388
Epoch: 22, Loss: 1.7471636198274318
Epoch: 23, Loss: 1.607759533752965
Epoch: 24, Loss: 1.4723537005755105
Epoch: 25, Loss: 1.347598539001185


In [19]:
model.bias

tensor([-0.0391], dtype=torch.float64, requires_grad=True)

In [20]:
# model evaluation
with torch.no_grad():
  y_pred = model.forward(X_test_tensor)
  y_pred = (y_pred > 0.9).float()
  accuracy = (y_pred == y_test_tensor).float().mean()
  print(f'Accuracy: {accuracy.item()}')

Accuracy: 0.549861490726471
