# Neural Net Demonstration with PyTorch


In this example, we use the breast cancer dataset to show how PyTorch neural networks can be applied to a binary classification problem. The goal is to predict the breast cancer diagnosis (0 = benign, 1 = malignant) from cellular features extracted from breast mass images.

## Learning Objectives:
- Set up and verify the PyTorch environment
- Load and preprocess medical data for deep learning
- Build a simple neural network for binary classification
- Train and evaluate the model using appropriate classification metrics

## 1. Environment Setup and Imports

First, let's verify our PyTorch installation and import necessary libraries.

In [None]:
# Standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# PyTorch imports
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# Sklearn for preprocessing and metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report

# Check PyTorch version and CUDA availability
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Device: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'CPU'}")

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

## 2. Load Breast Cancer Dataset

For this demo, we'll use the breast cancer dataset, which contains features computed from digitised images of breast masses. Each sample describes characteristics of the cell nuclei present in the image.

In [None]:
# Load breast cancer dataset from local CSV file
cancer_data = pd.read_csv('./breast_cancer.csv')

# Display basic information
print(f"Dataset shape: {cancer_data.shape}")
print(f"\nColumn names: {cancer_data.columns.tolist()}")
print(f"\nDiagnosis distribution:")
print(cancer_data['diagnosis'].value_counts())
print(f"\nClass balance:")
print(cancer_data['diagnosis'].value_counts(normalize=True))

# Show first few rows
cancer_data.head()

## 3. Data Preprocessing

The preprocessing step is crucial for neural network training. We separate our features (cellular characteristics) from our target variable (diagnosis), split the data into training and testing sets to evaluate model performance, and standardise all features to zero mean and unit variance. The scaler is fitted on the training set only and then applied to the test set, so no information from the test data leaks into training. Standardisation ensures that no single feature dominates the learning process because of its scale. Finally, we convert our data to PyTorch tensors and create DataLoaders that batch the data and shuffle the training set during training.

In [None]:
# Separate features and target
X = cancer_data.drop('diagnosis', axis=1).values
y = cancer_data['diagnosis'].values

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Standardise features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Convert to PyTorch tensors
X_train_tensor = torch.tensor(X_train_scaled, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test_scaled, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32)

# Create PyTorch datasets and dataloaders
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size)

print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"Number of features: {X_train.shape[1]}")
print(f"Training set class distribution: {np.bincount(y_train.astype(int))}")
print(f"Test set class distribution: {np.bincount(y_test.astype(int))}")
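As a rough illustration of the standardisation step, the sketch below reproduces by hand what `StandardScaler` does: the mean and standard deviation are computed on the training rows only and then reused for the test rows. The toy arrays and variable names are made up for illustration and are not taken from the breast cancer dataset.

```python
import numpy as np

# Toy data (hypothetical): two features on very different scales
toy_train = np.array([[1.0, 100.0],
                      [2.0, 200.0],
                      [3.0, 300.0]])
toy_test = np.array([[2.0, 400.0]])

# Statistics come from the training rows only
train_mean = toy_train.mean(axis=0)
train_std = toy_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# The same statistics are applied to both splits
scaled_train = (toy_train - train_mean) / train_std
scaled_test = (toy_test - train_mean) / train_std

print("Scaled training data:\n", scaled_train)
print("Scaled test data:\n", scaled_test)
```

After scaling, each training column has zero mean and unit variance. The test row is expressed in the same units, so its second feature lands well above the training range rather than being re-centred.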

**What is a tensor?**

A tensor is a mathematical object that generalises familiar concepts like numbers, vectors, and matrices. Think of it as a container for data that can have different dimensions - a single number (0D), a list of numbers (1D), a table of numbers (2D), or even higher-dimensional arrangements. In machine learning, tensors are fundamental because they provide a unified way to represent and manipulate the complex, multi-dimensional data that neural networks process, from simple input features to the intricate weight matrices that define how networks learn and make predictions.
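The dimensions described above can be inspected directly in PyTorch. This is a small sketch with made-up values; the variable names are illustrative only.

```python
import torch

scalar = torch.tensor(3.5)                       # 0D: a single number
vector = torch.tensor([1.0, 2.0, 3.0])           # 1D: a list of numbers
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # 2D: a table of numbers
stack = torch.zeros(4, 2, 2)                     # 3D: four 2x2 tables

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("stack", stack)]:
    print(f"{name}: ndim={t.ndim}, shape={tuple(t.shape)}")
```

In this notebook, `X_train_tensor` is a 2D tensor (samples by features) and `y_train_tensor` is a 1D tensor of labels.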

## 4. Define the Neural Network Model

Create a simple feedforward neural network for breast cancer diagnosis prediction.

**Neural Network Architecture Explanation:**

Our `BreastCancerNet` is a feedforward neural network with four fully connected (linear) layers that progressively reduce the dimensionality: input → 64 → 32 → 16 → 1 neuron. This architecture creates a funnel-like structure that learns to extract increasingly abstract features from the cellular characteristics to predict diagnosis. We use ReLU (Rectified Linear Unit) activation functions between layers to introduce non-linearity, allowing the network to learn complex patterns. Dropout layers (20% probability) are included to prevent overfitting by randomly setting some neurons to zero during training, forcing the network to be more robust and generalisable. The final layer uses a sigmoid activation function to output probabilities between 0 and 1 for binary classification.
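To make the dropout behaviour concrete, the following sketch applies a standalone `nn.Dropout(0.2)` layer to a vector of ones in both training and evaluation mode. The input vector and seed are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the random mask is repeatable

demo_dropout = nn.Dropout(p=0.2)
ones = torch.ones(1000)

# Training mode: roughly 20% of values are zeroed, the rest are scaled by 1 / (1 - p)
demo_dropout.train()
train_out = demo_dropout(ones)

# Evaluation mode: dropout is a no-op
demo_dropout.eval()
eval_out = demo_dropout(ones)

zero_fraction = (train_out == 0).float().mean().item()
print(f"Fraction zeroed in training mode: {zero_fraction:.2f}")
print(f"Value of surviving elements: {train_out.max().item():.2f}")
print(f"Unchanged in eval mode: {torch.equal(eval_out, ones)}")
```

The rescaling of surviving values keeps the expected activation the same in both modes, which is why the model must be switched with `model.train()` and `model.eval()` at the right times.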

In [None]:
class BreastCancerNet(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        
        # Define layers with progressively decreasing dimensions
        self.fc1 = nn.Linear(input_dim, 64)  # First hidden layer
        self.fc2 = nn.Linear(64, 32)         # Second hidden layer  
        self.fc3 = nn.Linear(32, 16)         # Third hidden layer
        self.fc4 = nn.Linear(16, 1)          # Output layer
        
        # Activation and regularisation
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)
        self.sigmoid = nn.Sigmoid()         # For binary classification
        
    def forward(self, x):
        # Forward pass through the network
        x = self.relu(self.fc1(x))  # Apply ReLU activation to first layer
        x = self.dropout(x)         # Apply dropout for regularization
        x = self.relu(self.fc2(x))  # Apply ReLU activation to second layer
        x = self.dropout(x)         # Apply dropout for regularization
        x = self.relu(self.fc3(x))  # Apply ReLU activation to third layer
        x = self.fc4(x)             # Final linear transformation
        x = self.sigmoid(x)         # Sigmoid activation for binary classification
        return x

# Initialise model
input_dim = X_train.shape[1]  # Number of features (30 in our case)
model = BreastCancerNet(input_dim=input_dim).to(device)

# Display model architecture
print(model)
print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters())}")
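The parameter total printed above can be checked by hand. Each linear layer has `inputs * outputs` weights plus one bias per output. The sketch below assumes 30 input features, as noted in the comment on `input_dim`; with a different feature count the first layer changes accordingly.

```python
# Layer widths: input -> 64 -> 32 -> 16 -> 1 (30 inputs is an assumption)
layer_sizes = [30, 64, 32, 16, 1]

expected_params = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    layer_params = n_in * n_out + n_out  # weights + biases
    print(f"Linear({n_in}, {n_out}): {layer_params} parameters")
    expected_params += layer_params

print(f"Total: {expected_params}")
```

ReLU, dropout, and sigmoid have no trainable parameters, so they do not contribute to the count.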

## 5. Training Setup

Configure loss function, optimiser, and training parameters.

**Key Components Explained:**

**BCE (Binary Cross-Entropy):** Our loss function that measures the difference between predicted probabilities and actual binary labels. BCE is ideal for binary classification problems as it penalises confident wrong predictions more heavily and provides smooth gradients for optimisation.

**Adam Optimiser:** An adaptive learning rate optimiser that combines the benefits of momentum and RMSprop. Adam automatically adjusts the learning rate for each parameter individually, making it very effective for training neural networks with minimal hyperparameter tuning.

**Training Functions:** The `train_epoch` function performs forward propagation (calculating predictions), computes the loss, and uses backpropagation to update model weights. The `validate` function evaluates model performance on unseen data without updating weights, helping us monitor for overfitting.
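To see how binary cross-entropy penalises confident wrong predictions, the sketch below computes the loss by hand for three made-up predictions and compares the mean with `nn.BCELoss`. The probabilities and labels are illustrative values, not model outputs.

```python
import torch
import torch.nn as nn

# Hypothetical predictions: correct and confident, mildly wrong, confidently wrong
demo_probs = torch.tensor([0.9, 0.6, 0.99])
demo_labels = torch.tensor([1.0, 0.0, 0.0])

# BCE per sample: -[y * log(p) + (1 - y) * log(1 - p)]
per_sample = -(demo_labels * torch.log(demo_probs)
               + (1 - demo_labels) * torch.log(1 - demo_probs))

manual_loss = per_sample.mean()
library_loss = nn.BCELoss()(demo_probs, demo_labels)

print("Per-sample loss:", per_sample)
print("Manual mean:", manual_loss.item())
print("nn.BCELoss:", library_loss.item())
```

The third sample, predicted at 0.99 for a true label of 0, contributes far more to the loss than the second, predicted at 0.6 for the same label.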

In [None]:
# Loss function and optimizer
criterion = nn.BCELoss()  # Binary Cross-Entropy for binary classification
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training parameters
num_epochs = 50
train_losses = []
val_losses = []

# Training function
def train_epoch(model, dataloader, criterion, optimizer, device):
    model.train()
    total_loss = 0
    
    for batch_X, batch_y in dataloader:
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)
        
        # Forward pass; squeeze(1) removes only the output dimension, so a batch of size 1 keeps shape [1]
        outputs = model(batch_X).squeeze(1)
        loss = criterion(outputs, batch_y)
        
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        total_loss += loss.item()
    
    return total_loss / len(dataloader)

# Validation function
def validate(model, dataloader, criterion, device):
    model.eval()
    total_loss = 0
    
    with torch.no_grad():
        for batch_X, batch_y in dataloader:
            batch_X, batch_y = batch_X.to(device), batch_y.to(device)
            outputs = model(batch_X).squeeze(1)
            loss = criterion(outputs, batch_y)
            total_loss += loss.item()
    
    return total_loss / len(dataloader)

## 6. Train the Model

Train the neural network, calling `train_epoch` and `validate` once per epoch and recording both losses. In this demo the test set doubles as the validation set.

In [None]:
# Training loop
print("Starting training...")
print("-" * 50)

for epoch in range(num_epochs):
    train_loss = train_epoch(model, train_loader, criterion, optimizer, device)
    val_loss = validate(model, test_loader, criterion, device)
    
    train_losses.append(train_loss)
    val_losses.append(val_loss)
    
    if (epoch + 1) % 10 == 0:
        print(f"Epoch [{epoch+1}/{num_epochs}] - Train Loss: {train_loss:.4f}, Val Loss: {val_loss:.4f}")

print("-" * 50)
print("Training completed!")

## 7. Visualise Training Progress

**What we expect to see:** 
Both training and validation loss should decrease over time, indicating the model is learning. Ideally, both curves should converge to a similar low value. If the training loss continues decreasing while validation loss increases, this indicates overfitting. If both curves plateau at a high value, the model may be underfitting and need more complexity or different hyperparameters.

In [None]:
# Plot training history
plt.figure(figsize=(10, 6))
plt.plot(train_losses, label='Training Loss')
plt.plot(val_losses, label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('BCE Loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.grid(True)
plt.show()
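One common way to act on the overfitting pattern described above is early stopping: halt training once the validation loss has failed to improve for a set number of epochs. The helper below is a hypothetical sketch, not part of this notebook's training loop, and the loss values are invented to show the behaviour.

```python
def early_stop_epoch(losses, patience=5):
    """Return the 0-based epoch at which training would stop, or None.

    Training stops once the loss has not improved on its best value
    for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Invented validation losses: improving, then drifting upwards
overfitting_curve = [0.60, 0.50, 0.40, 0.41, 0.42, 0.43]
healthy_curve = [0.60, 0.50, 0.40, 0.35, 0.33, 0.32]

stop_overfit = early_stop_epoch(overfitting_curve, patience=3)
stop_healthy = early_stop_epoch(healthy_curve, patience=3)
print("Overfitting curve stops at epoch:", stop_overfit)
print("Healthy curve stops at epoch:", stop_healthy)
```

The same function could be called on `val_losses` after each epoch, although a separate validation split would be needed for the test set to remain untouched.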

## 8. Model Evaluation

Evaluate the model's performance on the test set using classification metrics.

**Metrics Explained:**

**Accuracy:** The proportion of correct predictions out of all predictions. For medical diagnosis, high accuracy is important, but it can hide poor performance on the rarer class, so it should not be read in isolation.

**Precision:** The proportion of positive predictions that were actually correct. In a medical context, this is the fraction of cases flagged as malignant that truly are malignant.

**Recall (Sensitivity):** The proportion of actual positive cases that were correctly identified. This is crucial in medical diagnosis as we want to catch as many cancer cases as possible.

**F1-Score:** The harmonic mean of precision and recall, providing a balanced measure when both are important.

**Confusion Matrix:** Shows the breakdown of correct and incorrect predictions for each class, helping identify if the model has bias towards certain predictions.
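The relationship between these metrics and the confusion matrix can be worked through by hand. The counts below are invented for illustration and are not results from this model.

```python
# Hypothetical confusion-matrix counts (positive class = malignant)
tp, fp, fn, tn = 40, 2, 3, 69

accuracy_by_hand = (tp + tn) / (tp + fp + fn + tn)
precision_by_hand = tp / (tp + fp)   # of predicted positives, how many were right
recall_by_hand = tp / (tp + fn)      # of actual positives, how many were found
f1_by_hand = (2 * precision_by_hand * recall_by_hand
              / (precision_by_hand + recall_by_hand))

print(f"Accuracy:  {accuracy_by_hand:.4f}")
print(f"Precision: {precision_by_hand:.4f}")
print(f"Recall:    {recall_by_hand:.4f}")
print(f"F1-Score:  {f1_by_hand:.4f}")
```

In this made-up case the three false negatives are missed cancer cases, which is why recall often deserves the closest attention in a diagnostic setting.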

In [None]:
# Make predictions
model.eval()
with torch.no_grad():
    X_test_device = X_test_tensor.to(device)
    prediction_probs = model(X_test_device).squeeze(1).cpu().numpy()
    predictions = (prediction_probs > 0.5).astype(int)

# Calculate classification metrics
accuracy = accuracy_score(y_test, predictions)
precision = precision_score(y_test, predictions)
recall = recall_score(y_test, predictions)
f1 = f1_score(y_test, predictions)

print(f"Test Set Performance:")
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1-Score: {f1:.4f}")

print("\nDetailed Classification Report:")
print(classification_report(y_test, predictions, target_names=['Benign', 'Malignant']))

# Confusion Matrix
cm = confusion_matrix(y_test, predictions)
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['Benign', 'Malignant'], 
            yticklabels=['Benign', 'Malignant'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

# Histogram of predicted probabilities, split by true class
plt.figure(figsize=(10, 6))
plt.hist(prediction_probs[y_test == 0], bins=30, alpha=0.7, label='Benign', color='blue')
plt.hist(prediction_probs[y_test == 1], bins=30, alpha=0.7, label='Malignant', color='red')
plt.axvline(x=0.5, color='black', linestyle='--', label='Decision Threshold')
plt.xlabel('Predicted Probability')
plt.ylabel('Frequency')
plt.title('Distribution of Predicted Probabilities by True Class')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

## 9. Save the Model to a Local Folder

You would typically save the model to a local file so that it can be reloaded later for further training or for predictions on new data.

In [None]:
# Save the trained model as a .pth file in the current directory
model_path = 'breast_cancer_model.pth'
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'scaler': scaler,
    'input_dim': X_train.shape[1],
    'train_losses': train_losses,
    'val_losses': val_losses
}, model_path)


print(f"Model saved to {model_path}")

Load the model back in as follows:

In [None]:
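# Load the checkpoint. map_location='cpu' lets a checkpoint saved on a GPU load on a CPU-only machine.
# weights_only=False is needed because the checkpoint also holds the pickled StandardScaler;
# only use it with files you trust.
checkpoint = torch.load('./breast_cancer_model.pth', map_location='cpu', weights_only=False)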

# Grab the input dimensions and initialise the model
input_dim = checkpoint['input_dim']
model = BreastCancerNet(input_dim=input_dim)

# Load the model state dictionary
model.load_state_dict(checkpoint['model_state_dict'])

# Set to evaluation mode for predictions (this disables dropout so all neurons are active)
# Skip this step if you intend to continue training
model.eval()

# Move to appropriate device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
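A quick way to confirm that a reload restores the weights exactly is to compare outputs before and after. The sketch below does this with a tiny stand-in layer and an in-memory buffer instead of a file; the `demo_` names are hypothetical and the layer is not the `BreastCancerNet` model.

```python
import io

import torch
import torch.nn as nn

# Stand-in model (hypothetical): a single linear layer
demo_original = nn.Linear(4, 1)

# Save only the state dict, here to an in-memory buffer rather than disk
demo_buffer = io.BytesIO()
torch.save({'model_state_dict': demo_original.state_dict()}, demo_buffer)
demo_buffer.seek(0)

# A checkpoint holding only tensors can be loaded with weights_only=True
demo_checkpoint = torch.load(demo_buffer, map_location='cpu', weights_only=True)

# A freshly initialised layer has different weights until the state dict is loaded
demo_restored = nn.Linear(4, 1)
demo_restored.load_state_dict(demo_checkpoint['model_state_dict'])

demo_input = torch.randn(3, 4)
outputs_match = torch.equal(demo_original(demo_input), demo_restored(demo_input))
print("Outputs match after reload:", outputs_match)
```

The same comparison could be run on the real model by keeping a few test predictions from before saving and checking them against the reloaded model in evaluation mode.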

### Potential improvements:
- Experiment with different network architectures
- Try different optimisation techniques
- Implement cross-validation for more robust evaluation
- Add feature importance analysis
- Experiment with different threshold values for classification
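
As a starting point for the threshold experiment in the last item, the sketch below sweeps three thresholds over a handful of invented probabilities and reports precision and recall at each. The data and the helper name `precision_recall_at` are hypothetical; in the notebook, `prediction_probs` and `y_test` would take their place.

```python
def precision_recall_at(threshold, probs, labels):
    """Return (precision, recall) when predicting positive for probs > threshold."""
    preds = [1 if p > threshold else 0 for p in probs]
    tp = sum(1 for pred, y in zip(preds, labels) if pred == 1 and y == 1)
    fp = sum(1 for pred, y in zip(preds, labels) if pred == 1 and y == 0)
    fn = sum(1 for pred, y in zip(preds, labels) if pred == 0 and y == 1)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Invented example: four benign (0) and four malignant (1) cases
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_probs = [0.1, 0.2, 0.4, 0.6, 0.35, 0.55, 0.8, 0.9]

sweep = {}
for threshold in (0.3, 0.5, 0.7):
    sweep[threshold] = precision_recall_at(threshold, toy_probs, toy_labels)
    precision, recall = sweep[threshold]
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

Lowering the threshold raises recall at the cost of precision, which may be the preferred trade-off when missing a malignant case is more costly than a false alarm.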