# 📘 MNIST Curve Adjustment Layers (CALs): Experimental Baseline

## 🧠 Overview

In this notebook, we establish a **strong and interpretable control model** on the **MNIST handwritten digit classification task**, forming the **foundation for our exploration of Curve Adjustment Layers (CALs)**, a novel technique aimed at improving neural network convergence and generalization.

CALs act as *intermediary modules* between neural layers, applying **learned deformation curves** to activations. These curves serve to *bend the output space* toward more optimal configurations, enabling the model to adaptively reshape internal representations.
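
To make the idea of a learned deformation curve concrete, the sketch below shows one form such a module could take. It is purely illustrative and is not the CAL design studied in this project: the class name `CurveAdjustSketch`, the piecewise-linear parameterization, the knot count, and the knot range are all assumptions. The curve starts as the identity, so inserting the module leaves a network's output unchanged until training moves the knots.

```python
import torch
import torch.nn as nn


class CurveAdjustSketch(nn.Module):
    """Hypothetical elementwise learnable curve (NOT the notebook's CAL definition).

    Computes y = x + delta(x), where delta is a piecewise-linear function
    defined by learnable offsets at fixed, evenly spaced knots.
    """

    def __init__(self, num_knots: int = 8, lo: float = -3.0, hi: float = 3.0):
        super().__init__()
        self.lo, self.hi = lo, hi
        # Fixed knot positions (a buffer: moves with the module, but is not trained).
        self.register_buffer("knots", torch.linspace(lo, hi, num_knots))
        # Learnable offset at each knot; zeros make the curve the identity at init.
        self.offsets = nn.Parameter(torch.zeros(num_knots))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Inputs outside [lo, hi] reuse the offset of the nearest end knot.
        xc = x.clamp(self.lo, self.hi)
        # Index of the right-hand knot of the segment containing each value.
        idx = torch.bucketize(xc, self.knots).clamp(1, self.knots.numel() - 1)
        x0, x1 = self.knots[idx - 1], self.knots[idx]
        d0, d1 = self.offsets[idx - 1], self.offsets[idx]
        # Linear interpolation of the offsets between the two surrounding knots.
        t = (xc - x0) / (x1 - x0)
        return x + d0 + t * (d1 - d0)


layer = CurveAdjustSketch()
activations = torch.randn(4, 16)
adjusted = layer(activations)  # equals the input at initialization
```

Parameterizing the curve as an offset from the identity is one way to keep a control model and a modified model comparable at the start of training.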

This notebook provides:
- ✅ A solid CNN-based reference model
- 📥 MNIST dataset loading and visualization
- 🧰 Preliminaries for future CAL insertion and ablation studies

---


## Install any packages needed ...

In [15]:
!pip install matplotlib


In [None]:
import os
from PIL import Image
import torch.optim as optim
import matplotlib.pyplot as plt
from torchvision import datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, Dataset

# Grab the MNIST Dataset:
---

The MNIST dataset is a collection of 70,000 handwritten digits (0-9) that is commonly used for training various image processing systems.


## 📊 Dataset Summary: MNIST

- **Type**: Grayscale handwritten digits
- **Size**: `60,000` training / `10,000` test samples
- **Resolution**: `28×28` pixels
- **Classes**: Digits `0–9`

---

In [None]:
# Download MNIST dataset to a local directory:
# M:\dev\ml\data

data_path = os.path.join('..', 'data')
mnist_train = datasets.MNIST( data_path, train=True, download=True )
mnist_test  = datasets.MNIST( data_path, train=False, download=True )
print( "MNIST dataset downloaded to:", data_path )

count_train = len( mnist_train )
count_test = len( mnist_test )

MNIST dataset downloaded to: ..\data


# Examine the data:
---

Read data samples and view the image and label ...

In [None]:
import torch
print( "Number of training samples:", count_train )
print( "Number of test samples:", count_test )

# Examine a randomly chosen training sample:
idx = torch.randint( 0, count_train, ( ) ).item( )
img, label = mnist_train[ idx ]
print( "Image:", img, "Label:", label )

# Display the sampled image (opens in the system image viewer):
img.show( )

Number of training samples: 60000
Number of test samples: 10000
Image: <PIL.Image.Image image mode=L size=28x28 at 0x1627A6B94C0> Label: 1


## 📦 Data Loader & Hyperparameter Setup

To train our CNN model efficiently, we define a few key **training hyperparameters** and configure a **data preprocessing pipeline** that prepares the MNIST dataset for use in PyTorch.

---

### 🎛️ Training Hyperparameters

| Hyperparameter   | Value      | Description                                                      |
|------------------|------------|------------------------------------------------------------------|
| `batch_size`     | `64`       | Number of samples per batch; balances speed and gradient quality |
| `epochs`         | `5`        | Total passes over the training set                               |
| `lr`             | `0.001`    | Learning rate; controls optimizer step size                      |
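
As a quick sanity check on what these values imply, the arithmetic below derives the number of optimizer steps from the table and the 60,000-sample training set. The variable names `train_samples`, `steps_per_epoch`, and `total_steps` are illustrative, and the calculation assumes the final partial batch is kept, which is the `DataLoader` default (`drop_last=False`).

```python
import math

train_samples = 60_000
batch_size = 64
epochs = 5

# One optimizer step per batch; the last batch of each epoch is smaller.
steps_per_epoch = math.ceil(train_samples / batch_size)
last_batch_size = train_samples - (steps_per_epoch - 1) * batch_size
total_steps = steps_per_epoch * epochs

print(steps_per_epoch, last_batch_size, total_steps)  # 938 32 4690
```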

---


In [None]:
# Hyperparameters
batch_size = 64
epochs = 5
lr = 0.001

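# Create the transform that converts each image to a tensor and normalizes it:
# (Note: 0.1307 and 0.3081 are the mean and std of the MNIST training pixels.)
# Named `transform` so it does not shadow the `transforms` module.
transform = transforms.Compose( [
    transforms.ToTensor( ),
    transforms.Normalize( ( 0.1307, ), ( 0.3081, ) ),
] )

# Transforms belong to the dataset (DataLoader has no `transform` argument),
# so re-create the datasets with the transform attached:
mnist_train = datasets.MNIST( data_path, train=True, transform=transform )
mnist_test  = datasets.MNIST( data_path, train=False, transform=transform )

# Create a DataLoader to load the training set in shuffled batches:
data_loader = DataLoader( mnist_train,
                          batch_size=batch_size,
                          shuffle=True )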
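
`ToTensor` scales pixel values to the range [0, 1], and `Normalize` then applies `(x - mean) / std` per channel. A plain-Python sketch of that arithmetic shows the value range the model receives; the helper name `normalize_pixel` is hypothetical and only mirrors the formula.

```python
MEAN, STD = 0.1307, 0.3081  # the values passed to transforms.Normalize


def normalize_pixel(value: float) -> float:
    """Apply (x - mean) / std to a pixel already scaled to [0, 1]."""
    return (value - MEAN) / STD


black = normalize_pixel(0.0)  # background pixel
white = normalize_pixel(1.0)  # full-intensity stroke pixel
print(f"{black:.4f} {white:.4f}")  # -0.4242 2.8215
```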



# 🧠 MNIST Control Model Architecture Overview

## 🎯 Objective

This notebook establishes a **well-performing, yet minimal and interpretable convolutional neural network (CNN)** designed to solve the **MNIST handwritten digit classification problem**. Our intent is to create a **robust "control" architecture** for later comparative experiments involving more advanced or experimental models. MNIST is a widely used benchmark dataset for evaluating model performance in image recognition tasks.

---


## 🧱 Model Architecture

A lightweight yet competitive CNN inspired by classical deep learning pipelines.

| Layer       | Type               | Shape In → Out      | Activation | Notes                          |
|-------------|--------------------|---------------------|------------|--------------------------------|
| 🔹 Input    | Image              | `1 × 28 × 28`       | –          | Grayscale digit image          |
| 🔸 Conv1    | Conv2d(1→32, 3×3)  | `28 × 28 → 28 × 28` | ReLU       | Padding preserves resolution   |
| 🔸 MaxPool1 | MaxPool2d(2×2)     | `28 × 28 → 14 × 14` | –          | Downsamples feature maps       |
| 🔸 Conv2    | Conv2d(32→64, 3×3) | `14 × 14 → 14 × 14` | ReLU       | Captures mid-level patterns    |
| 🔸 MaxPool2 | MaxPool2d(2×2)     | `14 × 14 → 7 × 7`   | –          | Further spatial reduction      |
| 🔸 Flatten  | –                  | `64 × 7 × 7 → 3136` | –          | Prep for dense layers          |
| 🔸 FC1      | Linear(3136→128)   | `3136 → 128`        | ReLU       | Dense representation           |
| 🔸 Dropout  | Dropout(p=0.25)    | `128 → 128`         | –          | Regularization against overfit |
| 🔸 FC2      | Linear(128→10)     | `128 → 10`          | –          | Raw class logits (pre-Softmax) |
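
The spatial sizes in the table follow from the standard output-size formula for convolution and pooling layers, `out = floor((in + 2*padding - kernel) / stride) + 1`. The sketch below traces them in plain Python. The helper `out_size` is an illustrative name, and the padding of 1 for the 3×3 convolutions is taken from the table's "Padding preserves resolution" note.

```python
def out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output side length of a conv or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


side = 28
side = out_size(side, kernel=3, padding=1)  # Conv1    -> 28
side = out_size(side, kernel=2, stride=2)   # MaxPool1 -> 14
side = out_size(side, kernel=3, padding=1)  # Conv2    -> 14
side = out_size(side, kernel=2, stride=2)   # MaxPool2 -> 7

flat_features = 64 * side * side            # Flatten  -> 3136
print(side, flat_features)  # 7 3136
```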

---

## ⚙️ Design Rationale

### ✅ Simplicity with Performance
This architecture balances **simplicity**, **speed**, and **performance**, achieving **~99% accuracy** on MNIST while remaining **transparent and modifiable**.

### 🧩 Key Design Choices

- **2× Convolution Layers**: Enough for capturing local and mid-level features on 28×28 inputs.
- **MaxPooling**: Reduces computation, introduces translational invariance.
- **Dropout**: Prevents overfitting in dense layers, especially for small datasets.
- **ReLU Activations**: Promote fast convergence and sparse gradients.
- **Minimal FC layers**: Keeps parameter count low without sacrificing accuracy.
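
To put a number on the parameter count, the sketch below tallies the weights and biases of each layer from the shapes in the architecture table. The helper names are illustrative, and the counts assume every layer has a bias term, which is the PyTorch default for `Conv2d` and `Linear`.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights (c_out * c_in * k * k) plus one bias per output channel."""
    return c_out * c_in * k * k + c_out


def linear_params(n_in: int, n_out: int) -> int:
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


counts = {
    "conv1": conv_params(1, 32, 3),          # 320
    "conv2": conv_params(32, 64, 3),         # 18,496
    "fc1": linear_params(64 * 7 * 7, 128),   # 401,536
    "fc2": linear_params(128, 10),           # 1,290
}
total = sum(counts.values())
print(total)  # 421642
```

Roughly 95% of the parameters sit in `fc1`, so the width of that layer dominates the model's size.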

---

## 🧪 Purpose in Research

This model acts as a **baseline control** for later experiments involving:

- Architectural modifications (residuals, batch norm, transformers, etc.)
- Alternative optimization strategies (SGD vs AdamW)
- Ablation studies
- Regularization and generalization research

It serves as a **trustworthy metric anchor** to assess whether newer approaches offer real improvements or are overly complex.

---

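## 📈 Performance Baseline

With standard hyperparameters:

```python
epochs = 5
batch_size = 64
# `model` is assumed to be an instance of the MNISTNet class defined in the next cell
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()
```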

In [None]:
# Define a convolutional neural network (CNN) model
import torch
import torch.nn as nn
import torch.nn.functional as F

# Define the CNN model
class MNISTNet( nn.Module ):
    def __init__( self ):
        """
        Initialize the CNN model.
        """
        
        super( ).__init__( )
        
        # Define the layers of the CNN:
        self.conv1 = nn.Conv2d( 1, 32, 3, padding=1 )
        self.conv2 = nn.Conv2d( 32, 64, 3, padding=1 )
        self.pool = nn.MaxPool2d( 2, 2 )
        self.fc1 = nn.Linear( 64 * 7 * 7, 128 )
        self.dropout = nn.Dropout( 0.25 )
        self.fc2 = nn.Linear( 128, 10 )

    def forward( self, x ):
        """
        Define the forward pass of the model.
        Args:
            x (torch.Tensor): Input tensor.
        Returns:
            torch.Tensor: Output tensor.
        """
        # Apply the convolutional layers, activation functions, and pooling:
        # (Note: The input tensor is expected to have shape [batch_size, 1, 28, 28])
        x = self.pool( torch.relu( self.conv1(x) ) )
        x = self.pool( torch.relu( self.conv2(x) ) )
        x = torch.flatten( x, 1 )
        x = self.dropout( torch.relu( self.fc1(x) ) )
        x = self.fc2( x )
        return x
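
The following sketch runs the architecture on a random batch to confirm the output shape and to perform a single optimization step with the Adam optimizer and cross-entropy loss listed under the performance baseline. It is a smoke test under stated assumptions, not the notebook's training loop: the `nn.Sequential` stand-in only mirrors the layers of `MNISTNet` so that the block runs on its own, and the random tensors stand in for a real `DataLoader` batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in with the same layer sequence as MNISTNet, so this block is self-contained.
model = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 128), nn.ReLU(), nn.Dropout(0.25),
    nn.Linear(128, 10),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()  # expects raw logits and integer class labels

# Random data in place of a real batch: 64 images of shape 1x28x28, labels in 0..9.
images = torch.randn(64, 1, 28, 28)
labels = torch.randint(0, 10, (64,))

model.train()                    # enable dropout
optimizer.zero_grad()            # clear gradients from any previous step
logits = model(images)           # shape: [64, 10]
loss = loss_fn(logits, labels)
loss.backward()                  # backpropagate
optimizer.step()                 # update the weights once

print(logits.shape, loss.item())
```

Because `CrossEntropyLoss` applies log-softmax internally, the model returns raw logits and needs no softmax layer of its own.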
