### Assignment 3: CNNs                                                                                          
##### CSCI 536: Data Mining
##### Huyen Nguyen and Jesse Hapy

### 1. Data Loading and Preprocessing
###### The goal of this portion of the assignment is to implement a convolutional neural network (CNN) in PyTorch and train it on the MNIST dataset. First, we import the core PyTorch libraries needed to build and train neural networks, along with the torchvision modules used to download and transform datasets. A `SummaryWriter` from `torch.utils.tensorboard` is initialized to log training statistics for visualization in TensorBoard, with logs stored in `runs/mnist_cnn_base`. Next, the MNIST dataset is loaded with PyTorch's built-in `datasets.MNIST` API. The dataset consists of 28×28 grayscale images of handwritten digits from 0 to 9 (60,000 training and 10,000 test images). A preprocessing pipeline defined with `transforms.Compose` first converts each image to a tensor with pixel values scaled to [0, 1], then normalizes it using the training set's **mean (0.1307)** and **standard deviation (0.3081)**. Centering and scaling the inputs this way typically helps the model converge faster. Two data loaders are created: `train_loader`, which batches and shuffles the training set, and `test_loader`, which loads the test set without shuffling. A batch size of 64 is used for training and 1000 for evaluation to balance memory use and computational throughput.

In [2]:
# Optional: Microsoft DirectML support for non-CUDA (non-NVIDIA) GPUs
#import torch_directml
#dml = torch_directml.device()

In [4]:
# Import necessary libraries for PyTorch and for data
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
# TensorBoard import
from torch.utils.tensorboard import SummaryWriter

In [5]:
# Initialize TensorBoard and save log file path
writer = SummaryWriter(log_dir='runs/mnist_cnn_base')

In [7]:
# 1. Load MNIST Data (10%) and preprocess MNIST dataset

# Convert images to tensors and normalize using mean and std dev
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_data = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_data = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

train_loader = DataLoader(train_data, batch_size=64, shuffle=True)
test_loader = DataLoader(test_data, batch_size=1000, shuffle=False)
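The preprocessing above can be checked with a little arithmetic. The plain-Python sketch below (no PyTorch; the helper names are hypothetical) assumes `ToTensor()` has already scaled 8-bit pixels from [0, 255] to [0.0, 1.0]. It shows where the darkest and brightest pixels land after `Normalize((0.1307,), (0.3081,))` and how many batches each loader yields, assuming the loaders keep the last partial batch (the `DataLoader` default, `drop_last=False`).

```python
import math

MNIST_MEAN = 0.1307  # mean passed to transforms.Normalize above
MNIST_STD = 0.3081   # std passed to transforms.Normalize above


def to_tensor_scale(pixel: int) -> float:
    """Mimic ToTensor() on one 8-bit pixel: map [0, 255] to [0.0, 1.0]."""
    return pixel / 255.0


def normalize(x: float, mean: float = MNIST_MEAN, std: float = MNIST_STD) -> float:
    """Mimic Normalize() for one channel: subtract the mean, divide by the std."""
    return (x - mean) / std


def num_batches(num_samples: int, batch_size: int) -> int:
    """Batches per epoch when the last, partial batch is kept (drop_last=False)."""
    return math.ceil(num_samples / batch_size)


if __name__ == "__main__":
    black = normalize(to_tensor_scale(0))
    white = normalize(to_tensor_scale(255))
    print(f"black pixel -> {black:.4f}, white pixel -> {white:.4f}")
    print("train batches per epoch:", num_batches(60_000, 64))
    print("test batches:", num_batches(10_000, 1000))
```

Background pixels map to about -0.4242 and full-intensity strokes to about 2.8215. The 938 training batches per epoch are consistent with the Section 5 log, where the batch indices printed every 100 steps stop at 900.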

### 2. CNN Model Architecture (Bonus: Batch Normalization and Max Pooling)
###### The convolutional neural network (CNN) used in this project is implemented as a subclass of `nn.Module` and consists of a convolutional feature-extraction stack followed by a fully connected classification stack. The architecture satisfies the assignment's base requirement of **three convolutional layers** and **two fully connected layers**, and it also covers the bonus requirement of adding **Batch Normalization** and **Max Pooling** to the CNN. The first convolutional layer (3×3 kernel, padding 1) maps the 1-channel grayscale input to 32 feature maps. It is followed by batch normalization, ReLU activation, 2×2 max pooling that halves the spatial size from 28×28 to 14×14, and dropout (p = 0.25) to reduce overfitting. The second convolutional layer increases the depth to 64 channels and repeats the same sequence, reducing the spatial size to 7×7. The third convolutional layer outputs 128 channels and is followed by batch normalization, ReLU, and dropout, with no additional pooling. The fully connected stack then flattens the 128×7×7 feature maps into a 6,272-dimensional vector, projects it to 256 units with a linear layer followed by ReLU and dropout (p = 0.5), and maps the result to a final output layer of 10 neurons, one for each MNIST digit class.

In [11]:
# 2a. Write CNN Model (40%) with 3 Conv Layers, Dropout, and 2 Fully Connected Layers
# Bonus: Include Max_Pooling and Batch Normalization into CNN 
class MNIST_CNN(nn.Module):
    def __init__(self):
        super(MNIST_CNN, self).__init__()
        self.conv_stack = nn.Sequential(
            #Conv Layer 1
            nn.Conv2d(1, 32, kernel_size=3, padding=1), 
            nn.BatchNorm2d(32),  #Batch Normalization
            nn.ReLU(),
            nn.MaxPool2d(2),     #Max_Pooling
            nn.Dropout(0.25),    #Dropout

            #Conv Layer 2
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),  #Batch Normalization
            nn.ReLU(),
            nn.MaxPool2d(2),     #Max_Pooling 
            nn.Dropout(0.25),    #Dropout

            #Conv Layer 3
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128), #Batch Normalization  
            nn.ReLU(),
            nn.Dropout(0.25)
        )

        self.fc_stack = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 7 * 7, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, 10)
        )

    def forward(self, x):
        x = self.conv_stack(x)
        x = self.fc_stack(x)
        return x
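The layer sizes described above can be checked by hand. This plain-Python sketch (no PyTorch needed; the helper names are hypothetical) applies the output-size formula floor((n + 2p - k) / s) + 1 to each convolution and pooling layer of `MNIST_CNN`, then counts the trainable parameters layer by layer.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a Conv2d/MaxPool2d layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_mnist_cnn(size=28):
    """Return (layer, channels, height/width) after each size-relevant layer."""
    shapes = []
    size = conv_out(size, kernel=3, padding=1)  # Conv1: padding=1 keeps 28x28
    shapes.append(("conv1", 32, size))
    size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(2): 28 -> 14
    shapes.append(("pool1", 32, size))
    size = conv_out(size, kernel=3, padding=1)  # Conv2: stays 14x14
    shapes.append(("conv2", 64, size))
    size = conv_out(size, kernel=2, stride=2)   # MaxPool2d(2): 14 -> 7
    shapes.append(("pool2", 64, size))
    size = conv_out(size, kernel=3, padding=1)  # Conv3: stays 7x7, no pooling after it
    shapes.append(("conv3", 128, size))
    return shapes


def flatten_features():
    """Input width of the first nn.Linear after nn.Flatten()."""
    _, channels, size = trace_mnist_cnn()[-1]
    return channels * size * size


def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k + c_out  # kernel weights + one bias per output channel


def bn_params(channels):
    return 2 * channels  # learnable scale and shift; running mean/var are buffers


def linear_params(n_in, n_out):
    return n_in * n_out + n_out  # weight matrix + bias


def count_params():
    """Trainable parameters of MNIST_CNN (ReLU, pooling, dropout have none)."""
    return (conv_params(1, 32, 3) + bn_params(32)
            + conv_params(32, 64, 3) + bn_params(64)
            + conv_params(64, 128, 3) + bn_params(128)
            + linear_params(flatten_features(), 256)
            + linear_params(256, 10))


if __name__ == "__main__":
    for name, channels, size in trace_mnist_cnn():
        print(f"{name}: {channels} x {size} x {size}")
    print("flattened features:", flatten_features())
    print("trainable parameters:", count_params())
```

The trace confirms the `128 * 7 * 7` input width of the first `nn.Linear`. The count comes to 1,701,578 trainable parameters, about 94% of them in that first fully connected layer. This is the total that `sum(p.numel() for p in model.parameters())` should report for the real model.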

### 3. Model Setup, Loss Function, and ADAM Optimizer
###### To prepare the model for training, the script first checks whether a CUDA-capable GPU is available using `torch.cuda.is_available()`. If one is, the model is moved to the GPU to accelerate training; otherwise, it runs on the CPU. This keeps the code portable across machines. The loss function is `CrossEntropyLoss`, which suits multi-class classification problems like MNIST, where each input image must be assigned to one of 10 digit classes (0 through 9). It takes the raw logits from the final layer and applies log-softmax internally. For optimization, the model uses the **ADAM optimizer** with a learning rate of *0.001*. ADAM adapts the step size for each parameter using running averages of the gradient and of its square, which usually gives fast, stable convergence with little learning-rate tuning.

In [14]:
# 3a. Setup Device -- Move model to GPU if available
# For non-NVIDIA GPUs (e.g., AMD) via DirectML: uncomment the next line (and the torch_directml import above) instead of using the torch.device(...) line
#device = torch_directml.device() 
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = MNIST_CNN().to(device)
print(f"Using Device: {device}")

Using Device: cpu


In [16]:
# 3b. Define Loss Function (10%) and Optimizer
criterion = nn.CrossEntropyLoss()                        # Suitable for multi-class classification
optimizer = optim.Adam(model.parameters(), lr=0.001)     # ADAM Optimizer
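To make the two objects above concrete, here is a plain-Python sketch of what each computes for a single sample and a single scalar parameter. It is an illustration, not PyTorch's implementation. `cross_entropy` computes -log(softmax(logits)[target]) from raw logits, which is why `MNIST_CNN` ends in a plain `nn.Linear` with no softmax. `adam_step` is the textbook bias-corrected Adam update, using the same defaults `optim.Adam` uses (betas=(0.9, 0.999), eps=1e-8). Both function names are hypothetical.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one sample from raw logits: -log(softmax(logits)[target]).

    Uses the log-sum-exp trick so large logits do not overflow math.exp.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (theta, m, v).

    t is the 1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)                # bias-corrected second moment
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


if __name__ == "__main__":
    # An untrained 10-class model whose logits are all equal has loss ln(10).
    print("uniform logits:", cross_entropy([0.0] * 10, target=3))
    # On the first step, Adam moves the parameter by about lr, whatever the gradient scale.
    for g in (5.0, 0.01):
        theta, _, _ = adam_step(1.0, g, m=0.0, v=0.0, t=1)
        print(f"grad={g}: theta -> {theta:.6f}")
```

The uniform-logit loss, ln(10) ≈ 2.3026, is close to the first logged MNIST batch loss of 2.3146, as expected for an untrained 10-class model. The first Adam step has roughly the same size (about 0.001) for a gradient of 5.0 and for one of 0.01; this is the per-parameter adaptivity described above.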

### 4. Training and Testing Loops with TensorBoard Logging 
#### 4.1 Training Loop
###### The training loop is defined in a function that takes the model, data loader, loss function, optimizer, and epoch number as input. It sets the model to training mode with `model.train()` and initializes a running total for the loss. For each batch, the images and labels are moved to the computation device (GPU or CPU). The gradients left over from the previous step are cleared with `optimizer.zero_grad()`. The model then performs a forward pass to generate predictions, the loss is computed with the criterion (`CrossEntropyLoss`), backpropagation is performed with `loss.backward()`, and the weights are updated with `optimizer.step()`. Each batch loss is added to the running total so the average loss can be computed at the end of the epoch. The loss from each batch is also logged to TensorBoard with `writer.add_scalar("Loss/train", ...)`, where `global_step` uniquely identifies each training iteration across epochs. Every 100 batches, the current loss is printed to the console for real-time monitoring. At the end of each epoch, the average loss is logged to TensorBoard under the tag `"Loss/epoch_avg"` to show the trend across epochs.

In [19]:
# 4. Train and Test Model (40%)

# 4.1 Training loop
def train(model, loader, criterion, optimizer, epoch):
    model.train()
    total_loss = 0
    for batch_idx, (images, labels) in enumerate(loader):
        images, labels = images.to(device), labels.to(device)

        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

        global_step = epoch * len(loader) + batch_idx
        writer.add_scalar("Loss/train", loss.item(), global_step)  # Tensorboard log loss

        if batch_idx % 100 == 0:
            print(f"Epoch {epoch+1}, Batch {batch_idx} - Loss: {loss.item():.4f}")

    avg_loss = total_loss / len(loader)
    writer.add_scalar("Loss/epoch_avg", avg_loss, epoch)     # TensorBoard log avg loss per epoch
    print(f"Epoch {epoch+1} Avg Loss: {avg_loss:.4f}")
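The `global_step` formula in `train()` places every batch on a single x-axis that keeps increasing across epochs. The plain-Python sketch below checks that the steps are unique and contiguous and lists which batch indices get printed to the console. The helper names are hypothetical, and the figure of 938 batches per epoch assumes 60,000 training images in batches of 64 with the last partial batch kept.

```python
def global_step(epoch, batch_idx, batches_per_epoch):
    """Same formula as in train(): epoch * len(loader) + batch_idx."""
    return epoch * batches_per_epoch + batch_idx


def printed_batches(batches_per_epoch, every=100):
    """Batch indices that train() prints (batch_idx % 100 == 0)."""
    return [b for b in range(batches_per_epoch) if b % every == 0]


if __name__ == "__main__":
    BATCHES = 938  # assumed len(train_loader) for MNIST with batch_size=64
    print("first step of epoch 2:", global_step(1, 0, BATCHES))
    print("printed batch indices:", printed_batches(BATCHES))
```

For example, the first batch of the second epoch (`epoch=1`, `batch_idx=0`) is logged at step 938, directly after the last step of epoch 1. The printed indices 0, 100, ..., 900 match the console log in Section 5.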

#### 4.2 Testing Loop
###### The testing loop evaluates the model's performance on the test set at the end of each epoch. The model is set to evaluation mode with `model.eval()`, which turns off dropout and makes batch normalization use its stored running statistics instead of per-batch statistics, so predictions are consistent. Gradient tracking is disabled with `torch.no_grad()` to reduce memory use and speed up computation. For each batch of test data, the images and labels are moved to the appropriate device, the model computes its outputs, and `torch.max(outputs, 1)` returns the index of the highest-scoring class for each image. Correct predictions are counted across all batches. Once the full test set has been processed, accuracy is computed as a percentage and printed to the console. The result is also logged to **TensorBoard** under the tag `"Accuracy/test"` with `writer.add_scalar`, so performance can be compared across epochs.

In [22]:
#4.2 Testing loop
def test(model, loader, epoch):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, preds = torch.max(outputs, 1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)

    # Compute accuracy as a percentage and log it to TensorBoard
    accuracy = 100 * correct / total
    writer.add_scalar("Accuracy/test", accuracy, epoch)
    print(f"Test Accuracy: {accuracy:.2f}%")
    return accuracy
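`torch.max(outputs, 1)` returns both the maximum values and their indices along dimension 1. Only the indices (the predicted classes) are used here, which is equivalent to `outputs.argmax(dim=1)`. The plain-Python sketch below mirrors what `test()` computes for a small batch of made-up scores. The helper names are hypothetical, and they operate on lists rather than tensors.

```python
def argmax(row):
    """Index of the largest score in one row of logits."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(outputs, labels):
    """Percentage of rows whose highest-scoring class equals the label."""
    correct = sum(argmax(row) == label for row, label in zip(outputs, labels))
    return 100 * correct / len(labels)


if __name__ == "__main__":
    # Four samples, three classes; predictions are [1, 0, 2, 1].
    outputs = [[0.1, 2.0, -1.0],
               [3.0, 0.5, 0.2],
               [0.0, 0.1, 0.9],
               [1.0, 1.5, 0.3]]
    labels = [1, 0, 0, 1]
    print(f"accuracy: {accuracy(outputs, labels):.2f}%")  # 3 of 4 correct
```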

### 5. Running Training and Evaluation 
###### The code runs training and testing over multiple epochs in a `for` loop. In each epoch, the model is first trained on the MNIST training set with `train()` and then evaluated on the test set with `test()`. The loop runs for *5 epochs*; the count can be changed by editing the `range(5)` call. After training and evaluation finish, the **TensorBoard** `SummaryWriter` is closed with `writer.close()`. This ensures that all logged metrics, such as training loss and test accuracy, are flushed to disk and available for visualization. Evaluating after every epoch shows how test accuracy changes as training progresses and records the metrics needed for analysis and reporting.

In [25]:
# 5. Run training and testing
for epoch in range(5): # change number of epochs if necessary for more or less training
    train(model, train_loader, criterion, optimizer, epoch)
    test(model, test_loader, epoch)

# Closes TensorBoard writer
writer.close()

Epoch 1, Batch 0 - Loss: 2.3146
Epoch 1, Batch 100 - Loss: 0.3607
Epoch 1, Batch 200 - Loss: 0.2615
Epoch 1, Batch 300 - Loss: 0.1910
Epoch 1, Batch 400 - Loss: 0.0982
Epoch 1, Batch 500 - Loss: 0.0920
Epoch 1, Batch 600 - Loss: 0.0364
Epoch 1, Batch 700 - Loss: 0.2402
Epoch 1, Batch 800 - Loss: 0.0946
Epoch 1, Batch 900 - Loss: 0.1115
Epoch 1 Avg Loss: 0.1862
Test Accuracy: 98.45%
Epoch 2, Batch 0 - Loss: 0.0127
Epoch 2, Batch 100 - Loss: 0.0599
Epoch 2, Batch 200 - Loss: 0.0770
Epoch 2, Batch 300 - Loss: 0.0606
Epoch 2, Batch 400 - Loss: 0.0867
Epoch 2, Batch 500 - Loss: 0.0855
Epoch 2, Batch 600 - Loss: 0.0258
Epoch 2, Batch 700 - Loss: 0.0756
Epoch 2, Batch 800 - Loss: 0.0364
Epoch 2, Batch 900 - Loss: 0.1042
Epoch 2 Avg Loss: 0.0876
Test Accuracy: 98.84%
Epoch 3, Batch 0 - Loss: 0.1534
Epoch 3, Batch 100 - Loss: 0.0519
Epoch 3, Batch 200 - Loss: 0.0320
Epoch 3, Batch 300 - Loss: 0.1527
Epoch 3, Batch 400 - Loss: 0.0414
Epoch 3, Batch 500 - Loss: 0.0285
Epoch 3, Batch 600 - Loss: 0

#### 5.1 MNIST Experiment Results and TensorBoard Scalar Plots 
###### The CNN trained on MNIST converged quickly and generalized well over 5 epochs. The average training loss decreased from **0.1862** in epoch 1 to **0.0513** in epoch 5, while test accuracy rose from **98.45%** to **99.27%**.

### 📈 MNIST CNN Scalar Plots
![Training Loss](MNIST_TB_Results_2.png)

### 6. ResNet-18 on CIFAR-100 (Bonus)
#### 6.1 Model Setup and Pretrained Weights
###### To implement this, we imported a pretrained ResNet-18 model from `torchvision.models`. The model was initialized with `weights=ResNet18_Weights.DEFAULT`, which loads the best available ImageNet-trained weights provided by torchvision. The final fully connected layer of ResNet-18, which outputs logits for the 1,000 ImageNet classes, was replaced with a new `nn.Linear(512, 100)` layer that outputs 100 logits, one for each class in the *CIFAR-100* dataset.

In [46]:
from torchvision.models import resnet18, ResNet18_Weights

In [48]:
# 1. Load ResNet-18 and modify the final layer
# Load pretrained ResNet-18 with default ImageNet weights
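# Note: this rebinds the name `resnet18` from the constructor function to the model,
# so re-run the import cell above before re-running this cell.
resnet18 = resnet18(weights=ResNet18_Weights.DEFAULT)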
resnet18.fc = nn.Linear(512, 100)  # for CIFAR-100
resnet18 = resnet18.to(device)

#### 6.2 Transformations 
###### The CIFAR-100 dataset, which contains 60,000 color images of size 32×32 across 100 object classes (50,000 for training and 10,000 for testing), was loaded with `torchvision.datasets.CIFAR100`. Each image was first upsampled to 224×224 with `transforms.Resize(224)`, converted to a tensor, and then normalized per channel with the dataset-specific mean=(0.5071, 0.4865, 0.4409) and std=(0.2673, 0.2564, 0.2762). The training and test sets were batched with PyTorch's `DataLoader`, using a batch size of 128 for training and 100 for testing.

In [51]:
# 2. Load CIFAR-100 dataset
transform_cifar = transforms.Compose([
    transforms.Resize(224),  # upsample 32x32 -> 224x224, ResNet-18's ImageNet pretraining resolution (to boost test accuracy)
    transforms.ToTensor(),
    transforms.Normalize((0.5071, 0.4865, 0.4409), (0.2673, 0.2564, 0.2762))  # CIFAR-100 mean/std
])

cifar_train = datasets.CIFAR100(root='./data', train=True, download=True, transform=transform_cifar)
cifar_test = datasets.CIFAR100(root='./data', train=False, download=True, transform=transform_cifar)

cifar_train_loader = DataLoader(cifar_train, batch_size=128, shuffle=True)
cifar_test_loader = DataLoader(cifar_test, batch_size=100, shuffle=False)

Files already downloaded and verified
Files already downloaded and verified
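The 224×224 resize can be motivated with the same output-size formula used for the MNIST CNN. The plain-Python sketch below traces feature-map height and width through torchvision's ResNet-18. It follows the stem (a 7×7 stride-2 convolution, then a 3×3 stride-2 max pool) and then the first stride-2 3×3 convolution of each of `layer2`, `layer3`, and `layer4`. It illustrates one plausible reason the resize helps; it is not a measurement from this notebook, and `resnet18_feature_sizes` is a hypothetical helper.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet18_feature_sizes(size):
    """Height/width after conv1, maxpool, layer1, layer2, layer3, layer4 (square input)."""
    sizes = []
    size = conv_out(size, kernel=7, stride=2, padding=3)  # stem conv1
    sizes.append(size)
    size = conv_out(size, kernel=3, stride=2, padding=1)  # stem maxpool
    sizes.append(size)
    sizes.append(size)                                    # layer1 keeps the resolution
    for _ in range(3):  # layer2..layer4 each open with a stride-2 3x3 conv (padding 1)
        size = conv_out(size, kernel=3, stride=2, padding=1)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    for side in (32, 224):
        print(side, "->", resnet18_feature_sizes(side))
    print("pixel ratio:", (224 // 32) ** 2)
```

At the native 32×32 resolution, the final stage sees a 1×1 map. At 224×224 it works on 7×7 maps, the same resolution it saw during ImageNet pretraining. The trade-off is (224 / 32)² = 49 times more pixels per image, which makes each CIFAR-100 epoch considerably more expensive.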


#### 6.3 Loss Function and Optimizer
###### The model was trained with the ADAM optimizer (learning rate 0.001) and `CrossEntropyLoss` as the objective, matching the base configuration of the MNIST CNN. Because the optimizer receives all of `resnet18.parameters()`, every layer of the pretrained network is fine-tuned, not just the new output layer. Training and evaluation were logged with a separate TensorBoard writer (`runs/cifar100_resnet18`) to keep CIFAR-specific metrics apart from the MNIST logs.

In [54]:
# 3. Loss function and optimizer
cifar_criterion = nn.CrossEntropyLoss()
cifar_optimizer = optim.Adam(resnet18.parameters(), lr=0.001)

In [56]:
# 4. TensorBoard Writer for CIFAR-100
cifar_writer = SummaryWriter(log_dir='runs/cifar100_resnet18')

#### 6.4 Training Loop
###### Each training epoch looped through the training data and performed a standard forward pass, loss computation, backward pass, and parameter update. Individual batch losses were logged to TensorBoard under the `"Loss/train"` tag, and the epoch-level average loss was logged under `"Loss/epoch_avg"`. This allows convergence to be visualized in real time across training iterations.

In [58]:
# 5. Train and test for CIFAR-100
def train_cifar(model, loader, criterion, optimizer, epoch):
    model.train()
    total_loss = 0
    for batch_idx, (images, labels) in enumerate(loader):
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()

        global_step = epoch * len(loader) + batch_idx
        cifar_writer.add_scalar("Loss/train", loss.item(), global_step)

    avg_loss = total_loss / len(loader)
    cifar_writer.add_scalar("Loss/epoch_avg", avg_loss, epoch)
    print(f"[CIFAR] Epoch {epoch+1} Avg Loss: {avg_loss:.4f}")

def test_cifar(model, loader, epoch):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, preds = torch.max(outputs, 1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    acc = 100 * correct / total
    cifar_writer.add_scalar("Accuracy/test", acc, epoch)
    print(f"[CIFAR] Test Accuracy: {acc:.2f}%")
    return acc

#### 6.5 Testing Loop
###### At the end of each epoch, the model was evaluated on the test set with the `test_cifar()` function. Accuracy was calculated by comparing the predicted class labels with the true labels and was logged to TensorBoard under the `"Accuracy/test"` tag. A conditional print statement flagged every epoch in which accuracy reached or exceeded the 70% threshold.

In [60]:
# 6. Run training + testing loop
for epoch in range(25):  # May need 20+ epochs to hit 70%
    train_cifar(resnet18, cifar_train_loader, cifar_criterion, cifar_optimizer, epoch)
    acc = test_cifar(resnet18, cifar_test_loader, epoch)
    if acc >= 70:
        print(f"70%+ Accuracy Achieved {acc:.2f}% at epoch {epoch+1}")

cifar_writer.close()

[CIFAR] Epoch 1 Avg Loss: 1.7435
[CIFAR] Test Accuracy: 60.48%
[CIFAR] Epoch 2 Avg Loss: 0.9929
[CIFAR] Test Accuracy: 65.13%
[CIFAR] Epoch 3 Avg Loss: 0.6672
[CIFAR] Test Accuracy: 66.29%
[CIFAR] Epoch 4 Avg Loss: 0.4260
[CIFAR] Test Accuracy: 66.92%
[CIFAR] Epoch 5 Avg Loss: 0.2832
[CIFAR] Test Accuracy: 66.54%
[CIFAR] Epoch 6 Avg Loss: 0.1904
[CIFAR] Test Accuracy: 69.22%
[CIFAR] Epoch 7 Avg Loss: 0.1570
[CIFAR] Test Accuracy: 68.06%
[CIFAR] Epoch 8 Avg Loss: 0.1505
[CIFAR] Test Accuracy: 66.51%
[CIFAR] Epoch 9 Avg Loss: 0.1325
[CIFAR] Test Accuracy: 66.97%
[CIFAR] Epoch 10 Avg Loss: 0.1255
[CIFAR] Test Accuracy: 68.03%
[CIFAR] Epoch 11 Avg Loss: 0.1164
[CIFAR] Test Accuracy: 68.73%
[CIFAR] Epoch 12 Avg Loss: 0.0940
[CIFAR] Test Accuracy: 68.17%
[CIFAR] Epoch 13 Avg Loss: 0.0923
[CIFAR] Test Accuracy: 67.81%
[CIFAR] Epoch 14 Avg Loss: 0.0791
[CIFAR] Test Accuracy: 67.91%
[CIFAR] Epoch 15 Avg Loss: 0.0811
[CIFAR] Test Accuracy: 66.34%
[CIFAR] Epoch 16 Avg Loss: 0.0778
[CIFAR] Test Ac
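The conditional `print` in the loop above fires on every epoch at or above 70%, not just the first one. The sketch below is a hypothetical post-processing step in plain Python; the helper names are ours, not part of the notebook. It finds the first epoch that crosses a threshold and the best epoch from a list of per-epoch test accuracies. The sample list is copied from the 15 complete epochs visible in the truncated log above.

```python
def first_epoch_reaching(accuracies, threshold):
    """1-based epoch at which accuracy first reaches the threshold, or None."""
    for epoch, acc in enumerate(accuracies, start=1):
        if acc >= threshold:
            return epoch
    return None


def best_epoch(accuracies):
    """(1-based epoch, accuracy) of the highest test accuracy."""
    idx = max(range(len(accuracies)), key=accuracies.__getitem__)
    return idx + 1, accuracies[idx]


# Test accuracies for epochs 1-15, copied from the (truncated) log above
logged = [60.48, 65.13, 66.29, 66.92, 66.54, 69.22, 68.06, 66.51,
          66.97, 68.03, 68.73, 68.17, 67.81, 67.91, 66.34]

if __name__ == "__main__":
    print("first epoch >= 70%:", first_epoch_reaching(logged, 70))
    print("best epoch so far:", best_epoch(logged))
```

On those 15 epochs, `first_epoch_reaching(logged, 70)` returns `None` and `best_epoch(logged)` returns `(6, 69.22)`. The 70.09% peak reported in Section 6.6 falls in later epochs that the truncated log does not show. The visible log also shows test accuracy leveling off in the high 60s from epoch 6 onward while the average training loss keeps falling (0.1904 at epoch 6 to 0.0778 at epoch 16), a pattern commonly associated with overfitting.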

#### 6.6 CIFAR-100 Experiment Results and TensorBoard Scalar Plots
###### The ResNet-18 model was trained on CIFAR-100 starting from pretrained ImageNet weights (`ResNet18_Weights.DEFAULT`), with its fully connected output layer replaced by `Linear(512, 100)`. CIFAR-100 images were resized to 224×224 to match the input resolution ResNet-18 was pretrained on. The model was trained with the ADAM optimizer at a learning rate of 0.001 and evaluated after each of **25 epochs**. Training was monitored with TensorBoard, and the model reached a peak test accuracy of **70.09%** at epoch 24.

### 📈 CIFAR-100 ResNet-18 Scalar Plots
![Training Loss](CIFAR_TB_Results.png)