# ECE 570 - Research Paper

I recommend paying for a TPU runtime: even the free GPU provided is not sufficient to run this code in an acceptable time frame.


**These are the requirements**

torch==2.0.0

torchvision==0.15.0

In [None]:
import ssl

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

In [None]:
if not torch.cuda.is_available():
  raise Exception("GPU not available. CPU training will be too slow.")

Exception: GPU not available. CPU training will be too slow.
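
The check above stops the notebook when no GPU is present. A common alternative, sketched here under the assumption that falling back to the CPU is acceptable for small experiments, is to select a device once and move both the model and each batch to it. The linear model and random tensors are hypothetical stand-ins, not the notebook's models or data:

```python
import torch
from torch import nn

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in model and batch, for illustration only
model = nn.Linear(4, 2).to(device)
images = torch.randn(8, 4).to(device)
labels = torch.randint(0, 2, (8,)).to(device)

# Model and inputs must live on the same device for the forward pass
outputs = model(images)
loss = nn.functional.cross_entropy(outputs, labels)
print(outputs.shape, outputs.device.type)
```

The same pattern would apply inside a training loop: each batch of images and labels is moved to the device before the forward pass.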


This code implements an MLP-Mixer model in PyTorch. The model has three components: MLPBlock, MixerBlock, and MLPMixer. MLPBlock is a small feedforward module: a linear layer, a GELU activation, dropout, a second linear layer, and a final dropout. MixerBlock contains two MLPBlocks, one for token mixing (across patches) and one for channel mixing (across embedding features); each is preceded by layer normalization and wrapped in a residual connection. MLPMixer ties these together: it splits the image into patches, projects each flattened patch to an embed_dim-dimensional embedding with a linear layer, and passes the result through a stack of MixerBlocks. The classification head applies layer normalization, global average pooling over patches, and a final linear layer that produces the class scores. The forward pass prints the tensor shape at each stage for debugging.  (LLM prompt: Explain this code)


In [None]:
class MLPBlock(nn.Module):
    def __init__(self, dim, hidden_dim):
        super(MLPBlock, self).__init__()
        self.fc1 = nn.Linear(dim, hidden_dim)
        self.gelu = nn.GELU()
        self.fc2 = nn.Linear(hidden_dim, dim)
        self.dropout = nn.Dropout(0.1)

    def forward(self, x):
        x = self.fc1(x)
        x = self.gelu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.dropout(x)
        return x


class MixerBlock(nn.Module):
    def __init__(self, num_patches, hidden_dim_token, hidden_dim_channel, embed_dim):
        super(MixerBlock, self).__init__()
        self.norm1 = nn.LayerNorm(embed_dim)
        self.token_mixing = MLPBlock(num_patches, hidden_dim_token)

        self.norm2 = nn.LayerNorm(embed_dim)
        self.channel_mixing = MLPBlock(embed_dim, hidden_dim_channel)

    def forward(self, x):
        # Token-Mixing MLP
        x = x + self.token_mixing(self.norm1(x).transpose(1, 2)).transpose(1, 2)

        # Channel-Mixing MLP
        x = x + self.channel_mixing(self.norm2(x))
        return x


class MLPMixer(nn.Module):
    def __init__(self, image_size, patch_size, num_classes, num_blocks, embed_dim, hidden_dim_token, hidden_dim_channel):
        super(MLPMixer, self).__init__()

        # Parameters
        self.patch_size = patch_size
        self.num_patches = (image_size // patch_size) ** 2
        self.patch_dim = 3 * (patch_size ** 2)

        # Patch embedding layer
        self.patch_embed = nn.Linear(self.patch_dim, embed_dim)

        # Mixer blocks
        self.mixer_blocks = nn.Sequential(*[
            MixerBlock(self.num_patches, hidden_dim_token, hidden_dim_channel, embed_dim) for _ in range(num_blocks)
        ])

        # Classification head
        self.norm = nn.LayerNorm(embed_dim)
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        print(f"MLPMixer Input shape: {x.shape}")

        # Unfold into non-overlapping patches: (B, C, H/p, W/p, p, p).
        # The input must already be image_size x image_size (see the Resize transform).
        p = self.patch_size
        x = x.unfold(2, p, p).unfold(3, p, p)
        print(f"After unfolding patches: {x.shape}")

        # Move the patch grid ahead of the channels so that each row holds all
        # channels of one patch, then flatten to (B, num_patches, patch_dim)
        x = x.permute(0, 2, 3, 1, 4, 5).reshape(x.size(0), self.num_patches, -1)
        print(f"After flattening patches: {x.shape}")

        # Patch embedding
        x = self.patch_embed(x)
        print(f"After patch embedding: {x.shape}")

        # Pass through mixer blocks
        x = self.mixer_blocks(x)
        print(f"After mixer blocks: {x.shape}")

        # Classification head
        x = self.norm(x)
        x = x.mean(dim=1)  # Global average pooling
        print(f"After pooling: {x.shape}")

        return self.fc(x)
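
The patch extraction step can be checked in isolation. The sketch below uses a hypothetical helper, extract_patches (not part of the notebook), which unfolds an image batch into non-overlapping patches and moves the patch-grid dimensions ahead of the channel dimension before flattening, so that each row holds all channels of a single patch. The small 8x8 input is an assumption chosen to keep the shapes easy to read:

```python
import torch


def extract_patches(images: torch.Tensor, patch_size: int) -> torch.Tensor:
    """Split (B, C, H, W) images into flattened, non-overlapping patches.

    Returns a tensor of shape (B, num_patches, C * patch_size * patch_size).
    """
    b, c, _, _ = images.shape
    # (B, C, H/p, W/p, p, p): one p x p window per grid cell and channel
    patches = images.unfold(2, patch_size, patch_size).unfold(3, patch_size, patch_size)
    # Put the grid dimensions before the channel dimension so that flattening
    # groups all channels of the same spatial patch into one row
    patches = patches.permute(0, 2, 3, 1, 4, 5)
    return patches.reshape(b, -1, c * patch_size * patch_size)


# Toy batch: 2 images, 3 channels, 8x8 pixels, split into four 4x4 patches
images = torch.arange(2 * 3 * 8 * 8, dtype=torch.float32).reshape(2, 3, 8, 8)
patches = extract_patches(images, patch_size=4)
print(patches.shape)  # torch.Size([2, 4, 48])
```

Patch 1 of the first image should equal the top-right 4x4 crop of that image across all three channels, which is a quick way to confirm that patches and channels are not being mixed during flattening.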

This code defines a convolutional neural network (CNN) in PyTorch as a subclass of nn.Module. The model has four convolutional layers (conv1, conv2, conv3, and conv4), each followed by a ReLU activation and a shared 2x2 max pooling layer (pool). The convolutional layers increase the number of feature maps (64, 128, 256, and 512), while each pooling step halves the spatial dimensions. The resulting feature maps are flattened into a 1D vector and passed through two fully connected layers (fc1 and fc2); fc1 is followed by a ReLU activation and dropout for regularization, and fc2 produces the class scores, with one output unit per class (num_classes). The forward method also prints the tensor shape after each major operation (each convolution and pooling stage, flattening, and output) whenever batch_idx is a multiple of print_freq, which helps in debugging and in following how data flows through the network. The model assumes 224x224 input images, which the four pooling steps reduce to 14x14, so fc1 expects 512 * 14 * 14 input features.  (LLM prompt: Explain this code in a paragraph)


In [None]:
class CNNModel(nn.Module):
    def __init__(self, num_classes):
        super(CNNModel, self).__init__()

        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
        self.conv3 = nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1)
        self.conv4 = nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1)

        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0)

        self.fc1 = nn.Linear(512 * 14 * 14, 1024)  # This assumes the input image is 224x224
        self.fc2 = nn.Linear(1024, num_classes)

        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.5)

    def forward(self, x, print_freq=1, batch_idx=0):
        if batch_idx % print_freq == 0:
            print(f"CNN Input shape: {x.shape}")
        x = self.relu(self.conv1(x))
        x = self.pool(x)
        if batch_idx % print_freq == 0:
            print(f"After conv1: {x.shape}")

        x = self.relu(self.conv2(x))
        x = self.pool(x)
        if batch_idx % print_freq == 0:
            print(f"After conv2: {x.shape}")

        x = self.relu(self.conv3(x))
        x = self.pool(x)
        if batch_idx % print_freq == 0:
            print(f"After conv3: {x.shape}")

        x = self.relu(self.conv4(x))
        x = self.pool(x)
        if batch_idx % print_freq == 0:
            print(f"After conv4: {x.shape}")

        # Flatten the output for the fully connected layers
        x = x.view(x.size(0), -1)
        if batch_idx % print_freq == 0:
            print(f"After flattening: {x.shape}")

        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        if batch_idx % print_freq == 0:
            print(f"Output shape: {x.shape}")

        return x
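
The fc1 input size of 512 * 14 * 14 follows from the layer arithmetic. A minimal sketch, using the standard output-size formula for convolution and pooling layers; conv_output_size and cnn_flatten_size are illustrative helpers, not part of the notebook:

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel_size) // stride + 1


def cnn_flatten_size(image_size=224, channels=(64, 128, 256, 512)):
    """Number of features reaching fc1 for a square input image."""
    size = image_size
    for _ in channels:
        # 3x3 convolution, stride 1, padding 1: spatial size is unchanged
        size = conv_output_size(size, kernel_size=3, stride=1, padding=1)
        # 2x2 max pooling, stride 2: spatial size is halved
        size = conv_output_size(size, kernel_size=2, stride=2, padding=0)
    return channels[-1] * size * size


print(cnn_flatten_size(224))  # 100352 = 512 * 14 * 14
print(cnn_flatten_size(32))   # 2048 = 512 * 2 * 2
```

The second call shows why fc1 would need a different input size if the 32x32 CIFAR-10 images were used without resizing.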

This code sets up a pipeline to train and evaluate two models on the CIFAR-10 dataset. It first defines the preprocessing transformations for the input images (resizing to 224x224, conversion to tensors, and normalization). The CIFAR-10 training and test sets are then loaded, and a random subset of 10 training examples is selected for training. Two models are initialized: a CNN (convolutional neural network) and an MLP-Mixer (a model that mixes information across image patches and across channels). The loss function is cross-entropy, which is suitable for multi-class classification, and each model is trained with its own Adam optimizer.

In the training loop, the models are trained for 3 epochs. For each epoch, the train function computes the training loss and accuracy, while the evaluate function evaluates the models' performance on the test set. After each epoch, the training and testing losses and accuracies for both the CNN and MLP-Mixer models are printed. The training loop updates model weights based on the computed gradients, and the evaluation loop uses the models in evaluation mode (without gradient updates) to calculate performance metrics. The results are shown after each epoch, providing an overview of how well each model performs in terms of both loss and accuracy on the training and test sets.
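
One detail of how such metrics are aggregated: dividing the summed batch losses by the number of batches gives every batch equal weight, even when the last batch is smaller. The sketch below contrasts that with a sample-weighted average. The helper summarize_batches and the numbers are made up for illustration and are not results from this notebook:

```python
def summarize_batches(batches):
    """Aggregate per-batch statistics.

    batches: list of (mean_loss, num_correct, batch_size) tuples.
    Returns (per-batch average loss, per-sample average loss, accuracy in %).
    """
    total_samples = sum(size for _, _, size in batches)
    # Every batch counts equally, regardless of its size
    per_batch_loss = sum(loss for loss, _, _ in batches) / len(batches)
    # Every sample counts equally
    per_sample_loss = sum(loss * size for loss, _, size in batches) / total_samples
    accuracy = 100 * sum(correct for _, correct, _ in batches) / total_samples
    return per_batch_loss, per_sample_loss, accuracy


# Two full batches of 32 and one smaller final batch of 8 (made-up values)
batches = [(1.0, 16, 32), (1.0, 16, 32), (4.0, 0, 8)]
per_batch_loss, per_sample_loss, accuracy = summarize_batches(batches)
print(f"{per_batch_loss:.4f} {per_sample_loss:.4f} {accuracy:.2f}%")
```

When all batches have the same size the two averages agree, so the difference only matters for a short final batch.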

In [None]:
# Define dataset and dataloaders
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
subset_size = 10
indices = torch.randperm(len(train_dataset)).tolist()[:subset_size]
train_subset = Subset(train_dataset, indices)
train_loader = DataLoader(train_subset, batch_size=32, shuffle=True)

# Initialize models
cnn_model = CNNModel(num_classes=10)
mixer_model = MLPMixer(image_size=224, patch_size=16, num_classes=10, num_blocks=2, embed_dim=128, hidden_dim_token=64, hidden_dim_channel=512)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
cnn_optimizer = optim.Adam(cnn_model.parameters(), lr=0.001)
mixer_optimizer = optim.Adam(mixer_model.parameters(), lr=0.001)

test_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)

# device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# cnn_model = cnn_model.to(device)
# mixer_model = mixer_model.to(device)

# Training loop
def train(model, dataloader, optimizer, criterion):
    model.train()
    total_loss = 0
    correct = 0
    total = 0

    for images, labels in dataloader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    return total_loss / len(dataloader), accuracy

# Evaluation loop
def evaluate(model, dataloader, criterion):
    model.eval()
    total_loss = 0
    correct = 0
    total = 0

    with torch.no_grad():
        for images, labels in dataloader:
            outputs = model(images)
            loss = criterion(outputs, labels)

            total_loss += loss.item()
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    accuracy = 100 * correct / total
    return total_loss / len(dataloader), accuracy

# Train and evaluate both models
for epoch in range(3):
    cnn_train_loss, cnn_train_acc = train(cnn_model, train_loader, cnn_optimizer, criterion)
    cnn_test_loss, cnn_test_acc = evaluate(cnn_model, test_loader, criterion)

    mixer_train_loss, mixer_train_acc = train(mixer_model, train_loader, mixer_optimizer, criterion)
    mixer_test_loss, mixer_test_acc = evaluate(mixer_model, test_loader, criterion)

    print(f'Epoch {epoch+1}')
    print(f'CNN Train Loss: {cnn_train_loss:.4f}, CNN Train Accuracy: {cnn_train_acc:.2f}%')
    print(f'CNN Test Loss: {cnn_test_loss:.4f}, CNN Test Accuracy: {cnn_test_acc:.2f}%')

    print(f'Mixer Train Loss: {mixer_train_loss:.4f}, Mixer Train Accuracy: {mixer_train_acc:.2f}%')
    print(f'Mixer Test Loss: {mixer_test_loss:.4f}, Mixer Test Accuracy: {mixer_test_acc:.2f}%')

Streaming output truncated to the last 5000 lines.
Output shape: torch.Size([32, 10])
CNN Input shape: torch.Size([32, 3, 224, 224])
After conv1: torch.Size([32, 64, 112, 112])
After conv2: torch.Size([32, 128, 56, 56])
After conv3: torch.Size([32, 256, 28, 28])
After conv4: torch.Size([32, 512, 14, 14])
After flattening: torch.Size([32, 100352])
Output shape: torch.Size([32, 10])
CNN Input shape: torch.Size([32, 3, 224, 224])
After conv1: torch.Size([32, 64, 112, 112])
After conv2: torch.Size([32, 128, 56, 56])
After conv3: torch.Size([32, 256, 28, 28])
After conv4: torch.Size([32, 512, 14, 14])
After flattening: torch.Size([32, 100352])
Output shape: torch.Size([32, 10])
CNN Input shape: torch.Size([32, 3, 224, 224])
After conv1: torch.Size([32, 64, 112, 112])
After conv2: torch.Size([32, 128, 56, 56])
After conv3: torch.Size([32, 256, 28, 28])
After conv4: torch.Size([32, 512, 14, 14])
After flattening: torch.Size([32, 100352])
Output shape: torch.Size([32, 10])
CNN In