# HW1 - Exploring MLPs with PyTorch

# Problem 1: Simple MLP for Binary Classification
In this problem, you will train a simple MLP to classify two handwritten digits: 0 vs 1. We provide starter code that walks through the task step by step. You do not need to follow these exact steps, as long as you complete the sections marked <span style="color:red">[YOUR TASK]</span>.

## Dataset Setup
We will use the [MNIST dataset](http://yann.lecun.com/exdb/mnist/). The `torchvision` package supports this dataset, and we can load it as follows (the dataset takes up about 63 MB of disk space):

In [1]:
import torch
from torchvision import transforms, datasets
import numpy as np
import pandas as pd
import sklearn
import torch.nn as nn




In [2]:
import platform, time
print(platform.mac_ver() )
torch.backends.mps.is_built()

('14.2.1', ('', '', ''), 'arm64')

True

In [3]:
device = torch.device('cpu')

In [4]:
# if not torch.backends.mps.is_available():
#     if not torch.backends.mps.is_built():
#         print("MPS not available because the current PyTorch install was not "
#               "built with MPS enabled.")
#     else:
#         print("MPS not available because the current MacOS version is not 12.3+ "
#               "and/or you do not have an MPS-enabled device on this machine.")
    
# else:
#     device = torch.device("mps")
#     print('mps enabled')

In [5]:
# define the data pre-processing
# convert the input to the range [-1, 1].
transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize(0.5, 0.5)]
    )

# Load the MNIST dataset 
# this command requires Internet to download the dataset
mnist = datasets.MNIST(root='/Users/vashisth/Documents/GitHub/Intro_DL/IDL_hw1/data', 
                       train=True, 
                       download=True, 
                       transform=transform)
mnist_test = datasets.MNIST(root='/Users/vashisth/Documents/GitHub/Intro_DL/IDL_hw1/data',   # './data'
                            train=False, 
                            download=True, 
                            transform=transform)
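
As a quick check of the pre-processing above: `transforms.ToTensor()` scales pixel values to [0, 1], and `transforms.Normalize(0.5, 0.5)` then computes `(x - mean) / std`. The sketch below reproduces that arithmetic in plain Python to show why the result lies in [-1, 1]. The helper name `normalize` is an illustration only and is not part of `torchvision`.

```python
def normalize(x, mean=0.5, std=0.5):
    """Same arithmetic as transforms.Normalize(mean, std) for one pixel value."""
    return (x - mean) / std

# ToTensor() yields values in [0, 1]; check the end points and the midpoint.
for pixel in (0.0, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

The darkest pixel maps to -1, the brightest to 1, and mid-gray to 0.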

In [6]:
from torch.utils.data import DataLoader, random_split

print("Frequencies: ", torch.bincount(mnist.targets))
print(len(torch.bincount(mnist.targets)))

Frequencies:  tensor([5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949])
10


In [7]:
# Split training data into training and validation sets (80% / 20%)
train_len = int(len(mnist) * 0.8)
val_len = len(mnist) - train_len
train_set, val_set = random_split(mnist, [train_len, val_len])

# Define DataLoaders to access data in batches
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64, shuffle=False)
test_loader = DataLoader(mnist_test, batch_size=64, shuffle=False)

In [8]:
train_len

48000
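
The value 48000 follows from the class frequencies printed earlier and the 80/20 split. The sketch below redoes that arithmetic in plain Python and also estimates the number of mini-batches per epoch, assuming the `DataLoader` default of keeping the last partial batch.

```python
import math

# Class frequencies printed by torch.bincount(mnist.targets) above.
class_counts = [5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949]

total = sum(class_counts)
train_len = int(total * 0.8)
val_len = total - train_len

batch_size = 64
# Assumption: drop_last=False (the DataLoader default), so a final
# partial batch still counts as one batch.
train_batches = math.ceil(train_len / batch_size)
val_batches = math.ceil(val_len / batch_size)

print(total, train_len, val_len, train_batches, val_batches)
```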

# Problem 2: MNIST 10-class classification

Now we want to train an MLP to handle multi-class classification for all 10 digits in the MNIST dataset. We will use the full MNIST dataset without filtering for specific digits. You may modify the MLP so that it can be used for multi-class classification.

<span style="color:red">[YOUR TASK]</span>
- Implement the training loop and evaluation section. Report the hyper-parameters you choose.
- Experiment with different numbers of neurons in the hidden layer and note any changes in performance.
- Write a brief analysis of the model's performance, including any challenges faced and how they were addressed.

In our implementations, we trained our network for 10 epochs in about 20 seconds on a laptop.
When you define a new model, remember to update the optimizer!
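
For the multi-class case, one common choice is to have the network output 10 raw scores (logits), one per digit, with no softmax layer in the model. `nn.CrossEntropyLoss` expects exactly this kind of input, because it applies log-softmax internally. The following is a minimal pure-Python sketch of that computation for a single example. The logits are made up, and `cross_entropy` and `predict` are illustrative helpers, not PyTorch functions.

```python
import math

def cross_entropy(logits, target):
    """Negative log-softmax probability of the target class for one example."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def predict(logits):
    """Predicted class = index of the largest logit (argmax)."""
    return max(range(len(logits)), key=lambda k: logits[k])

# Hypothetical logits for one image; the true digit is 3.
logits = [0.1, -1.2, 0.3, 2.5, 0.0, -0.7, 0.4, 0.2, -0.3, 0.1]
print(predict(logits), round(cross_entropy(logits, target=3), 4))

# Identical scores for all 10 classes give a loss of log(10).
print(round(cross_entropy([0.0] * 10, target=3), 4))
```

A training loss that stays near log(10) ≈ 2.30 therefore suggests the model is still predicting at chance level.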



In [9]:
class MulticlassMLP(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super(MulticlassMLP, self).__init__()
        # Your code goes here
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.activation = nn.Sigmoid()
        self.fc2 = nn.Linear(hidden_dim, out_dim)
        
    def forward(self, x):
        # Your code goes here
        x = self.fc1(x)
        x = self.activation(x)
        x = self.fc2(x)
        
        return x

# Your code goes here
hidden_dim = int(np.sqrt(28*28*10))
model = MulticlassMLP(in_dim=28 * 28,
                  hidden_dim=hidden_dim,
                  out_dim=10).to(device)
print(model)

MulticlassMLP(
  (fc1): Linear(in_features=784, out_features=88, bias=True)
  (activation): Sigmoid()
  (fc2): Linear(in_features=88, out_features=10, bias=True)
)
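
To get a feel for how the hidden size affects model capacity, the sketch below counts the trainable parameters of the two-layer MLP printed above (weights plus biases of both `Linear` layers). `mlp_param_count` is a hypothetical helper written for this illustration.

```python
import math

def mlp_param_count(in_dim, hidden_dim, out_dim):
    """Weights plus biases of Linear(in, hidden) followed by Linear(hidden, out)."""
    fc1 = in_dim * hidden_dim + hidden_dim
    fc2 = hidden_dim * out_dim + out_dim
    return fc1 + fc2

in_dim, out_dim = 28 * 28, 10
hidden_dim = int(math.sqrt(in_dim * out_dim))  # same rule as the cell above
print(hidden_dim, mlp_param_count(in_dim, hidden_dim, out_dim))

# Hidden sizes used in the sweep later in this notebook.
for h in (4, 32, 64, 128):
    print(h, mlp_param_count(in_dim, h, out_dim))
```

Almost all parameters sit in the first layer, so the count grows roughly linearly with the hidden size.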


In [10]:
def ten_digit(batch_size, hidden_dim, optimizer, device='cpu'):  # device: 'cpu' or 'mps'
    device = torch.device(device)
    # Define DataLoaders to access data in batches
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    # Your code goes here
    val_loader = DataLoader(val_set, batch_size = batch_size, shuffle=False)
    test_loader = DataLoader(mnist_test, batch_size = batch_size, shuffle=False)
    
    model = MulticlassMLP(in_dim=28 * 28,
                  hidden_dim=hidden_dim,
                  out_dim=10).to(device)
    # print(model)
    criterion = nn.CrossEntropyLoss()
    
    if optimizer == 'adam':
        lr = 1e-3
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    else:
        lr=1e-2
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    
    num_epochs = 10
    # training
    start_time = time.time()
    for epoch in range(num_epochs):
        correct, count = 0, 0 
        for data, target in train_loader:
            # move the batch to the selected device
            data, target = data.to(device), target.to(device)
            # clear the gradients from the previous batch
            optimizer.zero_grad()
            # reshape the image into a vector
            data = data.view(data.size(0), -1)
            # model forward
            output = model(data)
            # compute the loss
            loss = criterion(output, target)
            # model backward
            loss.backward()
            # update the model parameters
            optimizer.step()
            
            # adding this for train accuracy 
            pred = output.argmax(dim=1)
            correct += (pred == target).sum().item()
            count += data.size(0)
        
        train_acc = 100. * correct / count
        # print(f'Training accuracy: {train_acc:.2f}%')
        print(f'Epoch {epoch+1}, Loss: {loss.item():.4f}')
    
    training_time = time.time()- start_time
    # print(training_time)
    
    # validation: evaluation mode, no gradient tracking
    model.eval()
    val_loss = count = 0
    correct = total = 0
    with torch.no_grad():
        for data, target in val_loader:
            data, target = data.to(device), target.to(device)
            data = data.view(data.size(0), -1)
            output = model(data)
            val_loss += criterion(output, target).item()
            count += 1
            pred = output.argmax(dim=1)
            correct += (pred == target).sum().item()
            total += data.size(0)

    val_loss = val_loss / count
    val_acc = 100. * correct / total
    # print(f'Validation loss: {val_loss:.2f}, accuracy: {val_acc:.2f}%')
    
    # test
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            data = data.view(data.size(0), -1)
            output = model(data)
            pred = output.argmax(dim=1)
            correct += (pred == target).sum().item()
            total += data.size(0)
            
    test_acc = 100. * correct / total
    # print(f'Test Accuracy: {test_acc:.2f}%')
    print('hyperopt run done')
    return training_time, train_acc, val_acc, test_acc
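
Note that the `Epoch ..., Loss` line in `ten_digit` prints `loss.item()` from the last mini-batch of each epoch only, so the logged values are noisy and need not decrease monotonically. The following is a hedged sketch of a sample-weighted epoch average, written in plain Python so that it runs without PyTorch. `RunningMean` is a hypothetical helper; in the training loop one would call `update(loss.item(), data.size(0))` once per batch.

```python
class RunningMean:
    """Sample-weighted running mean, e.g. of per-batch mean losses."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, batch_mean, batch_size):
        # CrossEntropyLoss averages over the batch by default, so
        # multiply back by the batch size before accumulating.
        self.total += batch_mean * batch_size
        self.count += batch_size

    @property
    def value(self):
        return self.total / self.count if self.count else float("nan")

# Made-up per-batch losses; the last batch is smaller than the others.
meter = RunningMean()
for batch_mean, batch_size in [(0.9, 64), (0.7, 64), (0.5, 32)]:
    meter.update(batch_mean, batch_size)

print(f"epoch mean loss: {meter.value:.4f}")
```

Weighting by batch size keeps a short final batch from counting as much as a full one.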

In [11]:
import pandas as pd

results = []
devices = ['cpu']
batch_sizes = [64, 128, 1024]
optimizers = ['adam', 'sgd']
# learning_rates= [1e-4, 1e-3, 1e-2, 1e-1]
hidden_dims = [4, 32, 64, 128]

for batch_size in batch_sizes:
    for optimizer in optimizers:
        for device in devices:
            for hidden_dim in hidden_dims:
                training_time, train_acc, val_acc, test_acc = ten_digit(batch_size=batch_size, 
                                                                        optimizer=optimizer,
                                                                        hidden_dim=hidden_dim,
                                                                        # lr = lr, 
                                                                        device=device )
                lr = 1e-3 if optimizer=='adam' else 1e-2
                print([device, batch_size, optimizer, lr, hidden_dim,  training_time, train_acc, val_acc, test_acc])
                results.append([device, batch_size, optimizer, lr, hidden_dim,  training_time, train_acc, val_acc, test_acc])



headers = ['Device', 'Batch size', 'Optimizer', 'LR', 'Hidden Dim', 
           'Training Time', 'Train Acc', 'Val Acc', 'Test Acc']
df = pd.DataFrame(results, columns=headers)

Epoch 1, Loss: 1.4778
Epoch 2, Loss: 1.1644
Epoch 3, Loss: 1.1134
Epoch 4, Loss: 0.7658
Epoch 5, Loss: 0.7291
Epoch 6, Loss: 0.8017
Epoch 7, Loss: 0.5732
Epoch 8, Loss: 0.5302
Epoch 9, Loss: 0.5318
Epoch 10, Loss: 0.7484
hyperopt run done
['cpu', 64, 'adam', 0.001, 4, 20.82248091697693, 81.11666666666666, 80.5, 81.34]
Epoch 1, Loss: 0.4266
Epoch 2, Loss: 0.2298
Epoch 3, Loss: 0.1818
Epoch 4, Loss: 0.1993
Epoch 5, Loss: 0.2510
Epoch 6, Loss: 0.1186
Epoch 7, Loss: 0.1753
Epoch 8, Loss: 0.1715
Epoch 9, Loss: 0.2112
Epoch 10, Loss: 0.1112
hyperopt run done
['cpu', 64, 'adam', 0.001, 32, 21.138930320739746, 95.725, 94.58333333333333, 94.83]
Epoch 1, Loss: 0.2527
Epoch 2, Loss: 0.1573
Epoch 3, Loss: 0.2421
Epoch 4, Loss: 0.1623
Epoch 5, Loss: 0.1341
Epoch 6, Loss: 0.1033
Epoch 7, Loss: 0.2037
Epoch 8, Loss: 0.0764
Epoch 9, Loss: 0.1392
Epoch 10, Loss: 0.0966
hyperopt run done
['cpu', 64, 'adam', 0.001, 64, 22.073091745376587, 97.20833333333333, 95.56666666666666, 96.09]
Epoch 1, Loss: 0.2130

In [12]:
df.to_csv('sigmoid_hyperopt.csv')
df

Unnamed: 0,Device,Batch size,Optimizer,LR,Hidden Dim,Training Time,Train Acc,Val Acc,Test Acc
0,cpu,64,adam,0.001,4,20.822481,81.116667,80.5,81.34
1,cpu,64,adam,0.001,32,21.13893,95.725,94.583333,94.83
2,cpu,64,adam,0.001,64,22.073092,97.208333,95.566667,96.09
3,cpu,64,adam,0.001,128,22.682774,98.1,96.591667,96.87
4,cpu,64,sgd,0.01,4,19.016016,69.941667,70.083333,70.49
5,cpu,64,sgd,0.01,32,19.240644,89.883333,89.666667,90.33
6,cpu,64,sgd,0.01,64,19.526299,90.029167,89.866667,90.57
7,cpu,64,sgd,0.01,128,20.4487,90.0375,89.85,90.69
8,cpu,128,adam,0.001,4,18.58787,74.941667,73.933333,75.06
9,cpu,128,adam,0.001,32,18.807421,95.029167,94.341667,94.47
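
The table above shows ten runs of the sweep. One reasonable way to choose among them is to select by validation accuracy and keep the test accuracy for the final report only. The sketch below does this in plain Python for the displayed rows; the tuple layout is an assumption made for this illustration.

```python
# (batch_size, optimizer, hidden_dim, val_acc, test_acc), copied from the rows above
rows = [
    (64, "adam", 4, 80.5, 81.34),
    (64, "adam", 32, 94.583333, 94.83),
    (64, "adam", 64, 95.566667, 96.09),
    (64, "adam", 128, 96.591667, 96.87),
    (64, "sgd", 4, 70.083333, 70.49),
    (64, "sgd", 32, 89.666667, 90.33),
    (64, "sgd", 64, 89.866667, 90.57),
    (64, "sgd", 128, 89.85, 90.69),
    (128, "adam", 4, 73.933333, 75.06),
    (128, "adam", 32, 94.341667, 94.47),
]

# Select on validation accuracy only; the test set is not used for selection.
best = max(rows, key=lambda r: r[3])
batch_size, optimizer, hidden_dim, val_acc, test_acc = best
print(f"best by val acc: batch={batch_size}, opt={optimizer}, "
      f"hidden={hidden_dim}, val={val_acc:.2f}%, test={test_acc:.2f}%")
```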



In [17]:
df = pd.read_csv('/Users/vashisth/Documents/GitHub/Intro_DL/IDL_hw1/Question/Q2/sigmoid_hyperopt.csv')
latex_table = df.to_latex(index=False)
print(latex_table)

\begin{tabular}{rlrlrrrrrr}
\toprule
Unnamed: 0 & Device & Batch size & Optimizer & LR & Hidden Dim & Training Time & Train Acc & Val Acc & Test Acc \\
\midrule
0 & cpu & 64 & adam & 0.001000 & 4 & 20.822481 & 81.116667 & 80.500000 & 81.340000 \\
1 & cpu & 64 & adam & 0.001000 & 32 & 21.138930 & 95.725000 & 94.583333 & 94.830000 \\
2 & cpu & 64 & adam & 0.001000 & 64 & 22.073092 & 97.208333 & 95.566667 & 96.090000 \\
3 & cpu & 64 & adam & 0.001000 & 128 & 22.682774 & 98.100000 & 96.591667 & 96.870000 \\
4 & cpu & 64 & sgd & 0.010000 & 4 & 19.016016 & 69.941667 & 70.083333 & 70.490000 \\
5 & cpu & 64 & sgd & 0.010000 & 32 & 19.240644 & 89.883333 & 89.666667 & 90.330000 \\
6 & cpu & 64 & sgd & 0.010000 & 64 & 19.526299 & 90.029167 & 89.866667 & 90.570000 \\
7 & cpu & 64 & sgd & 0.010000 & 128 & 20.448700 & 90.037500 & 89.850000 & 90.690000 \\
8 & cpu & 128 & adam & 0.001000 & 4 & 18.587870 & 74.941667 & 73.933333 & 75.060000 \\
9 & cpu & 128 & adam & 0.001000 & 32 & 18.807421 & 95.029167

In [14]:
class MulticlassMLP(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super(MulticlassMLP, self).__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.activation = nn.ReLU()
        self.fc2 = nn.Linear(hidden_dim, out_dim)
        
    def forward(self, x):
        # Your code goes here
        x = self.fc1(x)
        x = self.activation(x)
        x = self.fc2(x)
        
        return x

In [15]:
import pandas as pd

results = []
devices = ['cpu']
batch_sizes = [64, 128, 1024]
optimizers = ['adam', 'sgd']
# learning_rates= [1e-4, 1e-3, 1e-2, 1e-1]
hidden_dims = [4, 32, 64, 128]
for batch_size in batch_sizes:
    for optimizer in optimizers:
        for device in devices:
            for hidden_dim in hidden_dims:
                training_time, train_acc, val_acc, test_acc = ten_digit(batch_size=batch_size, 
                                                                        optimizer=optimizer,
                                                                        hidden_dim=hidden_dim,
                                                                        # lr = lr, 
                                                                        device=device )
                lr = 1e-3 if optimizer=='adam' else 1e-2
                print([device, batch_size, optimizer, lr, hidden_dim,  training_time, train_acc, val_acc, test_acc])
                results.append([device, batch_size, optimizer, lr, hidden_dim,  training_time, train_acc, val_acc, test_acc])



headers = ['Device', 'Batch size', 'Optimizer', 'LR', 'Hidden Dim', 
           'Training Time', 'Train Acc', 'Val Acc', 'Test Acc']
df = pd.DataFrame(results, columns=headers)
df.to_csv('relu_hyperopt_q2.csv')
df

Epoch 1, Loss: 2.0675
Epoch 2, Loss: 1.9844
Epoch 3, Loss: 1.7940
Epoch 4, Loss: 1.6219
Epoch 5, Loss: 1.6827
Epoch 6, Loss: 1.6457
Epoch 7, Loss: 1.6914
Epoch 8, Loss: 1.5235
Epoch 9, Loss: 1.7797
Epoch 10, Loss: 1.5669
hyperopt run done
['cpu', 64, 'adam', 0.001, 4, 22.202066898345947, 33.3625, 33.99166666666667, 34.13]
Epoch 1, Loss: 0.2392
Epoch 2, Loss: 0.2426
Epoch 3, Loss: 0.2713
Epoch 4, Loss: 0.4272
Epoch 5, Loss: 0.1802
Epoch 6, Loss: 0.2217
Epoch 7, Loss: 0.3104
Epoch 8, Loss: 0.0754
Epoch 9, Loss: 0.0838
Epoch 10, Loss: 0.0959
hyperopt run done
['cpu', 64, 'adam', 0.001, 32, 22.753937244415283, 95.64791666666666, 94.56666666666666, 95.12]
Epoch 1, Loss: 0.4181
Epoch 2, Loss: 0.0817
Epoch 3, Loss: 0.2968
Epoch 4, Loss: 0.2856
Epoch 5, Loss: 0.1629
Epoch 6, Loss: 0.1173
Epoch 7, Loss: 0.1449
Epoch 8, Loss: 0.0339
Epoch 9, Loss: 0.0786
Epoch 10, Loss: 0.0486
hyperopt run done
['cpu', 64, 'adam', 0.001, 64, 23.67895817756653, 97.19166666666666, 96.525, 96.8]
Epoch 1, Loss: 0.41

Unnamed: 0,Device,Batch size,Optimizer,LR,Hidden Dim,Training Time,Train Acc,Val Acc,Test Acc
0,cpu,64,adam,0.001,4,22.202067,33.3625,33.991667,34.13
1,cpu,64,adam,0.001,32,22.753937,95.647917,94.566667,95.12
2,cpu,64,adam,0.001,64,23.678958,97.191667,96.525,96.8
3,cpu,64,adam,0.001,128,24.853068,98.108333,96.425,96.53
4,cpu,64,sgd,0.01,4,21.19924,82.979167,82.091667,82.65
5,cpu,64,sgd,0.01,32,23.31456,92.93125,92.458333,92.94
6,cpu,64,sgd,0.01,64,24.6772,93.55,93.05,93.49
7,cpu,64,sgd,0.01,128,23.671364,93.885417,93.608333,94.1
8,cpu,128,adam,0.001,4,20.069331,65.714583,65.108333,66.38
9,cpu,128,adam,0.001,32,21.104044,93.7875,92.925,93.34
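
To compare the two activations, the sketch below pairs the test accuracies of the ten displayed sigmoid and ReLU runs by configuration. It covers only the rows shown in this notebook. Each number comes from a single run, so small differences may well be run-to-run noise rather than an effect of the activation.

```python
# Test accuracy (%) keyed by (batch_size, optimizer, hidden_dim),
# copied from the sigmoid and ReLU tables shown in this notebook.
sigmoid = {
    (64, "adam", 4): 81.34, (64, "adam", 32): 94.83,
    (64, "adam", 64): 96.09, (64, "adam", 128): 96.87,
    (64, "sgd", 4): 70.49, (64, "sgd", 32): 90.33,
    (64, "sgd", 64): 90.57, (64, "sgd", 128): 90.69,
    (128, "adam", 4): 75.06, (128, "adam", 32): 94.47,
}
relu = {
    (64, "adam", 4): 34.13, (64, "adam", 32): 95.12,
    (64, "adam", 64): 96.80, (64, "adam", 128): 96.53,
    (64, "sgd", 4): 82.65, (64, "sgd", 32): 92.94,
    (64, "sgd", 64): 93.49, (64, "sgd", 128): 94.10,
    (128, "adam", 4): 66.38, (128, "adam", 32): 93.34,
}

# Positive difference means ReLU scored higher on the test set.
diff = {cfg: relu[cfg] - sigmoid[cfg] for cfg in sigmoid}
relu_wins = sum(d > 0 for d in diff.values())
worst_cfg = min(diff, key=diff.get)

for cfg, d in diff.items():
    print(cfg, f"{d:+.2f}")
print("ReLU higher in", relu_wins, "of", len(diff), "configurations")
print("largest drop with ReLU:", worst_cfg)
```

In these rows the largest gaps in either direction occur at `hidden_dim=4`, where the network is smallest.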


In [18]:
df = pd.read_csv('/Users/vashisth/Documents/GitHub/Intro_DL/IDL_hw1/Question/Q2/relu_hyperopt_q2.csv')
latex_table = df.to_latex(index=False)
print(latex_table)

\begin{tabular}{rlrlrrrrrr}
\toprule
Unnamed: 0 & Device & Batch size & Optimizer & LR & Hidden Dim & Training Time & Train Acc & Val Acc & Test Acc \\
\midrule
0 & cpu & 64 & adam & 0.001000 & 4 & 22.202067 & 33.362500 & 33.991667 & 34.130000 \\
1 & cpu & 64 & adam & 0.001000 & 32 & 22.753937 & 95.647917 & 94.566667 & 95.120000 \\
2 & cpu & 64 & adam & 0.001000 & 64 & 23.678958 & 97.191667 & 96.525000 & 96.800000 \\
3 & cpu & 64 & adam & 0.001000 & 128 & 24.853068 & 98.108333 & 96.425000 & 96.530000 \\
4 & cpu & 64 & sgd & 0.010000 & 4 & 21.199240 & 82.979167 & 82.091667 & 82.650000 \\
5 & cpu & 64 & sgd & 0.010000 & 32 & 23.314560 & 92.931250 & 92.458333 & 92.940000 \\
6 & cpu & 64 & sgd & 0.010000 & 64 & 24.677200 & 93.550000 & 93.050000 & 93.490000 \\
7 & cpu & 64 & sgd & 0.010000 & 128 & 23.671364 & 93.885417 & 93.608333 & 94.100000 \\
8 & cpu & 128 & adam & 0.001000 & 4 & 20.069331 & 65.714583 & 65.108333 & 66.380000 \\
9 & cpu & 128 & adam & 0.001000 & 32 & 21.104044 & 93.787500