# HW1 - Exploring MLPs with PyTorch

# Problem 1: Simple MLP for Binary Classification
In this problem, you will train a simple MLP to classify two handwritten digits: 0 vs 1. We provide some starter code that walks through the task step by step. However, you do not need to follow the exact steps as long as you complete the tasks in the sections marked <span style="color:red">[YOUR TASK]</span>.

## Dataset Setup
We will use the [MNIST dataset](http://yann.lecun.com/exdb/mnist/), which the `torchvision` package supports directly. We can load the dataset as follows (it takes up about 63 MB of disk space):

# Problem 3: Handling Class Imbalance in MNIST Dataset
In this problem, we explore how to handle class imbalance, which is very common in real-world applications. A modified MNIST dataset is created as follows: we keep all instances of digit “0” and only 1\% of the instances of digit “1”, for both the training and test sets.
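One way to build such a subset without iterating over (and transforming) every image is to select indices from the dataset's label tensor, such as `mnist.targets`. The sketch below is a minimal illustration under that assumption; the helper name `imbalanced_indices` is hypothetical, and, like this notebook, it keeps `len // N` of the "1" instances (N = 100 gives the 1% case). Toy labels stand in for MNIST so the snippet runs without downloading anything.

```python
import torch

def imbalanced_indices(targets: torch.Tensor, N: int = 100, seed: int = 0) -> torch.Tensor:
    """Indices of every "0" plus a random 1/N of the "1"s (hypothetical helper)."""
    idx_0 = (targets == 0).nonzero(as_tuple=True)[0]
    idx_1 = (targets == 1).nonzero(as_tuple=True)[0]
    # shuffle the "1" indices reproducibly, then keep the first len // N of them
    gen = torch.Generator().manual_seed(seed)
    perm = torch.randperm(len(idx_1), generator=gen)
    keep_1 = idx_1[perm[: len(idx_1) // N]]
    return torch.cat([idx_0, keep_1])

# toy labels standing in for mnist.targets: 50 zeros, 300 ones, 10 twos
toy_targets = torch.tensor([0] * 50 + [1] * 300 + [2] * 10)
idx = imbalanced_indices(toy_targets, N=100)
# expect 50 zeros, 300 // 100 = 3 ones, and no twos
print(torch.bincount(toy_targets[idx], minlength=3))
```

With the real data, `torch.utils.data.Subset(mnist, imbalanced_indices(mnist.targets).tolist())` would give the imbalanced training set; reading labels from `targets` avoids applying the transform to all 60,000 training images just to filter them.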

For such a class imbalance problem, accuracy may not be a good metric: always predicting "0", regardless of the input, is already about 99\% accurate. Instead, we use the $F_1$ score as the evaluation metric:
$$F_1 = 2\cdot\frac{\text{precision}\cdot \text{recall}}{\text{precision} + \text{recall}}$$
where precision and recall are defined as:
$$\text{precision}=\frac{\text{number of instances correctly predicted as "1"}}{\text{number of instances predicted as "1"}}$$
$$\text{recall}=\frac{\text{number of instances correctly predicted as "1"}}{\text{number of instances labeled as "1"}}$$
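As a worked example of these formulas, the sketch below computes accuracy, precision, recall and $F_1$ from confusion counts. It assumes a test set with 980 "0"s and 11 "1"s (the sizes this notebook prints for N = 100) and reports precision as 0 when nothing is predicted as "1", a convention chosen here because the ratio is undefined in that case. The counts for the "better" classifier are made up for illustration.

```python
def scores(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion counts ("1" is the positive class)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0   # undefined -> 0 by convention
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return accuracy, precision, recall, f1

# Always predicting "0": every "1" is missed (fn = 11), every "0" is correct (tn = 980)
always_zero = scores(tp=0, fp=0, fn=11, tn=980)

# A hypothetical classifier that finds 9 of the 11 "1"s and raises 3 false alarms
better = scores(tp=9, fp=3, fn=2, tn=977)

print("always '0':", [round(v, 3) for v in always_zero])
print("better:    ", [round(v, 3) for v in better])
```

The constant predictor reaches an accuracy of 980/991 ≈ 0.989 but an $F_1$ of 0, while the second classifier gets $F_1 = 2 \cdot 9 / (2 \cdot 9 + 3 + 2) = 18/23 \approx 0.78$.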

To handle such a problem, some changes to training may be necessary. Suggestions include:
1) Adjusting the class weights in the loss function, i.e., use a larger weight for the minority class when computing the loss.
2) Implementing resampling techniques (either undersampling the majority class or oversampling the minority class).
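Suggestion 1 maps directly onto the `weight` argument of `torch.nn.CrossEntropyLoss`. The sketch below uses toy labels with a 98:2 imbalance (not the MNIST split) and inverse-frequency weights scaled so class "0" keeps weight 1; that scaling is one common choice, not the only one. It also checks how PyTorch averages a weighted loss: with the default `reduction='mean'`, the sum of weighted per-sample losses is divided by the sum of the target weights, not by the batch size.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

targets = torch.tensor([0] * 98 + [1] * 2)                # toy 98:2 labels, not the MNIST split
counts = torch.bincount(targets, minlength=2).float()     # per-class counts: 98 and 2

# inverse-frequency class weights, scaled so that class "0" keeps weight 1
class_weights = counts[0] / counts                        # 1.0 for "0", 49.0 for "1"
criterion = nn.CrossEntropyLoss(weight=class_weights)

torch.manual_seed(0)
logits = torch.randn(len(targets), 2)                     # random stand-in model outputs
weighted_loss = criterion(logits, targets)

# the same value by hand: sum_i w[y_i] * loss_i / sum_i w[y_i]
per_sample = F.cross_entropy(logits, targets, reduction="none")
w = class_weights[targets]
manual_loss = (w * per_sample).sum() / w.sum()

# after reweighting, the two "1"s carry as much total weight as the 98 "0"s
minority_share = (w[targets == 1].sum() / w.sum()).item()
print(class_weights, weighted_loss.item(), manual_loss.item(), minority_share)
```

The `compensation` weight used in this notebook's experiments is this same ratio $n_0 / n_1$, computed on the training split.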

<span style="color:red">[YOUR TASK]</span>
- Create the imbalanced datasets with all "0" digits and only 1\% of the "1" digits.
- Implement the training loop and the evaluation section (including the $F_1$ metric).
- Ignore the class imbalance problem and train the MLP. Report your hyper-parameter details and the $F_1$ score on the test set (as the baseline).
- Explore modifications that improve performance on the class imbalance problem. Report your modifications and the resulting $F_1$ scores on the test set.

In [4]:

import torch
from torchvision import transforms, datasets
import numpy as np
import pandas as pd
import sklearn
import torch.nn as nn
import time
from IPython.display import display



In [5]:
device = torch.device('cpu')

In [6]:
# define the data pre-processing
# convert the input to the range [-1, 1].
transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize(0.5, 0.5)]
    )

# Load the MNIST dataset
# this command requires an Internet connection to download the dataset
mnist = datasets.MNIST(root='./data',
                       train=True,
                       download=True,
                       transform=transform)
mnist_test = datasets.MNIST(root='./data',
                            train=False,
                            download=True,
                            transform=transform)

In [7]:
from torch.utils.data import DataLoader, random_split

print("Frequencies: ", torch.bincount(mnist.targets))
print(len(torch.bincount(mnist.targets)))

Frequencies:  tensor([5923, 6742, 5958, 6131, 5842, 5421, 5918, 6265, 5851, 5949])
10


In [8]:
# Define your MLP
class SimpleMLP(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super(SimpleMLP, self).__init__()
        # Your code goes here
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.activation = nn.ReLU()
        self.fc2 = nn.Linear(hidden_dim, out_dim)
        
    def forward(self, x):
        # Your code goes here
        x = self.fc1(x)
        x = self.activation(x)
        x = self.fc2(x)
        return x

# Your code goes here
hidden_dim = 4
model = SimpleMLP(in_dim=28 * 28,
                  hidden_dim=hidden_dim,
                  out_dim=2).to(device)
print(model)

SimpleMLP(
  (fc1): Linear(in_features=784, out_features=4, bias=True)
  (activation): ReLU()
  (fc2): Linear(in_features=4, out_features=2, bias=True)
)
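For a sense of scale, the printed architecture has very few parameters. The sketch below rebuilds the same layer shapes with `nn.Sequential` (a stand-in for `SimpleMLP(784, 4, 2)`, not the class itself) and counts them: fc1 has $784 \cdot 4 + 4 = 3140$ and fc2 has $4 \cdot 2 + 2 = 10$.

```python
import torch.nn as nn

# same layer shapes as SimpleMLP(in_dim=784, hidden_dim=4, out_dim=2)
mlp = nn.Sequential(nn.Linear(28 * 28, 4), nn.ReLU(), nn.Linear(4, 2))
n_params = sum(p.numel() for p in mlp.parameters())
print(n_params)  # 3150
```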


In [9]:
# Your code goes here
def precision_score(labels, predictions):
    labels, predictions = np.array(labels), np.array(predictions)
    predicted_1 = np.sum(predictions == 1)
    correct_1 = np.sum((predictions == 1) & (labels == 1))
    # undefined when nothing is predicted as "1"; report 0 in that case
    return correct_1 / predicted_1 if predicted_1 > 0 else 0.0

def recall_score(labels, predictions):
    labels, predictions = np.array(labels), np.array(predictions)
    correct_1 = np.sum((predictions == 1) & (labels == 1))
    labeled_1 = np.sum(labels == 1)
    # undefined when there is no "1" among the labels; report 0 in that case
    return correct_1 / labeled_1 if labeled_1 > 0 else 0.0

def f1_score(labels, predictions):
    precision = precision_score(labels, predictions)
    recall = recall_score(labels, predictions)
    # guard against 0/0 when both precision and recall are 0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

In [10]:
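def two_digit(weight, batch_size=64):
    # Trains a fresh SimpleMLP with class weights `weight` in the loss, then evaluates it.
    # Uses the global train_loader / val_loader / test_loader and val_set / test_set;
    # the batch_size argument is not used here, the global loaders fix the batch size.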
    model = SimpleMLP(in_dim=28 * 28,
                  hidden_dim=hidden_dim,
                  out_dim=2).to(device)
    
    criterion = nn.CrossEntropyLoss(weight = weight)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    num_epochs = 10
    
    # training
    start_time = time.time()
    for epoch in range(num_epochs):
        correct, count = 0, 0 
        for data, target in train_loader:
            data, target = data.to(device), target.to(device)
            # free the gradient from the previous batch
            optimizer.zero_grad()
            # reshape the image into a vector
            data = data.view(data.size(0), -1)
            # model forward
            output = model(data)
            # compute the loss
            loss = criterion(output, target)
            # model backward
            loss.backward()
            # update the model parameters
            optimizer.step()
            
            # adding this for train accuracy 
            pred = output.argmax(dim=1)
            correct += (pred == target).sum().item()
            count += data.size(0)
        
        train_acc = 100. * correct / count
        # print(f'Training accuracy: {train_acc:.2f}%')

    training_time = time.time()- start_time
    # print(training_time)
    
    # validation (eval mode, no gradient tracking)
    model.eval()
    val_loss = count = 0
    correct = total = 0
    val_preds = []; val_labels = []
    with torch.no_grad():
        for data, target in val_loader:
            data, target = data.to(device), target.to(device)
            data = data.view(data.size(0), -1)
            output = model(data)
            val_loss += criterion(output, target).item()
            count += 1
            pred = output.argmax(dim=1)
            correct += (pred == target).sum().item()
            total += data.size(0)
            # move to CPU so that .numpy() also works on a GPU device
            val_preds.append(pred.cpu())
            val_labels.append(target.cpu())

    val_preds = torch.cat(val_preds).numpy()
    val_labels = torch.cat(val_labels).numpy()
    assert len(val_preds) == len(val_set)

    val_loss = val_loss / count
    val_acc = 100. * correct / total
    # print(f'Validation loss: {val_loss:.2f}, accuracy: {val_acc:.2f}%')
    f1_validation = f1_score(labels=val_labels, predictions=val_preds)
    # print(f'F1 score validation: {f1_validation:.2f}')
    
    # test
    model.eval()
    correct = total = 0
    test_preds = []; test_labels=[]

    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            data = data.view(data.size(0), -1)
            output = model(data)
            pred = output.argmax(dim=1)
            correct += (pred == target).sum().item()
            total += data.size(0)
            test_preds.append(pred.cpu())
            test_labels.append(target.cpu())
        
    test_preds = torch.cat(test_preds).numpy()
    test_labels = torch.cat(test_labels).numpy()
    assert len(test_preds) == len(test_set)   
    test_acc = 100. * correct / total
    # print(f'Test Accuracy: {test_acc:.2f}%')
    # print(f'Validation loss: {val_loss:.2f}, accuracy: {val_acc:.2f}%')
    f1_test = f1_score(labels = test_labels, predictions =test_preds)
    # print(f'F1 score test: {f1_test:.2f}')

    
    return training_time, train_acc, val_acc, test_acc, f1_validation, f1_test
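The validation and test blocks in this function repeat the same loop. A possible refactor, sketched below under the assumption that batches are `(images, labels)` pairs as produced by the MNIST `DataLoader`s, is a single helper that switches to eval mode, disables gradient tracking, and returns labels and predictions on the CPU (so `.numpy()` also works when `device` is a GPU). The name `collect_predictions` is hypothetical, and the model and data in the usage part are random stand-ins.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

@torch.no_grad()
def collect_predictions(model: nn.Module, loader: DataLoader, device: torch.device):
    """Return (labels, predictions) for every batch in `loader`, as CPU tensors."""
    was_training = model.training
    model.eval()                                        # e.g. disables dropout, if any
    all_labels, all_preds = [], []
    for data, target in loader:
        data = data.to(device).view(data.size(0), -1)   # flatten each image into a vector
        logits = model(data)
        all_preds.append(logits.argmax(dim=1).cpu())
        all_labels.append(target.cpu())
    model.train(was_training)                           # restore the caller's mode
    return torch.cat(all_labels), torch.cat(all_preds)

# usage with random stand-in data (10 fake 28x28 images, labels in {0, 1})
torch.manual_seed(0)
images = torch.randn(10, 1, 28, 28)
targets = torch.randint(0, 2, (10,))
loader = DataLoader(TensorDataset(images, targets), batch_size=4)
mlp = nn.Sequential(nn.Linear(28 * 28, 4), nn.ReLU(), nn.Linear(4, 2))

labels, preds = collect_predictions(mlp, loader, torch.device("cpu"))
print(labels.numpy(), preds.numpy())
```

The returned arrays can be passed straight to `f1_score(labels=..., predictions=...)`; the validation loss would still need its own accumulation if it is reported.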

In [11]:
# Filter for digits 0 and 1
import random

In [12]:
train_0_original = [data for data in mnist if data[1] == 0]
train_1_original = [data for data in mnist if data[1] == 1]
print('Train set (before sparsing)', len(train_0_original), len(train_1_original), len(train_1_original) + len( train_0_original) )

Train set (before sparsing) 5923 6742 12665


<span style="color:red">[EXTRA BONUS]</span>

If the hyper-parameters are chosen properly, the baseline can perform satisfactorily on the class imbalance problem with 1% digit "1". We want to challenge the baseline with even more imbalanced datasets, in which only $1/N$ of the digit "1" instances are kept.

Can you propose new ways to handle the class imbalance problem that achieve stable and satisfactory performance for large $N = 500, \; 1000, \; \cdots$?
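Before proposing a method for large N, it helps to check how many "1"s actually survive the `len // N` subsampling used in this notebook. The arithmetic below uses the digit counts printed here (5923 "0"s and 6742 "1"s in the MNIST training split, 1135 "1"s in the test split) and the notebook's 80/20 train/validation split; no training is involved.

```python
n_train_0, n_train_1, n_test_1 = 5923, 6742, 1135   # counts printed in this notebook

rows = {}
for N in [100] + [250 * (i + 1) for i in range(8)]:
    kept_train = n_train_1 // N           # "1"s left in the training split
    train_1len = int(kept_train * .8)     # 80% of them for training ...
    val_1len = kept_train - train_1len    # ... and the rest for validation
    kept_test = n_test_1 // N             # "1"s left in the test split
    train_0len = int(n_train_0 * .8)
    compensation = train_0len / train_1len   # inverse-frequency weight n_0 / n_1
    rows[N] = (kept_train, train_1len, val_1len, kept_test, compensation)
    print(f"N={N:5d}  train 1s={train_1len:3d}  val 1s={val_1len:3d}  "
          f"test 1s={kept_test:3d}  weight={compensation:8.2f}")
```

For N ≥ 1250 the test set contains no "1" at all, so recall and $F_1$ are undefined there and the reported value depends only on the zero-division convention of the metric functions. For N ≥ 750 the validation set holds only one or two "1"s, so F1-Val can take only a handful of values.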

In [13]:
headers = ['N', 'Batch size', 'Weight', 'Train Time', 'Train Acc', 'Val Acc', 'Test Acc', 'F1-Val', 'F1-Test']
question3_df = pd.DataFrame(columns=headers)
question3_df

| | N | Batch size | Weight | Train Time | Train Acc | Val Acc | Test Acc | F1-Val | F1-Test |
|---|---|---|---|---|---|---|---|---|---|


In [14]:
N_list = [100] + [250*(i+1) for i in range(8)]
for N in N_list:
    train_0 = train_0_original.copy()
    train_1 =  train_1_original.copy()
    random.shuffle(train_1)
    train_1 = train_1[:len(train_1) // N]
    print(N, 'Train set (before sparsing)', len(train_0), len(train_1), len(train_1) + len( train_0) )# train_set = train_0 + train_1

    # Split training data (1s)into training and validation sets
    train_1len = int(len(train_1) *.8)
    val_1len = len(train_1) - train_1len
    train1_set, val1_set = random_split(train_1, [train_1len, val_1len])

    # Split training data (0s) into training and validation sets
    train_0len = int(len(train_0) *.8)
    val_0len = len(train_0) - train_0len
    train0_set, val0_set = random_split(train_0, [train_0len, val_0len])
    
    train_set = train0_set + train1_set
    val_set = val0_set + val1_set

    # creating test set
    test_0 = [data for data in mnist_test if data[1] == 0]
    test_1 = [data for data in mnist_test if data[1] == 1]
    print(N,'Test set (before sparsing)',len(test_0), len(test_1), len(test_1) + len( test_0) )

    test_1 = test_1[:len(test_1) // N]
    print(N,'Test set (after sparsing)',len(test_0), len(test_1), len(test_1) + len( test_0) )
    test_set = test_0 + test_1
    print('\n')
    # Define DataLoaders to access data in batches
    batch_size = 64
    train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False)
    test_loader = DataLoader(test_set, batch_size=batch_size, shuffle=False)

    # class-"1" weights to try: none, N/10, N/2, and the inverse-frequency ratio n_0 / n_1
    compensation = torch.tensor([1, (train_0len / train_1len)], dtype=torch.float32)
    weights = [[1, 1], [1, int(N/10)], [1, int(N/2)], compensation]

    for weight in weights:
        reweight_factor = float(weight[1] / weight[0])
        # as_tensor accepts both the lists and the compensation tensor (no copy-construct warning)
        weight = torch.as_tensor(weight, dtype=torch.float32).to(device)
        training_time, train_acc, val_acc, test_acc, f1_validation, f1_test = two_digit(batch_size=batch_size, weight = weight)
        
        row = [N, batch_size, reweight_factor, training_time, train_acc, val_acc, test_acc, f1_validation, f1_test]
        question3_df = pd.concat([question3_df, pd.DataFrame([row], columns=headers)], ignore_index=True)

100 Train set (before sparsing) 5923 67 5990
100 Test set (before sparsing) 980 1135 2115
100 Test set (after sparsing) 980 11 991






250 Train set (before sparsing) 5923 26 5949
250 Test set (before sparsing) 980 1135 2115
250 Test set (after sparsing) 980 4 984


500 Train set (before sparsing) 5923 13 5936
500 Test set (before sparsing) 980 1135 2115
500 Test set (after sparsing) 980 2 982


750 Train set (before sparsing) 5923 8 5931
750 Test set (before sparsing) 980 1135 2115
750 Test set (after sparsing) 980 1 981


1000 Train set (before sparsing) 5923 6 5929
1000 Test set (before sparsing) 980 1135 2115
1000 Test set (after sparsing) 980 1 981


1250 Train set (before sparsing) 5923 5 5928
1250 Test set (before sparsing) 980 1135 2115
1250 Test set (after sparsing) 980 0 980


1500 Train set (before sparsing) 5923 4 5927
1500 Test set (before sparsing) 980 1135 2115
1500 Test set (after sparsing) 980 0 980


1750 Train set (before sparsing) 5923 3 5926
1750 Test set (before sparsing) 980 1135 2115
1750 Test set (after sparsing) 980 0 980


2000 Train set (before sparsing) 5923 3 5926
2000 Test set (before sp

In [15]:
question3_df.to_csv('q3_hyperopt_weight_unsparsted_test.csv')
display(question3_df)

| | N | Batch size | Weight | Train Time (s) | Train Acc (%) | Val Acc (%) | Test Acc (%) | F1-Val | F1-Test |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 | 64 | 1.0 | 0.441928 | 100.0 | 100.0 | 100.0 | 1.0 | 1.0 |
| 1 | 100 | 64 | 10.0 | 0.44445 | 99.979128 | 100.0 | 100.0 | 1.0 | 1.0 |
| 2 | 100 | 64 | 50.0 | 0.425864 | 100.0 | 100.0 | 99.899092 | 1.0 | 0.956522 |
| 3 | 100 | 64 | 89.396225 | 0.429034 | 100.0 | 100.0 | 100.0 | 1.0 | 1.0 |
| 4 | 250 | 64 | 1.0 | 0.421473 | 100.0 | 99.916037 | 100.0 | 0.909091 | 1.0 |
| 5 | 250 | 64 | 25.0 | 0.408722 | 100.0 | 99.832074 | 100.0 | 0.8 | 1.0 |
| 6 | 250 | 64 | 125.0 | 0.438926 | 99.978983 | 99.916037 | 100.0 | 0.909091 | 1.0 |
| 7 | 250 | 64 | 236.899994 | 0.412361 | 99.831862 | 99.832074 | 99.796748 | 0.833333 | 0.8 |
| 8 | 500 | 64 | 1.0 | 0.411593 | 99.957877 | 99.83165 | 100.0 | 0.5 | 1.0 |
| 9 | 500 | 64 | 50.0 | 0.403737 | 100.0 | 100.0 | 100.0 | 1.0 | 1.0 |


# Weighted Resampling 
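Resampling (suggestion 2 above) can be done with `torch.utils.data.WeightedRandomSampler`, which draws dataset indices with probability proportional to per-sample weights. The sketch below uses toy labels (950 "0"s and 50 "1"s, not the MNIST split) and gives each "1" the weight $w = n_0/n_1 = 19$, so the expected share of "1"s among the drawn samples is $n_1 w / (n_0 + n_1 w) = 950/1900 = 0.5$.

```python
import torch
from torch.utils.data import WeightedRandomSampler

labels = torch.tensor([0] * 950 + [1] * 50)      # toy labels, not MNIST
n_0, n_1 = (labels == 0).sum().item(), (labels == 1).sum().item()
weight_factor = n_0 / n_1                         # 19.0

# per-sample weights: 1 for a "0", weight_factor for a "1"
sample_weights = torch.ones(len(labels), dtype=torch.double)
sample_weights[labels == 1] = weight_factor

expected_share = n_1 * weight_factor / (n_0 + n_1 * weight_factor)   # 0.5

torch.manual_seed(0)                              # no generator is passed, so the global RNG is used
sampler = WeightedRandomSampler(sample_weights, num_samples=10_000, replacement=True)
drawn = labels[torch.tensor(list(sampler))]
share_of_ones = drawn.float().mean().item()
print(f"expected share of '1': {expected_share:.3f}, observed: {share_of_ones:.3f}")
```

Because sampling is with replacement, a balanced stream built from only a few "1"s repeats those same images many times per epoch, which is worth keeping in mind for large N.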

In [15]:
from torch.utils.data import WeightedRandomSampler
headers = ['N', 'Batch Size', 'Weight', 'Train Time', 'Train Acc', 'Val Acc', 'Test Acc', 'F1-Val', 'F1-Test']
question3_df_resample = pd.DataFrame(columns=headers)
question3_df_resample

| | N | Batch Size | Weight | Train Time | Train Acc | Val Acc | Test Acc | F1-Val | F1-Test |
|---|---|---|---|---|---|---|---|---|---|


In [16]:
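def two_digit_resampling():
    # Same pipeline as two_digit, but with an unweighted loss and lr=1e-3: the class
    # balance comes from the WeightedRandomSampler behind the global train_loader.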
    model = SimpleMLP(in_dim=28 * 28,
                  hidden_dim=hidden_dim,
                  out_dim=2).to(device)
    batch_size = 64
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    num_epochs = 10
    
    # training
    start_time = time.time()
    for epoch in range(num_epochs):
        correct, count = 0, 0 
        for data, target in train_loader:
            data, target = data.to(device), target.to(device)
            # free the gradient from the previous batch
            optimizer.zero_grad()
            # reshape the image into a vector
            data = data.view(data.size(0), -1)
            # model forward
            output = model(data)
            # compute the loss
            loss = criterion(output, target)
            # model backward
            loss.backward()
            # update the model parameters
            optimizer.step()
            
            # adding this for train accuracy 
            pred = output.argmax(dim=1)
            correct += (pred == target).sum().item()
            count += data.size(0)
        
        train_acc = 100. * correct / count
        # print(f'Training accuracy: {train_acc:.2f}%')

    training_time = time.time()- start_time
    # print(training_time)
    
    # validation (eval mode, no gradient tracking)
    model.eval()
    val_loss = count = 0
    correct = total = 0
    val_preds = []; val_labels = []
    with torch.no_grad():
        for data, target in val_loader:
            data, target = data.to(device), target.to(device)
            data = data.view(data.size(0), -1)
            output = model(data)
            val_loss += criterion(output, target).item()
            count += 1
            pred = output.argmax(dim=1)
            correct += (pred == target).sum().item()
            total += data.size(0)
            # move to CPU so that .numpy() also works on a GPU device
            val_preds.append(pred.cpu())
            val_labels.append(target.cpu())

    val_preds = torch.cat(val_preds).numpy()
    val_labels = torch.cat(val_labels).numpy()
    assert len(val_preds) == len(val_set)

    val_loss = val_loss / count
    val_acc = 100. * correct / total
    # print(f'Validation loss: {val_loss:.2f}, accuracy: {val_acc:.2f}%')
    f1_validation = f1_score(labels=val_labels, predictions=val_preds)
    # print(f'F1 score validation: {f1_validation:.2f}')
    
    # test
    model.eval()
    correct = total = 0
    test_preds = []; test_labels=[]

    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            data = data.view(data.size(0), -1)
            output = model(data)
            pred = output.argmax(dim=1)
            correct += (pred == target).sum().item()
            total += data.size(0)
            test_preds.append(pred.cpu())
            test_labels.append(target.cpu())
        
    test_preds = torch.cat(test_preds).numpy()
    test_labels = torch.cat(test_labels).numpy()
    assert len(test_preds) == len(test_set)   
    test_acc = 100. * correct / total
    # print(f'Test Accuracy: {test_acc:.2f}%')
    # print(f'Validation loss: {val_loss:.2f}, accuracy: {val_acc:.2f}%')
    f1_test = f1_score(labels = test_labels, predictions =test_preds)
    # print(f'F1 score test: {f1_test:.2f}')

    return training_time, train_acc, val_acc, test_acc, f1_validation, f1_test

In [17]:
N_list = [100] + [250*(i+1) for i in range(8)]
for N in N_list:
    train_0 = train_0_original.copy()
    train_1 =  train_1_original.copy()
    random.shuffle(train_1)
    train_1 = train_1[:len(train_1) // N]
    print(N, 'Train set (before sparsing)', len(train_0), len(train_1), len(train_1) + len( train_0) )# train_set = train_0 + train_1

    # Split training data (1s)into training and validation sets
    train_1len = int(len(train_1) *.8)
    val_1len = len(train_1) - train_1len
    train1_set, val1_set = train_1[:train_1len], train_1[train_1len:]

    # Split training data (0s) into training and validation sets
    train_0len = int(len(train_0) *.8)
    val_0len = len(train_0) - train_0len
    train0_set, val0_set = train_0[:train_0len], train_0[train_0len:]
    
    # train and val set
    train_set = train0_set + train1_set
    val_set = val0_set + val1_set
    random.shuffle(train_set)
    random.shuffle(val_set)

    # creating test set
    test_0 = [data for data in mnist_test if data[1] == 0]
    test_1 = [data for data in mnist_test if data[1] == 1]
    print(N,'Test set (before sparsing)',len(test_0), len(test_1), len(test_1) + len( test_0) )

    test_1 = test_1[:len(test_1) // N]
    print(N,'Test set (after sparsing)',len(test_0), len(test_1), len(test_1) + len( test_0) )
    test_set = test_0 + test_1
    test_loader = DataLoader(test_set, batch_size=64, shuffle=False)
    print('\n')
    # class-"1" sampling weights to try: none, N/10, N/2, and the (truncated) ratio n_0 / n_1
    compensation = int(train_0len / train_1len)
    weight_factors = [1, int(N/10), int(N/2), compensation]
    batch_size = 64

    for weight_factor in weight_factors:
        # per-sample weights: 1.0 for a "0", weight_factor for a "1"
        weights = np.array([1.0 if data[1] == 0 else weight_factor for data in train_set])
        weights = torch.from_numpy(weights)

        # draw len(train_set) samples per epoch with replacement, proportionally to the weights
        sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)

        train_loader = DataLoader(train_set, batch_size=batch_size, sampler=sampler)
        val_loader = DataLoader(val_set, batch_size=batch_size, shuffle=False)
        
        training_time, train_acc, val_acc, test_acc, f1_validation, f1_test = two_digit_resampling()
        
        row = [N, batch_size, weight_factor, training_time, train_acc, val_acc, test_acc, f1_validation, f1_test]
        question3_df_resample = pd.concat([question3_df_resample, pd.DataFrame([row], columns=headers)], ignore_index=True)

100 Train set (before sparsing) 5923 67 5990
100 Test set (before sparsing) 980 1135 2115
100 Test set (after sparsing) 980 11 991






250 Train set (before sparsing) 5923 26 5949
250 Test set (before sparsing) 980 1135 2115
250 Test set (after sparsing) 980 4 984


500 Train set (before sparsing) 5923 13 5936
500 Test set (before sparsing) 980 1135 2115
500 Test set (after sparsing) 980 2 982


750 Train set (before sparsing) 5923 8 5931
750 Test set (before sparsing) 980 1135 2115
750 Test set (after sparsing) 980 1 981


1000 Train set (before sparsing) 5923 6 5929
1000 Test set (before sparsing) 980 1135 2115
1000 Test set (after sparsing) 980 1 981


1250 Train set (before sparsing) 5923 5 5928
1250 Test set (before sparsing) 980 1135 2115
1250 Test set (after sparsing) 980 0 980


1500 Train set (before sparsing) 5923 4 5927
1500 Test set (before sparsing) 980 1135 2115
1500 Test set (after sparsing) 980 0 980


1750 Train set (before sparsing) 5923 3 5926
1750 Test set (before sparsing) 980 1135 2115
1750 Test set (after sparsing) 980 0 980


2000 Train set (before sparsing) 5923 3 5926
2000 Test set (before sp

In [18]:
question3_df_resample.to_csv('q3_hyperopt_resampling_unsparsed_test.csv')
display(question3_df_resample)

| | N | Batch Size | Weight | Train Time (s) | Train Acc (%) | Val Acc (%) | Test Acc (%) | F1-Val | F1-Test |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 | 64 | 1 | 0.413916 | 99.853893 | 99.833194 | 99.798184 | 0.923077 | 0.9 |
| 1 | 100 | 64 | 10 | 0.453152 | 100.0 | 99.916597 | 99.899092 | 0.962963 | 0.952381 |
| 2 | 100 | 64 | 50 | 0.39954 | 100.0 | 100.0 | 100.0 | 1.0 | 1.0 |
| 3 | 100 | 64 | 89 | 0.443699 | 100.0 | 100.0 | 99.899092 | 1.0 | 0.956522 |
| 4 | 250 | 64 | 1 | 0.402522 | 99.936948 | 99.916037 | 99.796748 | 0.909091 | 0.666667 |
| 5 | 250 | 64 | 25 | 0.469228 | 100.0 | 100.0 | 100.0 | 1.0 | 1.0 |
| 6 | 250 | 64 | 125 | 0.37795 | 99.957966 | 100.0 | 99.796748 | 1.0 | 0.8 |
| 7 | 250 | 64 | 236 | 0.40428 | 100.0 | 100.0 | 100.0 | 1.0 | 1.0 |
| 8 | 500 | 64 | 1 | 0.471906 | 99.957877 | 99.915825 | 100.0 | 0.8 | 1.0 |
| 9 | 500 | 64 | 50 | 0.491204 | 100.0 | 100.0 | 100.0 | 1.0 | 1.0 |
