# PyTorch - homework 2: neural networks

-- Prof. Dorien Herremans

Please run the whole notebook with your code, then submit the resulting `.ipynb` file (including your answers and the cell outputs) on eDimension.

In [49]:
from termcolor import colored

student_number = "1002911"
student_name = "Calvin Yusnoveri"

print(colored("Homework by "  + student_name + ', number: ' + student_number,'red'))

Homework by Calvin Yusnoveri, number: 1002911


## Question 1 -- XOR neural network [3pts]

a) Train an (at least) 2-layer neural network that can solve the XOR problem. Hint: be sure to check both this week and last week's lab. 

b) Check the predictions resulting from your model in the second code box below.


In [72]:
# load your data
import torch
import torch.nn as nn

train_set = [ [0, 0], [0, 1], [1, 0], [1, 1] ]
train_ans = [ [0, 1, 1, 0] ]
x = torch.Tensor(train_set)
y = torch.Tensor(train_ans).view(-1,1)

print(f"Test data: {x}\n {y}")

# name your model xor
class xor(nn.Module):
    def __init__(self, input_dim=2, output_dim=1):
        super(xor, self).__init__()
        self.fc1 = nn.Linear(input_dim, 2)
        self.sigmoid = nn.Sigmoid()
        self.fc2 = nn.Linear(2, output_dim)

    def forward(self, x):
        x = self.fc1(x)
        x = self.sigmoid(x)
        x = self.fc2(x)
        return x
    
# define your model loss function, optimizer, etc.
device = "cuda"
model = xor().to(device)
loss_function = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr = 0.03)

# train the model
def train(x, y, model, loss_function, optimizer, device, epochs=1000+1):
    for epoch in range(epochs):
        features = x.to(device)
        target = y.to(device)

        optimizer.zero_grad()

        prediction = model(features)
        loss = loss_function(prediction, target.view(-1, 1))  # already on the same device as its inputs
        loss.backward()
        optimizer.step()

        if epoch % 50 == 0:  # print every 50 epochs
            print(f"Epoch: {epoch}, Loss: {loss.item()}")

train(x, y, model, loss_function, optimizer, device)

Test data: tensor([[0., 0.],
        [0., 1.],
        [1., 0.],
        [1., 1.]])
 tensor([[0.],
        [1.],
        [1.],
        [0.]])
Epoch: 0, Loss: 0.2929750680923462
Epoch: 50, Loss: 0.24973316490650177
Epoch: 100, Loss: 0.21934162080287933
Epoch: 150, Loss: 0.022789159789681435
Epoch: 200, Loss: 1.4584457858290989e-05
Epoch: 250, Loss: 5.269632765703136e-07
Epoch: 300, Loss: 5.461580632193375e-10
Epoch: 350, Loss: 2.0189183658203547e-11
Epoch: 400, Loss: 1.176836406102666e-13
Epoch: 450, Loss: 7.216449660063518e-15
Epoch: 500, Loss: 2.7755575615628914e-16
Epoch: 550, Loss: 2.7755575615628914e-16
Epoch: 600, Loss: 0.0
Epoch: 650, Loss: 0.0
Epoch: 700, Loss: 0.0
Epoch: 750, Loss: 0.0
Epoch: 800, Loss: 0.0
Epoch: 850, Loss: 0.0
Epoch: 900, Loss: 0.0
Epoch: 950, Loss: 0.0
Epoch: 1000, Loss: 0.0


In [73]:
from datetime import datetime

def save(model, fpath):
    now = datetime.now()
    timestamp = now.strftime("%d%m-%H%M")
    
    save_path = f'{fpath}-{timestamp}'
    torch.save(model.state_dict(), save_path) # model is saved in current dir for reproducibility
    print(f'Model saved in {save_path}.')
    return save_path

save_path = save(model, './xor')

Model saved in ./xor-2306-2215.
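
The checkpoint above is written from a model that lives on the GPU. The following is a minimal sketch, assuming the notebook might later be reopened on a machine without CUDA, of how a `state_dict` can be saved and then restored onto the CPU. `nn.Linear(2, 1)` is only a stand-in for the `xor` model and `demo.pt` is a hypothetical file name; neither comes from the assignment.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Fall back to the CPU when no GPU is visible (assumption: either may be present).
device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Linear(2, 1).to(device)  # stand-in for the trained model

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.pt")  # hypothetical checkpoint name
    torch.save(model.state_dict(), path)

    # map_location remaps saved tensors, so a GPU checkpoint loads on a CPU-only machine.
    state = torch.load(path, map_location="cpu")

restored = nn.Linear(2, 1)  # same architecture, fresh weights
restored.load_state_dict(state)
restored.eval()  # switch to inference mode before predicting
```

After loading, `restored` holds the same weights as `model`, just on the CPU.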


In [75]:
# test your model using the following functions (make sure the output is printed and saved when you submit this notebook):
# depending on how you defined your network you may need to slightly tweak the prediction function below

test = [[0,0],[0,1],[1,1],[1,0]]

device = 'cuda'
loaded_xor = xor().to(device) # reinitialize model
loaded_xor.load_state_dict(torch.load(save_path)) # ./xor-2306-2215
loaded_xor.eval() # evaluate the reloaded weights, not the in-memory `model`

for trial in test: 
    Xtest = torch.Tensor(trial).to(device)
    y_hat = loaded_xor(Xtest)
    
    if y_hat > 0.5:
        prediction = 1
    else: 
        prediction = 0

    print("{0} xor {1} = {2}".format(int(Xtest[0]), int(Xtest[1]), prediction))

0 xor 0 = 0
0 xor 1 = 1
1 xor 1 = 0
1 xor 0 = 1


## Question 2  [2pts]

Imagine a neural network model for a multilabel classification task. 

a) Which loss function should you use?

b) The resulting trained model has a high variance error. Give 4 possible solutions to improve the model. 


```
[your answer here, no coding required]

* answer A: A multilabel classification task can be treated as N independent binary classification problems, where N is the number of labels.
So the loss function is binary cross-entropy applied to each label's sigmoid output and averaged over the labels (in PyTorch, nn.BCELoss on sigmoid outputs or nn.BCEWithLogitsLoss on raw logits).

* answer B: High variance error means the model is overfitting the training data.
  - 1) Reduce the size of the model (i.e. fewer parameters, fewer layers) or increase the size of the dataset (e.g. data augmentation)
  - 2) Use early stopping
  - 3) Add an L1 or L2 regularization term (depending on the task) to penalize large weights
  - 4) Add dropout layers so the network cannot rely on any single node, which helps it generalize better

```
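
As an illustration of answer A, here is a minimal sketch of a multilabel loss in PyTorch. The logits and multi-hot targets are made-up values for a hypothetical batch of 3 samples and 4 labels; they are not taken from any dataset in this homework.

```python
import torch
import torch.nn as nn

# Hypothetical batch: 3 samples, 4 independent labels (multi-hot targets).
logits = torch.tensor([[ 2.0, -1.0,  0.5, -3.0],
                       [-0.5,  1.5, -2.0,  0.0],
                       [ 1.0,  1.0, -1.0, -1.0]])
targets = torch.tensor([[1., 0., 1., 0.],
                        [0., 1., 0., 0.],
                        [1., 1., 0., 0.]])

# BCEWithLogitsLoss applies a sigmoid to each label and averages the
# binary cross-entropy over every (sample, label) pair.
loss = nn.BCEWithLogitsLoss()(logits, targets)

# The same quantity written as an explicit sigmoid followed by BCELoss.
manual = nn.BCELoss()(torch.sigmoid(logits), targets)

# Each label is decided independently by thresholding its own probability.
predicted = (torch.sigmoid(logits) > 0.5).float()
```

Working on raw logits with `nn.BCEWithLogitsLoss` is usually preferred over a separate sigmoid plus `nn.BCELoss` because it is more numerically stable.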


## Question 3 - Improve hit classification [5pts]

Remember the hit prediction dataset from last week? 

a) Improve the model using a multilayer perceptron. 

b) Make sure to run your models on the GPU. 

c) Tweak the hyperparameters such as the number of nodes or layers, or others. Show two possible configurations, state which works better, and very briefly explain why this may be the case. 




In [53]:
import pandas as pd

# assuming the data is downloaded and saved in the same dir -> ./content
train_data = './content/herremans_hit_1030training.csv'
test_data = './content/herremans_hit_1030test.csv'
train_data = pd.read_csv(train_data)
test_data = pd.read_csv(test_data)
print(f"Data loaded. Train data shape: {train_data.shape}, Test data shape: {test_data.shape}")
# print(train_data.head(50))

x = torch.FloatTensor(train_data.loc[:, train_data.columns != 'Topclass1030'].values).to(device)
y = torch.FloatTensor(train_data['Topclass1030'].values).to(device)

Data loaded. Train data shape: (321, 50), Test data shape: (79, 50)


In [54]:
# code your model 1

class MultiLayerLogisticRegression(nn.Module):
    def __init__(self, input_size, num_classes):
        super(MultiLayerLogisticRegression, self).__init__()
        self.fc1 = nn.Linear(input_size, 32)
        self.fc2 = nn.Linear(32, 8)
        self.fc3 = nn.Linear(8, num_classes)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
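        # Note: a ReLU here clamps the input of the sigmoid to >= 0, so the output lies in [0.5, 1)
        # and the `y_hat > 0.5` test in the evaluation predicts 0 only when this ReLU outputs exactly 0.
        x = self.relu(x)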
        x = self.sigmoid(x)
        return x
    
# train model
device = 'cuda'
num_out = 1
num_inp = 49 # the dataset has 50 columns; one is the label 'Topclass1030', leaving 49 features

lr_rate = 0.001
loss_function = nn.BCELoss()
model1 = MultiLayerLogisticRegression(num_inp, num_out).to(device)
optimizer = torch.optim.Adam(model1.parameters(), lr=lr_rate)

train(x, y, model1, loss_function, optimizer, device)
save_path = save(model1, './multi-layer-logreg')

Epoch: 0, Loss: 0.6781057715415955
Epoch: 50, Loss: 0.5716888308525085
Epoch: 100, Loss: 0.47020143270492554
Epoch: 150, Loss: 0.38029322028160095
Epoch: 200, Loss: 0.3195233941078186
Epoch: 250, Loss: 0.29076775908470154
Epoch: 300, Loss: 0.27820831537246704
Epoch: 350, Loss: 0.27223384380340576
Epoch: 400, Loss: 0.2692500650882721
Epoch: 450, Loss: 0.2676442861557007
Epoch: 500, Loss: 0.2666953504085541
Epoch: 550, Loss: 0.2660396993160248
Epoch: 600, Loss: 0.26557835936546326
Epoch: 650, Loss: 0.26521551609039307
Epoch: 700, Loss: 0.26495152711868286
Epoch: 750, Loss: 0.2646833658218384
Epoch: 800, Loss: 0.2645995318889618
Epoch: 850, Loss: 0.26448795199394226
Epoch: 900, Loss: 0.2643808424472809
Epoch: 950, Loss: 0.26423314213752747
Epoch: 1000, Loss: 0.2642240822315216
Epoch: 1050, Loss: 0.2640830874443054
Epoch: 1100, Loss: 0.2640083134174347
Epoch: 1150, Loss: 0.2639453709125519
Epoch: 1200, Loss: 0.26387515664100647
Epoch: 1250, Loss: 0.26391148567199707
Epoch: 1300, Loss: 0.26

In [55]:
# evaluate model 1 (called model1 here)

device = 'cuda'
trained_model1 = MultiLayerLogisticRegression(num_inp, num_out).to(device) # reinitialize model
trained_model1.load_state_dict(torch.load(save_path)) # try save_path = ./multi-layer-logreg-2306-2210
trained_model1.eval()

def run_evaluation(my_model):
    test = pd.read_csv('./content/herremans_hit_1030test.csv')
    labels = test.iloc[:,-1]
    test = test.drop('Topclass1030', axis=1)
    testdata = torch.Tensor(test.values)
    testlabels = torch.Tensor(labels.values).view(-1,1)

    TP = 0
    TN = 0
    FN = 0
    FP = 0

    for i in range(0, testdata.size()[0]): 
        # print(testdata[i].size())
        Xtest = torch.Tensor(testdata[i]).to(device)
        y_hat = my_model(Xtest)

        if y_hat > 0.5:
            prediction = 1
        else: 
            prediction = 0

        if (prediction == testlabels[i]):
            if (prediction == 1):
                TP += 1
            else: 
                TN += 1

        else:
            if (prediction == 1):
                FP += 1
            else: 
                FN += 1

    print("True Positives: {0}, True Negatives: {1}".format(TP, TN))
    print("False Positives: {0}, False Negatives: {1}".format(FP, FN))
    rate = TP/(FN+TP)
    print("Class specific accuracy of correctly predicting a hit song is {0}".format(rate))

run_evaluation(trained_model1)

True Positives: 47, True Negatives: 14
False Positives: 15, False Negatives: 3
Class specific accuracy of correctly predicting a hit song is 0.94


In [56]:
# code your model 2

class DropOutLogisticRegression(nn.Module):
    def __init__(self, input_size, num_classes):
        super(DropOutLogisticRegression, self).__init__()
        self.fc1 = nn.Linear(input_size, 32)
        self.fc2 = nn.Linear(32, 8)
        self.fc3 = nn.Linear(8, num_classes)
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        self.dropout = nn.Dropout(p=0.75)
        
    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.dropout(x)
        x = self.fc2(x)
        x = self.relu(x)
        x = self.fc3(x)
        # Note: a ReLU before the sigmoid restricts the output to [0.5, 1)
        x = self.relu(x)
        x = self.sigmoid(x)
        return x
    
# train model
device = 'cuda'
num_out = 1
num_inp = 49 # the dataset has 50 columns; one is the label 'Topclass1030', leaving 49 features

lr_rate = 0.001
loss_function = nn.BCELoss()
model2 = DropOutLogisticRegression(num_inp, num_out).to(device)
optimizer = torch.optim.Adam(model2.parameters(), lr=lr_rate)

train(x, y, model2, loss_function, optimizer, device)
save_path = save(model2, './drop-out-logreg')

Epoch: 0, Loss: 0.6838576197624207
Epoch: 50, Loss: 0.6308794021606445
Epoch: 100, Loss: 0.5846298933029175
Epoch: 150, Loss: 0.5645995140075684
Epoch: 200, Loss: 0.5282886028289795
Epoch: 250, Loss: 0.5055448412895203
Epoch: 300, Loss: 0.512498676776886
Epoch: 350, Loss: 0.480032354593277
Epoch: 400, Loss: 0.4498507082462311
Epoch: 450, Loss: 0.45719200372695923
Epoch: 500, Loss: 0.4388248026371002
Epoch: 550, Loss: 0.4245736002922058
Epoch: 600, Loss: 0.4117947816848755
Epoch: 650, Loss: 0.41078197956085205
Epoch: 700, Loss: 0.3838280737400055
Epoch: 750, Loss: 0.3869994878768921
Epoch: 800, Loss: 0.3886878788471222
Epoch: 850, Loss: 0.3684263527393341
Epoch: 900, Loss: 0.3625520169734955
Epoch: 950, Loss: 0.3505902588367462
Epoch: 1000, Loss: 0.3654760718345642
Epoch: 1050, Loss: 0.34178394079208374
Epoch: 1100, Loss: 0.3386198878288269
Epoch: 1150, Loss: 0.337293416261673
Epoch: 1200, Loss: 0.33868104219436646
Epoch: 1250, Loss: 0.3454767167568207
Epoch: 1300, Loss: 0.3516904413700

In [58]:
# evaluate model 2 (called model2 here)

device = 'cuda'
trained_model2 = DropOutLogisticRegression(num_inp, num_out).to(device) # reinitialize model
trained_model2.load_state_dict(torch.load(save_path)) # try save_path = ./drop-out-logreg-2306-2210
trained_model2.eval()

run_evaluation(trained_model2)

True Positives: 48, True Negatives: 8
False Positives: 21, False Negatives: 2
Class specific accuracy of correctly predicting a hit song is 0.96


Which works better and why do you think this may be (very briefly)? 


The first model, `model1`, is a 3-layer fully connected classifier with a sigmoid output. The second model, `model2`, has exactly the same architecture plus a dropout layer (p = 0.75) after the first fully connected layer and its activation. Both use the same optimizer (Adam with a learning rate of 0.001) and the same loss function (`nn.BCELoss`).

The second model, `model2`, seems to work better when judged on class-specific accuracy alone (0.96 versus 0.94, a difference of one hit song out of 50). Dropout randomly turns off a subset of nodes during training, which forces the model not to rely too heavily on any single node. This helps to prevent overfitting.
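
The effect of dropout can be seen in isolation with a small sketch. It uses the same `p=0.75` as `model2`, but the input vector of ones is purely illustrative. In training mode each element is zeroed with probability `p` and the survivors are scaled by `1 / (1 - p)`; in evaluation mode the layer passes its input through unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # only to make the random mask repeatable

dropout = nn.Dropout(p=0.75)
x = torch.ones(1000)  # illustrative input, not data from the assignment

dropout.train()          # training mode: random elements are zeroed
train_out = dropout(x)   # survivors are scaled by 1 / (1 - 0.75) = 4

dropout.eval()           # evaluation mode: dropout is disabled
eval_out = dropout(x)    # identical to x

zero_fraction = (train_out == 0).float().mean().item()
print(f"fraction zeroed in training mode: {zero_fraction:.2f}")
```

This is why `trained_model2.eval()` is called before evaluation: without it, predictions on the test set would still be randomly perturbed by the dropout mask.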

However, in terms of overall correct predictions, the first model `model1` does better: it classifies 47 + 14 = 61 of the 79 test songs correctly, versus 48 + 8 = 56 for `model2`, mainly because it produces fewer False Positives (15 versus 21).
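
The comparison can be made explicit by computing a few standard metrics from the confusion-matrix counts printed above. This is a minimal sketch: the helper `summarize` is a hypothetical name introduced here, and the "class specific accuracy" reported by `run_evaluation` corresponds to what is called recall below, since both are `TP / (TP + FN)`.

```python
def summarize(tp, tn, fp, fn):
    """Return accuracy, precision and recall from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # share of all songs classified correctly
        "precision": tp / (tp + fp),     # share of predicted hits that are real hits
        "recall": tp / (tp + fn),        # share of real hits that were found
    }

# Counts copied from the evaluation outputs of the two models.
model1_metrics = summarize(tp=47, tn=14, fp=15, fn=3)
model2_metrics = summarize(tp=48, tn=8, fp=21, fn=2)

for name, metrics in [("model1", model1_metrics), ("model2", model2_metrics)]:
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```

On these counts `model1` has an accuracy of about 0.772 and a precision of about 0.758, while `model2` has about 0.709 and 0.696, so `model2` gains one extra hit at the cost of six more false positives.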

Additionally, submit your results [here](https://forms.gle/NtJJEE7Wm5ZRM3Je7) for 'Class specific accuracy of correctly predicting a hit song' and see if you got the best performance of the class! Good luck!