# PyTorch - homework 2: neural networks

-- Prof. Dorien Herremans

Please run the whole notebook with your code and submit the `.ipynb` file on eDimension that includes your answers [so after you run it]. 

In [1]:
from termcolor import colored

student_number="1003391"
student_name="Tan Chia Yik"

print(colored("Homework by "  + student_name + ', number: ' + student_number,'red'))

Homework by Tan Chia Yik, number: 1003391


## Question 1 -- XOR neural network [3pts]

a) Train an (at least) 2-layer neural network that can solve the XOR problem. Hint: be sure to check both this week and last week's lab. 

b) Check the predictions resulting from your model in the second code box below.


In [None]:
# load your data
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

# training set of input X and labels Y
X = torch.Tensor([[0,0],[0,1], [1,0], [1,1]])
Y = torch.Tensor([0,1,1,0]).view(-1,1)

# name your model xor
class XOR(nn.Module):
  def __init__(self, input_size, hidden_size, num_classes):
    super(XOR, self).__init__()
    self.linear_first = nn.Linear(input_size, hidden_size)
    self.linear_second = nn.Linear(hidden_size, num_classes)
    self.relu = nn.ReLU()
    self.sigmoid = nn.Sigmoid()

  def forward(self, x):
    out = self.linear_first(x)
    out = self.relu(out)
    out = self.linear_second(out)
    out = torch.sigmoid(out)
    return out

input_size = 2
hidden_size = 3
num_classes = 1
XOR_clf = XOR(input_size, hidden_size, num_classes)

# define your model loss function, optimizer, etc. 
lr_rate = 0.001
loss_function = nn.BCELoss() 
optimizer = torch.optim.Adam(XOR_clf.parameters(), lr=lr_rate)

# train the model

epochs = 10001 
steps = X.size(0) 

for i in range(epochs):  
    optimizer.zero_grad() 
    output_y = XOR_clf(X)#forward pass

    loss = loss_function(output_y, Y) #calculate the loss
    loss.backward() #backprop
    optimizer.step() #does the update

    if i % 1000 == 0:
        print ("Epoch: {0}, Loss: {1}, ".format(i, loss.item()))  # .item() extracts the Python float


Epoch: 0, Loss: 0.7172043323516846, 
Epoch: 1000, Loss: 0.504838228225708, 
Epoch: 2000, Loss: 0.14360883831977844, 
Epoch: 3000, Loss: 0.058326609432697296, 
Epoch: 4000, Loss: 0.02836252935230732, 
Epoch: 5000, Loss: 0.015296418219804764, 
Epoch: 6000, Loss: 0.00876498594880104, 
Epoch: 7000, Loss: 0.005214254837483168, 
Epoch: 8000, Loss: 0.0031792016234248877, 
Epoch: 9000, Loss: 0.001971431775018573, 
Epoch: 10000, Loss: 0.0012360757682472467, 


In [None]:
# test your model using the following functions (make sure the output is printed and saved when you submit this notebook):
# depending on how you defined your network you may need to slightly tweak the prediction function below

test = [[0,0],[0,1],[1,1],[1,0]]

for trial in test: 
  Xtest = torch.Tensor(trial)
  y_hat = XOR_clf(Xtest)

  if y_hat > 0.5:
    prediction = 1
  else: 
    prediction = 0

  print("{0} xor {1} = {2}".format(int(Xtest[0]), int(Xtest[1]), prediction))

0 xor 0 = 0
0 xor 1 = 1
1 xor 1 = 0
1 xor 0 = 1
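
To see why a single hidden layer is enough for XOR, the weights of a tiny ReLU network can also be set by hand instead of learned. The sketch below is only an illustration in plain Python: the function names and the particular weights are assumptions chosen for this example, not the weights that `XOR_clf` learned.

```python
def relu(z):
    """Rectified linear unit for a single number."""
    return max(0.0, z)

def xor_by_hand(x1, x2):
    # Hidden layer: two ReLU units with hand-picked weights and biases.
    h1 = relu(x1 + x2)        # counts how many inputs are on
    h2 = relu(x1 + x2 - 1.0)  # fires only when both inputs are on
    # Output layer: a linear combination that cancels the (1, 1) case.
    return h1 - 2.0 * h2

results = {(a, b): xor_by_hand(a, b) for a in (0, 1) for b in (0, 1)}
for (a, b), y in results.items():
    print(f"{a} xor {b} = {int(y)}")
```

A single linear layer cannot produce this mapping, because the four XOR points are not linearly separable. The hidden unit `h2` supplies the extra non-linear feature that the output layer needs.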


## Question 2  [2pts]

Imagine a neural network model for a multilabel classification task. 

a) Which loss function should you use?

b) The resulting trained model has a high variance error. Give 4 possible solutions to improve the model. 


```
[your answer here, no coding required]

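* answer A
  - BCEWithLogitsLoss (a sigmoid followed by binary cross-entropy, applied to each label independently)

* answer B (high variance means the model is overfitting the training data)
  - 1 Regularization (e.g. an L1 or L2 penalty on the weights)
  - 2 Dropout
  - 3 Early stopping
  - 4 Data augmentation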
```
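
As a rough illustration of answer A, the sketch below computes binary cross-entropy directly from raw logits in plain Python, treating every label as its own yes/no decision. To the best of my understanding it mirrors what `nn.BCEWithLogitsLoss` computes with its default mean reduction. The function name `bce_with_logits` and the example values are assumptions made for this example.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over independent labels, from raw logits."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Numerically stable form of
        # -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)

# Hypothetical example: one sample with three labels, two of them active.
logits = [2.0, -1.0, 0.5]
targets = [1.0, 0.0, 1.0]
loss = bce_with_logits(logits, targets)
print(f"multilabel loss: {loss:.4f}")
```

Because each label gets its own sigmoid, several labels can be active at once, which is what separates multilabel from multiclass classification. For answer B, PyTorch optimizers accept a `weight_decay` argument for L2 regularization, and `nn.Dropout` provides dropout.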


## Question 3 - Improve hit classification [5pts]

Remember the hit prediction dataset from last week? 

a) Improve the model using a multilayer perceptron. 

b) Make sure to run your models on the GPU. 

c) Tweak the hyperparameters, such as the number of nodes or layers. Show two possible configurations, state which works better, and very briefly explain why this may be the case. 




In [None]:
# code your model 1
import torch
import pandas as pd
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import io

# load data
all_data = pd.read_csv(io.BytesIO(uploaded['herremans_hit_1030training.csv']))
labels = all_data.iloc[:,-1] #Gets the target values, 1 means top 10 hit song, 0 means not top 10 hit song
labels = torch.Tensor(labels.values).reshape(-1,1)
train_data = all_data.drop('Topclass1030', axis=1) #Removes labels from dataset
train_data = torch.Tensor(train_data.values) 

num_input_features = train_data.size(1) #number of input features (columns) per song

# define the multilayer perceptron model
class MulLayerPercp_model1(nn.Module):
  def __init__(self, input_size, hidden_size, output_size = 1):
    super(MulLayerPercp_model1, self).__init__()
    self.linear_first_layer = nn.Linear(input_size, hidden_size)
    self.linear_second_layer = nn.Linear(hidden_size, output_size)
    self.relu = nn.ReLU() #ReLU mitigates the vanishing gradient problem (its derivative is 1 for positive inputs) and is a common default activation in multilayer perceptrons and CNNs
    self.sigmoid = nn.Sigmoid()
  
  def forward(self, x):
    out = self.linear_first_layer(x) 
    out = self.relu(out)
    out = self.linear_second_layer(out)
    out = self.sigmoid(out)
    return out

hidden_size = 15
MulLayerPercp_clf_1 = MulLayerPercp_model1(num_input_features, hidden_size)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

epochs = 1001
learning_rate = 0.01
loss_function = nn.BCELoss() #binary cross-entropy loss, applied to the sigmoid output
optimizer = torch.optim.SGD(MulLayerPercp_clf_1.parameters(), lr=learning_rate)

for i in range(epochs):
    MulLayerPercp_clf_1.train()
    optimizer.zero_grad()
    
    # Forward Pass
    output_y = MulLayerPercp_clf_1(train_data)

    # Compute Loss
    loss = loss_function(output_y, labels)

    # Backward pass
    loss.backward()
    optimizer.step()

    if i % 100 == 0:
        print(f"Epoch - {i} , Loss - {loss.item()}")



Epoch - 0 , Loss - 0.6963269114494324
Epoch - 100 , Loss - 0.6321107149124146
Epoch - 200 , Loss - 0.6132636666297913
Epoch - 300 , Loss - 0.5987904667854309
Epoch - 400 , Loss - 0.5847398042678833
Epoch - 500 , Loss - 0.5707181096076965
Epoch - 600 , Loss - 0.5575959086418152
Epoch - 700 , Loss - 0.5456567406654358
Epoch - 800 , Loss - 0.5349549651145935
Epoch - 900 , Loss - 0.5254961252212524
Epoch - 1000 , Loss - 0.5171141624450684
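
Part b) asks for the models to run on the GPU. The training cell above creates `device`, but the model and the tensors are never moved to it, so training stays on the CPU. A minimal sketch of the usual pattern follows. It assumes a small stand-in network and random synthetic data, both hypothetical: it uses neither the hit song dataset nor the models defined above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Synthetic stand-in data with hypothetical shapes, NOT the hit song dataset.
features = torch.randn(64, 10)
targets = (torch.rand(64, 1) > 0.5).float()

# Small stand-in network with the same layer pattern as model 1.
model = nn.Sequential(nn.Linear(10, 15), nn.ReLU(), nn.Linear(15, 1), nn.Sigmoid())

# Move the parameters and the tensors to the same device before training.
model = model.to(device)
features = features.to(device)
targets = targets.to(device)

loss_function = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_function(model(features), targets)
    loss.backward()
    optimizer.step()

param_device = next(model.parameters()).device
print(f"Training ran on: {param_device.type}, final loss: {loss.item():.4f}")
```

Once a model lives on the GPU, every tensor passed to it must be on the same device, including the inputs used during evaluation.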


In [None]:
# evaluate model 1 (called model1 here)
import pandas as pd
import io

def run_evaluation(my_model):

  test = pd.read_csv(io.BytesIO(uploaded['herremans_hit_1030training.csv']))
  labels = test.iloc[:,-1]
  test = test.drop('Topclass1030', axis=1)
  testdata = torch.Tensor(test.values)
  testlabels = torch.Tensor(labels.values).view(-1,1)

  TP = 0
  TN = 0
  FN = 0
  FP = 0

  for i in range(0, testdata.size()[0]): 
    # print(testdata[i].size())
    Xtest = testdata[i].to(next(my_model.parameters()).device)  # keep the input on the model's device (CPU or GPU)
    y_hat = my_model(Xtest)
    
    if y_hat > 0.5:
      prediction = 1
    else: 
      prediction = 0

    if (prediction == testlabels[i]):
      if (prediction == 1):
        TP += 1
      else: 
        TN += 1

    else:
      if (prediction == 1):
        FP += 1
      else: 
        FN += 1

  print("True Positives: {0}, True Negatives: {1}".format(TP, TN))
  print("False Positives: {0}, False Negatives: {1}".format(FP, FN))
  rate = TP/(FN+TP)
  print("Class specific accuracy of correctly predicting a hit song is {0}".format(rate))

run_evaluation(MulLayerPercp_clf_1)


True Positives: 183, True Negatives: 59
False Positives: 59, False Negatives: 20
Class specific accuracy of correctly predicting a hit song is 0.9014778325123153


In [None]:
# code your model 2

import torch
import pandas as pd
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import io

# load data
all_data = pd.read_csv(io.BytesIO(uploaded['herremans_hit_1030training.csv']))
labels = all_data.iloc[:,-1] #Gets the target values, 1 means top 10 hit song, 0 means not top 10 hit song
labels = torch.Tensor(labels.values).reshape(-1,1)
train_data = all_data.drop('Topclass1030', axis=1) #Removes labels from dataset
train_data = torch.Tensor(train_data.values) 

num_input_features = train_data.size(1) #number of input features (columns) per song

# define the multilayer perceptron model
class MulLayerPercp_model2(nn.Module):
  def __init__(self, input_size, hidden_size, output_size = 1):
    super(MulLayerPercp_model2, self).__init__()
    self.linear_first_layer = nn.Linear(input_size, hidden_size)
    self.linear_second_layer = nn.Linear(hidden_size, hidden_size)
    self.linear_third_layer = nn.Linear(hidden_size, hidden_size)
    self.linear_fourth_layer = nn.Linear(hidden_size, output_size)
    self.relu = nn.ReLU() #ReLU mitigates the vanishing gradient problem (its derivative is 1 for positive inputs) and is a common default activation in multilayer perceptrons and CNNs
    self.sigmoid = nn.Sigmoid()
  
  def forward(self, x):
    out = self.linear_first_layer(x) 
    out = self.relu(out)
    out = self.linear_second_layer(out)
    out = self.relu(out)
    out = self.linear_third_layer(out)
    out = self.relu(out)
    out = self.linear_fourth_layer(out)
    out = self.sigmoid(out)
    return out

hidden_size = 15
MulLayerPercp_clf_2 = MulLayerPercp_model2(num_input_features, hidden_size)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

epochs = 1001
learning_rate = 0.01
loss_function = nn.BCELoss() #binary cross-entropy loss, applied to the sigmoid output
optimizer = torch.optim.SGD(MulLayerPercp_clf_2.parameters(), lr=learning_rate)

for i in range(epochs):
    MulLayerPercp_clf_2.train()
    optimizer.zero_grad()
    
    # Forward Pass
    output_y = MulLayerPercp_clf_2(train_data)

    # Compute Loss
    loss = loss_function(output_y, labels)

    # Backward pass
    loss.backward()
    optimizer.step()

    if i % 100 == 0:
        print(f"Epoch - {i} , Loss - {loss.item()}")


Epoch - 0 , Loss - 0.714074969291687
Epoch - 100 , Loss - 0.6905967593193054
Epoch - 200 , Loss - 0.6762173175811768
Epoch - 300 , Loss - 0.6669715642929077
Epoch - 400 , Loss - 0.6602802276611328
Epoch - 500 , Loss - 0.6552578806877136
Epoch - 600 , Loss - 0.6511135101318359
Epoch - 700 , Loss - 0.6470938920974731
Epoch - 800 , Loss - 0.6424397826194763
Epoch - 900 , Loss - 0.6372448801994324
Epoch - 1000 , Loss - 0.6311960220336914


In [None]:
# evaluate model 2 (called model2 here)

run_evaluation(MulLayerPercp_clf_2)

True Positives: 203, True Negatives: 0
False Positives: 118, False Negatives: 0
Class specific accuracy of correctly predicting a hit song is 1.0


Which works better and why do you think this may be (very briefly)? 


**[your answer here, also please summarise the differences between your two models]**

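On the requested metric the second model scores higher: its class-specific accuracy for hit songs is 1.0, against about 0.90 for the first model. However, the counts show that model 2 reaches this by predicting every song as a hit (0 true negatives, 118 false positives), so its overall accuracy is 203/321, about 63%, compared with 242/321, about 75%, for model 1. Its final training loss is also higher (0.63 against 0.52). Both models use stochastic gradient descent as the optimiser with the same learning rate (0.01) and number of epochs (1001), and both have a hidden layer size of 15.

The difference is depth: model 1 has 1 hidden layer, while model 2 has 3 hidden layers and therefore more parameters. A likely explanation is that the deeper model trains more slowly under plain SGD with the same learning rate and epoch budget, so after 1001 epochs it has not yet learned to separate the two classes. Both models are evaluated on the training file, so these results do not show how well either model does outside the training set.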
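
The "class specific accuracy" printed by `run_evaluation` is the recall for the hit class, TP / (TP + FN). Other standard metrics can be derived from the same four counts, as sketched below in plain Python with the numbers copied from the two evaluation outputs above. The helper name `summarise` is an assumption made for this example.

```python
def summarise(tp, tn, fp, fn):
    """Derive standard metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,    # share of all songs classified correctly
        "recall": tp / (tp + fn),         # share of true hits that were found
        "precision": tp / (tp + fp),      # share of predicted hits that are true hits
        "specificity": tn / (tn + fp),    # share of non-hits that were rejected
    }

# Counts copied from the evaluation outputs of model 1 and model 2.
model1 = summarise(tp=183, tn=59, fp=59, fn=20)
model2 = summarise(tp=203, tn=0, fp=118, fn=0)

for name, metrics in (("model 1", model1), ("model 2", model2)):
    line = ", ".join(f"{key}: {value:.3f}" for key, value in metrics.items())
    print(f"{name} -> {line}")
```

Reading recall together with specificity shows whether a high hit rate comes from real discrimination or from predicting the positive class for every input.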



Additionally, submit your results [here](https://forms.gle/NtJJEE7Wm5ZRM3Je7) for 'Class specific accuracy of correctly predicting a hit song' and see if you got the best performance of the class! Good luck!

In [None]:

from google.colab import files
uploaded = files.upload()

Saving herremans_hit_1030training.csv to herremans_hit_1030training.csv
