# PyTorch - homework 2: neural networks

-- Prof. Dorien Herremans

Please run the whole notebook with your code and submit the `.ipynb` file on eDimension that includes your answers [so after you run it]. 

In [1]:
from termcolor import colored

student_name="Kuah Wee Ping"
student_number="1003345"

print(colored("Homework by "  + student_name + ', number: ' + student_number,'green'))

[32mHomework by Kuah Wee Ping, number: 1003345[0m


 ## Question 1 -- XOR neural network [3pts]

a) Train an (at least) 2-layer neural network that can solve the XOR problem. Hint: be sure to check both this week and last week's lab. 

b) Check the predictions resulting from your model in the second code box below.


In [2]:
import torch
import numpy as np
import pandas as pd

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable

In [None]:
# load your data
input_x = torch.Tensor([[0,0],[0,1], [1,0], [1,1]])
input_y = torch.Tensor([0,1,1,0]).view(-1,1)

test_x = torch.Tensor([[0,0],[0,1],[1,1],[1,0]])
test_y = torch.Tensor([0,1,0,1]).view(-1,1)


class FeedForwardNN(nn.Module):
    def __init__(self, input_size, num_classes, num_hidden, hidden_dim, dropout, activation):
      super(FeedForwardNN, self).__init__()
      
      assert num_hidden > 0
      self.hidden_layers = nn.ModuleList([])
      self.hidden_layers.append(nn.Linear(input_size, hidden_dim))

      for i in range(num_hidden - 1):
        self.hidden_layers.append(nn.Linear(hidden_dim, hidden_dim))
      
      self.dropout = nn.Dropout(dropout)
      self.output_projection = nn.Linear(hidden_dim, num_classes)
      self.nonlinearity = activation

    def forward(self, x):
      for hidden_layer in self.hidden_layers:
        x = hidden_layer(x)
        x = self.dropout(x)
        x = self.nonlinearity(x)
    
      out = self.output_projection(x) 
      out_distribution = F.sigmoid(out)
      
      return out_distribution

# # name your model xor
def xor():
  input_size = 2
  num_classes = 1
  num_hidden = 3
  hidden_dim = 50
  dropout = 0.1
  activation = nn.Sigmoid()

  return FeedForwardNN(input_size, num_classes, num_hidden, hidden_dim, dropout, activation)
  

# define your model loss function, optimizer, etc. 
xor_clf = xor()
print(xor_clf)

xor_loss = nn.MSELoss()
xor_optim = optim.SGD(xor_clf.parameters(), lr=0.1, momentum=0.9)


# train the model
num_epochs = 10001

for epoch in range(num_epochs):
  for (x, y) in zip(input_x, input_y):
    reshaped_x = x.view(-1, 2)
    reshaped_x = Variable(reshaped_x)
    target = Variable(y)

    predicted = xor_clf(reshaped_x)
    target = target.unsqueeze(1)
    batch_loss = xor_loss(predicted, target)
    xor_optim.zero_grad()
    batch_loss.backward()
    xor_optim.step()
  
  if epoch % 500 == 0:
    print ("Epoch: {0}, Loss: {1}, ".format(epoch, batch_loss.data))




FeedForwardNN(
  (hidden_layers): ModuleList(
    (0): Linear(in_features=2, out_features=50, bias=True)
    (1): Linear(in_features=50, out_features=50, bias=True)
    (2): Linear(in_features=50, out_features=50, bias=True)
  )
  (dropout): Dropout(p=0.1, inplace=False)
  (output_projection): Linear(in_features=50, out_features=1, bias=True)
  (nonlinearity): Sigmoid()
)
Epoch: 0, Loss: 0.35627806186676025, 




Epoch: 500, Loss: 0.4106728434562683, 
Epoch: 1000, Loss: 0.14450834691524506, 
Epoch: 1500, Loss: 0.178245410323143, 
Epoch: 2000, Loss: 0.5057858824729919, 
Epoch: 2500, Loss: 0.40044623613357544, 
Epoch: 3000, Loss: 0.3650476336479187, 
Epoch: 3500, Loss: 0.2890944182872772, 
Epoch: 4000, Loss: 0.006477650720626116, 
Epoch: 4500, Loss: 3.986217416240834e-05, 
Epoch: 5000, Loss: 5.213099484535633e-06, 
Epoch: 5500, Loss: 1.8647305068952846e-06, 
Epoch: 6000, Loss: 3.134517100988887e-05, 
Epoch: 6500, Loss: 5.4220777201408055e-06, 
Epoch: 7000, Loss: 0.0001004817895591259, 
Epoch: 7500, Loss: 3.806601114320074e-07, 
Epoch: 8000, Loss: 1.8772070689010434e-06, 
Epoch: 8500, Loss: 2.9678182045245194e-07, 
Epoch: 9000, Loss: 2.4729533834033646e-05, 
Epoch: 9500, Loss: 1.6062893337220885e-05, 
Epoch: 10000, Loss: 9.863483683147933e-06, 


In [None]:
# test your model using the following functions (make sure the output is printed and saved when you submit this notebook):
# depending on how you defined your network you may need to slightly tweek the below prediction function

test = [[0,0],[0,1],[1,1],[1,0]]

for trial in test: 
  Xtest = torch.Tensor(trial)
  y_hat = xor_clf(Xtest)

  if y_hat > 0.5:
    prediction = 1
  else: 
    prediction = 0

  print("{0} xor {1} = {2}".format(int(Xtest[0]), int(Xtest[1]), prediction))

0 xor 0 = 0
0 xor 1 = 1
1 xor 1 = 0
1 xor 0 = 1




## Question 2  [2pts]

Imagine a neural network model for a multilabel classification task. 

a) Which loss function should you use?

b) The resulting trained modal has a high variance error. Give 4 possible solutions to improve the model. 


```
[your answer here, no coding required]

* answer A
We can use Cross-Entropy Loss.
* answer B
  - 1 Use a different optimiser like Adam instead of SGD
  - 2 Use a dropout (nn.Dropout(0.5))
  - 3 Stop the training earlier before overfitting (lesser epoch)
  - 4 Augment the training data such as rotating an image for image data set

```


## Question 3 - Improve hit classification [5pts]

Remember the hit predicton dataset from last week? 

a) Improve the model using a multiplayer perceptron. 

b) Make sure to run your models on the GPU. 

c) Tweek the hyperparameters such as number of nodes or layers, or other. Show two possible configurations and explain which works better and very briefly explain why this may be the case. 




In [13]:
class FeedForwardNN(nn.Module):
    def __init__(self, input_size, num_classes, num_hidden, hidden_dim, dropout, activation):
      super(FeedForwardNN, self).__init__()
      
      assert num_hidden > 0
      self.hidden_layers = nn.ModuleList([])
      self.hidden_layers.append(nn.Linear(input_size, hidden_dim))

      for i in range(num_hidden - 1):
        self.hidden_layers.append(nn.Linear(hidden_dim, hidden_dim))
      
      self.dropout = nn.Dropout(dropout)
      self.output_projection = nn.Linear(hidden_dim, num_classes)
      self.nonlinearity = activation

    def forward(self, x):
      for hidden_layer in self.hidden_layers:
        x = hidden_layer(x)
        x = self.dropout(x)
        x = self.nonlinearity(x)
    
      out = self.output_projection(x) 
      out_distribution = F.sigmoid(out)
      
      return out_distribution

In [4]:
# Prepare data for training

# Download training data
!wget https://dorax.s3.ap-south-1.amazonaws.com/herremans_hit_1030training.csv -O herremans_hit_1030training.csv

# load training data
train_data = pd.read_csv('herremans_hit_1030training.csv')

train_labels = train_data.iloc[:,-1]
train_features = train_data.drop('Topclass1030', axis=1)


train_features = torch.Tensor(train_features.values).cuda()
train_labels = torch.Tensor(train_labels.values).view(-1,1).cuda()

# Get size
input_sizes = train_features.shape[1]
num_classes = train_labels.shape[1]

--2021-06-29 17:55:50--  https://dorax.s3.ap-south-1.amazonaws.com/herremans_hit_1030training.csv
Resolving dorax.s3.ap-south-1.amazonaws.com (dorax.s3.ap-south-1.amazonaws.com)... 52.219.66.3
Connecting to dorax.s3.ap-south-1.amazonaws.com (dorax.s3.ap-south-1.amazonaws.com)|52.219.66.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 147372 (144K) [text/csv]
Saving to: ‘herremans_hit_1030training.csv’


2021-06-29 17:55:51 (203 KB/s) - ‘herremans_hit_1030training.csv’ saved [147372/147372]



In [36]:
# code your model 1
def hitsong1(input_size, num_classes):
  num_hidden = 3
  hidden_dim = 50
  dropout = 0.1
  activation = nn.ReLU()
  
  return FeedForwardNN(input_size, num_classes, num_hidden, hidden_dim, dropout, activation)


hitsong1_clf = hitsong1(input_sizes, num_classes).cuda()
print(hitsong1_clf)

hitsong1_loss = nn.MSELoss()
hitsong1_optim = optim.SGD(hitsong1_clf.parameters(), lr=0.001, momentum=0.9)

# train the model

num_epochs = 101

for epoch in range(num_epochs):
  for (x, y) in zip(train_features, train_labels):
    reshaped_x = x.view(-1, input_sizes)
    reshaped_x = Variable(reshaped_x).cuda()
    target = Variable(y).cuda()

    predicted = hitsong1_clf(reshaped_x)
    target = target.unsqueeze(1)
    batch_loss = hitsong1_loss(predicted, target)
    hitsong1_optim.zero_grad()
    batch_loss.backward()
    hitsong1_optim.step()
  
  if epoch % 10 == 0:
    print ("Epoch: {0}, Loss: {1}, ".format(epoch, batch_loss.data))





FeedForwardNN(
  (hidden_layers): ModuleList(
    (0): Linear(in_features=49, out_features=50, bias=True)
    (1): Linear(in_features=50, out_features=50, bias=True)
    (2): Linear(in_features=50, out_features=50, bias=True)
  )
  (dropout): Dropout(p=0.1, inplace=False)
  (output_projection): Linear(in_features=50, out_features=1, bias=True)
  (nonlinearity): ReLU()
)




Epoch: 0, Loss: 0.254652738571167, 
Epoch: 10, Loss: 0.21368280053138733, 
Epoch: 20, Loss: 0.06645623594522476, 
Epoch: 30, Loss: 0.04178480803966522, 
Epoch: 40, Loss: 0.011214429512619972, 
Epoch: 50, Loss: 0.03416338935494423, 
Epoch: 60, Loss: 0.1401108056306839, 
Epoch: 70, Loss: 0.0011592656373977661, 
Epoch: 80, Loss: 0.09836075454950333, 
Epoch: 90, Loss: 0.0004204816068522632, 
Epoch: 100, Loss: 1.1096223033746355e-06, 


In [8]:
# Download test data
!wget https://dorax.s3.ap-south-1.amazonaws.com/herremans_hit_1030test.csv -O herremans_hit_1030test.csv

def run_evaluation(my_model):

  test = pd.read_csv('herremans_hit_1030test.csv')
  labels = test.iloc[:,-1]
  test = test.drop('Topclass1030', axis=1)
  testdata = torch.Tensor(test.values)
  testlabels = torch.Tensor(labels.values).view(-1,1)

  TP = 0
  TN = 0
  FN = 0
  FP = 0

  for i in range(0, testdata.size()[0]): 
    # print(testdata[i].size())
    Xtest = torch.Tensor(testdata[i]).cuda()
    y_hat = my_model(Xtest)
    
    if y_hat > 0.5:
      prediction = 1
    else: 
      prediction = 0

    if (prediction == testlabels[i]):
      if (prediction == 1):
        TP += 1
      else: 
        TN += 1

    else:
      if (prediction == 1):
        FP += 1
      else: 
        FN += 1

  print("True Positives: {0}, True Negatives: {1}".format(TP, TN))
  print("False Positives: {0}, False Negatives: {1}".format(FP, FN))
  rate = TP/(FN+TP)
  print("Class specific accuracy of correctly predicting a hit song is {0}".format(rate))

--2021-06-29 17:56:55--  https://dorax.s3.ap-south-1.amazonaws.com/herremans_hit_1030test.csv
Resolving dorax.s3.ap-south-1.amazonaws.com (dorax.s3.ap-south-1.amazonaws.com)... 52.219.62.115
Connecting to dorax.s3.ap-south-1.amazonaws.com (dorax.s3.ap-south-1.amazonaws.com)|52.219.62.115|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 36712 (36K) [text/csv]
Saving to: ‘herremans_hit_1030test.csv’


2021-06-29 17:56:57 (154 KB/s) - ‘herremans_hit_1030test.csv’ saved [36712/36712]



In [37]:
# evaluate model 1 (called model1 here)

run_evaluation(hitsong1_clf)

True Positives: 37, True Negatives: 19
False Positives: 10, False Negatives: 13
Class specific accuracy of correctly predicting a hit song is 0.74




In [32]:
# code your model 2

def hitsong2(input_size, num_classes):
  num_hidden = 3
  hidden_dim = 50
  dropout = 0.1
  activation = nn.Sigmoid()
  
  return FeedForwardNN(input_size, num_classes, num_hidden, hidden_dim, dropout, activation)


hitsong2_clf = hitsong2(input_sizes, num_classes).cuda()
print(hitsong2_clf)

hitsong2_loss = nn.MSELoss()
hitsong2_optim = optim.SGD(hitsong1_clf.parameters(), lr=0.001, momentum=0.9)

# train the model

num_epochs = 101

for epoch in range(num_epochs):
  for (x, y) in zip(train_features, train_labels):
    reshaped_x = x.view(-1, input_sizes)
    reshaped_x = Variable(reshaped_x).cuda()
    target = Variable(y).cuda()

    predicted = hitsong2_clf(reshaped_x)
    target = target.unsqueeze(1)
    batch_loss = hitsong2_loss(predicted, target)
    hitsong2_optim.zero_grad()
    batch_loss.backward()
    hitsong2_optim.step()
  
  if epoch % 10 == 0:
    print ("Epoch: {0}, Loss: {1}, ".format(epoch, batch_loss.data))


FeedForwardNN(
  (hidden_layers): ModuleList(
    (0): Linear(in_features=49, out_features=50, bias=True)
    (1): Linear(in_features=50, out_features=50, bias=True)
    (2): Linear(in_features=50, out_features=50, bias=True)
  )
  (dropout): Dropout(p=0.1, inplace=False)
  (output_projection): Linear(in_features=50, out_features=1, bias=True)
  (nonlinearity): Sigmoid()
)




Epoch: 0, Loss: 0.3867832124233246, 
Epoch: 10, Loss: 0.3813243806362152, 
Epoch: 20, Loss: 0.3854142725467682, 
Epoch: 30, Loss: 0.3871181011199951, 
Epoch: 40, Loss: 0.38897904753685, 
Epoch: 50, Loss: 0.3781259059906006, 
Epoch: 60, Loss: 0.3826589584350586, 
Epoch: 70, Loss: 0.3813149333000183, 
Epoch: 80, Loss: 0.3901846408843994, 
Epoch: 90, Loss: 0.37933507561683655, 
Epoch: 100, Loss: 0.38481128215789795, 


In [33]:
# evaluate model 2 (called model2 here)

run_evaluation(hitsong2_clf)

True Positives: 50, True Negatives: 0
False Positives: 29, False Negatives: 0
Class specific accuracy of correctly predicting a hit song is 1.0




Which works better and why do you think this may be (very briefly)? 


**[your answer here, also please summarise the differences between your two models]**

Model 2 works better (100% accuracy to 74% accuray in model 1 on test set).

In model 2, the activation function used Sigmoid as compared to the one used in model 1 which is ReLU. Sigmoid activation function gave a higher accuracy and performed better because it is better than ReLU for binary classification problem (output label is either 0 or 1).

The rest of the hyperparameters, loss function, optimiser, etc remains the same for the Feed-Forward NN in both models.

Additionally, submit your results [here](https://forms.gle/NtJJEE7Wm5ZRM3Je7) for 'Class specific accuracy of correctly predicting a hit song' and see if you got the best performance of the class! Good luck!