# PyTorch - homework 2: neural networks

-- Prof. Dorien Herremans

Please run the whole notebook with your code, then submit the executed `.ipynb` file, with your answers and outputs included, on eDimension.

In [1]:
from termcolor import colored

student_number="1002819"
student_name="Samson Yu Bai Jian"

print(colored("Homework by "  + student_name + ', number: ' + student_number,'red'))

Homework by Samson Yu Bai Jian, number: 1002819


## Question 1 -- XOR neural network [3pts]

a) Train an (at least) 2-layer neural network that can solve the XOR problem. 

b) Check the predictions resulting from your model in the second code box below.


In [2]:
# load your data
import torch

X = torch.Tensor([[0,0],[0,1], [1,0], [1,1]])
Y = torch.LongTensor([0,1,1,0]).view(-1,1)

# name your model xor
import torch.nn as nn
import torch.nn.functional as F

class FeedForwardNN(nn.Module):
    def __init__(self, input_size, num_classes, num_hidden, hidden_dim, dropout):
        super(FeedForwardNN, self).__init__()
        
        assert num_hidden > 0
        self.hidden_layers = nn.ModuleList([])
        self.hidden_layers.append(nn.Linear(input_size, hidden_dim))
        for i in range(num_hidden - 1):
            self.hidden_layers.append(nn.Linear(hidden_dim, hidden_dim))
        self.dropout = nn.Dropout(dropout)
        self.output_projection = nn.Linear(hidden_dim, num_classes)
        self.nonlinearity = nn.ReLU()
    
    def forward(self, x):
        for hidden_layer in self.hidden_layers:
            x = hidden_layer(x)
            x = self.dropout(x)
            x = self.nonlinearity(x)
      
        out = self.output_projection(x)
        return out

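def build_xor_model():
    num_outputs = 2
    num_input_features = 2
    num_hidden = 2
    hidden_dim = 5
    dropout = 0

    model = FeedForwardNN(num_input_features, num_outputs, num_hidden, hidden_dim, dropout)
    return model

# keep the builder under its own name so that `xor` only ever refers to the model
xor = build_xor_model()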

# define your model loss function, optimizer, etc. 
lr_rate = 0.02
loss_function = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(xor.parameters(), lr=lr_rate)

# train the model
import numpy as np

epochs = 2000
steps = X.size(0)

for i in range(epochs):
    for j in range(steps):
        data_point = np.random.randint(X.size(0))

        x_var = X[data_point].unsqueeze(0)  # shape (1, 2): a batch of one sample
        y_var = Y[data_point]               # shape (1,): the class index for that sample

        optimizer.zero_grad()
        y_hat = xor(x_var)

        loss = loss_function(y_hat, y_var)
        loss.backward()
        optimizer.step()

    if i % 100 == 0:
        print("Epoch: {0}, Loss: {1}, ".format(i, loss.item()))

Epoch: 0, Loss: 0.5926584005355835, 
Epoch: 100, Loss: 0.6802399754524231, 
Epoch: 200, Loss: 0.5889790058135986, 
Epoch: 300, Loss: 0.5438827872276306, 
Epoch: 400, Loss: 0.042589422315359116, 
Epoch: 500, Loss: 0.022974850609898567, 
Epoch: 600, Loss: 0.1682976931333542, 
Epoch: 700, Loss: 0.007836077362298965, 
Epoch: 800, Loss: 0.08127047121524811, 
Epoch: 900, Loss: 0.06406918913125992, 
Epoch: 1000, Loss: 0.0047372253611683846, 
Epoch: 1100, Loss: 0.0037967516109347343, 
Epoch: 1200, Loss: 0.002916489727795124, 
Epoch: 1300, Loss: 0.0014448452275246382, 
Epoch: 1400, Loss: 0.030779751017689705, 
Epoch: 1500, Loss: 0.0012076949933543801, 
Epoch: 1600, Loss: 0.0017921352991834283, 
Epoch: 1700, Loss: 0.0008364992681890726, 
Epoch: 1800, Loss: 0.0014757943572476506, 
Epoch: 1900, Loss: 0.019848771393299103, 


In [3]:
# test your model using the following functions (make sure the output is printed and saved when you submit this notebook):
# depending on how you defined your network you may need to slightly tweak the prediction function below

test = [[0,0],[0,1],[1,1],[1,0]]

for trial in test: 
  Xtest = torch.Tensor(trial)
  y_hat = xor(Xtest)
  prediction = np.argmax(y_hat.detach().numpy(), axis=0)
  print("{0} xor {1} = {2}".format(int(Xtest[0]), int(Xtest[1]), prediction))


0 xor 0 = 0
0 xor 1 = 1
1 xor 1 = 0
1 xor 0 = 1
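To see why a hidden layer is enough for XOR, it can help to write down one solution by hand. The sketch below is not the trained `xor` model above: the function `xor_by_hand` and its weights are hand-picked for illustration, using two ReLU hidden units and a linear output.

```python
def relu(z):
    return max(0.0, z)

def xor_by_hand(x1, x2):
    # Hidden layer: two ReLU units with hand-picked (not trained) weights.
    h1 = relu(x1 + x2)        # 0, 1 or 2: how many inputs are on
    h2 = relu(x1 + x2 - 1.0)  # 1 only when both inputs are on
    # Output layer: cancel the "both on" case by subtracting h2 twice.
    return h1 - 2.0 * h2

for a, b in [(0, 0), (0, 1), (1, 1), (1, 0)]:
    print("{0} xor {1} = {2}".format(a, b, int(xor_by_hand(a, b))))
```

No single linear layer can produce this mapping, because the two classes are not linearly separable in the input space; the second hidden unit is what bends the decision boundary.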


## Question 2  [2pts]

Imagine a neural network model for a multilabel classification task. 

a) Which loss function should you use?

b) The resulting trained model has a high variance error. Give 4 possible solutions to improve the model. 


```
[your answer here, no coding required]

* answer A
nn.BCELoss on sigmoid outputs (or nn.BCEWithLogitsLoss on raw logits), since each output node makes an independent binary decision. For example, with 10 classes there are 10 output nodes, and each node predicts whether its class applies to the input, so several classes can be active at once.

* answer B
  - Increase training dataset size.
  - Decrease model size (e.g. number of layers or number of parameters).
  - Add early stopping.
  - Add regularisation (e.g. dropout or L2 weight decay).

```
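Answer A can be made concrete with a small worked example. The sketch below computes the mean binary cross-entropy over three independent labels in plain Python, which is what `nn.BCELoss` does with its default mean reduction; the function name `multilabel_bce` and the probabilities are made up for illustration.

```python
import math

def multilabel_bce(probs, targets):
    """Mean binary cross-entropy over independent per-label probabilities."""
    total = 0.0
    for p, t in zip(probs, targets):
        # Each label contributes its own binary log-loss term.
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)

# One input with three labels; labels 0 and 2 are present, label 1 is absent.
probs = [0.9, 0.2, 0.7]   # sigmoid outputs, one per label (illustrative values)
targets = [1, 0, 1]
loss = multilabel_bce(probs, targets)
print(round(loss, 4))
```

Because every label has its own term, the loss does not force the probabilities to sum to one, unlike a softmax with cross-entropy.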


## Question 3 - Improve hit classification [5pts]

Remember the hit prediction dataset from last week? 

a) Improve the model using a multilayer perceptron. 

b) Make sure to run your models on the GPU. 

c) Tweak the hyperparameters, such as the number of nodes or layers. Show two possible configurations, state which works better, and very briefly explain why this may be the case. 




In [34]:
# code your model 1
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class MLP(nn.Module):
    def __init__(self, input_size, num_classes, num_hidden, hidden_dim):
        super(MLP, self).__init__()
        
        assert num_hidden > 0
        self.hidden_layers = nn.ModuleList([])
        self.hidden_layers.append(nn.Linear(input_size, hidden_dim))
        for i in range(num_hidden - 1):
            self.hidden_layers.append(nn.Linear(hidden_dim, hidden_dim))
        self.output_projection = nn.Linear(hidden_dim, num_classes)
        self.nonlinearity = nn.ReLU()
    
    def forward(self, x):
        for hidden_layer in self.hidden_layers:
            x = hidden_layer(x)
            x = self.nonlinearity(x)
      
        out = self.output_projection(x)
        out_distribution = torch.sigmoid(out)

        return out_distribution

import pandas as pd

class Dataset(torch.utils.data.Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, index):
        X = self.data[index]
        y = self.labels[index]

        return X, y

train = pd.read_csv('https://dorax.s3.ap-south-1.amazonaws.com/herremans_hit_1030training.csv')
labels = train.iloc[:,-1]
train = train.drop('Topclass1030', axis=1)
traindata = torch.Tensor(train.values)
trainlabels = torch.Tensor(labels.values).view(-1,1)
dataset = Dataset(traindata, trainlabels)

# model1 parameters
num_outputs = 1
num_input_features = train.shape[-1]
num_hidden = 2
hidden_dim = 50

model1 = MLP(num_input_features, num_outputs, num_hidden, hidden_dim).to(device)

# training parameters
epochs = 1000
lr_rate = 0.02
criterion = nn.BCELoss().to(device)
optimizer = torch.optim.SGD(model1.parameters(), lr=lr_rate)

# dataloader parameters
params = {'batch_size': 32,
          'shuffle': True}
dataloader = torch.utils.data.DataLoader(dataset, **params)

for i in range(epochs):
    epoch_loss = 0
    for batch_data, batch_labels in dataloader:
        X = batch_data.to(device)
        y = batch_labels.to(device)

        optimizer.zero_grad()
        y_hat = model1(X)

        loss = criterion(y_hat, y)
        loss.backward()
        optimizer.step()

        epoch_loss += loss.item()

    if i % 100 == 0:
        print ("Epoch: {0}, Loss: {1}, ".format(i, epoch_loss / len(dataloader)))

Epoch: 0, Loss: 0.6821933388710022, 
Epoch: 100, Loss: 0.45815915682099084, 
Epoch: 200, Loss: 0.19358438795263117, 
Epoch: 300, Loss: 0.025462186979976566, 
Epoch: 400, Loss: 0.009691712742840702, 
Epoch: 500, Loss: 0.0045931234066797924, 
Epoch: 600, Loss: 0.003005152515305037, 
Epoch: 700, Loss: 0.0022362439723854714, 
Epoch: 800, Loss: 0.0017538869922811334, 
Epoch: 900, Loss: 0.0014217824103649366, 


In [35]:
# evaluate model 1 (called model1 here)

import pandas as pd 

def run_evaluation(my_model):

#   test = pd.read_csv('/content/herremans_hit_1030test.csv')
  test = pd.read_csv('https://dorax.s3.ap-south-1.amazonaws.com/herremans_hit_1030test.csv')
  labels = test.iloc[:,-1]
  test = test.drop('Topclass1030', axis=1)
  testdata = torch.Tensor(test.values)
  testlabels = torch.Tensor(labels.values).view(-1,1)

  TP = 0
  TN = 0
  FN = 0
  FP = 0

  for i in range(0, testdata.size()[0]): 
    # print(testdata[i].size())
    Xtest = torch.Tensor(testdata[i]).to(device)
    y_hat = my_model(Xtest)
    
    if y_hat > 0.5:
      prediction = 1
    else: 
      prediction = 0

    if (prediction == testlabels[i]):
      if (prediction == 1):
        TP += 1
      else: 
        TN += 1

    else:
      if (prediction == 1):
        FP += 1
      else: 
        FN += 1

  print("True Positives: {0}, True Negatives: {1}".format(TP, TN))
  print("False Positives: {0}, False Negatives: {1}".format(FP, FN))
  rate = TP/(FN+TP)
  print("Class specific accuracy of correctly predicting a hit song is {0}".format(rate))

run_evaluation(model1)

True Positives: 38, True Negatives: 16
False Positives: 13, False Negatives: 12
Class specific accuracy of correctly predicting a hit song is 0.76
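`run_evaluation` calls the model on one row at a time with gradient tracking left on. A batched variant under `torch.no_grad()` might look like the sketch below; the function name `confusion_counts` and the toy model are hypothetical stand-ins, not part of the assignment code.

```python
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def confusion_counts(model, data, labels, threshold=0.5):
    """Return (TP, TN, FP, FN) for a model that outputs probabilities."""
    model.eval()                       # put layers such as dropout in inference mode
    with torch.no_grad():              # evaluation needs no gradients
        probs = model(data.to(device)).cpu()
    preds = probs > threshold          # boolean predictions, shape (N, 1)
    truth = labels > 0.5               # boolean ground truth, shape (N, 1)
    tp = int((preds & truth).sum())
    tn = int((~preds & ~truth).sum())
    fp = int((preds & ~truth).sum())
    fn = int((~preds & truth).sum())
    return tp, tn, fp, fn

# Toy stand-in model: sigmoid(x), so it predicts 1 exactly when x > 0.
toy = nn.Sequential(nn.Linear(1, 1), nn.Sigmoid()).to(device)
with torch.no_grad():
    toy[0].weight.fill_(1.0)
    toy[0].bias.fill_(0.0)

data = torch.Tensor([[-1.0], [1.0], [2.0], [-2.0]])
labels = torch.Tensor([[0], [1], [0], [1]])
print(confusion_counts(toy, data, labels))
```

Passing the whole test tensor in one call avoids a Python loop over rows, which matters more on the GPU, where each separate forward pass carries launch overhead.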


In [36]:
# code your model 2
# model2 parameters
num_outputs = 1
num_input_features = train.shape[-1]
num_hidden = 5
hidden_dim = 50

model2 = MLP(num_input_features, num_outputs, num_hidden, hidden_dim).to(device)

# training parameters
epochs = 1000
lr_rate = 0.02
criterion = nn.BCELoss().to(device)
optimizer = torch.optim.SGD(model2.parameters(), lr=lr_rate)

for i in range(epochs):
    epoch_loss = 0
    for batch_data, batch_labels in dataloader:
        X = batch_data.to(device)
        y = batch_labels.to(device)

        optimizer.zero_grad()
        y_hat = model2(X)

        loss = criterion(y_hat, y)
        loss.backward()
        optimizer.step()

        epoch_loss += loss.item()

    if i % 100 == 0:
        print ("Epoch: {0}, Loss: {1}, ".format(i, epoch_loss / len(dataloader)))

Epoch: 0, Loss: 0.7034374908967451, 
Epoch: 100, Loss: 0.6353667432611639, 
Epoch: 200, Loss: 0.5634772940115496, 
Epoch: 300, Loss: 0.15108455514365976, 
Epoch: 400, Loss: 0.08429935269735077, 
Epoch: 500, Loss: 0.016981393453368746, 
Epoch: 600, Loss: 0.0013241947480392728, 
Epoch: 700, Loss: 0.0005806320272809403, 
Epoch: 800, Loss: 0.00034530234353786165, 
Epoch: 900, Loss: 0.00024352074301119004, 


In [39]:
# evaluate model 2 (called model2 here)

run_evaluation(model2)

True Positives: 42, True Negatives: 16
False Positives: 13, False Negatives: 8
Class specific accuracy of correctly predicting a hit song is 0.84
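The "class specific accuracy" printed by `run_evaluation` is TP / (TP + FN), which is the recall for the hit class. As a rough cross-check, the sketch below recomputes it from the printed confusion counts of both models, together with precision and overall accuracy; the helper name `summarize` is an assumption for illustration.

```python
def summarize(tp, tn, fp, fn):
    """Derive recall, precision and overall accuracy from confusion counts."""
    recall = tp / (tp + fn)                      # share of real hits that were found
    precision = tp / (tp + fp)                   # share of predicted hits that are real
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all songs classified correctly
    return recall, precision, accuracy

# Counts copied from the evaluation outputs of model1 and model2.
results = {
    "model1": summarize(tp=38, tn=16, fp=13, fn=12),
    "model2": summarize(tp=42, tn=16, fp=13, fn=8),
}

for name, (recall, precision, accuracy) in results.items():
    print("{0}: recall={1:.2f}, precision={2:.2f}, accuracy={3:.2f}".format(
        name, recall, precision, accuracy))
```

Reporting precision and accuracy next to recall guards against a model that scores well on recall simply by predicting "hit" for almost every song.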


Which works better and why do you think this may be (very briefly)? 


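**[your answer here, also please summarise the differences between your two models]**

model2 works better, with a class-specific accuracy of 0.84 compared to 0.76 for model1. The true negatives (16) and false positives (13) are identical for both models, so the whole gain comes from 4 fewer false negatives (8 versus 12).

The only difference between the two configurations is depth: model2 has 5 hidden layers, while model1 has 2. The hidden dimension (50), learning rate (0.02), and number of epochs (1000) are the same. The extra layers potentially allow model2 to reduce its bias error (i.e. underfitting) by learning more complex representations of the data. This is consistent with model2's lower final training loss (about 0.0002 at epoch 900, versus about 0.0014 for model1).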
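Both training losses end close to zero, so training loss alone says little about how well either model generalises. Early stopping, one of the remedies listed under Question 2, is one way to check this; it would need a validation split, which this notebook does not create. The sketch below shows only the stopping rule: the class name `EarlyStopping` and the validation losses are made up for illustration.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` checks."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss      # new best: reset the counter
            self.bad_checks = 0
        else:
            self.bad_checks += 1      # no improvement on this check
        return self.bad_checks >= self.patience

# Made-up validation losses: they improve at first, then start rising.
val_losses = [0.70, 0.55, 0.48, 0.50, 0.53, 0.57, 0.60]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.step(val_loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best)
```

In a real training loop, `step` would be called once per epoch with the loss on held-out data, and the weights from the best epoch would be kept.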

Additionally, submit your results [here](https://forms.gle/NtJJEE7Wm5ZRM3Je7) for 'Class specific accuracy of correctly predicting a hit song' and see if you got the best performance of the class! Good luck!