# PyTorch - homework 2: neural networks

-- Prof. Dorien Herremans

Please run the whole notebook with your code and submit the `.ipynb` file on eDimension that includes your answers [so after you run it]. 

In [1]:
from termcolor import colored

student_number="1004657"
student_name="Samuel Sim Wei Xuan"

print(colored("Homework by "  + student_name + ', Number: ' + student_number,'red'))

[31mHomework by Samuel Sim Wei Xuan, Number: 1004657[0m


 ## Question 1 -- XOR neural network [3pts]

a) Train an (at least) 2-layer neural network that can solve the XOR problem. Hint: be sure to check both this week and last week's lab. 

b) Check the predictions resulting from your model in the second code box below.


In [2]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np

# load your data
X = torch.Tensor([[0,0],[0,1],[1,0],[1,1]])
Y = torch.Tensor([0,1,1,0]).view(-1,1)  

class FeedForwardNN(nn.Module):
    def __init__(self, input_size, num_classes, num_hidden, hidden_dim, dropout):
        super(FeedForwardNN, self).__init__()
        assert num_hidden > 0
        self.hidden_layers = nn.ModuleList([]) # Storage of layers
        self.hidden_layers.append(nn.Linear(input_size, hidden_dim)) # Append 1st input layer
        for i in range(num_hidden - 1): # Append num_hidden layers
            self.hidden_layers.append(nn.Linear(hidden_dim, hidden_dim))

        self.dropout = nn.Dropout(dropout) # Drop out layers
        self.output_projection = nn.Linear(hidden_dim, num_classes) # Final output Lyaer
        self.nonlinearity = nn.ReLU() # Non-linearities layers
        
    def forward(self, x):
        # Apply the hidden layers, nonlinearity, and dropout.
        for hidden_layer in self.hidden_layers:
            x = hidden_layer(x)
            x = self.dropout(x)
            x = self.nonlinearity(x)
    
        # Output layer: project x to a distribution over classes.
        out = self.output_projection(x)
        
        # Softmax the out tensor to get a log-probability distribution
        out_distribution = F.log_softmax(out, dim=-1)
        return out_distribution

# name your model xor
def xor(input_size:int, num_classes:int, num_hidden:int, hidden_dim:int, dropout:int):
    return FeedForwardNN(input_size,num_classes,num_hidden,hidden_dim,dropout)

xor = xor(input_size=2, num_classes=2, num_hidden=2, hidden_dim=10, dropout=0)
    
# define your model loss function, optimizer, etc. 
criterion = nn.NLLLoss()
optimizer = optim.SGD(xor.parameters(),lr=0.001,momentum=0.9)

# train the model
epoch = 1000
steps = X.size(0)

def train(model, optimizer, criterion, x = X, y = Y):
    model.train()
    for i in range(epoch):
        for j in range(steps):
            optimizer.zero_grad()             
            inp = x[j].unsqueeze(0)
            label = y[j].type(torch.LongTensor)      
            predicted = model(inp)   
            loss = criterion(predicted, label)     
            loss.backward()
            optimizer.step()

        if i % 100 == 0:
            print("Epoch num: {}, Loss: {}".format(i, loss))

train(xor, optimizer, criterion)

  from .autonotebook import tqdm as notebook_tqdm


Epoch num: 0, Loss: 0.8438106775283813
Epoch num: 100, Loss: 0.7058674097061157
Epoch num: 200, Loss: 0.692531406879425
Epoch num: 300, Loss: 0.6836292743682861
Epoch num: 400, Loss: 0.6504733562469482
Epoch num: 500, Loss: 0.49822258949279785
Epoch num: 600, Loss: 0.3867962062358856
Epoch num: 700, Loss: 0.2536177635192871
Epoch num: 800, Loss: 0.12854667007923126
Epoch num: 900, Loss: 0.06752711534500122


In [3]:
# test your model using the following functions (make sure the output is printed and saved when you submit this notebook):
# depending on how you defined your network you may need to slightly tweek the below prediction function
test = [[0,0],[0,1],[1,1],[1,0]]

for trial in test: 
  Xtest = torch.Tensor(trial)
  y_hat = xor(Xtest)
  y_hat_class = torch.argmax(y_hat, axis=0)

  print("{0} xor {1} = {2}".format(int(Xtest[0]), int(Xtest[1]), y_hat_class))

0 xor 0 = 0
0 xor 1 = 1
1 xor 1 = 0
1 xor 0 = 1


## Question 2  [2pts]

Imagine a neural network model for a multilabel classification task. 

a) Which loss function should you use?

b) The resulting trained modal has a high variance error. Give 4 possible solutions to improve the model. 


```
* a) Binary cross entropy loss for every class
* b) Possible solutions to improve high variance error model:
  - Reduce the number of features (using any feature selection techniques or even manually selecting them)
  - Use regularization (E.g. L1 & L2)
  - Add early stopping (stop training before overfitting)
  - Add data to training set
```


## Question 3 - Improve hit classification [5pts]

Remember the hit predicton dataset from last week? 

a) Improve the model using a multilayer perceptron. 

b) Make sure to run your models on the GPU. 

c) Tweek the hyperparameters such as number of nodes or layers, or other. Show two possible configurations and explain which works better and very briefly explain why this may be the case. 

In [4]:
# code your model 1
import numpy as np
import pandas as pd
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

if torch.cuda.is_available():  
    device = "cuda:0" 
else:  
    device = "cpu"  

dataset = pd.read_csv('https://dorax.s3.ap-south-1.amazonaws.com/herremans_hit_1030training.csv')
Y = dataset['Topclass1030'].to_numpy()
X = dataset.drop('Topclass1030', axis=1).to_numpy()

class MLP(nn.Module):
    def __init__(self, input_size, num_classes, num_hidden, hidden_dim, dropout):
        super(MLP, self).__init__()
        assert num_hidden > 0
        self.hidden_layers = nn.ModuleList([]) 
        self.hidden_layers.append(nn.Linear(input_size, hidden_dim)) 
        for i in range(num_hidden - 1): 
            self.hidden_layers.append(nn.Linear(hidden_dim, hidden_dim))
        self.dropout = nn.Dropout(dropout) 
        self.output_projection = nn.Linear(hidden_dim, num_classes) 
        self.nonlinearity = nn.ReLU() 
        
    def forward(self, x):
        for hidden_layer in self.hidden_layers:
            x = hidden_layer(x)
            x = self.dropout(x)
            x = self.nonlinearity(x)
        out = self.output_projection(x)
        out_distribution = torch.sigmoid(out)
        return out_distribution

class HitClassificationDataset(Dataset):
    # https://pytorch.org/tutorials/beginner/basics/data_tutorial.html
    def __init__(self, inputs, targets):
        self.inputs = inputs 
        self.targets = targets

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return (self.inputs[idx].astype(np.float32), self.targets[idx].astype(np.float32))

train_dataset = HitClassificationDataset(X, Y)
train_dataloader = DataLoader(train_dataset, batch_size=6, shuffle=True)

model1 = MLP(input_size=X.shape[1], num_classes=1, num_hidden=2, hidden_dim=5, dropout=0.1).to(device)

criterion = nn.BCELoss() 
lr = 0.003
momentum = 0.9
optimizer = optim.SGD(model1.parameters(), lr=lr, momentum=momentum)
num_epochs = 1000

def train(model, num_epochs, optimizer, criterion):
    model.train()
    for epoch in range(num_epochs): 
        total_batch_loss = 0
        for (inputs, targets) in train_dataloader:
            predicted = model(inputs.to(device)).squeeze(1)  
            batch_loss = criterion(predicted, targets.to(device))
            total_batch_loss += batch_loss
            optimizer.zero_grad()
            batch_loss.backward()
            optimizer.step()

        if epoch % 100 == 0:
            print("Epoch num: {}, Epoch loss: {}".format(epoch, total_batch_loss))

train(model1, num_epochs, optimizer, criterion)

Epoch num: 0, Epoch loss: 37.51312255859375
Epoch num: 100, Epoch loss: 21.01948356628418
Epoch num: 200, Epoch loss: 13.406999588012695
Epoch num: 300, Epoch loss: 13.165366172790527
Epoch num: 400, Epoch loss: 12.370891571044922
Epoch num: 500, Epoch loss: 13.523350715637207
Epoch num: 600, Epoch loss: 10.853699684143066
Epoch num: 700, Epoch loss: 13.679305076599121
Epoch num: 800, Epoch loss: 10.793499946594238
Epoch num: 900, Epoch loss: 13.595739364624023


In [5]:
# evaluate model 1 (called model1 here)
import pandas as pd 

def run_evaluation(my_model):

  test = pd.read_csv('https://dorax.s3.ap-south-1.amazonaws.com/herremans_hit_1030test.csv')
  labels = test.iloc[:,-1]
  test = test.drop('Topclass1030', axis=1)
  testdata = torch.Tensor(test.values)
  testlabels = torch.Tensor(labels.values).view(-1,1)

  TP = 0
  TN = 0
  FN = 0
  FP = 0

  for i in range(0, testdata.size()[0]): 
    # print(testdata[i].size())
    Xtest = torch.Tensor(testdata[i]).to(device)
    y_hat = my_model(Xtest)
    
    if y_hat > 0.5:
      prediction = 1
    else: 
      prediction = 0

    if (prediction == testlabels[i]):
      if (prediction == 1):
        TP += 1
      else: 
        TN += 1

    else:
      if (prediction == 1):
        FP += 1
      else: 
        FN += 1

  print("True Positives: {0}, True Negatives: {1}".format(TP, TN))
  print("False Positives: {0}, False Negatives: {1}".format(FP, FN))
  rate = TP/(FN+TP)
  print("Class specific accuracy of correctly predicting a hit song is {0}".format(rate))

run_evaluation(model1)

True Positives: 40, True Negatives: 13
False Positives: 16, False Negatives: 10
Class specific accuracy of correctly predicting a hit song is 0.8


In [6]:
# code your model 2
model2 = MLP(input_size=X.shape[1], num_classes=1, num_hidden=2*2, hidden_dim=5*2, dropout=0.1).to(device)

criterion = nn.BCELoss() 
lr = 0.003
momentum = 0.9
optimizer = optim.SGD(model2.parameters(), lr=lr, momentum=momentum)
num_epochs = 1000

def train(model, num_epochs, optimizer, criterion):
    model.train()
    for epoch in range(num_epochs): 
        total_batch_loss = 0
        for (inputs, targets) in train_dataloader:
            predicted = model(inputs.to(device)).squeeze(1)   
            batch_loss = criterion(predicted, targets.to(device))
            total_batch_loss += batch_loss
            optimizer.zero_grad()
            batch_loss.backward()
            optimizer.step()

        if epoch % 100 == 0:
            print("Epoch num: {}, Epoch loss: {}".format(epoch, total_batch_loss))

train(model2, num_epochs, optimizer, criterion)

Epoch num: 0, Epoch loss: 38.74613571166992
Epoch num: 100, Epoch loss: 19.917156219482422
Epoch num: 200, Epoch loss: 11.314476013183594
Epoch num: 300, Epoch loss: 8.276206016540527
Epoch num: 400, Epoch loss: 7.354493618011475
Epoch num: 500, Epoch loss: 5.952020168304443
Epoch num: 600, Epoch loss: 4.600288391113281
Epoch num: 700, Epoch loss: 7.136390209197998
Epoch num: 800, Epoch loss: 4.231368064880371
Epoch num: 900, Epoch loss: 3.4576897621154785


In [7]:
# evaluate model 2 (called model2 here)
run_evaluation(model2)

True Positives: 42, True Negatives: 14
False Positives: 15, False Negatives: 8
Class specific accuracy of correctly predicting a hit song is 0.84


Which works better and why do you think this may be (very briefly)? 


`model1 = MLP(input_size=X.shape[1], num_classes=1, num_hidden=2, hidden_dim=5, dropout=0.1).to(device)`

`model2 = MLP(input_size=X.shape[1], num_classes=1, num_hidden=2*2, hidden_dim=5*2, dropout=0.1).to(device)`

The difference in both models lies in the number of hidden layers (num_hidden), the number of nodes of each hidden layer (hidden_dim) and the dropout rate. Model 1 has 2 hidden layers with size of 5 for each. Model 2 has 2 times the number of hidden layers with 2 times the size. Generally speaking, by increasing the size of the neural network and building a deeper one, the accuracy of the model might increase or decrease.

However, in this case, since our Model 1 has only 2 hidden layers with a size of 5 nodes each. The neural network is very small in model1, furthermore the size of the input features is 49 which is much larger than the size of each hidden layers. Therefore, the computations might not bring about the best accuracies and result in under fitting. Therefore, by increasing the overall size of depth of the neural network in model2, we should expect a higher accuracy as we reduce avoidable bias caused.

Indeed, we observe a better convergence in terms of training losses and also in terms of the final class specific accuracy. 

Additionally, submit your results [here](https://forms.gle/NtJJEE7Wm5ZRM3Je7) for 'Class specific accuracy of correctly predicting a hit song' and see if you got the best performance of the class! Good luck!