<a href="https://colab.research.google.com/github/abdallah197/neuralnetworks-project/blob/master/Abdallah_NNTI_Project1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **NNTI 19/20 Project 1:  PCA and Logistic Regression**

## **Deadline: 17 December 2019, 23:59**


---



**Important:** For all computations in this project, please use the torch library. The torch package contains data structures for multi-dimensional tensors and defines mathematical operations over them. It also provides utilities for efficient serialization of tensors and arbitrary types, among other useful utilities.

# **1. Principal component analysis (PCA) [12 points]**

Principal component analysis (PCA) is a technique used to emphasize variation and bring out strong patterns in a dataset. It is often used to make data easy to explore and visualize. PCA is also a very useful dimensionality reduction technique. In the following, we explore how to apply PCA to the CIFAR dataset.

CIFAR 10 is a collection of images which is commonly used to train machine learning and computer vision algorithms. This dataset contains 50000 training images and 10000 test images, each belonging to one of 10 different classes. The images in CIFAR-10 are of size 3x32x32, i.e. 3-channel color images of 32x32 pixels in size.

**Task 1. Getting the dataset using torchvision**

Torchvision is a PyTorch package which helps in loading datasets in the image domain. It has dataset classes for common datasets like CIFAR 10, MNIST etc. With the `ToTensor` transform, torchvision returns every image as a 3x32x32 tensor; for PCA, each image is later stretched out into a 3072-dimensional row vector.

Complete the code below to load the CIFAR dataset using torchvision. Print the labels and some images with the corresponding labels.
[1.5 points]

In [0]:
import torch
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
from torchvision.utils import make_grid
import matplotlib.pyplot as plt
import numpy as np

# transformation that converts images into pytorch tensors of shape 3x32x32
transform = transforms.ToTensor()

# load the CIFAR-10 train dataset with the defined transform (downloaded on first use)
train_dataset = datasets.CIFAR10(root='data/cifar/', train=True, transform=transform, download=True)

# python Iterable over the dataset with batch_size = 20
train_loader = DataLoader(train_dataset, batch_size=20)

# get the first batch of images of the dataset using a python Iterator
train_iter = iter(train_loader)
images, labels = next(train_iter)

# the labels are numbered from 0 - 9 as follows
classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# plot the first batch of images of the training dataset and print their labels
grid = make_grid(images, nrow=10)
plt.imshow(np.transpose(grid.numpy(), (1, 2, 0)))  # matplotlib expects channels last
plt.axis('off')
plt.show()
print(labels)
print([classes[label] for label in labels.tolist()])

**Points**: $0.0$ of $1.5$
**Comments**: None

**Task 2. Centering and normalization** 

PCA is applied on data after centering and normalization.
Complete the functions below to center and normalize the images. 

[1 point]
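
For PCA, centering and scaling are usually applied per feature, that is, per column of the data matrix, rather than with a single global mean. The following is a minimal sketch on a toy tensor; the helper name `standardize` and the small constant `eps` are illustrative assumptions, not part of the assignment.

```python
import torch

def standardize(data, eps=1e-8):
    """Center each column of `data` and scale it to unit standard deviation."""
    mean = data.mean(dim=0, keepdim=True)   # one mean per feature
    std = data.std(dim=0, keepdim=True)     # one (unbiased) std per feature
    return (data - mean) / (std + eps)      # eps guards against constant features

toy = torch.tensor([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
standardized = standardize(toy)
print(standardized.mean(dim=0))  # close to zero for every column
print(standardized.std(dim=0))   # close to one for every column
```

Both columns end up on the same scale even though the second one started out ten times larger.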

In [0]:
def center(images):
    # subtract the mean of every feature (column), computed over the images in the batch
    mean = torch.mean(images, dim=0, keepdim=True)
    centered_images = images - mean
    return centered_images

def normalize(images):
    # normalize the centered images by dividing every feature by its standard deviation;
    # the small constant avoids a division by zero for features that are constant in the batch
    std = torch.std(images, dim=0, keepdim=True)
    normalized_images = images/(std + 1e-8)
    return normalized_images

def transform(images):
  # reshaping the images as a 3072-dimensional row vector
  images = images.reshape((images.shape[0], 3072))
  images = center(images)
  images = normalize(images)
  return images


**Points**: $0.0$ of $1$
**Comments**: None

**Task 3. Implementing PCA** 

PCA takes in the data points and a target dimension that is smaller than the original dimension of the data. In this case, the data matrix has shape [number of images, 3072]. 

The following presents the main steps of a slightly modified version of PCA. The function takes as input the original data points and one of two parameters: target_dim or target_variance. If target_dim is given, it returns the data projected into a target_dim-dimensional space. Otherwise, it returns the data projected into a low dimensional space that captures a ratio of target_variance of the variance in the data. 


1.   Transform the data by centering and normalizing the images
2.   Get the data matrix of shape [#images, #dimensions] 
3.   Compute the covariance matrix of the data matrix
4.   Compute the eigenvectors and eigenvalues of the covariance matrix
5.   Sort eigenvectors by decreasing eigenvalues
6.   If target_variance is given then compute the target_dim corresponding to it.
7.   Select the top target_dim eigenvectors to get the encoding matrix of shape [target_dim, #dimensions]
8.   Multiply the data matrix with the (transposed) encoding matrix to project the data into the low dimensional space

Complete the function below to implement PCA and return the reduced dimension set and captured variance.

[8 points]
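
The steps above can be sketched on synthetic data as follows. This is a simplified illustration, not the graded solution: it only centers the data (no scaling), the name `pca_sketch` is hypothetical, and it relies on `torch.linalg.eigh`, which is intended for symmetric matrices such as a covariance matrix.

```python
import torch

def pca_sketch(data, target_dim):
    """Project `data` ([n_samples, n_features]) onto its top `target_dim` principal directions."""
    centered = data - data.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (centered.shape[0] - 1)
    # eigh returns real eigenvalues in ascending order and eigenvectors as columns
    eigvals, eigvecs = torch.linalg.eigh(cov)
    order = torch.argsort(eigvals, descending=True)
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    encoding = eigvecs[:, :target_dim]                      # [n_features, target_dim]
    captured = eigvals[:target_dim].sum() / eigvals.sum()   # ratio of variance kept
    return centered @ encoding, captured

torch.manual_seed(0)
# synthetic data: large variance along the first axis, small along the others
toy = torch.randn(500, 5) * torch.tensor([10.0, 1.0, 1.0, 1.0, 1.0])
reduced, captured = pca_sketch(toy, target_dim=1)
print(reduced.shape, float(captured))
```

Because almost all of the variance of the toy data lies along one axis, a single component already captures most of it.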

In [0]:
import numpy as np
import time
'''
PCA function must take the original images and either the number of target
dimensions or the ratio of the total variance to be captured from the data

'''
def PCA(images, target_dim = 0, target_variance = 0): 
  start = time.time()
  # transform the data
  data = transform(images)
  # covariance matrix of the (already centered) data points, shape [3072, 3072]
  cov = torch.mm(data.T, data)/data.size()[0]
  # eigenvalues and eigenvectors of the covariance matrix; torch.linalg.eigh is meant
  # for symmetric matrices, returns real eigenvalues in ascending order and the
  # eigenvectors as the columns of its second output
  eigvals, eigvecs = torch.linalg.eigh(cov)
  # sort eigenvectors by decreasing eigenvalues
  sorted_eigvals_idx = torch.argsort(eigvals, descending=True)
  sorted_eigvals = eigvals[sorted_eigvals_idx]
  sorted_eigvecs = torch.index_select(eigvecs, 1, sorted_eigvals_idx)

  if(target_variance != 0):
    # compute target_dim such that target_variance is captured from the data:
    # the smallest number of leading eigenvalues whose cumulative share reaches the target
    captured = torch.cumsum(sorted_eigvals, dim=0) / torch.sum(sorted_eigvals)
    target_dim = int((captured < target_variance).sum().item()) + 1
    target_dim = min(target_dim, len(sorted_eigvals))
    print("target variance at: {0}".format(time.time() - start))
    print("Changed target_dim to: {0}".format(target_dim))
  
  # choose the top target_dim eigenvectors (columns) spanning the low dimensional subspace;
  # stored as [#dimensions, target_dim], the transpose of the [target_dim, #dimensions] layout
  encoding_matrix = sorted_eigvecs[:, :target_dim]

  # multiply the data matrix with the encoding matrix to get the reduced dataset
  reduced_data = data.mm(encoding_matrix.float())

  return reduced_data, target_dim, encoding_matrix




**Points**: $0.0$ of $8$
**Comments**: None

**Task 4. PCA for dimensionality reduction** 

PCA is commonly used to achieve dimensionality reduction for high-dimensional datasets. This is achieved by projecting the dataset onto a low-dimensional subspace while still capturing most of the variance in the dataset. Use the above function to reduce the dataset to a 50-dimensional subspace.

[0.5 point]

In [0]:
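# use dataloader with batch size 2000 to load the dataset
cifar_train = datasets.CIFAR10(root='data/cifar/', train=True, transform=transforms.ToTensor(), download=True)
train_loader = DataLoader(cifar_train, batch_size=2000)
# apply PCA with target_dim = 50 on the first batch (contains 2000 images) from the dataloader
images, labels = next(iter(train_loader))
reduced_data, target_dim, encoding_matrix = PCA(images, target_dim=50)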

**Points**: $0.0$ of $0.5$
**Comments**: None

**Task 5. PCA for visualization**

PCA is often used for visualization purposes. Visually exploring the data becomes challenging when there are more than 3 features, yet visualization is a very useful tool when dealing with data-related problems.

Please follow the steps below and complete the following code to create a scatterplot of the first and second principal components. Make use of matplotlib for this.

1.   Create a 2D scatter plot. For each data point, plot the first principal component on the $x$ axis and the second principal component on the $y$ axis; use a different color for each class.
2.   Set the corresponding labels: assign the label "first principal component" to the $x$ axis and "second principal component" to the $y$ axis.
3.   Add a legend entry for each class.
 
Are the first two components discriminative enough to classify points from any pair of the ten classes?
[1 point]
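
The three plotting steps can be sketched as below. The points and labels are random stand-ins rather than real PCA output, only three class names are used to keep the example short, and the `Agg` backend line is only there so the sketch also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import torch

torch.manual_seed(0)
points = torch.randn(300, 2)                 # stand-in for the first two components
point_labels = torch.randint(0, 3, (300,))   # stand-in for class labels
class_names = ("plane", "car", "bird")

fig, ax = plt.subplots()
for class_id, name in enumerate(class_names):
    mask = point_labels == class_id
    # one scatter call per class gives each class its own color and legend entry
    ax.scatter(points[mask, 0].numpy(), points[mask, 1].numpy(), s=8, label=name)
ax.set_xlabel("first principal component")
ax.set_ylabel("second principal component")
ax.legend()
```

With the real reduced data, `points` would be the first two columns of the PCA output and `point_labels` the CIFAR labels of the same batch.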

In [0]:
# create a scatterplot of the first two dimensions of the reduced data

**Points**: $0.0$ of $1$
**Comments**: None


---



# **2. Logistic regression [18 points]**


Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. Unlike linear regression, which outputs continuous values, logistic regression transforms its output using the logistic sigmoid function to return a probability value, which can then be mapped to two or more discrete classes.

In this section, we explore binary logistic regression. We turn the classification problem with 10 classes into a binary classification problem by considering only the points from the classes *'ship'* and *'car'*.

![Logistic regression image](https://drive.google.com/uc?id=1eRF1-2qnQYAkkCDpAwROj5MiMKzct2x0)


**Task 6. Generating training dataset** 

Since logistic regression is a classification problem with two classes, we need a labelled dataset with two classes as the training set. Complete the function to obtain the datapoints corresponding to labels *'ship'* and *'car'*. 

Please follow the steps below: 

1. Load the train and test sets of CIFAR 10 from torchvision using a batch size of 1024.
2. Split the training samples by 80:20 ratio into train set and validation set respectively. 
3. Filter the datasets to only have images with classes 'ship' or 'car'. The corresponding labels are 8 and 1 respectively.
4. The new labels for binary classification problem should be 'ship' : 0 and 'car': 1.
5. Define a [torch.utils.data.Dataset](https://pytorch.org/docs/stable/_modules/torch/utils/data/dataset.html) with the filtered tensors of images and the newly created labels.
6. Define a dataloader for training and validation datasets with batch_size 64.

[1.5 point]
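
Steps 3 to 6 can also be expressed with boolean masks on tensors. The sketch below uses random stand-in images and a hand-written label vector instead of CIFAR; `to_binary_dataset` is a hypothetical helper name, and `TensorDataset` is one possible choice of `Dataset`.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

SHIP, CAR = 8, 1  # original CIFAR-10 label ids, as stated in step 3

def to_binary_dataset(images, labels):
    """Keep only ship/car samples and relabel them as ship -> 0, car -> 1."""
    keep = (labels == SHIP) | (labels == CAR)
    binary_labels = (labels[keep] == CAR).long()
    return TensorDataset(images[keep], binary_labels)

torch.manual_seed(0)
fake_images = torch.rand(10, 3, 32, 32)   # random stand-ins for CIFAR images
fake_labels = torch.tensor([8, 1, 3, 8, 5, 1, 1, 0, 8, 9])
binary_set = to_binary_dataset(fake_images, fake_labels)
loader = DataLoader(binary_set, batch_size=4)
print(len(binary_set), binary_set.tensors[1].tolist())
```

Six of the ten stand-in samples carry the labels 8 or 1, so the filtered dataset has six entries.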


In [0]:
# TODO : generate the train, validation, and test sets from CIFAR 10  
import os
import torch
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from torch.utils.data import Dataset, DataLoader, random_split
from torchvision import transforms, utils, datasets
from torch import nn,autograd
from torch.autograd import Variable
import warnings
warnings.filterwarnings("ignore")



In [4]:
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) #mean, std for Red, Green and Blue
])
train = datasets.CIFAR10(root='data/cifar/', train=True, transform=transform, target_transform=None, download=True)
test =datasets.CIFAR10(root='data/cifar/', train=False, transform=transform, target_transform=None, download=True)

0it [00:00, ?it/s]

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to data/cifar/cifar-10-python.tar.gz


170500096it [00:06, 27175875.93it/s]                               


Extracting data/cifar/cifar-10-python.tar.gz to data/cifar/
Files already downloaded and verified


In [0]:

class TrainCIFARDataset(Dataset):
    """CIFAR-10 restricted to 'ship' (8) and 'car' (1), relabelled as 0 and 1."""

    # original CIFAR-10 label -> binary label
    targets = {8: 0, 1: 1}

    def __init__(self, file):
        # `file` is a torchvision CIFAR10 dataset; only the filtered samples are kept
        self.file = self.filter_dataset(file)

    def __len__(self):
        return len(self.file)

    def __getitem__(self, idx):
        return self.file[idx]

    def replace_target_values(self, t):
        # t is an (image, label) tuple; map the label to its binary value
        image, label = t
        return image, self.targets[label]

    def filter_dataset(self, data):
        """
        filter the data to get only classes with ship and car, then replace the values of ships and car to 0,1
        """
        # index each sample once: every access re-applies the dataset transform
        samples = (data[i] for i in range(len(data)))
        return [self.replace_target_values(s) for s in samples if s[1] in self.targets]

class TestCIFARDataset(TrainCIFARDataset):
    # same filtering and relabelling; kept as a separate name for the test split
    pass


In [0]:
train_data = TrainCIFARDataset(train)
train_loader = DataLoader(train_data, batch_size=1024)
test_dataset = TestCIFARDataset(test)
test_dataset = DataLoader(test_dataset, batch_size=1024)

# 80:20 split of the filtered training samples into train and validation sets
train_size = int(0.8 * len(train_data))
val_size = len(train_data) - train_size


train_dataset, val_dataset = random_split(train_data, [train_size, val_size])
# despite their names, train_dataset and val_dataset are DataLoaders from here on
train_dataset, val_dataset = DataLoader(train_dataset, batch_size=64), DataLoader(val_dataset, batch_size= 64)

In [0]:
# for batch, label in train_dataset:
#     print(batch,label)
#     break

**Points**: $0.0$ of $1.5$
**Comments**: None

**Task 7. Get the dataset in the low dimensional subspace** 

Apply PCA to the original data points to get a new data matrix.
The target dimension must capture 90% of the variance in the data.
Use the previously defined PCA function for this task. In the coming sections, we will use this projected training dataset to train the logistic regression model.

[0.5 point]
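
Choosing the target dimension for a given variance ratio comes down to a cumulative sum over the sorted eigenvalues. A minimal sketch, using made-up eigenvalues and a hypothetical helper name `dims_for_variance`:

```python
import torch

def dims_for_variance(eigvals, target_variance):
    """Smallest number of leading components whose variance ratio reaches the target."""
    sorted_vals, _ = torch.sort(eigvals, descending=True)
    ratios = torch.cumsum(sorted_vals, dim=0) / sorted_vals.sum()
    # count the components whose cumulative ratio is still below the target, then add one
    count = int((ratios < target_variance).sum().item()) + 1
    return min(count, eigvals.numel())

toy_eigvals = torch.tensor([1.0, 6.0, 3.0])
print(dims_for_variance(toy_eigvals, 0.5))   # 1
print(dims_for_variance(toy_eigvals, 0.8))   # 2
print(dims_for_variance(toy_eigvals, 0.95))  # 3
```

For the toy eigenvalues, the sorted cumulative ratios are 0.6, 0.9 and 1.0, which explains the three results.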


In [8]:
# TODO : use the pca function defined in task 3 to reduce dimensions of the train set 
reduced_data = PCA(images, target_variance=0.9)

NameError: ignored

**Points**: $0.0$ of $0.5$
**Comments**: None

**Task 8. Logistic regression hypothesis** 

In order to map predicted values to probabilities, logistic regression needs a function which returns values between 0 and 1. The logistic function is used for this: it maps any real value to a value between 0 and 1. In machine learning, it is also referred to as the sigmoid function.

$f(x) = \frac{1}{1 + e^{-x}}$

![Image of logistic regression function](https://en.wikipedia.org/wiki/Logistic_function#/media/File:Logistic-curve.svg)

Complete the following function which returns the sigmoid of a given input. 

[0.5 point]
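
As a quick check of the formula, the sketch below writes the logistic function out from its definition and compares it with `torch.sigmoid`; the name `sigmoid_manual` is only illustrative.

```python
import torch

def sigmoid_manual(x):
    """Logistic function written out from its definition."""
    return 1.0 / (1.0 + torch.exp(-x))

values = torch.tensor([-4.0, 0.0, 4.0])
manual = sigmoid_manual(values)
builtin = torch.sigmoid(values)
print(manual)
print(torch.allclose(manual, builtin))
```

The value at 0 is exactly 0.5, and the values at -4 and 4 add up to 1, reflecting the symmetry $f(-x) = 1 - f(x)$.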


In [0]:
def sigmoid(x):
    # find and return the sigmoid of x, applied element-wise
    # (equivalent to 1/(1+torch.exp(-x)))
    return torch.sigmoid(x)
  

**Points**: $0.0$ of $0.5$
**Comments**: None

**Task 9. Loss function** 


A common loss function used when dealing with probabilities in binary classification is binary cross entropy loss.

$cross\_entropy\_loss(y, \hat y) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_{i}\log \hat y_{i} + (1-y_{i})\log (1 - \hat y_{i}) \right]$

For binary cross entropy loss, the number of classes is 2.

Read about cross entropy in this [link](https://en.wikipedia.org/wiki/Cross_entropy).

Complete the following function to return the binary cross entropy loss. 

[1 point]
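
To make the definition concrete, the sketch below computes the mean binary cross entropy by hand, including the leading minus sign, and compares it with `torch.nn.functional.binary_cross_entropy`. The helper name `bce_manual` and the clamping constant `eps` are assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def bce_manual(y_true, y_pred, eps=1e-7):
    """Mean binary cross entropy, with predictions clamped away from 0 and 1."""
    y_pred = y_pred.clamp(eps, 1 - eps)
    return -(y_true * torch.log(y_pred) + (1 - y_true) * torch.log(1 - y_pred)).mean()

y_true = torch.tensor([1.0, 0.0, 1.0, 0.0])
y_pred = torch.tensor([0.9, 0.2, 0.6, 0.4])
manual_loss = bce_manual(y_true, y_pred)
reference_loss = F.binary_cross_entropy(y_pred, y_true)  # note: predictions come first
print(float(manual_loss), float(reference_loss))
```

The loss is positive, and it shrinks toward 0 as the predicted probabilities move toward the true labels.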

In [0]:
def bce(y_true, y_pred):
    # compute mean binary cross entropy loss given predicted probabilities and true labels
    # nn.BCELoss expects (input, target), i.e. predictions first, and averages over all elements:
    # -mean(y_true * log(y_pred) + (1 - y_true) * log(1 - y_pred))
    loss = nn.BCELoss()
    bce_loss = loss(y_pred, y_true)
    return bce_loss


**Points**: $0.0$ of $1$
**Comments**: None

**Task 10. Gradient descent to minimize the loss** 

The logistic regression parameters need to be optimized to minimize the loss function.

We have the output of the logistic regression given a vector **x** as follows.

$f(x) = \frac{1}{1 + e^{-wx}}$

Complete the following function to calculate the gradient of binary cross entropy loss function with respect to the parameter w. 

[1 point]
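
For the mean binary cross entropy with $\hat y_i = f(x_i)$, the gradient with respect to $w$ has the closed form $\frac{1}{N}\sum_{i=1}^{N} (\hat y_i - y_i)\, x_i$. The sketch below checks this against autograd on random stand-in data; all variable names are illustrative.

```python
import torch

torch.manual_seed(0)
x = torch.randn(8, 5)                      # 8 samples, 5 features
y = torch.randint(0, 2, (8,)).float()      # binary targets
w = torch.randn(5, requires_grad=True)

y_pred = torch.sigmoid(x @ w)
loss = -(y * torch.log(y_pred) + (1 - y) * torch.log(1 - y_pred)).mean()
loss.backward()                            # autograd stores dL/dw in w.grad

# closed form for the mean BCE loss: (1/N) * X^T (y_pred - y)
analytic_grad = x.T @ (y_pred.detach() - y) / x.shape[0]
print(torch.allclose(w.grad, analytic_grad, atol=1e-5))
```

Note that `loss.backward()` itself returns `None`; the gradient is read from the `.grad` attribute of the weight tensor.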

In [0]:
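def gradient(loss):
  # compute the gradient of loss w.r.t the weight parameter.
  # loss.backward() returns None; autograd accumulates the gradient in the .grad
  # attribute of every leaf tensor with requires_grad=True (here: weight.grad)
  loss.backward()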

**Points**: $0.0$ of $1$
**Comments**: None

**Task 11 . Fitting the model** [5 point]

Complete the function below which fits a logistic regression model on the given input data with the specified learning rate and number of epochs using stochastic gradient descent.

Follow the steps below to complete the function.
```
For each epoch:
  For each mini batch:
    1. Compute the predicted probabilities for all samples in the batch (y_pred).
    2. Compute the mean loss of the batch using the function defined in task 9.
    3. Compute the gradient of the loss w.r.t. the weight parameter using the function defined in task 10.
    4. Keep track of the mean loss during each epoch.
    5. Update the weight parameter using stochastic gradient descent. The batch size is 64, according to the dataloaders defined in task 6. 
Return the loss and the optimized weight parameter.
```
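
The loop described above can be tried out on a small synthetic problem before running it on CIFAR. The sketch below is self-contained and makes several assumptions: linearly separable random data instead of images, a zero-initialized weight vector, and a learning rate of 0.5 chosen only for this toy problem.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)
# synthetic, linearly separable stand-in for the ship/car data
features = torch.randn(256, 10)
true_w = torch.randn(10)
targets = (features @ true_w > 0).float()
loader = DataLoader(TensorDataset(features, targets), batch_size=64, shuffle=True)

weight = torch.zeros(10, requires_grad=True)
learning_rate = 0.5
epoch_losses = []
for epoch in range(20):
    running = 0.0
    for batch_x, batch_y in loader:
        y_pred = torch.sigmoid(batch_x @ weight)       # predicted probabilities
        loss = F.binary_cross_entropy(y_pred, batch_y)  # mean loss of the batch
        loss.backward()                                 # gradient w.r.t. weight
        with torch.no_grad():                           # plain SGD update
            weight -= learning_rate * weight.grad
            weight.grad.zero_()
        running += loss.item()                          # track the loss as a float
    epoch_losses.append(running / len(loader))
print(epoch_losses[0], epoch_losses[-1])
```

The mean loss of the last epoch should be clearly below that of the first epoch, which is a useful sanity check before training on the image data.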


In [0]:
torch.manual_seed(0)

def fit(data, epochs, learning_rate):
    '''
    The input x, which is multidimensional in this case, is multiplied
    with the logistic regression parameter W to get a scalar. This is then
    passed to the sigmoid function to get the probability.
    '''
    num_of_batches = len(data)
    # torch.randn draws from torch's generator, so seed torch (not numpy) for reproducibility
    torch.manual_seed(0)
    # weight parameter: one row vector with an entry per pixel value
    weight = torch.randn(1, 3072, requires_grad=True)
    # looping over the data
    for epoch in range(epochs):
        net_loss = 0.0
        # for each mini batch
        for batch_x, batch_y in data:
            batch_size = batch_x.size(0)  # the last batch may hold fewer than 64 samples
            x_ = batch_x.view(batch_size, 3072)  # flatten 3 x 32 x 32
            z = torch.mm(weight, x_.t())
            y_pred = sigmoid(z)
            # compute the mean loss of the batch
            y_true = batch_y.view(1, batch_size).float()
            loss = bce(y_true, y_pred)
            # accumulate a plain float so the autograd graph of each batch can be freed
            net_loss += loss.item()

            # compute the gradient of the loss w.r.t weight (stored in weight.grad)
            gradient(loss)
            # perform one step of stochastic gradient descent to update weight
            with torch.no_grad():
                weight -= learning_rate * weight.grad
                weight.grad.zero_()

        print('epoch: %d net_loss: %6.3f'%(epoch, net_loss/num_of_batches))

    return weight, net_loss/num_of_batches

**Points**: $0.0$ of $5$
**Comments**: None

**Task 12. Hyperparameter tuning** 

The learning rate and the number of epochs are important hyperparameters that need to be set before training. 
Complete the function below to select the best hyperparameters given the list of possible combinations.

[1.5 point]
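
One way to read "possible combinations" is as the full cross product of learning rates and epoch counts; that reading is an assumption of this sketch. The training function is replaced by a toy stand-in so the search logic can be followed in isolation, and the names `grid_search` and `toy_train` are hypothetical.

```python
from itertools import product

def grid_search(train_fn, learning_rates, epochs):
    """Try every (learning_rate, epochs) pair and keep the one with the lowest loss."""
    best_params, best_loss = None, float("inf")
    for lr, n_epochs in product(learning_rates, epochs):
        loss = float(train_fn(lr, n_epochs))
        if loss < best_loss:
            best_params, best_loss = (lr, n_epochs), loss
    return best_params, best_loss

# toy stand-in for training: the loss is smallest at lr=0.1 and shrinks with more epochs
def toy_train(lr, n_epochs):
    return (lr - 0.1) ** 2 + 1.0 / n_epochs

best_params, best_loss = grid_search(toy_train, [0.01, 0.1, 1], [25, 50, 100])
print(best_params, best_loss)
```

Starting from `float("inf")` guarantees that the first evaluated pair always becomes the initial best candidate.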

In [0]:

def select_best_hyperparams(data, learning_rates, epochs):
    # initialize best loss with +infinity so that the first trained model always improves on it
    best_loss = float('inf')
    best_hyperparams = None

    # learning_rates and epochs are paired element-wise: (lr_0, epochs_0), (lr_1, epochs_1), ...
    for learning_rate, epoch in zip(learning_rates, epochs):
        # find the hyperparameter combination which returns the minimum loss after training (using fit function)
        weight, loss = fit(data, epoch, learning_rate)
        loss = float(loss)  # works for python floats and for 0-dim tensors
        if loss < best_loss:
            best_loss = loss
            best_hyperparams = [epoch, learning_rate]

    return best_hyperparams, best_loss

**Points**: $0.0$ of $1.5$
**Comments**: None

**Task 13. Training using the best hyperparameters** [0.5 point]

Complete the code below to select the best hyperparameter combination and then fit the training data using the selected learning rate and number of epochs.



In [21]:
# hyperparameters combinations 

np.random.seed(0) #to reproduce the results 
learning_rates = [0.01, 0.1, 1] 
epochs = [25, 50, 100]
# TODO : use the function defined in task 12 to find the best hyperparameter combination from the above list
best_hyperparams , best_loss= select_best_hyperparams(train_dataset,learning_rates, epochs)

print('Best hyperparameters using validation data.\nLearning rate: %5.3f, Number of epochs: %d '% (best_hyperparams[1], best_hyperparams[0]))

epoch: 0 net_loss: 13.891
epoch: 1 net_loss: 12.061
epoch: 2 net_loss: 10.455
epoch: 3 net_loss:  9.307
epoch: 4 net_loss:  8.513
epoch: 5 net_loss:  7.938
epoch: 6 net_loss:  7.515
epoch: 7 net_loss:  7.228
epoch: 8 net_loss:  6.942
epoch: 9 net_loss:  6.712
epoch: 10 net_loss:  6.520
epoch: 11 net_loss:  6.374
epoch: 12 net_loss:  6.267
epoch: 13 net_loss:  6.179
epoch: 14 net_loss:  6.046
epoch: 15 net_loss:  5.965
epoch: 16 net_loss:  5.880
epoch: 17 net_loss:  5.787
epoch: 18 net_loss:  5.696
epoch: 19 net_loss:  5.619
epoch: 20 net_loss:  5.544
epoch: 21 net_loss:  5.474
epoch: 22 net_loss:  5.411
epoch: 23 net_loss:  5.329
epoch: 24 net_loss:  5.241
epoch: 0 net_loss:  9.941
epoch: 1 net_loss:  6.750
epoch: 2 net_loss:  6.196
epoch: 3 net_loss:  5.875
epoch: 4 net_loss:  5.393
epoch: 5 net_loss:  5.137
epoch: 6 net_loss:  4.828
epoch: 7 net_loss:  4.480
epoch: 8 net_loss:  4.258
epoch: 9 net_loss:  4.032
epoch: 10 net_loss:  3.892
epoch: 11 net_loss:  3.696
epoch: 12 net_loss:  

**Points**: $0.0$ of $0.5$
**Comments**: None

**Task 14. Logistic regression threshold** 

Logistic regression takes an input and returns a value between 0 and 1. To turn this output, interpreted as the probability of the input belonging to a class, into a class prediction, we need to define a threshold. We set a threshold of 0.5.

Since the model is trained with the labels 'ship': 0 and 'car': 1, f(x) estimates the probability of class 1. We therefore predict class 1 if f(x) is greater than or equal to 0.5; otherwise we predict the data point to be an instance of class 0.

Complete the following function which predicts the class (ship or car) of a given input. [1 point]
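
With the labels 'ship': 0 and 'car': 1 from task 6, the sigmoid output of a model trained with binary cross entropy estimates the probability of class 1, so thresholding can be sketched as follows. The helper name `to_class` is hypothetical.

```python
import torch

def to_class(probabilities, threshold=0.5):
    """Map probabilities of class 1 ('car') to hard labels: 1 if p >= threshold, else 0 ('ship')."""
    return (probabilities >= threshold).long()

probs = torch.tensor([0.10, 0.50, 0.93])
print(to_class(probs).tolist())  # [0, 1, 1]
```

A probability exactly at the threshold is assigned to class 1 under this convention.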

In [22]:
threshold = 0.5
l_Rate, epoch = best_hyperparams[1], best_hyperparams[0]
weight, loss = fit(train_dataset,epoch, l_Rate)



epoch: 0 net_loss:  9.312
epoch: 1 net_loss:  6.980
epoch: 2 net_loss:  6.266
epoch: 3 net_loss:  5.766
epoch: 4 net_loss:  5.301
epoch: 5 net_loss:  4.920
epoch: 6 net_loss:  4.584
epoch: 7 net_loss:  4.222
epoch: 8 net_loss:  3.962
epoch: 9 net_loss:  3.722
epoch: 10 net_loss:  3.538
epoch: 11 net_loss:  3.402
epoch: 12 net_loss:  3.342
epoch: 13 net_loss:  3.225
epoch: 14 net_loss:  3.164
epoch: 15 net_loss:  3.101
epoch: 16 net_loss:  3.075
epoch: 17 net_loss:  2.973
epoch: 18 net_loss:  2.890
epoch: 19 net_loss:  2.840
epoch: 20 net_loss:  2.688
epoch: 21 net_loss:  2.758
epoch: 22 net_loss:  2.722
epoch: 23 net_loss:  2.646
epoch: 24 net_loss:  2.590
epoch: 25 net_loss:  2.545
epoch: 26 net_loss:  2.502
epoch: 27 net_loss:  2.455
epoch: 28 net_loss:  2.435
epoch: 29 net_loss:  2.421
epoch: 30 net_loss:  2.340
epoch: 31 net_loss:  2.344
epoch: 32 net_loss:  2.288
epoch: 33 net_loss:  2.274
epoch: 34 net_loss:  2.290
epoch: 35 net_loss:  2.260
epoch: 36 net_loss:  2.248
epoch: 37 n

In [0]:
def predict(x):
    # probability of class 1 ('car') for every row of x
    z = torch.mm(weight, x.t())
    probabilities = sigmoid(z)
    # predicted class label: 1 if the probability reaches the threshold, else 0
    return (probabilities >= threshold).long()

**Points**: $0.0$ of $1$
**Comments**: None

**Task 15. Reporting accuracy on test set** [*0.5* point]

The test set is used to give an indication of the generalization abilities of the model, that is, to estimate how much better than random guessing the model is on unseen data points.

Complete the code below to compute the accuracy of logistic regression model on the test set. For this, first bring the test set to the low dimensional subspace and then make predictions using the trained model. 
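
The two steps, projecting test data and measuring accuracy, can be sketched separately. The sketch assumes that test data is standardized with statistics computed on the training set and projected with the training encoding matrix; the placeholder statistics, the identity-based encoding matrix, and the helper names `project` and `accuracy` are all illustrative.

```python
import torch

def project(data, train_mean, train_std, encoding_matrix):
    """Apply training-set statistics and the training encoding matrix to new data."""
    return ((data - train_mean) / train_std) @ encoding_matrix

def accuracy(predicted, labels):
    """Fraction of predictions that match the labels."""
    return (predicted.view(-1) == labels.view(-1)).float().mean().item()

torch.manual_seed(0)
test_points = torch.randn(6, 8)                        # 6 flattened stand-in samples
train_mean, train_std = torch.zeros(8), torch.ones(8)  # placeholder statistics
encoding_matrix = torch.eye(8)[:, :3]                  # placeholder [8, 3] encoding matrix
reduced_test = project(test_points, train_mean, train_std, encoding_matrix)
print(reduced_test.shape)

predicted = torch.tensor([[1, 0, 1, 1]])   # shape [1, 4], like the output of a [1, d] weight
labels = torch.tensor([1, 0, 0, 1])
print(accuracy(predicted, labels))  # 0.75
```

Flattening both tensors before comparing avoids accidental broadcasting between a `[1, n]` prediction and an `[n]` label vector.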


In [65]:
# TODO : bring the test set into the low dimensional subspace defined earlier for the train set
reduced_test_set = None

# TODO: compute the accuracy on the reduced test set
acc = 0.0
n = 0
for image, label in test_dataset:
    n += label.shape[0] 
    image = image.view(image.size()[0],3*32*32)
    out = predict(image)

    out = (out>0.5)
    correct = (out == label).float().sum()
    acc+= correct
print(n)
acc = acc/n



print('Accuracy on the test set : %6.3f'% (acc))

2000
Accuracy on the test set :  0.754


In [51]:
print(1==True)

True


**Points**: $0.0$ of $0.5$
**Comments**: None

**Task 16. Improving accuracy on test set** [*5* point]

Use PyTorch's neural network layer functions to construct a model which gives better accuracy for the same training and test set. You can have a look at the torch.nn package for this. 

Describe the model and explain why it performs better.
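
One possible starting point is a small multilayer perceptron built from `torch.nn` layers. The sketch below is only an assumption about what such a model could look like: the hidden size, the learning rate and the helper name `build_mlp` are arbitrary, and whether it actually beats logistic regression has to be checked on the test set.

```python
import torch
from torch import nn

def build_mlp(input_dim=3072, hidden_dim=128):
    """One hidden layer with a ReLU, ending in a single logit for the binary task."""
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(input_dim, hidden_dim),
        nn.ReLU(),
        nn.Linear(hidden_dim, 1),
    )

torch.manual_seed(0)
model = build_mlp()
criterion = nn.BCEWithLogitsLoss()          # applies the sigmoid internally
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

batch = torch.randn(4, 3, 32, 32)           # random stand-in for a CIFAR batch
batch_labels = torch.tensor([0.0, 1.0, 1.0, 0.0])
logits = model(batch).squeeze(1)            # one logit per image
loss = criterion(logits, batch_labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()                            # a single training step
print(logits.shape, float(loss))
```

The hidden layer with a nonlinearity lets the model represent decision boundaries that are not linear in the pixel values, which a single weight vector cannot do.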

In [0]:
def my_model():
  # TODO

**Points**: $0.0$ of $5$
**Comments**: None