# Assignment 2

In this assignment, you will (1) extract global features from a publicly available dataset with one of the pre-trained neural networks available in PyTorch, and (2) classify the dataset using the traditional k-Nearest Neighbours classifier.

You will also be asked to implement k-fold cross-validation to evaluate your model.

------------------------

In [None]:
# Load needed packages
import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

When working with PyTorch, `DataLoader` is a must-know class.

Read more about it and the parameters it accepts at https://blog.paperspace.com/dataloaders-abstractions-pytorch/

DataLoader(
    dataset,
    batch_size=1,
    shuffle=False,
    num_workers=0,
    collate_fn=None,
    pin_memory=False,
)
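
As a minimal sketch of how these parameters interact, the snippet below wraps a small random tensor dataset in a `DataLoader` and iterates over it in batches. The toy data and the variable names are invented for illustration and are not the assignment data.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 10 "images" of shape 3x32x32 with integer labels
toy_images = torch.randn(10, 3, 32, 32)
toy_labels = torch.arange(10) % 2
toy_dataset = TensorDataset(toy_images, toy_labels)

# batch_size=4 and shuffle=False -> batches of 4, 4 and 2 samples, in order
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=False, num_workers=0)

batch_sizes = []
for images, labels in toy_loader:
    # each iteration yields one (inputs, targets) pair of stacked tensors
    batch_sizes.append(images.shape[0])
    print(images.shape, labels.shape)
```

Note that each batch is a pair of tensors, so a loop over a loader built from a labelled dataset has to unpack inputs and targets.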

In [None]:
from torch.utils.data import DataLoader

The variable `transform` encapsulates the transformations applied to our data.

Read more about transforms at https://blog.paperspace.com/dataloaders-abstractions-pytorch/

In [None]:

transform = transforms.Compose([
    # resize
    transforms.Resize(32),
    # center-crop
    transforms.CenterCrop(32),
    # to-tensor
    transforms.ToTensor(),
    # normalize
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
])
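
To see what the normalization step does numerically, the following NumPy sketch reproduces the arithmetic of `Normalize(mean, std)`, that is `(x - mean) / std` per channel. It assumes the input was already scaled to [0, 1], as `ToTensor()` does for 8-bit images; the array values are made up for illustration.

```python
import numpy as np

# Made-up 3-channel "image" of 2x2 pixels with values in [0, 1]
image = np.array([
    [[0.0, 0.5], [1.0, 0.25]],   # R channel
    [[0.0, 0.5], [1.0, 0.25]],   # G channel
    [[0.0, 0.5], [1.0, 0.25]],   # B channel
])

mean = np.array([0.5, 0.5, 0.5]).reshape(3, 1, 1)
std = np.array([0.5, 0.5, 0.5]).reshape(3, 1, 1)

# Same arithmetic as the Normalize transform: subtract mean, divide by std
normalized = (image - mean) / std

print(normalized.min(), normalized.max())  # -1.0 1.0
```

With a mean and standard deviation of 0.5, values in [0, 1] are mapped to [-1, 1].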

Load your dataset

In [None]:
# Example solution for the CIFAR dataset
# Information about the dataset
dataset_name = 'CIFAR10'
classes = ('plane', 'car', 'bird',
           'cat', 'deer', 'dog', 'frog',
           'horse', 'ship', 'truck')

# Keep the dataset name and the dataset object in separate variables
dataset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)

dataloader = torch.utils.data.DataLoader(dataset, batch_size=4,
                                          shuffle=False)

## Exercise: RGB feature extraction

Extract the RGB values from the image as three lists, one per channel. Concatenate those three lists to create a 1-D feature vector. This feature vector is the descriptor of your image.

In [None]:
# Solution goes here
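
One possible way to build such a descriptor, sketched here with NumPy on a random image rather than the actual dataset: flatten each colour channel into a list of values and concatenate the three lists. The helper name `rgb_descriptor` and the channel-first layout (3 x H x W, as produced by `ToTensor()`) are assumptions of this example.

```python
import numpy as np

def rgb_descriptor(image):
    """Concatenate the flattened R, G and B channels into one 1-D vector.

    `image` is assumed to be channel-first with shape (3, H, W).
    """
    red = image[0].ravel()
    green = image[1].ravel()
    blue = image[2].ravel()
    return np.concatenate([red, green, blue])

# Random stand-in for a 32x32 RGB image
rng = np.random.default_rng(0)
toy_image = rng.random((3, 32, 32))

descriptor = rgb_descriptor(toy_image)
print(descriptor.shape)  # (3072,)
```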

## Exercise: Feature extraction using pre-trained networks

Load a pretrained network to extract global features from the images.
We will use the values that feed the last fully-connected layer of the deep network as a descriptor, i.e. we will remove the last fully-connected layer. Therefore, after feed-forwarding the input through the network, we save the output as the descriptor of the image.

You can use different networks for this purpose.
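
The idea of dropping the final layer can be sketched on a tiny hand-built network, which avoids downloading pretrained weights. The network below is hypothetical and untrained; with a torchvision model, the attribute that holds the last layer depends on the architecture (for example `fc` for ResNet, `classifier` for AlexNet and VGG).

```python
import torch
import torch.nn as nn

# Hypothetical toy network: feature layers followed by a final classifier `fc`
class ToyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU())
        self.fc = nn.Linear(64, 10)  # last fully-connected layer (class scores)

    def forward(self, x):
        return self.fc(self.features(x))

toy_model = ToyNet()
# Replace the last layer with an identity so the output is the 64-d descriptor
toy_model.fc = nn.Identity()
toy_model.eval()

with torch.no_grad():  # no gradients are needed for feature extraction
    descriptors = toy_model(torch.randn(4, 3, 32, 32))

print(descriptors.shape)  # torch.Size([4, 64])
```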

In [None]:
import torch.nn as nn
from torchvision import models

# Name of the model you wish to use; it must be selected from this list
# [resnet, alexnet, vgg, squeezenet, densenet, inception]

In [None]:
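# Solution:
import torch.nn as nn
from torchvision import models

dataset_name = 'CIFAR10'  # used to name the saved feature file

# Load model (newer torchvision versions prefer the `weights=` argument)
model = models.resnet18(pretrained=True)

# Remove last fully-connected layer: ResNet stores it in `model.fc`
model.fc = nn.Identity()
model.eval()

# Iterate over the images extracting features (each batch is images + labels)
extracted_features, labels = [], []
with torch.no_grad():
    for images, targets in dataloader:
        extracted_features.append(model(images).numpy())
        labels.append(targets.numpy())

# Stack the features (N x 512) and append the label as the last column
extracted_features = np.hstack([np.concatenate(extracted_features),
                                np.concatenate(labels)[:, None]])

# save extracted features (np.save takes the file name first)
np.save('extracted_features' + dataset_name, extracted_features)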

------------------------

## Exercise: Dataset preparation

Optional: also evaluate the features you extracted in Assignment 1.

In [None]:
# Optional (and interesting!): Reusing the extracted RGB features in the previous assignment
from sklearn.utils import shuffle

RGBfeatures = np.load('datasetRGB.npy')
RGBfeatures = shuffle(RGBfeatures)

print('Number of samples: ',len(RGBfeatures))

### Train - Test Split

Write a function **train_test_split(dataset, ratio)** which takes a dataset as input and returns two datasets, one for training and another for testing. In the solution below, `ratio` is the fraction of samples assigned to the test set.


In [None]:
def train_test_split(dataset, ratio):
    ...
    return training_data,testing_data

In [None]:
# Solution Numpy version:
def train_test_split(dataset, ratio):
    print('Total number of samples:', len(dataset))
    i = int((1-ratio)* len(dataset))
    train_dataset = dataset[:i,:] 
    test_dataset = dataset[i:,:]
    print('Samples Train:', len(train_dataset))
    print('Samples Test:', len(test_dataset))
    return train_dataset,test_dataset

--------------------------------

## Exercise: Performance evaluation

Implement a function to evaluate the accuracy of your predictions.
We will rely on accuracy as the evaluation metric.

In [None]:
def accuracy_metric(actual, predicted):
    ...
    return accuracy_value

In [None]:
# Solution:
def accuracy_metric(actual, predicted):
    correct = 0
    for i in range(len(actual)):
        if actual[i] == predicted[i]:
            correct += 1
    return correct / float(len(actual)) * 100.0

--------------------------------

## Exercise: Train and test your Nearest Neighbour model

Apply the classifier with different values of k (number of nearest neighbours) to the two sets of previously extracted descriptors (RGB and CNN features) and evaluate the performance of your models (accuracy).

You can have a look at the documentation to understand the parameters that define the learning of the model,
https://scikit-learn.org/stable/modules/neighbors.html
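
To make the decision rule concrete, here is a small NumPy sketch of k-nearest-neighbour classification with Euclidean distance and majority voting. It is only an illustration on made-up 2-D points; the function name `knn_predict` is hypothetical, and the scikit-learn classifier linked above remains the practical choice.

```python
import numpy as np

def knn_predict(train_x, train_y, query, k):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    distances = np.linalg.norm(train_x - query, axis=1)   # Euclidean distances
    nearest = np.argsort(distances)[:k]                   # indices of the k closest samples
    votes = np.bincount(train_y[nearest])                 # count labels among neighbours
    return int(np.argmax(votes))                          # ties go to the smallest label

# Made-up training set: class 0 near the origin, class 1 near (5, 5)
train_x = np.array([[0.0, 0.0], [0.5, 0.2], [0.1, 0.8],
                    [5.0, 5.0], [5.5, 4.8], [4.9, 5.3]])
train_y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(train_x, train_y, np.array([0.3, 0.3]), k=3))  # 0
print(knn_predict(train_x, train_y, np.array([5.1, 5.1]), k=3))  # 1
```

An even k can produce ties between classes; this sketch resolves them in favour of the smallest label, which is one arbitrary choice among several.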


#### a) Train and Test your model - Assess and show the performance of your model

In [None]:
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

In [None]:
# Use your k-NN - play with the value of the parameters to see how the model performs
kvalue_list = [2,4,10] 
for kvalue in kvalue_list:
    ...
    print('Accuracy of the model is ', acc)

In [None]:
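# Solution:
from sklearn.neighbors import KNeighborsClassifier

# Assumes train_features, train_labels, test_features and test_labels
# were obtained beforehand, e.g. from the output of train_test_split
kvalue_list = [2,4,10]
for kvalue in kvalue_list:

    knn = KNeighborsClassifier(n_neighbors=kvalue, metric='euclidean')
    knn.fit(train_features, train_labels)

    # Predict all test samples at once
    test_pred = knn.predict(test_features)

    acc = accuracy_metric(test_labels, test_pred)

    print('Accuracy of the model for k =', kvalue, 'is', acc)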

#### b) Visualize results

Steps to follow:

1) Apply PCA and select the 2 first principal components to represent each sample.

2) Plot the samples with dots. Use a color per class. 

3) Plot the samples again, this time with unfilled circles. Use the color of the class predicted for each sample (for misclassified samples, the two colors will not match).

You can do this for (1) the training set and (2) the test set. In (1) you can see how well the method fits the training data, and (2) will give you an idea of the misclassifications.

In [None]:
# your code here
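
The three steps above could look roughly like the sketch below. It uses random stand-in features, labels and predictions (not real results), and computes PCA through an SVD in NumPy instead of scikit-learn's `PCA` class to stay self-contained. The helper names `pca_2d` and `plot_true_vs_predicted` are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

def pca_2d(features):
    """Project samples onto their first two principal components (via SVD)."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

def plot_true_vs_predicted(points, true_labels, predicted_labels):
    """Filled dots show the true class, unfilled circles the predicted class."""
    fig, ax = plt.subplots()
    cmap = plt.get_cmap('tab10')
    ax.scatter(points[:, 0], points[:, 1], s=15, c=cmap(true_labels))
    ax.scatter(points[:, 0], points[:, 1], s=80, facecolors='none',
               edgecolors=cmap(predicted_labels))
    ax.set_xlabel('PC 1')
    ax.set_ylabel('PC 2')
    return fig, ax

# Random stand-in data: 100 samples, 20-d features, 3 classes
rng = np.random.default_rng(0)
toy_features = rng.normal(size=(100, 20))
toy_true = rng.integers(0, 3, size=100)
toy_pred = rng.integers(0, 3, size=100)

toy_points = pca_2d(toy_features)
fig, ax = plot_true_vs_predicted(toy_points, toy_true, toy_pred)
```

Wherever the dot and the surrounding circle differ in color, the sample was misclassified.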

## Exercise: k-Fold cross validation

Assess the performance of your implemented classifier using k-fold cross-validation.

Remember that, for each fold, the model needs to be re-initialized and trained from scratch.

Run your implemented function for k = 2, 5 and 10. You are also encouraged to implement the leave-one-out strategy. Report the average accuracy and the standard deviation.
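
The splitting logic behind k-fold cross-validation can be sketched without scikit-learn: shuffle the sample indices, cut them into k folds, and let each fold serve once as the test set. Leave-one-out is the special case where k equals the number of samples. The helper `kfold_indices` and the fold accuracies used for the mean and standard deviation are invented for illustration.

```python
import numpy as np

def kfold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    folds = np.array_split(indices, k)        # k folds of (almost) equal size
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

n_samples = 10
for k in (2, 5, n_samples):                   # k = n_samples is leave-one-out
    sizes = [(len(tr), len(te)) for tr, te in kfold_indices(n_samples, k)]
    print(k, sizes[0])

# Reporting: mean and standard deviation over made-up fold accuracies
fold_accuracies = [80.0, 90.0, 85.0]
print(np.mean(fold_accuracies), np.std(fold_accuracies))
```

Every sample appears in exactly one test fold, so each one is used for testing once and for training k - 1 times.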

In [None]:
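# Load packages
from sklearn.model_selection import KFold
import numpy as np
from sklearn.utils import shuffle
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

dataset_name = 'CIFAR10'  # must match the name used when saving the features
dataset = np.load('extracted_features' + dataset_name + '.npy')
dataset = shuffle(dataset)

# K fold parameters
N = len(dataset)
k_list = [2,5,10]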

In [None]:
# Solution:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

avg_acc_list = []
for k in k_list:
    print(' ')
    print(' ')
    print("Running experiments for k = ", k)
    print(' ')
    kf = KFold(n_splits=k)

    folds_acc = []
    for train_index, test_index in kf.split(dataset):

        # prepare data (the last column of each row is assumed to hold the label)
        train_dataset = dataset[train_index,:]
        test_dataset  = dataset[test_index,:]

        print('# train:',len(train_dataset))
        print('# test:',len(test_dataset))
        print(' ')

        # A new classifier is created for each fold
        clf = make_pipeline(StandardScaler(), LinearSVC(random_state=0, tol=1e-5))

        # Train classifier on the features (all columns but the last)
        clf.fit(train_dataset[:,:-1], train_dataset[:,-1])

        # Make predictions with the classifier
        gt = test_dataset[:,-1]
        predicted = clf.predict(test_dataset[:,:-1])

        # Assess performance of the classifier
        accuracy_value = accuracy_metric(gt, predicted)
        print('Accuracy on test data = ', (accuracy_value),'%')
        folds_acc.append(accuracy_value)
    # Accumulate results
    avg_acc_list.append(folds_acc)
    print('-------                 -------                 -------                 -------')
    print(' ')
    print('            -------                 -------                 -------            ')
    print(' ')
    print('-------                 -------                 -------                 -------')

In [None]:
print('Summary results:')
print(' ')
print(' ')
for i,k in enumerate(k_list):
    print(k,'-fold cross validation:')  
    print('Accuracies per fold: ', avg_acc_list[i]) 
    
    avg_acc = round(np.mean(avg_acc_list[i]),2)
    std_acc = round(np.std(avg_acc_list[i]),2)
    print('Average accuracy: ', avg_acc,'+-', std_acc) 
    print(' ')

### Extra possible exercises: 
- implement other classifiers, 
- extract other descriptors from the images,
- implement other evaluation metrics: recall, precision and F-score.
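
For the last suggestion, per-class precision, recall and F-score can be derived from counts of true positives, false positives and false negatives. The sketch below does this with NumPy on made-up labels; the function name `precision_recall_f1` is hypothetical.

```python
import numpy as np

def precision_recall_f1(actual, predicted, positive_class):
    """Precision, recall and F1 for one class treated as 'positive'."""
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    tp = np.sum((predicted == positive_class) & (actual == positive_class))
    fp = np.sum((predicted == positive_class) & (actual != positive_class))
    fn = np.sum((predicted != positive_class) & (actual == positive_class))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# Made-up ground truth and predictions for a 3-class problem
actual    = [0, 0, 1, 1, 2, 2, 2, 0]
predicted = [0, 1, 1, 1, 2, 0, 2, 0]

for cls in (0, 1, 2):
    print(cls, precision_recall_f1(actual, predicted, cls))
```

Averaging the per-class values gives macro-averaged scores, which is one common way to summarize a multi-class problem such as CIFAR-10.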