## Ass3 Live Programming - Classifier Evaluation
### Important:
* #### Read the entire task description before starting
* #### Do only what is asked
* #### You will find documentation for the provided components below

### Task:
* #### Your task is to evaluate the performance of various classifiers (given in `classifiers`) for classifying PCA-transformed MNIST images with different numbers of PCA features.
  * The classifiers' interfaces are documented below.
  * Do not hardcode (or copy-paste) the code for the individual classifiers. Your function `evaluate` should work with arbitrary dictionaries of the same structure.
* #### The loaded data is already PCA-transformed and has the same structure as always (feature_matrix: num_samples x num_features, label_vector: num_samples)
* #### Create an evaluation as shown in the plot below (plots just have to be there and don't have to be as beautiful as ours :P): 
  * The crosses (X) mark the highest accuracy for each classifier. You might want to use `np.argmax` ("Returns the indices of the maximum values along an axis.") to find the number of features corresponding to the highest accuracy.
  * Let `evaluate` return a dictionary containing the best number of features and the corresponding accuracy for each classifier as a tuple, e.g.:
  ```python
  {'euclidean_5nn': (50, 0.88),
   'cosine_5nn': (100, 0.89),
   'tree': (6, 0.69),
   'svc': (20, 0.93)}
  ```
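
The `np.argmax` hint above can be sketched as follows. The accuracy values are made up for illustration and are not results from this task; `num_dims` and `accuracies` are hypothetical names for the list of feature counts and the accuracies measured for one classifier.

```python
import numpy as np

# Hypothetical example values, not real results
num_dims = [1, 2, 5, 10, 20]
accuracies = [0.31, 0.45, 0.72, 0.81, 0.78]

# np.argmax returns the position of the largest accuracy ...
best_idx = int(np.argmax(accuracies))

# ... which indexes both lists to give the (number of features, accuracy) tuple
best = (num_dims[best_idx], accuracies[best_idx])
print(best)  # (10, 0.81)
```

The same index is used for both lists, so the accuracy and the number of features always belong to the same run.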

#### All classifiers provide the following interfaces:

```
Classifier.fit(self, X, y)
    """Fits the model to the given training data

    Parameters
    ----------
    X : np.ndarray
        training features
    y : np.ndarray
        training labels
    """
    
Classifier.score(self, X, y)
    """Returns the global accuracy on the given test data and labels.

    Parameters
    ----------
    X : np.ndarray
        testing features
    y : np.ndarray
        testing labels

    Returns
    -------
    score : float
        Mean accuracy of the prediction.
    """
```
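
Since `evaluate` only needs `fit` and `score`, any object offering these two methods could be placed in the dictionary. A minimal sketch, assuming a hypothetical `MajorityClassifier` (not one of the provided classifiers) and a tiny synthetic data set, shows the two calls on the first `k` feature columns:

```python
import numpy as np

class MajorityClassifier:
    """Hypothetical stand-in: always predicts the most frequent training label."""

    def fit(self, X, y):
        # Remember the label that occurs most often in the training data
        labels, counts = np.unique(y, return_counts=True)
        self.majority_ = labels[np.argmax(counts)]
        return self

    def score(self, X, y):
        # Fraction of test labels equal to the remembered majority label
        return float(np.mean(np.asarray(y) == self.majority_))

# Tiny synthetic data set (4 samples, 3 features), for illustration only
X = np.arange(12, dtype=float).reshape(4, 3)
y = np.array([0, 1, 1, 1])

clf = MajorityClassifier()
k = 2  # use only the first k feature columns
clf.fit(X[:, :k], y)
accuracy = clf.score(X[:, :k], y)
print(accuracy)  # 0.75
```

Testing `evaluate` with such a stand-in is one way to check that no classifier is hardcoded.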

#### Lineplots can be created like this
```python
lineplot = hv.Curve((x_data, y_data))
```
#### To add a label, use
```python
lineplot = hv.Curve((x_data, y_data), label="whatever")
```
#### Scatterplots can be created like this
```python
scatterplot = hv.Scatter((x_data, y_data))
scatterplot = hv.Scatter((x_data, y_data)).opts(size=20, marker="x", color="k") # to create nice big X's
```
#### Plots can be combined into an overlay using hv.Overlay, or by multiplying them
```python
overlay = hv.Overlay(list_of_plots)
overlay = plot1 * plot2
```
#### Plots can be placed next to each other using hv.Layout from a list, or by adding them
```python
layout = hv.Layout(list_of_plots)
layout = plot_or_layout + plot_or_layout
```


In [1]:
# Execute this to import the provided components
import pickle

from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

import numpy as np
import holoviews as hv
hv.extension("bokeh")

from IPython.display import clear_output, display
clear_output()

In [2]:
with open("image_pca.p", "rb") as pf:
    (X_train, y_train), (X_test, y_test) = pickle.load(pf)
    # X_train.shape -> (num_train_samples, num_features)
    # y_train.shape -> (num_train_samples,)
    # X_test.shape  -> (num_test_samples, num_features)
    # y_test.shape  -> (num_test_samples,)

In [27]:
classifiers = {
    "euclidean_5nn" : KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
    "cosine_5nn" : KNeighborsClassifier(n_neighbors=5, metric="cosine"),
    "tree": DecisionTreeClassifier(),
    "svc": SVC(),
}


num_features = [1,2,3,4,5,6,7,8,9,10,20,30,40,50,100]

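def evaluate(classifiers, X_train, y_train, X_test, y_test, num_dims):
    # YOUR CODE GOES HERE:

    accuracies = {}  # classifier name -> accuracy for each entry of num_dims
    best = {}        # classifier name -> (best number of features, best accuracy)
    plot_list = []

    for name, clf in classifiers.items():
        accuracies[name] = []
        # Iterate over the num_dims argument, not the global num_features
        for n in num_dims:
            # Train and test on the first n PCA features only
            clf.fit(X_train[:, :n], y_train)
            accuracies[name].append(clf.score(X_test[:, :n], y_test))

        # Position of the highest accuracy for this classifier
        best_idx = int(np.argmax(accuracies[name]))
        best[name] = (num_dims[best_idx], accuracies[name][best_idx])

        # Accuracy curve; hv.Curve takes `label` (not `legend`) for the legend entry
        plot_list.append(hv.Curve((num_dims, accuracies[name]), label=name).opts(xlabel="Number of PCA features", ylabel="global accuracy"))
        # Cross marking the highest accuracy of this classifier
        plot_list.append(hv.Scatter(([best[name][0]], [best[name][1]])).opts(size=20, marker="x", color="k"))

    display(hv.Overlay(plot_list))

    return best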
    
    

In [28]:
evaluate(classifiers, X_train, y_train, X_test, y_test, num_features)



{'euclidean_5nn': (50, 0.88),
 'cosine_5nn': (100, 0.89),
 'tree': (30, 0.69),
 'svc': (20, 0.93)}