## Multi-class classification

**Aim:** practice with classification problems that go beyond the binary case.

We will see two strategies to address the multi-class classification problem:
- OvR: one versus the rest
- OvO: one versus one

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
import warnings
warnings.filterwarnings('ignore')

# unit test utilities: you can ignore these functions
def is_approximately_equal(test,target,eps=1e-2):
    return np.mean(np.fabs(np.array(test) - np.array(target)))<eps

def assert_test_equality(test, target):
    assert is_approximately_equal(test, target), 'Expected:\n %s \nbut got:\n %s'%(target, test)

In [None]:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

## Question 1

Implement the procedures `train_svm`, `test_svm` and `score_svm` to train a large-margin linear binary classifier, predict with it, and compute the distance from the boundary surface.

Write the functions `estimator = train_svm(X_train, y_train, param)`, `test_svm(X_test, estimator)` and `score_svm(X_test, estimator)`. The function `train_svm` takes as input a data matrix `X_train`, a target vector `y_train` and a single value `param` that specifies the regularization constant `C`, and it outputs a fitted estimator object. The function `test_svm` takes as input a data matrix `X_test` and the fitted object `estimator`, and it outputs the predicted targets. The function `score_svm` takes as input a data matrix `X_test` and the fitted object `estimator`, and it outputs the distance from the boundary surface for each instance.
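
To illustrate the difference between predicting and scoring, here is a minimal sketch that fits scikit-learn's `LinearSVC` on a made-up toy dataset. The names `X_toy`, `y_toy` and `clf` are assumptions introduced for this example and are not part of the exercise. `predict` returns hard labels, while `decision_function` returns the signed value of w·x + b, which is proportional to the distance from the separating hyperplane. This is one possible reading of the question, not a reference solution.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Made-up, linearly separable binary problem (illustration only).
X_toy = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y_toy = np.array([1, 1, 0, 0])

clf = LinearSVC(C=1.0).fit(X_toy, y_toy)

labels = clf.predict(X_toy)            # hard class labels
scores = clf.decision_function(X_toy)  # signed scores w.x + b, one per instance

# Dividing by the norm of the weight vector turns the score
# into a signed geometric distance from the hyperplane.
distances = scores / np.linalg.norm(clf.coef_)
```

A positive score means the instance falls on the side of the positive class (`clf.classes_[1]`), and a larger absolute value means the instance lies further from the boundary.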

In [None]:
from sklearn.svm import LinearSVC

def train_svm(X_train, y_train, param):
    # YOUR CODE HERE
    raise NotImplementedError()
    
def test_svm(X_test, est):
    # YOUR CODE HERE
    raise NotImplementedError()
    
def score_svm(X_test, est):
    # YOUR CODE HERE
    raise NotImplementedError()

## OvR

The one-vs-rest strategy, also known as one-vs-all, consists of fitting one classifier per class. Each classifier is trained to separate its own class from all the other classes.

Advantages:
- it is computationally efficient, as only n_classes classifiers are needed
- it is interpretable, as each class is represented by one and only one classifier
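
The strategy described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the expected solution: the data `X_demo`, `y_demo` and the names `ovr_models` and `score_matrix` are made up for the example, and `LinearSVC` stands in for any binary learner that exposes a score.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Made-up three-class data (illustration only).
X_demo = np.array([[10, 10], [8, 10], [-5, 5.5], [-5.4, 5.5],
                   [-20, -20], [-15, -20]], dtype=float)
y_demo = np.array([0, 0, 1, 1, 2, 2])

classes = np.unique(y_demo)

# One binary problem per class: 1 for "this class", 0 for "the rest".
ovr_models = [
    LinearSVC(C=1.0).fit(X_demo, (y_demo == c).astype(int))
    for c in classes
]

# One column of scores per class, then pick the most confident class per row.
score_matrix = np.column_stack(
    [m.decision_function(X_demo) for m in ovr_models]
)
ovr_preds = classes[np.argmax(score_matrix, axis=1)]
```

Each class keeps exactly one model, so inspecting `ovr_models[k].coef_` shows which features push an instance towards class `classes[k]`.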

## Question 2

Implement the function `estimators = train_OvR(X_train, y_train, train_func, param)` and the function `preds = test_OvR(X_test, score_func, estimators)`. 

`train_OvR` takes as input the data matrix `X_train`, the target vector `y_train` and the training procedure `train_func` with its associated parameter `param`, and it outputs an object that represents the estimators fitted using the OvR strategy.

`test_OvR` takes as input the data matrix `X_test`, the scoring procedure `score_func` and the estimators object, and it returns the predicted class for each instance in the data matrix.

*Note that here we need to employ a scoring procedure and not a classification procedure, as we need to find the most confident estimate for each prediction.*
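
A small numeric sketch may help to see why scores are needed. The score values below are hypothetical and chosen by hand. With hard binary decisions, an instance can be claimed by several classifiers or by none, whereas taking the largest score always yields exactly one class.

```python
import numpy as np

# Hypothetical OvR scores: rows are instances, columns are classes 0, 1, 2.
scores = np.array([
    [ 1.2, -0.7, -2.0],   # only classifier 0 claims the instance
    [ 0.4,  0.9, -1.5],   # classifiers 0 and 1 both claim it
    [-0.3, -0.1, -0.8],   # no classifier claims it
])

# What a classification procedure would return: 1 if "mine", 0 otherwise.
hard = (scores > 0).astype(int)
claims = hard.sum(axis=1)          # array([1, 2, 0])

# Picking the most confident classifier resolves both ambiguous rows.
preds = np.argmax(scores, axis=1)  # array([0, 1, 1])
```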

In [None]:
def train_OvR(X_train, y_train, train_func, param):
    # YOUR CODE HERE
    raise NotImplementedError()
    
def test_OvR(X_test, score_func, estimators):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# This cell is reserved for the unit tests. You can ignore this cell.
### BEGIN TESTS 
X_train = np.array([[10, 10],[8, 10],[-5, 5.5],[-5.4, 5.5],[-20, -20],[-15, -20]])
y_train = np.array([0, 0, 1, 1, 2, 2])
X_test = np.array([[-19, -20], [9, 9], [-5, 5]])
y_test = np.array([2, 0, 1])
est = train_OvR(X_train, y_train, train_svm, param=1)
preds = test_OvR(X_test, score_svm, est)
test_cm = confusion_matrix(y_test, preds)
target_cm = np.eye(3)
assert_test_equality(test_cm, target_cm)
### END TESTS

## OvO

The one-vs-one strategy constructs one classifier per pair of classes. At prediction time, the class that receives the most votes is selected.

In the event of a tie (between two classes with an equal number of votes), the class with the highest aggregate classification confidence is selected, where the aggregate is obtained by summing the pair-wise classification confidence levels computed by the underlying binary classifiers.
In this exercise you can ignore this case and break the tie at random.
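
The voting step can be sketched as follows. The pairwise predictions in `pairwise_preds` are hypothetical values written by hand, and the random tie-break follows the simplification allowed in this exercise rather than the confidence-based rule.

```python
import numpy as np
from itertools import combinations

classes = [0, 1, 2]
pairs = list(combinations(classes, 2))  # [(0, 1), (0, 2), (1, 2)]

# Hypothetical hard predictions of each pairwise classifier for 3 instances.
pairwise_preds = {
    (0, 1): np.array([0, 1, 0]),
    (0, 2): np.array([0, 2, 2]),
    (1, 2): np.array([1, 1, 1]),
}

# Map each class label to its column in the vote table.
col = {c: j for j, c in enumerate(classes)}

votes = np.zeros((3, len(classes)), dtype=int)
for pair in pairs:
    for i, winner in enumerate(pairwise_preds[pair]):
        votes[i, col[int(winner)]] += 1

# Break ties at random among the classes with the maximum number of votes.
rng = np.random.default_rng(0)
preds = np.array([
    classes[rng.choice(np.flatnonzero(row == row.max()))] for row in votes
])
```

The third instance collects one vote for every class, so its prediction depends on the random draw; the first two are decided by the votes alone.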

Cons:
- complexity: it requires fitting n_classes * (n_classes - 1) / 2 classifiers

Pros:
- smaller individual problems: each individual learning problem only involves a small subset of the data whereas in OvR the complete dataset is used n_classes times
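
The trade-off above can be made concrete with a short count. The label vector `y_demo` is a made-up example with four balanced classes; it only serves to compare how many classifiers each strategy fits and how many instances each one sees.

```python
import numpy as np
from itertools import combinations

# Made-up labels: four classes with two instances each.
y_demo = np.array([0, 0, 1, 1, 2, 2, 3, 3])
classes = np.unique(y_demo)
n_classes = len(classes)

n_ovr = n_classes                          # one classifier per class
n_ovo = n_classes * (n_classes - 1) // 2   # one classifier per pair of classes

# OvR: every classifier is trained on the complete dataset.
ovr_train_size = len(y_demo)

# OvO: each classifier only sees the instances of its two classes.
pair_sizes = {
    (int(a), int(b)): int(np.sum((y_demo == a) | (y_demo == b)))
    for a, b in combinations(classes, 2)
}
```

Here OvR fits 4 classifiers on 8 instances each, while OvO fits 6 classifiers on 4 instances each.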

## Question 3

Implement the function `estimators = train_OvO(X_train, y_train, train_func, param)` and the function `preds = test_OvO(X_test, test_func, estimators)`. 

`train_OvO` takes as input the data matrix `X_train`, the target vector `y_train` and the training procedure `train_func` with its associated parameter `param`, and it outputs an object that represents the estimators fitted using the OvO strategy.

`test_OvO` takes as input the data matrix `X_test`, the classification procedure `test_func` and the estimators object, and it returns the predicted class for each instance in the data matrix.

In [None]:
def train_OvO(X_train, y_train, train_func, param):
    # YOUR CODE HERE
    raise NotImplementedError()

def test_OvO(X_test, test_func, estimators):
    # YOUR CODE HERE
    raise NotImplementedError()

In [None]:
# This cell is reserved for the unit tests. You can ignore this cell.
### BEGIN TESTS 
X_train = np.array([[10, 10],[8, 10],[-5, 5.5],[-5.4, 5.5],[-20, -20],[-15, -20]])
y_train = np.array([0, 0, 1, 1, 2, 2])
X_test = np.array([[11, 11], [-5, 5], [-15, -15]])
y_test = np.array([0, 1, 2])
est = train_OvO(X_train, y_train, train_svm, param=1)
preds = test_OvO(X_test, test_svm, est)
test_cm = confusion_matrix(y_test, preds)
target_cm = np.eye(3)
assert_test_equality(test_cm, target_cm)
### END TESTS

## Question 4

Apply the OvR and OvO techniques to the [digits dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html?highlight=digits#sklearn.datasets.load_digits).

Compare your results with the implementation offered by *scikit-learn*.

You just need to run the following cells.

In [None]:
X,y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.33)

In [None]:
est = train_OvR(X_train, y_train, train_svm, param=1)
preds = test_OvR(X_test, score_svm, est)
confusion_matrix(y_test, preds)

In [None]:
from sklearn.metrics import confusion_matrix
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

preds = OneVsRestClassifier(LinearSVC(C=1)).fit(X_train, y_train).predict(X_test)
confusion_matrix(y_test, preds)

---

In [None]:
est = train_OvO(X_train, y_train, train_svm, param=1)
preds = test_OvO(X_test, test_svm, est)
confusion_matrix(y_test, preds)

In [None]:
from sklearn.metrics import confusion_matrix
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC

preds = OneVsOneClassifier(LinearSVC(C=1)).fit(X_train, y_train).predict(X_test)
confusion_matrix(y_test, preds)

---