# Multiclass Classification

While binary classifiers distinguish between two classes (e.g. detecting whether a transaction is fraudulent, or classifying an email as spam or non-spam), multiclass classifiers distinguish between more than two classes. 

There are various ways to perform multiclass classification by combining binary classifiers. In this exercise, you will implement two such strategies: the _One-versus-all_ strategy and the _One-versus-one_ strategy.

- **One-versus-all (OvA)**: In this strategy, we train a single binary classifier per class, with the samples of that class as positive samples and all other samples as negatives. During inference, we get the prediction from each classifier and select the class with the highest score. This strategy is also called the one-versus-the-rest strategy. 

- **One-versus-one (OvO)**: In this strategy, we train a binary classifier for every pair of classes. If there are N classes in the problem, you need to train N * (N-1) / 2 classifiers. During inference, we have to run through all N * (N-1) / 2 classifiers and see which class wins the most votes. The main advantage of the OvO strategy is that each binary classifier only needs to be trained on the part of the training dataset that contains the two classes it needs to separate. 
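
To make the model counts above concrete, here is a minimal sketch that counts how many binary classifiers each strategy needs. The helper name `count_binary_classifiers` is made up for illustration and is not part of the assignment code.

```python
from itertools import combinations

def count_binary_classifiers(classes):
    """Return (n_ova, n_ovo): how many binary models each strategy trains."""
    n_ova = len(classes)                        # OvA: one model per class
    n_ovo = len(list(combinations(classes, 2))) # OvO: one model per unordered pair
    return n_ova, n_ovo

# With 12 classes, OvA needs 12 models and OvO needs 12 * 11 / 2 = 66 models
print(count_binary_classifiers(list(range(12))))
```

OvO trains more models, but each one sees only the samples of its two classes, so the individual training problems are smaller.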

In [1]:
# import packages
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

# make this notebook's output stable across runs
np.random.seed(0)

## Avila Dataset

In this lab assignment, we use the [Avila](https://archive.ics.uci.edu/ml/datasets/Avila) data set, which has been extracted from 800 images of the "Avila Bible", a giant Latin copy of the whole Bible produced during the XII century between Italy and Spain.  
The palaeographic analysis of the manuscript has identified 12 copyists. The pages written by each copyist are not equally numerous. 
Each pattern contains 10 features and corresponds to a group of 4 consecutive rows.

The prediction task consists of associating each pattern with one of the 12 copyists (labeled as: A, B, C, D, E, F, G, H, I, W, X, Y).
The data have been normalized using the Z-normalization method and divided into two data sets: a training set containing 10430 samples and a test set containing 10437 samples.


In [2]:
# Load train and test data from CSV files.
train = pd.read_csv("avila-tr.txt", header=None)
test = pd.read_csv("avila-ts.txt", header=None)

x_train = train.iloc[:,:-1]
y_train = train.iloc[:,-1]

x_test = test.iloc[:,:-1]
y_test = test.iloc[:,-1]

In [3]:
# Output the number of samples in each class of the train dataset
y_train.value_counts()

A    4286
F    1961
E    1095
I     831
X     522
H     519
G     446
D     352
Y     266
C     103
W      44
B       5
Name: 10, dtype: int64

In [4]:
y_test.value_counts()
# Train and test have nearly the same number of samples per class

A    4286
F    1962
E    1095
I     832
X     522
H     520
G     447
D     353
Y     267
C     103
W      45
B       5
Name: 10, dtype: int64

Question 1.1: Check for missing Data

In [5]:
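# Count missing values per feature column, then add any missing labels
res = x_train.isnull().sum() + x_test.isnull().sum()
res += y_train.isnull().sum() + y_test.isnull().sum()
# Show only the columns with at least one missing value
res[res > 0]
# There is no missing data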

Series([], dtype: int64)

Question 1.2: Apply Z-normalization to data

In [6]:
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train = scaler.fit_transform(x_train)
X_test = scaler.transform(x_test)

In [7]:
# Convert x_train numpy array type to pandas dataframe
X_train = pd.DataFrame(X_train, index=x_train.index, columns=x_train.columns)
X_test = pd.DataFrame(X_test, index=x_test.index, columns=x_test.columns)
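
For reference, Z-normalization can be sketched by hand with NumPy. This is an illustrative sketch on made-up toy arrays, not the notebook's code; the helper name `z_normalize` is hypothetical. The point it shows is that the test data is scaled with the mean and standard deviation of the training data, which is what `fit_transform` followed by `transform` does above.

```python
import numpy as np

def z_normalize(train, test):
    """Standardize both arrays with the mean and std of the training array."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)  # population std (ddof=0), as StandardScaler uses
    return (train - mean) / std, (test - mean) / std

# Toy data (made up for illustration): 3 training rows, 1 test row, 2 features
toy_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
toy_test = np.array([[2.0, 40.0]])

train_z, test_z = z_normalize(toy_train, toy_test)
print(train_z.mean(axis=0), train_z.std(axis=0))  # per-column mean ~0 and std ~1
print(test_z)  # test values are expressed in training-set standard deviations
```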

Question 2.1: Write a method that trains multiple logistic regression models to perform One vs All (OvA) classification. The method takes the training features and the target, and returns a list of models and their associated labels. 
Within the method:
- Determine the list of classes
- Create a place to store all the models
- For each class, train a model with the target variable set to 1 for that class and 0 for all other classes
- Return the list of trained models and their associated labels.
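
The relabeling step in the list above can be sketched on a toy target series. The helper name `ova_targets` and the toy labels are made up for illustration.

```python
import pandas as pd

def ova_targets(y):
    """Map each class label to a 0/1 target: 1 for that class, 0 for all others."""
    return {label: (y == label).astype(int) for label in sorted(y.unique())}

# Toy target column (made up for illustration)
y_toy = pd.Series(["A", "B", "A", "C"])
targets = ova_targets(y_toy)
print(targets["A"].tolist())  # rows 0 and 2 belong to class A
print(targets["C"].tolist())  # only row 3 belongs to class C
```

Each of these binary targets is what one OvA model would be trained on.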

In [12]:
def trainOvA(x, y):
    """
    Train the multiclass classifier using the OvA strategy.
    Returns one binary model per class and the class label each model detects.
    """
    labels = sorted(y.unique())
    n_labels = len(labels)
    print("number of classes is {}".format(n_labels))
    # Creating variables for storing models and their labels
    models = [None] * n_labels
    model_labels = [None] * n_labels

    for i, label in enumerate(labels):
        # Create model
        model = LogisticRegression()
        print("Train Logistic Regression model for class {}".format(label))

        # Update the target according to the OvA strategy:
        # rows of the current class get 1, all other rows get 0
        y_label = (y == label).astype(int)

        # Train the model and remember which class it detects
        models[i] = model.fit(x, y_label)
        model_labels[i] = label

    return models, model_labels

Question 2.2: Write a method that leverages the multiple models trained for OvA and outputs the majority class.

In [13]:
def predictOvA(models, model_labels, x):
    """
    Make predictions on multiclass problems using the OvA strategy.
    """
    if models is None:
        raise ValueError("The model has not been trained yet. Please call trainOvA() first.")

    predictions = []

    # Collect the hard 0/1 prediction of every binary model
    for model in models:
        predictions.append(model.predict(x))

    predictions = np.transpose(predictions)
    # Transposing converts the 12 x 10437 output into a 10437 x 12 matrix so that the max across classes can be queried for each test sample
    # Note: the column names come from the global y_test and assume the same sorted class order as in trainOvA
    predictions = pd.DataFrame(predictions, columns=sorted(y_test.unique()))
    # idxmax returns the first column holding the row maximum, so ties (including all-zero rows) go to the first class
    return predictions.idxmax(axis=1).values

I see a benefit here in calling predict_proba() instead of predict(). predict_proba() returns the probabilities of class 0 and class 1 for each binary model. Storing the class 1 probability (rather than the hard 0/1 prediction) in the predictions list and then applying idxmax on the data set selects the class whose model is most confident. This way we'd be choosing the max based on probabilities and not on 1/0 votes, which tie often.
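
A minimal sketch of that probability-based variant follows. It is not the code that produced the results in this notebook; the function name `predict_ova_proba` and the toy blobs are made up for illustration, and it assumes each binary model was trained on 0/1 targets so that column 1 of `predict_proba` is the positive class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def predict_ova_proba(models, class_labels, x):
    """Pick, for each sample, the class whose binary model gives the highest P(class 1)."""
    # Column k holds the positive-class probability from the k-th binary model
    scores = np.column_stack([m.predict_proba(x)[:, 1] for m in models])
    return np.asarray(class_labels)[scores.argmax(axis=1)]

# Toy data (made up): three well-separated 2-D blobs, 30 samples each
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [5, 5], [0, 5]])
x_demo = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in centers])
y_demo = np.repeat(["a", "b", "c"], 30)

# One binary model per class, trained on 1 (this class) vs 0 (all others)
demo_labels = sorted(set(y_demo))
demo_models = [LogisticRegression().fit(x_demo, (y_demo == c).astype(int))
               for c in demo_labels]

demo_pred = predict_ova_proba(demo_models, demo_labels, x_demo)
print((demo_pred == y_demo).mean())  # fraction of toy samples classified correctly
```

Because the scores are continuous, exact ties are unlikely, so the result no longer depends on column order the way hard 0/1 votes do.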

Question 2.3: Train OvA model on the Avila dataset

In [14]:
models, model_labels = trainOvA(X_train, y_train)

number of classes is 12
Train Logistic Regression model for class A
Train Logistic Regression model for class B
Train Logistic Regression model for class C
Train Logistic Regression model for class D
Train Logistic Regression model for class E
Train Logistic Regression model for class F
Train Logistic Regression model for class G
Train Logistic Regression model for class H
Train Logistic Regression model for class I
Train Logistic Regression model for class W
Train Logistic Regression model for class X
Train Logistic Regression model for class Y


Question 2.4: Predict and evaluate the results of your model

In [15]:
te_z_ova = predictOvA(models, model_labels, X_test)
te_z_ova

array(['A', 'A', 'A', ..., 'A', 'A', 'A'], dtype=object)

In [16]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix

ova_accuracy = accuracy_score(y_test, te_z_ova)
ova_confusion_matrix = confusion_matrix(y_test, te_z_ova)
print("Accuracy of OvA classifier is {}.".format(ova_accuracy))
print("Confusion matrix of OvA classifier: \n {}".format(ova_confusion_matrix))

Accuracy of OvA classifier is 0.49717351729424164.
Confusion matrix of OvA classifier: 
 [[4258    0    0    0   12    2    0    0   13    0    1    0]
 [   3    2    0    0    0    0    0    0    0    0    0    0]
 [ 101    0    0    0    1    0    0    0    1    0    0    0]
 [ 353    0    0    0    0    0    0    0    0    0    0    0]
 [1084    0    0    0    2    0    0    0    2    0    6    1]
 [1946    0    0    0    0    1    0    1   12    0    0    2]
 [ 446    0    0    0    0    0    0    1    0    0    0    0]
 [ 515    0    0    0    0    1    0    0    4    0    0    0]
 [ 185    0    0    0    8    1    0    1  626    0    4    7]
 [  36    0    0    0    8    0    0    0    0    0    1    0]
 [ 225    0    0    0    4    0    0    0   36    0  247   10]
 [ 155    0    0    0    1    0    0    0   52    0    6   53]]
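
In the matrix above, rows are true classes and columns are predicted classes, so a large first column means many samples were assigned to class A. One way to read such a matrix is per-class recall: the diagonal divided by the row sums. The sketch below uses a made-up 3-class matrix, and the helper name `per_class_recall` is hypothetical.

```python
import numpy as np

def per_class_recall(cm):
    """Recall per class: correct predictions (diagonal) over true samples (row sums)."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)

# Toy confusion matrix (made up): rows = true class, columns = predicted class
toy_cm = [[90, 5, 5],
          [40, 10, 0],
          [30, 0, 20]]
toy_recall = per_class_recall(toy_cm)
print(toy_recall)  # the first class dominates the predictions of the other two
```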


---

Question 3.1: Develop a method that trains a list of models based on the OvO strategy for multiclass classification using logistic regression. 

In [44]:
def trainOvO(x, y):
    """
    Train the multiclass classifier using the OvO strategy.
    Returns one binary model per pair of classes and the pair each model separates.
    """
    labels = sorted(y.unique())
    n_labels = len(labels)
    print("number of classes is {}".format(n_labels))

    models = []
    model_labels = []
    for i in range(n_labels):
        for j in range(i+1, n_labels):
            label_i = labels[i]
            label_j = labels[j]
            print("Train Logistic Regression model to distinguish {} and {}".format(label_i, label_j))

            # Update the training data according to the OvO strategy:
            # drop all rows that belong to neither label_i nor label_j
            drop_indices = y[ (y != label_i) & (y != label_j) ].index
            train_x = x.drop(drop_indices)
            train_y = y.drop(drop_indices)

            # Construct the logistic regression instance and train it on the two remaining classes
            lr = LogisticRegression(solver = 'liblinear')
            models.append(lr.fit(train_x, train_y.values.ravel()))
            # Remember which pair of classes this model separates
            model_labels.append((label_i, label_j))

    return models, model_labels

Question 3.2: Write a method that leverages the multiple models trained for OvO and outputs the majority class.

In [48]:
def predictOvO(models, labels, x):
    """
    Make predictions on multiclass problems using the OvO strategy.
    """
    if models is None:
        raise ValueError("The model has not been trained yet. Please call trainOvO() first.")

    predictions = []

    # Each pairwise model votes for one of its two classes
    for model in models:
        predictions.append(model.predict(x))

    predictions = np.transpose(predictions)
    # Transposing converts the 66 x 10437 output into a 10437 x 66 matrix so that the votes can be counted for each test sample
    predictions = pd.DataFrame(predictions)
    # The row-wise mode is the class with the most votes; the first column is used when several classes tie
    return predictions.mode(axis=1).iloc[:, 0].values
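
The voting step can be seen in isolation on a small table of votes. This is an illustrative sketch with made-up votes from three hypothetical pairwise models; the column names are placeholders.

```python
import pandas as pd

# Rows = samples, columns = pairwise models; each cell is the class that model voted for
toy_votes = pd.DataFrame(
    [["A", "A", "B"],   # A gets 2 votes, B gets 1
     ["B", "C", "C"],   # C gets 2 votes, B gets 1
     ["A", "C", "B"]],  # three-way tie
    columns=["A_vs_B", "A_vs_C", "B_vs_C"],
)

# The row-wise mode is the most frequent vote; column 0 is the single winner kept per row
toy_winners = toy_votes.mode(axis=1).iloc[:, 0]
print(toy_winners.tolist())
```

For the tied row, the winner is simply whichever label `mode` lists first, so ties are broken by label order rather than by model confidence.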

Question 3.3: Train OvO model on the Avila dataset

In [46]:
models, labels = trainOvO(X_train, y_train)

number of classes is 12
Train Logistic Regression model to distinguish A and B
Train Logistic Regression model to distinguish A and C
Train Logistic Regression model to distinguish A and D
Train Logistic Regression model to distinguish A and E
Train Logistic Regression model to distinguish A and F
Train Logistic Regression model to distinguish A and G
Train Logistic Regression model to distinguish A and H
Train Logistic Regression model to distinguish A and I
Train Logistic Regression model to distinguish A and W
Train Logistic Regression model to distinguish A and X
Train Logistic Regression model to distinguish A and Y
Train Logistic Regression model to distinguish B and C
Train Logistic Regression model to distinguish B and D
Train Logistic Regression model to distinguish B and E
Train Logistic Regression model to distinguish B and F
Train Logistic Regression model to distinguish B and G
Train Logistic Regression model to distinguish B and H
Train Logistic Regression model to distin

Question 3.4: Predict and evaluate the results of your model

In [51]:
te_z_ovo = predictOvO(models, labels, X_test)
te_z_ovo

10437

In [53]:
ovo_accuracy = accuracy_score(y_test, te_z_ovo)
ovo_confusion_matrix = confusion_matrix(y_test, te_z_ovo)
print("Accuracy of OvO classifier is {}.".format(ovo_accuracy))
print("Confusion matrix of OvO classifier: \n {}".format(ovo_confusion_matrix))

Accuracy of OvO classifier is 0.5720992622401073.
Confusion matrix of OvO classifier: 
 [[4043    0   24    4   41  120    2   21   27    0    2    2]
 [   0    5    0    0    0    0    0    0    0    0    0    0]
 [  64    0    0    0   15   13    0    7    4    0    0    0]
 [ 281    0    1    0   44   13    0   10    2    0    2    0]
 [ 637    0    0    0  327   49    0   32   10    0   37    3]
 [1698    0    0    2   14  207    0   20   16    0    1    4]
 [ 364    0    0    0   18   51    7    5    0    0    2    0]
 [ 280    0    0    0   38   45    0  145   10    0    1    1]
 [  48    0    2    0    1   13    1    8  723    4   15   17]
 [  33    0    0    0   10    0    0    0    0    0    2    0]
 [  47    0    0    0   53    5    0    1   33    7  343   33]
 [  29    0    0    0    7    3    0    3   30    0   24  171]]


Question 4.1: [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) within sklearn supports two approaches for solving multi-class problems: 'ovr' and 'multinomial'. Try out both approaches, and compare their performance against what you developed in questions 2 and 3.

In [56]:
clf = LogisticRegression(multi_class='ovr').fit(X_train, y_train)
y_ovr = clf.predict(X_test)

ovr_accuracy = accuracy_score(y_test, y_ovr)
ovr_confusion_matrix = confusion_matrix(y_test, y_ovr)


print("Accuracy of OvA classifier is {}.".format(ovr_accuracy))
print("Confusion matrix of OvA classifier: \n {}".format(ovr_confusion_matrix))

Accuracy of OvA classifier is 0.5308996838171889.
Confusion matrix of OvA classifier: 
 [[4159    0    0    0   31   62    0    8   18    0    6    2]
 [   0    5    0    0    0    0    0    0    0    0    0    0]
 [  80    0    0    0    9   11    0    0    3    0    0    0]
 [ 327    0    0    0    5    6    0   14    0    0    1    0]
 [ 917    0    0    0   76   54    0   10    9    0   28    1]
 [1882    0    0    0    4   55    0    4   13    0    0    4]
 [ 413    0    0    0    4   23    0    4    0    0    3    0]
 [ 366    0    0    0   31   66    0   40   15    0    1    1]
 [  64    0    0    0    2    5    0    3  733    0   15   10]
 [  34    0    0    0    9    0    0    0    0    0    2    0]
 [  95    0    0    0   15    3    0    2   34    0  347   26]
 [  56    0    0    0    1    1    0    0   63    0   20  126]]


In [60]:
# multi_class = 'multinomial'
clf = LogisticRegression(max_iter=1000, multi_class='multinomial').fit(X_train, y_train)
y_multinomial = clf.predict(X_test)

multinomial_accuracy = accuracy_score(y_test, y_multinomial)
multinomial_confusion_matrix = confusion_matrix(y_test, y_multinomial)


print("Accuracy of multinomial classifier is {}.".format(multinomial_accuracy))
print("Confusion matrix of multinomial classifier: \n {}".format(multinomial_confusion_matrix))

Accuracy of multinomial classifier is 0.5615598352016863.
Confusion matrix of multinomial classifier: 
 [[4041    0    1    3   65  127    0   18   21    0    7    3]
 [   0    5    0    0    0    0    0    0    0    0    0    0]
 [  66    0    0    0   13   17    0    4    3    0    0    0]
 [ 288    0    0    0   30   17    0   14    1    0    3    0]
 [ 660    0    0    0  270   60    0   32   11    0   56    6]
 [1730    0    2    1   15  178    0   14   16    0    1    5]
 [ 391    0    0    0   17   21    0   14    0    0    4    0]
 [ 305    0    0    0   57   47    0   96   13    0    2    0]
 [  34    0    0    1    4   17    0    7  726    0   22   21]
 [  30    0    0    0   11    0    0    0    0    2    2    0]
 [  49    0    0    0   24    3    0    7   23    6  372   38]
 [  29    0    0    0    3    3    0    0   32    0   29  171]]
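
As a further cross-check, scikit-learn also ships generic wrappers for both strategies in `sklearn.multiclass`. The sketch below runs them on made-up toy blobs rather than the Avila data, so the scores are not comparable to the results above. It may also be useful because newer scikit-learn releases deprecate the `multi_class` argument of `LogisticRegression` (worth checking against the installed version).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Toy data (made up): four 2-D blobs with 40 samples each, labeled 0..3
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [6, 0], [0, 6], [6, 6]])
x_toy = np.vstack([rng.normal(c, 1.0, size=(40, 2)) for c in centers])
y_toy = np.repeat(np.arange(4), 40)

# Wrap the same binary learner in each strategy
ovr = OneVsRestClassifier(LogisticRegression()).fit(x_toy, y_toy)
ovo = OneVsOneClassifier(LogisticRegression()).fit(x_toy, y_toy)

# estimators_ holds the fitted binary models: one per class vs one per pair
print(len(ovr.estimators_), len(ovo.estimators_))
print(ovr.score(x_toy, y_toy), ovo.score(x_toy, y_toy))
```

With 4 classes the wrappers hold 4 and 6 binary models respectively, matching the N and N * (N-1) / 2 counts from the introduction.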


Question 4: Create a new text cell in your Notebook: Complete a 50-100 word summary (or short description of your thinking in applying this week's learning to the solution) of your experience in this assignment. Include: What was your incoming experience with this model, if any? what steps you took, what obstacles you encountered. how you link this exercise to real-world, machine learning problem-solving. (What steps were missing? What else do you need to learn?) This summary allows your instructor to know how you are doing and allot points for your effort in thinking and planning, and making connections to real-world work.

- **Incoming experience**: No incoming experience apart from previous assignments.
- **Steps taken**: This week's lesson was about doing multiclass classification using OvO and OvA. I implemented both methods on the data set to get a deeper understanding of the concepts.
- **Obstacles**: This week's obstacle was understanding the starter code, in particular why we need the model labels in the predict method (we don't seem to need them?), and going through the different errors that were thrown.
- **Link to real world**: The exercise helped me understand how both strategies work, and how they perform compared to other multiclass classification models.
- **Steps missing (with just this week's learning)**: I don't think there was anything missing.