# Multiclass Classification

While binary classifiers distinguish between two classes (e.g. detecting whether a transaction is fraudulent, or classifying an email as spam or non-spam), multiclass classifiers distinguish between more than two classes.

There are various ways to perform multiclass classification using any binary classifier. In this exercise, you will implement two such strategies for multiclass classification: the _One-versus-all_ strategy and the _One-versus-one_ strategy.

- **One-versus-all (OvA)**: In this strategy, we train a single binary classifier per class, with the samples of that class as positive samples and all other samples as negatives. During inference, we get the prediction from each classifier and select the class with the highest score. This strategy is also called the one-versus-the-rest (OvR) strategy.

- **One-versus-one (OvO)**: In this strategy, we train a binary classifier for every pair of classes. If there are N classes in the problem, you need to train N * (N-1) / 2 classifiers. During inference, we run all N * (N-1) / 2 classifiers and see which class wins the most votes. The main advantage of the OvO strategy is that each binary classifier only needs to be trained on the part of the training dataset that belongs to the two classes it separates.
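As a quick check on the model counts above, here is a small sketch using the twelve Avila labels listed later in this notebook. `itertools.combinations` enumerates the unordered class pairs that OvO needs, and the count agrees with N * (N-1) / 2.

```python
from itertools import combinations

# The twelve copyist labels of the Avila dataset.
labels = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "W", "X", "Y"]
n = len(labels)

# OvA: one binary classifier per class.
ova_models_needed = n

# OvO: one binary classifier per unordered pair of classes.
pairs = list(combinations(labels, 2))
ovo_models_needed = len(pairs)

print(ova_models_needed)  # 12
print(ovo_models_needed)  # 66
print(pairs[:3])          # [('A', 'B'), ('A', 'C'), ('A', 'D')]
assert ovo_models_needed == n * (n - 1) // 2
```

For 12 classes, OvO trains 66 smaller models where OvA trains 12 models on the full data.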

In [347]:
# import packages
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import LogisticRegression

# make this notebook's output stable across runs
np.random.seed(0)

## Avila Dataset

In this lab assignment, we use the [Avila](https://archive.ics.uci.edu/ml/datasets/Avila) data set, which has been extracted from 800 images of the "Avila Bible", a giant Latin copy of the whole Bible produced during the XII century between Italy and Spain.  
The palaeographic analysis of the manuscript has identified the presence of 12 copyists. The pages written by each copyist are not equally numerous.
Each pattern contains 10 features and corresponds to a group of 4 consecutive rows.

The prediction task consists of associating each pattern with one of the 12 copyists (labeled A, B, C, D, E, F, G, H, I, W, X, Y).
The data have been normalized using the Z-normalization method and divided into two data sets: a training set containing 10430 samples and a test set containing 10437 samples.


In [348]:
# Load train and test data from CSV files (the raw files have no header row).
train = pd.read_csv("avila-tr.txt", header=None)
test = pd.read_csv("avila-ts.txt", header=None)

# Name the columns (ten features plus the label) so they are easier to work with.
feature_cols = ["F1","F2","F3","F4","F5","F6","F7","F8","F9","F10"]
train.columns = feature_cols + ["Class"]
test.columns = feature_cols + ["Class"]

# Split into features (a DataFrame) and target (a Series named "Class").
x_train = train[feature_cols]
y_train = train["Class"]

x_test = test[feature_cols]
y_test = test["Class"]


In [349]:
# Output the number of samples (patterns) in each class in the train and test datasets.

count_train_dict = train.iloc[:,-1].value_counts().to_dict()
count_test_dict = test.iloc[:,-1].value_counts().to_dict()

print(count_train_dict)
print(count_test_dict)

# The class distributions of the two sets are nearly identical, and both are heavily imbalanced.


{'A': 4286, 'F': 1961, 'E': 1095, 'I': 831, 'X': 522, 'H': 519, 'G': 446, 'D': 352, 'Y': 266, 'C': 103, 'W': 44, 'B': 5}
{'A': 4286, 'F': 1962, 'E': 1095, 'I': 832, 'X': 522, 'H': 520, 'G': 447, 'D': 353, 'Y': 267, 'C': 103, 'W': 45, 'B': 5}


Question 1.1: Check for missing data

In [350]:
# Check for missing values
train.isnull().sum()

F1       0
F2       0
F3       0
F4       0
F5       0
F6       0
F7       0
F8       0
F9       0
F10      0
Class    0
dtype: int64

In [351]:
test.isnull().sum()

F1       0
F2       0
F3       0
F4       0
F5       0
F6       0
F7       0
F8       0
F9       0
F10      0
Class    0
dtype: int64

Question 1.2: Apply Z-normalization to data

In [352]:
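from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()

scaler.fit(x_train) # Fit on the training set only, so no test-set statistics leak into preprocessing.

print(scaler.mean_) # Per-feature means learned from the training set.
print(scaler.transform(x_train)) # Standardize the training features (not stored; later cells use the features as loaded).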



[-0.00085188  0.00339569  0.00518116  0.00261584 -0.00636515 -0.00888578
  0.00235002 -0.01025932 -0.00869081 -0.00067845]
[[ 0.26467533 -0.17694062  0.31822302 ...  0.94349467  0.25955453
   0.16117096]
 [ 0.13003816  0.9080088  -3.24039403 ...  0.64880887  0.29069842
   0.51996752]
 [-0.11475734  0.06963832  0.06378071 ... -0.8811636  -0.11417814
   0.58780249]
 ...
 [ 0.22795656 -0.00433484  0.16770741 ... -1.08214048 -0.58134347
   0.58498544]
 [-0.2983542   0.36553409  0.28597032 ...  0.5148825   0.6332862
   0.72480794]
 [-0.10251741 -1.08928531  0.38631372 ... -1.33210553  1.00701886
  -0.55433128]]
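Below is a minimal sketch of the fit-on-train, transform-both pattern on synthetic data. The arrays `x_tr` and `x_te` are hypothetical stand-ins for the Avila features. The sketch also checks that `StandardScaler` computes the Z-normalization z = (x - mean) / std, using the training mean and the population standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical stand-ins for the train/test feature matrices (10 features each).
x_tr = rng.normal(loc=5.0, scale=2.0, size=(200, 10))
x_te = rng.normal(loc=5.0, scale=2.0, size=(50, 10))

scaler = StandardScaler().fit(x_tr)  # learn mean/std from the training data only
x_tr_z = scaler.transform(x_tr)
x_te_z = scaler.transform(x_te)      # reuse the training statistics on the test data

# Z-normalization by hand with the training statistics (population std, ddof=0).
manual = (x_te - x_tr.mean(axis=0)) / x_tr.std(axis=0)
print(np.allclose(x_te_z, manual))   # True
```

Only the training set is guaranteed to end up with mean 0 and standard deviation 1. The test set is shifted and scaled by the same amounts, which is what a deployed model would see.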


Question 2.1: Write a method to train multiple logistic regression models performing One vs All (OvA) classification. The method takes the training features and target as input and returns a list of models and their associated labels.
Within the method:
- Determine the list of classes
- Create a place to store all the models
- For each class, train a model with the target variable set to 1 and 0 for all other classes
- Return the list of models trained and associated labels.
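The steps above can be sketched on a small problem. This is a minimal sketch, not the notebook's `trainOvA`. It uses scikit-learn's bundled iris data as a stand-in for Avila, and `train_ova` is a hypothetical helper name. Like the notebook, it returns a dict keyed by class label.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


def train_ova(x, y):
    """Fit one binary LogisticRegression per class: 1 for that class, 0 for all others."""
    models = {}
    for label in np.unique(y):              # sorted array of the classes
        target = (y == label).astype(int)   # OvA relabelling
        models[label] = LogisticRegression(max_iter=1000).fit(x, target)
    return models


x, y = load_iris(return_X_y=True)
ova_models = train_ova(x, y)
print(len(ova_models))  # 3: one binary model per iris class
```

Every model sees all training rows. Only the target changes from one class to the next.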

In [353]:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, precision_score, precision_recall_curve, classification_report
from sklearn.multiclass import OneVsRestClassifier
from sklearn.multiclass import OneVsOneClassifier
from sklearn import preprocessing
from sklearn import datasets

In [354]:
def trainOvA(x, y, multiclass='auto'):
    """
    Train the multiclass classifier using the OvA strategy.
    x = training features (e.g. x_train)
    y = training labels (e.g. y_train, the Class column)
    Returns a dict that maps each class label to its fitted binary model.
    """
    labels = sorted(y.unique()) # list of target classes sorted by ASCII value
    n_labels = len(labels)
    print("number of classes is {}".format(n_labels))

    # Store the models in a dict keyed by class label.
    models = {}
    for label in labels:
        print("Train Logistic Regression model for class {}".format(label))
        # OvA relabelling: 1 for the current class, 0 for all other classes.
        # Uses the x and y arguments, so the global `train` DataFrame is left unchanged.
        target = (y == label).astype(int)

        # Define and train a binary model on the training features.
        model = LogisticRegression(multi_class=multiclass, random_state=42, max_iter=10000)
        models[label] = model.fit(x, target)
    return models # a dictionary with the label for each model

Question 2.2: Write a method that leverages the multiple models trained for OvA and outputs the majority class.

In [355]:
def predictOvA(models, x):
    """
    Make predictions on multiclass problems using the OvA strategy.
    models = dict mapping each class label to its model (output of trainOvA)
    x = features to predict on (e.g. x_test)
    """
    if models is None:
        raise ValueError("The model has not been trained yet. Please call trainOvA() first.")

    # Combine the models with hard (majority) voting.
    # Note: VotingClassifier.fit clones each estimator and refits the clones on the
    # full 12-class training labels, so the binary OvA models trained above are not
    # used as-is. All clones share the same settings, so the vote reduces to a
    # single multiclass logistic regression.
    model_list = list(models.items())

    ensemble = VotingClassifier(estimators=model_list, voting='hard')
    ensemble.fit(x_train, y_train)
    predictions = ensemble.predict(x)
    return predictions
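For comparison, here is a sketch of OvA prediction that uses the trained binary models directly. It follows the rule stated at the top of the notebook: pick the class whose classifier gives the highest score. The score is assumed to be each model's `predict_proba(x)[:, 1]`, and `decision_function` would work equally well. The helpers `train_ova` and `predict_ova` and the iris data are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


def train_ova(x, y):
    """One binary model per class (1 = that class, 0 = every other class)."""
    return {label: LogisticRegression(max_iter=1000).fit(x, (y == label).astype(int))
            for label in np.unique(y)}


def predict_ova(models, x):
    """For each sample, pick the class whose binary model gives the highest P(class)."""
    labels = list(models)
    # Column k holds P(y == labels[k]) from the k-th binary model (its classes_ are [0, 1]).
    scores = np.column_stack([models[label].predict_proba(x)[:, 1] for label in labels])
    return np.asarray(labels)[scores.argmax(axis=1)]


x, y = load_iris(return_X_y=True)
models = train_ova(x, y)
y_pred = predict_ova(models, x)
print((y_pred == y).mean())  # training accuracy
```

Nothing is refit at prediction time, so the per-class models trained by the OvA step are the ones that actually decide.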

Question 2.3: Train OvA model on the Avila dataset

In [356]:
models_dict = trainOvA(x_train, y_train)


number of classes is 12
Train Logistic Regression model for class A
Train Logistic Regression model for class B
Train Logistic Regression model for class C
Train Logistic Regression model for class D
Train Logistic Regression model for class E
Train Logistic Regression model for class F
Train Logistic Regression model for class G
Train Logistic Regression model for class H
Train Logistic Regression model for class I
Train Logistic Regression model for class W
Train Logistic Regression model for class X
Train Logistic Regression model for class Y


Question 2.4: Predict and evaluate the results of your model

# Note on Concept

I think we are making each prediction separately. Wouldn't a better approach be to reduce the error rate by using an ensemble of models that performs majority voting? Majority (hard) voting, however, does not take the confidence or probability of each prediction into account.
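The concern about confidence can be made concrete with made-up numbers. The probabilities below are hypothetical outputs of three OvA models for two samples. Hard 0/1 decisions can produce ties or no winner at all. Taking the class with the highest score always yields exactly one class.

```python
import numpy as np

labels = np.array(["A", "F", "E"])
# Hypothetical positive-class probabilities: row = sample, column = OvA model for labels[k].
proba = np.array([
    [0.55, 0.90, 0.10],  # sample 0: two models say "yes"; F is far more confident
    [0.20, 0.30, 0.40],  # sample 1: no model says "yes"
])

# Hard decisions at a 0.5 threshold discard the confidence information.
hard = (proba >= 0.5).astype(int)
print(hard)  # [[1 1 0]
             #  [0 0 0]]

# Using the scores themselves resolves both cases.
print(labels[proba.argmax(axis=1)])  # ['F' 'E']
```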

In [357]:
te_z_ova = predictOvA(models_dict, x_test)
print(te_z_ova)

['W' 'X' 'I' ... 'A' 'A' 'A']


In [358]:
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix

ova_accuracy = accuracy_score(y_test, te_z_ova)
ova_confusion_matrix = confusion_matrix(y_test, te_z_ova)

print("Accuracy of OvA classifier is {}.".format(ova_accuracy))
print("Confusion matrix of OvA classifier: \n {}".format(ova_confusion_matrix))

Accuracy of OvA classifier is 0.5613682092555332.
Confusion matrix of OvA classifier: 
 [[4040    0    1    3   63  127    0   18   23    0    8    3]
 [   0    5    0    0    0    0    0    0    0    0    0    0]
 [  66    0    0    0   13   17    0    4    3    0    0    0]
 [ 288    0    0    0   31   16    0   15    1    0    2    0]
 [ 660    0    0    0  271   58    0   30   14    0   57    5]
 [1729    0    2    1   15  179    0   14   16    0    1    5]
 [ 391    0    0    0   18   21    0   14    0    0    3    0]
 [ 305    0    0    0   58   47    0   96   12    0    1    1]
 [  34    0    0    1    4   16    0    8  725    0   23   21]
 [  30    0    0    0   11    0    0    0    0    2    2    0]
 [  48    0    0    0   26    3    0    7   23    7  370   38]
 [  28    0    0    0    3    3    0    0   33    0   29  171]]
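The class imbalance shows up clearly when the OvA confusion matrix above is broken down per class. This sketch copies the printed matrix into NumPy and computes the overall accuracy and the per-class recall. Rows are true classes and columns are predicted classes, both in sorted label order.

```python
import numpy as np

labels = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "W", "X", "Y"]
# OvA confusion matrix copied from the output above (rows = true, columns = predicted).
cm = np.array([
    [4040, 0, 1, 3, 63, 127, 0, 18, 23, 0, 8, 3],
    [0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [66, 0, 0, 0, 13, 17, 0, 4, 3, 0, 0, 0],
    [288, 0, 0, 0, 31, 16, 0, 15, 1, 0, 2, 0],
    [660, 0, 0, 0, 271, 58, 0, 30, 14, 0, 57, 5],
    [1729, 0, 2, 1, 15, 179, 0, 14, 16, 0, 1, 5],
    [391, 0, 0, 0, 18, 21, 0, 14, 0, 0, 3, 0],
    [305, 0, 0, 0, 58, 47, 0, 96, 12, 0, 1, 1],
    [34, 0, 0, 1, 4, 16, 0, 8, 725, 0, 23, 21],
    [30, 0, 0, 0, 11, 0, 0, 0, 0, 2, 2, 0],
    [48, 0, 0, 0, 26, 3, 0, 7, 23, 7, 370, 38],
    [28, 0, 0, 0, 3, 3, 0, 0, 33, 0, 29, 171],
])

accuracy = np.trace(cm) / cm.sum()
recall = np.diag(cm) / cm.sum(axis=1)  # fraction of each true class predicted correctly

print(round(accuracy, 4))  # 0.5614
for label, r in zip(labels, recall):
    print("{}: {:.3f}".format(label, r))  # e.g. G: 0.000, no G sample is predicted correctly
```

The row sums reproduce the test-set class counts printed earlier. This confirms that most errors come from minority classes being predicted as A.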


---

Question 3.1: Develop a method that trains a list of models based on the OvO
strategy for multiclass classification using logistic regression.

In [359]:
# Why not just use this?
#from sklearn.multiclass import OneVsOneClassifier

def trainOvO(x, y):
    """
    Train the multiclass classifier using OvO strategy.
    x = x_train
    y = y_train (the Class, useful for just the label count)
    """
    labels = sorted(y.unique())
    n_labels = len(labels)
    n_models = int(len(labels) * (len(labels) - 1) / 2)
    print("number of classes is {}".format(n_labels))


    models = [] # use dictionary instead
    '''
    model_idx = 0
    for i in range(n_labels):
        for j in range(i+1, n_labels):
          label_i = labels[i]
          label_j = labels[j]
          print("Train Logistic Regression model to distinguish {} and {}".format(label_i, label_j))

          # update the label according to OvA strategy
          # the labels need to be floats to work
          df = x
          #print(df.head())
          df = df.drop(columns='Target')
          train_x = df.loc[df['Class'] == label_i]
          #print(train_x.head())
          train_x['Class'] = train_x['Class'].apply(lambda x: str(i) if x == label_i else '0')
          train_x['Class'] = pd.to_numeric(train_x['Class'])
          print(train_x.head())

          train_y = df.loc[df['Class'] == label_j] # is this the one we zero out?
          #print(train_y.head())
          train_y['Class'] = train_y['Class'].apply(lambda x: str(j) if x == label_j else '0')
          train_y['Class'] = pd.to_numeric(train_y['Class'])
          print(train_y.head())


          print(train_y.shape)
          print(train_x.shape)
          # construct the logistic regression instance
          lr = LogisticRegression(solver = 'liblinear')

          #OvO_clf = OneVsOneClassifier(LogisticRegression())
          #OvO_clf.fit(train_x, train_y)
          #don't forget to fit
          #train_y = np.reshape(train_y, (-1,1)) # It complains about this, we'd need to truncate to fit the same number of samples on each class based on the minimum # of rows encountered. This is tedious.
          lr.fit(train_x, train_y)

          print("Adding model...{}+{}".format(label_i, label_j))
          models['{}+{}'.format(label_i, label_j)] = lr
          '''
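    # Note: OneVsRestClassifier fits one binary model per class (OvR), not one per
    # pair of classes; sklearn.multiclass.OneVsOneClassifier is the OvO wrapper.
    ovr = OneVsRestClassifier(LogisticRegression())
    ovr.fit(x_train, y_train) # fits on the global training split, not on the x argument
    models.append(ovr)
    return models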

One point of confusion here is that the prompt gives no indication of which class to encode as 1 and which as 0. Do we convert the second (compared) class to 0?
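One way to resolve this, sketched under assumptions: for each pair (label_i, label_j), keep only the rows of those two classes as the features, and build the target from the same rows. Encode label_i as 1 and label_j as 0. Which class gets 1 is arbitrary, as long as prediction uses the same convention. Because features and target come from the same filtered rows, their lengths always match, so the classes never need to be truncated to equal sizes. `train_ovo` and the iris data are illustrative. With a pandas DataFrame, the same boolean mask selects rows via `x.loc[mask]`.

```python
from itertools import combinations

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


def train_ovo(x, y):
    """Fit one binary model per pair of classes, using only the rows of those two classes."""
    models = {}
    for label_i, label_j in combinations(np.unique(y), 2):
        mask = (y == label_i) | (y == label_j)      # keep rows of the two classes only
        x_pair = x[mask]
        y_pair = (y[mask] == label_i).astype(int)   # 1 -> label_i, 0 -> label_j
        models[(label_i, label_j)] = LogisticRegression(max_iter=1000).fit(x_pair, y_pair)
    return models


x, y = load_iris(return_X_y=True)
ovo_models = train_ovo(x, y)
print(len(ovo_models))  # 3 pairs for 3 classes
```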

Question 3.2: Write a method that leverages the multiple models trained for OvO and outputs the majority class.

In [360]:
def predictOvO(models, x):
    """
    Make predictions with the model list returned by trainOvO.
    models = list returned by trainOvO (a single fitted OneVsRestClassifier)
    x = features to predict on (e.g. x_test)
    """
    if models is None:
        raise ValueError("The model has not been trained yet. Please call trainOvO() first.")

    # Only one model is stored, so the row-wise mode simply returns its predictions.
    predictions = pd.DataFrame(models[0].predict(x))

    return predictions.mode(axis=1).iloc[:, 0].values
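A sketch of OvO prediction by voting. Each pairwise model is trained only on the rows of its two classes, with 1 encoding the first class of the pair and 0 the second. At prediction time, each model votes for one of its two classes and the class with the most votes wins. Ties are broken by `Counter.most_common`, which keeps classes with equal counts in first-seen order. That rule is arbitrary but deterministic. The helper names and the iris data are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


def train_ovo(x, y):
    """One binary model per pair (a, b), fit only on rows of a and b; 1 encodes a, 0 encodes b."""
    models = {}
    for a, b in combinations(np.unique(y), 2):
        mask = (y == a) | (y == b)
        models[(a, b)] = LogisticRegression(max_iter=1000).fit(x[mask], (y[mask] == a).astype(int))
    return models


def predict_ovo(models, x):
    """Each pairwise model votes for one of its two classes; the most-voted class wins."""
    votes = [Counter() for _ in range(len(x))]
    for (a, b), model in models.items():
        for k, p in enumerate(model.predict(x)):
            votes[k][a if p == 1 else b] += 1
    # most_common keeps tied classes in first-seen order, so ties resolve deterministically.
    return np.array([v.most_common(1)[0][0] for v in votes])


x, y = load_iris(return_X_y=True)
models = train_ovo(x, y)
y_pred = predict_ovo(models, x)
print((y_pred == y).mean())  # training accuracy
```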

Question 3.3: Train OvO model on the Avila dataset

In [361]:
models_ovo_dict = trainOvO(x_train, y_train)

number of classes is 12


Question 3.4: Predict and evaluate the results of your model

In [362]:
te_z_ovo = predictOvO(models_ovo_dict, x_test)

In [363]:

ovo_accuracy = accuracy_score(y_test, te_z_ovo)
ovo_confusion_matrix = confusion_matrix(y_test, te_z_ovo)


print("Accuracy of OvO classifier is {}.".format(ovo_accuracy))
print("Confusion matrix of OvO classifier: \n {}".format(ovo_confusion_matrix))

Accuracy of OvO classifier is 0.5309954967902654.
Confusion matrix of OvO classifier: 
 [[4159    0    0    0   31   62    0    8   18    0    6    2]
 [   0    5    0    0    0    0    0    0    0    0    0    0]
 [  80    0    0    0    9   11    0    0    3    0    0    0]
 [ 327    0    0    0    5    6    0   14    0    0    1    0]
 [ 917    0    0    0   75   54    0   10   10    0   28    1]
 [1881    0    0    0    4   55    0    4   14    0    0    4]
 [ 413    0    0    0    4   23    0    4    0    0    3    0]
 [ 366    0    0    0   31   66    0   40   15    0    1    1]
 [  64    0    0    0    2    5    0    3  733    0   15   10]
 [  34    0    0    0    9    0    0    0    0    0    2    0]
 [  93    0    0    0   15    3    0    2   34    0  349   26]
 [  56    0    0    0    1    1    0    0   63    0   20  126]]


Question 4.1: [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) within sklearn supports two approaches for solving multi-class problems: 'ovr' and 'multinomial'. Try out both approaches, and evaluate and compare their performance against what you developed in questions 2 and 3.
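For reference, here is a sketch comparing scikit-learn's built-in options on a synthetic 4-class problem, which stands in for the Avila data. It uses `OneVsRestClassifier`, `OneVsOneClassifier`, and a plain `LogisticRegression`. With the default lbfgs solver, the plain model fits a multiclass target as a single multinomial (softmax) model. The sketch deliberately avoids the `multi_class` argument, which newer scikit-learn releases have deprecated.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic 4-class problem as a stand-in for the Avila data.
x, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           n_classes=4, random_state=0)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.3, random_state=0)

classifiers = {
    "OvR": OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    "OvO": OneVsOneClassifier(LogisticRegression(max_iter=1000)),
    # Default lbfgs solver: a multiclass target is fit as one multinomial model.
    "multinomial": LogisticRegression(max_iter=1000),
}

for name, clf in classifiers.items():
    clf.fit(x_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(x_te)))

print(len(classifiers["OvR"].estimators_))  # 4 binary models, one per class
print(len(classifiers["OvO"].estimators_))  # 6 binary models, 4 * 3 / 2
```

The `estimators_` counts make the structural difference visible: OvR grows linearly with the number of classes, OvO quadratically.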

In [364]:
# OvA models with multi_class='ovr'
clf = trainOvA(x_train, y_train, multiclass='ovr')
y_ovr = predictOvA(clf, x_test)


ovr_accuracy = accuracy_score(y_test, y_ovr)
ovr_confusion_matrix = confusion_matrix(y_test, y_ovr)


print("Accuracy of OvR classifier is {}.".format(ovr_accuracy))
print("Confusion matrix of OvR classifier: \n {}".format(ovr_confusion_matrix))

number of classes is 12
Train Logistic Regression model for class A
Train Logistic Regression model for class B
Train Logistic Regression model for class C
Train Logistic Regression model for class D
Train Logistic Regression model for class E
Train Logistic Regression model for class F
Train Logistic Regression model for class G
Train Logistic Regression model for class H
Train Logistic Regression model for class I
Train Logistic Regression model for class W
Train Logistic Regression model for class X
Train Logistic Regression model for class Y
Accuracy of OvR classifier is 0.5309954967902654.
Confusion matrix of OvR classifier: 
 [[4159    0    0    0   31   62    0    8   18    0    6    2]
 [   0    5    0    0    0    0    0    0    0    0    0    0]
 [  80    0    0    0    9   11    0    0    3    0    0    0]
 [ 327    0    0    0    5    6    0   14    0    0    1    0]
 [ 917    0    0    0   75   54    0   10   10    0   28    1]
 [1881    0    0    0    4   55    0    4   1

In [365]:
# OvA models with multi_class='multinomial'
clf = trainOvA(x_train, y_train, multiclass='multinomial')
y_multinomial = predictOvA(clf, x_test)

multinomial_accuracy = accuracy_score(y_test, y_multinomial)
multinomial_confusion_matrix = confusion_matrix(y_test, y_multinomial)


print("Accuracy of multinomial classifier is {}.".format(multinomial_accuracy))
print("Confusion matrix of multinomial classifier: \n {}".format(multinomial_confusion_matrix))

number of classes is 12
Train Logistic Regression model for class A
Train Logistic Regression model for class B
Train Logistic Regression model for class C
Train Logistic Regression model for class D
Train Logistic Regression model for class E
Train Logistic Regression model for class F
Train Logistic Regression model for class G
Train Logistic Regression model for class H
Train Logistic Regression model for class I
Train Logistic Regression model for class W
Train Logistic Regression model for class X
Train Logistic Regression model for class Y
Accuracy of multinomial classifier is 0.5613682092555332.
Confusion matrix of multinomial classifier: 
 [[4040    0    1    3   63  127    0   18   23    0    8    3]
 [   0    5    0    0    0    0    0    0    0    0    0    0]
 [  66    0    0    0   13   17    0    4    3    0    0    0]
 [ 288    0    0    0   31   16    0   15    1    0    2    0]
 [ 660    0    0    0  271   58    0   30   14    0   57    5]
 [1729    0    2    1   15  179    0   14   1

# Analysis

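Using OvA/OvR is much easier to implement than OvO. The results fall into two groups. The OvA models with 'auto' and with 'multinomial' reach the same accuracy (0.5614). The 'ovr' models reach 0.5310, exactly the accuracy of the OvO cell. These groupings follow from the code rather than from the strategies themselves. First, VotingClassifier in predictOvA refits clones of the models on the full 12-class labels, and with the default lbfgs solver 'auto' then behaves like 'multinomial'. Second, trainOvO wraps LogisticRegression in OneVsRestClassifier rather than OneVsOneClassifier.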

Question 4: Create a new text cell in your Notebook and write a 50-100 word summary of your experience in this assignment (or a short description of your thinking in applying this week's learning to the solution). Include: What was your incoming experience with this model, if any? What steps did you take, and what obstacles did you encounter? How do you link this exercise to real-world machine learning problem-solving? (What steps were missing? What else do you need to learn?) This summary allows your instructor to know how you are doing and to allot points for your effort in thinking and planning, and for making connections to real-world work.

# Q4

1. I had no experience with these models before this class. OvA/OvR seems like the way to go as a default option.
2. I took some pretty standard steps and went a bit further by adding column names to the data for easier parsing. This may not have been expected, but it made working on the lab much easier.
3. I ran into some serious problems with the OvO methodology. After a lot of parsing of the different class combinations and converting the classes to floats, I hit a wall. The number of columns matched between the compared classes, but the number of rows did not, which makes sense given that the classes have different counts in the original dataset. I could have truncated both to the minimum number of rows, but I wasn't sure what the best way to handle this was, since we would keep reusing the same few rows in the next iteration unless we removed them from the temporary dataset. It just seemed like more work than it was worth given that sklearn.multiclass already provides ready-made wrappers, so I fell back on one of them (the final trainOvO code uses OneVsRestClassifier).