# Module 6: Multi-class Support Vector Machine

In this session, we fit a multi-class linear SVM on the **red wine** quality dataset
using the typical train/validate workflow.

On its own, an SVM supports only binary classification.
Multi-class support is handled with either a **one-vs-one** or a **one-vs-rest** scheme.
See the [scikit-learn multiclass documentation](http://scikit-learn.org/stable/modules/classes.html#module-sklearn.multiclass)
to read more about the different multi-class schemes.
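
As a rough illustration of how the two schemes differ, the sketch below wraps scikit-learn's `LinearSVC` in `OneVsOneClassifier` and `OneVsRestClassifier` and counts the binary SVMs each one trains. The synthetic 4-class data and the names `X_demo`, `y_demo`, `ovo`, and `ovr` are assumptions made for this example only; the rest of this notebook builds the one-vs-one scheme by hand instead.

```python
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

# Synthetic 4-class problem (illustrative data, not the wine dataset).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=6, n_informative=4, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0)

# Each wrapper trains several binary LinearSVC models behind the scenes.
ovo = OneVsOneClassifier(LinearSVC(random_state=0)).fit(X_demo, y_demo)
ovr = OneVsRestClassifier(LinearSVC(random_state=0)).fit(X_demo, y_demo)

n_ovo = len(ovo.estimators_)  # one SVM per pair of classes: 4 * 3 / 2
n_ovr = len(ovr.estimators_)  # one SVM per class
print('one-vs-one binary SVMs:', n_ovo)
print('one-vs-rest binary SVMs:', n_ovr)
```

One-vs-one trains more models, but each sees only the samples of two classes; one-vs-rest trains fewer models, each on the full training set.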

Here we use the binary linear SVM from the TensorFlow community contributions (`tf.contrib`) and construct
a **one-vs-one multi-class SVM** on top of it.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

import os, sys
import itertools
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.preprocessing import scale
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

tf.logging.set_verbosity(tf.logging.ERROR)

## Load dataset

In [None]:
# Dataset location
DATASET = '/dsa/data/all_datasets/wine-quality/winequality-red.csv'
assert os.path.exists(DATASET)

# Load and shuffle
dataset = pd.read_csv(DATASET, sep=';').sample(frac = 1).reset_index(drop=True)

# Pull features and labels
selected_features = [1,6,9,10]
X = scale(np.array(dataset.iloc[:, selected_features]))
y = np.array(dataset.quality)

# Create training/validation split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25)

dataset.describe()

## Create multi-class SVM model

Create an array containing names of feature columns.

In [None]:
# '\x20' (hex code 20) is the space character; replace it with an underscore
feature_names = [dataset.columns[i].replace('\x20', '_') for i in selected_features]
print(feature_names)

Create feature columns,
which are conceptually similar to **TensorFlow placeholders**:
they describe the inputs the model expects and receive data from the training loop.

They must be fed training data during training.

In [None]:
feature_columns = [tf.contrib.layers.real_valued_column(i) for i in feature_names]

### Note: 
One-vs-one multi-class SVM is essentially _a collection of binary linear SVMs_,
each created for predicting a pair of classes against each other.

Therefore, each pair of classes needs its own SVM.
In other words, given 4 classes {A,B,C,D}, we need the following classifiers:
 * A vs B
 * A vs C
 * A vs D
 * B vs C
 * B vs D
 * C vs D

Note: For $N$ classes, there will be ${N \choose 2}$, "N choose 2", pair-wise classifiers.
  * See: https://en.wikipedia.org/wiki/Binomial_coefficient
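
The pair count can be checked with a few lines of standard-library Python. This is a minimal sketch; `pairwise_classifiers` is a hypothetical helper introduced here for illustration and is not used elsewhere in the notebook.

```python
import itertools

def pairwise_classifiers(labels):
    """Return every unordered pair of distinct labels, in sorted order."""
    return list(itertools.combinations(sorted(labels), 2))

pairs = pairwise_classifiers(['A', 'B', 'C', 'D'])
print(pairs)

# N classes give N * (N - 1) / 2 pairs, i.e. "N choose 2".
n = 4
print(len(pairs), n * (n - 1) // 2)
```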

Here we print the class distribution of the training set and all possible pairs of classes.

In [None]:
class_labels = np.unique(y_train)
print('class distribution', {int(i): int(np.sum(y_train == i)) for i in class_labels})
class_pairs = [(i,j) for i,j in itertools.product(class_labels, class_labels) if j>i]
print('class pairs', class_pairs)

Now we traverse all pairs of classes and create an SVM dedicated to each pair.

In [None]:
classifiers = {
    pair: tf.contrib.learn.SVM('example_id', feature_columns=feature_columns, l2_regularization=1.0)
        for pair in class_pairs
}

## Training and preparation

Accordingly, we need a separate `my_input_fn()` for each SVM.
The following function returns a `my_input_fn()` for any given pair, along with the number of training samples for that pair.

In [None]:
def get_input_fn(pair):
    # subset out all relevant data to this pair of classes.
    sample = np.logical_or(y_train == pair[0], y_train == pair[1])
    X_subset = X_train[sample, :]
    y_subset = y_train[sample] == pair[1]
    
    # creating my_input_fn() that works on a subset of training data.
    def my_input_fn():
        columns = {
            feature_name: tf.constant(np.expand_dims(X_subset[:, i], 1))
                for i,feature_name in enumerate(feature_names)
        }
        columns['example_id'] = tf.constant([str(i+1) for i in range(len(X_subset))])
        labels = tf.constant(y_subset)
        return columns, labels
    return my_input_fn, len(y_subset)
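
To see the subsetting and label binarization in isolation, here is a NumPy-only sketch on toy data. The helper `binarize_for_pair` and the arrays `X_toy` and `y_toy` are assumptions made for this illustration; they mirror the first three lines of `get_input_fn()` without any TensorFlow dependency.

```python
import numpy as np

def binarize_for_pair(X, y, pair):
    """Keep only samples of the two classes; pair[1] becomes True, pair[0] False."""
    mask = np.isin(y, pair)
    return X[mask], y[mask] == pair[1]

X_toy = np.arange(12).reshape(6, 2)      # 6 samples, 2 features
y_toy = np.array([3, 4, 5, 4, 3, 5])     # three classes: 3, 4, 5

# Only the four samples labeled 3 or 4 survive; class 4 is the positive label.
X_sub, y_sub = binarize_for_pair(X_toy, y_toy, (3, 4))
print(X_sub.shape, y_sub)
```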

Fit all SVM classifiers.

In [None]:
for pair in class_pairs:
    this_input_fn, sample_size = get_input_fn(pair)
    print('Fitting an SVM to classes', pair, 'with', sample_size, 'samples.')
    classifiers[pair].fit(input_fn = this_input_fn, steps=30)

## Evaluation

The following functions facilitate making predictions with this multi-class SVM.

In [None]:
def svm_pred_to_class(predictions, pair):
    """ Convert SVM prediction into class labels
           1. Take 'classes' attribute from each prediction
           2. Use the binary prediction as index to find out original label from pair
           
        Example:
            a prediction of 1 resulting from an SVM dedicated to classes (3, 4)
            will be translated into 4, which is the original class label instead
            of binary label 0 or 1.
    """
    return list(map(lambda i: pair[i['classes']], predictions))

def predict_fn():
    """ Prepare test data from X_test """
    return {
        feature_name: tf.constant(np.expand_dims(X_test[:, i], 1))
            for i,feature_name in enumerate(feature_names)
    }

def vote(labels):
    """ Aggregate prediction results from one-vs-one SVMs by counting votes per class """
    # Count the votes for each label; ties go to the smallest label.
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]

def get_predictions(X_test): 
    """ Make predictions using all SVMs and aggregate results. """
    
    # Make predictions with all SVMs and stack results in columns.
    #   This results in a matrix in shape (num_samples, num_class_pairs)
    predictions = np.column_stack([
        svm_pred_to_class(classifiers[pair].predict(input_fn = predict_fn), pair)
            for pair in class_pairs])
    print('predictions', predictions.shape)
    
    # Aggregate results along axis=1 into a final prediction for each sample.
    return np.array([vote(row) for row in predictions])
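
The aggregation step can be tried on a small hand-made prediction matrix. This is a sketch under assumed toy values: three classes (3, 4, 5), with columns holding the outputs of the hypothetical pairwise SVMs (3 vs 4), (3 vs 5), and (4 vs 5). `majority_vote` is an illustrative stand-in for the voting step, not the notebook's own function.

```python
import numpy as np

def majority_vote(row):
    """Return the most frequent label in a row; ties go to the smallest label."""
    values, counts = np.unique(row, return_counts=True)
    return values[np.argmax(counts)]

# Rows are samples; columns are the (3 vs 4), (3 vs 5), (4 vs 5) classifiers.
toy_predictions = np.array([
    [3, 3, 4],   # two votes for class 3
    [4, 5, 5],   # two votes for class 5
    [3, 5, 4],   # three-way tie, resolved to the smallest label
])

final = np.array([majority_vote(row) for row in toy_predictions])
print(final)
```

Ties are possible with one-vs-one voting, so the tie-breaking rule is a design choice; this sketch simply favors the smallest label.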


Now we make a prediction on the test dataset.

In [None]:
y_pred = get_predictions(X_test)

Evaluate the predictions by computing the accuracy and plotting the confusion matrix.

In [None]:
print('accuracy', accuracy_score(y_test, y_pred))
plt.imshow(confusion_matrix(y_test, y_pred))
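
Overall accuracy can hide weak classes when the labels are imbalanced, so it can help to read per-class recall off the confusion matrix. The sketch below uses made-up labels (`y_true_toy`, `y_pred_toy`), not results from this notebook.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration only.
y_true_toy = np.array([5, 5, 6, 6, 6, 7])
y_pred_toy = np.array([5, 6, 6, 6, 5, 6])

# Rows are true classes, columns are predicted classes.
labels = np.unique(y_true_toy)
cm = confusion_matrix(y_true_toy, y_pred_toy, labels=labels)

# Recall per class: correct predictions divided by the number of true samples.
recall = cm.diagonal() / cm.sum(axis=1)
for label, r in zip(labels, recall):
    print('class', label, 'recall', round(float(r), 3))
```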

# Save your Notebook!