# Multiclass Perceptron and SVM

In this notebook, we'll try out the multiclass Perceptron and SVM on small data sets.

### Import

In [None]:
%matplotlib inline

import numpy as np
import matplotlib
import matplotlib.pyplot as plt

matplotlib.rc('xtick', labelsize=14) 
matplotlib.rc('ytick', labelsize=14)

## Multiclass Perceptron Algorithm

The multiclass Perceptron algorithm is similar in spirit to our earlier binary Perceptron algorithm, except that now there is a linear function for each class.

Suppose there are __`k`__ classes, labeled `0,1,...,k-1`. For __`d`-dimensional data__, the classifier will be parametrized by:
* __`w`__: this is a __`(kxd)` numpy array__ with one row for each class
* __`b`__: this is a __`k`-dimensional numpy array__ with one offset for each class

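Thus the linear function for class `j` (where `j` lies in the range `0` to `k-1`) is given by `w[j,:], b[j]`: its score at a point `x` is `w[j,:] @ x + b[j]`, and the classifier predicts the class with the highest score.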

In [None]:
def evaluate_classifier(w, b, x):
    """Prediction of classifier at x"""
    k = len(b)
    scores = np.zeros(k)
    for j in range(k):
        scores[j] = w[j, :] @ x + b[j]
    return np.argmax(scores)
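
The loop over classes in `evaluate_classifier` can also be written as a single matrix-vector product. The following is a minimal sketch of that idea; the name `predict_vectorized` and the toy parameters `w_demo`, `b_demo`, `x_demo` are made up for illustration and are not used elsewhere in this notebook.

```python
import numpy as np

def predict_vectorized(w, b, x):
    """Return the index of the class whose linear function scores highest at x."""
    scores = w @ x + b  # shape (k,): one score per class
    return int(np.argmax(scores))

# Hypothetical toy parameters: k = 3 classes, d = 2 features
w_demo = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [-1.0, -1.0]])
b_demo = np.array([0.0, 0.5, 0.0])
x_demo = np.array([2.0, 1.0])

# The scores are [2.0, 1.5, -3.0], so class 0 wins
label_demo = predict_vectorized(w_demo, b_demo, x_demo)
print(label_demo)
```

Ties are broken in favour of the lowest class index, because `np.argmax` returns the first maximum.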

### Train multiclass Perceptron

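The function `train_multiclass_perceptron(x, y, k, n_iters)` takes the inputs `x`, `y`, `k`, `n_iters` and returns `w, b, converged`, where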
* `x`: n-by-d numpy array with n data points, each d-dimensional
* `y`: n-dimensional numpy array with the labels (in the range `0` to `k-1`)
* `k`: the number of classes
* `n_iters`: the training procedure will run through the data at most this many times (default: 1000)
* `w,b`: parameters for the final linear classifier, as above
* `converged`: flag (True/False) indicating whether the algorithm converged within the prescribed number of iterations

__NOTE:__ If the data is not linearly separable, then the training procedure will not converge.

In [None]:
def train_multiclass_perceptron(x, y, k, n_iters=1000):
    """Train a multiclass Perceptron; returns (w, b, converged)."""
    n, d = x.shape
    w, b = np.zeros((k, d)), np.zeros(k)
    converged = False
    np.random.seed(0)
    
    for itr in range(n_iters):
        mistakes = 0
        for j in np.random.permutation(n):
            true_y, pred_y = int(y[j]), evaluate_classifier(w, b, x[j,:])
            if pred_y != true_y:
                # Pull the true class towards x[j] and push the predicted class away
                w[true_y,:] += x[j,:]
                b[true_y] += 1.0
                w[pred_y,:] -= x[j,:]
                b[pred_y] -= 1.0
                mistakes += 1
        if mistakes == 0:
            # A full pass without any update: every point is classified correctly
            converged = True
            break
            
    if converged:
        print("Perceptron algorithm: iterations until convergence: ", itr)
    else:
        print("Perceptron algorithm: did not converge within the specified number of iterations")
    return w, b, converged
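
To illustrate the note about linear separability, here is a small self-contained sketch that records how many mistakes the Perceptron makes in each pass. The helper `count_mistakes_per_pass` and both toy data sets are hypothetical; the helper also visits the points in a fixed order rather than a random permutation, to keep the example deterministic.

```python
import numpy as np

def count_mistakes_per_pass(x, y, k, n_passes=20):
    """Run multiclass Perceptron updates and record the mistakes made in each pass."""
    n, d = x.shape
    w, b = np.zeros((k, d)), np.zeros(k)
    history = []
    for _ in range(n_passes):
        mistakes = 0
        for i in range(n):
            pred = int(np.argmax(w @ x[i] + b))
            if pred != y[i]:
                # Same update as in the training procedure above
                w[y[i]] += x[i]
                b[y[i]] += 1.0
                w[pred] -= x[i]
                b[pred] -= 1.0
                mistakes += 1
        history.append(mistakes)
    return history

# Separable toy data: class 0 on the left, class 1 on the right
x_sep = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])
y_sep = np.array([0, 0, 1, 1])

# Non-separable toy data: the same point appears with two different labels
x_bad = np.array([[1.0, 1.0], [1.0, 1.0]])
y_bad = np.array([0, 1])

history_sep = count_mistakes_per_pass(x_sep, y_sep, k=2)
history_bad = count_mistakes_per_pass(x_bad, y_bad, k=2)
print("separable:    ", history_sep)
print("non-separable:", history_bad)
```

On the separable set the mistake count drops to zero and stays there. On the second set the two identical points always receive the same prediction, so at least one of them is misclassified in every pass and the count never reaches zero.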

### Visualise Multiclass Perceptron boundaries

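The function `display_data_and_boundary(x, y, pred_fn, title)` plots the data together with the decision regions, where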
* `x` and `y` are the two-dimensional data and their labels (in the range `0,...,k-1`)
* `pred_fn` is the classifier: it is a function that takes a data point and returns a label

In [None]:
def display_data_and_boundary(x, y, pred_fn, title='Title'):
    # Determine the x1- and x2- limits of the plot
    x1min = min(x[:,0]) - 1
    x1max = max(x[:,0]) + 1
    x2min = min(x[:,1]) - 1
    x2max = max(x[:,1]) + 1
    plt.xlim(x1min, x1max)
    plt.ylim(x2min, x2max)
    
    # Plot the data points
    labels = np.unique(y).astype('i')
    cols = ['ro', 'k^', 'b*','gx']
    for label in labels:
        plt.plot(x[(y==label), 0], x[(y==label), 1], cols[label%4], markersize=8)
        
    # Construct a grid of points at which to evaluate the classifier
    density = 0.05
    xx1, xx2 = np.meshgrid(np.arange(x1min, x1max+density, density), np.arange(x2min, x2max+density, density))
    grid = np.c_[xx1.ravel(), xx2.ravel()]
    
    # Use prediction function
    Z = np.array([pred_fn(pt) for pt in grid])
    
    # Show the classifier's boundary using a color plot
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, cmap=plt.cm.Pastel1, vmin=0, vmax=len(labels))
    plt.title(title)
    plt.show()
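
The grid construction inside `display_data_and_boundary` can be easier to follow on a tiny example. The sketch below uses a deliberately coarse grid and a made-up classifier, `toy_pred_fn`, that is not part of this notebook; it only shows how the flattened grid of points maps back to the shape of the mesh.

```python
import numpy as np

def toy_pred_fn(p):
    """Hypothetical classifier: label 1 if the first coordinate exceeds the second."""
    return int(p[0] > p[1])

# A coarse mesh: 2 values along x1 and 3 values along x2
xx1, xx2 = np.meshgrid(np.arange(0.0, 1.0, 0.5), np.arange(0.0, 1.5, 0.5))

# Flatten the mesh into a (6, 2) array with one grid point per row
grid = np.c_[xx1.ravel(), xx2.ravel()]

# Classify every grid point, then restore the mesh shape for plotting
Z = np.array([toy_pred_fn(pt) for pt in grid]).reshape(xx1.shape)
print(grid.shape, Z.shape)
print(Z)
```

Only the grid point `(0.5, 0.0)` has a first coordinate larger than its second, so `Z` contains a single `1`.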

The following procedure, **run_multiclass_perceptron**, loads a labeled two-dimensional data set, learns a linear classifier using the Perceptron algorithm, and then displays the data as well as the boundary.

The data file is assumed to contain one data point per line, along with a label, like:
* `3 8 2` (meaning that point `x=(3,8)` has label `y=2`)
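
As a quick check of this format, the sketch below parses a few lines with `np.loadtxt`. The contents of `sample_text` are invented for illustration and are not taken from the actual data files.

```python
import io
import numpy as np

# Hypothetical file contents in the "x1 x2 label" format described above
sample_text = "3 8 2\n1 4 0\n5 2 1\n"

# np.loadtxt accepts a file-like object as well as a file name
data = np.loadtxt(io.StringIO(sample_text))

# The first two columns are the coordinates, the third is the label
x_demo, y_demo = data[:, :2], data[:, 2].astype(int)
print(x_demo)
print(y_demo)
```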

In [None]:
# !find ../../_data | grep -i data_

In [None]:
def get_data(datafile):
    """Load a data file; return the points x, the labels y, and the number of classes k."""
    data = np.loadtxt(datafile)
    x, y = data[:, 0:2], data[:, 2]
    k = len(np.unique(y))
    return x, y, k

In [None]:
def run_multiclass_perceptron(datafile):
    """Train a multiclass Perceptron on the data in datafile and plot the decision regions."""
    x, y, k = get_data(datafile)
    
    # Run the Perceptron algorithm for at most 1000 iterations
    w, b, converged = train_multiclass_perceptron(x, y, k, 1000)
    
    # Show the data and boundary
    pred_fn = lambda p: evaluate_classifier(w, b, p)
    display_data_and_boundary(x, y, pred_fn)

In [None]:
run_multiclass_perceptron('../../_data/data_3.txt')

In [None]:
run_multiclass_perceptron('../../_data/data_4.txt')

## Experiments with multiclass SVM

Now let's see how multiclass SVM fares on these same data sets. We start with an analog of the **run_multiclass_perceptron** function. The key difference is that the SVM version, **run_multiclass_svm**, takes a second parameter: the regularization constant `C` in the convex program of the soft-margin SVM.
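
To make the role of `C` concrete, here is a sketch of one common way to write the multiclass soft-margin objective: half the squared norm of the weights plus `C` times the total slack, where the slack of a point is how far the best wrong class comes within a margin of 1 of the true class. The function name `multiclass_svm_objective` is hypothetical, and this is an assumption about the standard formulation, not necessarily the exact program scikit-learn solves internally (for example, the offset may be treated differently).

```python
import numpy as np

def multiclass_svm_objective(w, b, x, y, C):
    """Regularizer plus C times the total multiclass hinge slack (illustrative sketch)."""
    n = x.shape[0]
    scores = x @ w.T + b                        # shape (n, k)
    true_scores = scores[np.arange(n), y]       # score of the correct class per point
    margins = 1.0 + scores - true_scores[:, None]
    margins[np.arange(n), y] = 0.0              # the true class contributes no violation
    slack = margins.max(axis=1)                 # non-negative, since the true-class entry is 0
    return 0.5 * np.sum(w ** 2) + C * slack.sum()

# With all-zero parameters every point has slack exactly 1
x_toy = np.array([[1.0, 0.0], [0.0, 1.0]])
y_toy = np.array([0, 1])
w0, b0 = np.zeros((2, 2)), np.zeros(2)
obj_small_C = multiclass_svm_objective(w0, b0, x_toy, y_toy, C=0.1)
obj_large_C = multiclass_svm_objective(w0, b0, x_toy, y_toy, C=10.0)
print(obj_small_C, obj_large_C)
```

Under this formulation, a larger `C` makes margin violations more expensive relative to the size of the weights.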

In [None]:
from sklearn.svm import LinearSVC

In [None]:
def run_multiclass_svm(datafile, C_value=1.0):
    
    x, y, k = get_data(datafile)
    clf = LinearSVC(loss='hinge', multi_class='crammer_singer', C=C_value).fit(x, y)
    pred_fn = lambda p: clf.predict(p.reshape(1,-1))   
    # Show the data and boundary
    display_data_and_boundary(x, y, pred_fn, 'SVM, C: {}'.format(C_value))

Let's run this on the two data sets `data_3.txt` and `data_4.txt` that we saw earlier. Try playing with the second parameter to see how the decision boundary changes. You should try values like `C = 0.01, 0.1, 1.0, 10.0, 100.0`.

In [None]:
for c in [0.01, 0.1, 1.0, 10.0, 100.0]:
    run_multiclass_svm('../../_data/data_3.txt', c)

In [None]:
for c in [0.01, 0.1, 1.0, 10.0, 100.0]:
    run_multiclass_svm('../../_data/data_4.txt', c)

<font color="magenta">For you to think about:</font> How would you summarize the effect of varying `C`?

## IRIS data set

This is four-dimensional data with three labels.  
We will use only two of the features; as a consequence, the problem is not linearly separable:  
 - the Perceptron algorithm would never converge
 - the soft-margin SVM still obtains a reasonable solution

In [None]:
from sklearn import datasets

In [None]:
iris = datasets.load_iris()
x, y = iris.data, iris.target

# Select two of the four features: sepal width (column 1) and petal width (column 3)
x = x[:, [1, 3]]

### Train and predict

In [None]:
clf = LinearSVC(loss='hinge', multi_class='crammer_singer').fit(x,y)
pred_fn = lambda p: clf.predict(p.reshape(1, -1))

display_data_and_boundary(x, y, pred_fn)