# Task 2: Implement a multi-class perceptron algorithm


Implement (from scratch) a multi-class perceptron training algorithm (the "Perceptron learning rule" from slide 34 of the second lecture) and use it to train a single-layer perceptron with 10 nodes (one per digit), each node having 256+1 inputs (inputs and bias) and 1 output. Train your network on the train set and evaluate it on both the train and the test set, in the same way as you did in the previous task. As your algorithm is non-deterministic (results depend on how you initialize the weights), repeat your experiments a few times to get a feeling for the reliability of your accuracy estimates.

Try to make your code efficient. In particular, try to limit the number of loops, using matrix multiplication whenever possible. For example, append to your train and test data a column of ones that will represent the bias. The weights of your network can be stored in a matrix W of size 257x10. The output of the network on all inputs is then just the dot product of two matrices, Train and W, where Train denotes the matrix of all input vectors (one per row), augmented with 1's (biases). To find the output node with the strongest activation, use the numpy argmax() function.

An efficient implementation of your algorithm shouldn't take more than a few seconds to converge on the training set (yes, the training set consists of patterns that are linearly separable, so the perceptron algorithm will converge).

How does the accuracy of this single-layer multi-class perceptron compare to the distance-based methods in task 1?
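
The matrix formulation described above can be sketched as follows. This is only an illustration on random toy data: the array names (`X`, `X_aug`, `W`) and the sample count of 5 are assumptions, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the real data: 5 samples with 256 features each (assumed shapes).
X = rng.normal(size=(5, 256))

# Append a column of ones so the bias is handled as one extra weight per node.
X_aug = np.hstack((X, np.ones((X.shape[0], 1))))  # shape (5, 257)

# One weight column per digit, including the bias row.
W = rng.normal(size=(257, 10))

# Activations of all 10 output nodes for all samples in a single matrix product.
scores = X_aug @ W  # shape (5, 10)

# The predicted digit is the index of the node with the strongest activation.
predictions = np.argmax(scores, axis=1)  # shape (5,)
```

With `axis=1`, `argmax` is taken per row, so each sample gets exactly one predicted class.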

In [1]:
import numpy as np
import pandas as pd

In [2]:
# load train set
x_train = pd.read_csv("/content/train_in.csv", names=range(256))
y_train = pd.read_csv("/content/train_out.csv", names=['label'])

# load test set
x_test = pd.read_csv("/content/test_in.csv", names=range(256))
y_test = pd.read_csv("/content/test_out.csv", names=['label'])

x_train = x_train.values
y_train = y_train['label'].values
x_test = x_test.values
y_test = y_test['label'].values

In [3]:
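def train(x, y, learning_rate=0.01, max_iters=1000):
    """Train a multi-class perceptron; returns weights of shape (n_features + 1, n_classes)."""
    n_samples, n_features = x.shape
    # labels are assumed to be the integers 0..n_classes-1, since they are used as column indices
    classes = np.unique(y)
    n_classes = len(classes)

    # append a column of ones so the bias is learned as an extra weight
    x = np.hstack((x, np.ones((n_samples, 1))))

    # zero initialization; for random initialization use instead:
    # np.random.seed(0)
    # weights = np.random.rand(n_features + 1, n_classes)
    weights = np.zeros((n_features + 1, n_classes))

    for epoch in range(max_iters):
        misclassified = 0
        for idx, vector in enumerate(x):
            scores = np.dot(vector, weights)
            y_hat = np.argmax(scores)

            if y_hat != y[idx]:
                # lower every class that scored above the true class ...
                for i in classes:
                    if scores[i] > scores[y[idx]]:
                        weights[:, i] -= learning_rate * vector
                # ... and raise the true class
                weights[:, y[idx]] += learning_rate * vector
                misclassified += 1

        # a full pass without mistakes means every training sample is classified correctly
        if misclassified == 0:
            # epoch is zero-based, so this counts the passes that still made updates
            print(f"The algorithm converged after {epoch} iterations.")
            break

    return weights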
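
To make the update rule inside the loop easier to follow, here is a minimal sketch of a single update step on a toy example. The helper name `perceptron_update` and the 3-feature, 3-class setup are hypothetical and chosen only for illustration; the boolean mask replaces the inner loop over classes but is intended to apply the same rule.

```python
import numpy as np

def perceptron_update(weights, vector, label, learning_rate=0.1):
    """Apply one multi-class perceptron update in place.

    Returns True if the sample was misclassified (and weights were changed).
    """
    scores = vector @ weights
    if np.argmax(scores) == label:
        return False
    # Classes whose activation is strictly above that of the true class.
    too_high = scores > scores[label]
    # Push those classes away from the sample ...
    weights[:, too_high] -= learning_rate * vector[:, None]
    # ... and pull the true class towards it.
    weights[:, label] += learning_rate * vector
    return True

# Toy example: 2 features plus a bias entry of 1, and 3 classes.
weights = np.zeros((3, 3))
weights[:, 0] = [1.0, 0.0, 0.0]      # class 0 starts with a non-zero weight vector
vector = np.array([1.0, 2.0, 1.0])   # last entry is the bias input
changed = perceptron_update(weights, vector, label=2)
```

In this toy case class 0 wins with a score of 1 while the true class 2 scores 0, so the column of class 0 is decreased, the column of class 2 is increased, and class 1 is left untouched.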

In [4]:
def predict(X, weights):
    n_samples = X.shape[0]
    X = np.hstack((X, np.ones((n_samples, 1))))
    scores = np.dot(X, weights)
    predicted_classes = np.argmax(scores, axis=1)
    return predicted_classes

In [5]:
# training
trained_weights = train(x_train, y_train, learning_rate=0.1, max_iters=100)

# prediction
y_pred_train = predict(x_train, trained_weights)
y_pred_test = predict(x_test, trained_weights)

accuracy_train = np.mean(y_pred_train == y_train) * 100
accuracy_test = np.mean(y_pred_test == y_test) * 100

print(f"Accuracy on the train set: {accuracy_train:.2f}%")
print(f"Accuracy on the test set: {accuracy_test:.2f}%")


The algorithm converged after 33 iterations.
Accuracy on the train set: 100.00%
Accuracy on the test set: 87.80%


In [6]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred_test)

In [7]:
cm

array([[215,   0,   2,   2,   3,   0,   0,   0,   1,   1],
       [  0, 114,   0,   0,   1,   0,   3,   1,   0,   2],
       [  3,   1,  85,   2,   3,   0,   1,   2,   4,   0],
       [  2,   0,   2,  63,   0,   2,   0,   2,   5,   3],
       [  5,   1,   3,   0,  67,   2,   2,   2,   1,   3],
       [  3,   0,   0,   6,   2,  39,   1,   1,   1,   2],
       [  1,   0,   2,   0,   2,   0,  85,   0,   0,   0],
       [  0,   0,   2,   0,   4,   0,   0,  55,   0,   3],
       [  3,   0,   2,   3,   3,   3,   0,   0,  74,   4],
       [  0,   1,   1,   0,   2,   0,   0,   1,   2,  81]])
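
As a possible way to read the confusion matrix, the sketch below derives the overall accuracy and a per-digit recall from it. It assumes the scikit-learn convention that rows are true labels and columns are predictions; the variable names `overall_accuracy`, `per_class_recall` and `hardest_digit` are illustrative.

```python
import numpy as np

# Confusion matrix copied from the output above (rows: true digit, columns: predicted digit).
cm = np.array([
    [215,   0,   2,   2,   3,   0,   0,   0,   1,   1],
    [  0, 114,   0,   0,   1,   0,   3,   1,   0,   2],
    [  3,   1,  85,   2,   3,   0,   1,   2,   4,   0],
    [  2,   0,   2,  63,   0,   2,   0,   2,   5,   3],
    [  5,   1,   3,   0,  67,   2,   2,   2,   1,   3],
    [  3,   0,   0,   6,   2,  39,   1,   1,   1,   2],
    [  1,   0,   2,   0,   2,   0,  85,   0,   0,   0],
    [  0,   0,   2,   0,   4,   0,   0,  55,   0,   3],
    [  3,   0,   2,   3,   3,   3,   0,   0,  74,   4],
    [  0,   1,   1,   0,   2,   0,   0,   1,   2,  81],
])

# Correct predictions sit on the diagonal.
overall_accuracy = np.trace(cm) / cm.sum()

# Fraction of each true digit that was recognized correctly (recall per class).
per_class_recall = np.diag(cm) / cm.sum(axis=1)

# Digit with the lowest recall.
hardest_digit = int(np.argmin(per_class_recall))
```

The diagonal sums to 878 of 1000 test samples, which matches the reported 87.80%. Under the row/column assumption stated above, digit 5 has the lowest recall (39 of 55 samples).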

With weights initialized to zero, the algorithm converges after 33 iterations.

With randomly initialized weights (seed 43), it takes 41 iterations.
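
To get a feeling for how much the number of iterations depends on the initialization, the experiment could be repeated over several seeds. The sketch below is self-contained and runs on small synthetic, well-separated clusters rather than on the digit data; the function `train_perceptron`, the cluster centers and the choice of seeds are all assumptions made for illustration.

```python
import numpy as np

def train_perceptron(x, y, n_classes, rng, learning_rate=0.1, max_iters=100):
    """Multi-class perceptron with random initial weights.

    Returns (weights, number of passes over the data that were run).
    """
    x = np.hstack((x, np.ones((x.shape[0], 1))))   # bias column
    weights = rng.random((x.shape[1], n_classes))  # uniform in [0, 1)
    for epoch in range(max_iters):
        misclassified = 0
        for vector, label in zip(x, y):
            scores = vector @ weights
            if np.argmax(scores) != label:
                too_high = scores > scores[label]
                weights[:, too_high] -= learning_rate * vector[:, None]
                weights[:, label] += learning_rate * vector
                misclassified += 1
        if misclassified == 0:
            return weights, epoch + 1
    return weights, max_iters

# Synthetic stand-in data: three tight 2-D clusters, 30 samples each.
data_rng = np.random.default_rng(0)
centers = np.array([[0.0, 5.0], [5.0, -5.0], [-5.0, -5.0]])
y = np.repeat(np.arange(3), 30)
x = centers[y] + data_rng.normal(scale=0.5, size=(90, 2))

# Repeat training with different seeds and record how many passes each run needed.
epochs_per_seed = {}
for seed in (0, 1, 2, 3, 4):
    _, epochs = train_perceptron(x, y, n_classes=3, rng=np.random.default_rng(seed))
    epochs_per_seed[seed] = epochs

print(epochs_per_seed)
```

Applying the same loop to the notebook's own `train` function would require enabling its random initialization and passing the seed in as a parameter.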