# Perceptron applied to the Olivetti faces dataset

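**Reading the dataset:** $\;$ we load the Olivetti faces dataset ($400$ images of $64\times 64=4096$ pixels) and check that the data matrix and the labels have the expected number of rows and columns; the rows printed below are the first five samples with their class label appended as the last column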

In [1]:
import numpy as np
from sklearn.datasets import fetch_olivetti_faces
faces = fetch_olivetti_faces()  # one row per image, one column per pixel
X = faces.data.astype(np.float16)
y = faces.target.astype(np.uint).reshape(-1, 1)
print(X.shape, y.shape, "\n", np.hstack([X, y])[:5, :])

(400, 4096) (400, 1) 
 [[0.30981445 0.36767578 0.41723633 ... 0.16113281 0.15698242 0.        ]
 [0.45458984 0.47119141 0.51220703 ... 0.15283203 0.15283203 0.        ]
 [0.31811523 0.40087891 0.49169922 ... 0.14880371 0.15283203 0.        ]
 [0.19836426 0.19421387 0.19421387 ... 0.75195312 0.73974609 0.        ]
 [0.5        0.54541016 0.58251953 ... 0.17358398 0.17358398 0.        ]]


**Dataset partition:** $\;$ We split the Olivetti faces dataset into $20\%$ for test and the remaining $80\%$ for training, shuffling the data beforehand with a fixed random seed. As in any code that involves randomness, fixing the seed makes the experiments exactly reproducible.

In [2]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, random_state=23)
print(X_train.shape, X_test.shape)

(320, 4096) (80, 4096)
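To illustrate why fixing the seed makes the partition reproducible, here is a minimal NumPy-only sketch of a seeded shuffle followed by a cut. The helper `seeded_split` and the toy arrays are hypothetical and are not how `train_test_split` is implemented internally; they only mimic its effect under the assumption of a simple permutation-based split.

```python
import numpy as np

def seeded_split(X, y, test_size=0.2, seed=23):
    """Shuffle row indices with a seeded generator, then cut off the test part."""
    N = X.shape[0]
    rng = np.random.default_rng(seed)      # same seed -> same permutation
    perm = rng.permutation(N)
    n_test = int(round(test_size * N))
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data with the same number of rows as the dataset above (hypothetical values)
X_toy = np.arange(400 * 3, dtype=float).reshape(400, 3)
y_toy = (np.arange(400) // 10).reshape(-1, 1)

split_a = seeded_split(X_toy, y_toy, seed=23)
split_b = seeded_split(X_toy, y_toy, seed=23)
same = all(np.array_equal(u, v) for u, v in zip(split_a, split_b))
print(split_a[0].shape, split_a[1].shape, same)
```

Calling the helper twice with the same seed returns identical partitions; changing the seed would, in general, produce a different one.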


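**Perceptron implementation:** $\;$ takes the margin $b$, the learning rate $a$ and the maximum number of iterations $K$; returns the weights in homogeneous notation, $\mathbf{W}\in\mathbb{R}^{(1+D)\times C}$, together with the number of errors in the last iteration and the number of iterations executed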

In [3]:
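def perceptron(X, y, b=0.1, a=1.0, K=200):
    # b: margin, a: learning rate, K: maximum number of passes over the data
    N, D = X.shape; Y = np.unique(y); C = Y.size; W = np.zeros((1+D, C))
    for k in range(1, K+1):
        E = 0  # samples misclassified (with margin) in this pass
        for n in range(N):
            xn = np.array([1, *X[n, :]])  # sample in homogeneous notation
            cn = np.squeeze(np.where(Y==y[n]))  # index of the correct class
            gn = W[:,cn].T @ xn; err = False  # score of the correct class
            for c in np.arange(C):
                # wrong class c violates the margin: its score plus b reaches gn
                if c != cn and W[:,c].T @ xn + b >= gn:
                    W[:, c] = W[:, c] - a*xn; err = True
            if err:
                # reinforce the correct class once per misclassified sample
                W[:, cn] = W[:, cn] + a*xn; E = E + 1
        if E == 0:  # a full pass without errors: stop
            break
    return W, E, k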
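
As a worked illustration of the update rule in the loop above, the following sketch applies a single step to one hand-made sample, starting from zero weights. The sample `xn`, the class index `cn` and the sizes (two features, three classes) are made up for the example.

```python
import numpy as np

b, a = 0.1, 1.0                      # margin and learning rate (defaults used above)
W = np.zeros((3, 3))                 # (1+D) x C with D=2 features and C=3 classes
xn = np.array([1.0, 2.0, -1.0])      # one sample in homogeneous notation
cn = 1                               # index of its correct class

gn = W[:, cn] @ xn                   # score of the correct class (0 at the start)
err = False
for c in range(W.shape[1]):
    if c != cn and W[:, c] @ xn + b >= gn:   # class c violates the margin
        W[:, c] -= a * xn                    # push the wrong class away from xn
        err = True
if err:
    W[:, cn] += a * xn                       # pull the correct class towards xn

scores = W.T @ xn
print(scores)   # [-6.  6. -6.]
```

With all weights at zero every wrong class ties with the correct one, so both wrong columns are decreased by `xn` and the correct column is increased by it; afterwards the correct class leads by a wide gap on this sample.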

**Learning a (linear) classifier with Perceptron:** $\;$ Perceptron aims to minimize the number of training errors (with margin $b$); if the training data cannot be separated with that margin, it simply stops after $K$ iterations
$$\mathbf{W}^*=\operatorname*{argmin}_{\mathbf{W}=(\boldsymbol{w}_1,\dotsc,\boldsymbol{w}_C)}\sum_n\;\mathbb{I}\biggl(\max_{c\neq y_n}\;\boldsymbol{w}_c^t\boldsymbol{x}_n+b \;>\; \boldsymbol{w}_{y_n}^t\boldsymbol{x}_n\biggr)$$

In [4]:
W, E, k = perceptron(X_train, y_train)
print("Number of iterations executed: ", k)
print("Number of training errors: ", E)
print("Weight vectors of the classes (in columns and with homogeneous notation):\n", W)

Number of iterations executed:  42
Number of training errors:  0
Weight vectors of the classes (in columns and with homogeneous notation):
 [[-587.         -581.         -581.         ... -590.
  -557.         -585.        ]
 [-247.08929443 -233.21960449 -244.16125488 ... -266.64141846
  -244.67816162 -239.23931885]
 [-265.13061523 -251.18511963 -267.29406738 ... -281.07702637
  -270.37103271 -256.94555664]
 ...
 [-211.28652954 -203.89273071 -199.72198486 ... -196.6746521
  -201.48944092 -207.01318359]
 [-205.98419189 -202.98080444 -198.18414307 ... -194.52960205
  -195.22445679 -199.60604858]
 [-206.72164917 -203.47958374 -198.56820679 ... -189.42913818
  -192.849823   -194.80505371]]
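
The objective stated above can also be evaluated directly for any weight matrix. The sketch below is one possible vectorized way to do so; the function name `margin_errors` and the toy data are hypothetical, and the labels are assumed to be given as a 1-D array of class indices. It follows the strict inequality of the formula, whereas the training loop also counts ties as errors (`>=`).

```python
import numpy as np

def margin_errors(W, X, y_idx, b=0.1):
    """Count samples whose best wrong-class score plus b exceeds the correct-class score."""
    Xh = np.hstack([np.ones((len(X), 1)), X])   # homogeneous notation
    G = Xh @ W                                  # N x C matrix of class scores
    rows = np.arange(len(X))
    correct = G[rows, y_idx]                    # score of the correct class
    G_wrong = G.copy()
    G_wrong[rows, y_idx] = -np.inf              # exclude the correct class from the max
    return int(np.sum(G_wrong.max(axis=1) + b > correct))

# Toy problem: D=2 features, C=2 classes, three samples
W_toy = np.array([[0.0, 0.0], [1.0, -1.0], [0.0, 0.0]])
X_toy = np.array([[2.0, 0.0], [-2.0, 0.0], [0.01, 0.0]])
y_toy = np.array([0, 1, 0])

print(margin_errors(W_toy, X_toy, y_toy, b=0.1))  # 1: the third sample is inside the margin
print(margin_errors(W_toy, X_toy, y_toy, b=0.0))  # 0: all samples are on the correct side
```

The third sample is classified correctly but too close to the boundary, so it counts as an error only when the margin is positive.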


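**Calculation of test error rate:** $\;$ each test sample is assigned to the class with the highest linear score; using the column index returned by `argmax` directly as the predicted label assumes that the labels are the integers $0,\dotsc,C-1$ and that every class appears in the training set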

In [5]:
X_testh = np.hstack([np.ones((len(X_test), 1)), X_test])
y_test_pred  = np.argmax(X_testh @ W, axis=1).reshape(-1, 1)
err_test = np.count_nonzero(y_test_pred != y_test) / len(X_test)
print(f"Error rate on test: {err_test:.1%}")

Error rate on test: 7.5%
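
When the labels are not the integers $0,\dotsc,C-1$, the column index returned by `argmax` has to be mapped back to a label. A minimal sketch of that mapping follows, assuming `classes` is the sorted label array that `np.unique` would return on the training labels; the helper `predict_labels` and the toy values are hypothetical.

```python
import numpy as np

def predict_labels(W, X, classes):
    """Return, for each row of X, the label of the class with the highest score."""
    Xh = np.hstack([np.ones((len(X), 1)), X])          # homogeneous notation
    return classes[np.argmax(Xh @ W, axis=1)].reshape(-1, 1)

classes = np.array([3, 7])                  # hypothetical labels that are not 0..C-1
W_toy = np.array([[0.0, 0.0], [1.0, -1.0]]) # D=1 feature, C=2 classes
X_toy = np.array([[2.0], [-1.0], [0.5]])
y_true = np.array([[3], [7], [7]])

y_pred = predict_labels(W_toy, X_toy, classes)
err = np.count_nonzero(y_pred != y_true) / len(X_toy)
print(y_pred.ravel(), f"{err:.1%}")   # [3 7 3] 33.3%
```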


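**Margin adjustment:** $\;$ experiment to choose a value of $b$; each output row shows $b$, the number of training errors in the last iteration and the number of iterations executed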

In [6]:
for b in (.0, .01, .1, 10, 100):
    W, E, k = perceptron(X_train, y_train, b=b, K=1000)
    print(b, E, k)

0.0 0 48
0.01 0 58
0.1 0 42
10 0 45
100 0 72
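
When every candidate reaches zero training errors, as above, the training error alone cannot rank the values of $b$. One common option, which is not part of the original experiment, is to compare them on held-out validation data. The sketch below illustrates the idea on synthetic data; the condensed `perceptron` (which assumes labels are class indices $0,\dotsc,C-1$), the helper `error_rate` and the toy data are all assumptions made for this example.

```python
import numpy as np

def perceptron(X, y, b=0.1, a=1.0, K=200):
    """Condensed restatement of the perceptron above; y holds class indices 0..C-1."""
    N, D = X.shape; C = int(y.max()) + 1
    Xh = np.hstack([np.ones((N, 1)), X]); W = np.zeros((1 + D, C))
    for k in range(1, K + 1):
        E = 0
        for xn, cn in zip(Xh, y):
            g = xn @ W                                  # scores of all classes
            wrong = g + b >= g[cn]; wrong[cn] = False   # classes violating the margin
            if wrong.any():
                W[:, wrong] -= a * xn[:, None]
                W[:, cn] += a * xn
                E += 1
        if E == 0:
            break
    return W, E, k

def error_rate(W, X, y):
    Xh = np.hstack([np.ones((len(X), 1)), X])
    return float(np.mean(np.argmax(Xh @ W, axis=1) != y))

# Synthetic 3-class toy data (hypothetical), split into training and validation parts
rng = np.random.default_rng(23)
centers = rng.normal(scale=5.0, size=(3, 2))
y_all = rng.integers(0, 3, size=150)
X_all = centers[y_all] + rng.normal(size=(150, 2))
X_tr, y_tr, X_val, y_val = X_all[:100], y_all[:100], X_all[100:], y_all[100:]

results = {}
for b in (.0, .01, .1, 10, 100):
    W, E, k = perceptron(X_tr, y_tr, b=b, K=200)
    results[b] = error_rate(W, X_val, y_val)
    print(b, E, k, f"{results[b]:.1%}")
best_b = min(results, key=results.get)   # margin with the lowest validation error
print("selected b:", best_b)
```

Keeping the test set out of this selection avoids an optimistic estimate of the final error rate.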
