# Perceptron learning

In [257]:
import numpy as np
from matplotlib import pyplot as plt

Perceptron learning is an iterative algorithm that converges to weights that correctly classify the training set for a single perceptron, provided the training set is learnable (that is, linearly separable). If the training set is not learnable, the algorithm will not converge.

**Perceptron Learning Algorithm**

Given a dataset $\mathbf{X}$ of $m$ observations $\mathbf{x}_i \in \mathbb{R}^{1 \times n}$ and an outcome vector $\mathbf{y} \in \mathbb{R}^{m \times 1}$, we wish to find a weight vector $\mathbf{w} \in \mathbb{R}^{n \times 1}$ such that $y_i = A(\mathbf{x}_i \cdot \mathbf{w})$ for each $0 \leq i < m$, where $A$ is the activation function (the Heaviside step function in this notebook). In matrix form: $A(\mathbf{X} \mathbf{w}) = \mathbf{y}$. The last element of each observation vector $\mathbf{x}_i$ is set to $-1$ to account for the bias term.
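As a quick check of the matrix form, the sketch below evaluates $A(\mathbf{X}\mathbf{w})$ for the boolean AND dataset used later in this notebook. The weight vector `w_example` is hand-picked for illustration (an assumption, not a learned result), and `y` is written as a column to match the $m \times 1$ convention above.

```python
import numpy as np

def A(z):
    # Heaviside step: 1 where z > 0, otherwise 0
    return np.heaviside(z, 0)

# Boolean AND, with a constant -1 input in the last column for the bias
X = np.array([
    [0, 0, -1],
    [0, 1, -1],
    [1, 0, -1],
    [1, 1, -1],
])
y = np.array([[0], [0], [0], [1]])

# Hypothetical weights: both inputs weigh 1, threshold (bias weight) is 1.5
w_example = np.array([[1.0], [1.0], [1.5]])

pred = A(X @ w_example)            # X @ w_example = [-1.5, -0.5, -0.5, 0.5]
matches = np.array_equal(pred, y)  # does A(Xw) equal y?
print(pred.ravel(), matches)
```

Only the last row gives a positive product, so only the observation $(1, 1)$ is mapped to 1.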

1. Initialize $ \mathbf{w} $ as a random vector.
2. **For each epoch**:
    - For each $ (\mathbf{x}_i, y_i)$ in the training set:
      - Compute $ \hat{y} = A(\mathbf{x}_i \cdot \mathbf{w}) $.
      - Update $ \mathbf{w} $ as:
        $$
        \mathbf{w} \leftarrow \mathbf{w} + (y_i - \hat{y}) \lambda \mathbf{x}_i
        $$
3. Compute accuracy:
   $$
   \text{accuracy} = 1 - \frac{\sum_i |y_i - A(\mathbf{x}_i \cdot \mathbf{w})|}{m}
   $$
4. Return $ (\mathbf{w}, \text{accuracy}) $.
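The steps above can be sketched as a single function. This is a minimal sketch under stated assumptions: the name `train_perceptron`, its parameters (`rate`, `epochs`, `seed`) and the fixed number of epochs are illustrative choices, and $A$ is taken to be the Heaviside step function.

```python
import numpy as np

def A(z):
    # Heaviside step: 1 where z > 0, otherwise 0
    return np.heaviside(z, 0)

def train_perceptron(X, y, rate=1.0, epochs=100, seed=0):
    """Hypothetical helper following steps 1-4; returns (w, accuracy)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)   # force an m x 1 column
    m, n = X.shape

    # Step 1: random initial weights (seeded so the run is repeatable)
    rng = np.random.default_rng(seed)
    w = rng.random((n, 1))

    # Step 2: one update per observation, repeated for each epoch
    for _ in range(epochs):
        for i in range(m):
            x_i = X[i:i+1]                           # row i as a 1 x n matrix
            y_hat = A(x_i @ w)                       # current prediction
            w = w + rate * (y[i, 0] - y_hat) * x_i.T

    # Step 3: fraction of observations classified correctly
    accuracy = 1 - np.sum(np.abs(y - A(X @ w))) / m

    # Step 4
    return w, accuracy

# Usage on boolean AND (bias input -1 in the last column)
X_and = [[0, 0, -1], [0, 1, -1], [1, 0, -1], [1, 1, -1]]
y_and = [0, 0, 0, 1]
w_and, acc_and = train_perceptron(X_and, y_and)
print(w_and.ravel(), acc_and)
```

Because AND is linearly separable, the reported accuracy should be 1.0 once enough epochs have run; the particular weights depend on the random start.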

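The learning rate $\lambda$ scales the size of each weight update and therefore affects how quickly the algorithm converges. In practice $\lambda$ is usually chosen in the range $0 < \lambda \leq 1$. The code below calls it `r`.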
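To make the role of the learning rate concrete, here is a single update step applied with two different rates. The starting weights and the choice of observation are hypothetical values picked for illustration.

```python
import numpy as np

def A(z):
    # Heaviside step: 1 where z > 0, otherwise 0
    return np.heaviside(z, 0)

w = np.array([[0.5], [0.5], [0.2]])   # hypothetical starting weights
x = np.array([[0, 1, -1]])            # observation (0, 1) with bias input -1
y_i = 0                               # target for 0 AND 1

updated = {}
for lam in (1.0, 0.1):
    y_hat = A(x @ w)                  # x.w = 0.3 > 0, so the prediction is 1 (wrong)
    updated[lam] = w + lam * (y_i - y_hat) * x.T
    print(lam, updated[lam].ravel(), A(x @ updated[lam]).ravel())
```

With `lam = 1.0` the weights move far enough that this observation is classified correctly after one step. With `lam = 0.1` the step is ten times smaller and the same observation is still misclassified, so further updates are needed.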

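We'll define the dataset as a matrix $X$ and a vector $y$ using NumPy. Note that `y` is stored here as a $1 \times m$ row, so the code indexes it as `y[0, i]` and transposes it when comparing against $A(Xw)$.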

In [286]:
## The dataset corresponding to boolean AND. 

X = np.array([
    [0, 0, -1],
    [0, 1, -1],
    [1, 0, -1],
    [1, 1, -1]
])
y = np.array([[0, 0, 0, 1]])

In [294]:
y[0, 3]  # target for the last observation (1 AND 1)

np.int64(1)

In [295]:
## The heaviside step function

def A(x):
    return np.heaviside(x,0)

In [296]:
w = np.random.random((3,1))

In [297]:
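r = 1  # learning rate (lambda in the algorithm above)

# 100 single-sample updates, cycling through the four observations
for j in range(100):
    i = j % 4
    x = X[i:i+1]          # row i as a 1x3 matrix
    fx = y[0, i]          # target output
    y_hat = A(x @ w)      # current prediction
    w = w + r*(fx - y_hat)*x.T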

# Check the learned weights against each observation
for i in range(4):
    x = X[i:i+1]
    y_hat = A(x @ w)
    print(x, y_hat)

[[ 0  0 -1]] [[0.]]
[[ 0  1 -1]] [[0.]]
[[ 1  0 -1]] [[0.]]
[[ 1  1 -1]] [[1.]]


In [311]:
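w = np.random.random((3,1))
all_w = []   # history of the two input weights (bias weight excluded)
r = 1        # learning rate
for epoch in range(200):
    for i in range(4):
        x = X[i:i+1]
        fx = y[0,i]
        y_hat = A(x @ w)
        w = w + r*(fx - y_hat)*x.T
        all_w.append(w[0:2])
    # training accuracy after this epoch; the value after the last epoch is the final accuracy
    accuracy = 1 - np.sum(np.abs(A(X@w) - y.T))/4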
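The statement at the top of this notebook, that the algorithm does not converge on a training set that is not learnable, can be checked with boolean XOR, which is not linearly separable. The sketch below reuses the same update loop. The names `X_xor`, `y_xor`, `w_xor` and `acc_history`, and the seeded generator, are assumptions made for this example.

```python
import numpy as np

def A(z):
    # Heaviside step: 1 where z > 0, otherwise 0
    return np.heaviside(z, 0)

# Boolean XOR, same layout as the AND dataset (bias input -1 in the last column)
X_xor = np.array([
    [0, 0, -1],
    [0, 1, -1],
    [1, 0, -1],
    [1, 1, -1],
])
y_xor = np.array([[0, 1, 1, 0]])

rng = np.random.default_rng(0)   # seeded so the run is repeatable
w_xor = rng.random((3, 1))
rate = 1

acc_history = []                 # training accuracy after each epoch
for epoch in range(200):
    for i in range(4):
        x = X_xor[i:i+1]
        y_hat = A(x @ w_xor)
        w_xor = w_xor + rate * (y_xor[0, i] - y_hat) * x.T
    acc_history.append(1 - np.sum(np.abs(A(X_xor @ w_xor) - y_xor.T)) / 4)

print(max(acc_history))
```

No single weight vector can classify all four XOR observations, so the accuracy never reaches 1 however many epochs are run, and the weights keep changing instead of settling.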
