# Perceptron Learning in 2D

This notebook runs and visualizes Rosenblatt's original 1950s perceptron learning algorithm on 2-dimensional training data.

Later, in homework HW1, you will use a perceptron to classify images, and you will use calculus to see how Rosenblatt's learning rule relates to gradient descent.

## Show setup code

The code here defines a visualization widget class called `PerceptronVisualizingWidget` that we will need.  It also creates a random set of 100 2-d data points `data`, divided into two classes (-1 or 1) indicated by the 100-dimensional vector `labels`.
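
As a rough illustration of how the two classes are assigned, the sketch below restates the labeling rule from the setup cell in vectorized form. The name `side` is introduced here for illustration only; the rest mirrors the setup code.

```python
import numpy
import torch

# Same seed and shape as the setup cell.
prng = numpy.random.RandomState(1)
data = torch.Tensor(prng.randn(100, 2))

# True where a point satisfies x0 - 0.4 < -0.7 * x1,
# i.e. it lies on one side of a fixed straight line.
side = (data[:, 0] - 0.4) < (-0.7 * data[:, 1])

# Map True/False to +1/-1.
labels = side.float() * 2 - 1

print(data.shape, labels.shape)
print(labels.unique())
```

Because the classes are split by a single straight line, this data set is linearly separable by construction.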

In [64]:
%%capture
!pip install -U git+https://github.com/davidbau/baukit@main#egg=baukit

In [68]:
import torch, numpy
from copy import deepcopy
from baukit import show, Widget, PlotWidget, Range, Numberbox
from matplotlib import pyplot as plt

prng = numpy.random.RandomState(1)
data = torch.Tensor(prng.randn(100, 2))
labels = torch.Tensor(numpy.stack([(d[0] - 0.4 < -0.7 * d[1]) for d in data])) * 2 - 1

class PerceptronVisualizingWidget(Widget):
    def __init__(self, data=[], labels=[]):
        super().__init__()
        self.data = data
        self.labels = labels
        self.history = []
        self.plot = PlotWidget(self.visualize_net, nrows=1, ncols=3, figsize=(11,4)) # , bbox_inches='tight')
        scrubber = Range(min=0, max=0, value=self.plot.prop('index'))
        numbox = Numberbox(value=self.plot.prop('index'))
        self.content = [
            [
                [show.style(alignContent='center'), 'Iteration'],
                numbox,
                show.style(flex=20), scrubber
            ],
            self.plot
        ]

    def _repr_html_(self):
        return show.html(self.content)

    def add(self, net, x=None, y=None):
        with torch.no_grad():
            ok = (net(x).item() == y)
            self.history.append((deepcopy(net), x, ok, y))
        self.content[0][-1].max = len(self.history) - 1
        if len(self.history) == 1:
            self.plot.index = 0

    def visualize_net(self, fig, index=0):
        fig.subplots_adjust(0.02, 0.02, 0.98, 0.98)
        ax1, ax2, ax3 = fig.axes
        ax1.clear(); ax2.clear(); ax3.clear()
        if index >= len(self.history):
            return
        net, datum, ok, label = self.history[index]
        grid = torch.stack([
            torch.linspace(-3, 3, 200)[None, :].expand(200, 200),
            torch.linspace(3, -3, 200)[:, None].expand(200, 200),
        ])
        x, y = grid
        ax1.set_title('network output')
        score = net(grid.permute(1, 2, 0).reshape(-1, 2))
        ax1.imshow(score.reshape(200, 200).detach().cpu(), cmap='hot', extent=[-3,3,-3,3])
        ax2.imshow(score.reshape(200, 200).detach().cpu(), cmap='hot', extent=[-3,3,-3,3], alpha=0.2)

        ax2.set_title('training data')
        ax2.set_ylim(-3, 3)
        ax2.set_xlim(-3, 3)
        ax2.set_aspect(1.0)
        ax2.scatter([d[0] for d, l in zip(self.data, self.labels) if l > 0],
                    [d[1] for d, l in zip(self.data, self.labels) if l > 0])
        ax2.scatter([d[0] for d, l in zip(self.data, self.labels) if l <= 0],
                    [d[1] for d, l in zip(self.data, self.labels) if l <= 0])
        ax2.add_patch(plt.Circle(datum, 0.1, color='#FF0000' if not ok else '#00FF00', linewidth=3, fill=False))

        ax3.set_title('model weights')
        w = net.weight.cpu().detach()
        lim = max(5, w.abs().max() * 1.05)
        ax3.set_ylim(-lim, lim)
        ax3.set_xlim(-lim, lim)
        ax3.set_aspect(1.0)
        ax3.arrow(0, 0, w[0, 0], w[0, 1], width=0.02, head_width=0.2, color='purple', length_includes_head=True)
        d = label * datum
        if not ok:
            ax3.arrow(w[0, 0], w[0, 1], d[0], d[1], width=0.02, head_width=0.2, color='r', length_includes_head=True)



## Perceptron Algorithm Learning on Separable Data

The following code is a demo of Rosenblatt's perceptron algorithm learning to classify linearly separable data.

To make it work, first define the perceptron:

1. First look at the `Perceptron` class.  It defines `__call__`, which means that when you make a `Perceptron` object, it will be callable like a function.
2. But unlike a regular function, it will be *parameterized*.  To define the two kinds of parameters, copy this code into the `__init__` method:
```
def __init__(self):
    # It has two kinds of parameters: a weight for each input, and a bias
    self.weight = torch.Tensor([[2.0, 0.0]])
    self.bias = torch.Tensor([-2.0])
```
3. A perceptron takes a weighted sum of the inputs x, which we can write as `weight @ x`, with the appropriate matrix transposes; and then it applies a nonlinearity.  We will use the step function `torch.sign` as the nonlinearity.
Add the following to the `__call__` method:
```
return torch.sign(self.weight @ x.t() + self.bias)
```
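
To make the shapes concrete, here is a minimal sketch of the same computation outside the class, using the initial weight and bias from step 2. The three input points are made up for illustration.

```python
import torch

weight = torch.Tensor([[2.0, 0.0]])  # shape (1, 2): one weight per input
bias = torch.Tensor([-2.0])          # shape (1,)

# Three hypothetical input points, one per row: shape (3, 2).
x = torch.Tensor([[1.5, 0.0],
                  [0.0, 0.0],
                  [1.0, 3.0]])

# (1, 2) @ (2, 3) -> (1, 3): one score per input point.
score = weight @ x.t() + bias
pred = torch.sign(score)

print(score.tolist())  # [[1.0, -2.0, 0.0]]
print(pred.tolist())   # [[1.0, -1.0, 0.0]]
```

Note that `torch.sign` returns 0 for a score of exactly 0, so a point lying exactly on the decision boundary matches neither label and counts as a wrong prediction.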

In [70]:
# A perceptron is a parameterized function
class Perceptron():
    def __init__(self):
        # It has two kinds of parameters: a weight for each input, and a bias
        self.weight = None # Fill me in
        self.bias = None # Fill me in
    def __call__(self, x):
        # When it is called with a two-dimensional input, it does this.
        if x.dim() == 1: x = x.unsqueeze(0)
        return None # Fill me in


Now let us implement Rosenblatt's perceptron algorithm.

Rosenblatt's algorithm repeats three main steps indefinitely.

1. Choose a random data point x from the data set, with label y (-1 or 1).
2. Run the network on x to make a prediction.
3. If the prediction is wrong, update the network by:
  * adding x to the network weight if the right label was 1
  * subtracting x from the network weight if the right label was -1.

In other words, add y*x to the weight.
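
The following sketch works through a single update by hand on plain tensors, without the `Perceptron` class. The point `x` and its label are made-up values, and the bias update (adding `y`) follows the training loop in this notebook.

```python
import torch

weight = torch.Tensor([[2.0, 0.0]])
bias = torch.Tensor([-2.0])

# A hypothetical training point with true label +1.
x = torch.Tensor([0.5, 1.0])
y = torch.tensor(1.0)

# Score is 2*0.5 + 0*1.0 - 2 = -1, so the prediction is -1 (wrong).
pred = torch.sign(weight @ x + bias)

if pred.item() != y.item():
    weight += y * x  # add y*x to the weight
    bias += y        # and y to the bias

print(weight.tolist(), bias.tolist())  # [[2.5, 1.0]] [-1.0]

# Score after the update: 2.5*0.5 + 1.0*1.0 - 1 = 1.25, now positive.
new_score = weight @ x + bias
print(new_score.item())  # 1.25
```

Each update raises `y * score` for the chosen point by `|x|^2 + 1` (here 2.25), which pushes it toward the correct side. A single update fixes this particular point, but in general it can take several, and it may break the prediction for other points.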

In [None]:
net = Perceptron()

widget = PerceptronVisualizingWidget(data, labels)

# The perceptron learning algorithm.
for it in range(60):
    # Step 1: choose a random data point
    i = prng.randint(len(data))
    y = labels[i]
    x = data[i]
    widget.add(net, x, y)  # record the current network state for visualization

    # Step 2: run the network to make a prediction
    pred = None # fill me in. How do we run the network on x?

    # Step 3: if the prediction is wrong, update the weights
    if pred.item() != y:
        # Rosenblatt's update rule
        net.weight += None # fill me in
        net.bias += y # Another detail

show(widget)


In [None]:
import time
for i in range(60):
    widget.plot.index = i
    time.sleep(0.5)

## Perceptron Algorithm Failing on Non-Separable Data

Now we can try the algorithm on harder data, where no straight line separates the two classes.
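
To see why the algorithm cannot succeed here, consider a minimal sketch on four hand-picked points (an illustrative assumption, not the notebook's data) labeled +1 when both coordinates share a sign. The names `xor_data`, `xor_labels`, and `min_errors` are introduced only for this example.

```python
import random
import torch

# Four hypothetical points: label +1 when both coordinates share a sign.
xor_data = torch.Tensor([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
xor_labels = torch.Tensor([1.0, 1.0, -1.0, -1.0])

weight = torch.zeros(1, 2)
bias = torch.zeros(1)
rng = random.Random(0)

min_errors = len(xor_data)
for step in range(200):
    # Step 1: choose a random data point.
    i = rng.randrange(len(xor_data))
    x, y = xor_data[i], xor_labels[i]

    # Step 2: run the perceptron on it.
    pred = torch.sign(weight @ x + bias)

    # Step 3: update on a mistake.
    if pred.item() != y.item():
        weight += y * x
        bias += y

    # Count how many of the four points are currently misclassified.
    preds = torch.sign(weight @ xor_data.t() + bias).squeeze(0)
    errors = int((preds != xor_labels).sum())
    min_errors = min(min_errors, errors)

# Never reaches 0: no line classifies all four points correctly.
print(min_errors)
```

Classifying the two positive points correctly requires a positive bias, while classifying the two negative points correctly requires a negative one, so the weights keep changing and never settle.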

In [71]:
# labels = torch.Tensor(numpy.stack([(d[0].sign() == d[1].sign()) for d in data])) * 2 - 1
