# Introduction to Neural Networks 

## The Perceptron


To build intuition about neural networks, we will code an elementary perceptron. This example illustrates some of the concepts you have just seen, builds a small perceptron, and links the perceptron to a linear classifier.

### Generating some data

Before working with the MNIST dataset, you'll first test your perceptron implementation on a "toy" dataset with just a few data points. This allows you to test your implementations with data you can easily inspect and visualise without getting lost in the complexities of the dataset itself.


Start by loading two basic libraries, `matplotlib` and `numpy`:

In [None]:
# Load the libraries ...
import matplotlib.pyplot as plt
import numpy as np



In [None]:
# Also, tell jupyter to show plots inside the notebook with a magic command
# hint: http://ipython.readthedocs.io/en/stable/interactive/magics.html#magic-matplotlib
%matplotlib inline


Then let us generate some points in a 2D space that will form our dataset (you can add points later if you'd like):

In [None]:
crosses = np.array(
    [[0.5, 1.0], [1.0, 1.5], [1.5, 1.5], [2.0, 1.2], [3.0, 1.7], [1.5, 1.1], [2.1, 1.7]]
)
circles = np.array(
    [[3.0, 0.5], [4.0, 1.0], [5.0, 0.7], [4.0, 0.2], [5.1, 0.3], [4.2, 0.7]]
)

### Visualising the data

Using `matplotlib`, you can display the crosses as crosses (use `marker='x'`) and the circles as circles (use `marker='o'`). You will need to specify that you don't want a line using `linestyle='none'`. You can observe that the two groups of points are easy to separate.

In [None]:
# add your code here to visualise the points
plt.figure()
# you could use plt.scatter in a similar fashion
plt.plot(crosses[:, 0], crosses[:, 1], marker="x", linestyle="none", label="cross")
plt.plot(circles[:, 0], circles[:, 1], marker="o", linestyle="none", label="circle")
plt.legend()
plt.ylim((0, 2))
plt.xlim((0, 6))
plt.show()


### Computing the output of a Perceptron


Let us consider the problem of building a classifier that, for a given **new** point, returns whether it belongs to the crosses (class 1) or the circles (class 0). So, for example, it would take `(2, 1.5)` and return `1`.

Define a function `out_perceptron` which takes a 2d vector `x`, a 2d weight vector `w` and a bias `b` and returns the output following the step rule:

$$
\text{output} = \begin{cases} 1 & \text{if } \langle x, w\rangle - b > 0 \\ 0 & \text{otherwise} \end{cases}
$$

In [None]:
# add your code here...
def out_perceptron(x, w, b):
    innerProd = np.dot(x, w)
    output = 0
    if innerProd > b:
        output = 1
    return output
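
As a quick sanity check, the sketch below calls the function on the example point `(2, 1.5)` mentioned above. The weights and bias are illustrative values chosen here (an assumption, not something the exercise has fixed at this point), and the function is repeated only so that the snippet runs on its own.

```python
import numpy as np


def out_perceptron(x, w, b):
    # same step rule as above, repeated so this snippet is self-contained
    return 1 if np.dot(x, w) > b else 0


# illustrative parameters (assumed for this example only)
w_demo = np.array([-0.5, 1.0])
b_demo = -0.2

# <x, w> = -0.5 * 2 + 1.0 * 1.5 = 0.5, which exceeds b = -0.2, so the output is 1
label_cross = out_perceptron(np.array([2.0, 1.5]), w_demo, b_demo)

# <x, w> = -0.5 * 4 + 1.0 * 0.5 = -1.5, which does not exceed b, so the output is 0
label_circle = out_perceptron(np.array([4.0, 0.5]), w_demo, b_demo)

# the inequality is strict: a point with <x, w> exactly equal to b gets 0
label_boundary = out_perceptron(np.array([1.0, 1.0]), np.array([1.0, 1.0]), 2.0)

print(label_cross, label_circle, label_boundary)
```

The last call shows why the strict inequality matters: points lying exactly on the decision boundary fall into class 0.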




You can then extend the function so that it takes a **sequence of inputs** (in the form of a matrix where each row is one input vector) and returns the corresponding **sequence of outputs**.

One way of doing this is to loop over the rows of `X` and, for each of them, use the function `out_perceptron` that you just wrote. Store the results in an array `outputs` and return that. Call that function `multi_out_perceptron`.

Once you have that, you can try optimising the function by using a matrix-vector product; call the resulting function `multi_out_perceptron_2` (and make sure it leads to the same results!)

In [None]:
# add your code here to implement multi_out_perceptron
def multi_out_perceptron(X, w, b):
    n_instances = X.shape[0]
    outputs = np.zeros(n_instances)
    for i in range(0, n_instances):
        outputs[i] = out_perceptron(X[i, :], w, b)
    return outputs




# (bonus) add your code here to implement multi_out_perceptron_2
def multi_out_perceptron_2(X, w, b):
    return (np.dot(X, w) > b).astype(float)
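
To check that the loop and the vectorised version agree, one option is to compare them on random inputs. This is a minimal sketch: both functions are repeated (in compact form) so that it runs on its own, the seed and shapes are arbitrary choices, and `rng`, `X_demo`, `w_demo` and `b_demo` are names introduced here.

```python
import numpy as np


def multi_out_perceptron(X, w, b):
    # loop version: apply the step rule row by row
    return np.array([1.0 if np.dot(row, w) > b else 0.0 for row in X])


def multi_out_perceptron_2(X, w, b):
    # vectorised version: one matrix-vector product, then a threshold
    return (np.dot(X, w) > b).astype(float)


rng = np.random.default_rng(0)  # arbitrary seed
X_demo = rng.normal(size=(20, 2))
w_demo = rng.normal(size=2)
b_demo = rng.normal()

# True when both implementations return exactly the same 0/1 outputs
same = np.array_equal(
    multi_out_perceptron(X_demo, w_demo, b_demo),
    multi_out_perceptron_2(X_demo, w_demo, b_demo),
)
print(same)
```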



## = checkpoint 1 =

Here, you should copy-paste the following code. If it returns `True`, you're good to go on.

```python
np.random.seed(1234)
X = np.random.randn(10, 5)
w = np.random.randn(5)
b = np.random.randn()
np.all(multi_out_perceptron_2(X, w, b) == np.array([ 1.,  1.,  0.,  0.,  0.,  1.,  0.,  0.,  1.,  0.]))
```


In [None]:

np.random.seed(1234)
X = np.random.randn(10, 5)
w = np.random.randn(5)
b = np.random.randn()
np.all(multi_out_perceptron_2(X, w, b) == np.array([ 1.,  1.,  0.,  0.,  0.,  1.,  0.,  0.,  1.,  0.]))



### Trying different weights and biases

You now have a method that can compute the outputs predicted by an **untrained** perceptron. Can you try picking different weights and biases and see how well you can classify the crosses and circles? 

**Note**: to join the crosses and circles into one `instances` matrix, you can use `np.concatenate((crosses, circles), axis=0)`.

You can start with `w=[1, 1]` and `b=1` and output the result of `multi_out_perceptron`. What is your analysis?

In [None]:
# your code here...
test_w1 = [1.0, 1.0]
test_b1 = 1.0
instances = np.concatenate((crosses, circles), axis=0)
print(multi_out_perceptron(instances, test_w1, test_b1))


With the suggested weights and bias (`w=[1, 1]`, `b=1`), you should see something like

> `[ 1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.  1.]` 

which is clearly not great: every point is classified as a cross. Now try with `w=[-0.5, 1]` and `b=-0.2`. What do you observe?

In [None]:
# your code here...
test_w2 = [-0.5, 1.0]
test_b2 = -0.2
print(multi_out_perceptron(instances, test_w2, test_b2))
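
To put a number on "how well" each choice classifies the points, one option is to compare the outputs with the true classes. The sketch below is self-contained; the `labels` array and the `accuracy` helper are names introduced here, using the convention from the text (crosses are class 1, circles are class 0).

```python
import numpy as np

crosses = np.array(
    [[0.5, 1.0], [1.0, 1.5], [1.5, 1.5], [2.0, 1.2], [3.0, 1.7], [1.5, 1.1], [2.1, 1.7]]
)
circles = np.array(
    [[3.0, 0.5], [4.0, 1.0], [5.0, 0.7], [4.0, 0.2], [5.1, 0.3], [4.2, 0.7]]
)
instances = np.concatenate((crosses, circles), axis=0)
# true classes, in the same order as the rows of `instances`
labels = np.concatenate((np.ones(len(crosses)), np.zeros(len(circles))))


def accuracy(w, b):
    # fraction of points whose predicted class matches the true class
    predictions = (instances @ np.asarray(w) > b).astype(float)
    return np.mean(predictions == labels)


acc_1 = accuracy([1.0, 1.0], 1.0)    # first suggestion
acc_2 = accuracy([-0.5, 1.0], -0.2)  # second suggestion
print(acc_1, acc_2)
```

With the first choice only the 7 crosses out of 13 points are correct, while the second choice classifies all 13 points correctly.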


### How did we get there?

This is much better (100% correct on the training data). 
To obtain these values, we found a **separating hyperplane** (here a line) between the points. 
The equation of the line is 

$ y = 0.5x-0.2 $


**Quiz**
- **Can you explain why this line corresponds to the weights and bias we used?**
- **Is this separating line unique? Does it matter?**
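
As a hint for the first question, one way to see the link is to write a point as $x = (x_1, x_2)$ (coordinate names introduced here) and substitute $w = (-0.5, 1)$ and $b = -0.2$ into the step rule:

$$
\langle x, w\rangle - b > 0 \iff -0.5\,x_1 + x_2 + 0.2 > 0 \iff x_2 > 0.5\,x_1 - 0.2
$$

So the perceptron outputs 1 exactly for the points lying strictly above the line $y = 0.5x - 0.2$.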

### Illustrating the output of the Perceptron and the separating line

Copy-paste your code to visualise the crosses and circles above and overlay the separating line in red. 

Can you modify the parameters of the line a little bit and still find a separating line that "works"? 

In [None]:
# your code here..
xx = np.linspace(0, 6)
yy1 = 0.5 * xx - 0.2
yy2 = 0.4 * xx - 0.3

plt.figure()
plt.plot(xx, yy1, color="red", label="$0.5x - 0.2$")
plt.plot(xx, yy2, color="orange", label="$0.4x - 0.3$")
plt.plot(crosses[:, 0], crosses[:, 1], marker="x", linestyle="none")
plt.plot(circles[:, 0], circles[:, 1], marker="o", linestyle="none")
plt.ylim((0, 2))
plt.xlim((0, 6))
plt.legend()
plt.show()
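
To explore how much the line can move while still "working", one possibility is to test a handful of candidate lines programmatically. This is only a sketch: the `separates` helper and the small grid of slopes and intercepts are arbitrary choices introduced here, and the data is repeated so the snippet runs on its own.

```python
import numpy as np

crosses = np.array(
    [[0.5, 1.0], [1.0, 1.5], [1.5, 1.5], [2.0, 1.2], [3.0, 1.7], [1.5, 1.1], [2.1, 1.7]]
)
circles = np.array(
    [[3.0, 0.5], [4.0, 1.0], [5.0, 0.7], [4.0, 0.2], [5.1, 0.3], [4.2, 0.7]]
)


def separates(slope, intercept):
    # the line y = slope * x + intercept separates the data if every cross lies
    # strictly above it and every circle lies on or below it
    crosses_above = np.all(crosses[:, 1] > slope * crosses[:, 0] + intercept)
    circles_below = np.all(circles[:, 1] <= slope * circles[:, 0] + intercept)
    return bool(crosses_above and circles_below)


# small, arbitrary grid of candidate lines around y = 0.5x - 0.2
candidates = [
    (slope, intercept)
    for slope in np.linspace(0.3, 0.7, 5)
    for intercept in np.linspace(-0.4, 0.0, 5)
]
working = [(m, c) for m, c in candidates if separates(m, c)]
print(len(working), "of", len(candidates), "candidate lines separate the data")
```

Both lines plotted above pass this test, which already shows that the separating line is not unique.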


### Testing a few new points

Can you add the following `test_points` on the plot and discuss how they would be classified? 

In [None]:
test_points = np.array([[1, 0.5], [5, 1.5], [3, 1.1]])
test_classes = [1, 0, 1]
symbol_map = {1: "x", 0: "o"}

In [None]:
# your code here to visualise the situation
plt.figure()
plt.plot(xx, yy1, color="red", label="$0.5x - 0.2$")
plt.plot(xx, yy2, color="orange", label="$0.4x - 0.3$")
plt.plot(crosses[:, 0], crosses[:, 1], marker="x", linestyle="none")
plt.plot(circles[:, 0], circles[:, 1], marker="o", linestyle="none")
plt.ylim((0, 2))
plt.xlim((0, 6))
plt.legend()

# the points
for (x, y), s in zip(test_points, test_classes):
    marker = symbol_map[s]
    plt.plot(x, y, marker=marker, color="black", markersize=10)

plt.show()
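
Besides reading the classes off the plot, the test points can be passed through the perceptron directly. The sketch below assumes the two lines drawn above and writes each as weights and bias in the same way as before (`w = [-slope, 1]`, `b = intercept`); the `predict` helper is a name introduced here.

```python
import numpy as np

test_points = np.array([[1, 0.5], [5, 1.5], [3, 1.1]])
test_classes = np.array([1, 0, 1])


def predict(X, w, b):
    # vectorised step rule: 1 where <x, w> > b, else 0
    return (X @ np.asarray(w) > b).astype(int)


pred_red = predict(test_points, [-0.5, 1.0], -0.2)     # line y = 0.5x - 0.2
pred_orange = predict(test_points, [-0.4, 1.0], -0.3)  # line y = 0.4x - 0.3

print("expected:", test_classes)
print("red:     ", pred_red)
print("orange:  ", pred_orange)
```

With these parameters the red line puts `(3, 1.1)` on the circle side, while the orange line classifies all three test points as expected. Both lines fit the training data perfectly, yet they disagree on new points.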


# Loss Function

We've defined our model, but how do we pick the right weights? Simply put, we pick the weights that best fit our training data.

To measure this "fit" we need to define a loss function.

Implement a loss function called `loss_function(predictions, test_classes)`, where `predictions` are the model outputs for `test_points` (`test_classes` is defined above). The function should return a loss value measuring how well the model predictions match `test_classes`: lower when more predictions are correct, higher when fewer are.

In [None]:
predictions = multi_out_perceptron(test_points, [-0.5, 1.0], -0.2)

# loss: compares the model predictions with the true classes (lower is better)
def loss_function(predictions, test_classes):
    # mean squared error
    return np.mean((predictions - test_classes) ** 2)
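
Mean squared error is only one possible choice. As an illustration, the sketch below compares it with a plain error rate (the fraction of wrong predictions); `mse_loss`, `error_rate` and the made-up predictions are names and values introduced here. Because predictions and classes are all 0 or 1, each squared difference is 0 or 1, so the two measures coincide.

```python
import numpy as np


def mse_loss(predictions, targets):
    # mean squared error, the same formula as loss_function above
    return np.mean((np.asarray(predictions) - np.asarray(targets)) ** 2)


def error_rate(predictions, targets):
    # hypothetical alternative: fraction of predictions that differ from the target
    return np.mean(np.asarray(predictions) != np.asarray(targets))


targets = [1, 0, 1]
preds = [1, 0, 0]  # made-up predictions with one mistake out of three

print(mse_loss(preds, targets), error_rate(preds, targets))
```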



## = checkpoint 2 =

Here, you should copy-paste the following code. If it returns `True`, you're good to go on.

```python
predictions_a = multi_out_perceptron(test_points, [-0.5, 1.0], -0.2)
loss_a = loss_function(predictions_a, test_classes)
predictions_b = multi_out_perceptron(test_points, [-0.4, 1.0], -0.3)
loss_b = loss_function(predictions_b, test_classes)
loss_b < loss_a
```

Don't worry if this is tricky - we will cover loss functions in the next session.

In [None]:

predictions_a = multi_out_perceptron(test_points, [-0.5, 1.0], -0.2)
loss_a = loss_function(predictions_a, test_classes)
predictions_b = multi_out_perceptron(test_points, [-0.4, 1.0], -0.3)
loss_b = loss_function(predictions_b, test_classes)
loss_b < loss_a



# Gradient Descent Demonstration

We have a model and a way of measuring how good it is. Next we need an efficient way to update the model weights to get the best fit, i.e. to minimize the loss of the model (on some training data).

We will use gradient descent to do this. We will cover this in the next session. The following implements gradient descent as a demonstration.

## Considering some function

Let's consider the following arbitrary function and its gradient:

$f(x) = \exp(-\sin(x))x^2$

$f'(x) = -x \exp(-\sin(x)) (x\cos(x)-2)$

It is convenient to define Python functions which return the value of the function and its gradient at an arbitrary point $x$:


In [None]:
def function(x):
    return np.exp(-np.sin(x)) * (x**2)


def gradient(x):
    return -x * np.exp(-np.sin(x)) * (x * np.cos(x) - 2)
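
A hand-derived gradient is easy to get wrong, so it can be worth comparing it with a numerical estimate. The sketch below uses a central finite difference; `numerical_gradient`, the step `h` and the sample grid `xs` are choices introduced here, and the two functions are repeated so the snippet runs on its own.

```python
import numpy as np


def function(x):
    return np.exp(-np.sin(x)) * (x**2)


def gradient(x):
    return -x * np.exp(-np.sin(x)) * (x * np.cos(x) - 2)


def numerical_gradient(x, h=1e-5):
    # central finite difference: (f(x + h) - f(x - h)) / (2h)
    return (function(x + h) - function(x - h)) / (2 * h)


xs = np.linspace(-10, 10, 41)
# largest gap between the analytic and the numerical gradient on the grid
max_abs_diff = np.max(np.abs(gradient(xs) - numerical_gradient(xs)))
print(max_abs_diff)
```

A small maximum difference suggests that the analytic expression matches the function.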


### Visualising the function

In [None]:
x = np.linspace(-10, 10, 500)
plt.figure()
plt.plot(x, function(x))
plt.show()

### Implementing a simple GD

Now let us implement a simple Gradient Descent that uses constant stepsizes. Define two functions:

1. The simplest version, which doesn't store the intermediate steps that are taken.
2. A version which does store the steps (useful to visualize what is going on and to explain some of the typical behaviour of GD).

Let's call them `simple_GD` and `simple_GD2`. The parameters of both functions will be the initial point `x0`, the stepsize, and the number of steps to be taken.
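
With a constant stepsize $\delta$, and writing $x_k$ for the point reached after $k$ steps (notation introduced here), each step moves against the gradient:

$$
x_{k+1} = x_k - \delta\, f'(x_k), \qquad k = 0, 1, \dots, n-1
$$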

In [None]:
def simple_GD(x0, stepsize, nsteps):
    x = x0
    for k in range(0, nsteps):
        x -= stepsize * gradient(x)
    return x


def simple_GD2(x0, stepsize, nsteps):
    x = np.zeros(nsteps + 1)
    x[0] = x0
    for k in range(0, nsteps):
        x[k + 1] = x[k] - stepsize * gradient(x[k])
    return x

### Testing different situations

Try your algorithm `simple_GD` in the following cases:

* $x_0=1, \delta=0.1, n=100$
* $x_0=6, \delta=0.1, n=100$
* $x_0=8, \delta=0.01, n=100$

Can you discuss the results you obtained by having a look at the plot of the function? 
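
One way to run the three cases is sketched below. The snippet repeats the functions so that it runs on its own; `cases` and `results` are names introduced here, and no particular end points are claimed, so compare what it prints with the plot of the function.

```python
import numpy as np


def function(x):
    return np.exp(-np.sin(x)) * (x**2)


def gradient(x):
    return -x * np.exp(-np.sin(x)) * (x * np.cos(x) - 2)


def simple_GD(x0, stepsize, nsteps):
    # constant-stepsize gradient descent, keeping only the last point
    x = x0
    for _ in range(nsteps):
        x = x - stepsize * gradient(x)
    return x


# (x0, stepsize, number of steps) for the three cases listed above
cases = [(1, 0.1, 100), (6, 0.1, 100), (8, 0.01, 100)]
results = [simple_GD(x0, stepsize, nsteps) for x0, stepsize, nsteps in cases]

for (x0, stepsize, nsteps), x_end in zip(cases, results):
    print(
        f"x0={x0}, stepsize={stepsize}, n={nsteps}: "
        f"x_end={x_end:.4f}, f(x_end)={function(x_end):.4f}"
    )
```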

### Visualising the cases

Below we suggest a function `viz` which shows the path taken by the gradient descent when computed using `simple_GD2`.

We then use it on several starting points and step sizes, similar to the cases above, to see what the gradient descent does. Try to interpret each case.

In [None]:
def viz(x, a=-10, b=10):
    xx = np.linspace(a, b, 100)
    yy = function(xx)
    ygd = function(x)
    fig, ax = plt.subplots(1, 2, figsize=(10, 4))
    plt.sca(ax[0])
    plt.plot(xx, yy)
    plt.plot(x, ygd, ".-", color="k", label="steps", alpha=0.5)
    plt.plot(x[0], ygd[0], marker="o", color="green", markersize=10, label="start")
    plt.plot(
        x[len(x) - 1],
        ygd[len(x) - 1],
        marker="o",
        color="red",
        markersize=10,
        label="end",
    )
    plt.title("Global Picture")
    plt.sca(ax[1])
    plt.title("Zoom (N.B. both axes diff scales)")
    xx = np.linspace(min(x), max(x), 100)
    yy = function(xx)
    plt.plot(xx, yy)
    plt.plot(x, ygd, ".-", color="k", label="steps", alpha=0.5)
    plt.plot(x[0], ygd[0], marker="o", color="green", markersize=10, label="start")
    plt.plot(
        x[len(x) - 1],
        ygd[len(x) - 1],
        marker="o",
        color="red",
        markersize=10,
        label="end",
    )
    plt.legend()

In [None]:
x1 = simple_GD2(3, 0.1, 100)
x2 = simple_GD2(6, 0.1, 100)
x3 = simple_GD2(8, 0.01, 100)
x4 = simple_GD2(3, 0.5, 100)

viz(x1)
plt.show()
viz(x2)
plt.show()
viz(x3)
plt.show()
viz(x4)
plt.show()