# Perceptron
In this notebook, we will see the basic operation of the **perceptron**, the simplest neural network.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

The perceptron, in its most basic version, consists of a single neuron with a step activation function:

$$f({\bf x}) = \phi\Big( \sum_{i=1}^n(w_i\cdot x_i)-b\Big)$$
with
$$\phi(x)=
\begin{cases}
  1 & \text{if $x>0$} \\
  0 & \text{otherwise}
\end{cases}
$$

where we combine the inputs of the neuron with the parameters through the <a href="https://en.wikipedia.org/wiki/Dot_product" target="_blank">*inner product*</a> (a linear combination of the inputs), subtract the bias $b$, and apply the activation function. For convenience, in the next code we define these two steps as separate methods of the class `neuron`:

In [None]:
class neuron:
    def __init__(self, W, b, act_func):
        self.W = W
        self.b = b
        self.act_func = act_func

    def linear_combination(self, vector):
        return np.dot(self.W,vector)-self.b

    def compute(self, vector):
        return self.act_func(self.linear_combination(vector))

In [None]:
def step(x):
    return 1 if x>0 else 0
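
As a quick sanity check of the formula, the two steps can be traced by hand for a single input. The sketch below is self-contained; the weights, bias and input are illustrative values chosen only for this example and do not come from the rest of the notebook:

```python
import numpy as np

# Illustrative (hypothetical) parameters and input
w = np.array([0.5, -1.0, 2.0])
b = 0.25
x = np.array([1.0, 1.0, 0.5])

# Step 1: linear combination, sum_i(w_i * x_i) - b
z = np.dot(w, x) - b      # 0.5 - 1.0 + 1.0 - 0.25 = 0.25

# Step 2: step activation (1 if z > 0, otherwise 0)
out = 1 if z > 0 else 0

print("z =", z, "-> output =", out)
```

Since $z>0$ here, the neuron outputs 1; any bias of 0.5 or more would flip the output to 0 for this input.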

## The OR and AND operators
AND and OR are simple logical functions, and each of them can be represented by a perceptron. In both cases there are two operands (two dimensions, $n=2$), but each operator requires a specific configuration of the parameters. For the **AND** operator:
- $w_1=w_2=1$
- $b=1.5$

<img src="images/operator_and.png" alt="and" style="width: 200px;"/>

We can now set up the neuron with the configuration mentioned above:

In [None]:
f_and = neuron(np.ones(2),1.5,step)

and test it. Does it work as the **AND** operator?

In [None]:
for i in [0,1]:
    for j in [0,1]:
        print("Result of AND(",i,",",j,"):", f_and.compute(np.array([i,j])))

For the **OR** operator, the configuration of the parameters is:
- $w_1=w_2=1$
- $b=0.5$

<img src="images/operator_or.png" alt="or" style="width: 200px;"/>

In [None]:
f_or = neuron(np.ones(2),.5,step)

In [None]:
for i in [0,1]:
    for j in [0,1]:
        print("Result of OR(",i,",",j,"):", f_or.compute(np.array([i,j])))

It seems to work as expected, doesn't it?
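
One way to confirm this without reading the printed output is to compare each configuration against Python's own bitwise operators. The following is a minimal, self-contained sketch; the helper names `perceptron` and `matches` are introduced here for illustration only and are not part of the notebook's `neuron` class:

```python
import numpy as np
from itertools import product

def perceptron(w, b, x):
    """Step-activated neuron: 1 if w.x - b > 0, else 0."""
    return int(np.dot(w, x) - b > 0)

def matches(w, b, target):
    """Check a (w, b) configuration against a target boolean function
    on all four combinations of two binary inputs."""
    return all(perceptron(w, b, np.array([i, j])) == target(i, j)
               for i, j in product([0, 1], repeat=2))

w = np.ones(2)
and_ok = matches(w, 1.5, lambda i, j: i & j)
or_ok = matches(w, 0.5, lambda i, j: i | j)

print("AND configuration correct:", and_ok)
print("OR configuration correct:", or_ok)
```

The same helper can be used to try other values of $b$ and see which logical function, if any, they reproduce.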

But let's explore HOW it works. Let's look at the output space of the linear combination of the AND neuron, and at its transformation once the activation function is applied:

In [None]:
def plot_logical_func(model): 
    fig = plt.figure(figsize=(10, 8))
    
    x = np.linspace(-0.1, 1.1, 100)
    y = np.linspace(-0.1, 1.1, 100)
    X, Y = np.meshgrid(x, y)
    xs = np.array([0,0,1,1])
    ys = np.array([0,1,0,1])
    
    ax = fig.add_subplot(121, projection='3d')
    
    Zb = np.array([model.linear_combination(np.array([xi, yi])) for xi, yi in zip(X.ravel(), Y.ravel())])
    Zb = Zb.reshape(X.shape)  # Reshape to match X and Y grid shapes
    zbs = np.array([model.linear_combination(np.array([xi, yi])) for xi, yi in zip(xs, ys)])
    
    ax.plot_surface(X, Y, Zb, cmap='viridis', alpha=0.5)
    ax.scatter(xs, ys, zbs)
    ax.set_title("Before activation")
    
    ax = fig.add_subplot(122, projection='3d')
    Za = np.array([model.compute(np.array([xi, yi])) for xi, yi in zip(X.ravel(), Y.ravel())])
    Za = Za.reshape(X.shape)  # Reshape to match X and Y grid shapes
    zas = np.array([model.compute(np.array([xi, yi])) for xi, yi in zip(xs, ys)])
    
    ax.plot_surface(X, Y, Za, cmap='viridis', alpha=0.5)
    ax.scatter(xs, ys, zas)
    ax.set_title("After activation")
    
    plt.show()

In [None]:
plot_logical_func(f_and)

The 4 points represent the output of the logical function for all combinations of 2 binary inputs.

For the OR operator, we observe:

In [None]:
plot_logical_func(f_or)

In these plots, one can easily observe the non-linearity that the activation function introduces into the linear combination of the inputs.
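
One possible way to read the two panels, for the weights used above ($w_1=w_2=1$): before activation the neuron computes a plane, and the step function sends every point of that plane above zero to 1 and the rest to 0. The jump between the two levels therefore lies along the line where the linear combination vanishes:

$$z = x_1 + x_2 - b, \qquad z = 0 \iff x_1 + x_2 = b$$

With $b=1.5$ (AND), only the point $(1,1)$ lies on the positive side, since $1+1>1.5$. With $b=0.5$ (OR), only $(0,0)$ stays on the zero side, since $0+0\le 0.5$.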

## Questions
- What is the effect of changing the value of the bias term, $b$?
- Can we configure a single-neuron model (perceptron) to simulate the XOR-operator? Why?

<img src="images/operator_xor.png" alt="xor" style="width: 200px;"/>

Let's build a small neural network model with a single hidden layer:

In [None]:
# one hidden-layer NN model
class nn_model:
    def __init__(self, hidden_layer, output_layer):
        self.hl = hidden_layer
        self.ol = output_layer

    def linear_combination(self, vector):
        hl_out = np.array([n.linear_combination(vector) for n in self.hl])
        # The output of a layer is the input of the following one
        ol_out = np.array([n.linear_combination(hl_out) for n in self.ol])
        return [hl_out, ol_out]
        
    def compute(self, vector):
        hl_out = np.array([n.compute(vector) for n in self.hl])
        # The output of a layer is the input of the following one
        ol_out = np.array([n.compute(hl_out) for n in self.ol])
        return [hl_out, ol_out]

Now, we can define the model for the XOR operator, using a first layer with two neurons (defined just as AND and OR operators) and a single neuron in the second (output) layer, parametrized as follows:
- $w_1=-1$: this is the parameter controlling the contribution of the AND-operator's output
- $w_2=1$: parameter for OR-operator's output
- $b=0.5$

In [None]:
f_xor = nn_model([f_and, f_or], [neuron(np.array([-1,1]),.5,step)])

In [None]:
for i in [0,1]:
    for j in [0,1]:
        print("Result of XOR(",i,",",j,"):", f_xor.compute(np.array([i,j]))[1])
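
To see why this configuration works, it can help to trace the intermediate values for each input. The sketch below is self-contained and recomputes the same arithmetic with plain Python instead of the `neuron` and `nn_model` classes; the variable names are chosen here for illustration:

```python
def step(z):
    return 1 if z > 0 else 0

results = {}
print(" x1 x2 | AND OR | -AND+OR-0.5 | XOR")
for x1 in [0, 1]:
    for x2 in [0, 1]:
        s = x1 + x2                          # both hidden neurons use w1 = w2 = 1
        h_and = step(s - 1.5)                # hidden neuron 1 (AND, b = 1.5)
        h_or = step(s - 0.5)                 # hidden neuron 2 (OR, b = 0.5)
        z_out = -1 * h_and + 1 * h_or - 0.5  # output neuron before activation
        results[(x1, x2)] = step(z_out)
        print(f"  {x1}  {x2} |   {h_and}  {h_or} | {z_out:11.1f} | {results[(x1, x2)]:3d}")
```

The output neuron fires only when OR is active and AND is not, which is exactly the definition of XOR.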

In [None]:
fig = plt.figure(figsize=(10, 8))

x = np.linspace(-0.1, 1.1, 100)
y = np.linspace(-0.1, 1.1, 100)
X, Y = np.meshgrid(x, y)
xs = np.array([0,0,1,1])
ys = np.array([0,1,0,1])

ax = fig.add_subplot(121, projection='3d')

Zb = np.array([f_xor.linear_combination(np.array([xi, yi]))[1][0] for xi, yi in zip(X.ravel(), Y.ravel())])
Zb = Zb.reshape(X.shape)  # Reshape to match X and Y grid shapes
zbs = np.array([f_xor.linear_combination(np.array([xi, yi]))[1][0] for xi, yi in zip(xs, ys)])

ax.plot_surface(X, Y, Zb, cmap='viridis', alpha=0.5)
ax.scatter(xs, ys, zbs)
ax.set_title("Only linear combinations")

ax = fig.add_subplot(122, projection='3d')

Za = np.array([f_xor.compute(np.array([xi, yi]))[1][0] for xi, yi in zip(X.ravel(), Y.ravel())])
Za = Za.reshape(X.shape)  # Reshape to match X and Y grid shapes
zas = np.array([f_xor.compute(np.array([xi, yi]))[1][0] for xi, yi in zip(xs, ys)])

ax.plot_surface(X, Y, Za, cmap='viridis', alpha=0.5)
ax.scatter(xs, ys, zas)
ax.set_title("With non-linear transformations")

plt.show()

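We can observe in the previous plots that, relying exclusively on linear combinations, we could not obtain the expected result: the left-hand surface is a plane, and no plane can be above zero at $(0,1)$ and $(1,0)$ while staying at or below zero at $(0,0)$ and $(1,1)$. The activation function introduces non-linearities in the network that allow it to identify such patterns.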

## Questions
- Why do we need more than one layer of neurons?
- It is said that all the neurons in an NN model would collapse into a single neuron if they just computed linear combinations. Can you explain why?


## Algebra

In this introduction to neural networks we have presented a model as a set of operations that are executed one-by-one (all the neurons of a layer can be executed in parallel, neurons from different layers, sequentially).

However, we can also see NNs as a set of <a href="https://en.wikipedia.org/wiki/Linear_algebra" target="_blank">linear algebra calculations</a>:
- we have already presented the input of neurons and the weights of each neuron as vectors.
- the weights of the neurons in the same layer can be seen as rows of a matrix $W_l$, and the corresponding bias terms as elements of a vector $\mathbf{b}_l$ such that, given layer $l$'s input $\mathbf{x}_l$, the output of the layer is the vector:

$$\mathbf{o}_l=\phi(W_l\cdot\mathbf{x}_l-\mathbf{b}_l)$$

- the concatenation of layers can be seen as the concatenation of such calculations where the input of each layer is the output of the previous one ($\mathbf{x}_l = \mathbf{o}_{l-1}$)
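
This matrix form also gives a way to check the claim from the questions above, that layers computing only linear combinations collapse into a single one. Without $\phi$, two consecutive layers give $W_2(W_1\mathbf{x}-\mathbf{b}_1)-\mathbf{b}_2 = (W_2W_1)\mathbf{x}-(W_2\mathbf{b}_1+\mathbf{b}_2)$, which has the form of one layer with weights $W_2W_1$ and bias $W_2\mathbf{b}_1+\mathbf{b}_2$. A minimal numerical check, assuming randomly drawn parameters and layer sizes that are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Hypothetical sizes: 3 inputs -> 4 hidden neurons -> 2 outputs
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=(2, 1))
x = rng.normal(size=(3, 1))

# Two layers applied in sequence, with no activation function
two_layers = W2 @ (W1 @ x - b1) - b2

# Equivalent single layer
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x - b

print(np.allclose(two_layers, one_layer))
```

The comparison uses `np.allclose` rather than exact equality because the two expressions accumulate floating-point rounding in a different order.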

In [None]:
# a version of the step function that can deal with vectors (element-wise)
def step(x):
    return (x>0).astype(float)

# one hidden-layer NN model as a concatenation of linear algebra operations
class nn_model_alg:
    def __init__(self, weights, bias_terms, f):
        self.W = weights
        self.b = bias_terms
        self.phi = f
        
    def compute(self, x):
        x_new = x.copy()
        for W, b in zip(self.W, self.b):
            x_new = self.phi(np.matmul(W,x_new)-b)
        return x_new

The parameters of the network for the XOR operator, formatted as explained above, would be:

In [None]:
W0 = np.ones((2,2)) # weights of the hidden layer (2 neurons) as rows
b0 = np.array([[1.5,0.5]]).T # as a column
W1 = np.array([[-1.,1.]]) # a row
b1 = np.array([[0.5]])

f_xor = nn_model_alg([W0,W1],[b0,b1],step)

In [None]:
for i in [0,1]:
    for j in [0,1]:
        print("Result of XOR(",i,",",j,"):", f_xor.compute(np.array([[i,j]]).T)[0,0])
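
A further advantage of the algebraic view is that several inputs can be evaluated at once by stacking them as the columns of a matrix. The sketch below is self-contained and repeats the XOR parameters from above; the names `X`, `H` and `out` are introduced here for illustration, and it relies on NumPy broadcasting the bias column over all input columns:

```python
import numpy as np

def step(z):
    return (z > 0).astype(float)

# Same XOR parameters as above
W0, b0 = np.ones((2, 2)), np.array([[1.5], [0.5]])
W1, b1 = np.array([[-1.0, 1.0]]), np.array([[0.5]])

# Each column is one input: (0,0), (0,1), (1,0), (1,1)
X = np.array([[0, 0, 1, 1],
              [0, 1, 0, 1]])

H = step(W0 @ X - b0)      # hidden layer, shape (2, 4)
out = step(W1 @ H - b1)    # output layer, shape (1, 4)

print(out)
```

Each column of `out` is the network's answer for the corresponding column of `X`, so the double loop over `i` and `j` is replaced by two matrix products.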