# Perceptron

The perceptron is the simplest possible neural network. Originally it was a *binary classifier* (that is, it decides whether something is true or false). <br>
The algorithm was invented by *Frank Rosenblatt* in 1958 (see the English [Wikipedia](https://en.wikipedia.org/wiki/Perceptron) article for more historical details).

Consider the following model of a perceptron:

<img src="data/perceptron.svg" alt="perceptron" style="width: 100%;"/>

We can then rewrite that model as a single formula:

$$perceptron(x_{1}, x_{2}, \dots, x_{n}) = \phi\bigg( \theta +  \sum_{i=1}^{n} w_ix_i\bigg)$$

Where $x_{1}, x_{2}, \dots, x_{n}$ are *inputs*, $w_{1}, w_{2}, \dots, w_{n}$ are *weights*, $\theta$ is a *bias* and $\phi$ is an *activation function*.
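The formula can be sketched directly in plain Python. The step activation `step` and the example weights below are assumptions chosen for illustration (they make the unit behave like a logical AND); they do not come from the text above.

```python
def step(z):
    """Heaviside step activation: 1 if z >= 0, else 0 (an assumed choice of phi)."""
    return 1 if z >= 0 else 0


def perceptron(inputs, weights, bias, activation=step):
    """Compute phi(theta + sum_i w_i * x_i) for one input sample."""
    weighted_sum = bias + sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)


# Hypothetical weights and bias that make the unit act as a logical AND.
and_weights = [1.0, 1.0]
and_bias = -1.5

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron([x1, x2], and_weights, and_bias))
```

With these values only the input `(1, 1)` pushes the weighted sum above zero, so it is the only case where the unit outputs 1.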

We can also rewrite this equation in *vector form*:

$$perceptron(\mathbf{x}) = \phi(\mathbf{w_{A}} \cdot \mathbf{x_{A}})$$

Where
$\mathbf{x} = [ x_{1}, x_{2}, \dots, x_{n}]$ is the input vector,
$\mathbf{x_A} = [1, x_{1}, x_{2}, \dots, x_{n}]$ is the *augmented input vector* and
$\mathbf{w_A} = [\theta, w_{1}, w_{2}, \dots, w_{n}]$ is the *augmented weight vector*. Note that this form is much better suited to implementation (it is just one vector operation followed by one function applied to the result).
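As a quick check that the augmented form gives the same number as the scalar formula, here is a small NumPy sketch. The input, weight and bias values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical example values, chosen only for illustration.
x = np.array([0.5, -1.0, 2.0])   # input vector
w = np.array([1.2, 2.4, -5.0])   # weights
theta = 0.3                      # bias

# Augmented vectors: prepend 1 to the inputs and the bias to the weights.
x_aug = np.concatenate(([1.0], x))
w_aug = np.concatenate(([theta], w))

scalar_form = theta + np.sum(w * x)   # theta + sum_i w_i * x_i
vector_form = np.dot(w_aug, x_aug)    # w_A . x_A

print(scalar_form, vector_form)
print(np.isclose(scalar_form, vector_form))
```

The leading 1 in the augmented input multiplies the bias, so the bias no longer needs separate handling.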

## First learning algorithm - Gradient Descent (GD)

Now that we have defined our model (we have an intuitive idea of how a neural unit in the brain works) and we also have a mathematical equation (a model of a neural unit), we need to specify how to make this neural unit learn.

We will use the well-known idea of *optimization*. We will define a criterion that quantifies how bad our neural unit is. We will call it the **Error**.

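The Error is a function of the augmented weight vector. For this first algorithm we take the activation $\phi$ to be the identity (a linear neural unit), so the output is simply $\mathbf{w_A} \cdot \mathbf{x_A}$, and we define the Error as follows: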

$$Error(\mathbf{w_A}) = e^2 = (perceptron(\mathbf{x}) - y_{real})^2 = (\mathbf{w_A} \cdot \mathbf{x_A} - y_{real})^2$$

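Now, since our adaptive parameters are $\mathbf{w}$ (from here on we drop the subscript $A$ and write $\mathbf{w}$ and $\mathbf{x}$ for the augmented vectors), we need the partial derivative of the $Error$ function with respect to $\mathbf{w}$ to determine the gradient: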

$$\frac{\partial Error(\mathbf{w})}{\partial \mathbf{w}} = \frac{\partial e^2}{\partial \mathbf{w}} = \frac{\partial e^2}{\partial e} \frac{\partial e}{\partial \mathbf{w}} = \frac{\partial e^2}{\partial e} \frac{\partial (\mathbf{x} \cdot \mathbf{w} - y_{real})}{\partial \mathbf{w}} = 2e\mathbf{x}$$
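One way to sanity-check this result is to compare it with a numerical gradient. The sketch below uses central finite differences on hypothetical values; the helper names `error` and `analytic_gradient` are introduced here only for illustration.

```python
import numpy as np


def error(w, x, y_real):
    """Squared error of a linear unit: (w . x - y_real)^2."""
    return (np.dot(w, x) - y_real) ** 2


def analytic_gradient(w, x, y_real):
    """Gradient from the derivation: 2 * e * x, with e = w . x - y_real."""
    e = np.dot(w, x) - y_real
    return 2.0 * e * x


# Hypothetical values, chosen only for illustration.
w = np.array([0.1, -0.2, 0.3])
x = np.array([1.0, 2.0, -1.0])
y_real = 0.5

# Central finite differences: perturb one weight at a time by +/- h.
h = 1e-6
numeric_gradient = np.zeros_like(w)
for i in range(len(w)):
    step = np.zeros_like(w)
    step[i] = h
    numeric_gradient[i] = (error(w + step, x, y_real) - error(w - step, x, y_real)) / (2 * h)

print(analytic_gradient(w, x, y_real))
print(numeric_gradient)
```

The two printed vectors should agree up to floating-point rounding.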

We have now determined the gradient, and since we want to minimize the Error function, our update rule can be written as:

$$\mathbf{w}[k+1] = \mathbf{w}[k] + \mathbf{\Delta w}[k] $$

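Where $\mathbf{\Delta w}[k]$ is a step against the gradient (the constant factor $2$ is absorbed into the learning rate $\mu$):

$$\mathbf{\Delta w}[k] = -\mu e[k] \mathbf{x}[k] $$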

Note that $\mu$ is the learning rate, usually set to around $0.01$ or $0.001$.
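To see the update rule in action, here is a minimal sketch that applies it repeatedly to a single sample. The sample values are hypothetical. With $e$ defined as prediction minus target, as in the Error definition, the step that decreases the error subtracts $\mu e \mathbf{x}$ from the weights.

```python
import numpy as np

mu = 0.01                          # learning rate
w = np.zeros(3)                    # initial weights
x = np.array([1.0, 0.5, -2.0])     # hypothetical input sample
y_real = 1.0                       # hypothetical target

squared_errors = []
for k in range(5):
    e = np.dot(w, x) - y_real      # e = prediction - target
    squared_errors.append(e ** 2)
    w = w - mu * e * x             # w[k+1] = w[k] + delta_w[k], delta_w[k] = -mu * e * x
    print(k, e ** 2)
```

Each step shrinks the error on this sample by a constant factor, so the printed squared error decreases from one iteration to the next.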

In [2]:
import numpy as np

In [3]:
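class LNU:
    """Linear neural unit: a perceptron with identity activation."""

    def __init__(self, NumWeights):
        self.nw = NumWeights
        # Column vector of weights, initialised to zeros.
        self.Weights = np.zeros((self.nw, 1))
        print(self.Weights)
        print(np.shape(self.Weights))

    def value(self, vectorX):
        # Output of the unit: dot product of the inputs (row vector) and the weights.
        return np.dot(vectorX, self.Weights)

    def trainGD(self, MatrixX, VectorY):
        # One pass (epoch) of sample-by-sample gradient descent.
        LearningRate = 0.01
        rows = np.shape(MatrixX)[0]
        for k in range(rows):
            currentX = MatrixX[k][np.newaxis, :]  # shape (1, nw)
            value = self.value(currentX)          # shape (1, 1)
            error = value - VectorY[k]
            # Step against the gradient; the factor 2 is absorbed into LearningRate.
            deltaWeights = -LearningRate * error * currentX
            self.Weights = self.Weights + deltaWeights.T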

In [4]:
nfeatures = 4
MatX = np.random.rand(100,nfeatures)
coefs = np.array([[1.2],[2.4],[-5.0],[4.0]])
vectorY = np.dot(MatX, coefs)

In [5]:
print(np.shape(MatX), np.shape(coefs), np.shape(vectorY))

(100, 4) (4, 1) (100, 1)


In [6]:
NeuralUnit = LNU(nfeatures)

[[0.]
 [0.]
 [0.]
 [0.]]
(4, 1)


In [7]:
NeuralUnit.trainGD(MatX, vectorY)

In [8]:
NeuralUnit.Weights

array([[0.53671256],
       [0.52225859],
       [0.06095515],
       [0.59568454]])

In [9]:
coefs

array([[ 1.2],
       [ 2.4],
       [-5. ],
       [ 4. ]])

In [37]:
epochs = 100
for i in range(epochs):
    NeuralUnit.trainGD(MatX, vectorY)

In [38]:
NeuralUnit.Weights

array([[ 1.20015675],
       [ 2.39981144],
       [-4.99965115],
       [ 3.99969195]])

In [39]:
coefs

array([[ 1.2],
       [ 2.4],
       [-5. ],
       [ 4. ]])
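To watch the Error shrink during training rather than only comparing the final weights, one option is to record the mean squared error after each epoch. The sketch below is a self-contained, function-free variant of the training loop above, not the `LNU` class itself. The fixed random seed and the names `mse_history` and `true_coefs` are assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed (an assumption, for repeatability)
X = rng.random((100, 4))                  # same shape as MatX above
true_coefs = np.array([1.2, 2.4, -5.0, 4.0])
y = X @ true_coefs                        # noiseless targets, as in the cells above

weights = np.zeros(4)
learning_rate = 0.01
mse_history = []

for epoch in range(100):
    for k in range(X.shape[0]):
        error = X[k] @ weights - y[k]                 # prediction - target
        weights -= learning_rate * error * X[k]       # step against the gradient
    # Mean squared error over the whole data set after this epoch.
    mse_history.append(np.mean((X @ weights - y) ** 2))

print(mse_history[0], mse_history[-1])
print(weights)
```

Plotting `mse_history` against the epoch number gives a simple learning curve for choosing the number of epochs or the learning rate.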