

# Neural Networks



Inspired by biological neural networks, Artificial Neural Networks are computing systems made up of interconnected processing elements (perceptrons). A network receives a series of inputs $x_1, x_2, \ldots, x_n$ (or a vector $X \in \mathbb{R}^n$). These inputs are processed by the perceptrons, and an output $y_1, y_2, \ldots, y_m$ (or a vector $Y \in \mathbb{R}^m$) is produced.

To train a neural network, pairs $(X, Y)$ are provided. Given an input $X$, the perceptrons adjust their weights so that the output generated by the network is as similar as possible to $Y$.

After training, when a new, unseen input $X'$ is provided, the network produces the estimated output $\tilde{Y}$ (for example, the predicted class).


## The perceptron

The perceptron computes a weighted sum of its inputs, applies an activation function, and then feeds the result forward.

![ANN_perceptron](images/ANN_perceptron.png)


The process can be formulated as:

$$out=f\left(\sum_{i=1}^{n} w_i x_i+b\right)$$
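The formula above can be sketched in a few lines of NumPy. This is only an illustration: the function name `perceptron` and the example values for `x`, `w` and `b` are assumptions made for this sketch, and `np.tanh` is used as one possible choice of $f$.

```python
import numpy as np

def perceptron(x, w, b, f):
    """Compute out = f(sum_i w_i * x_i + b) for a single perceptron."""
    z = np.dot(w, x) + b  # weighted sum of the inputs plus the bias
    return f(z)           # apply the activation function

# Hypothetical example values (not taken from the text)
x = np.array([1.0, 2.0, 3.0])     # inputs x_1, x_2, x_3
w = np.array([0.5, -1.0, 0.25])   # weights w_1, w_2, w_3
b = 0.5                           # bias

out = perceptron(x, w, b, np.tanh)
print(out)
```

Passing the activation as an argument keeps the weighted sum separate from the choice of $f$, so the same function works with any activation.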

The activation function $f$ is a non-linear function. The most popular ones are:
* ReLU: $f(x)=\max(x,0)$
* TanH: $f(x)=\tanh(x)$
* Sigmoid: $f(x)=\frac{1}{1+e^{-x}}$
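As a minimal sketch, the three activation functions listed above could be written with NumPy as follows. The function names `relu`, `tanh` and `sigmoid` and the sample vector `z` are chosen here for illustration only.

```python
import numpy as np

def relu(x):
    # f(x) = max(x, 0), applied element-wise
    return np.maximum(x, 0.0)

def tanh(x):
    # f(x) = tanh(x), applied element-wise
    return np.tanh(x)

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x)), applied element-wise
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sample values to compare the three functions
z = np.array([-2.0, 0.0, 2.0])
print(relu(z))
print(tanh(z))
print(sigmoid(z))
```

ReLU leaves positive values unchanged and clips negative ones to zero, while TanH and Sigmoid squash their input into the ranges $(-1, 1)$ and $(0, 1)$ respectively.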

During training, the weights $w_i$ (and the bias $b$) are adjusted so that the output approximates the real value.
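The text does not say how the weights are adjusted. One common option is gradient descent, which the sketch below uses for a single sigmoid perceptron. Everything specific here is an assumption made for illustration: the toy data (logical OR), the squared-error loss, the learning rate and the number of iterations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy training pairs (assumed for this sketch): the logical OR function
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])

w = np.zeros(2)      # weights w_1, w_2
b = 0.0              # bias
learning_rate = 0.5  # assumed step size

def mean_squared_error(w, b):
    return np.mean((sigmoid(X @ w + b) - y) ** 2)

initial_error = mean_squared_error(w, b)

for _ in range(2000):
    out = sigmoid(X @ w + b)             # current outputs for all examples
    error = out - y                      # difference from the real values
    grad_z = error * out * (1.0 - out)   # chain rule through the sigmoid
    w -= learning_rate * (X.T @ grad_z) / len(X)  # adjust the weights
    b -= learning_rate * grad_z.mean()            # adjust the bias

final_error = mean_squared_error(w, b)
print(initial_error, final_error)
```

Each iteration moves the weights and the bias a small step in the direction that reduces the squared difference between the output and the real value.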



## Feed-forward Neural Networks

Perceptrons can be connected to each other to create a network. Networks that do not form cycles are known as feed-forward neural networks. These networks are often structured in layers.

![ANN_network](images/ANN_network.png)

An example of a simple network is the multilayer perceptron, in which perceptrons are structured in layers. These layers are classified into three types:
* Input layer: A layer in which the perceptrons receive the input and feed it to the next layer.
* Hidden layer (one or several): A layer in which perceptrons gather the inputs from the previous layer (either the input layer or another hidden layer), perform the computations and feed the result to the next layer.
* Output layer: A layer in which perceptrons perform the computations and provide the final output of the function that the network is approximating.
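The three layer types can be sketched as a forward pass through a small multilayer perceptron. The layer sizes (4 inputs, 3 hidden perceptrons, 2 outputs), the random weights and the choice of ReLU and Sigmoid activations are all assumptions made for this example.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 inputs, one hidden layer of 3 perceptrons, 2 outputs
W_hidden = rng.normal(size=(3, 4))  # one row of weights per hidden perceptron
b_hidden = np.zeros(3)
W_output = rng.normal(size=(2, 3))  # one row of weights per output perceptron
b_output = np.zeros(2)

def forward(x):
    # Input layer: receives x and feeds it to the next layer unchanged
    # Hidden layer: weighted sums of the inputs followed by an activation
    hidden = relu(W_hidden @ x + b_hidden)
    # Output layer: weighted sums of the hidden outputs, final activation
    return sigmoid(W_output @ hidden + b_output)

x = np.array([0.5, -1.0, 2.0, 0.0])
y = forward(x)
print(y)
```

Adding more hidden layers would only mean repeating the hidden-layer line with another pair of weights and biases.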

As the perceptrons are structured in layers, the weights can be represented as $w_{input,perceptron}$, as in the following image:

![ANN_color](images/ANN_color.png)

A layer with four inputs and three perceptrons ($a$, $b$ and $c$) can be modeled as:

$$
\begin{bmatrix}
w_{1a} & w_{2a} & w_{3a} & w_{4a}\\ 
w_{1b} & w_{2b} & w_{3b} & w_{4b}\\ 
w_{1c} & w_{2c} & w_{3c} & w_{4c}\\ 
\end{bmatrix}
\begin{bmatrix}
x_1\\ 
x_2\\ 
x_3\\
x_4\\
\end{bmatrix}
+
\begin{bmatrix}
b_a\\ 
b_b\\ 
b_c\\
\end{bmatrix}
\overset{f}{\rightarrow}
\begin{bmatrix}
y_a\\ 
y_b\\ 
y_c\\
\end{bmatrix}
$$


This can be expressed compactly in matrix notation: the layer takes an input matrix $X$ (where each column is one input), a weight matrix $W$ and a bias vector $b$, and computes:

$$Y=f(WX+b)$$

where $f$ is applied element-wise and each column of $Y$ is the output for the corresponding column of $X$.
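A minimal NumPy sketch of the matrix form of a layer, using the same shapes as the example above (three perceptrons, four inputs). The numeric values and the choice of `np.tanh` as $f$ are assumptions for illustration. The bias is stored as a column so that it is added to every column of `W @ X`.

```python
import numpy as np

# Weight matrix: one row per perceptron (a, b, c), one column per input (1..4)
W = np.array([[0.1,  0.2,  0.3,  0.4],
              [0.0, -0.1,  0.2, -0.3],
              [0.5,  0.5, -0.5, -0.5]])

# Bias as a column vector so it broadcasts over the columns of X
b = np.array([[0.1], [0.2], [0.3]])

# Two hypothetical inputs, stored as the columns of X (shape 4 x 2)
X = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [3.0, 0.0],
              [4.0, 1.0]])

Y = np.tanh(W @ X + b)  # shape 3 x 2: one column of outputs per input

# The same result for the first input, computed perceptron by perceptron
first_column = np.array([
    np.tanh(np.dot(W[row], X[:, 0]) + b[row, 0]) for row in range(W.shape[0])
])
print(Y)
```

Computing the whole layer as one matrix product gives the same values as evaluating each perceptron separately, but handles all perceptrons and all inputs at once.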



## References

* [Improving Transductive Data Selection Algorithms for Machine Translation](http://doras.dcu.ie/23726/1/thesis_AlbertoPoncelas.pdf)
* <https://www.jeremyjordan.me/intro-to-neural-networks/>
