# Artificial Neural Networks

---

Artificial Neural Networks (ANNs) try to imitate the way the human brain works. To do that, they emulate the way the neurons in a human brain work.

---

## Neurons

The first step in creating a neural network is to create a neuron. A neuron is the most basic part of the brain: on one end the neuron has dendrites, which receive signals, and on the other end is the axon, which sends them. The dendrites of one neuron are connected to the axons of other neurons. The neurons are not physically connected, but they do receive signals from other neurons. The space between two neurons in a connection is called a synapse.

When building an Artificial Neural Network, a neuron, sometimes called a node, is also the most basic component. The neuron gets some input signals and has an output signal, equivalent to the dendrites and the axon. The connections between neurons play the role of the synapses.

### Layers

An Artificial Neural Network is composed of multiple layers, each with multiple neurons. The first layer is the input layer, the last layer is the output layer, and everything in between is part of the hidden layers. Every neuron in the input layer represents an independent variable, and all the input values are expected to be standardized. The output layer holds the prediction; it can be a continuous value, a binary value, or a categorical value.
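As a minimal sketch of what standardizing an input variable can look like, the snippet below rescales a list of values to zero mean and unit standard deviation (the z-score). The function name `standardize` and the sample `ages` values are hypothetical, and z-scoring is only one common choice; the text does not say which scaling is intended.

```python
import statistics

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    # Note: a constant column has stdev 0 and would need special handling.
    return [(v - mean) / stdev for v in values]

# Hypothetical raw values for one independent variable.
ages = [20.0, 35.0, 50.0, 65.0]
print(standardize(ages))
```

Each independent variable would be standardized separately before being fed to its input neuron.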

### Synapses

The synapses between neurons are weighted, which means that every input value carries a different importance for the prediction. In each neuron, the products of the input values ($x$) and the weights of the synapses ($w$) are added up, and an activation function ($\phi$) determines what value is passed on to the next layer.

$$ \phi ( \sum_{i=1}^{m} w_{i} x_{i} ) $$
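The formula above can be sketched in a few lines of Python. The helper name `neuron_output`, the sample inputs, and the sample weights are hypothetical, and the sigmoid is used as $\phi$ purely as an assumed choice for illustration.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, used here as an example activation function phi."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, activation=sigmoid):
    """Return phi(sum_i w_i * x_i) for a single neuron."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)

# Hypothetical standardized inputs and synapse weights.
x = [0.5, -1.0, 2.0]
w = [0.4, 0.3, -0.1]
print(neuron_output(x, w))
```

Passing a different function as `activation` swaps $\phi$ without touching the weighted sum.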

---

## Activation Function

The activation function is the last step of what the neuron does, and its result is the first thing passed to the next layer. Based on what was calculated from the input values, the activation function decides what the current neuron is going to pass on. There are multiple activation functions; here are some examples:

Given the weighted sum of a neuron's inputs, $\displaystyle x = \sum_{i=1}^{m} w_{i}x_{i}$ (in the formulas below, $x$ denotes this sum rather than a single input value):

### Threshold Function

$\phi(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$

### Sigmoid Function

$\phi(x) = {1 \over 1 + e^{-x}}$

### Rectifier Function

$\phi(x) = \max(x, 0)$

### Hyperbolic Tangent (tanh)

$\phi(x) = {1 - e^{-2x} \over {1 + e^{-2x}}}$
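The four functions above translate almost directly into Python. This is a minimal sketch that follows the formulas literally; the function names are chosen here for illustration.

```python
import math

def threshold(x):
    """1 if x >= 0, otherwise 0."""
    return 1.0 if x >= 0 else 0.0

def sigmoid(x):
    """1 / (1 + e^-x), a smooth curve between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def rectifier(x):
    """max(x, 0)."""
    return max(x, 0.0)

def tanh(x):
    """(1 - e^-2x) / (1 + e^-2x), a smooth curve between -1 and 1."""
    return (1.0 - math.exp(-2.0 * x)) / (1.0 + math.exp(-2.0 * x))

for x in (-2.0, 0.0, 2.0):
    print(x, threshold(x), sigmoid(x), rectifier(x), tanh(x))
```

Written this way, `sigmoid` and `tanh` can overflow for large negative `x`; in practice a numerically safer form such as the standard library's `math.tanh` would likely be preferable.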

---

## How do Neural Networks work?

If an Artificial Neural Network had only input and output layers, it wouldn't be much different from a multiple linear regression or a similar linear model. The advantage over such models lies in the hidden layers. Each neuron in a hidden layer looks for a specific feature or relationship between input values and then passes that value to the next layer, which does the same thing, until the output layer is reached.
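A small illustration of what a hidden layer adds: the XOR relationship between two inputs cannot be produced by a single threshold neuron, but two hidden neurons that each detect a simpler feature (OR and AND) make it possible. The weights below are hand-picked rather than learned, and the constant `1.0` appended to each layer's inputs acts as a bias term, which is an assumption not covered in the text.

```python
def threshold(z):
    """Threshold activation: 1 if z >= 0, otherwise 0."""
    return 1.0 if z >= 0 else 0.0

def layer_forward(inputs, weight_rows, activation):
    """Each row of weight_rows holds the incoming weights of one neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs)))
            for row in weight_rows]

# Hand-picked (not learned) weights; the last weight in each row
# multiplies the constant 1.0 input that serves as a bias.
HIDDEN_WEIGHTS = [[1.0, 1.0, -0.5],   # fires if x1 OR x2
                  [1.0, 1.0, -1.5]]   # fires if x1 AND x2
OUTPUT_WEIGHTS = [[1.0, -1.0, -0.5]]  # fires if OR but not AND

def xor_network(x1, x2):
    hidden = layer_forward([x1, x2, 1.0], HIDDEN_WEIGHTS, threshold)
    output = layer_forward(hidden + [1.0], OUTPUT_WEIGHTS, threshold)
    return output[0]

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, xor_network(a, b))
```

The output neuron never sees the raw inputs; it only combines the two features found by the hidden layer.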

---

## How do Neural Networks learn?

An Artificial Neural Network learns on its own. Unlike models whose rules are hard coded, an ANN can "adjust" itself to the data it is fitted on: after every training session it adjusts the weights of its neurons to improve its chances of making a good prediction. In general, the more data you feed it, the better it gets, because it has more reference points to compare future observations against.

---

## Gradient Descent

To adjust the weights of the synapses, an ANN cannot simply try every combination of values: the number of combinations grows exponentially with the number of weights, so even for an ANN with only 25 synapses, testing them all would take a computer far longer than years.

Instead, it adjusts the weight values using a cost function, which measures the error between the prediction ($\hat{y}$) and the actual value ($y$), and minimizes that cost a little more with every iteration of the model:
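To get a feel for the scale, here is the arithmetic under two assumptions that are not in the text: each weight is tried at 1,000 candidate values, and the machine evaluates $10^{15}$ combinations per second, which is very optimistic.

```python
CANDIDATES_PER_WEIGHT = 1000        # assumed grid resolution per weight
NUM_WEIGHTS = 25                    # synapses in the example network
EVALS_PER_SECOND = 10 ** 15         # assumed, very optimistic machine speed
SECONDS_PER_YEAR = 365 * 24 * 3600

combinations = CANDIDATES_PER_WEIGHT ** NUM_WEIGHTS
years = combinations // (EVALS_PER_SECOND * SECONDS_PER_YEAR)
print(f"{combinations:.1e} combinations, about {years:.1e} years")
```

Different assumptions change the exact figures, but the exponent grows with every added weight, so brute force stays out of reach.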

$C = {1 \over 2}(\hat{y}-y)^2$
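A minimal sketch of gradient descent on this cost, assuming the simplest possible model: a single linear neuron $\hat{y} = w x$ with one weight. For that model the slope of the cost is $\partial C / \partial w = (\hat{y} - y)\,x$, and each step moves $w$ a little in the opposite direction. The function names, the learning rate, and the sample values are hypothetical.

```python
def cost(y_hat, y):
    """C = 1/2 * (y_hat - y)^2."""
    return 0.5 * (y_hat - y) ** 2

def gradient_descent_1d(x, y, w=0.0, learning_rate=0.1, steps=50):
    """Minimize the cost of the model y_hat = w * x for one observation."""
    for _ in range(steps):
        y_hat = w * x
        grad = (y_hat - y) * x          # dC/dw for this model
        w -= learning_rate * grad       # step against the slope
    return w

# Hypothetical observation: input 2.0 with target 4.0, so the ideal w is 2.0.
w = gradient_descent_1d(x=2.0, y=4.0)
print(w, cost(w * 2.0, 4.0))
```

Instead of testing many candidate weights, each iteration only needs the slope at the current weight.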

### Stochastic Gradient Descent

Plain Gradient Descent relies on the cost function being convex; if it is not, the method can get stuck in a local minimum instead of the global one. Stochastic Gradient Descent is an alternative: it adjusts the weights after each row of data is processed, whereas plain (batch) Gradient Descent adjusts them only after the whole batch of rows has been processed.
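The difference between the two update schedules can be sketched on a one-weight linear model $\hat{y} = w x$. The toy data, function names, learning rate, and the choice to average the gradient over the batch are all assumptions made for illustration.

```python
import random

# Hypothetical toy data generated from y = 3 * x.
DATA = [(x, 3.0 * x) for x in (-1.0, -0.5, 0.5, 1.0, 1.5)]

def batch_gradient_descent(data, w=0.0, learning_rate=0.1, epochs=100):
    """One weight update per pass, using the gradient averaged over all rows."""
    for _ in range(epochs):
        grad = sum((w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad
    return w

def stochastic_gradient_descent(data, w=0.0, learning_rate=0.1,
                                epochs=100, seed=0):
    """One weight update per row, visiting the rows in shuffled order."""
    rng = random.Random(seed)
    rows = list(data)
    for _ in range(epochs):
        rng.shuffle(rows)
        for x, y in rows:
            w -= learning_rate * (w * x - y) * x
    return w

print(batch_gradient_descent(DATA))
print(stochastic_gradient_descent(DATA))
```

On this convex toy problem both variants approach the same weight; the stochastic version simply makes many small, noisier updates per pass.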

---

## Backpropagation

Forward propagation is the way the data flows through the ANN, from the input layer to the output layer. After that, the errors are calculated and then backpropagated through the ANN so it can adjust the weights. The advantage of this is that it allows the ANN to adjust all the weights at the same time.
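As a hedged sketch of how the error can be sent backwards, consider a tiny network with one hidden layer of sigmoid neurons and a single linear output neuron, with cost $C = {1 \over 2}(\hat{y}-y)^2$. This architecture, the function names, and the sample values are assumptions for illustration. One backward pass yields the slope of the cost with respect to every weight.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_hidden, w_out):
    """Forward propagation: inputs -> hidden activations -> prediction."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    y_hat = sum(w * h for w, h in zip(w_out, hidden))
    return hidden, y_hat

def backward(x, y, w_hidden, w_out):
    """Return dC/dw for every weight, with C = 0.5 * (y_hat - y) ** 2."""
    hidden, y_hat = forward(x, w_hidden, w_out)
    delta_out = y_hat - y                          # dC/dy_hat
    grad_out = [delta_out * h for h in hidden]
    # Error sent back to each hidden neuron, scaled by the sigmoid slope.
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(w_out, hidden)]
    grad_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_hidden, grad_out

# Hypothetical observation and weights.
grad_hidden, grad_out = backward([0.5, -1.0], 1.0,
                                 [[0.1, 0.2], [-0.3, 0.4]], [0.5, -0.6])
print(grad_hidden, grad_out)
```

Because the output error is reused for every hidden neuron, all gradients come out of a single pass.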

---

## ANN Training with Stochastic Gradient Descent

**STEP 1**: Randomly initialise the weights to small numbers close to 0 (but not 0).

**STEP 2**: Input the first observation of your dataset in the input layer, each feature in one input node.

**STEP 3**: Forward-Propagation: from left to right, the neurons are activated in a way that the impact of each neuron's activation is limited by the weights. Propagate the activations until getting the predicted result $\hat{y}$.

**STEP 4**: Compare the predicted result to the actual result. Measure the generated error.

**STEP 5**: Back-Propagation: from right to left, the error is back-propagated. Update the weights according to how much they are responsible for the error. The learning rate decides by how much we update the weights.

**STEP 6**: Repeat **STEPS 2** to **5** and update the weights after each observation (Stochastic, or online, learning). Or: Repeat **STEPS 2** to **5** but update the weights only after a batch of observations (Batch Learning).
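The steps above can be sketched end to end for a small network, updating the weights after each observation. Everything specific here is an assumption made for illustration: one hidden layer of three sigmoid neurons, a single linear output neuron, no bias terms, the toy dataset, and the names `predict`, `mean_cost`, and `train_sgd`.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w_hidden, w_out):
    """Forward-propagate one observation; return (hidden activations, y_hat)."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return hidden, sum(w * h for w, h in zip(w_out, hidden))

def mean_cost(data, w_hidden, w_out):
    """Average of 0.5 * (y_hat - y) ** 2 over the dataset."""
    return sum(0.5 * (predict(x, w_hidden, w_out)[1] - y) ** 2
               for x, y in data) / len(data)

def train_sgd(data, n_hidden=3, learning_rate=0.1, epochs=200, seed=0):
    rng = random.Random(seed)
    n_inputs = len(data[0][0])
    # STEP 1: small random weights close to 0.
    w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
    for _ in range(epochs):                       # STEP 6: repeat steps 2 to 5
        for x, y in data:                         # STEP 2: one observation
            hidden, y_hat = predict(x, w_hidden, w_out)   # STEP 3: forward
            error = y_hat - y                     # STEP 4: measure the error
            # STEP 5: back-propagate and update every weight.
            for j, h in enumerate(hidden):
                delta_hidden = error * w_out[j] * h * (1.0 - h)
                w_out[j] -= learning_rate * error * h
                for i, xi in enumerate(x):
                    w_hidden[j][i] -= learning_rate * delta_hidden * xi
    return w_hidden, w_out

# Hypothetical dataset: two standardized-looking features and a linear target.
GRID = (-1.0, -0.5, 0.0, 0.5, 1.0)
DATA = [([a, b], 0.5 * a - 0.25 * b) for a in GRID for b in GRID]

untrained = train_sgd(DATA, epochs=0)   # same seed, so same initial weights
trained = train_sgd(DATA)
print(mean_cost(DATA, *untrained), mean_cost(DATA, *trained))
```

For the batch variant, the per-observation gradients would be accumulated and applied once per batch instead of inside the inner loop.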

---

## Additional Reading

*Deep sparse rectifier neural networks*. By Xavier Glorot et al. (2011). Link: https://proceedings.mlr.press/v15/glorot11a.

*A list of cost functions used in neural networks, alongside applications*. CrossValidated (2015). Link: https://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications.

*A Neural Network in 13 lines of Python (Part 2 - Gradient Descent)*. By Andrew Trask (2015). Link: https://iamtrask.github.io/2015/07/27/python-network-part2/.

*Neural Networks and Deep Learning*. By Michael Nielsen (2015). Link: http://neuralnetworksanddeeplearning.com/chap2.html.
