# Artificial Neural Networks

## The Neuron

The image below shows a neuron, also known as a node.

![ann1](ann1.png)

The neuron takes a series of input signals and produces a single output signal. In this course, yellow nodes represent input values, green nodes represent hidden nodes, and red nodes represent output values.

<img src="ann2.png" alt="ann2" width="200" height="100" style="float:right">

The values in the input layer correspond to a single observation (a single row in the dataset). They measure multiple independent variables that have been standardized or normalized. Standardization rescales each variable to have a mean of 0 and a variance of 1. Normalization (min-max scaling) subtracts the minimum value and divides by the range (the maximum minus the minimum), giving values between 0 and 1.

Whether you choose normalization or standardization depends on the scenario, but scaling the inputs in one of these ways is necessary for the network to train properly. Further reading on this can be found in Efficient BackProp by Yann LeCun et al. (1998).
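
As a minimal sketch of the two scaling schemes described above, the following plain-Python functions rescale one column (one independent variable) of a dataset. The function names and the sample `ages` column are hypothetical. In practice, the scaling statistics would typically be computed on the training data only and then reused for new observations.

```python
import statistics

def standardize(values):
    """Rescale values to mean 0 and variance 1 (z-scores)."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation; must be nonzero
    return [(v - mean) / std for v in values]

def normalize(values):
    """Min-max normalization: rescale values into the range [0, 1]."""
    lo, hi = min(values), max(values)  # assumes the column is not constant
    return [(v - lo) / (hi - lo) for v in values]

ages = [20.0, 30.0, 40.0, 50.0]  # hypothetical values of one independent variable
print(standardize(ages))
print(normalize(ages))  # → [0.0, 0.333..., 0.666..., 1.0]
```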

The output value can be continuous, binary, or categorical. If the output is categorical, we can say that the neuron has multiple outputs, corresponding to one output per category.

The connecting lines between nodes, called synapses (loosely analogous to dendrites in a biological neuron), carry weights $w_1, \ldots, w_m$, which are adjusted as the network is trained.

At the neuron, an activation function $\phi$ is applied to the weighted sum of the inputs,

\begin{equation}
    \phi\left(\sum_{i=1}^{m} w_i x_i \right).
\end{equation}

Depending on the activation function and the value of the weighted sum, the resulting signal is passed on as output to the nodes in the next layer.
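
The computation at a single neuron can be sketched as follows, assuming the inputs have already been scaled. The helper names `neuron_output` and `sigmoid` and the sample weights are illustrative only. Like the formula above, this sketch has no bias term.

```python
import math

def neuron_output(weights, inputs, activation):
    """Apply an activation function phi to the weighted sum of the inputs."""
    if len(weights) != len(inputs):
        raise ValueError("need exactly one weight per input")
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return activation(weighted_sum)

def sigmoid(x):
    """Logistic activation, used here only as an example choice of phi."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical scaled inputs x_1..x_3 and weights w_1..w_3
inputs = [0.5, -1.2, 0.3]
weights = [0.8, 0.1, -0.4]
print(neuron_output(weights, inputs, sigmoid))
```

Passing the activation in as a function keeps the neuron itself independent of the choice of $\phi$, so any of the functions below can be swapped in.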

## The Activation Function

The activation function can take a number of different forms. Below, $x$ without a subscript denotes the weighted sum $\sum_{i=1}^{m} w_i x_i$.

### Threshold Function

The threshold function, or step function, is a very simple function: if the value is less than 0, the function outputs 0; otherwise, it outputs 1,

\begin{equation}
    \phi(x) = 
    \begin{cases}
        1 \text{ if } x \ge 0, \\
        0 \text{ if } x < 0.
    \end{cases}
\end{equation}

![ann3](ann3.png)
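
A direct translation of the threshold function into Python might look like this. The name `threshold` is our own choice.

```python
def threshold(x):
    """Step function: 1 if x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0

print([threshold(x) for x in [-2.0, -0.1, 0.0, 3.5]])  # → [0, 0, 1, 1]
```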


### Sigmoid Function

The sigmoid function or logistic function,

\begin{equation}
    \phi(x) = \frac{1}{1 + e^{-x}},
\end{equation}

is smooth, asymptotically approaching $0$ as $x \to -\infty$ and $1$ as $x \to \infty$, with $\phi(0) = 0.5$. Because its values always lie between 0 and 1, it's very useful in the final layer of the network, especially when the output is a probability.


![ann4](ann4.png)
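
A sketch of the sigmoid in Python. Evaluating $1/(1+e^{-x})$ literally makes `math.exp(-x)` overflow for large negative $x$ (below roughly $-710$ for 64-bit floats). This version therefore switches to the algebraically equivalent form $e^{x}/(1+e^{x})$ when $x < 0$.

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^{-x}), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, e^{-x} can overflow; use the equivalent form e^{x} / (1 + e^{x})
    z = math.exp(x)
    return z / (1.0 + z)

print(sigmoid(0.0))  # → 0.5
print(sigmoid(-1000.0), sigmoid(1000.0))
```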

### Rectifier Function 

The rectifier function,

\begin{equation}
    \phi(x) = \max(x,0),
\end{equation}

is one of the most popular activation functions for ANNs. See Deep Sparse Rectifier Neural Networks by Xavier Glorot et al. (2011) for more on why the rectifier function is so widely used.

![ann5.png](ann5.png)
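
The rectifier is a one-liner in Python. This is a minimal sketch, and the name `relu` is an assumption borrowed from common usage.

```python
def relu(x):
    """Rectifier: returns x for positive inputs and 0 otherwise."""
    return max(x, 0.0)

print([relu(x) for x in [-3.0, 0.0, 2.5]])  # → [0.0, 0.0, 2.5]
```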

### Hyperbolic Tangent Function

The hyperbolic tangent function,

\begin{equation}
    \phi(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}},
\end{equation}

is similar to the sigmoid function, but it asymptotically approaches $-1$ (rather than $0$) as $x \to -\infty$, and $1$ as $x \to \infty$.

![ann6.png](ann6.png)
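
The following sketch implements the formula above and checks it against Python's built-in `math.tanh`. The function name `tanh_from_formula` is hypothetical. Because $\tanh$ is an odd function, negative inputs are handled as $-\tanh(-x)$, which avoids overflow in $e^{-2x}$ for very negative $x$.

```python
import math

def tanh_from_formula(x):
    """Hyperbolic tangent via (1 - e^{-2x}) / (1 + e^{-2x})."""
    if x < 0:
        # tanh is odd, so tanh(x) = -tanh(-x); this keeps e^{-2x} from overflowing
        return -tanh_from_formula(-x)
    e = math.exp(-2.0 * x)
    return (1.0 - e) / (1.0 + e)

for x in [-3.0, 0.0, 3.0]:
    print(x, tanh_from_formula(x), math.tanh(x))
```

The similarity to the sigmoid is exact up to rescaling: $\tanh(x) = 2\,\sigma(2x) - 1$, where $\sigma$ is the sigmoid function. So $\tanh$ is a sigmoid stretched to the range $(-1, 1)$ and centered at 0.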

### Examples

Assume the dependent variable is binary, $y \in \{0, 1\}$. We could use the threshold function, in which case $y = \phi(x)$. Or we could use the sigmoid function to get the probability, $\text{P}(y=1) = \phi(x)$, similar to logistic regression.
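
To contrast the two choices, the sketch below feeds the same weighted sum through both functions. The value `weighted_sum = 0.7` is a hypothetical output-neuron input, not taken from any trained model.

```python
import math

def threshold(x):
    """Step function: hard 0/1 output."""
    return 1 if x >= 0 else 0

def sigmoid(x):
    """Logistic function: output in (0, 1), read as P(y = 1)."""
    return 1.0 / (1.0 + math.exp(-x))

weighted_sum = 0.7               # hypothetical x at the output neuron
y_hat = threshold(weighted_sum)  # hard prediction y
p_y1 = sigmoid(weighted_sum)     # estimated probability P(y = 1)
print(y_hat, round(p_y1, 3))
```

The two are consistent. Thresholding the sigmoid's probability at 0.5 recovers the step function, because $\phi(x) \ge 0.5$ exactly when $x \ge 0$. The sigmoid additionally tells you how confident the prediction is.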

In neural networks, the hidden layers frequently use rectifier functions, while the output layer uses a sigmoid (logistic) function.

![ann8.png](ann8.png)
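
A minimal forward pass through such a network might look like this: one hidden layer of rectifier neurons feeding a single sigmoid output neuron. The layer sizes, the helper `layer`, and all weights are hypothetical and chosen for illustration (a trained network would learn its weights). As in the formula for a single neuron, no bias terms are included.

```python
import math

def relu(x):
    return max(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weight_rows, activation):
    """One fully connected layer: each row of weights feeds one neuron."""
    return [activation(sum(w * x for w, x in zip(row, inputs))) for row in weight_rows]

# Hypothetical weights: 4 hidden neurons, each connected to 3 inputs
hidden_weights = [
    [0.2, -0.5, 0.1],
    [-0.3, 0.8, 0.4],
    [0.6, 0.1, -0.2],
    [0.5, -0.4, 0.3],
]
# Hypothetical weights: 1 output neuron connected to the 4 hidden neurons
output_weights = [[0.7, -0.6, 0.9, 0.2]]

x = [1.0, 0.5, -1.0]  # one scaled observation (one row of the dataset)
hidden = layer(x, hidden_weights, relu)              # rectifier in the hidden layer
p = layer(hidden, output_weights, sigmoid)[0]        # sigmoid at the output: P(y = 1)
print(hidden, p)
```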


