# Artificial Neural Network

Uses many layers of **nonlinear processing** units for **feature extraction and transformation**.  
Each successive layer **uses the output of the previous layer** as its input.  
What the layers learn forms a **hierarchy of concepts**.  
In this hierarchy, each level learns to transform its input data into a progressively more abstract and composite representation.  

![neuron](https://miro.medium.com/max/1260/1*L_lfAEddxxAg2EqJfB5i6Q.png)

![ann](https://miro.medium.com/max/1800/1*l57B0pjXoO-1H1xZYV7QBA.png)

1) Based on the connection strength (**weights**) and **transfer function**, the activation value passes to the next node.  
2) Each of the nodes **sums the activation values** that it receives (it calculates the **weighted sum**) and modifies that sum based on its transfer function.  
3) Next, it applies an **activation function** (function that’s applied to this particular neuron).  
4) From the result, the neuron determines whether (and how strongly) to pass a signal along.  
5) The activation runs through the network until it reaches the output nodes.  
6) The output nodes then give us the information in a way that we can understand.  
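Steps 2 to 4 can be sketched for a single neuron as follows. This is a minimal illustration that assumes a plain weighted sum plus a bias as the transfer step and a sigmoid as the activation function; `neuron_output` and the weights and inputs are hypothetical values chosen for the example.

```python
import math


def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by a sigmoid activation."""
    # Step 2: weighted sum of the incoming activation values (plus a bias)
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Steps 3-4: the activation function decides how strongly the neuron fires
    return 1.0 / (1.0 + math.exp(-z))


# Two inputs with made-up weights: z = 1.0*0.4 + 0.5*(-0.2) + 0.1 = 0.4
print(neuron_output([1.0, 0.5], [0.4, -0.2], bias=0.1))  # about 0.599
```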


**Model performance** is evaluated with a **cost function** (also called a loss function).  
The network uses it to **compare** its output with the actual expected output; training aims to minimize this cost.  
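The notes do not say which cost function is used. As one common example (an assumption here), half the sum of squared errors, C = ½ Σ (ŷ − y)², can be written as below; `cost` is a hypothetical name.

```python
def cost(predicted, actual):
    """Half the sum of squared errors between predictions and expected outputs."""
    return 0.5 * sum((p - a) ** 2 for p, a in zip(predicted, actual))


# Errors of -0.2, 0.2 and -0.4 give 0.5 * (0.04 + 0.04 + 0.16)
print(cost([0.8, 0.2, 0.6], [1.0, 0.0, 1.0]))  # about 0.12
```

A perfect prediction gives a cost of 0, and larger errors are penalized quadratically.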

7) **Backpropagation:** the error information flows back through the network, and the network learns by tweaking the weights with the goal of minimizing the cost function.  
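For a single sigmoid neuron with one weight, the backpropagated gradient is just the chain rule. The sketch below assumes the squared-error cost 0.5 · (ŷ − y)² and made-up values for `w`, `x` and `y`; it checks the analytic gradient against a finite-difference estimate and then takes one downhill step.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, x, y):
    """Cost of a one-weight sigmoid neuron: 0.5 * (y_hat - y)^2."""
    return 0.5 * (sigmoid(w * x) - y) ** 2


def gradient(w, x, y):
    """dC/dw by the chain rule: dC/dy_hat * dy_hat/dz * dz/dw."""
    y_hat = sigmoid(w * x)
    return (y_hat - y) * y_hat * (1.0 - y_hat) * x


w, x, y = 0.5, 1.5, 1.0
g = gradient(w, x, y)

# Sanity check against a central finite-difference estimate
eps = 1e-6
numeric = (loss(w + eps, x, y) - loss(w - eps, x, y)) / (2 * eps)
print(g, numeric)  # the two values agree to many decimal places

# Tweak the weight against the gradient; the cost goes down
learning_rate = 0.1
w_new = w - learning_rate * g
print(loss(w, x, y), loss(w_new, x, y))
```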

### Forward Propagation
Information is entered into the input layer and **propagates forward** through the network to get our output values.
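A minimal sketch of forward propagation through fully connected layers, assuming sigmoid activations everywhere. The `forward` function, the 2-3-1 layout and all weight values are hypothetical and only illustrate how each layer's output becomes the next layer's input.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(inputs, layers):
    """Propagate inputs through a list of (weights, biases) layers.

    weights[j][i] is the weight from input i to neuron j of that layer.
    """
    activations = inputs
    for weights, biases in layers:
        # Each neuron: weighted sum of the previous layer's outputs, then sigmoid
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


# A 2-3-1 network with arbitrary illustrative weights
network = [
    ([[0.2, -0.4], [0.7, 0.1], [-0.5, 0.3]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[0.6, -0.3, 0.8]], [0.05]),                                # output layer
]
print(forward([1.0, 2.0], network))  # one output value between 0 and 1
```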

## Activation Function
**Translates** the input signals of a neuron into its output signal.  
**Maps** the output values onto a range such as 0 to 1 or -1 to 1 (the rectifier below is an exception: it has no upper bound).

- **Threshold Function:** If the summed value of the input is below the threshold (here 0), the function passes on 0. If it's equal to or more than zero, it passes on 1.

![threshold](https://miro.medium.com/max/259/1*DC237QrcxQtCa5Z5ueTVrw.png)

- **Sigmoid Function:** **Smooth, gradual progression** from 0 to 1. It's very useful in the output layer, especially for predicting probabilities, and is the function behind **logistic regression**.

![sigmoid](https://miro.medium.com/max/259/1*VlLJGjp2N97E1T2BcCI4Hg.png)

- **Hyperbolic Tangent Function:** Similar in shape to the sigmoid, but the value can go below zero: it ranges from -1 to 1.

![hyperbolic](https://miro.medium.com/max/259/1*XRqAB63J8SZ8EmQsCqnx4Q.png)

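- **Rectifier Function:** Outputs 0 for any negative input and the input itself for positive inputs, so the curve has a kink at 0 and is a straight line after it. This means, for example, that your output would be either *no* (0) or a **graded** *yes* that grows with the input.  
*Cheap to compute and often described as the most biologically plausible*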

![rectifier](https://miro.medium.com/max/259/1*clkGLXsbu4P0RDf5IWcj_g.png)
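The four activation functions above can be written in a few lines of Python. This is a sketch using only the standard library; the threshold is assumed to sit at 0 as in the description, and the function names are illustrative.

```python
import math


def threshold(x):
    # 0 below the threshold (0 here), 1 at or above it
    return 1.0 if x >= 0 else 0.0


def sigmoid(x):
    # Smooth curve from 0 to 1
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    # Same S shape as the sigmoid, but ranging from -1 to 1
    return math.tanh(x)


def relu(x):
    # Rectifier: 0 for negative inputs, the input itself otherwise
    return max(0.0, x)


for x in (-2.0, 0.0, 2.0):
    print(x, threshold(x), round(sigmoid(x), 3), round(tanh(x), 3), relu(x))
```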

## Adjusting the Weights
Use **gradient descent**: compute the slope of the cost function with respect to the weights, check whether it is positive or negative, and keep moving the weights downhill (against the slope) toward the best weights, ideally those at the **global minimum**.  

### Gradient Descent 
Algorithm for finding the **minimum of a function**.  
At each step, the machine computes the **gradient**, or *direction* in which the error grows fastest, and moves the weights the opposite way to **reduce errors**.

![gradient](https://miro.medium.com/max/366/1*kmmjFBP5vRkKOM1SP4URpA.png) 
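A one-dimensional sketch of gradient descent. The function f(x) = (x − 3)², its derivative 2(x − 3), the learning rate and the step count are assumptions chosen so the minimum is easy to check.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach a minimum."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)  # positive slope -> move left, negative -> right
    return x


# f(x) = (x - 3)^2 has its minimum at x = 3; its derivative is 2 * (x - 3)
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(minimum)  # very close to 3
```

Each step here shrinks the distance to 3 by a factor of 0.8, which is why the learning rate matters: too large a value would overshoot and diverge.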

#### Stochastic Gradient Descent
Normal gradient descent (or **batch gradient descent**) computes the cost over all rows before each weight update. When the cost function is not convex, it can get **stuck at a local minimum** rather than the global minimum.

![local minimum](https://miro.medium.com/max/536/1*b7Gpub8q4zVXdv4GcBSK3A.png)

To counter that, we use **Stochastic Gradient Descent**, which takes the rows one by one: run the neural network on a row, look at the cost function, adjust the weights, and then move to the next row (the weights are **adjusted** after each row).  
Its updates have much **higher fluctuations**, which can help it escape local minima and makes it more likely to find the global minimum.  

#### Mini-Batch Gradient Descent
Set a number of rows, run that many rows at a time, and then update your weights.  

![gradient comparison](https://cdn-images-1.medium.com/max/1600/1*3L2t1Da4M3ztbB0I1Torhw.png)
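The three variants differ only in how many rows are used per weight update. The sketch below assumes a simple linear model y = w·x + b with a squared-error cost and noise-free toy data; `train_linear` and its defaults are hypothetical. Setting `batch_size` to the number of rows gives batch gradient descent, 1 gives stochastic gradient descent, and anything in between gives mini-batch gradient descent.

```python
import random


def train_linear(data, batch_size, learning_rate=0.5, epochs=1000, seed=0):
    """Fit y = w*x + b by gradient descent on half the squared error.

    batch_size == len(data): batch gradient descent (one update per epoch)
    batch_size == 1:         stochastic gradient descent (one update per row)
    anything in between:     mini-batch gradient descent
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    rows = list(data)
    for _ in range(epochs):
        rng.shuffle(rows)  # visit the rows in a new random order each epoch
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            # Average gradient of 0.5 * (w*x + b - y)^2 over the rows in this batch
            grad_w = sum((w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum((w * x + b - y) for x, y in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# Noise-free toy data generated from y = 2x + 1
data = [(i / 10, 2 * (i / 10) + 1) for i in range(11)]
for size in (len(data), 1, 4):  # batch, stochastic, mini-batch
    print(size, train_linear(data, batch_size=size))
```

All three should recover w ≈ 2 and b ≈ 1 on this easy problem; the differences in path and speed show up more clearly on noisy data and non-convex costs.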

### Training an artificial neural network with stochastic gradient descent

1) Randomly initialize the weights to small numbers close to 0.  

2) Input the first observation of your dataset into the input layer, with each feature in one input node.  

3) **Forward propagation:** from left to right, the neurons are activated, with each neuron's activation depending on its weights. Propagate the activations until you get the predicted result.  

4) Compare the predicted result to the actual result and measure the generated error.  

5) **Backpropagation:** from right to left, the error is backpropagated. The weights are updated according to how much they are responsible for the error. (The learning rate decides how much we update the weights.)  

6) Repeat steps 2–5 and update the weights after each observation (**online or stochastic learning**), OR repeat steps 2–5 but update the weights only after a batch of observations (**batch learning**).  

7) When the whole training set has passed through the ANN, that is **one epoch**. Repeat with more epochs.  
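The seven steps can be put together in a small sketch. It assumes a 2-2-1 network with sigmoid activations, the squared-error cost 0.5 · (ŷ − y)², and the logical OR function as a toy dataset; `train`, `predict`, the hidden size, learning rate and epoch count are all hypothetical choices, not a prescribed setup.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Forward propagation (left to right); returns inputs, hidden activations, output."""
    xb = list(x) + [1.0]  # constant 1.0 input feeds each bias weight
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    y_hat = sigmoid(sum(w * v for w, v in zip(w_out, h + [1.0])))
    return xb, h, y_hat


def train(data, hidden=2, learning_rate=0.5, epochs=5000, seed=1):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Step 1: small random weights close to 0 (last weight of each row is a bias)
    w_hidden = [[rng.uniform(-0.3, 0.3) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_out = [rng.uniform(-0.3, 0.3) for _ in range(hidden + 1)]
    rows = list(data)
    for _ in range(epochs):               # Step 7: each full pass is one epoch
        rng.shuffle(rows)
        for x, y in rows:                 # Step 2: one observation at a time
            xb, h, y_hat = forward(x, w_hidden, w_out)  # Step 3
            # Step 4: error signal at the output for the cost 0.5 * (y_hat - y)^2
            delta_out = (y_hat - y) * y_hat * (1.0 - y_hat)
            # Step 5: backpropagate the error to the hidden layer (right to left)
            delta_hidden = [delta_out * w_out[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            # Step 6: update the weights after each observation (stochastic)
            for j, v in enumerate(h + [1.0]):
                w_out[j] -= learning_rate * delta_out * v
            for j in range(hidden):
                for i in range(n_in + 1):
                    w_hidden[j][i] -= learning_rate * delta_hidden[j] * xb[i]
    return w_hidden, w_out


def predict(x, w_hidden, w_out):
    return forward(x, w_hidden, w_out)[2]


# Toy dataset: the logical OR of two inputs
or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w_hidden, w_out = train(or_data)
for x, y in or_data:
    print(x, y, round(predict(x, w_hidden, w_out), 3))
```

Note that the hidden-layer error terms are computed before the output weights change, so every gradient in one update refers to the same forward pass.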

## Application
### Classification
Binary decisions or multiple-class identification, in which observations are separated into categories according to specified characteristics. Typically uses *cross-sectional* data.  
- Credit card fraud detection  
- Cursive handwriting recognition  
- Cervical smear screening system  
- Petroleum exploration (determine locations of underground oil and gas deposits)  
- Detection of bombs in suitcases  

### Time Series  
Build a **forecasting** model from the historical data set to predict future data points.  
- Foreign exchange trading systems  
- Portfolio selection and management  
- Forecasting weather patterns  
- Speech recognition systems  
- Predicting/confirming myocardial infarction, a heart attack, from the output waves of an electrocardiogram (ECG)  
- Identifying dementia from analysis of electroencephalogram (EEG) recordings  
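One common way to turn a historical series into training data for a forecasting network (an assumption here; the notes do not specify a method) is a sliding window: the last few values are the inputs and the next value is the target. `make_windows` and the sample series are hypothetical.

```python
def make_windows(series, window):
    """Turn a series into (past values, next value) training pairs."""
    return [
        (series[i:i + window], series[i + window])
        for i in range(len(series) - window)
    ]


history = [1.2, 1.5, 1.4, 1.8, 2.0, 2.1]
for inputs, target in make_windows(history, window=3):
    print(inputs, "->", target)
```

Each pair can then be fed to the network like any other observation, with one input node per past value.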

### Optimization  
Finding good (often approximate) solutions to very difficult problems such as the **NP-complete** problems (NP stands for *nondeterministic polynomial time*)  
- Traveling salesman problem  
- Job-scheduling in manufacturing and efficient routing problems involving vehicles or telecommunication.  

![application](https://i.imgur.com/ufr7Ywr.png)