# Neural Network - Theory

# What is a Neural Network?

Neural networks form the basis of deep learning, which is a subfield of machine learning.
- They are inspired by the structure of the human brain.
- A biological neuron in the human brain has the following components:
  - Dendrite: Input to the neuron
  - Cell body: Information processing happens here
  - Axon: Output from the neuron

# Components of an Artificial Neural Network

1. Input layer: Receives the input as an array of data.
2. Output layer: Produces the final prediction.
3. Hidden layers: The layers between the input and output layers, often treated as a black box, which perform most of the required computation.
4. Neurons: Each node in a layer is called a neuron, by analogy with the neurons of the brain.
5. Channels: The connections between two neurons.
6. Weights: Each channel is assigned a numerical value called a weight. The output of one neuron is multiplied by this weight before it is passed to the connected neuron.
7. Bias: Each neuron in the hidden layers (and typically the output layer) has an associated numerical value, called the bias, which is added to the weighted sum of its inputs.
8. Activation function: A mathematical function applied to a neuron's weighted sum plus bias. It determines whether, and how strongly, the neuron is activated, and produces the value passed on to the next neurons.

# How does an Artificial Neural Network work?
1. The input is fed into the input layer as an array of data.
2. Random weights are assigned to each connection between the input layer and the first hidden layer.
3. Each input is multiplied by its weight, the products are summed, and a bias is added. The result is the transfer function:
$$z = \sum_{i=1}^n w_i x_i + b$$

Where:
- $z$: The result of the summation (the input to the activation function).
- $w_i$: The weight associated with input $x_i$.
- $x_i$: An individual input to the node.
- $b$: The bias term added to the summation.
- $\sum_{i=1}^n$: The summation over $i$ from 1 to $n$ (the total number of inputs).

4. Weights are likewise assigned to the connections between the hidden layers.
5. Within each neuron, the output of the transfer function is passed through the activation function, and the resulting activation is fed as input to the neurons of the next layer.
6. Finally, the output layer applies a suitable activation function to produce the prediction.
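
The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`weighted_sum`, `layer_forward`, `forward`), the 2-2-1 layer sizes, the choice of sigmoid for every layer, and the fixed weights are all made up for the example. In practice the weights would start out random.

```python
import math

def weighted_sum(weights, inputs, bias):
    """Transfer function: z = sum(w_i * x_i) + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def sigmoid(z):
    """Activation function that squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, layer):
    """Compute the activations of one layer.

    `layer` is a list of (weights, bias) pairs, one pair per neuron.
    """
    return [sigmoid(weighted_sum(w, inputs, b)) for w, b in layer]

def forward(inputs, layers):
    """Feed the inputs through every layer in order."""
    activations = inputs
    for layer in layers:
        activations = layer_forward(activations, layer)
    return activations

# Hypothetical network: 2 inputs, one hidden layer of 2 neurons, 1 output neuron.
hidden_layer = [([0.5, -0.4], 0.1), ([0.3, 0.8], -0.2)]
output_layer = [([1.0, -1.0], 0.0)]

prediction = forward([1.0, 0.0], [hidden_layer, output_layer])
print(prediction)  # a single value between 0 and 1
```

Representing each neuron as a `(weights, bias)` pair keeps the sketch close to the list of components above; real libraries usually store a whole layer as a matrix instead.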

# Weights
The larger the weight on a connection, the more strongly the input multiplied by it influences the neuron. Weights can also be negative, in which case the signal is inhibited.

Depending on the weights, the neuron computes a different function. By adjusting the weights of an artificial neuron, we can obtain the output we want for specific inputs.

When an ANN has hundreds or thousands of neurons, finding all the necessary weights by hand would be impractical. Instead, algorithms adjust the weights of the ANN to obtain the desired output from the network. This process of adjusting the weights is called learning or training.
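
To illustrate how the choice of weights changes what a neuron computes, here is a small sketch of a single neuron with a step activation. The function name `threshold_neuron` and the hand-picked weights and biases are assumptions made for this example; they are one possible choice among many.

```python
def threshold_neuron(inputs, weights, bias):
    """Return 1 when the weighted sum plus bias is non-negative, else 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z >= 0 else 0

# Hand-picked (hypothetical) weights and biases for three logic gates.
AND_GATE = ([1.0, 1.0], -1.5)
OR_GATE = ([1.0, 1.0], -0.5)
NOT_GATE = ([-1.0], 0.5)  # the negative weight inhibits the input signal

for a in (0, 1):
    for b in (0, 1):
        and_out = threshold_neuron([a, b], *AND_GATE)
        or_out = threshold_neuron([a, b], *OR_GATE)
        print(a, b, "AND:", and_out, "OR:", or_out)

for a in (0, 1):
    print(a, "NOT:", threshold_neuron([a], *NOT_GATE))
```

The same neuron structure behaves as AND or OR purely because of the bias, and the NOT gate only works because its weight is negative. Finding such values by hand does not scale, which is why training is needed.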

# Activation Function
Several types of activation function can be used.

1. The sigmoid function, $\sigma(z) = \frac{1}{1 + e^{-z}}$, which is used when the model predicts a probability, since its output lies in the range (0, 1).
<img src="sigmoid_function_plot.png" alt="Sigmoid Function" width="500">

2. The threshold function, which is used when the output depends on a threshold value:

$$
\phi(x) =
\begin{cases}
1, & \text{if } x \geq 0 \\
0, & \text{if } x < 0
\end{cases}
$$
<img src="threshold_function_plot.png" alt="Threshold Function" width="500">

3. The ReLU (Rectified Linear Unit) function, which outputs $x$ if $x$ is positive and 0 otherwise.

$$\phi(x) = \max(0, x)$$

Where:
- $x$: Input value.
- $\phi(x)$: Output of the ReLU function.
<img src="relu_function_plot.png" alt="ReLU Function" width="500">

4. The hyperbolic tangent (tanh) function, which is similar in shape to the sigmoid function but has a range of (-1, 1).


$$\phi(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}}$$

Where:
- $x$: Input value.
- $\phi(x)$: Output of the tanh function.
<img src="tanh_function_plot.png" alt="Tanh Function" width="500">
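
The four functions above translate almost directly into Python. The sketch below uses only the standard `math` module and follows the formulas as written in this section. The function names are chosen for the example, and the `math.exp` calls can overflow for inputs of very large magnitude, which a production implementation would need to guard against.

```python
import math

def sigmoid(z):
    """1 / (1 + e^-z), with output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def threshold(x):
    """1 if x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0

def relu(x):
    """max(0, x)."""
    return max(0.0, x)

def tanh(x):
    """(1 - e^-2x) / (1 + e^-2x), with output in (-1, 1)."""
    return (1.0 - math.exp(-2.0 * x)) / (1.0 + math.exp(-2.0 * x))

for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), threshold(x), relu(x), tanh(x))
```

Python's built-in `math.tanh` computes the same function as the hand-written `tanh` here and would normally be preferred.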

# Backpropagation

Backpropagation is the process of updating the weights of the network in order to reduce the prediction error.\
The backpropagation algorithm uses supervised learning: we provide the algorithm with examples of inputs together with the outputs we want the network to compute, and a cost function measures the error between the network's output and the desired output. The slope (gradient) of this cost with respect to each weight indicates how that weight should change.\
The network's output is compared with the expected result, and the weights are adjusted over many iterations until the error is as small as possible.\
For practical reasons, ANNs trained with plain backpropagation traditionally do not have too many layers, since the training time grows quickly as layers are added. There are also refinements to the backpropagation algorithm that allow faster learning.
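
As a rough illustration of the idea, the sketch below runs backpropagation on a tiny network with 2 inputs, 2 hidden neurons, and 1 output neuron. Everything specific here is an assumption made for the example: the sigmoid activations, the squared-error loss, the starting weights, the single training example, and the learning rate of 0.5.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, params):
    """Forward pass; returns the hidden activations and the output."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output

def loss(x, target, params):
    """Squared error for one training example."""
    _, output = forward(x, params)
    return (target - output) ** 2

def backward(x, target, params):
    """Gradients of the squared error with respect to every weight and bias."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden, output = forward(x, params)
    # Output neuron: dE/dz = dE/d(output) * sigmoid'(z),
    # where sigmoid'(z) = output * (1 - output).
    delta_out = -2.0 * (target - output) * output * (1.0 - output)
    grad_w_out = [delta_out * h for h in hidden]
    # Hidden neurons: send the output error back through w_out.
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(w_out, hidden)]
    grad_w_hidden = [[d * xi for xi in x] for d in delta_hidden]
    return grad_w_hidden, delta_hidden, grad_w_out, delta_out

def train_step(x, target, params, learning_rate):
    """One gradient-descent update; returns the new parameters."""
    w_hidden, b_hidden, w_out, b_out = params
    g_wh, g_bh, g_wo, g_bo = backward(x, target, params)
    new_w_hidden = [[w - learning_rate * g for w, g in zip(ws, gs)]
                    for ws, gs in zip(w_hidden, g_wh)]
    new_b_hidden = [b - learning_rate * g for b, g in zip(b_hidden, g_bh)]
    new_w_out = [w - learning_rate * g for w, g in zip(w_out, g_wo)]
    new_b_out = b_out - learning_rate * g_bo
    return new_w_hidden, new_b_hidden, new_w_out, new_b_out

# Hypothetical starting weights and a single training example.
params = ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2], [0.7, -0.6], 0.05)
x, target = [1.0, 0.0], 1.0

initial_loss = loss(x, target, params)
for _ in range(200):
    params = train_step(x, target, params, learning_rate=0.5)
final_loss = loss(x, target, params)
print(initial_loss, final_loss)  # the loss should shrink after training
```

A real training run would loop over many examples rather than one, but the two phases are the same: a forward pass to compute the output, then a backward pass that turns the output error into a gradient for every weight.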

# Loss Function
The loss function measures the error between the predicted output and the actual output.

A common choice, and the one used here, is the squared error:

$$(\text{actual output} - \text{predicted output})^2$$

## Error of the entire network
The error of the network is simply the sum of the errors of all $n$ neurons in the output layer:

$$E = \sum_{i=1}^n (\text{actual output}_i - \text{predicted output}_i)^2$$
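
The two formulas above can be written as two short functions. The names `squared_error` and `network_error`, and the sample values, are assumptions made for this sketch.

```python
def squared_error(actual, predicted):
    """Loss for a single output neuron."""
    return (actual - predicted) ** 2

def network_error(actual_outputs, predicted_outputs):
    """Sum of the squared errors over all output neurons."""
    return sum(squared_error(a, p)
               for a, p in zip(actual_outputs, predicted_outputs))

# Hypothetical network with three output neurons.
actual = [1.0, 0.0, 0.0]
predicted = [0.5, 0.5, 0.0]
print(network_error(actual, predicted))  # 0.25 + 0.25 + 0.0 = 0.5
```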

# Gradient Descent
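Gradient descent is an iterative method for finding a minimum of a function, here the error as a function of the weights.\
A random starting point on the error curve is chosen and the slope at this point is calculated.
- A positive slope means the error grows as the weight increases, so the weight is decreased.
- A negative slope means the error shrinks as the weight increases, so the weight is increased.
- A zero slope means the weight is at a (local) minimum of the error and needs no further adjustment.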
  
Our aim is to reach a point where the slope is zero.
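
A one-dimensional sketch makes this concrete. The error curve below, with its minimum at $w = 3$, is a made-up example, as are the function names, the learning rate, and the number of steps.

```python
def error(w):
    """Hypothetical error curve with its minimum at w = 3."""
    return (w - 3.0) ** 2

def slope(w):
    """Derivative of the error curve above."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate=0.1, steps=100):
    """Repeatedly move w a small step against the slope."""
    for _ in range(steps):
        w = w - learning_rate * slope(w)
    return w

print(gradient_descent(0.0))   # starts left of the minimum, where the slope is negative
print(gradient_descent(10.0))  # starts right of the minimum, where the slope is positive
```

Both starting points end up very close to 3, where the slope is zero. With a learning rate that is too large the steps can overshoot the minimum instead of settling on it.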

The formula for gradient descent is:

$$\Delta w_{ji} = - \eta \frac{\partial E}{\partial w_{ji}}$$

Where:
- $\Delta w_{ji}$: The change in weight for the connection from neuron $j$ to neuron $i$.
- $\eta$: The learning rate.
- $\frac{\partial E}{\partial w_{ji}}$: The partial derivative of the error $E$ with respect to the weight $w_{ji}$, which represents the gradient.

This formula can be interpreted as follows: the adjustment of each weight, $\Delta w_{ji}$, is the negative of a constant $\eta$ multiplied by the dependence of the network error on that weight, which is the derivative of $E$ with respect to $w_{ji}$.
The size of the adjustment depends on $\eta$ and on the contribution of the weight to the error. That is, if the weight contributes a lot to the error, the adjustment will be greater than if it contributes only a little.
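
The update rule can be tried on a single neuron. To keep the derivative simple, this sketch assumes a linear neuron (no activation function) with squared error, so that $\frac{\partial E}{\partial w_i} = -2(\text{target} - \text{output})\,x_i$. The function names, input values, and learning rate are illustrative only.

```python
def gradients(weights, inputs, target):
    """dE/dw_i for a linear neuron with E = (target - output)^2."""
    output = sum(w * x for w, x in zip(weights, inputs))
    return [-2.0 * (target - output) * x for x in inputs]

def apply_update(weights, grads, learning_rate):
    """w_i <- w_i + delta_w_i, where delta_w_i = -eta * dE/dw_i."""
    return [w - learning_rate * g for w, g in zip(weights, grads)]

weights = [0.0, 0.0]
inputs = [2.0, 0.1]  # the first weight contributes far more to the output
target = 1.0

grads = gradients(weights, inputs, target)
new_weights = apply_update(weights, grads, learning_rate=0.1)
print(grads)        # the first gradient is much larger in magnitude
print(new_weights)  # so the first weight receives the larger adjustment
```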

# Types of Neural Network

### 1. Feedforward Neural Network
The simplest form of artificial neural network: data travels in only one direction (input -> output).
- Applications: Vision and speech recognition

### 2. Radial Basis Function Neural Network
This model classifies a data point based on its distance from a center point.
- Applications: Power restoration systems
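
A minimal sketch of the distance-based idea follows. It assumes a Gaussian radial basis function and assigns a point to whichever center responds most strongly. A full RBF network would also learn weights on top of these activations, which is omitted here. The names `gaussian_rbf`, `classify`, `width`, and the class centers are all hypothetical.

```python
import math

def gaussian_rbf(point, center, width=1.0):
    """Activation that is 1 at the center and decays with distance."""
    squared_distance = sum((p - c) ** 2 for p, c in zip(point, center))
    return math.exp(-squared_distance / (2.0 * width ** 2))

def classify(point, centers):
    """Return the label of the center with the strongest activation."""
    return max(centers, key=lambda label: gaussian_rbf(point, centers[label]))

# Hypothetical class centers in a 2-D feature space.
centers = {"A": [0.0, 0.0], "B": [5.0, 5.0]}
print(classify([1.0, 0.5], centers))  # closer to center A
print(classify([4.0, 6.0], centers))  # closer to center B
```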
  
### 3. Kohonen Self Organizing Neural Network
Input vectors of arbitrary dimension are mapped onto a discrete map made up of neurons.
- Applications: Recognizing patterns in data, for example in medical analysis.

### 4. Recurrent Neural Network (RNN)
The hidden layer saves its output so that it can be used as an input for future predictions.
- Applications: Text-to-speech conversion models
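
The "saved output" can be pictured as a hidden state that is carried from one time step to the next. The sketch below is a deliberately tiny version with a single hidden value and a tanh activation. The function names and the weights (`w_input`, `w_hidden`, `bias`) are assumptions for illustration, not trained values.

```python
import math

def rnn_step(x, previous_hidden, w_input, w_hidden, bias):
    """New hidden state from the current input and the previous state."""
    return math.tanh(w_input * x + w_hidden * previous_hidden + bias)

def run_sequence(sequence, w_input=0.5, w_hidden=0.8, bias=0.0):
    """Process a sequence one element at a time, carrying the state along."""
    hidden = 0.0
    states = []
    for x in sequence:
        hidden = rnn_step(x, hidden, w_input, w_hidden, bias)
        states.append(hidden)
    return states

# Only the first input is non-zero, yet the later states stay above zero
# because each step feeds on the previous hidden state.
print(run_sequence([1.0, 0.0, 0.0]))
```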

### 5. Convolutional Neural Network (CNN)
The input features are processed in small patches by a sliding filter. This allows the network to analyze an image part by part.
- Applications: Signal and image processing
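
The filter idea is easiest to see in one dimension. This sketch slides a small kernel across a signal with no padding and a stride of 1. As in most deep learning libraries, the kernel is not flipped, so strictly speaking this is a cross-correlation. The function name, signal, and kernel are made up for the example.

```python
def convolve_1d(signal, kernel):
    """Slide the kernel over the signal and sum the products at each position."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

signal = [0, 0, 1, 1, 1, 0, 0]
edge_kernel = [-1, 1]  # hypothetical filter that responds to changes in the signal

print(convolve_1d(signal, edge_kernel))  # [0, 1, 0, 0, -1, 0]
```

The output is non-zero only where the signal rises or falls, which is how a filter can pick out a local feature regardless of where it appears in the input.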

### 6. Modular Neural Network
It consists of a collection of different neural networks that work together to produce the output.

# Applications of neural networks
- Facial Recognition
- Handwriting Recognition
- Forecasting
    - Stock exchange prediction
- Music Composition
- Image Compression

# Sources

1. [Artificial Neural Networks for Beginners by Carlos Gershenson](https://arxiv.org/abs/cs/0308031)
2. [Neural Network Full Course | Neural Network Tutorial For Beginners | Neural Networks | Simplilearn](https://www.youtube.com/watch?v=ob1yS9g-Zcs&ab_channel=Simplilearn)
3. [Neural Networks in Python – A Complete Reference for Beginners](https://www.askpython.com/python/examples/neural-networks)