# Neural Networks (information)

* A Neural Network is a series of algorithms that help us recognize relationships in a dataset through a process loosely inspired by the way the human brain works.

* Because it learns its behaviour from data, it can adapt to changing input and produce good results without the output criteria having to be redesigned by hand.

* A "neuron" in a neural network is a mathematical function that collects and classifies information according to a specific architecture.
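The sketch below shows one common way to write a single neuron: a weighted sum of its inputs plus a bias, passed through an activation function. The sigmoid activation and the names `neuron`, `weights` and `bias` are illustrative assumptions, not a fixed definition.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum of inputs plus a bias, then an activation."""
    # Weighted sum ("pre-activation"): z = w1*x1 + w2*x2 + ... + b
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Sigmoid activation (chosen here as an example) squashes z into the range 0 to 1
    return 1.0 / (1.0 + math.exp(-z))

# z = 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0, and sigmoid(0.0) = 0.5
print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5
```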


### Architecture

* The three main components of a neural network:
    
    * The input layer.
        - The input layer is the beginning of the workflow: it is composed of artificial input neurons and passes the information on for further processing by the subsequent layers of the network.
        
    * The hidden layer.
        - Applies weights to the inputs and directs them through activation functions towards the output. Hidden layers perform non-linear transformations of the inputs entered into the network.
        
    * The output layer.
        - Produces the final output of the network (for example, a prediction or class probabilities).
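A minimal forward pass through the three layers might look like the following sketch. It assumes fully connected layers, a ReLU activation in the hidden layer and no activation on the output; the weights, the layer sizes and the helper names `dense` and `forward` are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs

def forward(x):
    # Input layer: the raw features x are passed on unchanged (2 input neurons)
    # Hidden layer: 3 neurons apply weights and biases, then a non-linear ReLU
    h = dense(x, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0], relu)
    # Output layer: 1 neuron producing the network's raw output score
    return dense(h, [[1.0, 1.0, 1.0]], [0.0])

# Hidden pre-activations are [1.0, -2.0, -1.0]; ReLU turns them into [1.0, 0.0, 0.0]
print(forward([1.0, -2.0]))  # [1.0]
```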


### Activation functions

Activation functions are mathematical equations that determine the output of each neuron, and therefore of the neural network as a whole.

The function is attached to each neuron in the network and determines whether that neuron should be activated or not, based on whether its input is relevant for the model's prediction.

In other words, it is a transformation function that maps a neuron's input signal to its output signal.

Some activation functions, such as Sigmoid and Tanh, also normalize the output into a fixed range (0 to 1 and -1 to 1, respectively); others, such as ReLU, are unbounded above.


### Different Activation Functions

* Linear Activation function
    * f(x) = x
    * Range: -infinity to infinity.

* Non-linear activation function
    * Derivative (or differential): the change in the y-axis with respect to the change in the x-axis, also known as the slope.

* Monotonic function
    * A function which is either entirely non-increasing or non-decreasing.

* Common non-linear activation functions: Sigmoid, Tanh, ReLU, PReLU, ELU, Leaky ReLU, etc.
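To make "slope" and "monotonic" concrete, here is a small sketch that approximates a derivative numerically with a central finite difference (a technique chosen for illustration) and checks monotonicity on a grid of sample points. The helper names are hypothetical, and sampling a finite grid can only suggest, not prove, that a function is monotonic.

```python
import math

def linear(x):
    return x  # f(x) = x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def derivative(f, x, h=1e-6):
    """Approximate the slope of f at x with a central finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

def is_monotonic(f, xs):
    """True if f is entirely non-decreasing or entirely non-increasing on the sorted points xs."""
    ys = [f(x) for x in xs]
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)

xs = [i / 10 for i in range(-50, 51)]       # sample points from -5.0 to 5.0
print(derivative(linear, 3.0))              # close to 1: f(x) = x has slope 1 everywhere
print(derivative(sigmoid, 0.0))             # close to 0.25
print(is_monotonic(sigmoid, xs))            # True
print(is_monotonic(lambda x: x * x, xs))    # False: x^2 falls, then rises
```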


# When to use Sigmoid or Softmax

* For binary classification problems (yes or no decisions), we can use Sigmoid.

* For multi-class classification problems (n different classes), we can use the Softmax function.
    * The Softmax output for a class is the exponential of that class's logit divided by the sum of the exponentials of the logits of all classes.

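* The raw scores (magnitudes) the network produces for each class before Softmax are called logits. For example, logits of Apple 1.2, Banana 0.9 and Orange 0.4 become these Softmax probabilities (rounded):
    * Apple 0.46
    * Banana 0.34
    * Orange 0.21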

$$\sigma(\mathbf{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$

* $e^{z_i}$ in the numerator is the exponential of the logit of the i-th class.
* The denominator is the sum of the exponentials of the logits of all K classes.
* The outputs always sum to 1, so they can be read as probabilities.

* The predicted class is the one whose index has the highest probability (the argmax).

* For example, with logits 1.2 (Apple), 0.9 (Banana) and 0.4 (Orange): Softmax(Apple) = $e^{1.2} / (e^{1.2} + e^{0.9} + e^{0.4}) \approx 0.46$.

* Use Softmax when predicting more than 2 mutually exclusive classes; for 2 classes, a single Sigmoid output is enough.
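The Softmax calculation for the fruit example can be sketched as follows. Subtracting the largest logit before exponentiating is an addition not mentioned in the notes: it leaves the result unchanged (the common factor cancels in the division) but keeps `math.exp` from overflowing on large logits. The class names and logits are the ones from the example above.

```python
import math

def softmax(logits):
    """Softmax: exp of each logit divided by the sum of the exps of all logits."""
    # Shift by the max logit for numerical safety; this does not change the result
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

classes = ["Apple", "Banana", "Orange"]
probs = softmax([1.2, 0.9, 0.4])
for name, p in zip(classes, probs):
    print(f"{name}: {p:.2f}")

# Predicted class = index with the highest probability (argmax)
best = max(range(len(probs)), key=lambda i: probs[i])
print("Prediction:", classes[best])  # Prediction: Apple
```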


# Variety of Activation functions

#### Sigmoid

* A sigmoid function is a mathematical function having a characteristic "S"-shaped curve or sigmoid curve.

* The main reason for using the Sigmoid function is that its output lies between 0 and 1.

* It is used for binary classification (yes or no).

* Because its output lies between 0 and 1, it is especially suited to predicting the probability of an outcome.
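A minimal sketch of the Sigmoid function, sigma(x) = 1 / (1 + e^(-x)), used for a yes/no decision. The 0.5 decision threshold and the helper name `predict_yes` are illustrative assumptions; the two-branch form is a common way to avoid overflow in `math.exp` for large negative inputs.

```python
import math

def sigmoid(x):
    """Sigmoid: 1 / (1 + e^(-x)), an S-shaped curve with outputs between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x) so math.exp never overflows
    z = math.exp(x)
    return z / (1.0 + z)

def predict_yes(score, threshold=0.5):
    # Binary (yes/no) decision: read the sigmoid output as P(yes) and compare to a threshold
    return sigmoid(score) >= threshold

print(sigmoid(0.0))        # 0.5
print(predict_yes(2.0))    # True
print(predict_yes(-2.0))   # False
```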

#### Tanh

Generally used for classification between 2 classes.

* Tanh function is also known as Hyperbolic tangent Activation function.
* Tanh is similar to the Sigmoid.
* Range is between -1 and 1.
* Looks like an "S"-curve too.
* Tanh is zero-centred and, around 0, has a steeper slope than Sigmoid (its derivative is 1 at x = 0, compared with 0.25 for Sigmoid). This often helps models using Tanh activation to learn faster.
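The slope comparison can be checked with the closed-form derivatives, tanh'(x) = 1 - tanh(x)^2 and sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)). This is a small illustrative sketch; the helper names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)             # derivative of sigmoid

def tanh_grad(x):
    return 1.0 - math.tanh(x) ** 2   # derivative of tanh

for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  tanh={math.tanh(x):+.3f}  sigmoid={sigmoid(x):.3f}  "
          f"tanh'={tanh_grad(x):.3f}  sigmoid'={sigmoid_grad(x):.3f}")
```

At x = 0 the Tanh slope is four times the Sigmoid slope; further from 0 both slopes shrink towards 0, and at x = ±2 the Sigmoid slope is actually the larger of the two, so the advantage applies mainly near the centre of the curve.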

#### Relu

R(z) = max(0, z)

Relu is commonly used in Convolutional Neural Networks.

* Also known as the Rectified Linear Unit activation function.
* It is the most widely used activation function.

* It is half-rectified (from the bottom): if you feed it any negative value it returns zero, and if you feed it a positive value it returns that value unchanged.

* The range is from 0 to infinity.

* The major issue is that all negative values become 0, and the slope there is 0 as well, so a neuron that only receives negative inputs stops updating. This decreases the ability of the model to fit or train from the data properly.

* Negative inputs are not mapped appropriately, because they are all turned into 0.

* Monotonic.
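A short sketch of ReLU and its slope shows where the "all negative values become 0" issue comes from: for negative inputs both the output and the gradient are 0. The value used at exactly z = 0 (where the derivative is undefined) is a convention assumed here.

```python
def relu(z):
    """ReLU: R(z) = max(0, z)."""
    return max(0.0, z)

def relu_grad(z):
    # Slope is 1 for positive inputs and 0 for negative inputs.
    # At exactly z = 0 the derivative is undefined; 0 is used here by convention.
    return 1.0 if z > 0 else 0.0

inputs = [-3.0, -0.5, 0.0, 0.5, 3.0]
print([relu(z) for z in inputs])       # [0.0, 0.0, 0.0, 0.5, 3.0]
print([relu_grad(z) for z in inputs])  # [0.0, 0.0, 0.0, 1.0, 1.0]
```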

#### Leaky Relu 

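* Leaky Relu extends the range of the ReLU function to negative values: f(x) = x for x >= 0 and f(x) = a * x for x < 0, where a is a small slope.

* Usually a = 0.01, but you can change the value in different scenarios.
    * If a is instead picked at random during training, the variant is called Randomized Relu.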

* Range from -infinity to infinity.

* Monotonic.
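A minimal Leaky Relu sketch, assuming the default slope a = 0.01 mentioned above; the parameter name `a` is illustrative. Unlike ReLU, negative inputs keep a small, non-zero output and slope.

```python
def leaky_relu(x, a=0.01):
    """Leaky ReLU: x for x >= 0, a * x for x < 0 (a is a small slope, 0.01 by default)."""
    return x if x >= 0 else a * x

print([leaky_relu(x) for x in [-100.0, -1.0, 0.0, 2.0]])
print(leaky_relu(-1.0, a=0.2))  # a larger slope for negative inputs: -0.2
```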

# Other Activation functions

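* Prelu (Parametric ReLU: like Leaky Relu, but the negative slope a is learned during training)
* Elu (Exponential Linear Unit)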


# Features

* Optimizers.
* The Dropout Layer for neural networks.
* Hyperparameter Tuning.
* Batch Normalization.


### Different types of Neural Networks

* Convolutional Neural Networks.
* Recurrent Neural Networks.