# Activation Functions

## Most commonly used activation functions

- **Sigmoid function**: maps any real value to a value between 0 and 1. It is a non-linear function, typically used in the output layer of a neural network to predict the probability that an input belongs to a given class. It can also be used in hidden layers to capture non-linearities. It is given by the following equation:

$$\sigma(x) = \frac{1}{1+e^{-x}}$$

- **ReLU** (Rectified Linear Unit): maps any value to a non-negative value. Negative inputs become 0 and positive inputs pass through unchanged. It is a non-linear function used in the hidden layers of a neural network to capture non-linearities. It is given by the following equation:

$$\mathrm{ReLU}(x) = \max(0, x)$$

- **Linear activation function**: maps any value to itself. It is used in the output layer of a neural network to predict a continuous value. Because the output equals the input, using it is sometimes described as "not using an activation function". It is given by the following equation:

$$f(x) = x$$
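
The three equations above can be written as small scalar functions. This is a minimal sketch using only the standard library; the function names and the branch that keeps `sigmoid` numerically stable for very negative inputs are illustrative choices, not part of the definitions.

```python
import math


def sigmoid(x: float) -> float:
    """Map any real x to a value between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x)
    # so that exp() never receives a large positive argument.
    z = math.exp(x)
    return z / (1.0 + z)


def relu(x: float) -> float:
    """Return x for positive inputs and 0 otherwise."""
    return max(0.0, x)


def linear(x: float) -> float:
    """Identity: the output equals the input."""
    return x


for x in (-2.0, 0.0, 3.0):
    print(x, sigmoid(x), relu(x), linear(x))
```

For array inputs, the same formulas would normally be applied element-wise with a numerical library.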


## Choosing activation functions

### Output layer

The choice depends on what the target (ground-truth) label y is.

- If y is a binary label, use the sigmoid function in the output layer, because the model learns to predict the probability that y = 1.

- If y is a multi-class label, use the softmax function in the output layer, because the model learns to predict a probability for each class.

- If y is a continuous value that can be positive or negative, use the linear activation function in the output layer.

- If y is a continuous value that can only be zero or positive, use the ReLU activation function in the output layer.
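
Softmax is named above but not defined. The sketch below uses the standard definition, in which each raw score is exponentiated and divided by the sum of all exponentials. The function name and the example scores are illustrative assumptions; subtracting the maximum score is a common stability trick that does not change the result.

```python
import math


def softmax(z: list[float]) -> list[float]:
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(z)  # shifting by the max avoids overflow in exp()
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for a 3-class problem.
probs = softmax([2.0, 1.0, 0.1])
print(probs)
```

The largest score receives the largest probability, and the outputs always sum to 1, which is why softmax suits multi-class labels.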




### Hidden layers

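- ReLU is the most common choice because it is non-linear and cheap to compute (a single max with 0).

- Sigmoid is slower to compute because it needs an exponential. It also goes flat on both sides, where its gradient is close to zero, which slows down gradient descent. ReLU only goes flat for negative inputs.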
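
To make "goes flat" concrete, the following sketch evaluates the derivative of each function at a few points. It uses the standard identity that the derivative of sigmoid is sigmoid(x) * (1 - sigmoid(x)). The helper names are illustrative, and treating the ReLU derivative at exactly 0 as 0 is a convention assumed here.

```python
import math


def sigmoid(x: float) -> float:
    # Adequate for the moderate inputs used below.
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x: float) -> float:
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0


for x in (-10.0, -1.0, 1.0, 10.0):
    print(f"x={x:6.1f}  sigmoid'={sigmoid_grad(x):.6f}  relu'={relu_grad(x):.1f}")
```

The sigmoid gradient is tiny at both -10 and 10, while the ReLU gradient stays at 1 for every positive input.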

## TensorFlow Implementation

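```python
import tensorflow as tf
from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.Dense(25, activation='relu'),   # hidden layer
    layers.Dense(15, activation='relu'),   # hidden layer
    # Output layer: 'sigmoid' for a binary label. Alternatives: 'linear' or 'relu'
    # for regression; 'softmax' needs one unit per class rather than a single unit.
    layers.Dense(1, activation='sigmoid'),
])
```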

## Other activation functions

- LeakyReLU

- GELU

- ELU

- SELU

- Swish

- Softplus

- Softsign
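
For reference, here is a hedged sketch of the usual textbook formulas for a few of the functions listed above. The default `alpha` values are common choices assumed for illustration, the function names are hypothetical, and library implementations may differ in defaults or use approximations (for example, for GELU).

```python
import math


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Like ReLU, but negative inputs keep a small slope alpha (assumed default)."""
    return x if x > 0 else alpha * x


def elu(x: float, alpha: float = 1.0) -> float:
    """x for positive inputs, alpha * (e^x - 1) otherwise (alpha is an assumed default)."""
    return x if x > 0 else alpha * math.expm1(x)


def softplus(x: float) -> float:
    """Smooth approximation of ReLU: log(1 + e^x), written in an overflow-safe form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softsign(x: float) -> float:
    """x / (1 + |x|), which maps any value into (-1, 1)."""
    return x / (1.0 + abs(x))


def gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


for x in (-2.0, 0.0, 2.0):
    print(x, leaky_relu(x), elu(x), softplus(x), softsign(x), gelu(x))
```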

## Why do we need activation functions?

- If every neuron uses the linear activation function (or no activation function at all), the network computes a linear function of its input no matter how many layers it has. It is then equivalent to a linear regression model, so there is no point in using a neural network.

- Activation functions introduce the non-linearities that let the network fit more complex functions than a linear model can.

- A model that uses a linear activation function in all hidden neurons and a sigmoid activation function in the output neuron is equivalent to a logistic regression model.

- **Don't use the linear activation in hidden layers**. Use ReLU instead.
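
The collapse of stacked linear layers can be checked with a one-input, one-hidden-unit example: w2 * (w1 * x + b1) + b2 equals (w2 * w1) * x + (w2 * b1 + b2). This is a minimal sketch; the function names and the weight values are arbitrary and chosen only for illustration.

```python
def two_linear_layers(x: float, w1: float, b1: float, w2: float, b2: float) -> float:
    a1 = w1 * x + b1      # hidden neuron with linear activation
    return w2 * a1 + b2   # output neuron with linear activation


def single_linear_model(x: float, w: float, b: float) -> float:
    return w * x + b


# Arbitrary illustrative weights.
w1, b1, w2, b2 = 2.0, 1.0, -3.0, 0.5

# Equivalent single-layer parameters obtained by expanding the product.
w = w2 * w1
b = w2 * b1 + b2

for x in (-1.0, 0.0, 4.0):
    print(x, two_linear_layers(x, w1, b1, w2, b2), single_linear_model(x, w, b))
```

Both columns print the same values for every input, so the extra layer adds no expressive power.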