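Activation functions are mathematical functions used in artificial neural networks to introduce non-linearity into the network's computations. They determine the output of a neuron (or a layer of neurons) from the weighted sum of its inputs, usually plus a bias. This non-linearity is essential: a stack of layers without activation functions is just a composition of linear (affine) maps, which is itself a single affine map, so the whole network could represent nothing more than a one-layer linear model. Activation functions are therefore what enable neural networks to learn complex patterns and relationships in data.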
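To make the non-linearity argument concrete, here is a minimal pure-Python sketch of a tiny network with one input, two hidden units, and one output. The weights and the helper names (`network`, `identity`) are hypothetical, chosen by hand for illustration. With the identity in place of an activation, the output matches a single linear map \( y = Wx + B \); with ReLU in the hidden layer, the network computes a piecewise-linear function with a kink at 0 that no single \( W, B \) can reproduce.

```python
# Hypothetical hand-picked weights: 1 input -> 2 hidden units -> 1 output
w1, b1 = [1.0, -1.0], [0.0, 0.0]
w2, b2 = [1.0, 2.0], 0.5


def identity(z):
    # "No activation": the hidden layer stays linear
    return z


def relu(z):
    return max(0.0, z)


def network(x, activation):
    # Hidden layer: weighted input plus bias, passed through the activation
    hidden = [activation(w * x + b) for w, b in zip(w1, b1)]
    # Output layer: weighted sum of hidden values plus bias (no activation)
    return sum(w * h for w, h in zip(w2, hidden)) + b2


# Without an activation the two layers collapse into one linear map y = W * x + B
W = sum(v * w for v, w in zip(w2, w1))       # W = w2 . w1 = -1.0
B = sum(v * b for v, b in zip(w2, b1)) + b2  # B = w2 . b1 + b2 = 0.5

for x in [-3.0, 0.0, 3.0]:
    print(x, network(x, identity), W * x + B, network(x, relu))
```

For \( x > 0 \) the ReLU version computes \( x + 0.5 \), and for \( x < 0 \) it computes \( -2x + 0.5 \), so its slope changes at the origin, which an affine map cannot do.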

There are several types of activation functions, each with its own characteristics and applications:

1. **Sigmoid Function (Logistic)**: The sigmoid function maps any real input to the range (0, 1) along an S-shaped curve. It's often used for binary classification, since its output can be read as a probability, but it can suffer from vanishing gradients in deep networks: its derivative is at most 0.25 and approaches zero for large positive or negative inputs.

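2. **Hyperbolic Tangent (Tanh)**: Like the sigmoid, tanh is S-shaped, but it maps inputs to the range (-1, 1) and is centered at zero. It is a rescaled and shifted sigmoid, \( \tanh(x) = 2\sigma(2x) - 1 \), and is steeper near the origin (slope 1 at \( x = 0 \), versus 0.25 for the sigmoid). It still saturates for large \( |x| \), so it also suffers from vanishing gradients.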

3. **Rectified Linear Unit (ReLU)**: The ReLU activation function replaces negative inputs with zero and leaves positive inputs unchanged. It's computationally cheap and has been widely adopted because its gradient is exactly 1 for all positive inputs, which mitigates the vanishing gradient problem.

4. **Leaky ReLU**: Leaky ReLU extends ReLU by giving negative inputs a small, non-zero slope. This addresses the "dying ReLU" problem, in which a neuron whose input is negative for every training example always outputs zero, receives zero gradient, and stops learning.

5. **Parametric ReLU (PReLU)**: PReLU is similar to Leaky ReLU but with a learnable parameter that determines the slope for negative inputs. This parameter can be optimized during training.

6. **Exponential Linear Unit (ELU)**: ELU is another variant of ReLU that mitigates the vanishing gradient issue. Instead of a linear slope, it uses an exponential curve for negative inputs, producing negative outputs that smoothly saturate toward \( -\alpha \); the parameter \( \alpha \) thus sets the saturation value for extremely negative inputs.

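7. **Scaled Exponential Linear Unit (SELU)**: SELU is a self-normalizing activation function: it scales an ELU-style curve with fixed constants chosen so that activations tend toward a mean of 0 and a variance of 1 as they pass through the layers. It's designed to keep deep networks trainable without extensive parameter tuning, although the self-normalizing property assumes suitable conditions such as appropriately initialized weights.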

8. **Softmax**: Softmax is used primarily in the output layer for multiclass classification problems. It converts a vector of raw scores into a probability distribution over multiple classes.
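The vanishing-gradient remarks in the list above come from the chain rule: the gradient through a stack of layers is a product of per-layer derivatives. The following sketch is a deliberate simplification (a chain of scalar units with weight 1 and no bias), and `chain_gradient` and the derivative helpers are illustrative names, not from any library. Because every sigmoid derivative is at most 0.25, ten stacked sigmoids give a gradient below \( 0.25^{10} \approx 9.5 \times 10^{-7} \), while a chain of ReLUs on a positive input passes the gradient through unchanged.

```python
import math


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def d_sigmoid(x):
    # sigma'(x) = sigma(x) * (1 - sigma(x)); never larger than 0.25 (the value at x = 0)
    s = sigmoid(x)
    return s * (1 - s)


def relu(x):
    return max(0.0, x)


def d_relu(x):
    # 1 for x > 0, 0 otherwise (0 at x = 0 is a convention)
    return 1.0 if x > 0 else 0.0


def chain_gradient(f, df, x, depth):
    # Derivative of f(f(...f(x)...)) (depth applications) by the chain rule:
    # multiply the local derivative at each layer's input, then move to the next layer
    grad = 1.0
    for _ in range(depth):
        grad *= df(x)
        x = f(x)
    return grad


print(chain_gradient(sigmoid, d_sigmoid, 0.0, 10))  # tiny: below 0.25 ** 10
print(chain_gradient(relu, d_relu, 1.0, 10))        # 1.0
```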

The mathematical expressions for these activation functions are as follows:

1. **Sigmoid Function (Logistic)**:
   \( \sigma(x) = \frac{1}{1 + e^{-x}} \)

2. **Hyperbolic Tangent (Tanh)**:
   \( \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \)

3. **Rectified Linear Unit (ReLU)**:
   \( \text{ReLU}(x) = \max(0, x) \)

4. **Leaky ReLU**:
   \( \text{Leaky ReLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ \alpha x & \text{if } x < 0 \end{cases} \)
   where \( \alpha \) is a small positive constant (typically a small fraction like 0.01).

5. **Parametric ReLU (PReLU)**:
   \( \text{PReLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ \alpha x & \text{if } x < 0 \end{cases} \)
   where \( \alpha \) is a learnable parameter.

6. **Exponential Linear Unit (ELU)**:
   \( \text{ELU}(x) = \begin{cases} x & \text{if } x \geq 0 \\ \alpha (e^x - 1) & \text{if } x < 0 \end{cases} \)
   where \( \alpha \) is a positive constant, typically around 1.0.

7. **Scaled Exponential Linear Unit (SELU)**:
   \( \text{SELU}(x) = \lambda \begin{cases} x & \text{if } x > 0 \\ \alpha e^x - \alpha & \text{if } x \leq 0 \end{cases} \)
   where \( \alpha \approx 1.6733 \) and \( \lambda \approx 1.0507 \) are fixed constants (not tuned hyperparameters), derived so that activations stay near mean 0 and variance 1.

8. **Softmax** (for the \( i \)-th element of the input vector \( \mathbf{x} \)):
   \( \text{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^N e^{x_j}} \)
   where \( N \) is the number of classes.
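Evaluating the softmax formula literally can overflow, because \( e^{x_i} \) exceeds the double-precision range once \( x_i \) reaches roughly 710. A common remedy, shown in this minimal sketch, subtracts \( \max_j x_j \) from every input first; the factor \( e^{-\max_j x_j} \) cancels between numerator and denominator, so the result is mathematically unchanged.

```python
import math


def softmax(xs):
    # Shift by the maximum so every exponent is <= 0 and math.exp cannot overflow;
    # the shift cancels between numerator and denominator
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(softmax([1000.0, 1000.0]))     # [0.5, 0.5]; a naive math.exp(1000.0) raises OverflowError
```

The outputs are non-negative and sum to 1, so they can be read as a probability distribution over the \( N \) classes.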

### 1. Sigmoid Activation Function

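```python
## Creating a sigmoid function
import math

def sigmoid(x):
    # Branch on the sign of x so math.exp() never gets a large positive argument;
    # the one-line 1 / (1 + math.exp(-x)) raises OverflowError for x below about -710
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    z = math.exp(x)
    return z / (1 + z)
```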

```python
## Testing different values for sigmoid function
for i in range(-100, 101, 25):
    print(f"The value of sigmoid at i = {i} is: {sigmoid(i):.2e}")
```

```text
The value of sigmoid at i = -100 is: 3.72e-44
The value of sigmoid at i = -75 is: 2.68e-33
The value of sigmoid at i = -50 is: 1.93e-22
The value of sigmoid at i = -25 is: 1.39e-11
The value of sigmoid at i = 0 is: 5.00e-01
The value of sigmoid at i = 25 is: 1.00e+00
The value of sigmoid at i = 50 is: 1.00e+00
The value of sigmoid at i = 75 is: 1.00e+00
The value of sigmoid at i = 100 is: 1.00e+00
```
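The printout shows the sigmoid saturating: its value rounds to 1.00e+00 for \( i \geq 25 \) and is already below \( 10^{-10} \) at \( i = -25 \). The derivative \( \sigma'(x) = \sigma(x)(1 - \sigma(x)) \) quantifies why this causes vanishing gradients. This sketch (the helper name `sigmoid_derivative` is our own) evaluates it at the same test points, using the equivalent form \( e^{-|x|} / (1 + e^{-|x|})^2 \), which avoids \( 1 - \sigma(x) \) rounding to exactly 0 for large positive \( x \).

```python
import math


def sigmoid_derivative(x):
    # sigma(x) * (1 - sigma(x)) rewritten as e^(-|x|) / (1 + e^(-|x|))^2:
    # same value, symmetric in x, and no cancellation in 1 - sigma(x) for large x
    t = math.exp(-abs(x))
    return t / (1 + t) ** 2


for i in range(-100, 101, 25):
    print(f"The derivative of sigmoid at i = {i} is: {sigmoid_derivative(i):.2e}")
```

The derivative peaks at 0.25 when \( x = 0 \) and is symmetric around it, so inputs far from zero in either direction pass back almost no gradient.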


### 2. Tanh Activation Function

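```python
def tanh(x):
    # Same as (e^x - e^-x) / (e^x + e^-x), rewritten with e^(-2|x|) so math.exp() cannot overflow
    t = math.exp(-2 * abs(x))
    return math.copysign((1 - t) / (1 + t), x)
```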

```python
## Testing different values for tanh function
for i in range(-100, 101, 25):
    print(f"The value of tanh at i = {i} is: {tanh(i):.2e}")
```

```text
The value of tanh at i = -100 is: -1.00e+00
The value of tanh at i = -75 is: -1.00e+00
The value of tanh at i = -50 is: -1.00e+00
The value of tanh at i = -25 is: -1.00e+00
The value of tanh at i = 0 is: 0.00e+00
The value of tanh at i = 25 is: 1.00e+00
The value of tanh at i = 50 is: 1.00e+00
The value of tanh at i = 75 is: 1.00e+00
The value of tanh at i = 100 is: 1.00e+00
```
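The tanh printout has the same saturating shape as the sigmoid's, shifted to the range (-1, 1). The sketch below checks numerically that tanh is a rescaled sigmoid, \( \tanh(x) = 2\sigma(2x) - 1 \), and evaluates its derivative \( 1 - \tanh^2(x) \), which equals 1 at the origin (four times the sigmoid's peak slope) but still decays toward 0 as \( |x| \) grows. It uses the standard library's `math.tanh`; the helper names are illustrative.

```python
import math


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def tanh_derivative(x):
    # d/dx tanh(x) = 1 - tanh(x)^2; equals 1 at x = 0, four times the sigmoid's peak slope
    return 1 - math.tanh(x) ** 2


# tanh is a rescaled, shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    assert abs(math.tanh(x) - (2 * sigmoid(2 * x) - 1)) < 1e-12

print(tanh_derivative(0.0), tanh_derivative(2.0), tanh_derivative(5.0))
```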


### 3. ReLU Activation Function

```python
def relu(x):
    # Negative inputs become 0; non-negative inputs pass through unchanged
    return max(0, x)
```

```python
## Testing different values for relu function
for i in range(-100, 101, 25):
    print(f"The value of relu at i = {i} is: {relu(i):.2e}")
```

```text
The value of relu at i = -100 is: 0.00e+00
The value of relu at i = -75 is: 0.00e+00
The value of relu at i = -50 is: 0.00e+00
The value of relu at i = -25 is: 0.00e+00
The value of relu at i = 0 is: 0.00e+00
The value of relu at i = 25 is: 2.50e+01
The value of relu at i = 50 is: 5.00e+01
The value of relu at i = 75 is: 7.50e+01
The value of relu at i = 100 is: 1.00e+02
```
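Unlike the saturating functions, ReLU has a derivative of 1 for every positive input (including \( x = 100 \), where the sigmoid's derivative is effectively zero) and 0 for every negative input. ReLU is not differentiable at 0; this sketch assumes the common convention of using 0 there, and `relu_derivative` is a hypothetical helper name.

```python
def relu_derivative(x):
    # Slope of max(0, x): 1 for x > 0, 0 for x < 0; at x = 0 we assume the common choice 0
    return 1.0 if x > 0 else 0.0


for i in range(-100, 101, 25):
    print(f"The derivative of relu at i = {i} is: {relu_derivative(i)}")
```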


### 4. Leaky ReLU Activation Function

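```python
def leaky_relu(x, alpha=0.1):
    # max(alpha * x, x) equals x for x >= 0 and alpha * x for x < 0, provided 0 < alpha < 1.
    # alpha = 0.1 reproduces the original code; 0.01 is another common choice.
    return max(alpha * x, x)
```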

```python
## Testing different values for leaky relu function
for i in range(-100, 101, 25):
    print(f"The value of leaky relu at i = {i} is: {leaky_relu(i):.2e}")
```

```text
The value of leaky relu at i = -100 is: -1.00e+01
The value of leaky relu at i = -75 is: -7.50e+00
The value of leaky relu at i = -50 is: -5.00e+00
The value of leaky relu at i = -25 is: -2.50e+00
The value of leaky relu at i = 0 is: 0.00e+00
The value of leaky relu at i = 25 is: 2.50e+01
The value of leaky relu at i = 50 is: 5.00e+01
The value of leaky relu at i = 75 is: 7.50e+01
The value of leaky relu at i = 100 is: 1.00e+02
```
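The non-zero slope for negative inputs is what keeps a leaky ReLU neuron from "dying". If a unit's pre-activation \( z = wx + b \) is negative for every input, plain ReLU gives it zero gradient and gradient descent never changes it. This sketch uses a hypothetical single unit with a large negative bias and sums the bias gradient over a few inputs, assuming an upstream gradient \( \partial L / \partial y = 1 \) for each sample. Under ReLU the total is 0; under leaky ReLU (slope 0.1, matching `leaky_relu` above) it stays positive, so the bias can still be updated.

```python
# Hypothetical single unit y = act(w * x + b) with a large negative bias,
# so the pre-activation z = w * x + b is negative for every sample input
w, b = 0.5, -10.0
inputs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]


def relu_grad(z):
    # ReLU slope: 1 for z > 0, else 0
    return 1.0 if z > 0 else 0.0


def leaky_relu_grad(z, alpha=0.1):
    # Leaky ReLU slope: 1 for z > 0, else alpha
    return 1.0 if z > 0 else alpha


# dL/db = sum over samples of dL/dy * act'(z) * dz/db, with dL/dy = 1 and dz/db = 1
grad_b_relu = sum(relu_grad(w * x + b) for x in inputs)
grad_b_leaky = sum(leaky_relu_grad(w * x + b) for x in inputs)
print(grad_b_relu, grad_b_leaky)
```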
