
# Sigmoid Activation Function & Derivative

### Advantages and Disadvantages of the Sigmoid Activation Function

#### Advantages:
- **Smooth Gradient:** The sigmoid function is differentiable everywhere, so its gradient changes smoothly with the input and the output has no abrupt "jumps" during training. This smoothness supports stable, consistent learning[1][2].
- **Output Range:** Outputs are bounded between 0 and 1, which suits models whose predictions are read as probabilities or percentages. This is particularly useful in binary classification, where the output represents the probability of a class[1][2].
- **Non-Linearity:** Sigmoid is a non-linear activation function, so networks that use it can learn and model complex, non-linear relationships in the data[2].
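
The bounded output range can be illustrated with a minimal sketch. The `logits` values and the 0.5 decision threshold below are hypothetical, chosen only for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical raw scores (logits) from the last layer of a binary classifier
logits = np.array([-4.0, -0.5, 0.0, 0.5, 4.0])

probs = sigmoid(logits)               # every value lies strictly between 0 and 1
labels = (probs >= 0.5).astype(int)   # illustrative 0.5 decision threshold

print(probs)
print(labels)
```

A logit of 0 maps to a probability of exactly 0.5, so the threshold on the probability corresponds to the sign of the logit.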

#### Disadvantages:
- **Vanishing Gradient:** The main drawback of the sigmoid function is the vanishing gradient problem. Its derivative never exceeds 0.25 and becomes very small as the output approaches the extremes (0 or 1). Because backpropagation multiplies these factors layer by layer, gradients in deep networks can shrink towards zero, slowing learning or causing convergence issues[1][2].
- **Saturated Neurons:** Sigmoid neurons can saturate: for very high or very low input values the output becomes flat and the gradient is close to zero. A saturated neuron receives minimal weight updates and effectively stops learning, a situation sometimes described as a "dead neuron"[1].
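
A rough numerical sketch of both effects follows. The depth of 10 layers is an arbitrary illustrative choice, and the bound ignores the weight matrices that also enter the real gradient:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def der_sigmoid(x):
    s = sigmoid(x)
    return s * (1 - s)

# The derivative peaks at x = 0 and shrinks quickly as |x| grows (saturation)
for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}  derivative = {der_sigmoid(x):.6f}")

# Backpropagation multiplies one such factor per layer, so even the best case
# (0.25 per layer, weights ignored) shrinks geometrically with depth
n_layers = 10  # illustrative depth
best_case = 0.25 ** n_layers
print(f"upper bound on the product over {n_layers} layers: {best_case:.2e}")
```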

In summary, while the sigmoid activation function offers advantages like smooth gradients, bounded outputs, and non-linearity, it also comes with challenges such as vanishing gradients and neuron saturation. Understanding these pros and cons is crucial for selecting the appropriate activation function based on the specific requirements and characteristics of the neural network being developed.

Citations:

[1] https://iq.opengenus.org/sigmoid-logistic-activation/

[4] https://insideaiml.com/blog/Sigmoid-Activation-Function-1031

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Sigmoid Activation Function
def sigmoid(x):
  return 1/(1 + np.exp(-x))

# Derivative of sigmoid
def der_sigmoid(x):
  return sigmoid(x) * (1 - sigmoid(x))

# Generate data to plot
x_data = np.linspace(-10, 10, 100)
y_data = sigmoid(x_data)
dy_data = der_sigmoid(x_data)

# Plotting
plt.plot(x_data, y_data, x_data, dy_data)
plt.title('Sigmoid Activation Function & Derivative')
plt.legend(['sigmoid', 'der_sigmoid'])
plt.grid()
plt.show()
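
For reference, the identity used in `der_sigmoid` can be obtained by differentiating the definition of the sigmoid directly. The symbol $\sigma$ is introduced here as shorthand for the sigmoid function:

$$
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\sigma'(x) = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}}
= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
= \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
$$

The product $\sigma(x)\,(1 - \sigma(x))$ is largest when $\sigma(x) = 0.5$, that is at $x = 0$, where the derivative equals $0.25$.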

# Hyperbolic Tangent (tanh) Activation Function & Derivative

### Advantages and Disadvantages of the Hyperbolic Tangent (tanh) Activation Function

#### Advantages:
- **Range:** The tanh function maps inputs to the range $(-1, 1)$, twice as wide as the sigmoid's $(0, 1)$. This lets a neuron produce negative as well as positive activations[1].
- **Zero-Centered:** Tanh is zero-centered: its outputs are symmetric around 0, so inputs that are balanced around zero produce activations that average to roughly 0. This tends to speed up convergence during training. Its peak derivative is 1, compared with 0.25 for sigmoid, so gradients shrink less quickly in deep networks, although the vanishing gradient problem is not removed[1].
- **Non-Linearity:** Like the sigmoid function, tanh is non-linear, so networks that use it can model complex, non-linear relationships in the data[1].

#### Disadvantages:
- **Vanishing Gradient:** The main drawback of tanh is that it can still suffer from the vanishing gradient problem. As the output approaches the extremes (-1 or 1), the gradient becomes very small, which makes deep networks harder to train and slows convergence[1].
- **Saturated Neurons:** Tanh neurons can saturate: for very high or very low input values the output becomes flat and the gradient is close to zero. A saturated neuron receives minimal weight updates and effectively stops learning, a situation sometimes described as a "dead neuron"[1].
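
The relationship between tanh and sigmoid can be checked numerically. This is a minimal sketch; the input grid is an arbitrary symmetric range chosen for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.linspace(-6, 6, 101)  # symmetric grid around zero

# tanh is a shifted and rescaled sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
assert np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1)

# Averaged over symmetric inputs, tanh outputs centre on 0, sigmoid outputs on 0.5
mean_tanh = np.tanh(x).mean()
mean_sigmoid = sigmoid(x).mean()
print("mean tanh output:   ", mean_tanh)
print("mean sigmoid output:", mean_sigmoid)

# Peak slopes at x = 0: 1.0 for tanh versus 0.25 for sigmoid
max_der_tanh = 1 - np.tanh(0.0) ** 2
max_der_sigmoid = sigmoid(0.0) * (1 - sigmoid(0.0))
print(max_der_tanh, max_der_sigmoid)

# Both functions still saturate: the tanh slope at x = 5 is already tiny
der_tanh_at_5 = 1 - np.tanh(5.0) ** 2
print(der_tanh_at_5)
```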

In summary, the tanh activation function offers advantages such as a wider output range, zero-centered output, and non-linearity, but it also comes with challenges like the vanishing gradient problem and neuron saturation. Understanding these pros and cons is crucial for selecting the appropriate activation function based on the specific requirements and characteristics of the neural network being developed.

Citations:

[1] https://www.aitude.com/comparison-of-sigmoid-tanh-and-relu-activation-functions/

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Hyperbolic Tangent (tanh) Activation Function
def htan(x):
  return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

# Derivative of tanh: 1 - tanh(x)^2
def der_htan(x):
  return 1 - htan(x) * htan(x)

# Generate data to plot
x_data = np.linspace(-6, 6, 100)
y_data = htan(x_data)
dy_data = der_htan(x_data)

# Plotting
plt.plot(x_data, y_data, x_data, dy_data)
plt.title('htan Activation Function & Derivative')
plt.legend(['htan', 'der_htan'])
plt.grid()
plt.show()
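
One caveat about the exponential formula: it works on the plotted range, but `np.exp` overflows for very large inputs. The sketch below compares it with NumPy's built-in `np.tanh`. The function name `htan_naive` and the test inputs are chosen here for illustration:

```python
import numpy as np

def htan_naive(x):
    # Direct translation of the definition; np.exp overflows for large |x|
    return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

x = np.array([-1000.0, 0.0, 1000.0])

# Silence the overflow/invalid warnings so the comparison prints cleanly
with np.errstate(over="ignore", invalid="ignore"):
    naive = htan_naive(x)

stable = np.tanh(x)  # NumPy's built-in implementation

print(naive)   # inf / inf yields nan at the two extreme inputs
print(stable)
```

For inputs outside a small range around zero, `np.tanh` is the safer choice.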

# ReLU (Rectified Linear Unit) Activation Function & Derivative

The ReLU (Rectified Linear Unit) activation function has several advantages and disadvantages in deep learning:

### Advantages of the ReLU Activation Function:
- **Nonlinearity Introduction:** ReLU introduces nonlinearity to neural networks, which lets them learn complex relationships in data[1].
- **Vanishing Gradient Mitigation:** The gradient is exactly 1 for every positive input, so it does not shrink as it passes through active units. This mitigates the vanishing gradient problem and helps gradient descent converge[2].
- **Computational Efficiency:** ReLU is cheaper to compute than functions like sigmoid, since it only requires a max() operation and no exponentials[3].
- **Linear Behavior:** ReLU is piecewise linear, and this close-to-linear behavior makes optimization easier, which is one reason it is popular in deep learning models[4].

### Disadvantages of the ReLU Activation Function:
- **Zeroing Negative Values:** All negative inputs are mapped to zero, so any information carried by their magnitude is discarded, which can reduce the model's ability to fit the data[1].
- **Dying ReLU Problem:** A neuron whose input is negative for every training example outputs zero and has zero gradient, so its weights stop updating and it becomes "dead"; this can be mitigated by using Leaky ReLU[3].
- **Not Zero-Centered:** ReLU outputs are never negative, so activations are not centered around zero, which can slow convergence[4].
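
The dying ReLU problem can be sketched with a single hypothetical neuron. The inputs, weights, and the large negative bias below are made up for illustration; in practice such a bias might result from an overly large gradient step:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))      # hypothetical inputs, standard normal

# Hypothetical neuron whose bias has been pushed far into the negative range
w = np.array([0.5, -0.3, 0.8])
b = -20.0

z = X @ w + b                       # pre-activations, all negative here
a = np.maximum(0, z)                # ReLU output
grad_mask = (z > 0).astype(float)   # ReLU derivative for each sample

print("fraction of samples with non-zero output:  ", (a > 0).mean())
print("fraction of samples with non-zero gradient:", grad_mask.mean())
```

Because the derivative is zero for every sample, no gradient reaches `w` or `b` through this neuron, so gradient descent cannot move it back into the active region.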

ReLU's advantages include its simplicity, effectiveness in training deep models, and computational efficiency, while its main drawback lies in the zeroing of negative values, which can lead to the dying ReLU problem. Leaky ReLU is a variant that addresses some of these issues by allowing a small linear component for negative values.
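
A minimal sketch of Leaky ReLU and its derivative follows. The names `leaky_relu` and `der_leaky_relu` and the slope `alpha=0.01` are illustrative choices, not taken from this notebook:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # alpha is the slope for non-positive inputs; 0.01 is an assumed, commonly used value
    return np.where(x > 0, x, alpha * x)

def der_leaky_relu(x, alpha=0.01):
    # Gradient is 1 for positive inputs and alpha (rather than 0) elsewhere
    return np.where(x > 0, 1.0, alpha)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(leaky_relu(x))
print(der_leaky_relu(x))
```

Since the gradient for negative inputs is `alpha` instead of zero, a neuron that only sees negative inputs still receives small weight updates.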

Citations:

[1] https://builtin.com/machine-learning/relu-activation-function

[2] https://www.datasciencecentral.com/deep-learning-advantages-of-relu-over-sigmoid-function-in-deep/

[3] https://artemoppermann.com/activation-functions-in-deep-learning-sigmoid-tanh-relu/

[4] https://www.linkedin.com/pulse/top-10-activation-functions-advantages-disadvantages-dash

[5] https://vidyasheela.com/post/what-are-the-advantages-and-disadvantages-of-relu-activation-function

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Rectified Linear Unit (ReLU): element-wise max(0, x)
def ReLU(x):
  return np.maximum(0, np.asarray(x, dtype=float))

# Derivative of ReLU: 1 for positive inputs, 0 otherwise
# (undefined at exactly x = 0; 0 is used there, as in the original code)
def der_ReLU(x):
  return (np.asarray(x) > 0).astype(float)


# Generate data to plot
x_data = np.linspace(-10, 10, 100)
y_data = ReLU(x_data)
dy_data = der_ReLU(x_data)

# Plotting
plt.plot(x_data, y_data, x_data, dy_data)
plt.title('ReLU Activation Function & Derivative')
plt.legend(['ReLU','der_ReLU'])
plt.grid()
plt.show()