## What is an Activation Function?

An activation function introduces non-linearity into a neural network, which lets the network learn and represent patterns that a purely linear model cannot. It takes a neuron's pre-activation (the weighted sum of its inputs plus a bias) and applies a mathematical transformation to produce the neuron's output.
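
As a minimal sketch of that description, the snippet below computes one neuron's output in plain Python. The names `neuron_output`, `inputs`, `weights`, and `bias` are illustrative and not taken from any library, and ReLU is used only as an example activation.

```python
def relu(z):
    # Example activation: pass positive values through, clamp negatives to zero.
    return max(0.0, z)

def neuron_output(inputs, weights, bias, activation=relu):
    # Pre-activation: weighted sum of the inputs plus the bias.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # The activation function transforms the pre-activation into the output.
    return activation(z)

# Pre-activation is 0.5*1.0 + (-1.0)*2.0 + 0.25 = -1.25, so ReLU clamps it.
clamped = neuron_output([1.0, 2.0], [0.5, -1.0], 0.25)
# Pre-activation is 0.5*1.0 + 1.0*2.0 + 0.25 = 2.75, so ReLU passes it through.
passed = neuron_output([1.0, 2.0], [0.5, 1.0], 0.25)
print(clamped, passed)
```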

## Why are Activation Functions Used in CNNs?

**1. Introduce Non-Linearity:**
Relationships in real-world data are often non-linear. Activation functions enable neural networks to learn and model them.

**2. Feature Learning:**
They help the network to learn complex features and hierarchical patterns in the data.

**3. Gradient Propagation:**
The derivative of the activation function appears in every chain-rule step of backpropagation, so the choice of activation affects how well gradients flow back through the network during gradient descent.
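
To make the gradient point concrete, here is a simplified sketch that multiplies an activation's local slope across several layers. It deliberately ignores the weight terms that also appear in the chain rule, so it illustrates only the activation's contribution; `chained_gradient` is a hypothetical helper name.

```python
def chained_gradient(local_grad, depth):
    # Product of `depth` identical activation slopes (weight terms are ignored).
    result = 1.0
    for _ in range(depth):
        result *= local_grad
    return result

# 0.25 is the largest slope the sigmoid can have (at x = 0).
sigmoid_best_case = chained_gradient(0.25, 10)
# 1.0 is the ReLU slope for any positive input.
relu_active_path = chained_gradient(1.0, 10)

print(sigmoid_best_case, relu_active_path)
```

Even in the sigmoid's best case the factor shrinks geometrically with depth, while an active ReLU path leaves it unchanged.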

## Types of Activation Functions Used in CNNs

**1. ReLU (Rectified Linear Unit):**
    
* Function: $\operatorname{ReLU}(x)=\max (0, x)$
* Usage: ReLU is the most commonly used activation function in CNNs.
* Advantages:
    * Computationally efficient (simple thresholding).
    * Helps mitigate the vanishing gradient problem.
* Disadvantages:
    * Can suffer from the "dying ReLU" problem: a neuron whose pre-activation is negative for every input outputs zero, receives zero gradient, and stops learning.
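
A minimal sketch of ReLU and its derivative in plain Python follows. The derivative at exactly zero is undefined mathematically; returning 0 there is a convention assumed for this example.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Slope is 1 for positive inputs and 0 for negative ones.
    # At x == 0 we return 0 by convention (an assumption of this sketch).
    return 1.0 if x > 0 else 0.0

pre_activations = [-3.0, -0.5, 0.0, 2.0]
outputs = [relu(x) for x in pre_activations]
grads = [relu_grad(x) for x in pre_activations]
print(outputs, grads)
```

A neuron whose pre-activations all fall in the negative range gets a zero gradient for every example, which is the dying ReLU situation described above.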

**2. Leaky ReLU:** 

* Function: $\operatorname{LeakyReLU}(x)=\begin{cases}0.01 x & \text { if } x<0 \\ x & \text { if } x \geq 0\end{cases}$
* Usage: Addresses the dying ReLU problem by allowing a small, non-zero gradient when the input is negative.
* Advantages:
    * Prevents neurons from becoming inactive.
* Disadvantages:
    * Still linear for positive values, and the negative slope (0.01 here) is a fixed constant that may not suit every task.
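
The piecewise definition above can be sketched as follows. The parameter name `negative_slope` is an illustrative choice, with 0.01 taken from the formula.

```python
def leaky_relu(x, negative_slope=0.01):
    # Identity for non-negative inputs, small linear slope for negative ones.
    return x if x >= 0 else negative_slope * x

def leaky_relu_grad(x, negative_slope=0.01):
    # The gradient never becomes exactly zero, unlike plain ReLU.
    return 1.0 if x >= 0 else negative_slope

for x in (-2.0, 0.0, 3.0):
    print(x, leaky_relu(x), leaky_relu_grad(x))
```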

**3. Parametric ReLU (PReLU):**

* Function: $\operatorname{PReLU}(x)=\begin{cases}\alpha x & \text { if } x<0 \\ x & \text { if } x \geq 0\end{cases}$, where α is a learnable parameter.
* Usage: Similar to Leaky ReLU but with a trainable parameter for negative values.
* Advantages:
    * Allows the model to learn the optimal value of α.
* Disadvantages:
    * Adds additional parameters to the model, increasing complexity.
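
To show what "learnable" means here, the sketch below computes the gradient of PReLU with respect to α and applies one gradient descent step. The starting value of α, the learning rate, and the upstream gradient are arbitrary numbers chosen for illustration.

```python
def prelu(x, alpha):
    return x if x >= 0 else alpha * x

def prelu_grad_alpha(x):
    # d PReLU / d alpha is x for negative inputs and 0 otherwise.
    return x if x < 0 else 0.0

alpha = 0.25          # arbitrary starting value for this sketch
x = -2.0              # a negative pre-activation
upstream_grad = 1.0   # assumed gradient of the loss w.r.t. the PReLU output
learning_rate = 0.1

# One gradient descent step on alpha.
alpha_new = alpha - learning_rate * upstream_grad * prelu_grad_alpha(x)
print(prelu(x, alpha), alpha_new)
```

Only negative inputs contribute to the update of α, since the positive branch does not depend on it.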

**4. Sigmoid:**

* Function: $\sigma(x)=\frac{1}{1+e^{-x}}$
* Usage: Common in early neural networks; now rare in hidden layers of CNNs but still used in output layers for binary classification.
* Advantages:
    * Outputs values in the range (0, 1), useful for probability estimation.
* Disadvantages:
    * Saturates for large positive or negative inputs, where its gradient approaches zero (vanishing gradients).
    * More expensive to compute than ReLU because it requires an exponential.
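
A small sketch of the sigmoid and its derivative, $\sigma'(x)=\sigma(x)(1-\sigma(x))$, makes the saturation visible. Branching on the sign of the input is one common way to avoid overflow in the exponential; it is a choice made for this example rather than the only option.

```python
import math

def sigmoid(x):
    # Branch on sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_grad(x):
    # Derivative expressed through the function value itself.
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (0.0, 2.0, 10.0):
    print(x, sigmoid(x), sigmoid_grad(x))
```

The gradient peaks at 0.25 for an input of zero and falls off quickly as the input moves away from zero.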

**5. Tanh (Hyperbolic Tangent):**

* Function: $\tanh (x)=\frac{e^x-e^{-x}}{e^x+e^{-x}}$
* Usage: Used in some networks before the popularity of ReLU.
* Advantages:
    * Outputs values in the range (-1, 1); being zero-centered can help optimization.
* Disadvantages:
    * Like sigmoid, it saturates for large positive or negative inputs and can suffer from vanishing gradients.
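
The derivative of tanh is $1-\tanh^2(x)$. The sketch below, using Python's standard `math.tanh`, shows how it shrinks as the input grows.

```python
import math

def tanh_grad(x):
    # d tanh / dx = 1 - tanh(x)^2
    t = math.tanh(x)
    return 1.0 - t * t

for x in (0.0, 2.0, 5.0):
    print(x, math.tanh(x), tanh_grad(x))
```

The slope is 1 at zero, larger than the sigmoid's maximum of 0.25, but it still decays toward zero in the saturated regions.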

**6. Softmax:**

* Function: $\operatorname{Softmax}\left(x_i\right)=\frac{e^{x_i}}{\sum_{j=1}^N e^{x_j}}$, which converts logits into probabilities.
* Usage: Commonly used in the output layer for multi-class classification tasks.
* Advantages:
    * Provides a probability distribution over classes.
* Disadvantages:
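    * Rarely used as a hidden-layer activation in CNNs: it normalizes across units, which couples their outputs, and it costs more than element-wise functions such as ReLU.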
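
A minimal softmax sketch in plain Python is shown below. Subtracting the maximum logit before exponentiating is a standard trick to avoid overflow and does not change the result; the example logits are made up.

```python
import math

def softmax(logits):
    # Shift by the maximum so the largest exponent is exp(0) = 1.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical logits for three classes
print(probs, sum(probs))
```

The outputs are positive, sum to one, and preserve the ordering of the logits, so the largest logit gets the highest probability.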


## How Activation Functions are Used in CNNs

**1. Applied After Convolution:**
The activation function is applied element-wise to the feature map produced by each convolution, introducing non-linearity.

**2. Layer-Wise Integration:**
Convolutional blocks typically follow a pattern of convolution, activation, and (optionally) pooling. Fully connected layers apply the activation after their linear transformation.

**3. Gradient-Based Learning:**
During backpropagation, the derivative of the activation function is one factor in the chain rule, so it scales the gradient of the loss that flows back to the weights of each layer.
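
The convolution, activation, pooling pattern can be sketched in one dimension with plain Python lists. This is a toy illustration under simplifying assumptions (a single channel, no padding, stride 1, a made-up signal and kernel), not how a deep learning library implements it.

```python
def conv1d(signal, kernel, bias=0.0):
    # "Valid" cross-correlation: slide the kernel over the signal without padding.
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]

def relu(values):
    # Element-wise activation over the feature map.
    return [max(0.0, v) for v in values]

def max_pool1d(values, size=2):
    # Non-overlapping windows; a trailing partial window is dropped.
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]

signal = [1.0, 3.0, 2.0, 0.0, 4.0]   # hypothetical input
kernel = [1.0, -1.0]                 # hypothetical difference filter

feature_map = conv1d(signal, kernel)   # convolution
activated = relu(feature_map)          # activation
pooled = max_pool1d(activated)         # pooling
print(feature_map, activated, pooled)
```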

## Advantages and Disadvantages of Activation Functions

### Advantages:

**1. Non-Linearity:**
Allows the network to model complex, non-linear relationships in data.

**2. Gradient Propagation:**
Functions with non-saturating regions, such as ReLU, let gradients pass through many layers without shrinking.

**3. Efficiency:**
Functions like ReLU are cheap to compute, requiring only a comparison with zero.

### Disadvantages:

**1. Gradient Issues:**
Some functions, like sigmoid and tanh, can suffer from vanishing gradients, hindering learning.

**2. Dead Neurons:**
ReLU can produce dead neurons: a neuron that consistently outputs zero receives zero gradient and stops learning entirely.

**3. Parameter Complexity:**
Functions like PReLU introduce additional parameters, increasing model complexity.

## What is Non-Linearity in CNN?

Non-linearity in Convolutional Neural Networks (CNNs) refers to the introduction of non-linear transformations to the input data at various stages of the network. This is achieved through the use of activation functions after convolutional layers. Non-linear activation functions enable the network to learn complex patterns and relationships within the data, which cannot be captured by purely linear operations.
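
A small scalar sketch shows why purely linear operations are not enough: two stacked linear layers collapse into a single linear layer, while inserting a ReLU between them breaks that equivalence. The weights and biases are arbitrary values chosen for illustration.

```python
def linear(x, w, b):
    return w * x + b

def two_linear_layers(x):
    # Layer 1: 2x + 1, layer 2: -3h + 0.5, with no activation in between.
    return linear(linear(x, 2.0, 1.0), -3.0, 0.5)

def collapsed(x):
    # Same function as a single layer: (-3 * 2)x + (-3 * 1 + 0.5).
    return linear(x, -6.0, -2.5)

def with_relu(x):
    # Same two layers, but with ReLU applied between them.
    hidden = max(0.0, linear(x, 2.0, 1.0))
    return linear(hidden, -3.0, 0.5)

for x in (-2.0, -0.5, 1.0):
    print(x, two_linear_layers(x), collapsed(x), with_relu(x))
```

Without the activation, adding depth adds no expressive power. With it, the output is no longer a straight line in `x`.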

### Why is Non-Linearity Important in CNNs?

**1. Complex Pattern Recognition:**
Relationships in real-world data are rarely linear. Non-linear transformations allow CNNs to model and learn these complex patterns.

**2. Hierarchical Feature Learning:**
Non-linearity enables the network to build hierarchical representations of the input data, with each layer learning increasingly abstract features.

**3. Increased Model Capacity:**
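It enhances the expressive power of the network: with a non-linear activation and enough neurons, a network can approximate any continuous function on a bounded input domain to arbitrary accuracy (the universal approximation theorem).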