# 🔹 ReLU, Leaky ReLU, and Parametric ReLU Activation Functions

## 1. Theory

The **Rectified Linear Unit (ReLU)** and its variants (**Leaky ReLU**, **Parametric ReLU**) are widely used activation functions in deep learning.

- **ReLU**: Outputs the input directly if positive, otherwise outputs zero.
- **Leaky ReLU**: Applies a small, fixed slope to negative inputs so their gradient is non-zero, which mitigates dying neurons.
- **Parametric ReLU (PReLU)**: Similar to Leaky ReLU, but the negative slope is a learnable parameter.

---

### **Mathematical Formulas**

#### ReLU:
$$
f(x) = \max(0, x)
$$

#### Leaky ReLU:
$$
f(x) =
\begin{cases}
x & \text{if } x > 0 \\
\alpha x & \text{if } x \leq 0
\end{cases}
$$

Here \( \alpha \) is a small, fixed positive constant (e.g., 0.01) chosen before training.

#### Parametric ReLU:
$$
f(x) =
\begin{cases}
x & \text{if } x > 0 \\
a x & \text{if } x \leq 0
\end{cases}
$$

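Here \( a \) is a **learnable parameter** rather than a fixed constant: it is updated by backpropagation together with the network weights, and it may be shared across a layer or learned separately per channel.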
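
The three formulas above can be sketched in plain Python as follows. This is a minimal scalar version for illustration only; the function names are assumptions of this sketch, and the PReLU slope is passed in as an argument instead of being stored in a model.

```python
def relu(x: float) -> float:
    """ReLU: pass positive inputs through, clamp everything else to zero."""
    return max(0.0, x)


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: fixed small slope `alpha` for non-positive inputs."""
    return x if x > 0 else alpha * x


def prelu(x: float, a: float) -> float:
    """PReLU: same shape as Leaky ReLU, but `a` is meant to be trained.

    Here `a` is a plain argument; in a real model it would be a parameter
    updated by the optimizer.
    """
    return x if x > 0 else a * x


# Evaluate each function on a negative, zero, and positive input.
for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), leaky_relu(x), prelu(x, a=0.25))
```

All three agree for positive inputs; they differ only in how they treat \(x \leq 0\).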

---

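### **Derivatives**

Each function has a kink at \(x = 0\), where the derivative is not defined; the formulas below use the common convention of assigning the value from the negative side.

- For ReLU: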

$$
f'(x) =
\begin{cases}
1 & x > 0 \\
0 & x \leq 0
\end{cases}
$$

- For Leaky ReLU:

$$
f'(x) =
\begin{cases}
1 & x > 0 \\
\alpha & x \leq 0
\end{cases}
$$

- For Parametric ReLU:

$$
f'(x) =
\begin{cases}
1 & x > 0 \\
a & x \leq 0
\end{cases}
$$
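
As a rough check of these derivatives, the sketch below implements them as scalar Python functions and compares the PReLU case against a central finite difference. It also includes the gradient with respect to the slope \( a \), which follows from the PReLU formula (\(f = a x\) for \(x \leq 0\), so \(\partial f / \partial a = x\) there and 0 otherwise). All function names are assumptions of this sketch.

```python
def prelu(x: float, a: float) -> float:
    # Forward pass, repeated here so the block is self-contained.
    return x if x > 0 else a * x


def relu_grad(x: float) -> float:
    # Convention: the derivative at x == 0 is taken to be 0.
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    return 1.0 if x > 0 else alpha


def prelu_grad_x(x: float, a: float) -> float:
    # Gradient with respect to the input: same form as Leaky ReLU.
    return 1.0 if x > 0 else a


def prelu_grad_a(x: float, a: float) -> float:
    # Gradient with respect to the slope: the output depends on `a`
    # only when x <= 0, where f = a * x.
    return 0.0 if x > 0 else x


def numeric_grad(f, x: float, eps: float = 1e-6) -> float:
    # Central finite difference; only meaningful away from the kink at 0.
    return (f(x + eps) - f(x - eps)) / (2 * eps)


# Compare analytic and numeric gradients on one negative and one positive input.
for x in (-1.5, 2.0):
    approx = numeric_grad(lambda v: prelu(v, 0.25), x)
    print(x, prelu_grad_x(x, 0.25), approx)
```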

---

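## 2. Graphical Intuition

All three functions follow the identity line \(y = x\) for \(x > 0\) and differ only to the left of the origin: ReLU is flat at zero, Leaky ReLU is a line with a small fixed slope \( \alpha \), and PReLU is a line whose slope \( a \) is adjusted during training.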

---

## 3. Advantages and Disadvantages

| **Activation**     | **Advantages**                                                     | **Disadvantages**                                              |
|---------------------|--------------------------------------------------------------------|----------------------------------------------------------------|
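| **ReLU**           | Simple and cheap to compute; the gradient is exactly 1 for \(x>0\), so it does not saturate there. | Dying ReLU problem: a neuron whose pre-activation is negative for all inputs outputs 0 and receives zero gradient, so it stops learning. |
| **Leaky ReLU**     | Mitigates dying ReLU by keeping a small non-zero gradient when \(x<0\). | Negligible extra computation, but \( \alpha \) is an additional hyperparameter to choose. |
| **Parametric ReLU**| Learns the negative slope during training instead of fixing it by hand. | Adds learnable parameters, which increases model complexity and can raise overfitting risk on small datasets. |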
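
To make the dying ReLU entry concrete, the sketch below looks at a single neuron \(z = w x + b\) whose pre-activation is negative for every input. The inputs, weight, and bias are hypothetical values chosen for illustration, and the upstream gradient is taken to be 1.

```python
def grad_wrt_weight(inputs, w, b, act_grad):
    """Mean gradient of act(w * x + b) with respect to w over `inputs`.

    `act_grad` is the derivative of the activation; the upstream gradient
    is assumed to be 1 to keep the example small.
    """
    return sum(act_grad(w * x + b) * x for x in inputs) / len(inputs)


inputs = [0.5, 1.0, 1.5, 2.0]
w, b = 1.0, -5.0  # hypothetical: w * x + b < 0 for every input above

# ReLU: derivative is 0 on the negative side, so no signal reaches w.
relu_g = grad_wrt_weight(inputs, w, b, lambda z: 1.0 if z > 0 else 0.0)

# Leaky ReLU (alpha = 0.01): a small gradient still flows.
leaky_g = grad_wrt_weight(inputs, w, b, lambda z: 1.0 if z > 0 else 0.01)

print("ReLU gradient:", relu_g)
print("Leaky ReLU gradient:", leaky_g)
```

With ReLU the weight gradient is exactly zero, so gradient descent cannot move this neuron back into the active region; with Leaky ReLU the gradient is small but non-zero.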

---

## 4. Use Cases

- **ReLU**: Default choice for most deep neural networks (e.g., MLPs and CNNs such as ResNets).
- **Leaky ReLU**: Preferred when dying ReLU is observed during training, for example when many units output zero for every input.
- **Parametric ReLU**: Used in architectures where adaptive negative slopes improve performance (e.g., very deep CNNs).
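
One simple way to check whether dying ReLU is occurring is to measure what fraction of units output zero for every sample in a batch. The helper below is a hypothetical sketch of that check, not a standard library function, and a single batch is only a rough indicator.

```python
def dead_unit_fraction(activations):
    """Fraction of units that are zero for every sample in the batch.

    `activations` is a list of rows (one per sample); each row is a list
    of post-ReLU outputs, one per unit.
    """
    n_units = len(activations[0])
    dead = sum(
        1
        for j in range(n_units)
        if all(row[j] == 0.0 for row in activations)
    )
    return dead / n_units


# Hypothetical batch: 3 samples, 4 hidden units.
batch = [
    [0.0, 1.2, 0.0, 0.3],
    [0.0, 0.0, 0.0, 2.1],
    [0.0, 0.7, 0.0, 0.0],
]
print(dead_unit_fraction(batch))
```

Here units 0 and 2 never activate, while units 1 and 3 are each zero on some samples but not all, so they do not count as dead. A fraction that stays high across many batches would be a reason to try Leaky ReLU or PReLU.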

---

## 5. Interview Questions and Answers

### **Q1: What advantages do Leaky ReLU and PReLU have over standard ReLU?**
**Answer:**  
- They mitigate the dying ReLU problem: the gradient for negative inputs is small but non-zero, so a neuron can still recover.
- PReLU goes further by learning the slope parameter \( a \) during training, so the negative slope adapts to the data.

---

### **Q2: When should you use PReLU over Leaky ReLU?**
**Answer:**  
- Use PReLU when you have sufficient data and computational resources, as it adds extra parameters to learn.
- Use Leaky ReLU when you want a simple, fixed-slope alternative.
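
To illustrate what "learning the slope" means, the sketch below fits the single PReLU parameter \( a \) by gradient descent on a mean squared error. The data, target slope, starting value, learning rate, and step count are all hypothetical choices for this toy example; a real network would learn \( a \) jointly with its weights.

```python
def prelu(x: float, a: float) -> float:
    return x if x > 0 else a * x


def fit_slope(xs, ys, a=0.25, lr=0.1, steps=200):
    """Gradient descent on mean squared error with respect to `a` only."""
    for _ in range(steps):
        grad = 0.0
        for x, y in zip(xs, ys):
            err = prelu(x, a) - y
            # d(prelu)/da is x for x <= 0 and 0 otherwise.
            grad += 2.0 * err * (x if x <= 0 else 0.0)
        a -= lr * grad / len(xs)
    return a


xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
target_slope = 0.1  # hypothetical slope used to generate the targets
ys = [prelu(x, target_slope) for x in xs]

learned = fit_slope(xs, ys)
print("learned slope:", learned)
```

Only the negative inputs contribute to the update, since the output does not depend on \( a \) for \(x > 0\). Leaky ReLU corresponds to skipping this fitting step and keeping the slope fixed.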

---

## ✅ Conclusion
ReLU and its variants are among the **most widely used activations for hidden layers** in modern deep learning.  
Leaky ReLU and PReLU **address the dying neuron issue**, making them more robust in certain scenarios.

