## **🔵 Activation Functions in Neural Networks**
Activation functions are a crucial part of neural networks. They determine how strongly each neuron's output is passed on to the next layer, and in doing so they introduce **non-linearity** into the model.

### **🔹 Why Do We Need Activation Functions?**
A neural network without activation functions is just a **linear model** (like linear regression), no matter how many layers it has. Activation functions help the network **learn complex patterns and relationships** in data.

1. **Introduces Non-Linearity** 🌀  
   - Real-world problems (like image recognition, speech processing) are non-linear.  
   - Without activation functions, neural networks would behave **like a simple linear function**, limiting their power.
   
2. **Helps Backpropagation** 🔄  
   - Activation functions are differentiable (at least almost everywhere), so backpropagation can pass gradients through them and **gradient descent** can update the weights.
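
To make the "just a linear model" point concrete, here is a minimal sketch with two scalar "layers" and no activation in between. The weights `w1`, `b1`, `w2`, `b2` are made-up values chosen only for illustration.

```python
# Two scalar linear "layers" with no activation in between.
# The weights below are arbitrary illustrative values (assumptions).
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def two_linear_layers(x):
    """Apply layer 1, then layer 2, with no activation function."""
    hidden = w1 * x + b1
    return w2 * hidden + b2

def single_linear_layer(x):
    """One linear layer using the combined weight and bias."""
    w = w2 * w1          # combined slope
    b = w2 * b1 + b2     # combined intercept
    return w * x + b

for x in [-2.0, 0.0, 3.5]:
    # Both functions agree for every input.
    print(x, two_linear_layers(x), single_linear_layer(x))
```

Because the two functions always agree, the extra layer adds no expressive power until a non-linear activation is placed between the layers.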



## **🔵 Types of Activation Functions**
There are **three major types** of activation functions:

1. **Linear Activation**
2. **Non-Linear Activations**
   - **Sigmoid**
   - **Tanh**
   - **ReLU (Rectified Linear Unit)**
   - **Leaky ReLU & Parametric ReLU**
   - **ELU (Exponential Linear Unit)**
3. **Softmax Activation (for classification tasks)**



## **1️⃣ Linear Activation Function**
🔸 **Equation**:  
$$
f(x) = ax + b
$$
🔸 **Graph**: A straight line  
🔸 **Problem**: Cannot capture complex patterns  

📌 **Used in:** Output layers for regression problems.



## **2️⃣ Non-Linear Activation Functions**
### **📌 Sigmoid Activation Function**
🔸 **Equation**:  
$$
f(x) = \frac{1}{1 + e^{-x}}
$$

🔹 **Pros**:
✔ Smooth, differentiable.  
✔ Output is **always between 0 and 1** (great for probability predictions).

🔹 **Cons**:
❌ Causes **vanishing gradient** problem (small gradients slow down learning).  
❌ Not zero-centered (output is always positive).  

📌 **Used in:** Binary classification (output layer).  
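
The vanishing gradient issue can be seen directly from the derivative, which is $f'(x) = f(x)(1 - f(x))$. The snippet below is a small sketch in plain Python; the function names `sigmoid` and `sigmoid_grad` are just illustrative.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of sigmoid: s * (1 - s), where s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x={x:5.1f}  sigmoid={sigmoid(x):.5f}  gradient={sigmoid_grad(x):.6f}")
```

The gradient is largest at $x = 0$ (where it equals 0.25) and shrinks quickly as the input moves away from zero, so layers that saturate pass almost no learning signal backward.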



### **📌 Tanh (Hyperbolic Tangent) Activation Function**
🔸 **Equation**:  
$$
f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
$$

🔹 **Pros**:  
✔ Outputs range between **-1 and 1** (zero-centered).  
✔ Often trains faster than sigmoid in hidden layers, because its outputs are centered on zero.

🔹 **Cons**:  
❌ Also suffers from **vanishing gradient problem**.

📌 **Used in:** Hidden layers in some RNNs.
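
Tanh is closely related to sigmoid: it is a rescaled and shifted version, $\tanh(x) = 2\,\sigma(2x) - 1$. The sketch below checks that identity numerically and prints the derivative $1 - \tanh^2(x)$; the helper names are illustrative.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh_via_sigmoid(x):
    """tanh written as a shifted, rescaled sigmoid: 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

for x in [-2.0, 0.0, 2.0, 5.0]:
    print(f"x={x:5.1f}  tanh={math.tanh(x):.5f}  "
          f"via_sigmoid={tanh_via_sigmoid(x):.5f}  gradient={tanh_grad(x):.6f}")
```

The gradient reaches 1 at $x = 0$, which is larger than sigmoid's maximum of 0.25, but it still decays toward zero for large $|x|$.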



### **📌 ReLU (Rectified Linear Unit)**
🔸 **Equation**:  
$$
f(x) = \max(0, x)
$$

🔹 **Pros**:  
✔ Mitigates vanishing gradients: the gradient is exactly 1 for every positive input.  
✔ Computationally efficient.

🔹 **Cons**:  
❌ Can cause **dying ReLU problem** (neurons stuck at 0).  

📌 **Used in:** Hidden layers of most modern deep learning architectures (CNNs, MLPs, and the feed-forward blocks of many Transformers).  
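
The following sketch shows ReLU, its gradient, and how a neuron can "die". The neuron's `weight` and `bias` are hypothetical values picked so that its pre-activation is negative for every sample input.

```python
def relu(x):
    """ReLU: pass positive inputs through, clamp everything else to 0."""
    return max(0.0, x)

def relu_grad(x):
    """Gradient of ReLU: 1 for x > 0, else 0 (0 at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0

# A hypothetical neuron whose bias has drifted far into the negatives.
weight, bias = 0.5, -10.0
inputs = [-1.0, 0.0, 2.0, 4.0]

for x in inputs:
    pre_activation = weight * x + bias
    # Output and gradient are both 0, so this neuron receives no update.
    print(x, relu(pre_activation), relu_grad(pre_activation))
```

Since both the output and the gradient are zero for every input here, gradient descent has no signal with which to move the weights back into the active region.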



### **📌 Leaky ReLU (Fixes Dying ReLU)**
🔸 **Equation**:  
$$
f(x) =
\begin{cases} 
x, & x > 0 \\
0.01x, & x \leq 0
\end{cases}
$$

🔹 **Pros**:  
✔ Solves **dying ReLU problem** by keeping a small, non-zero gradient for negative inputs.  

📌 **Used in:** Deep neural networks.
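
A minimal sketch of Leaky ReLU and its gradient is shown below. The slope is exposed as a parameter named `alpha` (an illustrative name) with the 0.01 default from the equation above; in Parametric ReLU the same slope would be learned during training rather than fixed.

```python
def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: identity for x > 0, small linear slope otherwise."""
    return x if x > 0 else alpha * x

def leaky_relu_grad(x, alpha=0.01):
    """Gradient: 1 for x > 0, alpha otherwise (never exactly zero)."""
    return 1.0 if x > 0 else alpha

for x in [-10.0, -1.0, 0.0, 1.0, 10.0]:
    print(x, leaky_relu(x), leaky_relu_grad(x))
```

Unlike plain ReLU, the gradient for negative inputs is `alpha` rather than 0, so a neuron stuck in the negative region can still recover.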



### **📌 ELU (Exponential Linear Unit)**
🔸 **Equation**:  
$$
f(x) =
\begin{cases} 
x, & x > 0 \\
\alpha (e^x - 1), & x \leq 0
\end{cases}
$$

🔹 **Pros**:  
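✔ Avoids the **dying ReLU problem**, because the gradient stays non-zero for negative inputs.  
✔ Negative outputs pull the mean activation closer to zero, which can make training more stable.  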

📌 **Used in:** Deep networks with **complex architectures**.
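
The sketch below implements the ELU equation above in plain Python, assuming a default of $\alpha = 1$. It also shows that the negative branch levels off near $-\alpha$ instead of growing without bound.

```python
import math

def elu(x, alpha=1.0):
    """ELU: identity for x > 0, alpha * (e^x - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def elu_grad(x, alpha=1.0):
    """Gradient: 1 for x > 0, alpha * e^x otherwise."""
    return 1.0 if x > 0 else alpha * math.exp(x)

for x in [-10.0, -2.0, -0.5, 0.0, 2.0]:
    print(f"x={x:6.1f}  elu={elu(x):.5f}  gradient={elu_grad(x):.5f}")
```

The extra cost compared with ReLU comes from the `math.exp` call on the negative branch.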



## **3️⃣ Softmax Activation Function (For Multi-Class Classification)**
🔸 **Equation**:  
$$
\sigma(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}
$$

🔹 **Pros**:  
✔ Converts outputs into **probabilities** (sum = 1).  
✔ Helps in multi-class classification.

📌 **Used in:** Output layer of **multi-class classification** models.
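
Here is a small sketch of softmax in plain Python. It subtracts the largest score before exponentiating, a common trick that leaves the result unchanged but avoids overflow for large inputs. The input scores are arbitrary example values.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)                            # largest score
    exps = [math.exp(s - shift) for s in scores]   # exponents are all <= 0
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs, sum(probs))

# Large scores would overflow math.exp without the shift.
print(softmax([1000.0, 1001.0, 1002.0]))
```

Adding the same constant to every score does not change the output, which is why the shifted version gives the same probabilities as the textbook formula.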

## **🎯 Summary Table**
| Activation | Used In | Pros | Cons |
|------------|--------|------|------|
| **Linear** | Regression | Simple | Cannot capture complexity |
| **Sigmoid** | Binary Classification | Probability output | Vanishing gradient |
| **Tanh** | RNNs | Zero-centered | Vanishing gradient |
| **ReLU** | CNNs, Deep Learning | Mitigates vanishing gradient; cheap to compute | Dying neurons |
| **Leaky ReLU** | Deep Learning | Solves dying ReLU | Extra slope hyperparameter |
| **ELU** | Complex Networks | Avoids dying ReLU; smooth for negative inputs | More computation (exponential) |
| **Softmax** | Multi-class Classification | Probability output | Costly when the number of classes is very large |



## **🔎 Final Thoughts**
- **ReLU** is the most widely used for hidden layers.  
- **Softmax** is used in **classification** problems.  
- **Tanh** and **Sigmoid** are now rarely used in the hidden layers of deep feed-forward networks, though they remain common in output layers and RNN gates.  

![](images/sigmoid.png)

![](images/tanh.png)

![](images/relu.png)

![](images/elu.png)




---

### **🔹 What Are Activation Functions? (Super Simple Explanation)**  
Think of a **neural network** as a factory processing raw materials (inputs) into useful products (outputs). But before making the final product, we need a **decision-making system** to decide whether a particular part should be **used or discarded**.  

That **decision-making system** is the **activation function**! 🚀  



### **🔹 Why Do We Need Activation Functions?**  
Without activation functions, a neural network is just **a boring calculator doing linear math** 📉. It won't be able to recognize complex patterns like **faces, voices, or handwritten digits**.  

Activation functions help the network **learn and make smart decisions** by introducing **non-linearity** 🔄.



### **🔹 Types of Activation Functions (Like Different Switches)**
1. **Sigmoid → Soft Switch** 🔘   
   - Example: A robot deciding whether a glass is half-full (probabilities).  
   - Output is between **0 and 1** (good for probability-based decisions).  
   - **Problem**: For very large or very small inputs the output flattens out, so the gradient is almost zero and learning becomes **slow** (vanishing gradient issue).

2. **Tanh → Stronger Soft Switch** 🌗  
   - Example: A sensor measuring **temperature changes** from cold (-1) to hot (+1).  
   - Works better than sigmoid but still has **slow learning issues**.

3. **ReLU → Simple On/Off Switch** ⚡  
   - Example: If a signal is positive, **turn it ON**; otherwise, keep it OFF.  
   - Used **most commonly** in deep learning because it’s fast! 🚀  
   - **Problem**: Some neurons get permanently stuck OFF (called **dying ReLU**).

4. **Leaky ReLU → Improved On/Off Switch** ⚡💡  
   - Example: If a signal is **negative**, don’t completely turn it off—just let a **small current flow**.  
   - Fixes **dying ReLU problem**.

5. **Softmax → Decision Maker for Many Options** 🎯  
   - Example: If you need to **choose one out of 10 items**, it helps pick the most probable one.  
   - Used in the **last layer of classification models**.


### **🔹 Simple Summary in Layman Terms**
| Activation Function | Like a... | Used For | Keep in Mind |
|----------------------|-----------|----------|---------------|
| **Sigmoid** | Dimmer Switch (0 to 1) | Probability Predictions | Slow learning when saturated |
| **Tanh** | Thermometer (-1 to +1) | Zero-centered hidden layers (e.g., RNNs) | Slow learning when saturated |
| **ReLU** | Light Switch (ON/OFF) | Hidden layers in deep learning | Some neurons die |
| **Leaky ReLU** | Improved Light Switch | Hidden layers where ReLU neurons die | Avoids dead neurons |
| **Softmax** | Voting System | Choosing **one** from **many** | Typically used only at the output |


### **🔹 Takeaway**  
- **Use ReLU** for hidden layers (fast & efficient).  
- **Use Softmax** in the output layer for multi-class problems.  
- **Use Sigmoid** in the output layer for binary problems; **Tanh** still appears in RNNs, but both are usually avoided in the hidden layers of deep networks.  

---

Let's go step by step and manually calculate each activation function for the inputs $x = -2, -1, 0, 1, 2$.



### 1. **Sigmoid Function**
$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$

For each $ x$:

| $ x$  | $ e^{-x}$ | $ 1 + e^{-x}$ | $ \sigma(x)$ |
|----------|-------------|----------------|-------------|
| -2       | $ e^2 \approx 7.389$ | $ 8.389$ | $ \frac{1}{8.389} \approx 0.119$ |
| -1       | $ e^1 \approx 2.718$ | $ 3.718$ | $ \frac{1}{3.718} \approx 0.269$ |
| 0        | $ e^0 = 1$ | $ 2$ | $ \frac{1}{2} = 0.5$ |
| 1        | $ e^{-1} \approx 0.368$ | $ 1.368$ | $ \frac{1}{1.368} \approx 0.731$ |
| 2        | $ e^{-2} \approx 0.135$ | $ 1.135$ | $ \frac{1}{1.135} \approx 0.881$ |



### 2. **Tanh Function**
$$
\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
$$

For each $ x$:

| $ x$  | $ e^x$ | $ e^{-x}$ | $ e^x - e^{-x}$ | $ e^x + e^{-x}$ | $ \tanh(x)$ |
|----------|-------------|-------------|-------------|-------------|-------------|
| -2       | $ e^{-2} \approx 0.135$ | $ e^2 \approx 7.389$ | $ -7.254$ | $ 7.524$ | $ \frac{-7.254}{7.524} \approx -0.964$ |
| -1       | $ e^{-1} \approx 0.368$ | $ e^1 \approx 2.718$ | $ -2.350$ | $ 3.086$ | $ \frac{-2.350}{3.086} \approx -0.762$ |
| 0        | $ e^0 = 1$ | $ e^0 = 1$ | $ 0$ | $ 2$ | $ 0$ |
| 1        | $ e^1 \approx 2.718$ | $ e^{-1} \approx 0.368$ | $ 2.350$ | $ 3.086$ | $ \frac{2.350}{3.086} \approx 0.762$ |
| 2        | $ e^2 \approx 7.389$ | $ e^{-2} \approx 0.135$ | $ 7.254$ | $ 7.524$ | $ \frac{7.254}{7.524} \approx 0.964$ |



### 3. **ReLU Function**
$$
ReLU(x) = \max(0, x)
$$

| $ x$  | $ ReLU(x)$ |
|----------|-------------|
| -2       | 0 |
| -1       | 0 |
| 0        | 0 |
| 1        | 1 |
| 2        | 2 |



### 4. **Leaky ReLU Function** ($ \alpha = 0.01$)
$$
LeakyReLU(x) = \begin{cases} 
x, & x > 0 \\
\alpha x, & x \leq 0
\end{cases}
$$

| $ x$  | $ LeakyReLU(x)$ |
|----------|----------------|
| -2       | $ -2 \times 0.01 = -0.02$ |
| -1       | $ -1 \times 0.01 = -0.01$ |
| 0        | $ 0$ |
| 1        | $ 1$ |
| 2        | $ 2$ |



### 5. **ELU Function** ($ \alpha = 1$)
$$
ELU(x) = \begin{cases} 
x, & x > 0 \\
\alpha (e^x - 1), & x \leq 0
\end{cases}
$$

| $ x$  | $ e^x - 1$ | $ ELU(x)$ |
|----------|-------------|-------------|
| -2       | $ e^{-2} - 1 \approx -0.865$ | $ -0.865$ |
| -1       | $ e^{-1} - 1 \approx -0.632$ | $ -0.632$ |
| 0        | $ e^0 - 1 = 0$ | $ 0$ |
| 1        | not used ($ x > 0$) | $ 1$ |
| 2        | not used ($ x > 0$) | $ 2$ |



### 6. **Softmax Function** (for $ x = [-2, -1, 0, 1, 2]$)
$$
Softmax(x_i) = \frac{e^{x_i}}{\sum_{j} e^{x_j}}
$$

#### Step 1: Compute $ e^x$

$$
e^{-2} \approx 0.135, \quad e^{-1} \approx 0.368, \quad e^0 = 1, \quad e^1 \approx 2.718, \quad e^2 \approx 7.389
$$

#### Step 2: Compute sum of exponentials

$$
0.135 + 0.368 + 1 + 2.718 + 7.389 = 11.61
$$

#### Step 3: Compute Softmax values

| $ x$  | $ e^x$ | Softmax(x) |
|----------|-------------|-------------|
| -2       | 0.135 | $ \frac{0.135}{11.61} \approx 0.012$ |
| -1       | 0.368 | $ \frac{0.368}{11.61} \approx 0.032$ |
| 0        | 1 | $ \frac{1}{11.61} \approx 0.086$ |
| 1        | 2.718 | $ \frac{2.718}{11.61} \approx 0.234$ |
| 2        | 7.389 | $ \frac{7.389}{11.61} \approx 0.636$ |



### Summary of Manual Calculations

- **Sigmoid** smoothly maps values between (0,1).
- **Tanh** maps values between (-1,1).
- **ReLU** sets negative values to 0.
- **Leaky ReLU** allows small negative values.
- **ELU** also keeps negative outputs non-zero, but it curves smoothly and levels off at $-\alpha$ instead of decreasing without bound.
- **Softmax** normalizes values into probabilities.
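
The hand calculations can be cross-checked with a short script. This is a sketch using only Python's standard `math` module, with the same inputs and the same $\alpha$ values as the tables ($0.01$ for Leaky ReLU, $1$ for ELU); the function names are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    return x if x > 0 else alpha * x

def elu(x, alpha=1.0):
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def softmax(values):
    # Plain formula; fine for small inputs like the ones used here.
    exps = [math.exp(v) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
soft = softmax(xs)  # softmax is computed over the whole list at once

print(f"{'x':>5} {'sigmoid':>8} {'tanh':>8} {'relu':>6} {'leaky':>7} {'elu':>8} {'softmax':>8}")
for x, s in zip(xs, soft):
    print(f"{x:>5.1f} {sigmoid(x):>8.3f} {math.tanh(x):>8.3f} {relu(x):>6.1f} "
          f"{leaky_relu(x):>7.2f} {elu(x):>8.3f} {s:>8.3f}")
```

Each printed row should agree with the corresponding table rows, up to rounding in the last digit.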

---