# 🧠 Sigmoid Activation Function

## 1. Theory

The **Sigmoid Activation Function** is a non-linear function that maps any real-valued number into the range (0,1).  
It was widely used in early neural networks to introduce non-linearity, and it is still used to model probabilities.

---

### **Mathematical Formula**

$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$

- For large positive \( x \), \( \sigma(x) \to 1 \)
- For large negative \( x \), \( \sigma(x) \to 0 \)
- For \( x = 0 \), \( \sigma(0) = 0.5 \)
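
The formula can be evaluated directly, but a naive `1 / (1 + exp(-x))` overflows for large negative inputs. The following is a minimal sketch of one common workaround, using only the standard library; the function name `sigmoid` and the sample inputs are illustrative choices, not part of the original text.

```python
import math


def sigmoid(x: float) -> float:
    """Sigmoid of x, written so that math.exp never receives a large positive argument."""
    if x >= 0:
        # Here -x <= 0, so exp(-x) is at most 1 and cannot overflow.
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x).
    z = math.exp(x)
    return z / (1.0 + z)


# Illustrative inputs covering the three cases listed above.
for x in (-10.0, 0.0, 10.0):
    print(f"sigmoid({x:+.1f}) = {sigmoid(x):.6f}")
```

Both branches compute the same mathematical function; the split only avoids an `OverflowError` for inputs such as `-1000.0`.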

---

### **Derivative**

The derivative of the sigmoid function is:

$$
\sigma'(x) = \sigma(x)(1 - \sigma(x))
$$
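
This identity follows from the chain rule applied to the definition of \( \sigma(x) \), together with the fact that \( 1 - \sigma(x) = \frac{e^{-x}}{1 + e^{-x}} \):

$$
\sigma'(x) = \frac{d}{dx}\left(1 + e^{-x}\right)^{-1} = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\bigl(1 - \sigma(x)\bigr)
$$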

- The derivative reaches its **maximum at \( x = 0 \)**, where it equals 0.25
- For large positive or large negative \( x \), the derivative approaches 0 (causing vanishing gradients)
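
The two properties above can be checked numerically. This is a small sketch using only the standard library; the helper names `sigmoid` and `sigmoid_derivative` and the sample points are assumptions made for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Sigmoid of x, written to avoid overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_derivative(x: float) -> float:
    """Derivative computed from the identity sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# The derivative peaks at x = 0 and shrinks toward 0 as |x| grows.
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x = {x:+5.1f}  sigmoid'(x) = {sigmoid_derivative(x):.6f}")
```
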

The formula and a plot of the function are shown below:

![alt](img/sig.svg)

![alt](img/2.png)

---

## 2. Graphical Intuition

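The sigmoid curve is S-shaped: it rises smoothly from 0 toward 1, passes through 0.5 at \( x = 0 \), and flattens out at both extremes.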
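
To get a feel for the shape without a plotting library, the curve can be sampled at a few points and drawn as a rough text bar chart. This is only an illustrative sketch; the sampling range, the bar width of 40 characters, and the helper name `sigmoid` are arbitrary choices.

```python
import math


def sigmoid(x: float) -> float:
    """Sigmoid of x, written to avoid overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Sample the curve on [-6, 6]; each bar's length is proportional to sigmoid(x).
for x in range(-6, 7, 2):
    y = sigmoid(float(x))
    bar = "#" * round(y * 40)
    print(f"{x:+3d}  {y:.3f}  {bar}")
```

The bars grow slowly at the left, quickly around 0, and slowly again at the right. The curve is also symmetric about the point (0, 0.5), since \( \sigma(-x) = 1 - \sigma(x) \).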


## 3. Advantages and Disadvantages

| **Aspect**          | **Advantages**                                                   | **Disadvantages**                                               |
|---------------------|------------------------------------------------------------------|----------------------------------------------------------------|
| **Range**           | Maps input to (0,1), useful for probability interpretation.      | Saturates at extremes, causing vanishing gradients.            |
| **Smoothness**      | Differentiable everywhere, enabling gradient-based learning.     | Output is not zero-centered, leading to zig-zagging updates.   |
| **Non-linearity**   | Allows networks to learn non-linear decision boundaries.         | Computationally expensive (requires exponentials).             |
| **Interpretability**| Outputs can be treated as probabilities in binary classification.| Not ideal for deep networks (ReLU is preferred).               |

---
In summary, the main advantages of the sigmoid function are:

1. Smooth gradient, preventing “jumps” in output values.
2. Output values bounded between 0 and 1, normalizing the output of each neuron.
3. Clear predictions, i.e., outputs very close to 1 or 0.

Sigmoid has three major disadvantages:

* Prone to vanishing gradients
* Function output is not zero-centered
* Exponential operations are relatively time-consuming

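
The vanishing-gradient problem can be made concrete with a rough bound. Since the sigmoid derivative never exceeds 0.25, backpropagating through `n` stacked sigmoid activations multiplies the gradient by at most `0.25 ** n`. The sketch below assumes, purely for illustration, that the effect of the weights is ignored; the function name `gradient_upper_bound` is hypothetical.

```python
# Largest possible value of the sigmoid derivative, reached at x = 0.
MAX_SIGMOID_GRAD = 0.25


def gradient_upper_bound(num_layers: int) -> float:
    """Upper bound on the product of num_layers sigmoid derivatives (weights ignored)."""
    return MAX_SIGMOID_GRAD ** num_layers


# The bound shrinks geometrically with depth.
for n in (1, 5, 10, 20):
    print(f"{n:2d} layers: gradient factor <= {gradient_upper_bound(n):.3e}")
```

In a real network the weights also enter the product, so the actual gradient can be larger or smaller than this bound suggests; the point is only that the activation alone contributes a factor that decays quickly with depth.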
## 4. Use Cases

The sigmoid activation is primarily used in:

1. **Binary Classification**  
   - Output layer of logistic regression  
   - Output neuron in binary classification neural networks

2. **Probabilistic Models**  
   - Models where output represents a probability score

3. **Gating Mechanisms**  
   - Used in the gates of LSTM (Long Short-Term Memory) networks, where an output in (0,1) controls how much information passes through
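
As an illustration of the binary classification use case, the sketch below turns a raw score (a logit) into a probability and then into a class label. The function name `predict`, the default threshold of 0.5, and the sample logits are assumptions for this example, not something prescribed by the text above.

```python
import math


def sigmoid(x: float) -> float:
    """Sigmoid of x, written to avoid overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict(logit: float, threshold: float = 0.5) -> tuple[float, int]:
    """Return (probability of the positive class, predicted label)."""
    probability = sigmoid(logit)
    label = int(probability >= threshold)
    return probability, label


# Hypothetical raw scores from the final layer of a binary classifier.
for logit in (-3.0, 0.2, 4.0):
    probability, label = predict(logit)
    print(f"logit = {logit:+.1f}  p = {probability:.3f}  label = {label}")
```

With a threshold of 0.5, the predicted label is 1 exactly when the logit is non-negative, because \( \sigma(0) = 0.5 \) and the function is increasing.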

---

## ✅ Conclusion
The **Sigmoid** activation is simple and interpretable, making it effective for **binary classification** tasks.  
However, it is prone to the **vanishing gradient problem**, so modern deep networks prefer **ReLU** or its variants for hidden layers.

