# ⚡ Activation Functions - Adding Non-Linearity

Welcome to the third notebook! We've learned about neurons and how they compute weighted sums. Now we're going to add the **secret ingredient** that makes neural networks powerful: **activation functions**!

## 🎯 What You'll Learn

By the end of this notebook, you'll understand:
- **Why** we need activation functions (the non-linearity problem)
- The most important activation functions and how they work
- How to implement each one from scratch
- When to use which activation function
- How activation functions enable neural networks to learn complex patterns

**Prerequisites:** Notebooks 1 and 2, basic understanding of neurons and weights.

In [None]:
# Import our tools
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

# For nice plots (this style name needs Matplotlib 3.6+; older releases call it 'seaborn-darkgrid')
plt.style.use('seaborn-v0_8-darkgrid')
np.random.seed(42)

---
## 🤔 The Problem: Why We Need Activation Functions

Let's start with a fundamental question: **Why can't we just stack neurons without activation functions?**

### The Linear Limitation

Remember, a neuron without activation is just:
```
output = w₁·x₁ + w₂·x₂ + ... + b
```

This is a **linear function** of the inputs (strictly speaking an *affine* one, because of the bias `b`). It creates straight lines (or flat planes in higher dimensions).

### What Happens When We Stack Linear Functions?

In [None]:
# Let's see what happens when we stack linear transformations

# Input
x = 5.0

# Layer 1: Linear transformation
w1 = 2.0
b1 = 3.0
h1 = w1 * x + b1  # h1 = 2*5 + 3 = 13

# Layer 2: Another linear transformation
w2 = 1.5
b2 = -2.0
output = w2 * h1 + b2  # output = 1.5*13 - 2 = 17.5

print("Two-layer network (no activation):")
print(f"  Input: x = {x}")
print(f"  Layer 1: h1 = {w1}·{x} + {b1} = {h1}")
print(f"  Layer 2: output = {w2}·{h1} + {b2} = {output}")

# Now let's collapse this into a single layer
# y = w₂·(w₁·x + b₁) + b₂
# y = (w₂·w₁)·x + (w₂·b₁ + b₂)
w_combined = w2 * w1  # Combined weight
b_combined = w2 * b1 + b2  # Combined bias
output_direct = w_combined * x + b_combined

print("\nEquivalent single-layer network:")
print(f"  output = {w_combined}·{x} + {b_combined} = {output_direct}")

print("\n🔍 Key Insight:")
print("  Two layers give same result as one layer!")
print(f"  Both outputs: {output} = {output_direct}")
print("\n  💡 Stacking linear functions = Another linear function!")
print("  💡 Adding more layers doesn't help without activation functions!")

Two-layer network (no activation):
  Input: x = 5.0
  Layer 1: h1 = 2.0·5.0 + 3.0 = 13.0
  Layer 2: output = 1.5·13.0 + -2.0 = 17.5

Equivalent single-layer network:
  output = 3.0·5.0 + 2.5 = 17.5

🔍 Key Insight:
  Two layers give same result as one layer!
  Both outputs: 17.5 = 17.5

  💡 Stacking linear functions = Another linear function!
  💡 Adding more layers doesn't help without activation functions!
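
The same collapse happens with full weight matrices, not just single numbers. The sketch below checks it numerically for a small two-layer network. The layer sizes (3 → 4 → 2), the random values, and every variable name are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes: 3 inputs -> 4 hidden units -> 2 outputs
W1_mat, bias1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2_mat, bias2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x_vec = rng.normal(size=3)

# Two linear layers applied one after the other (no activation in between)
two_layer = W2_mat @ (W1_mat @ x_vec + bias1) + bias2

# The same mapping collapsed into a single layer:
# W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
W_combined = W2_mat @ W1_mat
b_combined = W2_mat @ bias1 + bias2
one_layer = W_combined @ x_vec + b_combined

print("Same output:", np.allclose(two_layer, one_layer))
```

The combined weight matrix has shape `(2, 3)`: the 4 hidden units leave no trace, which is why extra linear layers add no expressive power.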


### 🎨 Visualizing the Problem

Let's see this visually with a classification problem:

In [None]:
# Create a non-linear dataset (circle pattern)
np.random.seed(42)
n_points = 200

# Inner circle (class 0)
r_inner = np.random.uniform(0, 1, n_points//2)
theta_inner = np.random.uniform(0, 2*np.pi, n_points//2)
x_inner = r_inner * np.cos(theta_inner)
y_inner = r_inner * np.sin(theta_inner)

# Outer circle (class 1)
r_outer = np.random.uniform(1.5, 2.5, n_points//2)
theta_outer = np.random.uniform(0, 2*np.pi, n_points//2)
x_outer = r_outer * np.cos(theta_outer)
y_outer = r_outer * np.sin(theta_outer)

# Plot
plt.figure(figsize=(10, 5))

plt.subplot(1, 2, 1)
plt.scatter(x_inner, y_inner, c='red', alpha=0.6, s=30, label='Class 0 (inner)', edgecolors='k', linewidth=0.5)
plt.scatter(x_outer, y_outer, c='green', alpha=0.6, s=30, label='Class 1 (outer)', edgecolors='k', linewidth=0.5)
plt.xlabel('Feature 1', fontsize=12)
plt.ylabel('Feature 2', fontsize=12)
plt.title('Non-Linear Data (Concentric Circles)', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.axis('equal')

plt.subplot(1, 2, 2)
plt.scatter(x_inner, y_inner, c='red', alpha=0.6, s=30, edgecolors='k', linewidth=0.5)
plt.scatter(x_outer, y_outer, c='green', alpha=0.6, s=30, edgecolors='k', linewidth=0.5)
# Try to draw a straight line to separate them
x_line = np.linspace(-3, 3, 100)
y_line = 0.5 * x_line + 0.2  # An arbitrary line; no straight line separates the two classes
plt.plot(x_line, y_line, 'b--', linewidth=3, label='Example linear boundary')
plt.xlabel('Feature 1', fontsize=12)
plt.ylabel('Feature 2', fontsize=12)
plt.title('Linear Boundary FAILS!', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.axis('equal')

plt.tight_layout()
plt.show()

print("❌ Problem: No straight line can separate these circles!")
print("✅ Solution: We need NON-LINEAR decision boundaries!")
print("⚡ How? Add ACTIVATION FUNCTIONS to introduce non-linearity!")
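
To make the last line above concrete, here is a minimal sketch of a tiny network whose four hidden ReLU units draw a diamond-shaped boundary between the two rings. Nothing is trained: the weights and the 1.45 threshold are hand-picked for illustration, `relu` is defined inline, and the data is regenerated with the same radius ranges under hypothetical names (`points`, `labels`).

```python
import numpy as np

def relu(z):
    """ReLU activation: keep positive values, replace negatives with 0."""
    return np.maximum(0, z)

rng = np.random.default_rng(42)
n = 100

# Same radius ranges as the dataset above: inner r in [0, 1), outer r in [1.5, 2.5)
r = np.concatenate([rng.uniform(0, 1, n), rng.uniform(1.5, 2.5, n)])
theta = rng.uniform(0, 2 * np.pi, 2 * n)
points = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
labels = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# Hidden layer: 4 ReLU units computing relu(x), relu(-x), relu(y), relu(-y)
W_hidden = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
hidden = relu(points @ W_hidden.T)  # shape (200, 4)

# Output unit: the hidden units sum to |x| + |y|; compare against a threshold
score = hidden.sum(axis=1) - 1.45
predictions = (score >= 0).astype(int)

accuracy = (predictions == labels).mean()
print(f"Accuracy: {accuracy:.2f}")
```

This works because `|x| + |y|` is at most √2 · r (below about 1.414 for the inner class) and at least r (1.5 or more for the outer class), so a threshold of 1.45 sits between them. Without the ReLU step the hidden layer would stay linear and no threshold could separate the classes.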

---
## 💡 Analogies: Understanding Activation Functions

### Analogy 1: Light Switch vs Dimmer Switch

**Step Function (Binary Activation)**
- Like a **light switch**: either ON or OFF
- No middle ground
- Simple but too rigid

**Sigmoid/Tanh (Smooth Activation)**
- Like a **dimmer switch**: smoothly adjusts from off to on
- Many values in between
- More flexible!

**ReLU (Rectified Linear Unit)**
- Like a **one-way valve**: lets positive flow through, blocks negative
- Simple and effective
- The usual default for hidden layers today!

### Analogy 2: Signal Processing

Think of activation functions as **filters**:
- **Step**: "Only pass strong signals" (threshold)
- **Sigmoid**: "Squash everything into the range (0, 1)" (normalize)
- **ReLU**: "Keep positive, discard negative" (one-way gate)
- **Tanh**: "Squash everything into (-1, 1), centered on zero" (balanced output)

---
## 📊 Activation Function #1: Step Function

The **step function** was one of the first activation functions used in neural networks.

### Formula
$$f(x) = \begin{cases} 
1 & \text{if } x \geq 0 \\
0 & \text{if } x < 0
\end{cases}$$

In words: **"If input is positive or zero, output 1. Otherwise, output 0."**

In [None]:
def step_function(x):
    """
    Step function (also called Heaviside function).
    
    Output is binary: 0 or 1
    - If input >= 0: return 1 (neuron 'fires')
    - If input < 0: return 0 (neuron stays quiet)
    
    Args:
        x: Input value or array
    
    Returns:
        Binary output (0 or 1)
    """
    # Use numpy's where: if condition is true, return 1, else return 0
    return np.where(x >= 0, 1, 0)

# Test it
test_values = np.array([-2, -1, 0, 1, 2])
step_outputs = step_function(test_values)

print("Step Function Test:")
for val, out in zip(test_values, step_outputs):
    print(f"  input: {val:>3} → output: {out}")
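
As a small usage example, a single neuron with a step activation can act as a logical AND gate. This is only a sketch: the weights and bias below are hand-picked assumptions, not learned values, and `step_function` is repeated so the block runs on its own.

```python
import numpy as np

def step_function(x):
    """Heaviside step: 1 where x >= 0, otherwise 0."""
    return np.where(x >= 0, 1, 0)

# Hypothetical, hand-picked parameters for a logical AND
and_weights = np.array([1.0, 1.0])
and_bias = -1.5

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Weighted sum followed by the step activation
and_outputs = step_function(inputs @ and_weights + and_bias)

for (a, b), out in zip(inputs, and_outputs):
    print(f"  {a} AND {b} → {out}")
```

Only the input `(1, 1)` pushes the weighted sum (0.5) above zero, so only that case fires. Note that the step function is flat everywhere except at the jump, so its slope gives gradient-based training nothing to work with.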

---
## 🎯 Key Takeaways

Congratulations! You now understand activation functions! Here's what we covered:

### Why We Need Activation Functions
- Stacking linear functions just creates another linear function
- Without activation functions, a deep network is equivalent to a single linear layer
- Activation functions add **non-linearity** to enable learning complex patterns

### The Main Activation Functions

| Function | Formula | Range | Best Use |
|----------|---------|-------|----------|
| **Step** | 0 if x<0, 1 if x≥0 | {0, 1} | Historical only |
| **Sigmoid** | 1/(1+e^(-x)) | (0, 1) | Output layer (binary) |
| **Tanh** | (e^x - e^(-x))/(e^x + e^(-x)) | (-1, 1) | RNNs, hidden layers |
| **ReLU** | max(0, x) | [0, ∞) | **Hidden layers (DEFAULT)** |
| **Leaky ReLU** | max(αx, x), small α > 0 (e.g. 0.01) | (-∞, ∞) | Fix dying ReLU |
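
As a quick recap, the formulas in the table can be sketched in NumPy as below. This is a minimal version written for readability rather than numerical robustness. The function names and the default `alpha=0.01` are assumptions, and `max(αx, x)` only behaves as Leaky ReLU when 0 < α < 1.

```python
import numpy as np

def sigmoid(x):
    """1 / (1 + e^(-x)): squashes inputs into (0, 1)."""
    return 1 / (1 + np.exp(-x))

def tanh(x):
    """(e^x - e^(-x)) / (e^x + e^(-x)): squashes inputs into (-1, 1)."""
    return np.tanh(x)

def relu(x):
    """max(0, x): passes positives, zeroes out negatives."""
    return np.maximum(0, x)

def leaky_relu(x, alpha=0.01):
    """max(alpha*x, x): like ReLU, with a small slope for negatives."""
    return np.maximum(alpha * x, x)

sample = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for name, fn in [("sigmoid", sigmoid), ("tanh", tanh),
                 ("relu", relu), ("leaky_relu", leaky_relu)]:
    print(f"{name:>10}: {np.round(fn(sample), 3)}")
```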

### Practical Guidelines

**For Hidden Layers:**
1. Start with **ReLU** (default choice)
2. If many ReLU units get stuck at zero (the "dying ReLU" problem), try **Leaky ReLU**
3. For RNNs, use **Tanh**
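
The reason for step 2 can be sketched by comparing gradients. The pre-activation values and the leak slope `alpha` below are made-up assumptions. They stand in for a hidden unit whose weighted sum is negative for every example in a batch.

```python
import numpy as np

# Hypothetical pre-activations of one hidden unit over a batch: all negative
z = np.array([-3.0, -1.2, -0.4, -2.5])
alpha = 0.01  # assumed leak slope

# Derivative of max(0, z): 1 for positive z, 0 otherwise
relu_grad = (z > 0).astype(float)

# Derivative of max(alpha*z, z): 1 for positive z, alpha otherwise
leaky_grad = np.where(z > 0, 1.0, alpha)

print("ReLU gradients:      ", relu_grad)
print("Leaky ReLU gradients:", leaky_grad)
```

With plain ReLU every gradient is zero, so the unit's weights stop updating. Leaky ReLU keeps a small non-zero gradient, which gives the unit a chance to recover.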

**For Output Layer:**
1. Binary classification → **Sigmoid**
2. Multi-class classification → **Softmax** (covered later)
3. Regression → **No activation** (or ReLU if outputs must be non-negative)
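
The three output-layer choices above can be sketched as follows. The raw scores are invented for illustration, and the `softmax` here is a common textbook formulation shown ahead of its full treatment later, not necessarily the version a later notebook will use.

```python
import numpy as np

def sigmoid(z):
    """Map a raw score to a probability in (0, 1)."""
    return 1 / (1 + np.exp(-z))

def softmax(z):
    """Map a vector of raw scores to probabilities that sum to 1."""
    shifted = z - np.max(z)  # subtracting the max avoids overflow in exp
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()

# Hypothetical raw outputs of a network's final layer
binary_score = 0.8
class_scores = np.array([2.0, 1.0, 0.1])
regression_score = -3.7

p_binary = sigmoid(binary_score)   # binary classification
p_classes = softmax(class_scores)  # multi-class classification
y_regression = regression_score    # regression: no activation

print(f"Binary probability:  {p_binary:.3f}")
print(f"Class probabilities: {np.round(p_classes, 3)}")
print(f"Regression output:   {y_regression}")
```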

---

## 🚀 What's Next?

Excellent work! You've now mastered the three fundamental building blocks:

1. ✅ **What neural networks are** (Notebook 1)
2. ✅ **How a single neuron works** (Notebook 2)
3. ✅ **Activation functions for non-linearity** (Notebook 3)

In the next notebook, you'll learn how to combine multiple neurons into **layers** and build complete neural networks!

**Ready to build networks?** → [Continue to Notebook 4: Neural Network Layers](04_neural_network_layer.ipynb)

---

*Great job completing Notebook 3! You're making excellent progress! 🌟*