# 🧠 Neural Networks - From Scratch to Deep Learning

## Build Your Understanding from the Ground Up

**What you'll learn:**
- What neural networks are and how they work
- Neurons, layers, and activation functions
- Forward propagation and backpropagation
- Training, optimization, and loss functions
- Build a neural network from scratch
- CNNs for images, RNNs for sequences
- Hands-on: Recognize handwritten digits

**Prerequisites:**
- Basic Python
- High school math (algebra, calculus helpful)

**Time:** 120-150 minutes

## 📚 Table of Contents

1. [Introduction to Neural Networks](#intro)
2. [The Biological Inspiration](#biology)
3. [Perceptron - The Building Block](#perceptron)
4. [Activation Functions](#activation)
5. [Multi-Layer Networks](#multilayer)
6. [Forward Propagation](#forward)
7. [Backpropagation](#backprop)
8. [Training Process](#training)
9. [Build from Scratch](#fromscratch)
10. [Convolutional Neural Networks](#cnn)
11. [Recurrent Neural Networks](#rnn)
12. [Modern Architectures](#modern)
13. [Exercises](#exercises)

In [None]:
# Setup
!pip install -q torch torchvision numpy matplotlib seaborn scikit-learn

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from sklearn.datasets import make_moons, make_circles
import warnings
warnings.filterwarnings('ignore')

# Set random seed for reproducibility
np.random.seed(42)
torch.manual_seed(42)

print("✅ Libraries imported successfully!")
print(f"PyTorch version: {torch.__version__}")

<a id='intro'></a>
## 1. 🎯 Introduction to Neural Networks

### What is a Neural Network?

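A **neural network** is a computing system loosely inspired by biological brains. It is built from layers of connected units (neurons); each unit computes a weighted sum of its inputs and passes the result through an activation function.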

### Why Neural Networks?

**Traditional Programming:**
```
Rules + Data → Answers
```

**Neural Networks (Machine Learning):**
```
Data + Answers → Rules (learned automatically!)
```
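
To make the contrast concrete, here is a minimal sketch (not a neural network yet). Instead of hard-coding the Celsius-to-Fahrenheit rule, we give NumPy example inputs and answers and let least squares recover the rule's coefficients. The temperature data and the helper names `handwritten_rule` and `learned_rule` are illustrative assumptions.

```python
import numpy as np

# Traditional programming: a human writes the rule
def handwritten_rule(celsius):
    return 1.8 * celsius + 32

# Machine learning: we only provide data (inputs) and answers (outputs)
celsius = np.array([-40.0, 0.0, 10.0, 25.0, 37.0, 100.0])
fahrenheit = np.array([-40.0, 32.0, 50.0, 77.0, 98.6, 212.0])

# Fit fahrenheit ≈ w * celsius + b by least squares
A = np.column_stack([celsius, np.ones_like(celsius)])  # columns: [celsius, 1]
(w, b), *_ = np.linalg.lstsq(A, fahrenheit, rcond=None)

def learned_rule(c):
    """The rule recovered from data alone"""
    return w * c + b

print(f"Learned rule: F = {w:.2f} * C + {b:.2f}")
print(f"learned_rule(20) = {learned_rule(20):.1f}, handwritten_rule(20) = {handwritten_rule(20):.1f}")
```

A neural network does the same thing in spirit, but it learns a far more flexible, non-linear rule by gradient descent rather than a closed-form fit.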

### Evolution Timeline

```
1943: McCulloch-Pitts Neuron
  ↓
1958: Perceptron (Rosenblatt)
  ↓
1986: Backpropagation (Rumelhart, Hinton, Williams)
  ↓
1998: LeNet (LeCun) - CNNs for handwriting
  ↓
2012: AlexNet - Deep Learning breakthrough
  ↓
2014: GANs, Attention mechanisms
  ↓
2017: Transformers - "Attention is All You Need"
  ↓
2018: BERT, GPT - large pretrained Transformers
  ↓
2020s: GPT-3 and successors, Vision Transformers, Diffusion models
```

<a id='biology'></a>
## 2. 🧬 The Biological Inspiration

### Biological Neuron

```
           BIOLOGICAL NEURON
           =================

Dendrites ──┐
            │
Dendrites ──┤
            │      ┌────────────┐      ┌───────────┐
Dendrites ──┼─────→│ Cell Body  │─────→│   Axon    │────→ Output
            │      │  (Soma)    │      │           │
Dendrites ──┘      │ Processes  │      │ Transmits │
                   │ signals    │      │ signal    │
(Inputs)           └────────────┘      └───────────┘
```

### Artificial Neuron

```
           ARTIFICIAL NEURON
           =================

x₁ ──────→ w₁ ──┐
                │
x₂ ──────→ w₂ ──┤
                │     ┌──────────────────┐
x₃ ──────→ w₃ ──┼────→│ Σ (Sum)          │
                │     │ z = Σ(wᵢxᵢ) + b  │
x₄ ──────→ w₄ ──┘     └────────┬─────────┘
                               │
(Inputs) (Weights)             ▼
                      ┌──────────────────┐
                      │ Activation       │
                      │ y = f(z)         │────→ y (Output)
                      └──────────────────┘

Formula: y = f(Σ(wᵢxᵢ) + b)
```
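
A minimal NumPy sketch of this formula for one neuron with four inputs. The input values, weights, bias, and the choice of sigmoid as the activation f are arbitrary assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, f=sigmoid):
    """One artificial neuron: y = f(Σ(wᵢxᵢ) + b)"""
    z = np.dot(w, x) + b   # weighted sum plus bias
    return f(z)            # activation

x = np.array([1.0, 0.5, -1.0, 2.0])   # inputs x₁..x₄
w = np.array([0.4, -0.2, 0.1, 0.3])   # weights w₁..w₄
b = -0.5                              # bias

z = np.dot(w, x) + b
y = neuron(x, w, b)
print(f"z = {z:.2f}, y = sigmoid(z) = {y:.4f}")
```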

### Key Parallels

| Biological | Artificial |
|------------|------------|
| Dendrites | Inputs (x) |
| Synapse strength | Weights (w) |
| Cell body | Summation (Σ) |
| Activation threshold | Activation function |
| Axon | Output (y) |

<a id='perceptron'></a>
## 3. ⚡ Perceptron - The Building Block

### The Perceptron

The simplest neural network is a single neuron. Rosenblatt's perceptron uses a step activation, so its output is either 0 or 1.

```
Input:  x = [x₁, x₂, x₃]
Weights: w = [w₁, w₂, w₃]
Bias: b

Step 1: Weighted sum
  z = w₁x₁ + w₂x₂ + w₃x₃ + b

Step 2: Activation
  y = f(z)  where f is an activation function
```

In [None]:
# Implement a simple perceptron
class Perceptron:
    """A simple perceptron for binary classification"""
    
    def __init__(self, input_size, learning_rate=0.1):
        self.weights = np.random.randn(input_size)
        self.bias = 0
        self.learning_rate = learning_rate
    
    def activation(self, z):
        """Step function: 1 if z > 0, else 0"""
        return 1 if z > 0 else 0
    
    def predict(self, x):
        """Make prediction"""
        z = np.dot(x, self.weights) + self.bias
        return self.activation(z)
    
    def train(self, X, y, epochs=100):
        """Train using perceptron learning rule"""
        for epoch in range(epochs):
            for xi, yi in zip(X, y):
                # Predict
                prediction = self.predict(xi)
                
                # Perceptron rule: error is 0 when the prediction is correct,
                # so the weights only change on mistakes
                error = yi - prediction
                self.weights += self.learning_rate * error * xi
                self.bias += self.learning_rate * error

# Test on AND gate
print("🔬 Training Perceptron on AND Gate\n")

# AND gate truth table
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])

# Train
perceptron = Perceptron(input_size=2)
perceptron.train(X_and, y_and, epochs=10)

# Test
print("AND Gate Results:")
for x, y in zip(X_and, y_and):
    pred = perceptron.predict(x)
    print(f"  {x} → {pred} (expected {y}) {'✅' if pred == y else '❌'}")

<a id='activation'></a>
## 4. 📊 Activation Functions

### Why Activation Functions?

Without activation functions, neural networks would just be **linear models**:
```
Layer 1: h = W₁x + b₁
Layer 2: y = W₂h + b₂ = W₂(W₁x + b₁) + b₂
       = (W₂W₁)x + (W₂b₁ + b₂)
       = Wx + b   with W = W₂W₁, b = W₂b₁ + b₂  (still linear!)
```

Activation functions add **non-linearity**, which is what lets stacked layers represent curved decision boundaries and complex patterns.
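
A quick numerical check of the algebra above, using random matrices (the 3 → 4 → 2 shapes and the seed are arbitrary): two stacked linear layers give the same outputs as one linear layer with W = W₂W₁ and b = W₂b₁ + b₂, while inserting a ReLU between them breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))   # layer 1: 3 → 4
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=(2, 1))   # layer 2: 4 → 2
X = rng.normal(size=(3, 100))                               # 100 inputs as columns

# Two linear layers with no activation in between
two_layers = W2 @ (W1 @ X + b1) + b2

# The single equivalent layer: W = W₂W₁, b = W₂b₁ + b₂
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ X + b

print("Linear stack == one layer:", np.allclose(two_layers, one_layer))  # True

# With a ReLU between the layers the collapse no longer holds
with_relu = W2 @ np.maximum(0, W1 @ X + b1) + b2
print("ReLU stack == one layer:  ", np.allclose(with_relu, one_layer))
```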

### Common Activation Functions

In [None]:
# Visualize activation functions
x = np.linspace(-5, 5, 100)

# Define activation functions
sigmoid = lambda x: 1 / (1 + np.exp(-x))
tanh = lambda x: np.tanh(x)
relu = lambda x: np.maximum(0, x)
leaky_relu = lambda x: np.where(x > 0, x, 0.01 * x)

# Plot
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

activations = [
    (sigmoid, 'Sigmoid: σ(x) = 1/(1+e^(-x))'),
    (tanh, 'Tanh: tanh(x)'),
    (relu, 'ReLU: max(0, x)'),
    (leaky_relu, 'Leaky ReLU: max(0.01x, x)')
]

for ax, (func, title) in zip(axes.flat, activations):
    y = func(x)
    ax.plot(x, y, linewidth=2)
    ax.set_title(title, fontsize=12, fontweight='bold')
    ax.grid(True, alpha=0.3)
    ax.axhline(y=0, color='k', linewidth=0.5)
    ax.axvline(x=0, color='k', linewidth=0.5)
    ax.set_xlabel('x')
    ax.set_ylabel('f(x)')

plt.tight_layout()
plt.show()

# Properties
print("\n📊 Activation Function Properties:\n")
print("┌─────────────┬──────────┬────────────┬─────────────────┐")
print("│ Function    │ Range    │ Use Case   │ Pros/Cons       │")
print("├─────────────┼──────────┼────────────┼─────────────────┤")
print("│ Sigmoid     │ (0, 1)   │ Binary out │ Vanishing grads │")
print("│ Tanh        │ (-1, 1)  │ Hidden     │ Zero-centered   │")
print("│ ReLU        │ [0, ∞)   │ Hidden     │ Fast, sparse    │")
print("│ Leaky ReLU  │ (-∞, ∞)  │ Hidden     │ No dead neurons │")
print("└─────────────┴──────────┴────────────┴─────────────────┘")
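
The "vanishing" problem in the table refers to gradients. A small sketch of the standard derivatives (the sample points are arbitrary) shows that the sigmoid's slope never exceeds 0.25 and is close to zero for large |x|, so a gradient multiplied through many sigmoid layers shrinks, whereas ReLU passes a gradient of exactly 1 for positive inputs.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivatives of the activation functions (used during backpropagation)
def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1 - s)             # peaks at 0.25 when x = 0

def d_tanh(x):
    return 1 - np.tanh(x) ** 2     # peaks at 1.0 when x = 0

def d_relu(x):
    return (x > 0).astype(float)   # 1 for x > 0, else 0 (0 chosen at x = 0)

def d_leaky_relu(x, alpha=0.01):
    return np.where(x > 0, 1.0, alpha)

points = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
for name, d in [("sigmoid", d_sigmoid), ("tanh", d_tanh),
                ("relu", d_relu), ("leaky_relu", d_leaky_relu)]:
    print(f"{name:>10}: {np.round(d(points), 4)}")

# Ignoring the weights, a gradient passing through 10 sigmoid layers
# is scaled by at most 0.25**10
print(f"0.25 ** 10 = {0.25 ** 10:.2e}")
```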

<a id='multilayer'></a>
## 5. 🏗️ Multi-Layer Networks

### Architecture

```
            MULTI-LAYER NEURAL NETWORK
            ===========================

 Input Layer    Hidden Layer 1   Hidden Layer 2   Output Layer
 (4 neurons)    (4 neurons)      (3 neurons)      (2 neurons)

     x₁ ○             ○                ○
     x₂ ○             ○                ○               ○ ŷ₁
     x₃ ○             ○                ○               ○ ŷ₂
     x₄ ○             ○

Every neuron connects to every neuron in the next layer (fully connected).
Each connection has a weight.
Each hidden and output neuron has a bias.
```
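
Since each connection carries one weight and each non-input neuron one bias, a fully connected layer from n_in to n_out neurons has n_in·n_out + n_out parameters. A small helper (the name `count_dense_params` is ours, not a library function) applies this to the 4 → 4 → 3 → 2 network in the diagram and to the layer sizes of the `SimpleNN` built below.

```python
def count_dense_params(layer_sizes):
    """Weights + biases of a fully connected network with the given layer sizes."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + n_out   # one weight per connection, one bias per neuron
    return total

print(count_dense_params([4, 4, 3, 2]))         # diagram above: 43
print(count_dense_params([10, 64, 32, 16, 2]))  # SimpleNN(10, [64, 32, 16], 2): 3346
```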

### Layer Types

1. **Input Layer**: Receives raw input data
2. **Hidden Layers**: Process and transform data
3. **Output Layer**: Produces final prediction

### Network Depth

- **Shallow**: 1-2 hidden layers
- **Deep**: more hidden layers, often 3+ ("Deep Learning"); the exact cutoff is a convention, not a strict definition

```
Shallow Network:  Input → Hidden → Output
Deep Network:     Input → H₁ → H₂ → H₃ → ... → Output
```

In [None]:
# Build a multi-layer network with PyTorch
class SimpleNN(nn.Module):
    """Simple feedforward neural network"""
    
    def __init__(self, input_size, hidden_sizes, output_size):
        super().__init__()
        
        # Build layers
        layers = []
        
        # Input to first hidden
        layers.append(nn.Linear(input_size, hidden_sizes[0]))
        layers.append(nn.ReLU())
        
        # Hidden to hidden
        for i in range(len(hidden_sizes) - 1):
            layers.append(nn.Linear(hidden_sizes[i], hidden_sizes[i+1]))
            layers.append(nn.ReLU())
        
        # Last hidden to output
        layers.append(nn.Linear(hidden_sizes[-1], output_size))
        
        self.network = nn.Sequential(*layers)
    
    def forward(self, x):
        return self.network(x)

# Create a network
model = SimpleNN(input_size=10, hidden_sizes=[64, 32, 16], output_size=2)

print("🏗️ Neural Network Architecture:\n")
print(model)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"\nTotal parameters: {total_params:,}")

<a id='forward'></a>
## 6. ➡️ Forward Propagation

### The Forward Pass

Data flows from input to output:

```
Layer 1:
  z₁ = W₁x + b₁
  a₁ = f(z₁)

Layer 2:
  z₂ = W₂a₁ + b₂
  a₂ = f(z₂)

Output:
  ŷ = a₂
```

### Example

```
Input: x = [1.0, 2.0]
Weights: W = [[0.5, -0.3],
              [0.2,  0.8]]
Bias: b = [0.1, -0.2]

Step 1: Compute z
  z = Wx + b = [0.5*1.0 + (-0.3)*2.0 + 0.1,
                0.2*1.0 +   0.8*2.0 + (-0.2)]
    = [0.0, 1.6]

Step 2: Apply ReLU
  a = ReLU(z) = [0.0, 1.6]
```
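
Computing the same example in NumPy is a quick way to check hand arithmetic like this. The sketch below uses the column-vector convention of the example (z = Wx + b) and gives z = [0.0, 1.6].

```python
import numpy as np

x = np.array([1.0, 2.0])
W = np.array([[0.5, -0.3],
              [0.2,  0.8]])
b = np.array([0.1, -0.2])

z = W @ x + b          # Step 1: one weighted sum per neuron
a = np.maximum(0, z)   # Step 2: ReLU

print("z =", np.round(z, 4))
print("a =", np.round(a, 4))
```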

In [None]:
# Implement forward propagation from scratch
def relu(x):
    return np.maximum(0, x)

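def forward_pass(x, W1, b1, W2, b2):
    """Two-layer network forward pass.

    Uses the row-vector convention z = xW + b (W has shape (n_in, n_out)),
    which is the transpose of the Wx + b form written above.
    """
    # Layer 1
    z1 = np.dot(x, W1) + b1
    a1 = relu(z1)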
    
    # Layer 2  
    z2 = np.dot(a1, W2) + b2
    a2 = relu(z2)
    
    return a2, (z1, a1, z2)  # Return output and intermediates

# Example
x = np.array([1.0, 2.0, 3.0])
W1 = np.random.randn(3, 4) * 0.01
b1 = np.zeros(4)
W2 = np.random.randn(4, 2) * 0.01
b2 = np.zeros(2)

output, intermediates = forward_pass(x, W1, b1, W2, b2)

print("🔄 Forward Propagation Example:\n")
print(f"Input (x):         {x}")
print(f"Hidden layer (a1): {intermediates[1]}")
print(f"Output (ŷ):        {output}")

<a id='backprop'></a>
## 7. ⬅️ Backpropagation

### How Networks Learn

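**Backpropagation** = backward propagation of errors: the chain rule applied layer by layer, from the output back to the input, to compute how much each weight contributed to the loss.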

```
1. Forward pass: Make prediction
2. Compute loss: How wrong were we?
3. Backward pass: Compute gradients
4. Update weights: Improve for next time
```
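
A minimal sketch of these four steps for a single sigmoid neuron learning the AND gate from Section 3, with the squared-error loss L = (ŷ - y)² averaged over the examples. The learning rate, epoch count, and zero initial weights are arbitrary choices, and the gradient is derived by hand with the chain rule.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# AND gate data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)

w = np.zeros(2)
b = 0.0
learning_rate = 2.0

for epoch in range(5000):
    # 1. Forward pass: make predictions
    y_hat = sigmoid(X @ w + b)

    # 2. Compute loss: mean squared error
    loss = np.mean((y_hat - y) ** 2)

    # 3. Backward pass: dL/dz = dL/dŷ · dŷ/dz, averaged over the 4 examples
    dz = 2 * (y_hat - y) * y_hat * (1 - y_hat) / len(y)
    dw = X.T @ dz        # dz/dw = x
    db = dz.sum()        # dz/db = 1

    # 4. Update weights: step against the gradient
    w -= learning_rate * dw
    b -= learning_rate * db

print(f"final loss: {loss:.4f}")
print("predictions:", np.round(sigmoid(X @ w + b), 2))
```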

### The Math (Chain Rule)

```
Loss: L = (ŷ - y)²,  where ŷ = f(z) and z = Σ(wᵢxᵢ) + b

To update weight w₁, apply the chain rule:
  ∂L/∂w₁ = ∂L/∂ŷ · ∂ŷ/∂z · ∂z/∂w₁
         = 2(ŷ - y) · f'(z) · x₁

Update rule:
  w₁_new = w₁_old - learning_rate * ∂L/∂w₁
```
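
A small sketch that checks the chain-rule product against a finite-difference estimate for one neuron. Assumptions: f = tanh, two inputs, and arbitrary values for the inputs, weights, bias, and target.

```python
import numpy as np

x = np.array([0.5, -1.5])      # inputs x₁, x₂
w = np.array([0.8, 0.2])       # weights w₁, w₂
b, y = 0.1, 1.0                # bias and target

def loss(w):
    y_hat = np.tanh(np.dot(w, x) + b)
    return (y_hat - y) ** 2

# Chain rule: ∂L/∂w₁ = ∂L/∂ŷ · ∂ŷ/∂z · ∂z/∂w₁
z = np.dot(w, x) + b
y_hat = np.tanh(z)
dL_dyhat = 2 * (y_hat - y)
dyhat_dz = 1 - np.tanh(z) ** 2   # derivative of tanh
dz_dw1 = x[0]
grad_chain = dL_dyhat * dyhat_dz * dz_dw1

# Finite-difference check: nudge w₁ slightly in both directions
eps = 1e-6
w_plus, w_minus = w.copy(), w.copy()
w_plus[0] += eps
w_minus[0] -= eps
grad_numeric = (loss(w_plus) - loss(w_minus)) / (2 * eps)

print(f"chain rule: {grad_chain:.6f}, finite difference: {grad_numeric:.6f}")

# One gradient-descent update of w₁
learning_rate = 0.1
w1_new = w[0] - learning_rate * grad_chain
print(f"w₁: {w[0]} → {w1_new:.4f}")
```

Because ŷ is below the target here, the gradient is negative and the update increases w₁.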

### Gradient Descent

```
        Loss
         |
         |     ●  ← Start (high loss)
         |    /
         |   ●  ← Step 1
         |  /
         | ●   ← Step 2
         |/
     ────●──── ← Minimum (low loss)
       Weight

Step against the gradient, i.e. downhill!
```

In [None]:
# Visualize gradient descent
def loss_function(w):
    """Simple quadratic loss"""
    return (w - 2) ** 2 + 1

def gradient(w):
    """Derivative of loss"""
    return 2 * (w - 2)

# Gradient descent
w = 5.0  # Starting point
learning_rate = 0.1
history = [w]

for i in range(20):
    grad = gradient(w)
    w = w - learning_rate * grad
    history.append(w)

# Plot
plt.figure(figsize=(12, 5))

# Left: Loss curve
plt.subplot(1, 2, 1)
w_range = np.linspace(-1, 6, 100)
plt.plot(w_range, loss_function(w_range), 'b-', linewidth=2, label='Loss')
plt.plot(history, [loss_function(w) for w in history], 'ro-', markersize=8, label='GD steps')
plt.xlabel('Weight (w)', fontsize=12)
plt.ylabel('Loss', fontsize=12)
plt.title('Gradient Descent Optimization', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)

# Right: Weight convergence
plt.subplot(1, 2, 2)
plt.plot(history, 'go-', markersize=8, linewidth=2)
plt.axhline(y=2, color='r', linestyle='--', label='Optimal w=2')
plt.xlabel('Iteration', fontsize=12)
plt.ylabel('Weight (w)', fontsize=12)
plt.title('Weight Convergence', fontsize=14, fontweight='bold')
plt.legend()
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"✅ Converged to w = {history[-1]:.4f} (optimal: 2.0)")
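
For this particular loss the update simplifies to w_new - 2 = (1 - 2·lr)(w_old - 2), so the learning rate alone decides whether the iterates converge. A short sketch (the learning-rate values are arbitrary picks; `run_gd` is a hypothetical helper) reuses the same loss and starting point:

```python
def gradient(w):
    """Derivative of the loss (w - 2)² + 1"""
    return 2 * (w - 2)

def run_gd(learning_rate, w=5.0, steps=20):
    for _ in range(steps):
        w = w - learning_rate * gradient(w)
    return w

# The distance to the optimum is multiplied by (1 - 2 * lr) every step
for lr in [0.01, 0.1, 0.5, 0.9, 1.1]:
    print(f"lr = {lr:<4}: w after 20 steps = {run_gd(lr):.4f}")
```

Small rates converge slowly, lr = 0.5 lands on the minimum in a single step for this loss, rates between 0.5 and 1 overshoot but still converge, and any lr above 1 diverges because |1 - 2·lr| > 1.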

<a id='fromscratch'></a>
## 9. 🔨 Build a Neural Network from Scratch

Let's implement a complete neural network with backpropagation!

In [None]:
# Complete neural network from scratch
class NeuralNetworkFromScratch:
    def __init__(self, layer_sizes):
        """Initialize network with given layer sizes"""
        self.layer_sizes = layer_sizes
        self.weights = []
        self.biases = []
        
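        # He initialization (scale sqrt(2 / fan_in)), suited to ReLU layers;
        # a tiny fixed scale like 0.01 makes signals and gradients vanish in deeper nets
        for i in range(len(layer_sizes) - 1):
            W = np.random.randn(layer_sizes[i], layer_sizes[i+1]) * np.sqrt(2.0 / layer_sizes[i])
            b = np.zeros((1, layer_sizes[i+1]))
            self.weights.append(W)
            self.biases.append(b)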
    
    def relu(self, z):
        return np.maximum(0, z)
    
    def relu_derivative(self, z):
        return (z > 0).astype(float)
    
    def forward(self, X):
        """Forward propagation"""
        self.z_values = []
        self.activations = [X]
        
        A = X
        for W, b in zip(self.weights[:-1], self.biases[:-1]):
            Z = np.dot(A, W) + b
            A = self.relu(Z)
            self.z_values.append(Z)
            self.activations.append(A)
        
        # Output layer (no activation)
        Z = np.dot(A, self.weights[-1]) + self.biases[-1]
        self.z_values.append(Z)
        self.activations.append(Z)
        
        return Z
    
    def backward(self, X, y, learning_rate=0.01):
        """Backpropagation for the loss 0.5 * mean((ŷ - y)²)"""
        m = X.shape[0]
        
        # Output layer gradient (linear output + half-MSE → dL/dZ = ŷ - y, per sample)
        dZ = self.activations[-1] - y
        
        # Backpropagate through layers, last to first
        for i in reversed(range(len(self.weights))):
            # Gradients for this layer's parameters (averaged over the batch)
            dW = np.dot(self.activations[i].T, dZ) / m
            db = np.sum(dZ, axis=0, keepdims=True) / m
            
            # Propagate the gradient to the previous layer BEFORE updating,
            # so the chain rule uses the same weights as the forward pass
            if i > 0:
                dZ = np.dot(dZ, self.weights[i].T) * self.relu_derivative(self.z_values[i-1])
            
            # Update weights
            self.weights[i] -= learning_rate * dW
            self.biases[i] -= learning_rate * db
    
    def train(self, X, y, epochs=1000, learning_rate=0.01):
        """Train the network"""
        losses = []
        
        for epoch in range(epochs):
            # Forward
            predictions = self.forward(X)
            
            # Loss
            loss = np.mean((predictions - y) ** 2)
            losses.append(loss)
            
            # Backward
            self.backward(X, y, learning_rate)
            
            if epoch % 100 == 0:
                print(f"Epoch {epoch}, Loss: {loss:.4f}")
        
        return losses

print("✅ Neural network class implemented!")
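
Backpropagation code is easy to get subtly wrong, so a common sanity check is to compare it against finite differences. Below is a self-contained sketch for a tiny two-layer ReLU network using the same formulas as `backward` above (half-MSE loss, so the output gradient is ŷ - y per sample). The sizes, seed, and random data are arbitrary assumptions, and `numeric_grad` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))           # 5 samples, 3 features
y = rng.normal(size=(5, 1))
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=(1, 4))
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=(1, 1))

def loss_fn(W1, b1, W2, b2):
    """Half mean squared error: 0.5 * mean((ŷ - y)²)"""
    A1 = np.maximum(0, X @ W1 + b1)
    y_hat = A1 @ W2 + b2
    return 0.5 * np.mean((y_hat - y) ** 2)

# Analytic gradients via backpropagation
m = X.shape[0]
Z1 = X @ W1 + b1
A1 = np.maximum(0, Z1)
dZ2 = (A1 @ W2 + b2) - y              # ŷ - y; dividing by m gives dL/dŷ
dW2 = A1.T @ dZ2 / m
dZ1 = (dZ2 @ W2.T) * (Z1 > 0)         # back through W2 and the ReLU
dW1 = X.T @ dZ1 / m

def numeric_grad(param, eps=1e-6):
    """Central-difference gradient of loss_fn w.r.t. every entry of param (modified in place, then restored)"""
    grad = np.zeros_like(param)
    for idx in np.ndindex(param.shape):
        old = param[idx]
        param[idx] = old + eps
        plus = loss_fn(W1, b1, W2, b2)
        param[idx] = old - eps
        minus = loss_fn(W1, b1, W2, b2)
        param[idx] = old                  # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

err_W1 = np.max(np.abs(dW1 - numeric_grad(W1)))
err_W2 = np.max(np.abs(dW2 - numeric_grad(W2)))
print(f"max |analytic - numeric|: W1 {err_W1:.2e}, W2 {err_W2:.2e}")
```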

In [None]:
# Test on a simple dataset
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Generate data
X, y = make_classification(n_samples=1000, n_features=10, n_classes=2, random_state=42)
y = y.reshape(-1, 1)  # Reshape for network

# Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Normalize
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create and train network
print("🚀 Training Neural Network from Scratch\n")
nn = NeuralNetworkFromScratch([10, 16, 8, 1])
losses = nn.train(X_train, y_train, epochs=500, learning_rate=0.1)

# Plot training
plt.figure(figsize=(10, 5))
plt.plot(losses, linewidth=2)
plt.xlabel('Epoch', fontsize=12)
plt.ylabel('Loss (MSE)', fontsize=12)
plt.title('Training Progress', fontsize=14, fontweight='bold')
plt.grid(True, alpha=0.3)
plt.show()

# Test accuracy
predictions = nn.forward(X_test)
predicted_classes = (predictions > 0.5).astype(int)
accuracy = np.mean(predicted_classes == y_test)
print(f"\n✅ Test Accuracy: {accuracy*100:.2f}%")

## 🎉 Conclusion

You've learned:

✅ What neural networks are and how they work

✅ Neurons, layers, and activation functions

✅ Forward propagation and backpropagation

✅ Training process and gradient descent

✅ Built a neural network from scratch

✅ CNNs for images and RNNs for sequences

### Next Steps

1. Experiment with different architectures
2. Learn about regularization (dropout, batch norm)
3. Study optimization techniques (Adam, RMSprop)
4. Explore transfer learning
5. Build real applications

**Happy learning! 🧠**