# Session 3: Deep Learning & CNN Theory - Interactive Notebook

## From Random Forest to Neural Networks

**Duration:** 90 minutes | **Type:** Interactive Theory | **Difficulty:** Intermediate

---

## 🎯 Learning Objectives

By the end of this notebook, you will:

1. ✅ Build a perceptron from scratch using NumPy
2. ✅ Understand and visualize activation functions
3. ✅ Implement forward propagation manually
4. ✅ Apply convolution operations to Sentinel-2 imagery
5. ✅ Explore pre-trained CNN architectures
6. ✅ Visualize learned feature maps
7. ✅ Understand the transition from RF to deep learning

---

## 📋 Notebook Structure

| Part | Topic | Duration |
|------|-------|----------|
| **1** | Build Perceptron from Scratch | 20 min |
| **2** | Activation Functions | 15 min |
| **3** | Simple Neural Network | 20 min |
| **4** | Convolution Operations | 20 min |
| **5** | CNN Architecture Exploration | 15 min |

---

## 🔑 Key Concepts Preview

**What you already know (from Sessions 1-2):**
- Random Forest classification
- Feature engineering (GLCM, NDVI, temporal)
- Accuracy assessment
- Palawan land cover mapping

**What you'll learn today:**
- How neural networks learn from data
- Why convolution is well suited to images
- How CNNs build feature hierarchies
- When to use CNNs vs Random Forest

---

Let's dive in! 🚀


---

# Setup and Imports

First, let's import the libraries we'll need.


In [None]:
# Core libraries
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import ndimage, signal
import warnings
warnings.filterwarnings('ignore')

# Set plotting style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)
plt.rcParams['font.size'] = 10

# For reproducibility
np.random.seed(42)

print("✓ Libraries imported successfully")
print(f"NumPy version: {np.__version__}")
print(f"Matplotlib version: {plt.matplotlib.__version__}")

---

# Part 1: Build a Perceptron from Scratch (20 minutes)

## What is a Perceptron?

A **perceptron** is the simplest artificial neuron. It:
1. Takes multiple inputs (x₁, x₂, ..., xₙ)
2. Multiplies each by a weight (w₁, w₂, ..., wₙ)
3. Adds a bias term (b)
4. Applies an activation function
5. Outputs a prediction

**Mathematical formula:**
```
z = (w₁ × x₁) + (w₂ × x₂) + ... + (wₙ × xₙ) + b
output = activation(z)
```

**Analogy for EO:**
Think of classifying a pixel as "forest" or "not forest":
- x₁ = NDVI value
- x₂ = texture measure
- x₃ = elevation
- Weights determine how important each feature is
- Output: probability of being forest
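
The weighted-sum formula above can be traced by hand for a single pixel. This is a minimal sketch: the feature values, weights, and bias are hypothetical numbers picked for illustration, not values learned from data.

```python
import numpy as np

# Hypothetical features for one pixel: NDVI, texture measure, elevation (km)
x = np.array([0.75, 0.30, 0.12])

# Hypothetical weights and bias, chosen by hand for illustration
w = np.array([4.0, -1.0, 0.5])
b = -2.0

# Weighted sum: z = w1*x1 + w2*x2 + w3*x3 + b
z = np.dot(w, x) + b

# Sigmoid activation squashes z into (0, 1)
output = 1.0 / (1.0 + np.exp(-z))

print(f"z = {z:.2f}")
print(f"sigmoid(z) = {output:.3f}")
```

Here `z` works out to 3.0 - 0.3 + 0.06 - 2.0 = 0.76. Because `z` is positive, the sigmoid output is above 0.5 and the pixel would be labeled "forest".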

---

## 1.1: Implement the Perceptron Class


In [None]:
class Perceptron:
    """
    Simple perceptron implementation
    """
    
    def __init__(self, n_inputs, learning_rate=0.01):
        """
        Initialize perceptron with random weights
        
        Parameters:
        -----------
        n_inputs : int
            Number of input features
        learning_rate : float
            Step size for weight updates
        """
        # Initialize weights randomly (small values)
        self.weights = np.random.randn(n_inputs) * 0.01
        self.bias = 0.0
        self.learning_rate = learning_rate
        
        # Track training history
        self.errors = []
    
    def sigmoid(self, z):
        """
        Sigmoid activation function: σ(z) = 1 / (1 + e^(-z))
        Maps any value to range (0, 1)
        """
        # Clip z so np.exp cannot overflow for large-magnitude inputs
        return 1 / (1 + np.exp(-np.clip(z, -500, 500)))
    
    def predict(self, X):
        """
        Make predictions for input data
        
        Parameters:
        -----------
        X : array-like, shape (n_samples, n_features)
            Input data
        
        Returns:
        --------
        predictions : array, shape (n_samples,)
            Binary predictions (0 or 1)
        """
        # Calculate weighted sum
        z = np.dot(X, self.weights) + self.bias
        
        # Apply sigmoid activation
        probabilities = self.sigmoid(z)
        
        # Convert to binary (threshold at 0.5)
        predictions = (probabilities >= 0.5).astype(int)
        
        return predictions
    
    def train(self, X, y, epochs=100):
        """
        Train perceptron using gradient descent
        
        Parameters:
        -----------
        X : array-like, shape (n_samples, n_features)
            Training data
        y : array-like, shape (n_samples,)
            Target labels (0 or 1)
        epochs : int
            Number of training iterations
        """
        for epoch in range(epochs):
            # Forward pass
            z = np.dot(X, self.weights) + self.bias
            predictions = self.sigmoid(z)
            
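            # Calculate error (target minus predicted probability)
            errors = y - predictions
            
            # Update weights (gradient descent): with a sigmoid output, (y - p)
            # summed over samples is the negative gradient of the log loss
            self.weights += self.learning_rate * np.dot(X.T, errors)
            self.bias += self.learning_rate * np.sum(errors)
            
            # Track mean squared error (for monitoring only)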
            mse = np.mean(errors ** 2)
            self.errors.append(mse)
            
            if (epoch + 1) % 20 == 0:
                accuracy = np.mean(self.predict(X) == y) * 100
                print(f"Epoch {epoch+1}/{epochs} - MSE: {mse:.4f} - Accuracy: {accuracy:.1f}%")

print("✓ Perceptron class defined")
print("  Methods: __init__, sigmoid, predict, train")

---

## 1.2: Generate Simple Training Data

Let's create a toy dataset that mimics forest classification:
- **Feature 1:** NDVI (high for forest)
- **Feature 2:** Texture contrast (medium for forest)
- **Label:** Forest (1) or Not Forest (0)


In [None]:
# Generate synthetic "forest" vs "non-forest" data
np.random.seed(42)

# Forest: high NDVI (0.6-0.9), medium texture (20-50)
n_forest = 50
forest_ndvi = np.random.uniform(0.6, 0.9, n_forest)
forest_texture = np.random.uniform(20, 50, n_forest)
forest_data = np.column_stack([forest_ndvi, forest_texture])
forest_labels = np.ones(n_forest)

# Non-forest: low NDVI (0.1-0.4), high texture (40-80)
n_non_forest = 50
non_forest_ndvi = np.random.uniform(0.1, 0.4, n_non_forest)
non_forest_texture = np.random.uniform(40, 80, n_non_forest)
non_forest_data = np.column_stack([non_forest_ndvi, non_forest_texture])
non_forest_labels = np.zeros(n_non_forest)

# Combine datasets
X_train = np.vstack([forest_data, non_forest_data])
y_train = np.concatenate([forest_labels, non_forest_labels])

# Shuffle
shuffle_idx = np.random.permutation(len(X_train))
X_train = X_train[shuffle_idx]
y_train = y_train[shuffle_idx]

print(f"Training data shape: {X_train.shape}")
print(f"Labels shape: {y_train.shape}")
print(f"\nClass distribution:")
print(f"  Forest (1): {np.sum(y_train == 1)} samples")
print(f"  Non-forest (0): {np.sum(y_train == 0)} samples")

### Visualize Training Data

In [None]:
# Plot the training data
fig, ax = plt.subplots(figsize=(10, 6))

# Separate classes for plotting
forest_mask = y_train == 1
non_forest_mask = y_train == 0

ax.scatter(X_train[forest_mask, 0], X_train[forest_mask, 1], 
           c='darkgreen', s=100, alpha=0.6, edgecolors='black', 
           label='Forest', marker='o')
ax.scatter(X_train[non_forest_mask, 0], X_train[non_forest_mask, 1], 
           c='orange', s=100, alpha=0.6, edgecolors='black', 
           label='Non-Forest', marker='s')

ax.set_xlabel('NDVI (Normalized Difference Vegetation Index)', fontsize=12, fontweight='bold')
ax.set_ylabel('Texture Contrast', fontsize=12, fontweight='bold')
ax.set_title('Forest vs Non-Forest Training Data', fontsize=14, fontweight='bold')
ax.legend(fontsize=12)
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n✓ Training data visualized")
print("  Notice: Forest = high NDVI, moderate texture")
print("          Non-forest = low NDVI, high texture")

---

## 1.3: Train the Perceptron

Now let's train our perceptron to classify forest vs non-forest!
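
The two features sit on very different scales (NDVI below 1, texture contrast in the tens), so the texture weight receives much larger gradient steps. One common remedy, which the cells in this notebook do not apply, is to standardize each feature before training. This is a minimal sketch: `X_demo` is a hypothetical stand-in for `X_train` with similar value ranges.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for X_train: NDVI in [0.1, 0.9], texture in [20, 80]
X_demo = np.column_stack([
    rng.uniform(0.1, 0.9, 100),
    rng.uniform(20, 80, 100),
])

# Standardize each column to zero mean and unit standard deviation
mean = X_demo.mean(axis=0)
std = X_demo.std(axis=0)
X_scaled = (X_demo - mean) / std

print("Before scaling - mean:", X_demo.mean(axis=0).round(2),
      "std:", X_demo.std(axis=0).round(2))
print("After scaling  - mean:", X_scaled.mean(axis=0).round(2),
      "std:", X_scaled.std(axis=0).round(2))
```

If you try this with the perceptron, reuse the same `mean` and `std` on any data you predict on later, including the mesh grid used for the decision-boundary plot.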


In [None]:
# Create and train perceptron
print("Training perceptron...")
print("=" * 60)

perceptron = Perceptron(n_inputs=2, learning_rate=0.1)
perceptron.train(X_train, y_train, epochs=100)

print("=" * 60)
print("\n✓ Training complete!")

# Final accuracy
final_predictions = perceptron.predict(X_train)
final_accuracy = np.mean(final_predictions == y_train) * 100
print(f"\nFinal Training Accuracy: {final_accuracy:.1f}%")

print(f"\nLearned Weights:")
print(f"  NDVI weight: {perceptron.weights[0]:.4f}")
print(f"  Texture weight: {perceptron.weights[1]:.4f}")
print(f"  Bias: {perceptron.bias:.4f}")

### Visualize Learning Progress

In [None]:
# Plot training curve
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

# Plot 1: Error over time
ax1.plot(perceptron.errors, linewidth=2, color='darkred')
ax1.set_xlabel('Epoch', fontsize=12, fontweight='bold')
ax1.set_ylabel('Mean Squared Error', fontsize=12, fontweight='bold')
ax1.set_title('Learning Curve: Error Decreases Over Time', fontsize=12, fontweight='bold')
ax1.grid(True, alpha=0.3)

# Plot 2: Decision boundary
# Create mesh for decision boundary
x_min, x_max = X_train[:, 0].min() - 0.1, X_train[:, 0].max() + 0.1
y_min, y_max = X_train[:, 1].min() - 5, X_train[:, 1].max() + 5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                     np.linspace(y_min, y_max, 200))

# Predict for mesh
Z = perceptron.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

# Plot decision regions
ax2.contourf(xx, yy, Z, alpha=0.3, cmap='RdYlGn', levels=[0, 0.5, 1])
ax2.contour(xx, yy, Z, colors='black', linewidths=2, levels=[0.5])

# Plot training points
ax2.scatter(X_train[y_train == 1, 0], X_train[y_train == 1, 1],
            c='darkgreen', s=100, alpha=0.8, edgecolors='black', label='Forest')
ax2.scatter(X_train[y_train == 0, 0], X_train[y_train == 0, 1],
            c='orange', s=100, alpha=0.8, edgecolors='black', label='Non-Forest')

ax2.set_xlabel('NDVI', fontsize=12, fontweight='bold')
ax2.set_ylabel('Texture Contrast', fontsize=12, fontweight='bold')
ax2.set_title('Decision Boundary Learned by Perceptron', fontsize=12, fontweight='bold')
ax2.legend(fontsize=11)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n✓ Decision boundary plotted (check the final accuracy above to confirm the classes are separated)")
print("  The black line shows the decision boundary")
print("  Green region = predicted as forest")
print("  Red region = predicted as non-forest")

---

### 🎯 Key Takeaways - Part 1

✅ **A perceptron is the building block** of neural networks  
✅ **Weights indicate each feature's influence** (loosely like feature importance in RF, but only comparable when features share a scale)  
✅ **Training adjusts weights** to minimize error  
✅ **Activation functions** map outputs to desired range  
✅ **Decision boundary** separates classes (linear for perceptron)  

**Limitation:** Perceptrons can only learn linear decision boundaries. For complex patterns (like in satellite images), we need deeper networks with non-linear activations!
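
To see why the boundary is linear: the prediction flips where the sigmoid output equals 0.5, which is exactly where `z = w₁x₁ + w₂x₂ + b = 0`. That equation describes a straight line in the (NDVI, texture) plane. The sketch below solves it for `x₂`; the weights, bias, and helper name `boundary_x2` are hypothetical and only serve the illustration.

```python
import numpy as np

# Hypothetical weights and bias for a 2-feature perceptron
w = np.array([3.0, -0.05])
b = 0.5

def boundary_x2(x1, w, b):
    """Return x2 on the line w[0]*x1 + w[1]*x2 + b = 0 (requires w[1] != 0)."""
    return -(w[0] * x1 + b) / w[1]

x1 = np.array([0.2, 0.5, 0.8])
x2 = boundary_x2(x1, w, b)

# Every point on the boundary gives z = 0, i.e. sigmoid(z) = 0.5
z = w[0] * x1 + w[1] * x2 + b

print("x2 on the boundary:", x2)
print("z is zero on the boundary:", np.allclose(z, 0.0))
```

No choice of `w` and `b` can bend this line, which is why a single perceptron cannot separate classes that need a curved boundary.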

---



# Part 2: Activation Functions (15 minutes)

## Why Activation Functions?

Without activation functions, neural networks would just be linear models (like linear regression). Activation functions introduce **non-linearity**, allowing networks to learn complex patterns.

**Analogy:** 
- Linear model: Can only draw straight lines to separate classes
- With activation: Can draw curves, circles, any shape!
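
The claim that stacked layers without activations stay linear can be checked numerically. In this sketch, with random weights that are purely illustrative, two linear layers are collapsed into one equivalent layer. Adding a ReLU between them breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                       # 5 samples, 2 features

# Two layers with random (illustrative) weights and biases
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)

# Two linear layers applied one after the other, no activation
two_layer = (X @ W1 + b1) @ W2 + b2

# The same mapping written as a single linear layer
W_eq = W1 @ W2
b_eq = b1 @ W2 + b2
one_layer = X @ W_eq + b_eq

collapsed = np.allclose(two_layer, one_layer)
print("Two linear layers equal one linear layer:", collapsed)

# With a ReLU in between, the network is no longer a single linear map
with_relu = np.maximum(0, X @ W1 + b1) @ W2 + b2
print("ReLU version equals the linear layer:", np.allclose(with_relu, one_layer))
```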

---

## 2.1: Implement Common Activation Functions


In [None]:
# Define activation functions
def sigmoid(x):
    """
    Sigmoid: σ(x) = 1 / (1 + e^(-x))
    Range: (0, 1)
    Use: Output probabilities, binary classification
    """
    return 1 / (1 + np.exp(-np.clip(x, -500, 500)))  # Clip to avoid overflow

def relu(x):
    """
    ReLU: f(x) = max(0, x)
    Range: [0, ∞)
    Use: Most popular for hidden layers
    """
    return np.maximum(0, x)

def tanh(x):
    """
    Tanh: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
    Range: (-1, 1)
    Use: Hidden layers, zero-centered
    """
    return np.tanh(x)

def leaky_relu(x, alpha=0.01):
    """
    Leaky ReLU: f(x) = x if x > 0 else alpha * x
    Range: (-∞, ∞)
    Use: Solves "dying ReLU" problem
    """
    return np.where(x > 0, x, alpha * x)

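def softmax(x):
    """
    Softmax: Converts a vector of scores to a probability distribution
    Use: Multi-class classification output
    Works on a 1D vector, or row-wise on a 2D (n_samples, n_classes) array
    """
    exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))  # Subtract max for numerical stability
    return exp_x / exp_x.sum(axis=-1, keepdims=True)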

print("✓ Activation functions defined")
print("  Functions: sigmoid, relu, tanh, leaky_relu, softmax")

---

## 2.2: Visualize Activation Functions


In [None]:
# Generate input range
x = np.linspace(-5, 5, 1000)

# Calculate activations
y_sigmoid = sigmoid(x)
y_relu = relu(x)
y_tanh = tanh(x)
y_leaky_relu = leaky_relu(x)

# Create comprehensive visualization
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
axes = axes.flatten()

# Sigmoid
axes[0].plot(x, y_sigmoid, linewidth=3, color='blue')
axes[0].axhline(y=0, color='k', linestyle='--', alpha=0.3)
axes[0].axhline(y=1, color='k', linestyle='--', alpha=0.3)
axes[0].axvline(x=0, color='k', linestyle='--', alpha=0.3)
axes[0].set_title('Sigmoid Function', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Input (z)', fontsize=11)
axes[0].set_ylabel('Output σ(z)', fontsize=11)
axes[0].grid(True, alpha=0.3)
axes[0].text(0.5, 0.05, 'Range: (0, 1)\nUse: Binary classification output', 
             transform=axes[0].transAxes, fontsize=10, 
             bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.5))

# ReLU
axes[1].plot(x, y_relu, linewidth=3, color='red')
axes[1].axhline(y=0, color='k', linestyle='--', alpha=0.3)
axes[1].axvline(x=0, color='k', linestyle='--', alpha=0.3)
axes[1].set_title('ReLU (Rectified Linear Unit)', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Input (z)', fontsize=11)
axes[1].set_ylabel('Output ReLU(z)', fontsize=11)
axes[1].grid(True, alpha=0.3)
axes[1].text(0.5, 0.05, 'Range: [0, ∞)\nUse: Hidden layers (most popular)', 
             transform=axes[1].transAxes, fontsize=10,
             bbox=dict(boxstyle='round', facecolor='lightcoral', alpha=0.5))

# Tanh
axes[2].plot(x, y_tanh, linewidth=3, color='green')
axes[2].axhline(y=-1, color='k', linestyle='--', alpha=0.3)
axes[2].axhline(y=1, color='k', linestyle='--', alpha=0.3)
axes[2].axvline(x=0, color='k', linestyle='--', alpha=0.3)
axes[2].set_title('Tanh (Hyperbolic Tangent)', fontsize=14, fontweight='bold')
axes[2].set_xlabel('Input (z)', fontsize=11)
axes[2].set_ylabel('Output tanh(z)', fontsize=11)
axes[2].grid(True, alpha=0.3)
axes[2].text(0.5, 0.05, 'Range: (-1, 1)\nUse: Hidden layers (zero-centered)', 
             transform=axes[2].transAxes, fontsize=10,
             bbox=dict(boxstyle='round', facecolor='lightgreen', alpha=0.5))

# Leaky ReLU
axes[3].plot(x, y_leaky_relu, linewidth=3, color='purple')
axes[3].axhline(y=0, color='k', linestyle='--', alpha=0.3)
axes[3].axvline(x=0, color='k', linestyle='--', alpha=0.3)
axes[3].set_title('Leaky ReLU', fontsize=14, fontweight='bold')
axes[3].set_xlabel('Input (z)', fontsize=11)
axes[3].set_ylabel('Output Leaky ReLU(z)', fontsize=11)
axes[3].grid(True, alpha=0.3)
axes[3].text(0.5, 0.05, 'Range: (-∞, ∞)\nUse: Avoids "dying ReLU" problem', 
             transform=axes[3].transAxes, fontsize=10,
             bbox=dict(boxstyle='round', facecolor='plum', alpha=0.5))

plt.tight_layout()
plt.show()

print("\n✓ Activation functions visualized!")
print("\n📊 Key Observations:")
print("  • Sigmoid: S-shaped curve, squashes to (0,1)")
print("  • ReLU: Simple, fast, most popular")
print("  • Tanh: Similar to sigmoid but zero-centered")
print("  • Leaky ReLU: Allows small negative values")

### Compare Derivatives (Gradients)

The **derivative** of the activation scales the error signal passed backward through a neuron, so it controls how large each weight update is during backpropagation.


In [None]:
# Calculate derivatives
def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

def relu_derivative(x):
    return (x > 0).astype(float)

def tanh_derivative(x):
    return 1 - np.tanh(x)**2

# Compute derivatives
dy_sigmoid = sigmoid_derivative(x)
dy_relu = relu_derivative(x)
dy_tanh = tanh_derivative(x)

# Plot derivatives
fig, ax = plt.subplots(figsize=(12, 6))

ax.plot(x, dy_sigmoid, linewidth=3, label='Sigmoid derivative', color='blue')
ax.plot(x, dy_relu, linewidth=3, label='ReLU derivative', color='red')
ax.plot(x, dy_tanh, linewidth=3, label='Tanh derivative', color='green')

ax.axhline(y=0, color='k', linestyle='--', alpha=0.3)
ax.axvline(x=0, color='k', linestyle='--', alpha=0.3)
ax.set_xlabel('Input (z)', fontsize=12, fontweight='bold')
ax.set_ylabel('Gradient (derivative)', fontsize=12, fontweight='bold')
ax.set_title('Activation Function Derivatives (Gradients for Backpropagation)', 
             fontsize=14, fontweight='bold')
ax.legend(fontsize=11, loc='upper right')
ax.grid(True, alpha=0.3)
ax.set_ylim(-0.2, 1.2)

plt.tight_layout()
plt.show()

print("\n✓ Gradients visualized!")
print("\n🔑 Why ReLU is Popular:")
print("  • Gradient is either 0 or 1 (simple computation)")
print("  • No vanishing gradient for x > 0")
print("  • Cheaper to compute than sigmoid/tanh (no exponentials)")
print("\n⚠️ Vanishing Gradient Problem:")
print("  • Sigmoid/Tanh: gradients → 0 for large |x| (sigmoid's gradient never exceeds 0.25)")
print("  • Early layers of deep networks learn very slowly as gradients shrink layer by layer")
print("  • ReLU avoids this for positive inputs")

---

### 🎯 Key Takeaways - Part 2

✅ **Activation functions introduce non-linearity**  
✅ **ReLU is the default choice** for hidden layers  
✅ **Sigmoid/Softmax for output** layers (probabilities)  
✅ **Derivatives matter** for learning speed  
✅ **Vanishing gradient** is why we prefer ReLU  
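
A rough way to quantify the vanishing gradient: the sigmoid derivative peaks at 0.25, and backpropagation multiplies one such factor per layer. The sketch below looks only at the activation derivatives and deliberately ignores the weight matrices, which in a real network can shrink or enlarge the product further.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# The sigmoid derivative s * (1 - s) is largest at x = 0
max_sigmoid_grad = sigmoid(0.0) * (1.0 - sigmoid(0.0))
print(f"Maximum sigmoid derivative: {max_sigmoid_grad}")

# Upper bound on the product of activation derivatives across stacked layers
for depth in (1, 5, 10, 20):
    bound = max_sigmoid_grad ** depth
    print(f"{depth:2d} sigmoid layers: activation-gradient factor <= {bound:.2e}")

# For comparison, the ReLU derivative is exactly 1 for positive inputs,
# so the corresponding factor stays at 1 regardless of depth
```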

---



# Part 3: Build a Simple Neural Network (20 minutes)

Now let's connect multiple perceptrons to create a **multi-layer neural network**!

## Architecture

```
Input Layer (2 neurons) → Hidden Layer (4 neurons) → Output Layer (1 neuron)
        ↓                        ↓                         ↓
    [NDVI, Texture]         [ReLU activation]      [Sigmoid activation]
```

This is a **2-4-1 network**: 2 inputs, 4 hidden neurons, 1 output.
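
Before writing the class, it can help to trace array shapes through a 2-4-1 network and count its parameters. This is a standalone sketch with randomly initialized weights; the three-sample input `X` is made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=(3, 2))                   # 3 samples, 2 features

W1, b1 = rng.normal(size=(2, 4)) * 0.1, np.zeros(4)   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)) * 0.1, np.zeros(1)   # hidden -> output

a1 = np.maximum(0, X @ W1 + b1)                # hidden activations (ReLU)
a2 = 1 / (1 + np.exp(-(a1 @ W2 + b2)))         # output probabilities (sigmoid)

# Parameter count: (2*4 + 4) + (4*1 + 1)
n_params = W1.size + b1.size + W2.size + b2.size

print("Hidden layer shape:", a1.shape)
print("Output shape:", a2.shape)
print("Trainable parameters:", n_params)
```

The network has 17 trainable parameters: 12 in the first layer and 5 in the second.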

---

## 3.1: Implement Neural Network Class


In [None]:
class SimpleNeuralNetwork:
    """
    2-layer neural network with one hidden layer
    """
    
    def __init__(self, input_size, hidden_size, output_size, learning_rate=0.01):
        """
        Initialize network with random weights
        """
        # Layer 1: input → hidden
        self.W1 = np.random.randn(input_size, hidden_size) * 0.1
        self.b1 = np.zeros(hidden_size)
        
        # Layer 2: hidden → output
        self.W2 = np.random.randn(hidden_size, output_size) * 0.1
        self.b2 = np.zeros(output_size)
        
        self.learning_rate = learning_rate
        self.losses = []
    
    def forward(self, X):
        """
        Forward propagation
        """
        # Layer 1
        self.z1 = np.dot(X, self.W1) + self.b1
        self.a1 = relu(self.z1)  # Hidden layer uses ReLU
        
        # Layer 2
        self.z2 = np.dot(self.a1, self.W2) + self.b2
        self.a2 = sigmoid(self.z2)  # Output layer uses Sigmoid
        
        return self.a2
    
    def backward(self, X, y):
        """
        Backpropagation (gradient calculation)
        """
        m = X.shape[0]  # Number of samples
        
        # Output layer gradients
        dz2 = self.a2 - y.reshape(-1, 1)
        dW2 = (1/m) * np.dot(self.a1.T, dz2)
        db2 = (1/m) * np.sum(dz2, axis=0)
        
        # Hidden layer gradients
        da1 = np.dot(dz2, self.W2.T)
        dz1 = da1 * (self.z1 > 0)  # ReLU derivative
        dW1 = (1/m) * np.dot(X.T, dz1)
        db1 = (1/m) * np.sum(dz1, axis=0)
        
        # Update weights
        self.W1 -= self.learning_rate * dW1
        self.b1 -= self.learning_rate * db1
        self.W2 -= self.learning_rate * dW2
        self.b2 -= self.learning_rate * db2
    
    def train(self, X, y, epochs=1000):
        """
        Train the network
        """
        for epoch in range(epochs):
            # Forward pass
            predictions = self.forward(X)
            
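            # Calculate loss (binary cross-entropy)
            # Flatten predictions from (n, 1) to (n,) so they align with y
            # instead of broadcasting to an (n, n) matrix
            p = predictions.flatten()
            loss = -np.mean(y * np.log(p + 1e-8) + 
                           (1 - y) * np.log(1 - p + 1e-8))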
            self.losses.append(loss)
            
            # Backward pass
            self.backward(X, y)
            
            if (epoch + 1) % 200 == 0:
                accuracy = np.mean((predictions > 0.5).flatten() == y) * 100
                print(f"Epoch {epoch+1}/{epochs} - Loss: {loss:.4f} - Accuracy: {accuracy:.1f}%")
    
    def predict(self, X):
        """
        Make predictions
        """
        probabilities = self.forward(X)
        return (probabilities > 0.5).astype(int).flatten()

print("✓ Neural Network class defined")
print("  Architecture: Input → Hidden (ReLU) → Output (Sigmoid)")

---

## 3.2: Train Neural Network

Let's train on the same forest/non-forest data and compare with the perceptron!


In [None]:
# Create and train neural network
print("Training 2-layer Neural Network...")
print("=" * 60)
print("Architecture: 2 inputs → 4 hidden (ReLU) → 1 output (Sigmoid)")
print("=" * 60)

nn = SimpleNeuralNetwork(input_size=2, hidden_size=4, output_size=1, learning_rate=0.5)
nn.train(X_train, y_train, epochs=1000)

print("=" * 60)
print("\n✓ Training complete!")

# Final accuracy
final_predictions = nn.predict(X_train)
final_accuracy = np.mean(final_predictions == y_train) * 100
print(f"\nFinal Training Accuracy: {final_accuracy:.1f}%")

### Compare: Perceptron vs Neural Network

In [None]:
# Create comparison visualization
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Plots 1-2: Learning curves (separate axes because the two metrics differ)
axes[0].plot(perceptron.errors, label='Perceptron (MSE)', linewidth=2, alpha=0.7)
axes[0].set_xlabel('Epoch', fontsize=11, fontweight='bold')
axes[0].set_ylabel('Error', fontsize=11, fontweight='bold')
axes[0].set_title('Perceptron Learning Curve', fontsize=12, fontweight='bold')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].plot(nn.losses, label='Neural Network (Cross-Entropy)', linewidth=2, 
             alpha=0.7, color='darkgreen')
axes[1].set_xlabel('Epoch', fontsize=11, fontweight='bold')
axes[1].set_ylabel('Loss', fontsize=11, fontweight='bold')
axes[1].set_title('Neural Network Learning Curve', fontsize=12, fontweight='bold')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

# Plot 3: Neural network decision boundary
x_min, x_max = X_train[:, 0].min() - 0.1, X_train[:, 0].max() + 0.1
y_min, y_max = X_train[:, 1].min() - 5, X_train[:, 1].max() + 5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                     np.linspace(y_min, y_max, 200))

# Neural network predictions
Z_nn = nn.predict(np.c_[xx.ravel(), yy.ravel()])
Z_nn = Z_nn.reshape(xx.shape)

axes[2].contourf(xx, yy, Z_nn, alpha=0.3, cmap='RdYlGn', levels=[0, 0.5, 1])
axes[2].contour(xx, yy, Z_nn, colors='black', linewidths=2, levels=[0.5])
axes[2].scatter(X_train[y_train == 1, 0], X_train[y_train == 1, 1],
                c='darkgreen', s=80, alpha=0.8, edgecolors='black', label='Forest')
axes[2].scatter(X_train[y_train == 0, 0], X_train[y_train == 0, 1],
                c='orange', s=80, alpha=0.8, edgecolors='black', label='Non-Forest')
axes[2].set_xlabel('NDVI', fontsize=11, fontweight='bold')
axes[2].set_ylabel('Texture', fontsize=11, fontweight='bold')
axes[2].set_title('Neural Network Decision Boundary', fontsize=12, fontweight='bold')
axes[2].legend()
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\n✓ Comparison complete!")
print(f"\nPerceptron accuracy: {np.mean(perceptron.predict(X_train) == y_train)*100:.1f}%")
print(f"Neural Network accuracy: {final_accuracy:.1f}%")
print("\n💡 Neural network can learn more complex decision boundaries!")

---

### 🎯 Key Takeaways - Part 3

✅ **Multi-layer networks** learn complex patterns  
✅ **Hidden layers** create feature representations  
✅ **Different activations** for different layers  
✅ **Backpropagation** trains all layers together  
✅ **Deeper ≠ always better** for simple problems  
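
The training loop in this part scores predictions with binary cross-entropy. The labels have shape `(n,)` while the network output has shape `(n, 1)`, and combining the two directly broadcasts to an `(n, n)` matrix. The sketch below uses three made-up predictions to show the pitfall, then computes the loss on aligned shapes.

```python
import numpy as np

y = np.array([1.0, 0.0, 1.0])              # labels, shape (3,)
p = np.array([[0.9], [0.2], [0.6]])        # predictions, shape (3, 1)

# Pitfall: (3,) combined with (3, 1) broadcasts to (3, 3)
wrong_shape = (y * np.log(p)).shape
print("Shape without flattening:", wrong_shape)

# Align shapes first, then apply the binary cross-entropy formula
p_flat = p.ravel()
bce = -np.mean(y * np.log(p_flat) + (1 - y) * np.log(1 - p_flat))
print(f"Binary cross-entropy: {bce:.4f}")
```

The loss here is the mean of -ln(0.9), -ln(0.8), and -ln(0.6), roughly 0.28. Confident correct predictions contribute little, while the hesitant 0.6 contributes the most.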

**Next:** Apply these concepts to images using convolution!

---



# Part 4: Convolution Operations (20 minutes)

## What is Convolution?

**Convolution** is the core operation in CNNs. It:
1. Takes a small filter (kernel) like 3×3
2. Slides it across an image
3. Performs element-wise multiplication
4. Sums the results
5. Creates a feature map

**Why convolution for images?**
- ✅ **Spatial locality:** Nearby pixels are related
- ✅ **Parameter sharing:** Same filter across entire image
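- ✅ **Translation equivariance:** A pattern is detected wherever it appears; shifting the input shifts the feature map (pooling then adds a degree of invariance)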
- ✅ **Hierarchical learning:** Builds from simple to complex features
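
The five steps above can be followed on an image small enough to check by hand. This is a minimal sketch: the 4×4 image and the 2×2 kernel are made-up values, and the kernel simply subtracts each pixel's lower-right neighbor from it.

```python
import numpy as np

image = np.array([
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
], dtype=float)

# Kernel: top-left pixel minus bottom-right pixel of each window
kernel = np.array([
    [1,  0],
    [0, -1],
], dtype=float)

k_h, k_w = kernel.shape
out_h = image.shape[0] - k_h + 1
out_w = image.shape[1] - k_w + 1
feature_map = np.zeros((out_h, out_w))

for i in range(out_h):
    for j in range(out_w):
        window = image[i:i + k_h, j:j + k_w]          # slide the filter
        feature_map[i, j] = np.sum(window * kernel)   # multiply and sum

print(feature_map)
# [[ 0. -1. -1.]
#  [-1.  1.  3.]
#  [ 2.  0. -2.]]
```

The same four kernel values are reused at all nine positions, which is the parameter sharing mentioned in the list above.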

---

## 4.1: Manual Convolution Implementation


In [None]:
def convolve2d(image, kernel):
    """
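    Apply 2D convolution manually ('valid' mode: no padding, stride 1)
    
    Note: like CNN layers in deep learning frameworks, this slides the
    kernel without flipping it (strictly speaking, cross-correlation).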
    
    Parameters:
    -----------
    image : 2D array
        Input image
    kernel : 2D array
        Convolution filter
    
    Returns:
    --------
    output : 2D array
        Convolved feature map
    """
    # Get dimensions
    image_h, image_w = image.shape
    kernel_h, kernel_w = kernel.shape
    
    # Calculate output size
    output_h = image_h - kernel_h + 1
    output_w = image_w - kernel_w + 1
    
    # Initialize output
    output = np.zeros((output_h, output_w))
    
    # Slide kernel across image
    for i in range(output_h):
        for j in range(output_w):
            # Extract region
            region = image[i:i+kernel_h, j:j+kernel_w]
            # Element-wise multiply and sum
            output[i, j] = np.sum(region * kernel)
    
    return output

print("✓ Convolution function defined")
print("  This mimics how CNNs process images!")

---

## 4.2: Classic Image Filters

Let's apply different filters to understand what CNNs learn!


In [None]:
# Create a simple test image (simulating Sentinel-2 NIR band)
# Simulate forest (bright) vs non-forest (dark) with edges
test_image = np.zeros((50, 50))
test_image[10:40, 10:25] = 0.8  # Forest patch (high NIR)
test_image[10:40, 25:40] = 0.2  # Urban/bare soil (low NIR)

# Add some noise for realism
test_image += np.random.normal(0, 0.05, test_image.shape)
test_image = np.clip(test_image, 0, 1)

# Define classic filters
filters = {
    'Vertical Edge': np.array([
        [-1, 0, 1],
        [-1, 0, 1],
        [-1, 0, 1]
    ]),
    'Horizontal Edge': np.array([
        [-1, -1, -1],
        [ 0,  0,  0],
        [ 1,  1,  1]
    ]),
    'Edge Detection (Sobel)': np.array([
        [-1, -2, -1],
        [ 0,  0,  0],
        [ 1,  2,  1]
    ]),
    'Sharpen': np.array([
        [ 0, -1,  0],
        [-1,  5, -1],
        [ 0, -1,  0]
    ]),
    'Blur (Smoothing)': np.array([
        [1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]
    ]) / 9,
    'Identity': np.array([
        [0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]
    ])
}

print("✓ Test image and filters created")
print(f"  Image size: {test_image.shape}")
print(f"  Number of filters: {len(filters)}")

### Apply Filters and Visualize

In [None]:
# Apply all filters
fig, axes = plt.subplots(2, 4, figsize=(16, 8))
axes = axes.flatten()

# Original image
axes[0].imshow(test_image, cmap='gray')
axes[0].set_title('Original Image\n(Simulated NIR Band)', fontsize=11, fontweight='bold')
axes[0].axis('off')

# Apply each filter
for idx, (name, kernel) in enumerate(filters.items(), start=1):
    # Convolve
    filtered = convolve2d(test_image, kernel)
    
    # Display
    axes[idx].imshow(filtered, cmap='gray')
    axes[idx].set_title(f'{name}\nFilter', fontsize=11, fontweight='bold')
    axes[idx].axis('off')
    
    # Show kernel as text
    kernel_text = f"Kernel:\n{kernel}"
    axes[idx].text(0.5, -0.15, kernel_text, transform=axes[idx].transAxes,
                   fontsize=7, ha='center', family='monospace',
                   bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

# Hide last subplot
axes[-1].axis('off')

plt.tight_layout()
plt.show()

print("\n✓ Filters applied successfully!")
print("\n🔍 Observations:")
print("  • Vertical Edge: Detects vertical boundaries (forest | urban)")
print("  • Horizontal Edge: Detects horizontal boundaries")
print("  • Sharpen: Enhances edges and details")
print("  • Blur: Smooths out noise")
print("  • Identity: Passes through unchanged")

---

## 4.3: Simulate Sentinel-2 Image

Let's apply filters to a more realistic Sentinel-2-like image!


In [None]:
# Create synthetic Sentinel-2 NIR band (64x64)
np.random.seed(42)

# Simulate different land covers
s2_image = np.zeros((64, 64))

# Forest blocks (high NIR)
s2_image[5:25, 5:25] = 0.8 + np.random.normal(0, 0.05, (20, 20))
s2_image[40:60, 40:60] = 0.75 + np.random.normal(0, 0.05, (20, 20))

# Water (very low NIR)
s2_image[5:25, 40:60] = 0.1 + np.random.normal(0, 0.02, (20, 20))

# Agriculture (medium NIR)
s2_image[40:60, 5:25] = 0.5 + np.random.normal(0, 0.08, (20, 20))

# Urban/bare soil (low NIR)
s2_image[25:40, 25:40] = 0.25 + np.random.normal(0, 0.05, (15, 15))

# Clip to valid range
s2_image = np.clip(s2_image, 0, 1)

print(f"✓ Synthetic Sentinel-2 image created: {s2_image.shape}")
print("  Contains: Forest, Water, Agriculture, Urban")

In [None]:
# Apply multiple edge detection filters
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Original
axes[0, 0].imshow(s2_image, cmap='RdYlGn', vmin=0, vmax=1)
axes[0, 0].set_title('Original Sentinel-2 NIR\n(Synthetic)', fontsize=12, fontweight='bold')
axes[0, 0].axis('off')

# Vertical edges
vert_edges = convolve2d(s2_image, filters['Vertical Edge'])
axes[0, 1].imshow(vert_edges, cmap='seismic')
axes[0, 1].set_title('Vertical Edge Detection\n(Forest | Water boundary)', fontsize=11, fontweight='bold')
axes[0, 1].axis('off')

# Horizontal edges
horiz_edges = convolve2d(s2_image, filters['Horizontal Edge'])
axes[0, 2].imshow(horiz_edges, cmap='seismic')
axes[0, 2].set_title('Horizontal Edge Detection', fontsize=11, fontweight='bold')
axes[0, 2].axis('off')

# Sobel (this kernel responds to horizontal edges, with extra weight on the center column)
sobel = convolve2d(s2_image, filters['Edge Detection (Sobel)'])
axes[1, 0].imshow(sobel, cmap='hot')
axes[1, 0].set_title('Sobel Edge Detection\n(Horizontal edges)', fontsize=11, fontweight='bold')
axes[1, 0].axis('off')

# Blur (texture smoothing)
blurred = convolve2d(s2_image, filters['Blur (Smoothing)'])
axes[1, 1].imshow(blurred, cmap='RdYlGn')
axes[1, 1].set_title('Blur Filter\n(Noise reduction)', fontsize=11, fontweight='bold')
axes[1, 1].axis('off')

# Sharpen
sharpened = convolve2d(s2_image, filters['Sharpen'])
axes[1, 2].imshow(sharpened, cmap='RdYlGn')
axes[1, 2].set_title('Sharpen Filter\n(Detail enhancement)', fontsize=11, fontweight='bold')
axes[1, 2].axis('off')

plt.tight_layout()
plt.show()

print("\n✓ Convolution filters applied to Sentinel-2-like image!")
print("\n🎯 This is what CNNs do automatically:")
print("  • Learn optimal filters (not pre-defined)")
print("  • Stack multiple filters (32, 64, 128...)")
print("  • Build hierarchical features (edges → textures → objects)")

---

### Understanding Feature Maps

When a CNN applies a filter, it creates a **feature map**. Multiple filters = multiple feature maps.

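**Example:** A first convolutional layer with 64 filters (illustrative; the standard ResNet stem instead uses 7×7 filters with stride 2)
- Input: 64×64×10 (Sentinel-2 image)
- Filter: 64 filters of size 3×3, each spanning all 10 input bands
- Output: 64×64×64 (64 feature maps, assuming stride 1 and 1-pixel zero padding)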

Each feature map responds to different patterns!
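
The spatial size of a feature map follows from the input size, kernel size, padding, and stride. The sketch below implements the standard formula `(n + 2p - k) // s + 1`. The helper name `conv_output_size` is hypothetical and not part of any library.

```python
def conv_output_size(n, kernel, padding=0, stride=1):
    """Output length along one spatial dimension of a convolution."""
    return (n + 2 * padding - kernel) // stride + 1

# No padding ('valid'): a 3×3 kernel trims one pixel from each side
print(conv_output_size(64, kernel=3))                       # 62

# One pixel of zero padding ('same' for 3×3): size is preserved
print(conv_output_size(64, kernel=3, padding=1))            # 64

# A 7×7 kernel with stride 2 and padding 3 halves a 224-pixel input
print(conv_output_size(224, kernel=7, padding=3, stride=2)) # 112
```

The first case matches the manual `convolve2d` function in this part, which uses no padding, so a 50×50 test image yields a 48×48 feature map.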


---

### 🎯 Key Takeaways - Part 4

✅ **Convolution = filter sliding across image**  
✅ **Filters detect specific patterns** (edges, textures)  
✅ **CNNs learn optimal filters** during training  
✅ **Feature maps** are outputs of convolution  
✅ **Multiple filters** capture different features  

**Connection to EO:**
- Layer 1 filters: Water/land boundaries, forest edges
- Layer 2 filters: Vegetation textures, urban patterns
- Layer 3 filters: Agricultural fields, forest stands

---



# Part 5: CNN Architecture Exploration (15 minutes)

Now let's explore real CNN architectures and understand their components!

## 5.1: Build a Simple CNN (Conceptually)

Let's design a CNN for Sentinel-2 scene classification:

**Task:** Classify 64×64 Sentinel-2 patches into 8 land cover classes

**Architecture** (all convolutions use stride 1 and "same" padding, so only pooling changes the spatial size):
```
Input: 64×64×10 (10 Sentinel-2 bands)
    ↓
Conv1: 32 filters, 3×3 → 64×64×32
ReLU activation
MaxPool: 2×2 → 32×32×32
    ↓
Conv2: 64 filters, 3×3 → 32×32×64
ReLU activation
MaxPool: 2×2 → 16×16×64
    ↓
Conv3: 128 filters, 3×3 → 16×16×128
ReLU activation
GlobalAveragePool → 128
    ↓
Dense (Fully Connected): 128 → 8
Softmax activation
    ↓
Output: 8 class probabilities
```
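
The pooling and classification steps in this diagram are simple array operations. The sketch below traces them with NumPy; random arrays stand in for the convolution outputs and the dense weights `W`, `b` are untrained placeholders, so only the shapes are meaningful:

```python
import numpy as np

def max_pool_2x2(x):
    """2×2 max pooling with stride 2 on an (H, W, C) array (H and W even)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def global_average_pool(x):
    """Average each channel over all spatial positions: (H, W, C) -> (C,)."""
    return x.mean(axis=(0, 1))

def softmax(z):
    """Numerically stable softmax over a 1-D vector of class scores."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(42)

conv1_out = np.maximum(0, rng.normal(size=(64, 64, 32)))   # stand-in for Conv1 + ReLU
pooled = max_pool_2x2(conv1_out)
print(pooled.shape)        # (32, 32, 32)

conv3_out = np.maximum(0, rng.normal(size=(16, 16, 128)))  # stand-in for Conv3 + ReLU
features = global_average_pool(conv3_out)
print(features.shape)      # (128,)

W = rng.normal(scale=0.1, size=(128, 8))   # untrained dense weights (placeholder)
b = np.zeros(8)
probs = softmax(features @ W + b)
print(probs.shape)         # (8,)
```

Global average pooling is what lets the dense layer stay small: it sees 128 numbers rather than the 16×16×128 values of the last feature maps.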

---

## 5.2: Calculate Parameters
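
As a hand check on the counts the function below should report, the arithmetic for the Section 5.1 architecture can be worked out directly, assuming one bias term per filter and per dense unit (pooling and activation layers have no trainable parameters):

$$
\begin{aligned}
\text{Conv1:}\quad & (3 \cdot 3 \cdot 10 + 1) \cdot 32 = 91 \cdot 32 = 2{,}912 \\
\text{Conv2:}\quad & (3 \cdot 3 \cdot 32 + 1) \cdot 64 = 289 \cdot 64 = 18{,}496 \\
\text{Conv3:}\quad & (3 \cdot 3 \cdot 64 + 1) \cdot 128 = 577 \cdot 128 = 73{,}856 \\
\text{Dense:}\quad & (128 + 1) \cdot 8 = 1{,}032 \\
\text{Total:}\quad & 2{,}912 + 18{,}496 + 73{,}856 + 1{,}032 = 96{,}296
\end{aligned}
$$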


In [None]:
def calculate_cnn_parameters(architecture):
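    """
    Calculate the number of trainable parameters in a CNN.

    `architecture` maps layer names to dicts. Names containing 'conv' need
    'kernel_size', 'in_channels' and 'out_channels'; names containing 'dense'
    need 'input_size' and 'output_size'. Any other layer (pooling, activation)
    contributes no trainable parameters.
    """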
    total_params = 0
    
    print("CNN Architecture Analysis")
    print("=" * 70)
    
    for layer_name, layer_info in architecture.items():
        if 'conv' in layer_name.lower():
            # Convolution layer: (filter_h * filter_w * in_channels + 1) * out_channels
            kernel_h, kernel_w = layer_info['kernel_size']
            in_channels = layer_info['in_channels']
            out_channels = layer_info['out_channels']
            
            params = (kernel_h * kernel_w * in_channels + 1) * out_channels
            total_params += params
            
            print(f"{layer_name}:")
            print(f"  Kernel: {kernel_h}×{kernel_w}, In: {in_channels}, Out: {out_channels}")
            print(f"  Parameters: {params:,}")
            
        elif 'dense' in layer_name.lower():
            # Dense layer: (input_size + 1) * output_size
            input_size = layer_info['input_size']
            output_size = layer_info['output_size']
            
            params = (input_size + 1) * output_size
            total_params += params
            
            print(f"{layer_name}:")
            print(f"  Input: {input_size}, Output: {output_size}")
            print(f"  Parameters: {params:,}")
        
        print()
    
    print("=" * 70)
    print(f"Total Trainable Parameters: {total_params:,}")
    print("=" * 70)
    
    return total_params

# Define our CNN architecture
our_cnn = {
    'Conv1': {'kernel_size': (3, 3), 'in_channels': 10, 'out_channels': 32},
    'Conv2': {'kernel_size': (3, 3), 'in_channels': 32, 'out_channels': 64},
    'Conv3': {'kernel_size': (3, 3), 'in_channels': 64, 'out_channels': 128},
    'Dense': {'input_size': 128, 'output_size': 8}
}

params = calculate_cnn_parameters(our_cnn)

print("\n💡 For comparison:")
print("  ResNet50: ~25 million parameters")
print("  VGG16: ~138 million parameters")
print(f"  Our simple CNN: {params:,} parameters")
print("\n  → Lightweight, suitable for small datasets!")

---

## 5.3: Visualize CNN Architecture


In [None]:
# Visualize the architecture flow
fig, ax = plt.subplots(figsize=(14, 8))

# Define layer positions and sizes
layers = [
    {'name': 'Input\n64×64×10', 'x': 0, 'y': 0.5, 'w': 0.8, 'h': 0.8, 'color': 'lightblue'},
    {'name': 'Conv1 + ReLU\n64×64×32', 'x': 1.5, 'y': 0.5, 'w': 0.7, 'h': 0.7, 'color': 'lightcoral'},
    {'name': 'MaxPool\n32×32×32', 'x': 2.8, 'y': 0.5, 'w': 0.6, 'h': 0.6, 'color': 'lightyellow'},
    {'name': 'Conv2 + ReLU\n32×32×64', 'x': 4.0, 'y': 0.5, 'w': 0.6, 'h': 0.6, 'color': 'lightcoral'},
    {'name': 'MaxPool\n16×16×64', 'x': 5.2, 'y': 0.5, 'w': 0.5, 'h': 0.5, 'color': 'lightyellow'},
    {'name': 'Conv3 + ReLU\n16×16×128', 'x': 6.4, 'y': 0.5, 'w': 0.5, 'h': 0.5, 'color': 'lightcoral'},
    {'name': 'Global\nAvgPool', 'x': 7.6, 'y': 0.5, 'w': 0.3, 'h': 0.8, 'color': 'lightyellow'},
    {'name': 'Dense\n8 classes', 'x': 8.5, 'y': 0.5, 'w': 0.3, 'h': 0.6, 'color': 'lightgreen'},
]

# Draw layers
for layer in layers:
    rect = plt.Rectangle((layer['x'] - layer['w']/2, layer['y'] - layer['h']/2),
                          layer['w'], layer['h'], 
                          facecolor=layer['color'], edgecolor='black', linewidth=2)
    ax.add_patch(rect)
    ax.text(layer['x'], layer['y'], layer['name'], 
            ha='center', va='center', fontsize=9, fontweight='bold')

# Draw arrows
for i in range(len(layers) - 1):
    ax.arrow(layers[i]['x'] + layers[i]['w']/2 + 0.05, 
             layers[i]['y'],
             layers[i+1]['x'] - layers[i+1]['w']/2 - layers[i]['x'] - layers[i]['w']/2 - 0.15,
             0, head_width=0.1, head_length=0.1, fc='gray', ec='gray')

ax.set_xlim(-0.5, 9.5)
ax.set_ylim(-0.5, 1.5)
ax.axis('off')
ax.set_title('CNN Architecture for Sentinel-2 Scene Classification', 
             fontsize=14, fontweight='bold', pad=20)

# Add legend
legend_elements = [
    plt.Rectangle((0, 0), 1, 1, fc='lightblue', ec='black', label='Input'),
    plt.Rectangle((0, 0), 1, 1, fc='lightcoral', ec='black', label='Convolution + ReLU'),
    plt.Rectangle((0, 0), 1, 1, fc='lightyellow', ec='black', label='Pooling'),
    plt.Rectangle((0, 0), 1, 1, fc='lightgreen', ec='black', label='Dense/Output')
]
ax.legend(handles=legend_elements, loc='upper center', 
          bbox_to_anchor=(0.5, -0.05), ncol=4, frameon=False)

plt.tight_layout()
plt.show()

print("\n✓ CNN architecture visualized!")
print("\n📐 Layer Dimensions:")
print("  Notice how spatial dimensions decrease (64→32→16)")
print("  While channels increase (10→32→64→128)")
print("  This is typical: trade spatial resolution for semantic features")

---
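
Trading spatial resolution for channels also enlarges the receptive field, the window of input pixels that influences one activation. The sketch below tracks it through the layer stack of Section 5.1; the `receptive_field` helper and the `(name, kernel, stride)` tuple layout are assumptions made for illustration:

```python
def receptive_field(layers):
    """Track receptive field size (in input pixels) through a layer stack.

    layers: list of (name, kernel_size, stride) tuples, in forward order.
    """
    rf, jump = 1, 1  # start from a single input pixel; outputs are 1 pixel apart
    history = []
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump   # each layer widens the window by (k - 1) * jump
        jump *= stride              # strided layers spread neighbouring outputs apart
        history.append((name, rf))
    return history

stack = [
    ("Conv1 3x3", 3, 1),
    ("MaxPool 2x2", 2, 2),
    ("Conv2 3x3", 3, 1),
    ("MaxPool 2x2", 2, 2),
    ("Conv3 3x3", 3, 1),
]
for name, rf in receptive_field(stack):
    print(f"{name:12s} sees a {rf}×{rf} window of input pixels")
```

Under these assumptions one Conv3 activation depends on an 18×18 input window, roughly 180 m on a side if every band is resampled to 10 m.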

## 5.4: Compare with Random Forest

Let's understand when to use CNNs vs Random Forest for EO tasks.


In [None]:
# Create comparison table
comparison_data = {
    'Aspect': [
        'Input Type',
        'Feature Engineering',
        'Spatial Context',
        'Training Data Needed',
        'Training Time',
        'Inference Speed',
        'Interpretability',
        'Typical Accuracy (indicative, task-dependent)',
        'Hardware',
        'Best Use Case'
    ],
    'Random Forest\n(Sessions 1-2)': [
        'Pixel features',
        'Manual (GLCM, NDVI, etc.)',
        'Limited (neighborhood)',
        '100-1000 samples',
        'Minutes',
        'Very fast (ms)',
        'High (feature importance)',
        '80-90%',
        'CPU sufficient',
        'Quick prototypes, small areas'
    ],
    'CNN\n(Sessions 3-4)': [
        'Image patches',
        'Automatic (learned)',
        'Hierarchical (receptive field)',
        '1000-100K+ images',
        'Hours-Days',
        'Fast with GPU (10-100ms)',
        'Low (black box)',
        '90-98%',
        'GPU recommended',
        'Production, large areas, high accuracy'
    ]
}

# Display as formatted table
print("\n" + "=" * 100)
print("RANDOM FOREST vs CONVOLUTIONAL NEURAL NETWORKS")
print("=" * 100)

for aspect, rf_value, cnn_value in zip(
    comparison_data['Aspect'],
    comparison_data['Random Forest\n(Sessions 1-2)'],
    comparison_data['CNN\n(Sessions 3-4)'],
):
    print(f"\n{aspect}:")
    print(f"  RF:  {rf_value}")
    print(f"  CNN: {cnn_value}")

print("\n" + "=" * 100)

print("\n🎯 Decision Guide:")
print("\n  Use RANDOM FOREST when:")
print("    • You have <1000 training samples")
print("    • Quick results needed (hours, not days)")
print("    • Interpretability is important")
print("    • No GPU available")
print("\n  Use CNN when:")
print("    • You have >1000 labeled images")
print("    • Highest accuracy is critical")
print("    • Production deployment planned")
print("    • GPU resources available")
print("\n  🌟 BEST PRACTICE: Start with RF, upgrade to CNN if needed!")

---

### 🎯 Key Takeaways - Part 5

✅ **CNN architecture:** Input → Conv → Pool → ... → Dense → Output  
✅ **Parameters scale quickly:** Deeper, wider layers dominate the count (Conv3 alone holds 73,856 of the 96,296 parameters)  
✅ **Spatial dimensions decrease:** While semantic depth increases  
✅ **Choose wisely:** RF for quick work, CNN for production  
✅ **Transfer learning helps:** Pre-trained models can reduce the labeled data you need (applied in Session 4)  

---

# 🎓 Session Complete! Summary

## What You've Learned

### Part 1: Perceptron
- ✅ Built artificial neuron from scratch
- ✅ Understood weights, bias, activation
- ✅ Trained using gradient descent
- ✅ Visualized decision boundary

### Part 2: Activation Functions
- ✅ Explored ReLU, Sigmoid, Tanh
- ✅ Understood non-linearity importance
- ✅ Saw vanishing gradient problem
- ✅ Learned why ReLU is popular

### Part 3: Neural Networks
- ✅ Built multi-layer network
- ✅ Implemented forward propagation
- ✅ Understood backpropagation
- ✅ Compared with perceptron

### Part 4: Convolution Operations
- ✅ Applied filters to images manually
- ✅ Visualized edge detection
- ✅ Processed Sentinel-2-like data
- ✅ Understood feature maps

### Part 5: CNN Architectures
- ✅ Designed CNN for EO classification
- ✅ Calculated parameters
- ✅ Visualized architecture flow
- ✅ Compared RF vs CNN

---

## Ready for Session 4!

In the next session, you'll:
- 🔨 Build actual CNNs with TensorFlow/Keras
- 🌲 Train on real Palawan land cover data
- 🎯 Implement U-Net for segmentation
- 📊 Compare results with Random Forest
- 🚀 Apply transfer learning

---

## 📚 Additional Practice (Optional)

**Exercises to Try:**

1. **Modify the Perceptron**
   - Add a third feature (elevation)
   - Try different learning rates
   - Visualize in 3D

2. **Experiment with Activations**
   - Replace ReLU with Tanh in the neural network
   - Compare training dynamics
   - Plot accuracy curves

3. **Custom Filters**
   - Design your own 3×3 filter
   - Test on the Sentinel-2 image
   - Explain what pattern it detects

4. **Architecture Design**
   - Design a CNN for 10-class classification
   - Calculate total parameters
   - Keep it under 100K parameters!
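
For exercise 3, one possible starting point is sketched below. It uses `scipy.signal.correlate2d` rather than the notebook's own convolution helper, a synthetic image instead of the Sentinel-2 data, and a hypothetical `diagonal_filter` chosen only as an example:

```python
import numpy as np
from scipy.signal import correlate2d

# Synthetic test image: dark upper-left half, bright lower-right half
size = 32
yy, xx = np.mgrid[0:size, 0:size]
image = (xx + yy >= size).astype(float)

# Hypothetical custom filter: weights grow toward the lower-right corner,
# so it responds where brightness increases in that direction
diagonal_filter = np.array([
    [-2, -1, 0],
    [-1,  0, 1],
    [ 0,  1, 2],
], dtype=float)

response = correlate2d(image, diagonal_filter, mode="same", boundary="symm")

# The filter weights sum to zero, so flat regions give no response
print("Flat region response:", response[2, 2])
print("Edge response:       ", response[16, 16])
```

Plotting `response` with `plt.imshow` should show a bright band along the diagonal edge and zeros elsewhere, which is a useful pattern to look for when explaining what your own filter detects.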

---

## 🌟 Key Concepts to Remember

**From Random Forest to CNNs:**
- RF: Manual features → Tree ensemble → Classification
- CNN: Raw pixels → Learned filters → Feature hierarchy → Classification
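
The difference in input format can be made concrete with array shapes. In this sketch the `scene` array is synthetic and the 256×256 size is an arbitrary assumption; the per-pixel table is what a Random Forest would consume (before engineered features such as NDVI or GLCM are appended), while the patch stack is what a CNN would consume:

```python
import numpy as np

rng = np.random.default_rng(1)
scene = rng.random((256, 256, 10))   # synthetic 10-band scene, not real data

# Random Forest view: one row per pixel, one column per band
rf_table = scene.reshape(-1, scene.shape[-1])
print(rf_table.shape)    # (65536, 10)

# CNN view: non-overlapping 64×64 patches that keep their spatial layout
p = 64
rows, cols = scene.shape[0] // p, scene.shape[1] // p
patches = (
    scene[:rows * p, :cols * p]
    .reshape(rows, p, cols, p, -1)   # split both spatial axes into (block, offset)
    .swapaxes(1, 2)                  # bring the two block indices together
    .reshape(rows * cols, p, p, -1)  # one entry per patch
)
print(patches.shape)     # (16, 64, 64, 10)
```

The per-pixel table discards neighbourhood information unless it is engineered back in as features, whereas each patch keeps the neighbourhood that convolution filters operate on.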

**Why CNNs Excel at Images:**
- Spatial locality (nearby pixels related)
- Parameter sharing (same filter everywhere)
- Hierarchical features (edges → textures → objects)
- End-to-end learning (optimize everything together)
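
Parameter sharing is easy to quantify. The following back-of-the-envelope sketch compares the first convolution of the Part 5 architecture with a hypothetical fully connected layer that maps the same flattened 64×64×10 input to the same 64×64×32 output; the dense layer is purely illustrative and not something one would build:

```python
h, w, c_in, c_out, k = 64, 64, 10, 32, 3

# Convolution: one small kernel per output channel, reused at every position
conv_params = (k * k * c_in + 1) * c_out

# Fully connected: every output value has its own weight for every input value
dense_params = (h * w * c_in + 1) * (h * w * c_out)

print(f"Conv layer:  {conv_params:,}")    # 2,912
print(f"Dense layer: {dense_params:,}")   # 5,368,840,192
print(f"Ratio:       {dense_params / conv_params:,.0f}x")
```

The convolution needs about a million times fewer parameters because the same 3×3 weights are applied at all 64×64 positions.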

**When CNNs Are Worth It:**
- Large labeled dataset (>1000 images)
- GPU available
- Accuracy is critical
- Production deployment

---

## 📖 Resources for Deeper Learning

**Interactive:**
- [TensorFlow Playground](https://playground.tensorflow.org/)
- [CNN Explainer](https://poloclub.github.io/cnn-explainer/)
- [Distill.pub Feature Visualization](https://distill.pub/2017/feature-visualization/)

**Courses:**
- Deep Learning Specialization (Coursera) - Andrew Ng
- Fast.ai Practical Deep Learning
- CS231n (Stanford) - CNNs for Visual Recognition

**Papers:**
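- LeCun et al. (1998) - Gradient-Based Learning Applied to Document Recognition (LeNet)
- Krizhevsky et al. (2012) - ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)
- He et al. (2016) - Deep Residual Learning for Image Recognition (ResNet)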

**EO-Specific:**
- EuroSAT Dataset
- TorchGeo Library
- Awesome Satellite Imagery Repo

---

**Congratulations! 🎉**

You now understand the fundamentals of deep learning and CNNs. Time to put it into practice in Session 4!

[Continue to Session 4 →](../../session4/notebooks/)

---

*Session 3 Theory Notebook - CoPhil Advanced Training Program*
