# 🖼️ Notebook 02: Convolutional Neural Networks (CNNs)

**Week 3-4: Deep Learning & NLP Foundations**  
**Gen AI Masters Program**

---

## 📋 Objectives

By the end of this notebook, you will master:
1. ✅ Convolution operation and filters
2. ✅ Pooling layers
3. ✅ CNN architecture components
4. ✅ Building CNNs with PyTorch
5. ✅ Image classification
6. ✅ Transfer learning basics

**Estimated Time:** 3-4 hours

---

## 📚 What are CNNs?

CNNs are specialized neural networks designed for **visual data**. They're used in:
- 📸 Image classification
- 🎯 Object detection
- 🏥 Medical imaging
- 🏭 Quality inspection (our focus!)
- 🤖 Autonomous vehicles

**Key Insight**: CNNs automatically learn spatial hierarchies of features!

Let's dive in! 🚀

In [None]:
# Import libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image

# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

# Set random seeds
torch.manual_seed(42)
np.random.seed(42)

# Check GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"✅ Using device: {device}")
print(f"PyTorch version: {torch.__version__}")

## 1️⃣ Understanding Convolution

### What is Convolution?

Convolution slides a **filter (kernel)** over an image to detect features:
- Edges
- Textures
- Patterns
- Objects

```
Image * Filter = Feature Map
```
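
To make that arithmetic concrete, here is a minimal pure-Python sketch of a single filter pass, assuming stride 1 and no padding. The helper names `output_size` and `slide_filter` are illustrative and not part of any library.

```python
def output_size(n, k, padding=0, stride=1):
    """Spatial size of the feature map for an input of size n and a filter of size k."""
    return (n + 2 * padding - k) // stride + 1


def slide_filter(image, kernel):
    """Slide `kernel` over `image` (lists of lists), summing elementwise products."""
    k = len(kernel)
    out_h = output_size(len(image), k)
    out_w = output_size(len(image[0]), k)
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# A 4x5 image with a dark-to-bright vertical boundary between columns 2 and 3
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
vertical_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = slide_filter(image, vertical_kernel)
print(output_size(8, 3))  # size of the feature map for an 8x8 image and a 3x3 filter
print(feature_map)        # zero over the flat region, positive where the edge is
```

The flat region on the left produces a response of zero, while windows that straddle the boundary produce a positive response.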

In [None]:
# Create a simple image (8x8)
simple_image = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0]
], dtype=np.float32)

# Define edge detection filters
vertical_filter = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1]
], dtype=np.float32)

horizontal_filter = np.array([
    [-1, -1, -1],
    [ 0,  0,  0],
    [ 1,  1,  1]
], dtype=np.float32)

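# Manual convolution (strictly speaking cross-correlation: the kernel is not
# flipped, which is also what nn.Conv2d computes)
def convolve2d(image, kernel):
    """'Valid' 2D convolution with stride 1; assumes a square image and kernel."""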
    kernel_size = kernel.shape[0]
    output_size = image.shape[0] - kernel_size + 1
    output = np.zeros((output_size, output_size))
    
    for i in range(output_size):
        for j in range(output_size):
            region = image[i:i+kernel_size, j:j+kernel_size]
            output[i, j] = np.sum(region * kernel)
    
    return output

# Apply filters
vertical_edges = convolve2d(simple_image, vertical_filter)
horizontal_edges = convolve2d(simple_image, horizontal_filter)

# Visualize
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Original image
axes[0, 0].imshow(simple_image, cmap='gray')
axes[0, 0].set_title('Original Image (Rectangle)', fontweight='bold', fontsize=12)
axes[0, 0].axis('off')

# Vertical filter
axes[0, 1].imshow(vertical_filter, cmap='seismic', vmin=-1, vmax=1)
axes[0, 1].set_title('Vertical Edge Filter', fontweight='bold', fontsize=12)
axes[0, 1].axis('off')

# Vertical edges
axes[0, 2].imshow(vertical_edges, cmap='seismic')
axes[0, 2].set_title('Detected Vertical Edges', fontweight='bold', fontsize=12)
axes[0, 2].axis('off')

# Original image (repeat)
axes[1, 0].imshow(simple_image, cmap='gray')
axes[1, 0].set_title('Original Image (Rectangle)', fontweight='bold', fontsize=12)
axes[1, 0].axis('off')

# Horizontal filter
axes[1, 1].imshow(horizontal_filter, cmap='seismic', vmin=-1, vmax=1)
axes[1, 1].set_title('Horizontal Edge Filter', fontweight='bold', fontsize=12)
axes[1, 1].axis('off')

# Horizontal edges
axes[1, 2].imshow(horizontal_edges, cmap='seismic')
axes[1, 2].set_title('Detected Horizontal Edges', fontweight='bold', fontsize=12)
axes[1, 2].axis('off')

plt.tight_layout()
plt.show()

print("✅ Each filter highlights edges of one orientation; a CNN learns such filters from data!")

## 2️⃣ CNN Components

### Key Layers

1. **Convolutional Layer**: Detects features
2. **Activation (ReLU)**: Adds non-linearity
3. **Pooling Layer**: Reduces spatial dimensions
4. **Fully Connected**: Final classification
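
As a small illustration of step 2, ReLU simply clamps negative responses to zero. The sketch below uses plain Python and a hypothetical helper name `relu_map`, with made-up filter responses; in PyTorch the same operation is `F.relu`.

```python
def relu_map(feature_map):
    """Apply ReLU, max(0, x), to every value in a 2D feature map."""
    return [[max(0.0, value) for value in row] for row in feature_map]


# Edge-filter responses can be negative (bright-to-dark) or positive (dark-to-bright)
responses = [
    [-3.0, 0.0, 3.0],
    [-1.0, 2.0, -2.0],
]
activated = relu_map(responses)
print(activated)
```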

### Pooling Types

In [None]:
# Demonstrate pooling
test_feature_map = np.array([
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [3, 2, 4, 5],
    [7, 1, 3, 6]
], dtype=np.float32)

def max_pool_2x2(matrix):
    """Max pooling with 2x2 window"""
    h, w = matrix.shape
    output = np.zeros((h//2, w//2))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            output[i//2, j//2] = np.max(matrix[i:i+2, j:j+2])
    return output

def avg_pool_2x2(matrix):
    """Average pooling with 2x2 window"""
    h, w = matrix.shape
    output = np.zeros((h//2, w//2))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            output[i//2, j//2] = np.mean(matrix[i:i+2, j:j+2])
    return output

max_pooled = max_pool_2x2(test_feature_map)
avg_pooled = avg_pool_2x2(test_feature_map)

# Visualize
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

axes[0].imshow(test_feature_map, cmap='viridis')
axes[0].set_title('Original Feature Map (4x4)', fontweight='bold', fontsize=12)
for i in range(4):
    for j in range(4):
        axes[0].text(j, i, f'{test_feature_map[i, j]:.0f}', 
                    ha='center', va='center', color='white', fontweight='bold')
axes[0].axis('off')

axes[1].imshow(max_pooled, cmap='viridis')
axes[1].set_title('Max Pooling (2x2) → 2x2', fontweight='bold', fontsize=12)
for i in range(2):
    for j in range(2):
        axes[1].text(j, i, f'{max_pooled[i, j]:.0f}', 
                    ha='center', va='center', color='white', fontweight='bold', fontsize=14)
axes[1].axis('off')

axes[2].imshow(avg_pooled, cmap='viridis')
axes[2].set_title('Average Pooling (2x2) → 2x2', fontweight='bold', fontsize=12)
for i in range(2):
    for j in range(2):
        axes[2].text(j, i, f'{avg_pooled[i, j]:.1f}', 
                    ha='center', va='center', color='white', fontweight='bold', fontsize=14)
axes[2].axis('off')

plt.tight_layout()
plt.show()

print("📊 Pooling Benefits:")
print("  • Reduces spatial dimensions (4x4 → 2x2)")
print("  • Adds tolerance to small shifts in feature position")
print("  • Reduces computation and downstream parameters (pooling has no weights of its own)")
print("  • Max pooling: Keeps strongest activations")
print("  • Avg pooling: Smooths features")

## 3️⃣ Building a CNN with PyTorch

### Simple CNN Architecture
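
Before reading the model code, it can help to check the layer arithmetic by hand. The sketch below is plain Python with hypothetical helper names. It assumes the architecture used in this section: three 3x3 convolutions with padding 1 (3 → 32 → 64 → 128 channels), each followed by 2x2 max pooling, then fully connected layers of 512 and 10 units, applied to a 32x32 RGB input.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2):
    """Output size of non-overlapping pooling along one spatial dimension."""
    return size // window


def conv_params(in_ch, out_ch, kernel=3):
    """Weights plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + out_ch


def linear_params(in_features, out_features):
    """Weights plus one bias per output unit."""
    return in_features * out_features + out_features


size = 32
channels = [3, 32, 64, 128]
total = 0
for in_ch, out_ch in zip(channels, channels[1:]):
    size = pool_out(conv_out(size))       # padding=1 keeps the size, pooling halves it
    total += conv_params(in_ch, out_ch)
    print(f"conv {in_ch:>3} -> {out_ch:<3}: feature map {size}x{size}")

flat_features = channels[-1] * size * size  # input size of the first linear layer
total += linear_params(flat_features, 512) + linear_params(512, 10)
print(f"flattened features: {flat_features}, total parameters: {total:,}")
```

Most of the parameters sit in the first fully connected layer, which is one reason pooling before flattening matters.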

In [None]:
class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super(SimpleCNN, self).__init__()
        
        # Convolutional layers for 32x32 color images
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)   # 3 input channels (RGB)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        
        # Pooling
        self.pool = nn.MaxPool2d(2, 2)  # 2x2 pooling
        
        # Fully connected layers
        # After 3 pooling layers, 32x32 -> 16x16 -> 8x8 -> 4x4
        self.fc1 = nn.Linear(128 * 4 * 4, 512)
        self.fc2 = nn.Linear(512, num_classes)
        
        # Dropout for regularization
        self.dropout = nn.Dropout(0.5)
        
    def forward(self, x):
        # Conv1 + ReLU + Pool (32x32 -> 16x16)
        x = self.pool(F.relu(self.conv1(x)))
        
        # Conv2 + ReLU + Pool (16x16 -> 8x8)
        x = self.pool(F.relu(self.conv2(x)))
        
        # Conv3 + ReLU + Pool (8x8 -> 4x4)
        x = self.pool(F.relu(self.conv3(x)))
        
        # Flatten
        x = torch.flatten(x, 1)  # keep the batch dimension, flatten the rest
        
        # Fully connected layers
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        
        return x

# Create model
model_cnn = SimpleCNN(num_classes=10)
print("🧠 Simple CNN Architecture for CIFAR-10")
print("="*60)
print(model_cnn)

# Count parameters
total_params = sum(p.numel() for p in model_cnn.parameters())
trainable_params = sum(p.numel() for p in model_cnn.parameters() if p.requires_grad)
print(f"\nTotal parameters: {total_params:,}")
print(f"Trainable parameters: {trainable_params:,}")

# Test forward pass
test_input = torch.randn(1, 3, 32, 32)  # Batch=1, Channels=3, H=32, W=32
output = model_cnn(test_input)
print(f"\nInput shape: {test_input.shape}")
print(f"Output shape: {output.shape}")
print(f"Output (logits): {output[0][:5]}...")

## 4️⃣ Application: Visual Quality Inspection for Manufacturing

Our goal is to train a CNN that classifies images of manufactured parts as either "OK" or "Defective", a core task for our **Manufacturing Copilot's** quality control module.

We'll use the **CIFAR-10 dataset** as a proxy for this task. Imagine that some classes (e.g., 'automobile', 'truck') represent correctly manufactured parts, while others (e.g., 'airplane', 'ship', which might look like malformed parts) represent defects. The model is still trained on all 10 CIFAR-10 classes, so this is a simplified but illustrative example.

**Dataset:** CIFAR-10 (32x32 color images, 10 classes)
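
One way to turn a 10-class prediction into an OK/Defective verdict is a simple lookup. This is a sketch under the notebook's illustrative assumption (cars and trucks are good parts, planes and ships are defects), using the short class names from this notebook's `classes` tuple. The `PART_STATUS` table and `inspect_part` helper are hypothetical names, and every other class is treated as an unrelated item.

```python
# Hypothetical mapping from CIFAR-10 class names to inspection verdicts
PART_STATUS = {
    "car": "OK",
    "truck": "OK",
    "plane": "Defective",
    "ship": "Defective",
}


def inspect_part(class_name):
    """Map a predicted class name to a quality-inspection verdict."""
    return PART_STATUS.get(class_name, "Other")


for name in ("car", "ship", "frog"):
    print(f"{name:>5} -> {inspect_part(name)}")
```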

In [None]:
# Load CIFAR-10 dataset
# Data augmentation and normalization for training
# Only normalization for testing (no augmentation)
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])


# Download and load training data
train_dataset = datasets.CIFAR10('./data', train=True, download=True, transform=transform_train)
test_dataset = datasets.CIFAR10('./data', train=False, transform=transform_test)

# Create data loaders
batch_size = 128
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
# Let's define our manufacturing context
# OK parts: car, truck
# Defective parts: plane, ship (malformed parts)
# Ambiguous: bird, cat, deer, dog, frog, horse (other items on conveyor belt)

print("📊 CIFAR-10 Dataset (as Manufacturing Parts)")
print("="*50)
print(f"Training samples: {len(train_dataset)}")
print(f"Test samples: {len(test_dataset)}")
print(f"Batch size: {batch_size}")
print(f"Number of batches (train): {len(train_loader)}")

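# Visualize some samples
# Undo transforms.Normalize: x_norm = (x - mean) / std  =>  x = x_norm * std + mean
cifar_mean = torch.tensor((0.4914, 0.4822, 0.4465)).view(3, 1, 1)
cifar_std = torch.tensor((0.2023, 0.1994, 0.2010)).view(3, 1, 1)

def unnormalize(img):
    """Map a normalized (C, H, W) tensor back to the [0, 1] range for display."""
    return (img * cifar_std + cifar_mean).clamp(0, 1)

# Function to show an image
def imshow(img):
    npimg = unnormalize(img).numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))

examples = enumerate(train_loader)
batch_idx, (example_data, example_targets) = next(examples)

fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for i, ax in enumerate(axes.flat):
    npimg = unnormalize(example_data[i]).numpy()
    ax.imshow(np.transpose(npimg, (1, 2, 0)))  # (C, H, W) -> (H, W, C)
    ax.set_title(f'Label: {classes[example_targets[i]]}', fontweight='bold')
    ax.axis('off')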
plt.suptitle('CIFAR-10 Sample Images', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

In [None]:
# Training function
def train_epoch(model, device, train_loader, optimizer, criterion, epoch):
    model.train()
    train_loss = 0
    correct = 0
    total = 0
    
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        
        # Forward pass
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        
        # Backward pass
        loss.backward()
        optimizer.step()
        
        # Statistics
        train_loss += loss.item()
        _, predicted = output.max(1)
        total += target.size(0)
        correct += predicted.eq(target).sum().item()
        
        if batch_idx % 200 == 0:
            print(f'Train Epoch: {epoch} [{batch_idx * len(data)}/{len(train_loader.dataset)} '
                  f'({100. * batch_idx / len(train_loader):.0f}%)]\tLoss: {loss.item():.6f}')
    
    avg_loss = train_loss / len(train_loader)
    accuracy = 100. * correct / total
    return avg_loss, accuracy

# Test function
def test(model, device, test_loader, criterion):
    model.eval()
    test_loss = 0
    correct = 0
    total = 0
    
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += criterion(output, target).item()
            _, predicted = output.max(1)
            total += target.size(0)
            correct += predicted.eq(target).sum().item()
    
    avg_loss = test_loss / len(test_loader)
    accuracy = 100. * correct / total
    
    print(f'\nTest set: Average loss: {avg_loss:.4f}, '
          f'Accuracy: {correct}/{total} ({accuracy:.2f}%)\n')
    
    return avg_loss, accuracy

In [None]:
# Initialize model, loss, and optimizer
model = SimpleCNN(num_classes=10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# A learning rate scheduler can help improve training
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

# Training loop
epochs = 15 # Increased epochs for better convergence
train_losses = []
test_losses = []
train_accuracies = []
test_accuracies = []

print("🔄 Training CNN on CIFAR-10 (as Manufacturing Parts)...")
print("="*60)

for epoch in range(1, epochs + 1):
    print(f"\nEpoch {epoch}/{epochs}")
    print("-"*60)
    
    train_loss, train_acc = train_epoch(model, device, train_loader, optimizer, criterion, epoch)
    test_loss, test_acc = test(model, device, test_loader, criterion)
    
    scheduler.step() # Update learning rate
    
    train_losses.append(train_loss)
    test_losses.append(test_loss)
    train_accuracies.append(train_acc)
    test_accuracies.append(test_acc)
    
    print(f"Epoch {epoch} Summary: Train Acc: {train_acc:.2f}%, Test Acc: {test_acc:.2f}%")

print("\n✅ Training Complete!")
print(f"Final Test Accuracy: {test_accuracies[-1]:.2f}%")

In [None]:
# Plot training history
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# Loss
ax1.plot(range(1, epochs+1), train_losses, 'b-o', label='Train Loss', linewidth=2, markersize=8)
ax1.plot(range(1, epochs+1), test_losses, 'r-s', label='Test Loss', linewidth=2, markersize=8)
ax1.set_title('Training and Test Loss', fontweight='bold', fontsize=14)
ax1.set_xlabel('Epoch', fontweight='bold')
ax1.set_ylabel('Loss', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Accuracy
ax2.plot(range(1, epochs+1), train_accuracies, 'b-o', label='Train Accuracy', linewidth=2, markersize=8)
ax2.plot(range(1, epochs+1), test_accuracies, 'r-s', label='Test Accuracy', linewidth=2, markersize=8)
ax2.set_title('Training and Test Accuracy', fontweight='bold', fontsize=14)
ax2.set_xlabel('Epoch', fontweight='bold')
ax2.set_ylabel('Accuracy (%)', fontweight='bold')
ax2.legend()
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 5️⃣ Visualizing CNN Predictions on Manufacturing Parts
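
The model outputs raw logits, and the predicted class is the index of the largest one. If a confidence score is wanted as well, softmax turns the logits into probabilities. Below is a minimal pure-Python sketch with made-up logits for three classes; in PyTorch, `F.softmax(output, dim=1)` does the same for each row of a batch.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to 1."""
    peak = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 0.5, -1.0]                     # made-up scores for three classes
probs = softmax(logits)
predicted = max(range(len(probs)), key=probs.__getitem__)
print(predicted, [round(p, 3) for p in probs])
```

Softmax preserves the ordering of the logits, so taking the argmax of the probabilities gives the same class as taking the argmax of the logits.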

In [None]:
# Get predictions on test set
model.eval()
examples = enumerate(test_loader)
batch_idx, (example_data, example_targets) = next(examples)
example_data, example_targets = example_data.to(device), example_targets.to(device)

with torch.no_grad():
    output = model(example_data)
    predictions = output.argmax(dim=1, keepdim=True)

# Visualize predictions
fig, axes = plt.subplots(3, 5, figsize=(15, 9))
for i, ax in enumerate(axes.flat):
    if i < 15:
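        # Unnormalize with the CIFAR-10 mean/std used in transforms.Normalize
        mean = torch.tensor((0.4914, 0.4822, 0.4465)).view(3, 1, 1)
        std = torch.tensor((0.2023, 0.1994, 0.2010)).view(3, 1, 1)
        img = (example_data[i].cpu() * std + mean).clamp(0, 1)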
        npimg = img.numpy()
        ax.imshow(np.transpose(npimg, (1, 2, 0)))
        
        pred_class = classes[predictions[i].item()]
        true_class = classes[example_targets[i].item()]
        
        color = 'green' if pred_class == true_class else 'red'
        ax.set_title(f'Pred: {pred_class}\nTrue: {true_class}', color=color, fontweight='bold')
        ax.axis('off')

plt.suptitle('CNN Predictions (Green=Correct, Red=Wrong)', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

# Count correct predictions
correct = (predictions.view_as(example_targets) == example_targets).sum().item()
print(f"\n✅ Correctly classified: {correct}/{len(example_targets)} ({100*correct/len(example_targets):.1f}%)")

## 6️⃣ Visualizing Learned Filters
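
Learned weights can be negative, so they are usually rescaled before being displayed as pixel intensities. The sketch below shows min-max scaling on a handful of made-up weights; `min_max_scale` is a hypothetical helper, not a PyTorch or matplotlib function.

```python
def min_max_scale(values):
    """Rescale a flat list of weights to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:                  # constant filter: avoid dividing by zero
        return [0.5 for _ in values]
    return [(v - low) / (high - low) for v in values]


weights = [-0.4, 0.0, 0.1, 0.6]      # made-up filter weights
scaled = min_max_scale(weights)
print([round(s, 2) for s in scaled])
```

With a single-channel image and a colormap, `imshow` applies a similar rescaling on its own; doing it explicitly is mainly useful when showing all three input channels of a filter as one RGB patch.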

In [None]:
# Visualize the first 16 filters of the first conv layer (input channel 0 only)
first_layer_weights = model.conv1.weight.data.cpu()

fig, axes = plt.subplots(2, 8, figsize=(16, 4))
for i, ax in enumerate(axes.flat):
    if i < first_layer_weights.shape[0]:
        filter_img = first_layer_weights[i, 0]
        ax.imshow(filter_img, cmap='gray')
        ax.set_title(f'Filter {i+1}', fontsize=10)
        ax.axis('off')

plt.suptitle('Learned Filters in First Convolutional Layer', fontweight='bold', fontsize=14)
plt.tight_layout()
plt.show()

print("🔍 First-layer filters like these typically learn to detect:")
print("   • Edges (vertical, horizontal, diagonal)")
print("   • Curves and corners")
print("   • Basic patterns")

## 🎉 Summary

Congratulations! You've mastered CNNs!

### Key Concepts
- ✅ Convolution operation
- ✅ Filters and feature maps
- ✅ Pooling layers (Max, Average)
- ✅ CNN architecture design
- ✅ Training CNNs with PyTorch
- ✅ Image classification

### What You Built
1. 🔍 Manual convolution implementation
2. 🧠 Complete CNN architecture
3. 📊 CIFAR-10 image classifier
4. 🎨 Filter visualization

### CNN Applications
- 🏭 **Manufacturing**: Defect detection, quality inspection
- 🏥 **Healthcare**: Medical image analysis, disease detection
- 🚗 **Automotive**: Autonomous driving, object detection
- 📱 **Mobile**: Face recognition, AR filters

### Next Steps
Continue to **Notebook 03: RNNs and LSTMs** to learn about sequence models!

<div align="center">
<b>CNNs mastered! Ready for sequences! 🚀</b>
</div>