# Notebook 8: 3D Convolution Preview

**Week 10 - Module 4: CNN Basics**
**DO3 (October 27, 2025) - Saturday**
**Duration:** 10-15 minutes

## Learning Objectives

1. ✅ **Understand** the difference between 2D and 3D convolution
2. ✅ **Identify** use cases for 3D convolution
3. ✅ **Recognize** when to use 3D rather than 2D convolution

---

In [None]:
import numpy as np
import tensorflow as tf

print("✅ Setup complete!")

## 1. 2D vs 3D Convolution

### 2D Convolution (what we've learned):
- **Input**: H × W × C (height, width, channels)
- **Kernel**: Kh × Kw × C × Filters
- **Output**: H' × W' × Filters
- **Use case**: Images (spatial features)

### 3D Convolution:
- **Input**: D × H × W × C (depth, height, width, channels)
- **Kernel**: Kd × Kh × Kw × C × Filters
- **Output**: D' × H' × W' × Filters
- **Use case**: Videos (spatiotemporal features), 3D medical scans (volumetric features)
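
The shape rules above can be checked with a few lines of plain Python. This is a minimal sketch that assumes stride 1 and no padding (the Keras `padding='valid'` default); `conv_output_shape` is a hypothetical helper written for this notebook, not a library function.

```python
def conv_output_shape(input_shape, kernel_size, filters):
    """Output shape of a stride-1, unpadded ('valid') convolution.

    input_shape: spatial dims followed by channels, e.g. (H, W, C) or (D, H, W, C)
    kernel_size: one entry per spatial dim, e.g. (Kh, Kw) or (Kd, Kh, Kw)
    """
    *spatial, _channels = input_shape
    if len(spatial) != len(kernel_size):
        raise ValueError("kernel needs one size per spatial dimension")
    # Each spatial dim shrinks by (kernel - 1); channels are replaced by filters.
    out_spatial = [dim - k + 1 for dim, k in zip(spatial, kernel_size)]
    return (*out_spatial, filters)


shape_2d = conv_output_shape((64, 64, 3), (3, 3), filters=32)        # one image
shape_3d = conv_output_shape((5, 64, 64, 3), (3, 3, 3), filters=32)  # 5-frame clip
print(shape_2d)  # (62, 62, 32)
print(shape_3d)  # (3, 62, 62, 32)
```

The only structural difference is the extra leading axis: the 3D kernel shrinks the frame/depth axis as well as height and width.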

---

## 2. When to Use 3D Convolution?

**Applications:**

1. **Video Analysis**
   - Action recognition (sports, surveillance)
   - Gesture recognition
   - Video captioning

2. **Medical Imaging**
   - CT scans (3D body scans)
   - MRI volumes
   - Tumor detection

3. **Climate Science**
   - Weather prediction (3D atmospheric data)
   - Ocean temperature analysis

---

In [None]:
# Example: 3D Conv for video (5 frames, 64×64 RGB)
from tensorflow.keras import layers

model_3d = tf.keras.Sequential([
    # Explicit Input layer; recent Keras versions warn when
    # input_shape is passed to the first layer instead.
    tf.keras.Input(shape=(5, 64, 64, 3)),  # (frames, h, w, channels)
    layers.Conv3D(32, (3, 3, 3), activation='relu'),
    layers.MaxPooling3D((1, 2, 2)),  # Pool spatial, not temporal
    layers.Conv3D(64, (3, 3, 3), activation='relu'),
    layers.MaxPooling3D((1, 2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')  # 10-way softmax output
])

model_3d.summary()
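
The shapes that `model_3d.summary()` reports can also be traced by hand. The sketch below is plain Python, assuming the Keras defaults used in the model (stride 1 and `padding='valid'` for `Conv3D`, pooling stride equal to the pool size); `conv3d_valid` and `maxpool3d` are hypothetical helpers, not Keras functions.

```python
def conv3d_valid(shape, kernel, filters):
    """Shape after a stride-1, unpadded Conv3D on (D, H, W, C)."""
    d, h, w, _ = shape
    kd, kh, kw = kernel
    return (d - kd + 1, h - kh + 1, w - kw + 1, filters)


def maxpool3d(shape, pool):
    """Shape after MaxPooling3D with stride equal to the pool size."""
    d, h, w, c = shape
    pd, ph, pw = pool
    return (d // pd, h // ph, w // pw, c)


# Same layer order as model_3d, up to Flatten.
steps = [
    ("Conv3D(32)", lambda s: conv3d_valid(s, (3, 3, 3), 32)),
    ("MaxPooling3D", lambda s: maxpool3d(s, (1, 2, 2))),
    ("Conv3D(64)", lambda s: conv3d_valid(s, (3, 3, 3), 64)),
    ("MaxPooling3D", lambda s: maxpool3d(s, (1, 2, 2))),
]

shape = (5, 64, 64, 3)  # (frames, h, w, channels)
trace = [shape]
for name, step in steps:
    shape = step(shape)
    trace.append(shape)
    print(f"{name:<14} -> {shape}")

flat_features = shape[0] * shape[1] * shape[2] * shape[3]
print(f"{'Flatten':<14} -> {flat_features}")
```

The frame axis shrinks 5 → 3 → 1 across the two unpadded convolutions, so a third 3×3×3 layer would not fit. Passing `padding='same'` to `Conv3D` is one way to keep the frame count when deeper stacks are needed.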

## 3. 2D Conv on RGB Images (Important Clarification!)

**Common Confusion:**

> "RGB images have 3 channels. Is that 3D convolution?"

**Answer:** NO! It's still 2D convolution.

- **Input**: H × W × 3 (spatial 2D + 3 color channels)
- **Kernel**: Kh × Kw × 3 × Filters
- **Convolution**: The kernel slides along height and width only (2D); at each position it spans all 3 channels and sums across them

**3D convolution** adds a **third axis that the kernel slides along** (depth or time).
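
To make the distinction concrete, here is a deliberately naive 2D convolution over an RGB image, written with nested lists so it runs without any libraries. It is a teaching sketch, not how a framework implements it; `conv2d_single_filter` is a hypothetical name, and stride 1 with no padding is assumed.

```python
def conv2d_single_filter(image, kernel):
    """Stride-1, unpadded 2D convolution of an H x W x C image with one
    Kh x Kw x C kernel (cross-correlation, as in deep learning libraries)."""
    h, w, c = len(image), len(image[0]), len(image[0][0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(h - kh + 1):          # slide along height
        row = []
        for j in range(w - kw + 1):      # slide along width
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    for ch in range(c):  # channels are summed, never slid over
                        total += image[i + di][j + dj][ch] * kernel[di][dj][ch]
            row.append(total)
        out.append(row)
    return out


# 4 x 4 RGB image and a 3 x 3 x 3 kernel, both filled with ones.
image = [[[1.0, 1.0, 1.0] for _ in range(4)] for _ in range(4)]
kernel = [[[1.0, 1.0, 1.0] for _ in range(3)] for _ in range(3)]

feature_map = conv2d_single_filter(image, kernel)
print(len(feature_map), "x", len(feature_map[0]))  # 2 x 2: a 2D map per filter
print(feature_map[0][0])                           # 27.0 = 3 * 3 * 3 ones summed
```

There are only two sliding loops (`i` and `j`). The channel loop sits inside the sum, which is why the output of one filter has no channel axis. A 3D convolution would add a third sliding loop over frames or depth.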

---

## 4. Parameter Comparison

**2D Conv (32×32 RGB image, 64 filters, 3×3 kernel):**
- Params = $3 \times 3 \times 3 \times 64 + 64 = 1{,}792$

**3D Conv (16 frames, 32×32 RGB, 64 filters, 3×3×3 kernel):**
- Params = $3 \times 3 \times 3 \times 3 \times 64 + 64 = 5{,}248$

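**The 3D layer has ~3× the parameters!** Compared with a single 2D image, the compute gap is larger still, because the kernel also slides along the frame axis.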
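
The two counts can be reproduced in plain Python. The parameter formula is the one used above (kernel volume × input channels × filters, plus one bias per filter). The multiply-accumulate (MAC) estimate is an addition that is not in the original comparison; it assumes stride 1 and no padding, and both helper names are hypothetical.

```python
from math import prod


def conv_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter."""
    return prod(kernel_size) * in_channels * filters + filters


def conv_macs(input_spatial, kernel_size, in_channels, filters):
    """Multiply-accumulates for a stride-1, unpadded convolution (assumption)."""
    out_positions = prod(d - k + 1 for d, k in zip(input_spatial, kernel_size))
    return out_positions * prod(kernel_size) * in_channels * filters


params_2d = conv_params((3, 3), in_channels=3, filters=64)
params_3d = conv_params((3, 3, 3), in_channels=3, filters=64)
print(params_2d, params_3d)                # 1792 5248
print(round(params_3d / params_2d, 2))     # 2.93

# One 32x32 image versus one 16-frame 32x32 clip.
macs_2d = conv_macs((32, 32), (3, 3), in_channels=3, filters=64)
macs_3d = conv_macs((16, 32, 32), (3, 3, 3), in_channels=3, filters=64)
print(macs_3d // macs_2d)                  # 42
```

Under these assumptions the parameter count grows by the extra kernel dimension (3×), while the work per input grows by that factor times the number of output frames (14 here).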

---

## Summary

### 🎯 Key Distinctions

1. **2D Conv**: Spatial features (images)
   - Input: H × W × C
   - Kernel: Kh × Kw × C × F

2. **3D Conv**: Spatiotemporal or volumetric features (videos, 3D scans)
   - Input: D × H × W × C
   - Kernel: Kd × Kh × Kw × C × F

3. **RGB Images**: Still 2D convolution (channels ≠ depth)

### 🔮 Next

**Notebook 9:** Review & Tutorial T10 Preview

---

*Week 10 - Deep Neural Network Architectures (21CSE558T)*
*SRM University - M.Tech Program*