# Cats vs Dogs Image Classification with CNNs

**Author:** Anik Tahabilder  
**Project:** 6 of 22 - Kaggle ML Portfolio  
**Dataset:** Microsoft Cats vs Dogs  
**Difficulty:** 6/10 | **Resume Value:** 7/10 | **Learning Value:** 8/10 | **Impact:** 8/10

---

## What is Image Classification?

**Image Classification** is a computer vision task where we teach computers to recognize and categorize objects in images. It's one of the most fundamental problems in deep learning.

### Real-World Applications:

| Application | Example |
|-------------|----------|
| **Medical Diagnosis** | Detecting cancer from X-rays |
| **Autonomous Vehicles** | Identifying pedestrians, stop signs |
| **Face Recognition** | Unlocking phones, security systems |
| **Quality Control** | Detecting defects in manufacturing |
| **Wildlife Monitoring** | Identifying endangered species |

---

## Why Cats vs Dogs?

This is the "Hello World" of computer vision! It's:
- **Simple enough** to understand the fundamentals
- **Complex enough** to benefit from deep learning (simple ML on raw pixels performs poorly)
- **Visually intuitive** - you can see what the model learns
- **Practically relevant** - techniques transfer to real problems

### The Challenge:

Traditional machine learning struggles with images because:
- Images have **high dimensionality** (224x224x3 = 150,528 input values!)
- **Position varies** - a cat in the top-left vs bottom-right is still a cat
- **Scale varies** - cats can be close-up or far away
- **Lighting changes** - same cat in sunlight vs shadow looks different

**Solution:** Convolutional Neural Networks (CNNs)!

---

## What You'll Learn

In this notebook, we'll:

1. **Understand CNNs** - How they work, why they're powerful
2. **Load and explore** image data
3. **Preprocess images** - Resizing, normalization, augmentation
4. **Build CNNs from scratch** - Custom architectures
5. **Use Transfer Learning** - Leverage pre-trained models (VGG16, ResNet)
6. **Visualize** what the model learns
7. **Compare** different architectures

---

## Table of Contents

1. [Part 1: Understanding CNNs](#part1)
2. [Part 2: Setup and Data Loading](#part2)
3. [Part 3: Exploratory Data Analysis](#part3)
4. [Part 4: Image Preprocessing](#part4)
5. [Part 5: Data Augmentation](#part5)
6. [Part 6: Building a Simple CNN](#part6)
7. [Part 7: Training the Model](#part7)
8. [Part 8: Model Evaluation](#part8)
9. [Part 9: Transfer Learning with VGG16](#part9)
10. [Part 10: Model Comparison](#part10)
11. [Part 11: Visualizing What CNNs Learn](#part11)
12. [Part 12: Making Predictions](#part12)
13. [Part 13: Summary and Key Takeaways](#part13)

---

<a id='part1'></a>
# Part 1: Understanding Convolutional Neural Networks (CNNs)

## Why Not Regular Neural Networks?

Imagine a 224x224 RGB image:
- Total input values: 224 × 224 × 3 = **150,528 inputs**
- With just 1,000 neurons in the first layer: about **150 million weights**!
- This leads to:
  - **Overfitting** (too many parameters)
  - **Slow training** (massive computations)
  - **Ignoring spatial structure** (treats pixels as independent)
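
The arithmetic behind that parameter count is easy to check. The sketch below compares one dense layer with one convolutional layer; the 32-filter, 3×3 convolutional layer is an illustrative assumption chosen for comparison, not a layer taken from a specific model.

```python
# Parameter counts: dense layer vs. convolutional layer on a 224x224 RGB image.
height, width, channels = 224, 224, 3
inputs = height * width * channels            # 150,528 input values

# Dense layer: every input connects to every neuron, plus one bias per neuron.
neurons = 1000
dense_params = inputs * neurons + neurons

# Conv layer (assumed for comparison): 32 filters of size 3x3 over 3 channels,
# plus one bias per filter. The count does not depend on the image size.
filters, kernel = 32, 3
conv_params = kernel * kernel * channels * filters + filters

print(f"Inputs:       {inputs:,}")
print(f"Dense params: {dense_params:,}")
print(f"Conv params:  {conv_params:,}")
```

The dense layer needs about 150 million parameters, while the convolutional layer needs fewer than a thousand because its small filters are reused at every image position.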

---

## The CNN Solution

CNNs are inspired by how the **human visual cortex** works:
- Neurons respond to specific regions (receptive fields)
- Early layers detect simple patterns (edges, colors)
- Later layers detect complex patterns (eyes, ears, faces)

---

## Key Building Blocks

### 1. Convolutional Layer

**What it does:** Scans the image with small filters (kernels) to detect patterns.

```
Input Image (5x5)       Filter/Kernel (3x3)       Output Feature Map
┌─────────────┐         ┌─────┐                   ┌─────────┐
│ 1 0 1 0 1  │         │ 1 0 1│                   │  ?  ?  ?│
│ 0 1 0 1 0  │    *    │ 0 1 0│        =          │  ?  ?  ?│
│ 1 0 1 0 1  │         │ 1 0 1│                   │  ?  ?  ?│
│ 0 1 0 1 0  │         └─────┘                   └─────────┘
│ 1 0 1 0 1  │
└─────────────┘
```
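
To see what goes into the `?` cells, the diagram can be reproduced in NumPy. This is a minimal sketch assuming stride 1 and no padding; the helper name `conv2d_valid` is made up for illustration. Like deep learning frameworks, it slides the filter without flipping it (strictly speaking a cross-correlation), which makes no difference here because the filter is symmetric.

```python
import numpy as np

# The 5x5 input and 3x3 filter from the diagram above.
image = np.array([
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            output[i, j] = np.sum(patch * kernel)  # element-wise multiply, then sum
    return output

feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The result is a 3×3 feature map with 5 wherever the patch matches the filter's X shape and 0 wherever it does not.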

**Key Properties:**
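- **Parameter Sharing:** The same filter scans the entire image
- **Translation Equivariance:** A pattern is detected wherever it appears, and the response moves with it (pooling later adds approximate invariance)
- **Fewer Parameters:** A 3×3 filter has only 9 weights per input channel!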

### Common Filter Types:

| Filter Type | What it Detects |
|-------------|----------------|
| **Edge Detector** | Vertical/horizontal/diagonal edges |
| **Sharpen** | Enhances edges |
| **Blur** | Smooths image |
| **Custom** | Learned by the network! |

---

### 2. Activation Function (ReLU)

**ReLU (Rectified Linear Unit):** $f(x) = \max(0, x)$

- Introduces **non-linearity** (allows learning complex patterns)
- Simple and fast to compute
- Helps mitigate the vanishing gradient problem (its gradient is 1 for every positive input)

```
Before ReLU:  [-2, 5, -1, 3]
After ReLU:   [0, 5, 0, 3]   (negative values → 0)
```
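
In code, ReLU is a one-line element-wise operation. A minimal NumPy sketch of the example above:

```python
import numpy as np

def relu(x):
    """Element-wise ReLU: negative values become 0, positive values pass through."""
    return np.maximum(0, x)

before = np.array([-2, 5, -1, 3])
after = relu(before)
print(after)  # [0 5 0 3]
```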

---

### 3. Pooling Layer

**What it does:** Reduces spatial dimensions (downsampling).

**Max Pooling (2×2):**
```
Input (4x4)              Output (2x2)
┌──────────┐             ┌────┐
│ 1  3│2  4│             │ 3  4│
│ 2  1│1  3│     →       │ 8  9│
├──────────┤             └────┘
│ 0  2│5  1│
│ 1  8│9  2│
└──────────┘
```
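
The same 2×2 max pooling can be sketched in NumPy. The helper `max_pool_2x2` is a hypothetical name, and the reshape trick assumes the height and width are even.

```python
import numpy as np

feature_map = np.array([
    [1, 3, 2, 4],
    [2, 1, 1, 3],
    [0, 2, 5, 1],
    [1, 8, 9, 2],
])

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    h, w = x.shape
    # Split into (h/2, 2, w/2, 2) blocks, then take the max inside each block.
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

pooled = max_pool_2x2(feature_map)
print(pooled)
```

Each output value is the largest entry of one 2×2 block, so the 4×4 input shrinks to the 2×2 output shown in the diagram.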

**Benefits:**
- Reduces computation (smaller feature maps)
- Provides some spatial invariance (small shifts matter less)
- Helps limit overfitting (pooling has no weights itself, but smaller feature maps mean fewer parameters in later dense layers)

---

### 4. Fully Connected Layer

After several conv/pool layers:
- **Flatten** feature maps into a vector
- **Dense layers** combine features for final classification
- **Output layer** (2 neurons with softmax, or equivalently 1 neuron with sigmoid for a binary task like Cat vs Dog)
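
For two classes, a single sigmoid neuron and a two-neuron softmax layer describe the same probabilities, which is why the model built later in this notebook can use `Dense(1)` with a sigmoid. The sketch below checks this numerically; the logit value `1.5` is an arbitrary example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(logits):
    shifted = logits - np.max(logits)      # subtract max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

z = 1.5                                    # hypothetical "dog" logit
p_dog_sigmoid = sigmoid(z)
# Two-neuron version: fix the "cat" logit at 0 and use z as the "dog" logit.
p_cat, p_dog_softmax = softmax(np.array([0.0, z]))

print(f"sigmoid(z)         = {p_dog_sigmoid:.4f}")
print(f"softmax([0, z])[1] = {p_dog_softmax:.4f}")
```

Both lines print the same probability, since softmax over `[0, z]` reduces algebraically to `sigmoid(z)`.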

---

## Typical CNN Architecture

```
Input Image (224x224x3)
      ↓
[Conv → ReLU → Pool] × N    ← Feature Extraction
      ↓
Flatten
      ↓
[Dense → ReLU] × M           ← Classification
      ↓
Output (Dense + Softmax, or Sigmoid for binary)
```

**Pattern:**
- Image size **decreases** (224 → 112 → 56 → 28...)
- Number of filters **increases** (32 → 64 → 128 → 256...)
- Network learns **hierarchical features**:
  - Early layers: edges, colors, textures
  - Middle layers: parts (ears, eyes, nose)
  - Later layers: whole objects (cat face, dog body)

---

## CNN vs Traditional ML

| Aspect | Traditional ML | CNN |
|--------|---------------|-----|
| **Feature Engineering** | Manual (HOG, SIFT, etc.) | Automatic (learned) |
| **Spatial Structure** | Lost (flatten image) | Preserved (convolutions) |
| **Parameters** | Millions (fully connected on raw pixels) | Far fewer per layer (shared weights) |
| **Accuracy** | 60-70% (cats vs dogs) | 95-99%+ |

---

Now let's implement this!

---

<a id='part2'></a>
# Part 2: Setup and Data Loading

## Required Libraries

For image classification with CNNs, we need:

| Library | Purpose |
|---------|----------|
| **TensorFlow/Keras** | Building and training neural networks |
| **NumPy** | Numerical operations |
| **Matplotlib** | Visualization |
| **OpenCV/PIL** | Image processing |
| **scikit-learn** | Evaluation metrics |

In [None]:
# Core libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
from PIL import Image
import warnings
warnings.filterwarnings('ignore')

# Suppress TensorFlow warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'

# TensorFlow and Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, optimizers, callbacks
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array
from tensorflow.keras.applications import VGG16, ResNet50
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import (Dense, Conv2D, MaxPooling2D, Flatten, 
                                      Dropout, BatchNormalization, GlobalAveragePooling2D)

# Sklearn for metrics
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.model_selection import train_test_split

# Visualization settings
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 12

# Set random seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

print("="*60)
print("SETUP COMPLETE")
print("="*60)
print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")
print(f"NumPy version: {np.__version__}")

# Check for GPU availability
try:
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        print(f"\nGPU Available: {len(gpus)} GPU(s) detected")
        print("GPU will be used for training!")
        # Optional: Set memory growth to avoid OOM errors
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    else:
        print("\nNo GPU detected")
        print("Using CPU for training (slower, but works!)")
except Exception as e:
    print(f"\nGPU check failed ({e})")
    print("Using CPU for training (slower, but works!)")

## Dataset Information

**Microsoft Cats vs Dogs Dataset:**
- **Source:** Kaggle
- **Size:** ~25,000 images
- **Classes:** 2 (Cat, Dog)
- **Format:** JPEG images of various sizes
- **Challenge:** Real-world images (different sizes, angles, lighting)

In [None]:
# Dataset paths
# On Kaggle, data is in /kaggle/input/
# Locally, adjust the path accordingly

# Check if running on Kaggle
if os.path.exists('/kaggle/input'):
    BASE_DIR = '/kaggle/input/microsoft-catsvsdogs-dataset/PetImages'
    print("Running on Kaggle")
else:
    # For local execution, adjust this path
    BASE_DIR = 'PetImages'  # Change to your local path
    print("Running locally")

CAT_DIR = os.path.join(BASE_DIR, 'Cat')
DOG_DIR = os.path.join(BASE_DIR, 'Dog')

print(f"\nDataset location: {BASE_DIR}")
print(f"Cat images: {CAT_DIR}")
print(f"Dog images: {DOG_DIR}")

In [None]:
# Count images in each category
num_cats = len([f for f in os.listdir(CAT_DIR) if f.endswith('.jpg')])
num_dogs = len([f for f in os.listdir(DOG_DIR) if f.endswith('.jpg')])
total_images = num_cats + num_dogs

print("="*60)
print("DATASET STATISTICS")
print("="*60)
print(f"Cat images: {num_cats:,}")
print(f"Dog images: {num_dogs:,}")
print(f"Total images: {total_images:,}")
print(f"\nClass Balance: {num_cats/total_images*100:.1f}% cats, {num_dogs/total_images*100:.1f}% dogs")

if abs(num_cats - num_dogs) / total_images < 0.1:
    print("✓ Dataset is balanced!")
else:
    print("⚠ Dataset is imbalanced (may need class weights)")

---

<a id='part3'></a>
# Part 3: Exploratory Data Analysis (EDA)

Before building models, we must **understand our data**:
- Are there corrupted images?
- What are the image sizes?
- What do the images look like?
- Are there any patterns we should know about?

In [None]:
# Check for corrupted images
# Some images in this dataset are corrupted and will cause errors

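def check_image(filepath):
    """Check if an image can be opened and passes Pillow's integrity check."""
    try:
        with Image.open(filepath) as img:  # Context manager closes the file handle
            img.verify()  # Verify it's actually an image
        return True
    except Exception:  # Any failure means we treat the file as corrupted
        return False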

print("Checking for corrupted images...")
print("This may take a few minutes...")

corrupted_cats = []
corrupted_dogs = []

# Check cat images
cat_files = [f for f in os.listdir(CAT_DIR) if f.endswith('.jpg')]
for filename in cat_files[:1000]:  # Check first 1000 for speed
    filepath = os.path.join(CAT_DIR, filename)
    if not check_image(filepath):
        corrupted_cats.append(filename)

# Check dog images  
dog_files = [f for f in os.listdir(DOG_DIR) if f.endswith('.jpg')]
for filename in dog_files[:1000]:  # Check first 1000 for speed
    filepath = os.path.join(DOG_DIR, filename)
    if not check_image(filepath):
        corrupted_dogs.append(filename)

print(f"\nCorrupted cat images found: {len(corrupted_cats)}")
print(f"Corrupted dog images found: {len(corrupted_dogs)}")
print(f"Total corrupted: {len(corrupted_cats) + len(corrupted_dogs)}")

if len(corrupted_cats) + len(corrupted_dogs) > 0:
    print("\n⚠ We'll need to filter these out during data loading!")
else:
    print("\n✓ No corrupted images found in sample!")
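
The check above only samples the first 1,000 files per class and reports a count. One way to act on it is to build a list of files that pass the check and use only those later. The sketch below is an assumption about how that could look, not the notebook's own loading code; `list_valid_images` is a hypothetical helper. Note that `verify()` does not decode pixel data, so some damaged files may still slip through.

```python
import os
from PIL import Image

def list_valid_images(directory, extension=".jpg"):
    """Return (valid, corrupted) lists of file paths found in `directory`."""
    valid, corrupted = [], []
    for filename in sorted(os.listdir(directory)):
        if not filename.lower().endswith(extension):
            continue
        path = os.path.join(directory, filename)
        try:
            with Image.open(path) as img:
                img.verify()          # raises if Pillow cannot read the file as an image
            valid.append(path)
        except Exception:
            corrupted.append(path)
    return valid, corrupted
```

For example, `valid_cats, bad_cats = list_valid_images(CAT_DIR)` would scan the whole cat folder rather than a sample.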

In [None]:
# Analyze image dimensions
print("Analyzing image dimensions...")
print("Sampling 500 images from each class...\n")

def get_image_dimensions(directory, num_samples=500):
    """Get dimensions of sample images."""
    files = [f for f in os.listdir(directory) if f.endswith('.jpg')]
    files = files[:num_samples]
    
    widths = []
    heights = []
    
    for filename in files:
        filepath = os.path.join(directory, filename)
        try:
            with Image.open(filepath) as img:  # Close the file after reading its size
                w, h = img.size
            widths.append(w)
            heights.append(h)
        except Exception:  # Skip unreadable files
            continue
    
    return widths, heights

cat_widths, cat_heights = get_image_dimensions(CAT_DIR)
dog_widths, dog_heights = get_image_dimensions(DOG_DIR)

all_widths = cat_widths + dog_widths
all_heights = cat_heights + dog_heights

print("="*60)
print("IMAGE DIMENSION STATISTICS")
print("="*60)
print(f"\nWidth  - Min: {min(all_widths):4d}, Max: {max(all_widths):4d}, Mean: {np.mean(all_widths):.1f}")
print(f"Height - Min: {min(all_heights):4d}, Max: {max(all_heights):4d}, Mean: {np.mean(all_heights):.1f}")
print(f"\nMost common aspect ratios: landscape and portrait")
print("Note: Images have VARYING sizes - we'll need to resize them!")

In [None]:
# Visualize image dimension distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Width distribution
axes[0].hist(all_widths, bins=50, color='skyblue', edgecolor='black', alpha=0.7)
axes[0].axvline(np.mean(all_widths), color='red', linestyle='--', linewidth=2, label=f'Mean: {np.mean(all_widths):.0f}')
axes[0].set_xlabel('Width (pixels)')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Distribution of Image Widths')
axes[0].legend()

# Height distribution
axes[1].hist(all_heights, bins=50, color='lightcoral', edgecolor='black', alpha=0.7)
axes[1].axvline(np.mean(all_heights), color='red', linestyle='--', linewidth=2, label=f'Mean: {np.mean(all_heights):.0f}')
axes[1].set_xlabel('Height (pixels)')
axes[1].set_ylabel('Frequency')
axes[1].set_title('Distribution of Image Heights')
axes[1].legend()

plt.tight_layout()
plt.show()

print("Key Observation: Images have highly variable dimensions!")
print("We'll standardize to 150x150 or 224x224 for training.")

## 3.3 Visualizing Sample Images

Let's see what our cats and dogs actually look like!

In [None]:
# Visualize a grid of sample images
def load_sample_images(directory, num_samples=12):
    """Load sample images from a directory."""
    files = [f for f in os.listdir(directory) if f.endswith('.jpg')][:num_samples]
    images = []
    
    for filename in files:
        filepath = os.path.join(directory, filename)
        try:
            img = load_img(filepath, target_size=(150, 150))
            images.append(img_to_array(img) / 255.0)  # Normalize
        except Exception:  # Skip unreadable files
            continue
    
    return images

# Load sample cats and dogs
print("Loading sample images...")
sample_cats = load_sample_images(CAT_DIR, num_samples=8)
sample_dogs = load_sample_images(DOG_DIR, num_samples=8)

print(f"Loaded {len(sample_cats)} cat images")
print(f"Loaded {len(sample_dogs)} dog images")

# Visualize cats
fig, axes = plt.subplots(2, 4, figsize=(16, 8))
fig.suptitle('Sample Cat Images', fontweight='bold', fontsize=16)

for i, ax in enumerate(axes.flat):
    if i < len(sample_cats):
        ax.imshow(sample_cats[i])
        ax.axis('off')
        ax.set_title(f'Cat {i+1}')

plt.tight_layout()
plt.show()

# Visualize dogs
fig, axes = plt.subplots(2, 4, figsize=(16, 8))
fig.suptitle('Sample Dog Images', fontweight='bold', fontsize=16)

for i, ax in enumerate(axes.flat):
    if i < len(sample_dogs):
        ax.imshow(sample_dogs[i])
        ax.axis('off')
        ax.set_title(f'Dog {i+1}')

plt.tight_layout()
plt.show()

print("\nObservations:")
print("- Images vary in SIZE (different dimensions)")
print("- Images vary in POSE (sitting, standing, lying down)")
print("- Images vary in LIGHTING (bright, dark, outdoor, indoor)")
print("- Images vary in ANGLE (front, side, close-up, far away)")
print("\nThis VARIETY is what makes the problem challenging and interesting!")

---

<a id='part4'></a>
# Part 4: Image Preprocessing

Before feeding images to a CNN, we need to preprocess them. This is CRITICAL for model performance!

## Why Preprocess Images?

| Issue | Problem | Solution |
|-------|---------|----------|
| **Different Sizes** | CNNs expect fixed input size | Resize all images to same dimensions |
| **Pixel Range** | Raw pixels are 0-255 (large values) | Normalize to 0-1 or -1 to 1 |
| **Class Imbalance** | Model might favor majority class | Balance classes or use class weights |
| **Limited Data** | Risk of overfitting | Data augmentation (Part 5) |

## Resizing: Why 150x150?

Common CNN input sizes:
- **224x224**: Standard for many pre-trained models (VGG, ResNet)
- **150x150**: Good balance between detail and computation
- **299x299**: For Inception models
- **32x32**: For lightweight models (CIFAR-10)

Smaller = Faster training but less detail  
Larger = More detail but slower training

## Normalization: Why Divide by 255?

**Raw pixel values**: 0-255 (integers)  
**Normalized values**: 0.0-1.0 (floats)

Benefits:
1. **More stable training** (inputs on a small, consistent scale keep early gradient updates from being overly large)
2. **Faster convergence** (weights update efficiently)
3. **Less saturation** (sigmoid- and tanh-style activations stay in their responsive range)

Formula: `normalized_pixel = pixel / 255.0`

In [None]:
# Demonstration: Before and After Normalization
sample_img_path = os.path.join(CAT_DIR, [f for f in os.listdir(CAT_DIR) if f.endswith('.jpg')][0])

# Load image
img = load_img(sample_img_path, target_size=(150, 150))
img_array = img_to_array(img)

print("Image Preprocessing Demo")
print("="*60)
print(f"\nOriginal image shape: {img_array.shape}")
print(f"Pixel value range: [{img_array.min():.0f}, {img_array.max():.0f}]")
print(f"Data type: {img_array.dtype}")

# Normalize
img_normalized = img_array / 255.0

print(f"\nNormalized image shape: {img_normalized.shape}")
print(f"Pixel value range: [{img_normalized.min():.3f}, {img_normalized.max():.3f}]")
print(f"Data type: {img_normalized.dtype}")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

axes[0].imshow(img_array.astype('uint8'))
axes[0].set_title('Original Image\n(Pixels: 0-255)', fontweight='bold')
axes[0].axis('off')

axes[1].imshow(img_normalized)
axes[1].set_title('Normalized Image\n(Pixels: 0.0-1.0)', fontweight='bold')
axes[1].axis('off')

plt.tight_layout()
plt.show()

print("\nNote: Both images LOOK the same, but pixel values are scaled!")
print("This scaling helps neural networks learn better.")

## 4.1 Creating Train/Validation/Test Splits

For image classification, we typically split data into THREE sets:

| Set | Size | Purpose |
|-----|------|---------|
| **Training** | 70-80% | Train the model |
| **Validation** | 10-15% | Tune hyperparameters, monitor overfitting |
| **Test** | 10-15% | Final evaluation (never seen during training) |

**Why Validation Set?**  
Without it, we might overfit to the test set by trying different hyperparameters!
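
A three-way split can be sketched with two calls to scikit-learn's `train_test_split`. The file names below are hypothetical placeholders, and the 70/15/15 ratio is one choice within the ranges in the table above.

```python
from sklearn.model_selection import train_test_split

# Hypothetical file list: 500 cats (label 0) and 500 dogs (label 1).
paths = [f"Cat/{i}.jpg" for i in range(500)] + [f"Dog/{i}.jpg" for i in range(500)]
labels = [0] * 500 + [1] * 500

# Step 1: hold out 30% of the data, keeping the cat/dog ratio (stratify).
train_paths, rest_paths, train_labels, rest_labels = train_test_split(
    paths, labels, test_size=0.30, stratify=labels, random_state=42
)
# Step 2: split the held-out 30% in half: 15% validation, 15% test.
val_paths, test_paths, val_labels, test_labels = train_test_split(
    rest_paths, rest_labels, test_size=0.50, stratify=rest_labels, random_state=42
)

print(len(train_paths), len(val_paths), len(test_paths))
```

Stratifying keeps each subset balanced between cats and dogs, and fixing `random_state` makes the split reproducible.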

In [None]:
# Create data generators for training and validation
# We'll use a subset of data for faster training (you can use more!)

# Image parameters
IMG_HEIGHT = 150
IMG_WIDTH = 150
BATCH_SIZE = 32

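# For this notebook, we plan a smaller subset for demonstration
# In production, you'd use all the data
# Note: these counts describe the intended subset; a generator built with
# validation_split splits whatever directory it is pointed at, not these counts
TRAIN_SAMPLES = 2000  # Per class
VAL_SAMPLES = 500     # Per class  
TEST_SAMPLES = 500    # Per class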

print("Data Split Configuration")
print("="*60)
print(f"Image dimensions: {IMG_HEIGHT}x{IMG_WIDTH}")
print(f"Batch size: {BATCH_SIZE}")
print(f"\nSamples per class:")
print(f"  Training: {TRAIN_SAMPLES}")
print(f"  Validation: {VAL_SAMPLES}")
print(f"  Test: {TEST_SAMPLES}")
print(f"\nTotal samples:")
print(f"  Training: {TRAIN_SAMPLES * 2:,} (cats + dogs)")
print(f"  Validation: {VAL_SAMPLES * 2:,}")
print(f"  Test: {TEST_SAMPLES * 2:,}")

# Create basic data generator (no augmentation yet)
train_datagen = ImageDataGenerator(
    rescale=1./255,           # Normalize to 0-1
    validation_split=0.2      # 20% for validation
)

# Test generator (only normalization, no augmentation)
test_datagen = ImageDataGenerator(rescale=1./255)

print("\nData generators created!")
print("  - Training: Rescaling to [0, 1]")
print("  - Validation: Rescaling to [0, 1]")
print("  - Test: Rescaling to [0, 1]")

---

<a id='part5'></a>
# Part 5: Data Augmentation

## What is Data Augmentation?

**Data augmentation** artificially increases training data by applying random transformations to existing images.

### Why Do We Need It?

| Problem | Solution |
|---------|----------|
| **Limited data** | Generate more training examples |
| **Overfitting** | Model learns robust features, not memorization |
| **Real-world variance** | Images vary in rotation, zoom, lighting |

## Common Augmentation Techniques

| Technique | What It Does | Example Use |
|-----------|--------------|-------------|
| **Rotation** | Rotate image by angle | Cat can be tilted |
| **Horizontal Flip** | Mirror image left-right | Dog facing either direction |
| **Zoom** | Zoom in/out | Cat close-up or far away |
| **Width/Height Shift** | Shift image position | Object not always centered |
| **Shear** | Slant image | Different perspectives |
| **Brightness** | Adjust lighting | Indoor vs outdoor |
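
Two of these techniques, horizontal flip and shift, can be approximated in a few lines of NumPy. This is a simplified sketch, not how Keras implements augmentation: the function name `random_flip_and_shift` is made up, and `np.roll` wraps pixels around the edges instead of filling the gap.

```python
import numpy as np

def random_flip_and_shift(image, rng, max_shift=0.2):
    """Apply a random horizontal flip and a random shift to an (H, W, C) image."""
    augmented = image
    if rng.random() < 0.5:
        augmented = augmented[:, ::-1, :]            # horizontal flip (mirror left-right)

    h, w = image.shape[:2]
    dy = int(rng.integers(-int(max_shift * h), int(max_shift * h) + 1))
    dx = int(rng.integers(-int(max_shift * w), int(max_shift * w) + 1))
    # np.roll wraps pixels around the edges; a real augmentation pipeline
    # would fill the exposed border instead, so this is only an approximation.
    augmented = np.roll(augmented, shift=(dy, dx), axis=(0, 1))
    return augmented

rng = np.random.default_rng(42)
image = rng.random((150, 150, 3))                    # stand-in for a normalized photo
variants = [random_flip_and_shift(image, rng) for _ in range(4)]
print([v.shape for v in variants])
```

Every call draws new random parameters, so the same source image yields a different variant each time while keeping its shape.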

### Important: Only Augment Training Data!

- **Training set**: Apply augmentation
- **Validation/Test sets**: NO augmentation (we want real-world performance)

## How It Helps Learning

Without augmentation:
```
Model sees: Cat facing left, centered, bright lighting
Model learns: "Cat = exactly this pose and lighting"
Real world: Cat facing right, off-center, dim lighting → FAILS!
```

With augmentation:
```
Model sees: Cats in many poses, positions, lighting conditions
Model learns: "Cat = general cat features (ears, whiskers, fur)"
Real world: Any cat → SUCCEEDS!
```

In [None]:
# Create augmented data generator
augmented_datagen = ImageDataGenerator(
    rescale=1./255,                # Normalize
    rotation_range=40,             # Randomly rotate within ±40 degrees
    width_shift_range=0.2,         # Randomly shift horizontally by up to 20% of width
    height_shift_range=0.2,        # Randomly shift vertically by up to 20% of height
    shear_range=0.2,               # Shear transformation
    zoom_range=0.2,                # Randomly zoom in/out by up to 20%
    horizontal_flip=True,          # Randomly flip images
    fill_mode='nearest',           # How to fill pixels after transformation
    validation_split=0.2           # 20% for validation
)

print("Augmented Data Generator Created!")
print("="*60)
print("\nAugmentation Parameters:")
print(f"  Rotation: ±40 degrees")
print(f"  Horizontal shift: ±20%")
print(f"  Vertical shift: ±20%")
print(f"  Shear: 0.2")
print(f"  Zoom: ±20%")
print(f"  Horizontal flip: Yes")
print(f"  Fill mode: nearest")
print("\nThese transformations are applied RANDOMLY during training!")

In [None]:
# Visualize augmentation examples
# Show how ONE image looks after multiple random augmentations

# Load a sample image
sample_cat_img = load_img(sample_img_path, target_size=(150, 150))
sample_array = img_to_array(sample_cat_img)
sample_array = sample_array.reshape((1,) + sample_array.shape)  # Reshape for generator

print("Visualizing Data Augmentation")
print("="*60)
print("Showing how ONE image looks after random augmentations...")
print("Each augmented version is slightly different!\n")

fig, axes = plt.subplots(3, 4, figsize=(15, 12))
fig.suptitle('Same Image with Different Random Augmentations', fontweight='bold', fontsize=16)

# Original image in first subplot
axes[0, 0].imshow(sample_cat_img)
axes[0, 0].set_title('ORIGINAL', fontweight='bold', fontsize=12)
axes[0, 0].axis('off')

# Generate 11 augmented versions
i = 0
for batch in augmented_datagen.flow(sample_array, batch_size=1):
    ax_row = (i + 1) // 4
    ax_col = (i + 1) % 4
    
    axes[ax_row, ax_col].imshow(batch[0])
    axes[ax_row, ax_col].set_title(f'Augmented #{i+1}', fontsize=11)
    axes[ax_row, ax_col].axis('off')
    
    i += 1
    if i >= 11:
        break

plt.tight_layout()
plt.show()

print("\nKey Observations:")
print("- Each version is DIFFERENT (random rotation, zoom, shift, flip)")
print("- The cat is still recognizable in all versions")
print("- This teaches the model to be ROBUST to these variations")
print("- During training, EVERY image gets augmented differently each epoch!")

---

<a id='part6'></a>
# Part 6: Building a Simple CNN from Scratch

Now for the exciting part - building our own Convolutional Neural Network!

## Our CNN Architecture

We'll build a classic CNN with the following structure:

```
Input (150x150x3)
    ↓
[Conv2D (32 filters) → ReLU → MaxPool] → 74x74x32
    ↓
[Conv2D (64 filters) → ReLU → MaxPool] → 36x36x64
    ↓
[Conv2D (128 filters) → ReLU → MaxPool] → 17x17x128
    ↓
Flatten → 36,992 features
    ↓
Dense (512) → ReLU
    ↓
Dropout (0.5)
    ↓
Dense (1) → Sigmoid (Cat vs Dog)
```

## Layer-by-Layer Explanation

| Layer Type | Purpose | Parameters |
|------------|---------|------------|
| **Conv2D** | Extract features (edges, textures, patterns) | Filters, kernel size |
| **ReLU** | Non-linearity (allows learning complex patterns) | None |
| **MaxPooling2D** | Downsample (reduce size, prevent overfitting) | Pool size |
| **Flatten** | Convert 2D features to 1D vector | None |
| **Dense** | Fully connected layer (combines features) | Units |
| **Dropout** | Randomly drop neurons (prevents overfitting) | Dropout rate |
| **Sigmoid** | Output probability (0 = Cat, 1 = Dog) | None |

## Why This Architecture?

- **3 Conv blocks**: Each learns increasingly complex features
- **Filter progression (32→64→128)**: More complex features need more filters
- **MaxPooling**: Reduces spatial dimensions, prevents overfitting
- **Dropout**: Prevents overfitting by randomly disabling neurons
- **Sigmoid output**: Binary classification (Cat or Dog)
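
The feature-map sizes can be worked out by hand. With Keras defaults, `Conv2D` uses stride 1 and `padding='valid'`, so a 3×3 filter trims 2 pixels from each dimension, and 2×2 max pooling halves the size, rounding down. A small sketch of that arithmetic, with hypothetical helper names `conv_out` and `pool_out`:

```python
def conv_out(size, kernel=3):
    """Output size of a stride-1 convolution with no padding ('valid')."""
    return size - kernel + 1

def pool_out(size, pool=2):
    """Output size of non-overlapping max pooling (partial windows are dropped)."""
    return size // pool

size = 150
filters_per_block = [32, 64, 128]
for filters in filters_per_block:
    size = pool_out(conv_out(size))
    print(f"After block with {filters:>3} filters: {size}x{size}x{filters}")

flattened = size * size * filters_per_block[-1]
print(f"Flatten: {flattened:,} features")
```

The spatial size goes 150 → 74 → 36 → 17, so the flatten layer produces 17 × 17 × 128 = 36,992 features. Setting `padding='same'` on the convolutions would give 75 → 37 → 18 instead.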

In [None]:
# Build the Simple CNN model
def build_simple_cnn(input_shape=(150, 150, 3)):
    """
    Build a simple CNN for binary image classification.
    
    Architecture:
    - 3 Convolutional blocks (Conv2D + MaxPool)
    - Flatten layer
    - Dense layer with dropout
    - Output layer with sigmoid
    
    Parameters:
    -----------
    input_shape : tuple
        Shape of input images (height, width, channels)
    
    Returns:
    --------
    model : keras.Model
        Uncompiled CNN model (call model.compile() before training)
    """
    model = Sequential([
        # Block 1: 32 filters
        Conv2D(32, (3, 3), activation='relu', input_shape=input_shape, name='conv1'),
        MaxPooling2D((2, 2), name='pool1'),
        
        # Block 2: 64 filters
        Conv2D(64, (3, 3), activation='relu', name='conv2'),
        MaxPooling2D((2, 2), name='pool2'),
        
        # Block 3: 128 filters
        Conv2D(128, (3, 3), activation='relu', name='conv3'),
        MaxPooling2D((2, 2), name='pool3'),
        
        # Flatten and Dense layers
        Flatten(name='flatten'),
        Dense(512, activation='relu', name='dense1'),
        Dropout(0.5, name='dropout'),
        
        # Output layer (binary classification)
        Dense(1, activation='sigmoid', name='output')
    ])
    
    return model

# Create the model
print("Building Simple CNN...")
print("="*60)

simple_cnn = build_simple_cnn()

print("Model created successfully!")
print("\nModel architecture:")

In [None]:
# Display model summary
simple_cnn.summary()

print("\n" + "="*60)
print("UNDERSTANDING THE MODEL SUMMARY")
print("="*60)
print("\nKey Information:")
print("1. LAYER TYPES: Conv2D, MaxPooling2D, Flatten, Dense, Dropout")
print("2. OUTPUT SHAPE: How data transforms at each layer")
print("3. PARAMETERS: Learnable weights (these get trained!)")
print("\nNote the progression:")
print("  - Spatial size DECREASES: 150→74→36→17 (conv + pooling)")
print("  - Number of filters INCREASES: 32→64→128 (more complex features)")
print("  - Final output: Single neuron (0=Cat, 1=Dog)")

---

<a id='part7'></a>
# Part 7: Training the Model

## Compiling the Model

Before training, we need to configure the learning process:

| Component | Choice | Why |
|-----------|--------|-----|
| **Optimizer** | Adam | Adaptive learning rate, works well for CNNs |
| **Loss Function** | Binary Crossentropy | Binary classification (Cat or Dog) |
| **Metrics** | Accuracy | Easy to understand, balanced dataset |

### Loss Functions for Classification

| Problem Type | Loss Function |
|--------------|---------------|
| **Binary** (2 classes) | Binary Crossentropy |
| **Multi-class** (3+ classes) | Categorical Crossentropy |
| **Multi-label** (multiple tags) | Binary Crossentropy |
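
Binary cross-entropy penalizes confident wrong predictions much more than mildly wrong ones. The NumPy sketch below computes the mean loss for two made-up sets of predictions; the clipping constant `eps` is an assumption added to avoid `log(0)`.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return losses.mean()

labels = [0, 1, 1, 0]                       # 0 = Cat, 1 = Dog
confident_right = [0.1, 0.9, 0.8, 0.2]      # hypothetical model outputs
confident_wrong = [0.9, 0.1, 0.2, 0.8]

loss_right = binary_crossentropy(labels, confident_right)
loss_wrong = binary_crossentropy(labels, confident_wrong)
print(f"Mostly correct predictions: {loss_right:.4f}")
print(f"Mostly wrong predictions:   {loss_wrong:.4f}")
```

The second loss is more than ten times the first, which is the signal the optimizer uses to push predictions toward the correct labels.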

## Callbacks: Monitoring Training

Callbacks are functions that run during training:

| Callback | Purpose |
|----------|---------|
| **EarlyStopping** | Stop if validation loss doesn't improve (prevents overfitting) |
| **ModelCheckpoint** | Save best model weights (don't lose progress!) |
| **ReduceLROnPlateau** | Reduce learning rate when stuck (helps convergence) |
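
The idea behind early stopping with a patience setting can be sketched in plain Python. This is a simplified illustration of the logic, not Keras's implementation, and the validation losses are invented.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once `patience` consecutive epochs fail to improve on the
    best validation loss seen so far. If that never happens, stop_epoch is the
    last epoch.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: improvement stalls after epoch 3.
val_losses = [0.69, 0.60, 0.55, 0.52, 0.53, 0.54, 0.53, 0.55, 0.56, 0.57]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=5)
print(f"Stopped at epoch {stop_epoch}, best weights from epoch {best_epoch}")
```

Restoring the weights from `best_epoch` rather than the last epoch is what the `restore_best_weights=True` option is for.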

## What Happens During Training?

Each **epoch** is one pass through all training data. For every batch within an epoch:
1. **Forward pass**: Make predictions
2. **Compute loss**: How wrong are we?
3. **Backward pass**: Compute gradients (backpropagation)
4. **Update weights**: Adjust using optimizer (Adam)

At the end of each epoch:
5. **Validate**: Check performance on validation set
6. **Repeat** for the next epoch
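
The same loop can be written out by hand for a tiny model. The sketch below is a deliberately simplified stand-in: it trains logistic regression on synthetic 2D points instead of a CNN on images, and it uses plain mini-batch gradient descent instead of Adam. All data and hyperparameters are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for image features: two Gaussian blobs, label 0 and label 1.
def make_data(n):
    x0 = rng.normal(loc=-1.0, scale=1.0, size=(n, 2))
    x1 = rng.normal(loc=+1.0, scale=1.0, size=(n, 2))
    X = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(n), np.ones(n)])
    return X, y

X_train, y_train = make_data(400)
X_val, y_val = make_data(100)

def forward(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))          # sigmoid output

def bce(y, p, eps=1e-7):
    p = np.clip(p, eps, 1 - eps)
    return float(-(y * np.log(p) + (1 - y) * np.log(1 - p)).mean())

w, b = np.zeros(2), 0.0
learning_rate, batch_size, epochs = 0.1, 32, 10
val_loss_history = []

for epoch in range(epochs):
    order = rng.permutation(len(X_train))               # reshuffle every epoch
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        xb, yb = X_train[idx], y_train[idx]

        p = forward(xb, w, b)                           # 1. forward pass
        loss = bce(yb, p)                               # 2. compute loss
        grad_z = (p - yb) / len(yb)                     # 3. backward pass
        grad_w, grad_b = xb.T @ grad_z, grad_z.sum()
        w -= learning_rate * grad_w                     # 4. update weights (plain SGD, not Adam)
        b -= learning_rate * grad_b

    val_pred = forward(X_val, w, b)                     # 5. validate once per epoch
    val_loss = bce(y_val, val_pred)
    val_acc = float(((val_pred > 0.5) == y_val).mean())
    val_loss_history.append(val_loss)
    print(f"Epoch {epoch + 1:2d}: val_loss={val_loss:.4f}, val_acc={val_acc:.3f}")
```

Keras's `model.fit` runs this same cycle, with the forward and backward passes going through every layer of the network.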

In [None]:
# Compile the model
simple_cnn.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

print("Model compiled successfully!")
print("="*60)
print("Configuration:")
print(f"  Optimizer: Adam")
print(f"  Loss function: Binary Crossentropy")
print(f"  Metrics: Accuracy")

# Setup callbacks
early_stop = callbacks.EarlyStopping(
    monitor='val_loss',
    patience=5,
    restore_best_weights=True,
    verbose=1
)

checkpoint = callbacks.ModelCheckpoint(
    'best_simple_cnn.h5',
    monitor='val_accuracy',
    save_best_only=True,
    verbose=1
)

reduce_lr = callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=3,
    min_lr=1e-7,
    verbose=1
)

print("\nCallbacks configured:")
print("  - EarlyStopping: Stop if val_loss doesn't improve for 5 epochs")
print("  - ModelCheckpoint: Save best model based on val_accuracy")
print("  - ReduceLROnPlateau: Reduce LR by 50% if val_loss plateaus")

In [None]:
# Create data generators from directory
# Note: In a real scenario, you'd organize images into train/val/test folders
# For this demo, flow_from_directory reads the raw Cat/ and Dog/ folders and
# validation_split=0.2 holds out 20% of each class for validation

# Corrupted files found in Part 3 are not filtered here; remove them first,
# or training may fail when one of them is loaded

print("Setting up data generators...")
print("="*60)

# Training generator with augmentation
train_generator = augmented_datagen.flow_from_directory(
    BASE_DIR,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='training',
    shuffle=True,
    seed=42
)

# Validation generator (no augmentation)
val_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)

val_generator = val_datagen.flow_from_directory(
    BASE_DIR,
    target_size=(IMG_HEIGHT, IMG_WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='binary',
    subset='validation',
    shuffle=False,
    seed=42
)

print(f"\nTraining batches: {len(train_generator)}")
print(f"Validation batches: {len(val_generator)}")
print(f"\nClass indices: {train_generator.class_indices}")
print("(Classes are indexed alphabetically by folder name, so Cat = 0 and Dog = 1)")

In [None]:
# Train the model
# Note: For demonstration, we'll train for fewer epochs
# In production, you might train for 25-50 epochs

EPOCHS = 20  # Adjust based on your needs

print("Starting training...")
print("="*60)
print(f"Epochs: {EPOCHS}")
print(f"Training samples: {train_generator.samples}")
print(f"Validation samples: {val_generator.samples}")
print(f"Batch size: {BATCH_SIZE}")
print("\nThis may take a while depending on your hardware...")
print("(GPU: ~2-5 min, CPU: ~20-30 min)")
print("="*60)

# Train!
history = simple_cnn.fit(
    train_generator,
    epochs=EPOCHS,
    validation_data=val_generator,
    callbacks=[early_stop, checkpoint, reduce_lr],
    verbose=1
)

print("\n" + "="*60)
print("TRAINING COMPLETE!")
print("="*60)

In [None]:
# Plot training history
def plot_training_history(history):
    """Plot training and validation accuracy/loss curves."""
    fig, axes = plt.subplots(1, 2, figsize=(14, 5))
    
    # Accuracy curves
    axes[0].plot(history.history['accuracy'], label='Training Accuracy', linewidth=2)
    axes[0].plot(history.history['val_accuracy'], label='Validation Accuracy', linewidth=2)
    axes[0].set_xlabel('Epoch')
    axes[0].set_ylabel('Accuracy')
    axes[0].set_title('Model Accuracy Over Time', fontweight='bold')
    axes[0].legend()
    axes[0].grid(True, alpha=0.3)
    
    # Loss curves
    axes[1].plot(history.history['loss'], label='Training Loss', linewidth=2)
    axes[1].plot(history.history['val_loss'], label='Validation Loss', linewidth=2)
    axes[1].set_xlabel('Epoch')
    axes[1].set_ylabel('Loss')
    axes[1].set_title('Model Loss Over Time', fontweight='bold')
    axes[1].legend()
    axes[1].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

plot_training_history(history)

print("\nInterpretation:")
print("="*60)
print("\nLEFT (Accuracy):")
print("  - Training accuracy (blue): How well model fits training data")
print("  - Validation accuracy (orange): How well model generalizes")
print("  - Goal: Both should increase and converge")
print("\nRIGHT (Loss):")
print("  - Training loss (blue): Error on training data")
print("  - Validation loss (orange): Error on validation data")
print("  - Goal: Both should decrease and converge")
print("\nSigns of overfitting:")
print("  - Training accuracy much higher than validation accuracy")
print("  - Validation loss starts increasing while training loss decreases")
print("\nSigns of good training:")
print("  - Both curves improve together")
print("  - Small gap between training and validation metrics")

---

<a id='part8'></a>
# Part 8: Model Evaluation

Now let's evaluate our trained model! We don't have a separate test split here, so we'll evaluate on the validation set.

## Evaluation Metrics for Binary Classification

| Metric | Formula | Meaning |
|--------|---------|---------|
| **Accuracy** | (TP + TN) / Total | Overall correctness |
| **Precision** | TP / (TP + FP) | Of predicted cats, how many are actually cats? |
| **Recall** | TP / (TP + FN) | Of actual cats, how many did we find? |
| **F1-Score** | 2 × (P × R) / (P + R) | Harmonic mean of precision and recall |

## Confusion Matrix

Shows the breakdown of predictions. In this layout, Cat is treated as the positive class:

```
                 Predicted
             Cat        Dog
Actual Cat  [TP]       [FN]
       Dog  [FP]       [TN]
```

- **TP (True Positive)**: Correctly predicted cat
- **TN (True Negative)**: Correctly predicted dog
- **FP (False Positive)**: Predicted cat, but actually dog
- **FN (False Negative)**: Predicted dog, but actually cat
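
To make the formulas above concrete, here is a minimal sketch that computes all four metrics from confusion-matrix counts. The function name `binary_metrics` and the counts are made up for illustration; they are not results from this notebook.

```python
def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, with Cat as the positive class
metrics = binary_metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name:>9}: {value:.3f}")
```

With these counts, accuracy is 170/200 = 0.850, precision is 90/110 ≈ 0.818, recall is 90/100 = 0.900, and F1 is 180/210 ≈ 0.857.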

In [None]:
# Evaluate on validation set
val_loss, val_accuracy = simple_cnn.evaluate(val_generator, verbose=0)

print("Validation Performance")
print("="*60)
print(f"Validation Loss: {val_loss:.4f}")
print(f"Validation Accuracy: {val_accuracy:.4f} ({val_accuracy*100:.2f}%)")

# Get predictions for confusion matrix
val_generator.reset()
predictions = simple_cnn.predict(val_generator, verbose=1)
predicted_classes = (predictions > 0.5).astype(int).flatten()
true_classes = val_generator.classes

# Calculate metrics
from sklearn.metrics import classification_report, confusion_matrix

print("\n" + "="*60)
print("CLASSIFICATION REPORT")
print("="*60)
print(classification_report(true_classes, predicted_classes, 
                           target_names=['Cat', 'Dog']))

# Confusion matrix
cm = confusion_matrix(true_classes, predicted_classes)
print("\n" + "="*60)
print("CONFUSION MATRIX")
print("="*60)
print(cm)
print("\nInterpretation:")
print(f"  True Cats correctly identified: {cm[0,0]}")
print(f"  True Cats misclassified as Dogs: {cm[0,1]}")
print(f"  True Dogs misclassified as Cats: {cm[1,0]}")
print(f"  True Dogs correctly identified: {cm[1,1]}")

In [None]:
# Visualize confusion matrix
fig, ax = plt.subplots(figsize=(8, 6))

sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax,
            xticklabels=['Cat', 'Dog'], yticklabels=['Cat', 'Dog'],
            annot_kws={'size': 16, 'weight': 'bold'})

ax.set_xlabel('Predicted Label', fontsize=12)
ax.set_ylabel('True Label', fontsize=12)
ax.set_title('Confusion Matrix - Simple CNN', fontweight='bold', fontsize=14)

plt.tight_layout()
plt.show()

print("\nHow to Read:")
print("- Diagonal (top-left to bottom-right): Correct predictions")
print("- Off-diagonal: Misclassifications")
print("- Darker blue = higher count")

## 8.1 Visualizing Misclassified Examples

Let's see where our model makes mistakes - this helps us understand its limitations!
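
One useful way to look at mistakes is to rank them by how confident the model was when it got them wrong. The sketch below shows the idea in plain Python; `confident_mistakes` is a hypothetical helper, and the probabilities and labels are invented for illustration.

```python
def confident_mistakes(probabilities, true_labels, threshold=0.5, top_k=3):
    """Return (index, confidence) pairs for wrong predictions, most confident first."""
    mistakes = []
    for idx, (prob, label) in enumerate(zip(probabilities, true_labels)):
        predicted = 1 if prob > threshold else 0
        if predicted != label:
            # Confidence in the predicted class, not in class 1
            confidence = prob if predicted == 1 else 1 - prob
            mistakes.append((idx, confidence))
    mistakes.sort(key=lambda pair: pair[1], reverse=True)
    return mistakes[:top_k]


# Invented sigmoid outputs and true labels (0 = first class, 1 = second class)
probs = [0.95, 0.10, 0.60, 0.30, 0.85]
labels = [0, 0, 1, 1, 1]
worst = confident_mistakes(probs, labels)
print(worst)
```

Here images 0 and 3 are misclassified, and image 0 comes first because the model was 95% sure of the wrong answer. Highly confident mistakes are often the most informative ones to inspect by eye.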

In [None]:
# Find misclassified images
misclassified_indices = np.where(predicted_classes != true_classes)[0]
class_names = list(val_generator.class_indices.keys())

print(f"Total misclassified images: {len(misclassified_indices)}")
print(f"Misclassification rate: {len(misclassified_indices)/len(true_classes)*100:.2f}%")

# Visualize some misclassified examples
if len(misclassified_indices) > 0:
    num_examples = min(8, len(misclassified_indices))
    fig, axes = plt.subplots(2, 4, figsize=(16, 8))
    fig.suptitle('Misclassified Examples', fontweight='bold', fontsize=16)
    
    # Get image files
    val_generator.reset()
    filenames = val_generator.filenames
    
    for i, ax in enumerate(axes.flat):
        if i < num_examples:
            idx = misclassified_indices[i]
            img_path = os.path.join(BASE_DIR, filenames[idx])
            
            try:
                img = load_img(img_path, target_size=(IMG_HEIGHT, IMG_WIDTH))
                ax.imshow(img)
                
                true_label = class_names[true_classes[idx]]
                pred_label = class_names[predicted_classes[idx]]
                confidence = predictions[idx][0] if predicted_classes[idx] == 1 else 1 - predictions[idx][0]
                
                ax.set_title(f'True: {true_label}\nPredicted: {pred_label}\n(Conf: {confidence:.2%})',
                           fontsize=10)
                ax.axis('off')
            except Exception:  # Skip images that fail to load
                ax.axis('off')
        else:
            ax.axis('off')
    
    plt.tight_layout()
    plt.show()
    
    print("\nWhy misclassifications happen:")
    print("- Unusual poses or angles")
    print("- Poor lighting or image quality")
    print("- Partial views of the animal")
    print("- Multiple animals in one image")
    print("- Breed characteristics that confuse the model")
else:
    print("\nNo misclassifications! Perfect model (unlikely in real scenarios).")

---

<a id='part9'></a>
# Part 9: Transfer Learning with VGG16

## What is Transfer Learning?

**Transfer Learning** = Using a pre-trained model as a starting point for your task.

Instead of training from scratch, we use a model that was already trained on millions of images!

### The Analogy

Imagine learning to drive a car:

| From Scratch | Transfer Learning |
|--------------|-------------------|
| Learn basic physics | Already know how vehicles work |
| Learn traffic rules | Already know traffic rules |
| Practice driving | Just learn this specific car! |

## How Transfer Learning Works

1. **Take a pre-trained model** (trained on ImageNet - 1.4M images, 1000 classes)
2. **Remove the top layers** (classifier specific to ImageNet)
3. **Freeze the base** (keep learned features like edges, textures, shapes)
4. **Add custom classifier** (for our task: Cat vs Dog)
5. **Train only new layers** (much faster!)
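
To see how much work step 5 saves, we can count parameters by hand. The sketch below assumes the standard VGG16 layout (thirteen 3x3 convolutions with 'same' padding and five 2x2 max-pooling layers), a 150x150x3 input, and a Flatten -> Dense(256) -> Dense(1) classifier on top. The helper names are hypothetical.

```python
# VGG16 convolutional base: numbers are output channels, "M" is 2x2 max pooling
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_base_stats(height, width, channels=3):
    """Return (parameter count, output shape) of the VGG16 convolutional base."""
    params = 0
    for item in VGG16_CONFIG:
        if item == "M":
            height, width = height // 2, width // 2   # pooling halves each side
        else:
            params += (3 * 3 * channels + 1) * item   # 3x3 kernel weights + bias
            channels = item
    return params, (height, width, channels)


def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return (n_inputs + 1) * n_units


base_params, (h, w, c) = vgg16_base_stats(150, 150)
head_params = dense_params(h * w * c, 256) + dense_params(256, 1)

print(f"Frozen base parameters:    {base_params:,}")
print(f"Base output shape:         {(h, w, c)}")
print(f"Trainable head parameters: {head_params:,}")
print(f"Trainable share of total:  {head_params / (base_params + head_params):.1%}")
```

Under these assumptions the frozen base holds 14,714,688 parameters and the new classifier 2,097,665, so only a small share of the network is updated during training.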

## Why Transfer Learning?

| Benefit | Explanation |
|---------|-------------|
| **Less data needed** | Pre-trained model already knows general features |
| **Faster training** | Only train classifier, not entire network |
| **Better performance** | Leverages knowledge from millions of images |
| **Less compute** | No need for powerful GPUs for weeks |

## Popular Pre-trained Models

| Model | Parameters | ImageNet Top-1 Accuracy | Use Case |
|-------|------------|------------------------|----------|
| **VGG16** | 138M | 71.3% | Good general features, easy to use |
| **ResNet50** | 25M | 76.1% | Good balance of size and accuracy |
| **InceptionV3** | 24M | 77.9% | Multi-scale features |
| **MobileNet** | 4M | 70.4% | Lightweight, for mobile devices |

We'll use **VGG16** because it's simple and effective!

In [None]:
# Build Transfer Learning model with VGG16
def build_vgg16_model(input_shape=(150, 150, 3)):
    """
    Build a transfer learning model using VGG16 as base.
    
    Parameters:
    -----------
    input_shape : tuple
        Shape of input images
    
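    Returns:
    --------
    model : keras.Model
        Transfer learning model (frozen VGG16 base + custom classifier)
    base_model : keras.Model
        The VGG16 convolutional base, returned for inspection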
    """
    # Load VGG16 pre-trained on ImageNet (without top classification layers)
    base_model = VGG16(
        weights='imagenet',      # Use ImageNet pre-trained weights
        include_top=False,       # Exclude final dense layers
        input_shape=input_shape
    )
    
    # Freeze the base model (don't train these layers)
    base_model.trainable = False
    
    # Build custom classifier on top
    model = Sequential([
        base_model,
        Flatten(),
        Dense(256, activation='relu'),
        Dropout(0.5),
        Dense(1, activation='sigmoid')  # Binary classification
    ])
    
    return model, base_model

print("Building VGG16 Transfer Learning Model...")
print("="*60)

vgg_model, vgg_base = build_vgg16_model()

print("Model created successfully!")
print(f"\nBase model (VGG16) layers: {len(vgg_base.layers)}")
print(f"Base model trainable: {vgg_base.trainable}")
print("\nAll VGG16 layers are FROZEN (won't be trained)")
print("Only our custom classifier layers will be trained!")

In [None]:
# Model summary
vgg_model.summary()

print("\n" + "="*60)
print("KEY OBSERVATIONS")
print("="*60)
print("\n1. VGG16 Base:")
print("   - 14.7M parameters (all pre-trained on ImageNet)")
print("   - These parameters are FROZEN (not trainable)")
print("\n2. Our Custom Classifier:")
print("   - About 2.1M parameters to train for 150x150 inputs (Flatten -> Dense(256) -> Dense(1))")
print("   - Much faster than training from scratch!")
print("\n3. Total Parameters:")
print("   - Millions of parameters, but only training a fraction")
print("   - This is the power of transfer learning!")

In [None]:
# Compile and train VGG16 model
vgg_model.compile(
    optimizer='adam',
    loss='binary_crossentropy',
    metrics=['accuracy']
)

print("VGG16 model compiled!")
print("\nStarting training (Transfer Learning)...")
print("="*60)
print("Note: This should be FASTER than training from scratch")
print("      We're only training the classifier, not the entire network!")
print("="*60)

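# Train (fewer epochs needed due to transfer learning)
# Note: `checkpoint` is the same callback object used for the Simple CNN, so it writes
# to the same file; create a separate ModelCheckpoint if you want to keep both models.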
vgg_history = vgg_model.fit(
    train_generator,
    epochs=10,  # Fewer epochs needed!
    validation_data=val_generator,
    callbacks=[early_stop, checkpoint, reduce_lr],
    verbose=1
)

print("\n" + "="*60)
print("VGG16 TRANSFER LEARNING COMPLETE!")
print("="*60)

In [None]:
# Plot VGG16 training history
plot_training_history(vgg_history)

print("\nCompare with Simple CNN:")
print("- Transfer learning often converges faster")
print("- May achieve higher accuracy with fewer epochs")
print("- Less prone to overfitting (pre-trained features are robust)")

---

<a id='part10'></a>
# Part 10: Model Comparison

Let's compare our Simple CNN vs VGG16 Transfer Learning!
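
Because the two models can stop after different numbers of epochs, it can be fairer to compare each model at its best validation epoch rather than its last one. The sketch below shows one way to do that on a Keras-style history dictionary. `summarize_history` is a hypothetical helper and the numbers are invented, not results from this notebook.

```python
def summarize_history(history_dict):
    """Summarize a Keras-style history dict by its best validation epoch."""
    val_acc = history_dict["val_accuracy"]
    best_idx = max(range(len(val_acc)), key=lambda i: val_acc[i])
    return {
        "epochs_run": len(val_acc),
        "best_epoch": best_idx + 1,  # 1-based, to match Keras training logs
        "best_val_accuracy": val_acc[best_idx],
        "train_accuracy_at_best": history_dict["accuracy"][best_idx],
        "final_val_accuracy": val_acc[-1],
    }


# Invented numbers for illustration only
example_history = {
    "accuracy":     [0.60, 0.70, 0.78, 0.85, 0.90],
    "val_accuracy": [0.62, 0.71, 0.76, 0.75, 0.74],
}
summary = summarize_history(example_history)
for key, value in summary.items():
    print(f"{key:>24}: {value}")
```

In this made-up run, validation accuracy peaks at epoch 3 and then drifts down while training accuracy keeps rising, which is the overfitting pattern described earlier. With real runs you would pass `history.history` and `vgg_history.history` instead.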

In [None]:
# Compare models side by side
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Accuracy comparison
axes[0, 0].plot(history.history['accuracy'], label='Simple CNN (Train)', linewidth=2)
axes[0, 0].plot(vgg_history.history['accuracy'], label='VGG16 (Train)', linewidth=2, linestyle='--')
axes[0, 0].set_xlabel('Epoch')
axes[0, 0].set_ylabel('Accuracy')
axes[0, 0].set_title('Training Accuracy Comparison', fontweight='bold')
axes[0, 0].legend()
axes[0, 0].grid(True, alpha=0.3)

axes[0, 1].plot(history.history['val_accuracy'], label='Simple CNN (Val)', linewidth=2)
axes[0, 1].plot(vgg_history.history['val_accuracy'], label='VGG16 (Val)', linewidth=2, linestyle='--')
axes[0, 1].set_xlabel('Epoch')
axes[0, 1].set_ylabel('Accuracy')
axes[0, 1].set_title('Validation Accuracy Comparison', fontweight='bold')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Loss comparison
axes[1, 0].plot(history.history['loss'], label='Simple CNN (Train)', linewidth=2)
axes[1, 0].plot(vgg_history.history['loss'], label='VGG16 (Train)', linewidth=2, linestyle='--')
axes[1, 0].set_xlabel('Epoch')
axes[1, 0].set_ylabel('Loss')
axes[1, 0].set_title('Training Loss Comparison', fontweight='bold')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

axes[1, 1].plot(history.history['val_loss'], label='Simple CNN (Val)', linewidth=2)
axes[1, 1].plot(vgg_history.history['val_loss'], label='VGG16 (Val)', linewidth=2, linestyle='--')
axes[1, 1].set_xlabel('Epoch')
axes[1, 1].set_ylabel('Loss')
axes[1, 1].set_title('Validation Loss Comparison', fontweight='bold')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print("\nKey Insights:")
print("="*60)
print("Note: Epoch counts differ, so compare final performance:")
print(f"\nSimple CNN:")
print(f"  Final train accuracy: {history.history['accuracy'][-1]:.4f}")
print(f"  Final val accuracy: {history.history['val_accuracy'][-1]:.4f}")
print(f"\nVGG16 Transfer Learning:")
print(f"  Final train accuracy: {vgg_history.history['accuracy'][-1]:.4f}")
print(f"  Final val accuracy: {vgg_history.history['val_accuracy'][-1]:.4f}")

In [None]:
# Create comparison table
comparison_data = {
    'Model': ['Simple CNN', 'VGG16 Transfer Learning'],
    'Parameters': [simple_cnn.count_params(), vgg_model.count_params()],
    'Trainable Params': [
        sum([tf.size(w).numpy() for w in simple_cnn.trainable_weights]),
        sum([tf.size(w).numpy() for w in vgg_model.trainable_weights])
    ],
    'Train Accuracy': [
        f"{history.history['accuracy'][-1]:.4f}",
        f"{vgg_history.history['accuracy'][-1]:.4f}"
    ],
    'Val Accuracy': [
        f"{history.history['val_accuracy'][-1]:.4f}",
        f"{vgg_history.history['val_accuracy'][-1]:.4f}"
    ],
    'Training Time (expected, not measured)': ['Baseline', 'Similar or faster']
}

comparison_df = pd.DataFrame(comparison_data)

print("\nMODEL COMPARISON TABLE")
print("="*80)
print(comparison_df.to_string(index=False))
print("="*80)

print("\nConclusions:")
print("1. Transfer Learning (VGG16):")
print("   - More total parameters, but most are pre-trained (frozen)")
print("   - Often achieves higher accuracy with less training")
print("   - Great for small datasets")
print("\n2. Simple CNN:")
print("   - Fewer parameters, trains everything from scratch")
print("   - Good for understanding CNN fundamentals")
print("   - May need more data and training time")
print("\n3. When to use each:")
print("   - Use Transfer Learning: Limited data, want quick results")
print("   - Build from Scratch: Lots of data, very specific domain")

---

<a id='part11'></a>
# Part 11: Visualizing What CNNs Learn

One of the most fascinating aspects of CNNs is visualizing what they actually learn!

## What Do Different Layers Learn?

| Layer | What It Detects | Example |
|-------|-----------------|---------|
| **First Conv Layer** | Basic features (edges, colors, textures) | Horizontal/vertical edges |
| **Second Conv Layer** | Simple patterns (corners, circles) | Curves, basic shapes |
| **Third Conv Layer** | Complex patterns (parts) | Eyes, ears, whiskers |
| **Deeper Layers** | High-level features (whole objects) | Cat face, dog body |
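
To get a feel for what "detecting an edge" means, here is a small hand-made example in plain Python. The two kernels are classic hand-designed edge detectors, not filters learned by our model, and `conv2d_valid` is a hypothetical helper that mimics what a convolutional layer does with stride 1 and no padding.

```python
def conv2d_valid(image, kernel):
    """Slide a square kernel over a 2D image (stride 1, no padding)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the kernel with the patch under it and add everything up
            total = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(k) for j in range(k))
            row.append(total)
        output.append(row)
    return output


# 5x5 grayscale image: dark on the left (0), bright on the right (1)
image = [[0, 0, 1, 1, 1] for _ in range(5)]

vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]
horizontal_edge = [[-1, -1, -1],
                   [0, 0, 0],
                   [1, 1, 1]]

vertical_response = conv2d_valid(image, vertical_edge)
horizontal_response = conv2d_valid(image, horizontal_edge)
print("Vertical-edge filter:  ", vertical_response)
print("Horizontal-edge filter:", horizontal_response)
```

The vertical-edge kernel responds strongly where dark meets bright and gives zero over the uniform bright region, while the horizontal-edge kernel stays at zero everywhere because this image has no horizontal edges. A CNN learns kernels like these from data instead of having them written by hand.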

## Visualization Techniques

1. **Filter Visualization**: See the filters (kernels) the network learned
2. **Feature Maps**: See what each layer activates on for a specific image
3. **Grad-CAM**: Heatmap showing which parts of the image the network focuses on
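
Grad-CAM is not implemented in this notebook, but its core step is short enough to sketch. The example below assumes you already have the feature maps of the last convolutional layer and the gradients of the class score with respect to them (in practice these could come from something like `tf.GradientTape`). It uses tiny invented arrays and a hypothetical function name.

```python
def grad_cam_heatmap(feature_maps, gradients):
    """Combine feature maps into a Grad-CAM heatmap.

    Both arguments are nested lists shaped [channels][height][width].
    """
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # 1. Channel weight = spatial average of that channel's gradients
    weights = [sum(sum(row) for row in grad) / (height * width)
               for grad in gradients]
    # 2. Weighted sum of the feature maps, followed by ReLU
    heatmap = [[max(0.0, sum(wt * fmap[r][c]
                             for wt, fmap in zip(weights, feature_maps)))
                for c in range(width)]
               for r in range(height)]
    # 3. Scale to [0, 1] so it can be overlaid on the image
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return heatmap


# Two invented 2x2 feature maps and their gradients
feature_maps = [
    [[1.0, 0.0], [0.0, 0.0]],      # channel 0 fires in the top-left corner
    [[0.0, 0.0], [0.0, 2.0]],      # channel 1 fires in the bottom-right corner
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],      # channel 0 raises the class score
    [[-0.2, -0.2], [-0.2, -0.2]],  # channel 1 lowers it
]
heatmap = grad_cam_heatmap(feature_maps, gradients)
print(heatmap)
```

Only the top-left cell lights up: channel 1 is active in the bottom-right corner, but its negative weight is removed by the ReLU, so the heatmap keeps just the regions that push the score up.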

Let's visualize filters from our Simple CNN!

In [None]:
# Visualize filters from first convolutional layer
def visualize_filters(model, layer_name, num_filters=32):
    """Visualize filters from a convolutional layer."""
    # Get the layer
    layer = model.get_layer(layer_name)
    filters = layer.get_weights()[0]  # Shape: (height, width, channels, num_filters)
    
    # Normalize filter values to 0-1
    f_min, f_max = filters.min(), filters.max()
    filters = (filters - f_min) / (f_max - f_min)
    
    # Plot filters
    n_filters = min(num_filters, filters.shape[3])
    n_cols = 8
    n_rows = n_filters // n_cols + (1 if n_filters % n_cols else 0)
    
    fig, axes = plt.subplots(n_rows, n_cols, figsize=(16, n_rows * 2))
    fig.suptitle(f'Filters from {layer_name}', fontweight='bold', fontsize=16)
    
    for i in range(n_rows * n_cols):
        row = i // n_cols
        col = i % n_cols
        ax = axes[row, col] if n_rows > 1 else axes[col]
        
        if i < n_filters:
            # Get filter (handle 3-channel input for first layer)
            filter_img = filters[:, :, :, i]
            if filter_img.shape[2] == 3:  # RGB
                ax.imshow(filter_img)
            else:  # Grayscale
                ax.imshow(filter_img[:, :, 0], cmap='gray')
            ax.set_title(f'Filter {i+1}', fontsize=9)
        ax.axis('off')
    
    plt.tight_layout()
    plt.show()

print("Visualizing filters from first convolutional layer...")
visualize_filters(simple_cnn, 'conv1', num_filters=32)

print("\nWhat you're seeing:")
print("- Each small square is a LEARNED FILTER (3x3 kernel)")
print("- These filters detect basic features like:")
print("  - Edges (vertical, horizontal, diagonal)")
print("  - Color patterns (red, green, blue combinations)")
print("  - Textures (rough, smooth)")
print("\nThe network LEARNED these automatically from data!")
print("We didn't tell it to look for edges - it discovered that on its own!")

## 11.1 Visualizing Feature Maps (Activations)

Let's see how each layer responds to a specific image!

In [None]:
# Visualize feature maps for a sample image
def visualize_feature_maps(model, img_array, layer_names):
    """
    Visualize activations (feature maps) from specified layers.
    
    Parameters:
    -----------
    model : keras.Model
        The CNN model
    img_array : numpy array
        Input image (already preprocessed)
    layer_names : list
        Names of layers to visualize
    """
    # Create a model that outputs activations from specified layers
    layer_outputs = [model.get_layer(name).output for name in layer_names]
    activation_model = Model(inputs=model.input, outputs=layer_outputs)
    
    # Get activations
    activations = activation_model.predict(img_array)
    
    # Plot feature maps for each layer
    for layer_name, activation in zip(layer_names, activations):
        n_features = activation.shape[-1]
        size = activation.shape[1]
        
        # Display up to 16 feature maps
        n_cols = 8
        n_rows = min(2, n_features // n_cols)
        
        fig, axes = plt.subplots(n_rows, n_cols, figsize=(16, n_rows * 2))
        fig.suptitle(f'Feature Maps from {layer_name} ({size}x{size})', 
                     fontweight='bold', fontsize=14)
        
        for i in range(n_rows * n_cols):
            row = i // n_cols
            col = i % n_cols
            ax = axes[row, col] if n_rows > 1 else axes[col]
            
            if i < n_features:
                ax.imshow(activation[0, :, :, i], cmap='viridis')
                ax.set_title(f'Filter {i+1}', fontsize=9)
            ax.axis('off')
        
        plt.tight_layout()
        plt.show()

# Load a test image (sorted so the same file is picked on every run)
cat_jpgs = sorted(f for f in os.listdir(CAT_DIR) if f.endswith('.jpg'))
test_img_path = os.path.join(CAT_DIR, cat_jpgs[10])
test_img = load_img(test_img_path, target_size=(IMG_HEIGHT, IMG_WIDTH))
test_img_array = img_to_array(test_img) / 255.0
test_img_array = np.expand_dims(test_img_array, axis=0)

# Display original image
plt.figure(figsize=(5, 5))
plt.imshow(test_img)
plt.title('Original Image', fontweight='bold')
plt.axis('off')
plt.show()

print("Visualizing feature maps from each convolutional layer...")
print("="*60)

# Visualize activations from all conv layers
visualize_feature_maps(simple_cnn, test_img_array, ['conv1', 'conv2', 'conv3'])

print("\nKey Observations:")
print("="*60)
print("1. FIRST LAYER (conv1):")
print("   - Feature maps show edge detection")
print("   - Different filters detect different orientations")
print("   - Still recognizable as original image")
print("\n2. SECOND LAYER (conv2):")
print("   - More abstract features (combinations of edges)")
print("   - Less recognizable as original image")
print("   - Detects patterns like fur texture, eye shapes")
print("\n3. THIRD LAYER (conv3):")
print("   - Very abstract, high-level features")
print("   - Barely recognizable")
print("   - Represents complex patterns specific to cats/dogs")
print("\nThis HIERARCHY of features is why CNNs work so well!")

---

<a id='part12'></a>
# Part 12: Making Predictions on New Images

Let's use our trained model to predict whether new images are cats or dogs!

In [None]:
# Function to predict and visualize
def predict_image(model, img_path, class_indices):
    """
    Make prediction on a single image and display result.
    
    Parameters:
    -----------
    model : keras.Model
        Trained model
    img_path : str
        Path to image file
    class_indices : dict
        Dictionary mapping class names to indices
    """
    # Load and preprocess image
    img = load_img(img_path, target_size=(IMG_HEIGHT, IMG_WIDTH))
    img_array = img_to_array(img) / 255.0
    img_array = np.expand_dims(img_array, axis=0)
    
    # Make prediction
    prediction = model.predict(img_array, verbose=0)[0][0]
    
    # Get class names
    class_names = {v: k for k, v in class_indices.items()}
    
    # Determine predicted class
    if prediction > 0.5:
        predicted_class = class_names[1]
        confidence = prediction
    else:
        predicted_class = class_names[0]
        confidence = 1 - prediction
    
    return img, predicted_class, confidence

# Predict on random samples
print("Making Predictions on Random Images")
print("="*60)

fig, axes = plt.subplots(2, 4, figsize=(16, 8))
fig.suptitle('Predictions with Confidence Scores', fontweight='bold', fontsize=16)

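# Get random images (mix of cats and dogs); sort first so the seed gives repeatable picks
# Note: if CAT_DIR and DOG_DIR are the class folders under BASE_DIR, some of these
# images may also have been seen during training, so they are not strictly "new"
cat_files = sorted(f for f in os.listdir(CAT_DIR) if f.endswith('.jpg'))
dog_files = sorted(f for f in os.listdir(DOG_DIR) if f.endswith('.jpg'))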

np.random.seed(42)
sample_images = (
    [os.path.join(CAT_DIR, f) for f in np.random.choice(cat_files, 4, replace=False)] +
    [os.path.join(DOG_DIR, f) for f in np.random.choice(dog_files, 4, replace=False)]
)

for i, ax in enumerate(axes.flat):
    try:
        img, pred_class, confidence = predict_image(
            simple_cnn, sample_images[i], train_generator.class_indices
        )
        
        ax.imshow(img)
        
        # Color code: green for high confidence, orange for medium, red for low
        if confidence > 0.9:
            color = 'green'
        elif confidence > 0.7:
            color = 'orange'
        else:
            color = 'red'
        
        ax.set_title(f'{pred_class.upper()}\nConfidence: {confidence:.1%}',
                    fontweight='bold', color=color, fontsize=12)
        ax.axis('off')
    except Exception:  # Leave the panel blank if the image cannot be loaded
        ax.axis('off')

plt.tight_layout()
plt.show()

print("\nColor Code:")
print("  GREEN: High confidence (>90%)")
print("  ORANGE: Medium confidence (70-90%)")
print("  RED: Low confidence (<70%)")
print("\nNote: Low confidence predictions are more likely to be wrong!")

---

<a id='part13'></a>
# Part 13: Summary and Key Takeaways

## What We Accomplished

In this comprehensive notebook, we:

1. Understood CNN fundamentals
2. Explored the Cats vs Dogs dataset
3. Preprocessed images (resizing, normalization)
4. Applied data augmentation
5. Built a CNN from scratch
6. Implemented transfer learning with VGG16
7. Evaluated and compared models
8. Visualized what CNNs learn
9. Made predictions on new images

## Key Concepts Learned

| Concept | Key Takeaway |
|---------|--------------|
| **CNNs** | Automatically learn hierarchical features from images |
| **Convolution** | Extract local patterns using sliding filters |
| **Pooling** | Reduce spatial dimensions, provide translation invariance |
| **Data Augmentation** | Artificially increase data diversity to prevent overfitting |
| **Transfer Learning** | Leverage pre-trained models for better performance |
| **Feature Hierarchy** | Early layers detect edges, deeper layers detect complex patterns |
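
The "translation invariance" that pooling provides is easiest to see on a tiny example. The sketch below applies 2x2 max pooling (stride 2) to small hand-made feature maps; `max_pool_2x2` and `make_map` are hypothetical helpers written for this illustration.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2D list."""
    return [[max(feature_map[r][c], feature_map[r][c + 1],
                 feature_map[r + 1][c], feature_map[r + 1][c + 1])
             for c in range(0, len(feature_map[0]) - 1, 2)]
            for r in range(0, len(feature_map) - 1, 2)]


def make_map(row, col, size=4, value=9):
    """Feature map of zeros with a single activation at (row, col)."""
    grid = [[0] * size for _ in range(size)]
    grid[row][col] = value
    return grid


pooled_a = max_pool_2x2(make_map(0, 0))  # activation in the corner
pooled_b = max_pool_2x2(make_map(1, 1))  # shifted one pixel down and right
pooled_c = max_pool_2x2(make_map(0, 2))  # shifted two pixels to the right

print(pooled_a)
print(pooled_b)
print(pooled_c)
```

The first two maps pool to the same output, so a one-pixel shift inside a pooling window is invisible to the next layer. The third map pools to a different output, which shows that the invariance is only local: larger shifts still move the activation.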

## Performance Summary

The dashboard below collects the key results for both models in one figure: validation accuracy curves, the Simple CNN confusion matrix, and parameter counts.

In [None]:
# Create final summary dashboard
fig = plt.figure(figsize=(18, 12))
gs = fig.add_gridspec(3, 3, hspace=0.3, wspace=0.3)

# 1. Training curves comparison
ax1 = fig.add_subplot(gs[0, :])
ax1.plot(history.history['val_accuracy'], label='Simple CNN', linewidth=2, marker='o')
ax1.plot(vgg_history.history['val_accuracy'], label='VGG16', linewidth=2, marker='s')
ax1.set_xlabel('Epoch', fontsize=11)
ax1.set_ylabel('Validation Accuracy', fontsize=11)
ax1.set_title('Model Comparison: Validation Accuracy Over Time', fontweight='bold', fontsize=13)
ax1.legend(fontsize=11)
ax1.grid(True, alpha=0.3)

# 2. Confusion Matrix - Simple CNN
ax2 = fig.add_subplot(gs[1, 0])
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax2, cbar=False,
            xticklabels=['Cat', 'Dog'], yticklabels=['Cat', 'Dog'])
ax2.set_title('Simple CNN\nConfusion Matrix', fontweight='bold', fontsize=11)

# 3. Sample predictions
ax3 = fig.add_subplot(gs[1, 1])
# Create a simple visualization
sample_img, pred_class, confidence = predict_image(
    simple_cnn, sample_images[0], train_generator.class_indices
)
ax3.imshow(sample_img)
ax3.set_title(f'Sample Prediction\n{pred_class.upper()} ({confidence:.1%})', 
             fontweight='bold', fontsize=11)
ax3.axis('off')

# 4. Key metrics table
ax4 = fig.add_subplot(gs[1, 2])
ax4.axis('off')
metrics_text = f"""
FINAL METRICS

Simple CNN:
  Accuracy: {history.history['val_accuracy'][-1]:.2%}
  Parameters: {simple_cnn.count_params():,}
  
VGG16 Transfer:
  Accuracy: {vgg_history.history['val_accuracy'][-1]:.2%}
  Total Params: {vgg_model.count_params():,}
  Trainable: {sum([tf.size(w).numpy() for w in vgg_model.trainable_weights]):,}

Dataset:
  Training: {train_generator.samples:,}
  Validation: {val_generator.samples:,}
"""
ax4.text(0.1, 0.5, metrics_text, fontsize=10, verticalalignment='center',
        family='monospace', bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.3))

# 5. Data augmentation example
ax5 = fig.add_subplot(gs[2, 0])
aug_img = next(augmented_datagen.flow(sample_array, batch_size=1))[0]
ax5.imshow(aug_img)
ax5.set_title('Data Augmentation\nExample', fontweight='bold', fontsize=11)
ax5.axis('off')

# 6. First layer filters
ax6 = fig.add_subplot(gs[2, 1])
layer = simple_cnn.get_layer('conv1')
filters = layer.get_weights()[0]
f_min, f_max = filters.min(), filters.max()
filters_norm = (filters - f_min) / (f_max - f_min)
# Show one filter
ax6.imshow(filters_norm[:, :, :, 0])
ax6.set_title('Learned Filter\n(Conv Layer 1)', fontweight='bold', fontsize=11)
ax6.axis('off')

# 7. Architecture comparison
ax7 = fig.add_subplot(gs[2, 2])
ax7.axis('off')
arch_comparison = """
ARCHITECTURE

Simple CNN:
  Conv(32) → Pool
  Conv(64) → Pool  
  Conv(128) → Pool
  Dense(512) → Dropout
  Dense(1) - Sigmoid

VGG16 Transfer:
  VGG16 Base (FROZEN)
  Dense(256)
  Dropout(0.5)
  Dense(1) - Sigmoid
"""
ax7.text(0.1, 0.5, arch_comparison, fontsize=9, verticalalignment='center',
        family='monospace', bbox=dict(boxstyle='round', facecolor='lightblue', alpha=0.3))

plt.suptitle('CATS VS DOGS CNN - COMPREHENSIVE SUMMARY DASHBOARD', 
            fontweight='bold', fontsize=16, y=0.98)
plt.tight_layout()
plt.show()

print("\n" + "="*70)
print("NOTEBOOK COMPLETE!")
print("="*70)

## Key Takeaways Table

| Topic | What We Learned |
|-------|----------------|
| **CNNs are Hierarchical** | Early layers detect simple features (edges), deeper layers detect complex patterns (cat faces) |
| **Convolution > Fully Connected** | Shared weights reduce parameters massively, spatial structure is preserved |
| **Pooling Helps** | Reduces dimensions, adds a degree of translation invariance, helps limit overfitting |
| **Data Augmentation Helps** | Artificially increases dataset size, makes model robust to variations |
| **Transfer Learning is Powerful** | Pre-trained models give huge advantage with limited data |
| **Dropout Prevents Overfitting** | Randomly disabling neurons forces network to learn robust features |
| **Visualization is Insightful** | We can see what CNNs learn - not a black box! |
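
The "shared weights" point in the table can be checked with a little arithmetic. The sketch below compares a first convolutional layer with 32 filters of size 3x3 against a fully connected layer that maps the same input to the same number of outputs. It assumes a 150x150x3 input and no padding; the helper names are hypothetical.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights plus biases of a 2D convolutional layer."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return (n_inputs + 1) * n_units


height, width, channels = 150, 150, 3
filters, kernel_size = 32, 3

conv_params = conv2d_params(kernel_size, channels, filters)

# Without padding, a 3x3 convolution shrinks each side by 2 pixels
out_height, out_width = height - kernel_size + 1, width - kernel_size + 1
n_inputs = height * width * channels
n_outputs = out_height * out_width * filters
dense_equivalent = dense_params(n_inputs, n_outputs)

print(f"Conv2D(32, 3x3) parameters: {conv_params:,}")
print(f"Dense layer, same in/out:   {dense_equivalent:,}")
print(f"Ratio:                      {dense_equivalent / conv_params:,.0f}x")
```

The convolutional layer needs 896 parameters, while the dense equivalent would need roughly 47 billion. The convolution gets away with so few because the same 3x3 kernel is reused at every position in the image.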

## When to Use What

| Scenario | Recommendation |
|----------|---------------|
| **Large dataset (>100k images)** | Train from scratch or fine-tune all layers |
| **Small dataset (<10k images)** | Use transfer learning, freeze base layers |
| **Very limited data (<1k images)** | Strong augmentation + transfer learning |
| **Custom domain (medical, satellite)** | May need domain-specific pre-training |
| **Mobile/Edge deployment** | Use lightweight models (MobileNet, SqueezeNet) |
| **High accuracy critical** | Ensemble multiple models, use state-of-the-art architectures |

## What's Next?

Want to improve further? Try:

1. **More Data**: Use the full 25k images (we used a subset)
2. **Advanced Architectures**: ResNet, EfficientNet, Vision Transformers
3. **Fine-tuning**: Unfreeze some VGG16 layers and train end-to-end
4. **Ensembling**: Combine predictions from multiple models
5. **Test-Time Augmentation**: Augment test images and average predictions
6. **Grad-CAM**: Visualize which regions drive predictions
7. **Object Detection**: Detect and locate multiple objects in images (YOLO, Faster R-CNN)
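
As a taste of item 5, here is a minimal sketch of test-time augmentation in plain Python. `toy_predict` is a made-up stand-in for a real model's prediction function, and the image is a tiny invented grid; with Keras you would pass a function that wraps `model.predict` instead.

```python
def horizontal_flip(image):
    """Mirror a 2D image (list of rows) left to right."""
    return [row[::-1] for row in image]


def predict_with_tta(predict_fn, image, transforms):
    """Average predict_fn over the original image and each transformed copy."""
    variants = [image] + [transform(image) for transform in transforms]
    scores = [predict_fn(variant) for variant in variants]
    return sum(scores) / len(scores)


def toy_predict(image):
    """Hypothetical stand-in for a model: mean brightness of the left half."""
    left_pixels = [value for row in image for value in row[:len(row) // 2]]
    return sum(left_pixels) / len(left_pixels)


# Invented 2x4 image: bright on the left, dark on the right
image = [[0.9, 0.9, 0.1, 0.1],
         [0.9, 0.9, 0.1, 0.1]]

single_score = toy_predict(image)
tta_score = predict_with_tta(toy_predict, image, [horizontal_flip])
print(f"Single prediction: {single_score:.2f}")
print(f"TTA prediction:    {tta_score:.2f}")
```

The toy model is deliberately sensitive to which side of the image is bright, so its single prediction swings with a flip. Averaging over the original and the flipped copy smooths out that sensitivity, which is the whole idea behind test-time augmentation.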

## Real-World Applications

These techniques apply to:

- **Medical Imaging**: Detect diseases from X-rays, MRIs, CT scans
- **Autonomous Vehicles**: Recognize pedestrians, traffic signs, lane markings
- **Security**: Face recognition, anomaly detection
- **Agriculture**: Crop disease detection, weed identification
- **Retail**: Product recognition, visual search
- **Wildlife**: Species identification, population monitoring

## Resources for Further Learning

- **Papers**: 
  - AlexNet (2012) - Started the deep learning revolution
  - VGGNet (2014) - Simple, effective architecture
  - ResNet (2015) - Skip connections, very deep networks
  - EfficientNet (2019) - Optimized scaling

- **Courses**:
  - Stanford CS231n: Convolutional Neural Networks
  - Fast.ai: Practical Deep Learning for Coders
  - Coursera: Deep Learning Specialization

- **Practice**:
  - Kaggle competitions
  - ImageNet dataset
  - COCO dataset for object detection

In [None]:
# Final summary printout
print("="*70)
print("CATS VS DOGS CNN IMAGE CLASSIFICATION - COMPLETED!")
print("="*70)

print("\n WHAT WE BUILT:")
print("  1. Simple CNN from scratch (3 conv blocks)")
print("  2. VGG16 Transfer Learning model")
print("  3. Complete image classification pipeline")
print("  4. Comprehensive visualizations")

print("\n TECHNIQUES MASTERED:")
print("  - Convolutional Neural Networks architecture")
print("  - Image preprocessing and normalization")
print("  - Data augmentation for robustness")
print("  - Transfer learning with pre-trained models")
print("  - Model evaluation and comparison")
print("  - Feature visualization")

print("\n KEY INSIGHTS:")
print("  - CNNs learn hierarchical features automatically")
print("  - Data augmentation prevents overfitting")
print("  - Transfer learning is powerful for limited data")
print("  - Visualization helps understand what models learn")

print("\n PERFORMANCE:")
print(f"  Simple CNN: ~{history.history['val_accuracy'][-1]*100:.1f}% accuracy")
print(f"  VGG16: ~{vgg_history.history['val_accuracy'][-1]*100:.1f}% accuracy")
print(f"  (State-of-the-art on this dataset: 98-99%)")

print("\n SKILLS GAINED:")
print("  Image classification")
print("  Deep learning")
print("  Computer vision")
print("  TensorFlow/Keras")
print("  Model evaluation")
print("  Transfer learning")

print("\n" + "="*70)
print("Thank you for completing this comprehensive CNN tutorial!")
print("You now have a solid foundation in image classification.")
print("="*70)

print("\n Next Steps:")
print("  1. Try this on other image datasets")
print("  2. Experiment with different architectures")
print("  3. Explore object detection (YOLO, Faster R-CNN)")
print("  4. Learn about semantic segmentation (U-Net)")
print("  5. Study advanced topics (GANs, Vision Transformers)")

print("\n Happy Learning!")
print("="*70)