# Chapter 6: Teaching Machines to See - Image Classification with CNNs

## 1️⃣ Chapter Overview

In the previous chapters, we covered the fundamentals of Convolutional Neural Networks (CNNs) using simple datasets like CIFAR-10. This chapter takes a significant leap forward by tackling a more realistic and complex challenge: classifying images in the **Tiny ImageNet** dataset using a landmark architecture, **InceptionNet (GoogLeNet)**.

We move beyond simple Sequential models to complex, multi-branch architectures using the Keras Functional API. We also dive deep into **Exploratory Data Analysis (EDA)** for images, which is often overlooked in deep learning tutorials but is crucial for real-world success.

### Key Learning Goals:
1.  **Exploratory Data Analysis (EDA):** How to inspect image datasets, check for class imbalances, and calculate channel statistics before training.
2.  **Advanced Data Pipelines:** Using `ImageDataGenerator` for complex directory structures and data augmentation.
3.  **Inception Architecture:** Understanding and implementing the **Inception Block**, **1x1 Convolutions** for dimensionality reduction, and **Auxiliary Classifiers**.
4.  **Functional API Mastery:** Building multi-branch networks with multiple outputs.

---

## 2️⃣ Theoretical Explanation

### 2.1 The Challenge of Depth
As researchers tried to make CNNs deeper to capture more complex features, they ran into two major problems:
1.  **Vanishing Gradients:** In very deep networks, the gradient signal fades away as it backpropagates through many layers, making the early layers hard to train.
2.  **Computational Explosion:** Adding more layers usually adds more parameters, increasing the risk of overfitting and the computational cost.

**InceptionNet** (winner of ILSVRC 2014) proposed a novel solution to these problems: **The Inception Module**.

### 2.2 The Inception Module
Instead of choosing whether to put a $1\times1$ convolution, a $3\times3$ convolution, or a $5\times5$ convolution at a specific layer, Inception says: **"Why not do them all?"**

An Inception module applies multiple filters of different sizes to the *same input* in parallel and then concatenates the results. 
* **Small filters ($1\times1$, $3\times3$):** Capture fine, local detail.
* **Large filters ($5\times5$):** Capture patterns that span a larger spatial region.
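To make the "do them all" idea concrete, here is a minimal NumPy sketch (not the Keras code used later) that runs four branches on the *same* input and concatenates the results along the channel axis. The helper names `conv2d_same`, `maxpool3x3_same` and `toy_inception` are our own, the weights are random, and the 1×1 reduction layers are deliberately left out. Stride 1 with "same" padding keeps every branch at the input's height and width, which is what makes channel-wise concatenation possible.

```python
import numpy as np

def conv2d_same(x, w):
    """Stride-1 'same' convolution followed by ReLU.
    x: (H, W, C_in) input; w: (k, k, C_in, C_out) kernel with odd k.
    Computes cross-correlation, as CNN libraries do."""
    k = w.shape[0]
    p = k // 2
    H, W, _ = x.shape
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))  # zero padding keeps H x W
    out = np.zeros((H, W, w.shape[3]))
    for i in range(k):
        for j in range(k):
            # Each kernel offset mixes the channels of a shifted window of the input
            out += np.einsum('hwc,cf->hwf', xp[i:i + H, j:j + W, :], w[i, j])
    return np.maximum(out, 0.0)

def maxpool3x3_same(x):
    """Stride-1 3x3 max pooling with 'same' padding (border padded with -inf)."""
    H, W, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)), constant_values=-np.inf)
    windows = [xp[i:i + H, j:j + W, :] for i in range(3) for j in range(3)]
    return np.max(np.stack(windows), axis=0)

def toy_inception(x, f_1x1, f_3x3, f_5x5, f_pool, seed=0):
    """Toy Inception-style module with random weights and no 1x1 reductions."""
    rng = np.random.default_rng(seed)
    c_in = x.shape[2]

    def rand_kernel(k, c_out):
        return rng.normal(0.0, 0.1, size=(k, k, c_in, c_out))

    b1 = conv2d_same(x, rand_kernel(1, f_1x1))
    b2 = conv2d_same(x, rand_kernel(3, f_3x3))
    b3 = conv2d_same(x, rand_kernel(5, f_5x5))
    b4 = conv2d_same(maxpool3x3_same(x), rand_kernel(1, f_pool))
    # Every branch keeps H x W, so the outputs stack along the channel axis
    return np.concatenate([b1, b2, b3, b4], axis=-1)

x = np.random.default_rng(1).normal(size=(8, 8, 16))
y = toy_inception(x, f_1x1=4, f_3x3=6, f_5x5=2, f_pool=3)
print(y.shape)  # (8, 8, 15)
```

The output depth is simply the sum of the branch filter counts (4 + 6 + 2 + 3 = 15), while the spatial size is unchanged.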

### 2.3 The Magic of $1\times1$ Convolutions
Doing $5\times5$ convolutions on inputs with many channels (depth) is expensive. To solve this, Inception uses $1\times1$ convolutions as a **"Bottleneck Layer"**.

**Dimensionality Reduction Example** (weights only; biases ignored):
* **Input:** $28 \times 28 \times 192$
* **Naive $5\times5$ Conv (32 filters):**
    * Params: $5 \times 5 \times 192 \times 32 = 153{,}600$
* **With $1\times1$ Bottleneck (reduce depth to 16 first):**
    1.  $1\times1$ Conv (16 filters): $1 \times 1 \times 192 \times 16 = 3{,}072$
    2.  $5\times5$ Conv (32 filters): $5 \times 5 \times 16 \times 32 = 12{,}800$
    * **Total Params:** $3{,}072 + 12{,}800 = 15{,}872$ (roughly a **10x reduction**)
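The arithmetic above can be checked with a few lines of Python. `conv_params` is our own helper; with `bias=True` it gives the per-layer totals Keras's `count_params()` would report for `Conv2D` layers with the default `use_bias=True` (153,632 for the naive layer and 15,920 for the bottleneck pair). The second part shows that compute shrinks by the same factor, since each weight is applied once per output position.

```python
def conv_params(k, c_in, c_out, bias=False):
    """Parameter count of a k x k convolution with c_in input channels and c_out filters."""
    return k * k * c_in * c_out + (c_out if bias else 0)

naive = conv_params(5, 192, 32)
bottleneck = conv_params(1, 192, 16) + conv_params(5, 16, 32)
print(naive, bottleneck, round(naive / bottleneck, 2))  # 153600 15872 9.68

# With 'same' padding the output stays 28 x 28, and every weight is used once per
# output position, so multiply-accumulate operations shrink by the same factor
positions = 28 * 28
print(positions * naive, positions * bottleneck)  # 120422400 12443648
```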

### 2.4 Auxiliary Classifiers
To combat the vanishing gradient problem in a deep network (22 layers), InceptionNet attaches small "side" classifiers to intermediate layers. 

During training, the loss is calculated as:
$$ L_{total} = L_{final} + 0.3 \cdot L_{aux1} + 0.3 \cdot L_{aux2} $$

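These auxiliary branches inject additional gradient signal into the middle of the network, encouraging the lower layers to learn discriminative features. They are used only during training: at inference time, the auxiliary classifiers are discarded and only the main output is used.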
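As a worked example of $L_{total}$, the NumPy sketch below scores three softmax outputs against the same one-hot label and combines them with the 0.3 weights. It is conceptually the same weighted sum that `loss_weights=[1.0, 0.3, 0.3]` asks Keras to form in Section 6; the helper names and the toy probabilities are our own illustrations, and the clipping epsilon is an assumption to avoid `log(0)`.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over a batch (one-hot y_true, softmax y_pred)."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

def inception_total_loss(y_true, main_pred, aux1_pred, aux2_pred, aux_weight=0.3):
    # Every head is scored against the same labels; auxiliary losses are down-weighted
    return (categorical_crossentropy(y_true, main_pred)
            + aux_weight * categorical_crossentropy(y_true, aux1_pred)
            + aux_weight * categorical_crossentropy(y_true, aux2_pred))

y = np.array([[0.0, 1.0, 0.0]])      # true class is index 1
main = np.array([[0.1, 0.8, 0.1]])   # confident main head
aux = np.array([[0.3, 0.4, 0.3]])    # less confident auxiliary heads
print(round(inception_total_loss(y, main, aux, aux), 4))  # 0.7729
```

Here $-\ln 0.8 + 0.3 \cdot (-\ln 0.4) + 0.3 \cdot (-\ln 0.4) \approx 0.223 + 0.550$, so the weak auxiliary heads still add a noticeable share of the gradient signal.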

## 3️⃣ Setup and Data Preparation

We now set up **Tiny ImageNet**, a scaled-down version of the full ImageNet dataset. It contains 200 classes with 500 training images each (100,000 in total), plus a labelled validation set of 50 images per class.

In [None]:
import os
import zipfile
import requests
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from PIL import Image
from tensorflow.keras.layers import Input, Conv2D, MaxPool2D, AvgPool2D, Dense, Concatenate, Flatten, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, CSVLogger
import tensorflow.keras.backend as K

# Set seeds for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

print(f"TensorFlow Version: {tf.__version__}")

### 3.1 Data Download Utility
Since Tiny ImageNet is not in standard Keras datasets, we write a utility to download and extract it.

In [None]:
def download_tiny_imagenet(data_dir='data'):
    """Download and extract Tiny ImageNet into `data_dir` (no-op if already extracted)."""
    os.makedirs(data_dir, exist_ok=True)
    
    url = "http://cs231n.stanford.edu/tiny-imagenet-200.zip"
    zip_path = os.path.join(data_dir, 'tiny-imagenet-200.zip')
    extract_path = os.path.join(data_dir, 'tiny-imagenet-200')
    
    if not os.path.exists(extract_path):
        print("Downloading Tiny ImageNet...")
        with requests.get(url, stream=True, timeout=60) as r:
            r.raise_for_status()  # stop on HTTP errors instead of saving an error page as the zip
            with open(zip_path, 'wb') as f:
                # Stream to disk in 1 MB chunks to keep memory usage low
                for chunk in r.iter_content(chunk_size=1024 * 1024):
                    if chunk:
                        f.write(chunk)
        
        print("Extracting...")
        with zipfile.ZipFile(zip_path, 'r') as zip_ref:
            zip_ref.extractall(data_dir)
        print("Done!")
    else:
        print("Dataset already present.")

# Uncomment the line below to download the dataset (approx 250MB)
# download_tiny_imagenet()

### 3.2 Exploratory Data Analysis (EDA)

Before throwing data into a neural network, we must understand its structure. Tiny ImageNet organizes images by **WordNet IDs (wnids)** (e.g., `n01443537`). We need to map these IDs to human-readable labels (e.g., `goldfish`).

In [None]:
DATA_DIR = os.path.join('data', 'tiny-imagenet-200')
WNIDS_PATH = os.path.join(DATA_DIR, 'wnids.txt')
WORDS_PATH = os.path.join(DATA_DIR, 'words.txt')

def load_class_names(wnids_path, words_path):
    # 1. Load the list of classes used in this dataset
    if not os.path.exists(wnids_path):
        print("Dataset not found. Skipping EDA.")
        return {}

    with open(wnids_path, 'r') as f:
        wnids = [x.strip() for x in f.readlines()]

    # 2. Load the mapping from ID to English Word
    with open(words_path, 'r') as f:
        words = {}
        for line in f:
            line = line.strip()
            if line:
                parts = line.split('\t')
                words[parts[0]] = parts[1]
    
    # 3. Create the specific map for our 200 classes
    id_to_class = {wnid: words[wnid] for wnid in wnids if wnid in words}
    return id_to_class

class_map = load_class_names(WNIDS_PATH, WORDS_PATH)
print(f"Loaded {len(class_map)} classes. Example: n01443537 -> {class_map.get('n01443537', 'Unknown')}")
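The learning goals also mention checking class balance and channel statistics. A minimal sketch of both follows; `count_images_per_class` and `channel_mean_std` are our own helpers, and the sample size (20 classes × 10 images) is an arbitrary choice to keep the pass fast. The channel statistics use a running sum and sum of squares so images never need to be held in memory at once.

```python
import os
import numpy as np

def count_images_per_class(train_dir, exts=('.jpeg', '.jpg', '.png')):
    """Map each class folder (wnid) under train_dir to its number of image files."""
    counts = {}
    for wnid in sorted(os.listdir(train_dir)):
        class_dir = os.path.join(train_dir, wnid)
        if not os.path.isdir(class_dir):
            continue
        n = 0
        # os.walk also descends into the <wnid>/images/ subfolder
        for _, _, files in os.walk(class_dir):
            n += sum(f.lower().endswith(exts) for f in files)
        counts[wnid] = n
    return counts

def channel_mean_std(images):
    """Per-channel mean and std (on a 0-1 scale) over an iterable of HxWx3 uint8 arrays."""
    total = np.zeros(3)
    total_sq = np.zeros(3)
    n_pixels = 0
    for img in images:
        px = np.asarray(img, dtype=np.float64).reshape(-1, 3) / 255.0
        total += px.sum(axis=0)
        total_sq += (px ** 2).sum(axis=0)
        n_pixels += px.shape[0]
    mean = total / n_pixels
    std = np.sqrt(np.maximum(total_sq / n_pixels - mean ** 2, 0.0))
    return mean, std

train_dir = os.path.join('data', 'tiny-imagenet-200', 'train')
if os.path.isdir(train_dir):
    from PIL import Image
    counts = count_images_per_class(train_dir)
    print(f"{len(counts)} classes, {min(counts.values())} to {max(counts.values())} images per class")

    # Channel statistics from a small random sample (20 classes x 10 images)
    rng = np.random.default_rng(42)
    sample_paths = []
    for wnid in rng.choice(sorted(counts), size=20, replace=False):
        img_dir = os.path.join(train_dir, wnid, 'images')
        sample_paths += [os.path.join(img_dir, f) for f in sorted(os.listdir(img_dir))[:10]]
    # convert('RGB') ensures 3 channels even if a file is stored as grayscale
    mean, std = channel_mean_std(np.asarray(Image.open(p).convert('RGB')) for p in sample_paths)
    print("mean:", mean.round(3), "std:", std.round(3))
```

Equal minimum and maximum counts indicate a perfectly balanced training set; the per-channel means and standard deviations are what you would use if you preferred standardization over the plain `1/255` rescaling used later.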

### 3.3 Visualizing the Data
Let's write a function to display images from the training set to verify their quality and content.

In [None]:
def visualize_samples(data_dir, class_map, n_classes=5):
    if not os.path.exists(data_dir): return
    
    train_dir = os.path.join(data_dir, 'train')
    # Pick random classes
    classes = np.random.choice(list(class_map.keys()), n_classes, replace=False)
    
    plt.figure(figsize=(15, 3))
    for i, cls in enumerate(classes):
        img_dir = os.path.join(train_dir, cls, 'images')
        # Pick random image in that class
        img_name = np.random.choice(os.listdir(img_dir))
        img_path = os.path.join(img_dir, img_name)
        
        img = Image.open(img_path).convert('RGB')  # ensure 3 channels, in case a file is grayscale
        plt.subplot(1, n_classes, i+1)
        plt.imshow(img)
        plt.title(class_map[cls].split(',')[0]) # Use first common name
        plt.axis('off')
    plt.show()

visualize_samples(DATA_DIR, class_map)

## 4️⃣ Building Data Pipelines

We use `ImageDataGenerator` to load batches from disk and apply augmentation on the fly.
* **Note:** Tiny ImageNet has a tricky validation folder structure (images are not separated into subfolders by class). To keep this tutorial simple, we treat the **train** folder as our only source and split it into training and validation sets.

**InceptionNet Input Size:** Original InceptionNet uses $224 \times 224$. Tiny ImageNet is $64 \times 64$. We will resize images to **$56 \times 56$** to suit the modified architecture we will build later.
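If you would rather use the official validation images than carve a split out of `train/`, the flat `val/images` folder can be copied into per-class subfolders that `flow_from_directory` understands. This is a minimal sketch under an assumption about the file format: each line of `val/val_annotations.txt` is taken to be tab-separated, starting with the image filename and its wnid (the remaining columns being bounding-box coordinates). `restructure_val_folder` and the `val_by_class` folder name are our own choices; files are copied, so the original layout stays intact.

```python
import os
import shutil

def restructure_val_folder(val_dir, out_dir):
    """Copy val_dir/images/<file> to out_dir/<wnid>/<file> using val_annotations.txt.
    Assumes each annotation line starts with '<filename>\t<wnid>\t...'."""
    ann_path = os.path.join(val_dir, 'val_annotations.txt')
    with open(ann_path) as f:
        for line in f:
            parts = line.strip().split('\t')
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            fname, wnid = parts[0], parts[1]
            class_dir = os.path.join(out_dir, wnid)
            os.makedirs(class_dir, exist_ok=True)
            shutil.copy(os.path.join(val_dir, 'images', fname), os.path.join(class_dir, fname))
    return out_dir

val_dir = os.path.join('data', 'tiny-imagenet-200', 'val')
if os.path.isdir(val_dir):
    restructure_val_folder(val_dir, os.path.join('data', 'tiny-imagenet-200', 'val_by_class'))
```

`flow_from_directory` assigns class indices by sorting the subfolder names, so as long as all 200 wnid folders are present, the indices of a generator reading `val_by_class` line up with those of the training generator.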

In [None]:
# If the dataset is not present, we fall back to mock generators so the
# notebook still runs end to end without a FileNotFoundError.

BATCH_SIZE = 64
TARGET_SIZE = (56, 56) # Modified for our scaled-down architecture

if os.path.exists(DATA_DIR):
    # Training pipeline: rescaling + random augmentation
    train_datagen = ImageDataGenerator(
        rescale=1./255,
        validation_split=0.1,  # Use 10% of training data for validation
        rotation_range=20,
        width_shift_range=0.2,
        height_shift_range=0.2,
        horizontal_flip=True
    )
    # Validation pipeline: same rescaling and the same validation_split (so it
    # selects the same held-out files), but no augmentation
    val_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.1)

    train_generator = train_datagen.flow_from_directory(
        os.path.join(DATA_DIR, 'train'),
        target_size=TARGET_SIZE,
        batch_size=BATCH_SIZE,
        class_mode='categorical',
        subset='training',
        seed=42
    )

    validation_generator = val_datagen.flow_from_directory(
        os.path.join(DATA_DIR, 'train'),
        target_size=TARGET_SIZE,
        batch_size=BATCH_SIZE,
        class_mode='categorical',
        subset='validation',
        shuffle=False
    )
else:
    print("⚠️ Dataset not found. Using Mock Data Generators for architecture verification.")
    # Mock generator for environments without the 250MB dataset
    def mock_gen():
        while True:
            # Input: batch of random images with the same shape the real pipeline produces
            X = np.random.rand(BATCH_SIZE, *TARGET_SIZE, 3).astype('float32')
            # Output: One-hot encoded labels for 200 classes
            # IMPORTANT: InceptionNet has 3 outputs (1 main, 2 auxiliary)
            # We must replicate the target 3 times.
            y = np.eye(200)[np.random.choice(200, BATCH_SIZE)]
            yield X, [y, y, y]
            
    train_generator = mock_gen()
    validation_generator = mock_gen()

### 4.1 Adapting Generators for Multi-Output Models
InceptionNet has **3 outputs** (Main, Aux 1, Aux 2). Standard generators yield `(X, y)`. We need a generator that yields `(X, [y, y, y])` so that all three classifiers can calculate loss against the same true label.

In [None]:
def multi_output_generator(generator):
    """
    Wraps a standard Keras generator to support multi-output models.
    Yields X, [y, y, y]
    """
    while True:
        X, y = next(generator)
        yield X, [y, y, y]

# Wrap the generators only when they read real data from disk
# (the mock generator already yields X, [y, y, y])
if os.path.exists(DATA_DIR):
    train_gen_wrapper = multi_output_generator(train_generator)
    val_gen_wrapper = multi_output_generator(validation_generator)
else:
    train_gen_wrapper = train_generator
    val_gen_wrapper = validation_generator
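The list form `[y, y, y]` is matched to the model's outputs by position, so it silently depends on the order of `outputs=[output, aux1, aux2]`. Keras's `fit` also accepts targets as a dict keyed by output layer name. The variant below is a sketch of that alternative; `dict_output_generator` and `toy_source` are hypothetical helpers, and if you adopt it, it is safest to also pass `loss` and `loss_weights` to `compile` as dicts with the same keys.

```python
import numpy as np

def dict_output_generator(generator, output_names=('main_out', 'aux1', 'aux2')):
    """Yield (X, {output_name: y}) instead of (X, [y, y, y]).
    The default names match the final Dense layer names used in inception_v1."""
    while True:
        X, y = next(generator)
        # The same one-hot labels are used for every output head
        yield X, {name: y for name in output_names}

def toy_source(batch_size=2, num_classes=3):
    """Stand-in for a DirectoryIterator: endless (X, one-hot y) batches."""
    rng = np.random.default_rng(0)
    while True:
        X = rng.random((batch_size, 56, 56, 3))
        y = np.eye(num_classes)[rng.integers(0, num_classes, batch_size)]
        yield X, y

X, targets = next(dict_output_generator(toy_source()))
print(sorted(targets))  # ['aux1', 'aux2', 'main_out']
```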

## 5️⃣ Implementing InceptionNet (GoogLeNet)

We will build this complex architecture from the ground up. 

### 5.1 The Inception Block
This function defines the parallel branches:
1.  $1\times1$ Conv
2.  $1\times1$ Reduce -> $3\times3$ Conv
3.  $1\times1$ Reduce -> $5\times5$ Conv
4.  $3\times3$ MaxPool -> $1\times1$ Conv

Finally, it concatenates them along the channel axis.

In [None]:
def inception_block(x, filters):
    """
    Creates an Inception block.
    Args:
        x: Input tensor
        filters: List of filter counts [f_1x1, f_3x3_r, f_3x3, f_5x5_r, f_5x5, f_pool]
    """
    f_1x1, f_3x3_r, f_3x3, f_5x5_r, f_5x5, f_pool = filters

    # Branch 1: 1x1 Conv
    path1 = Conv2D(f_1x1, (1,1), padding='same', activation='relu')(x)

    # Branch 2: 1x1 Reduce -> 3x3 Conv
    path2 = Conv2D(f_3x3_r, (1,1), padding='same', activation='relu')(x)
    path2 = Conv2D(f_3x3, (3,3), padding='same', activation='relu')(path2)

    # Branch 3: 1x1 Reduce -> 5x5 Conv
    path3 = Conv2D(f_5x5_r, (1,1), padding='same', activation='relu')(x)
    path3 = Conv2D(f_5x5, (5,5), padding='same', activation='relu')(path3)

    # Branch 4: MaxPool -> 1x1 Conv
    path4 = MaxPool2D((3,3), strides=(1,1), padding='same')(x)
    path4 = Conv2D(f_pool, (1,1), padding='same', activation='relu')(path4)

    # Concatenate filters
    return Concatenate(axis=-1)([path1, path2, path3, path4])
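Because the branches are concatenated, a block's output depth is `f_1x1 + f_3x3 + f_5x5 + f_pool`, and the next block's input depth follows from it. The hypothetical helper below computes both the depth and the parameter count (weights plus one bias per filter, matching `Conv2D`'s default `use_bias=True`) for `inception_block`, chained through blocks 3a and 3b of the model built later.

```python
def inception_block_stats(in_channels, filters):
    """Output depth and parameter count of inception_block.
    filters = [f_1x1, f_3x3_r, f_3x3, f_5x5_r, f_5x5, f_pool]."""
    f_1x1, f_3x3_r, f_3x3, f_5x5_r, f_5x5, f_pool = filters

    def conv(k, c_in, c_out):
        return k * k * c_in * c_out + c_out  # kernel weights + one bias per filter

    params = (conv(1, in_channels, f_1x1)
              + conv(1, in_channels, f_3x3_r) + conv(3, f_3x3_r, f_3x3)
              + conv(1, in_channels, f_5x5_r) + conv(5, f_5x5_r, f_5x5)
              + conv(1, in_channels, f_pool))  # the max-pool layer itself has no parameters
    out_channels = f_1x1 + f_3x3 + f_5x5 + f_pool  # concatenation stacks branch outputs
    return out_channels, params

channels = 192  # depth of the stem output that feeds block 3a
for name, f in [('3a', [64, 96, 128, 16, 32, 32]), ('3b', [128, 128, 192, 32, 96, 64])]:
    channels, params = inception_block_stats(channels, f)
    print(name, channels, params)
# 3a 256 163696
# 3b 480 388736
```

Most of each block's parameters sit in the 3×3 branch, which is why its 1×1 reduction matters as much as the 5×5 one.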

### 5.2 The Auxiliary Classifier
This small sub-network branches off from the middle of the main network to perform classification. It helps push gradients to the earlier layers.

In [None]:
def auxiliary_classifier(x, num_classes, name=None):
    x = AvgPool2D((5,5), strides=(3,3))(x)
    x = Conv2D(128, (1,1), padding='same', activation='relu')(x)
    x = Flatten()(x)
    x = Dense(1024, activation='relu')(x)
    x = Dropout(0.7)(x)
    x = Dense(num_classes, activation='softmax', name=name)(x)
    return x
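`AvgPool2D` uses `'valid'` padding by default, so the pooled size depends on the feature map the head is attached to. A quick check with our own helper `pool_output_size` shows that the original heads (on 14×14 maps) pool down to 4×4, whereas in our 56×56 variant (7×7 maps) they pool down to 1×1, so each auxiliary head here flattens a 1×1×128 tensor. A feature map smaller than 5×5 would make this layer fail.

```python
def pool_output_size(n, pool, stride, padding='valid'):
    """Spatial output size of a pooling layer along one axis."""
    if padding == 'same':
        return -(-n // stride)          # ceil(n / stride)
    return (n - pool) // stride + 1     # 'valid': only full windows count

print(pool_output_size(14, 5, 3))  # 4  (original GoogLeNet aux heads)
print(pool_output_size(7, 5, 3))   # 1  (our 56x56 variant)
```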

### 5.3 The Full Architecture
We assemble the pieces: Stem -> Inception Blocks -> Aux Outputs -> Classifier.

*Note: We adapt the stem (initial layers) to handle $56\times56$ inputs instead of the original $224\times224$ inputs by removing one stage of aggressive downsampling: the first $7\times7$ convolution uses stride 1 instead of the original stride 2.*
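To see what this change does to the feature-map sizes, the sketch below traces the spatial side length at each Inception stage. It assumes every layer uses `'same'` padding, so each stride-2 layer outputs `ceil(n / 2)`; `stage_sizes` is our own helper. Comparing against the original configuration (224×224 input, stride-2 first convolution) shows that our final stage runs on 4×4 maps rather than 7×7.

```python
import math

def stage_sizes(input_size, stem_stride):
    """Feature-map side length entering each Inception stage ('same' padding)."""
    n = math.ceil(input_size / stem_stride)  # 7x7 stem convolution
    n = math.ceil(n / 2)                     # first 3x3 max pool, stride 2
    n = math.ceil(n / 2)                     # second max pool (after the 1x1 and 3x3 convs)
    sizes = {'inception_3': n}
    n = math.ceil(n / 2)                     # max pool between stages 3 and 4
    sizes['inception_4'] = n
    n = math.ceil(n / 2)                     # max pool between stages 4 and 5
    sizes['inception_5'] = n
    return sizes

print(stage_sizes(224, 2))  # {'inception_3': 28, 'inception_4': 14, 'inception_5': 7}
print(stage_sizes(56, 1))   # {'inception_3': 14, 'inception_4': 7, 'inception_5': 4}
```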

In [None]:
def inception_v1(input_shape, num_classes):
    input_layer = Input(shape=input_shape)

    # --- STEM ---
    # Modified for 56x56 input: Less striding
    x = Conv2D(64, (7,7), strides=(1,1), padding='same', activation='relu')(input_layer)
    x = MaxPool2D((3,3), strides=(2,2), padding='same')(x)
    
    x = Conv2D(64, (1,1), padding='same', activation='relu')(x)
    x = Conv2D(192, (3,3), padding='same', activation='relu')(x)
    x = MaxPool2D((3,3), strides=(2,2), padding='same')(x)

    # --- Inception Blocks (3a, 3b) ---
    x = inception_block(x, [64, 96, 128, 16, 32, 32])
    x = inception_block(x, [128, 128, 192, 32, 96, 64])
    x = MaxPool2D((3,3), strides=(2,2), padding='same')(x)

    # --- Inception Blocks (4a - 4e) ---
    x = inception_block(x, [192, 96, 208, 16, 48, 64])
    
    # Auxiliary Output 1
    aux1 = auxiliary_classifier(x, num_classes, name='aux1')
    
    x = inception_block(x, [160, 112, 224, 24, 64, 64])
    x = inception_block(x, [128, 128, 256, 24, 64, 64])
    x = inception_block(x, [112, 144, 288, 32, 64, 64])
    
    # Auxiliary Output 2
    aux2 = auxiliary_classifier(x, num_classes, name='aux2')
    
    x = inception_block(x, [256, 160, 320, 32, 128, 128])
    x = MaxPool2D((3,3), strides=(2,2), padding='same')(x)

    # --- Inception Blocks (5a, 5b) ---
    x = inception_block(x, [256, 160, 320, 32, 128, 128])
    x = inception_block(x, [384, 192, 384, 48, 128, 128])
    
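    # --- Final Classifier ---
    # Average over whatever spatial size remains (4x4 here for a 56x56 input,
    # 7x7 in the original), then classify
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = Dropout(0.4)(x)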
    output = Dense(num_classes, activation='softmax', name='main_out')(x)
    
    model = Model(inputs=input_layer, outputs=[output, aux1, aux2])
    return model

K.clear_session()
model = inception_v1((56, 56, 3), 200)
model.summary()

## 6️⃣ Training

We compile the model with one categorical cross-entropy loss per output, weighting the main loss at 1.0 and each auxiliary loss at 0.3, as in the total loss from Section 2.4.

In [None]:
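model.compile(
    loss=['categorical_crossentropy', 'categorical_crossentropy', 'categorical_crossentropy'],
    loss_weights=[1.0, 0.3, 0.3],  # order matches outputs=[output, aux1, aux2]
    optimizer='adam',
    # One accuracy metric per named output; logged as main_out_accuracy, aux1_accuracy, ...
    metrics={'main_out': 'accuracy', 'aux1': 'accuracy', 'aux2': 'accuracy'}
)

# Create directory for models
os.makedirs('models', exist_ok=True)

callbacks = [
    ModelCheckpoint('models/inception_v1_best.h5', save_best_only=True,
                    monitor='val_main_out_accuracy', mode='max'),
    EarlyStopping(monitor='val_main_out_accuracy', mode='max',
                  patience=5, restore_best_weights=True),
    CSVLogger('training_log.csv')
]

# Training steps
# With real data, one epoch covers every image in each subset:
# len() of a DirectoryIterator is ceil(samples / batch_size).
# With mock data, a fixed number of steps keeps each epoch short.
if os.path.exists(DATA_DIR):
    steps_per_epoch = len(train_generator)
    validation_steps = len(validation_generator)
else:
    steps_per_epoch = 10
    validation_steps = 10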

history = model.fit(
    train_gen_wrapper,
    validation_data=val_gen_wrapper,
    epochs=5,
    steps_per_epoch=steps_per_epoch,
    validation_steps=validation_steps,
    callbacks=callbacks
)
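For reference, here is the arithmetic behind the real-data step counts. The sketch assumes Keras's `validation_split` works per class folder, holding out the first `int(split * n)` files of each class (in sorted order) for the `'validation'` subset; `split_counts` is our own helper. It also shows why a hard-coded `50000 // BATCH_SIZE` (a CIFAR-sized figure) would cover only about 55% of the 90,000 training images each epoch.

```python
import math

def split_counts(images_per_class, num_classes, validation_split):
    """Training/validation subset sizes when each class folder is split separately."""
    n_val = int(validation_split * images_per_class)
    return (images_per_class - n_val) * num_classes, n_val * num_classes

train_n, val_n = split_counts(500, 200, 0.1)
batch_size = 64
print(train_n, val_n)                                                  # 90000 10000
print(math.ceil(train_n / batch_size), math.ceil(val_n / batch_size))  # 1407 157
print(50000 // batch_size)                                             # 781
```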

## 7️⃣ Evaluation and Analysis

Let's visualize the training progress. Since we have multiple outputs, Keras returns metrics for all of them. We focus on the `main_out_accuracy`.

In [None]:
def plot_history(history):
    plt.figure(figsize=(10, 5))
    
    # Plot Main Output Accuracy
    plt.plot(history.history['main_out_accuracy'], label='Train Acc (Main)')
    plt.plot(history.history['val_main_out_accuracy'], label='Val Acc (Main)')
    
    plt.title('Model Accuracy')
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend()
    plt.grid(True)
    plt.show()

plot_history(history)
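ImageNet-style results are commonly reported as top-1 and top-5 accuracy. Keras provides `tf.keras.metrics.TopKCategoricalAccuracy` for tracking this during training; the NumPy sketch below computes it after the fact. `top_k_accuracy` is our own helper. For example, `top_k_accuracy(y_batch, model.predict(X_batch)[0])` would score only the main head, since `predict` returns the outputs in the order `[main_out, aux1, aux2]` and the auxiliary heads are not used at inference.

```python
import numpy as np

def top_k_accuracy(y_true_onehot, y_prob, k=5):
    """Fraction of samples whose true class is among the k highest-scoring classes."""
    true_idx = np.argmax(y_true_onehot, axis=1)
    top_k = np.argsort(y_prob, axis=1)[:, -k:]  # indices of the k largest probabilities
    return float(np.mean(np.any(top_k == true_idx[:, None], axis=1)))
```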

## 8️⃣ Chapter Summary

In this chapter, we tackled a complex image classification task using InceptionNet.

* **Data Matters:** We performed EDA to map Tiny ImageNet's WordNet IDs to readable class names and to inspect sample images.
* **Advanced Architectures:** We broke down the Inception module, understanding how parallel convolutions allow the network to capture features at multiple scales simultaneously.
* **Efficiency:** We saw how $1\times1$ convolutions act as bottlenecks to reduce parameter count, allowing us to build deeper networks without exploding computational costs.
* **Training Stability:** We implemented Auxiliary Classifiers to combat the vanishing gradient problem in deep networks.
* **Keras Mastery:** We used the Functional API to build a multi-branch, multi-output architecture, and wrapped our data generators to feed one copy of the labels to each output.

In the next chapter, we will learn how to make these models perform even better using **Transfer Learning** and how to visualize what they are actually seeing using **Grad-CAM**.