# Chapter 6: Teaching machines to see: Image classification with CNNs

This notebook reproduces the code and summarizes the theoretical concepts from Chapter 6 of *TensorFlow in Action* by Thushan Ganegedara.

This chapter covers the end-to-end process of building a sophisticated image classifier. The key steps include:
1.  **Exploratory Data Analysis (EDA)**: Understanding our image dataset's structure, classes, and potential issues.
2.  **Data Pipelines**: Using the `ImageDataGenerator` to efficiently load and augment images from directories.
3.  **Advanced Model Implementation**: Building a complex, state-of-the-art CNN (Inception net v1) using the Keras Functional API.
4.  **Training and Evaluation**: Training the model and evaluating its performance on a test set.

---

## 6.1 Putting the data under the microscope: Exploratory Data Analysis (EDA)

Before building any model, we must first understand our data. EDA helps us answer critical questions:
* What classes are we working with?
* Is the dataset balanced? (i.e., equal number of images per class)
* What are the image properties (e.g., size, color channels)?

We will be using the **tiny-imagenet-200** dataset, a smaller version of the famous ImageNet dataset. It contains 200 different classes of objects.

In [None]:
import os
import requests
import zipfile
import pandas as pd
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# Define a random seed for reproducibility
random_seed = 4321

# 1. Download and Extract the Dataset
data_dir = 'data'
zip_path = os.path.join(data_dir, 'tiny-imagenet-200.zip')
extract_path = os.path.join(data_dir, 'tiny-imagenet-200')

if not os.path.exists(extract_path):
    if not os.path.exists(zip_path):
        print("Downloading tiny-imagenet-200.zip (238 MB)...")
        url = "http://cs231n.stanford.edu/tiny-imagenet-200.zip"
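        os.makedirs(data_dir, exist_ok=True)
        # Stream the download to disk in chunks instead of holding the whole file in memory
        with requests.get(url, stream=True, timeout=60) as r:
            r.raise_for_status()  # Fail early on HTTP errors instead of saving an error page
            with open(zip_path, 'wb') as f:
                for chunk in r.iter_content(chunk_size=1 << 20):
                    f.write(chunk)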
        print("Download complete.")
    else:
        print("Zip file already exists.")
    
    print("Extracting data...")
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        zip_ref.extractall(data_dir)
    print("Extraction complete.")
else:
    print("Data already downloaded and extracted.")

### 6.1.1 & 6.1.2 Understanding the Data Structure and Classes

The dataset uses **WordNet IDs (wnids)** to label its classes. We need to map these IDs to human-readable names using the provided `words.txt` file. We will also count the number of training images for each class to check for balance.

In [None]:
data_dir = os.path.join('data', 'tiny-imagenet-200')
wnids_path = os.path.join(data_dir, 'wnids.txt')
words_path = os.path.join(data_dir, 'words.txt')

def get_tiny_imagenet_classes(wnids_path, words_path):
    """Reads wnids.txt and words.txt to create a mapping from class ID to class name."""
    # Read the list of 200 wnids used in the dataset
    with open(wnids_path, 'r') as f:
        wnids = [x.strip() for x in f]
    
    # Read the full mapping of all wnids to names
    words = pd.read_csv(words_path, sep='\t', index_col=0, header=None, names=['wnid', 'class'])
    
    # Filter the full mapping to only include the 200 classes we care about
    words_200 = words.loc[wnids].reset_index()
    return words_200

labels_df = get_tiny_imagenet_classes(wnids_path, words_path)
print("Class ID to Name Mapping (First 5):")
print(labels_df.head())

def get_image_count(data_dir):
    """Counts the number of JPEG files in a given folder."""
    if not os.path.exists(data_dir):
        return 0
    return len([f for f in os.listdir(data_dir) if f.lower().endswith('jpeg')])

# Apply the count function to each class's training images folder
train_image_dir = os.path.join(data_dir, 'train')
labels_df["n_train"] = labels_df["wnid"].apply(
    lambda x: get_image_count(os.path.join(train_image_dir, x, 'images'))
)

print("\nTraining Image Count Statistics:")
print(labels_df["n_train"].describe())

The description shows `count    200.0`, `mean    500.0`, `std    0.0`. This is excellent: it confirms the training set is **perfectly balanced**, with exactly 500 images for each of the 200 classes.
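
The same balance check can be written as an explicit test instead of being read off `describe()` by eye. The snippet below is a minimal sketch; the `is_balanced` helper and the toy counts are hypothetical and not part of the book's code. It accepts any iterable of per-class counts, so a column such as `labels_df["n_train"]` should work as well.

```python
def is_balanced(counts):
    """Return True when every class has the same number of images."""
    return len(set(counts)) == 1

# Hypothetical per-class image counts, used only for illustration
balanced_counts = [500, 500, 500, 500]
skewed_counts = [500, 480, 500, 20]

print(is_balanced(balanced_counts))
print(is_balanced(skewed_counts))
```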

### 6.1.3 Computing Simple Statistics on the Data Set

Next, we check the dimensions of the images. CNNs require a fixed input size, so we need to know if our images are already uniform or if they will need resizing.

In [None]:
image_sizes = []
# We only check the first 25 classes for speed
for wnid in labels_df["wnid"].iloc[:25]:
    img_dir = os.path.join(train_image_dir, wnid, 'images')
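    for f in os.listdir(img_dir):
        if f.lower().endswith('jpeg'):
            # Use a context manager so each file handle is closed after reading the size
            with Image.open(os.path.join(img_dir, f)) as img:
                image_sizes.append(img.size)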

img_df = pd.DataFrame.from_records(image_sizes)
img_df.columns = ["width", "height"]

print("Image Dimension Statistics:")
print(img_df.describe())

The statistics show that every sampled image (from the first 25 classes) is **64x64**, which strongly suggests the whole dataset is uniform. This is also great, as it means we won't have to handle variable image sizes.
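
A quick way to confirm uniformity is to count the distinct `(width, height)` pairs rather than inspect summary statistics. This is a small sketch on made-up sizes; `summarize_sizes` and `toy_sizes` are hypothetical names introduced here for illustration.

```python
from collections import Counter

def summarize_sizes(sizes):
    """Count how often each (width, height) pair occurs."""
    return Counter(sizes)

# Hypothetical sizes; one image deliberately differs from the rest
toy_sizes = [(64, 64), (64, 64), (64, 64), (32, 64)]
size_counts = summarize_sizes(toy_sizes)

print(size_counts.most_common())
print("uniform:", len(size_counts) == 1)
```

If the sizes are uniform, the counter holds a single key.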

---

## 6.2 Creating data pipelines using the Keras ImageDataGenerator

Rather than loading all 100,000 training images into memory at once, we use `ImageDataGenerator`. This Keras utility reads images from disk in batches, preprocesses them (e.g., normalization), and can augment them on the fly (e.g., rotation or zoom). Batching keeps memory use low, and augmentation helps reduce overfitting. In this chapter we use only per-image mean-centering; augmentation is covered in Chapter 7.
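
To make the preprocessing step concrete, the NumPy sketch below mimics what `samplewise_center=True` is understood to do: subtract each image's own mean, taken over all of its pixels and channels. The `samplewise_center` function and the random `toy_batch` are stand-ins written for this illustration, not Keras code.

```python
import numpy as np

def samplewise_center(batch):
    """Subtract each image's own mean (over height, width and channels)."""
    batch = batch.astype(np.float32)
    per_image_mean = batch.mean(axis=(1, 2, 3), keepdims=True)
    return batch - per_image_mean

rng = np.random.default_rng(0)
# Hypothetical batch of 4 RGB images with pixel values in [0, 255]
toy_batch = rng.integers(0, 256, size=(4, 56, 56, 3))
centered = samplewise_center(toy_batch)

print(centered.shape)
# Largest absolute per-image mean after centering; should be close to zero
print(np.abs(centered.mean(axis=(1, 2, 3))).max())
```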

The model we will build (Inception) has 3 outputs (1 main, 2 auxiliary). Therefore, our generator must be wrapped to output a tuple of `(x, (y, y, y))`.
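
The label-repeating idea can be tried in isolation with a stand-in generator, with no dataset or TensorFlow needed. In this sketch, `repeat_labels` and `toy_batches` are hypothetical names, and the toy batches only imitate the `(x, y)` pairs a Keras iterator would produce.

```python
import numpy as np

def repeat_labels(gen, n_outputs=3):
    """Wrap an (x, y) generator so it yields (x, (y, y, ..., y))."""
    for x, y in gen:
        yield x, tuple(y for _ in range(n_outputs))

def toy_batches(n_batches=2, batch_size=4, n_classes=5):
    """Hypothetical stand-in for a Keras directory iterator."""
    for _ in range(n_batches):
        x = np.zeros((batch_size, 56, 56, 3), dtype=np.float32)
        # One-hot labels built by indexing rows of an identity matrix
        y = np.eye(n_classes, dtype=np.float32)[np.arange(batch_size) % n_classes]
        yield x, y

x, ys = next(repeat_labels(toy_batches()))
print(x.shape, len(ys), ys[0].shape)
```

The same label array is reused for every output, so the wrapper adds no copies.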

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from functools import partial

batch_size = 128
target_size = (56, 56)  # Resize the 64x64 images to 56x56 for the Inception model

# 1. Define the generator for Training and Validation
# We split 10% of the training data off for validation
image_gen = ImageDataGenerator(
    samplewise_center=True, # Normalizes by subtracting the image's mean pixel value
    validation_split=0.1
)

# 2. Use 'partial' to create a base generator function with common args
partial_flow_func = partial(
    image_gen.flow_from_directory,
    directory=train_image_dir,
    target_size=target_size,
    classes=None, # Infers classes from subdirectory names
    class_mode='categorical', # Returns one-hot encoded labels
    batch_size=batch_size,
    shuffle=True,
    seed=random_seed
)

# 3. Create the training and validation generators
train_gen = partial_flow_func(subset='training')
valid_gen = partial_flow_func(subset='validation')

# 4. Define the generator for Test data (from the 'val' folder)
val_dir = os.path.join(data_dir, 'val')
val_ann_path = os.path.join(val_dir, 'val_annotations.txt')

def get_test_labels_df(test_labels_path):
    test_df = pd.read_csv(test_labels_path, sep='\t', index_col=None, header=None)
    test_df = test_df.iloc[:, [0, 1]].rename({0: "filename", 1: "class"}, axis=1)
    return test_df

test_df = get_test_labels_df(val_ann_path)

# Use a separate generator for test data (no validation split)
image_gen_test = ImageDataGenerator(samplewise_center=True)

test_gen = image_gen_test.flow_from_dataframe(
    dataframe=test_df,
    directory=os.path.join(val_dir, 'images'),
    x_col='filename',
    y_col='class',
    target_size=target_size,
    class_mode='categorical',
    batch_size=batch_size,
    shuffle=False # No need to shuffle test data
)

# 5. Define the auxiliary wrapper for 3 outputs
def data_gen_aux(gen):
    for x, y in gen:
        yield x, (y, y, y) # Return the label 3 times

train_gen_aux = data_gen_aux(train_gen)
valid_gen_aux = data_gen_aux(valid_gen)
test_gen_aux = data_gen_aux(test_gen)

print(f"Created {len(train_gen)} training batches.")
print(f"Created {len(valid_gen)} validation batches.")
print(f"Created {len(test_gen)} test batches.")

# Check output shape
x_sample, (y_sample1, y_sample2, y_sample3) = next(train_gen_aux)
print(f"Sample X shape: {x_sample.shape}")
print(f"Sample Y1 shape: {y_sample1.shape}")

---

## 6.3 Inception net: Implementing a state-of-the-art image classifier

Now we build the **Inception net v1 (GoogLeNet)** model. This model's architecture is more complex than a simple sequential CNN. We will use the Keras Functional API to build it.

**Key Components:**
1.  **Stem**: The first few layers of standard convolution and pooling to reduce the initial dimensions.
2.  **Inception Block**: The core idea. This block runs multiple convolutions (1x1, 3x3, 5x5) and a pooling operation in parallel and concatenates their outputs. This allows the network to capture features at multiple scales simultaneously.
3.  **1x1 Convolutions**: Used *before* the 3x3 and 5x5 convolutions as a dimensionality reduction "bottleneck" to reduce the number of parameters and computations.
4.  **Auxiliary Outputs**: Two extra "mini-classifiers" added to intermediate layers. During training, their loss is added to the main loss. This helps combat the vanishing gradient problem in very deep networks and provides extra regularization.
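
The saving from the 1x1 bottleneck can be checked with simple arithmetic. The sketch below compares a direct 5x5 convolution with a 1x1 reduction followed by a 5x5 convolution, using the channel widths of the first Inception block in this notebook (192 input channels, 16 reduction filters, 32 output filters). `conv_params` is a hypothetical helper that counts weights plus biases.

```python
def conv_params(kernel, in_ch, out_ch):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return kernel * kernel * in_ch * out_ch + out_ch

in_ch = 192                  # channels entering the first Inception block
reduce_ch, out_ch = 16, 32   # 1x1 reduction width and 5x5 output width

# 5x5 convolution applied directly to the 192-channel input
direct = conv_params(5, in_ch, out_ch)
# 1x1 reduction to 16 channels, then the 5x5 convolution
bottleneck = conv_params(1, in_ch, reduce_ch) + conv_params(5, reduce_ch, out_ch)

print(direct, bottleneck, round(direct / bottleneck, 2))
```

Under these assumptions the bottleneck branch needs roughly a tenth of the parameters of the direct convolution.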

In [None]:
import tensorflow as tf
from tensorflow.keras import layers, models, Model
import tensorflow.keras.backend as K

# Define the Stem (initial layers)
def stem(inp):
    conv1 = layers.Conv2D(64, (7,7), strides=(1,1), activation='relu', padding='same')(inp) 
    maxpool2 = layers.MaxPool2D((3,3), strides=(2,2), padding='same')(conv1) 
    # Local Response Normalization (LRN) - less common now, often replaced by BatchNormalization
    lrn3 = layers.Lambda(lambda x: tf.nn.local_response_normalization(x), name='lrn3')(maxpool2) 
    conv4 = layers.Conv2D(64, (1,1), strides=(1,1), padding='same', activation='relu')(lrn3) 
    conv5 = layers.Conv2D(192, (3,3), strides=(1,1), activation='relu', padding='same')(conv4) 
    lrn6 = layers.Lambda(lambda x: tf.nn.local_response_normalization(x), name='lrn6')(conv5) 
    maxpool7 = layers.MaxPool2D((3,3), strides=(1,1), padding='same')(lrn6) 
    return maxpool7

# Define the Inception Block
def inception(inp, n_filters):
    # n_filters is a list: [(1x1), (1x1_reduce, 3x3), (1x1_reduce, 5x5), (pool_proj)]
    
    # Branch 1: 1x1 convolution
    out1 = layers.Conv2D(n_filters[0][0], (1,1), strides=(1,1), activation='relu', padding='same')(inp)
    
    # Branch 2: 1x1 conv -> 3x3 conv
    out2_1 = layers.Conv2D(n_filters[1][0], (1,1), strides=(1,1), activation='relu', padding='same')(inp)
    out2_2 = layers.Conv2D(n_filters[1][1], (3,3), strides=(1,1), activation='relu', padding='same')(out2_1)
    
    # Branch 3: 1x1 conv -> 5x5 conv
    out3_1 = layers.Conv2D(n_filters[2][0], (1,1), strides=(1,1), activation='relu', padding='same')(inp)
    out3_2 = layers.Conv2D(n_filters[2][1], (5,5), strides=(1,1), activation='relu', padding='same')(out3_1)
    
    # Branch 4: 3x3 pool -> 1x1 conv
    out4_1 = layers.MaxPool2D((3,3), strides=(1,1), padding='same')(inp)
    out4_2 = layers.Conv2D(n_filters[3][0], (1,1), strides=(1,1), activation='relu', padding='same')(out4_1)
    
    # Concatenate all branches along the channel axis
    out = layers.Concatenate(axis=-1)([out1, out2_2, out3_2, out4_2])
    return out

# Define the Auxiliary Output classifier
def aux_out(inp, name=None, num_classes=200):
    """Auxiliary classifier head; num_classes should match the main output."""
    avgpool1 = layers.AvgPool2D((5,5), strides=(3,3), padding='valid')(inp)
    conv1 = layers.Conv2D(128, (1,1), activation='relu', padding='same')(avgpool1)
    flat = layers.Flatten()(conv1)
    dense1 = layers.Dense(1024, activation='relu')(flat)
    # Use a distinct local name so the function name is not shadowed
    out = layers.Dense(num_classes, activation='softmax', name=name)(dense1)
    return out

# Define the full Inception v1 model
def inception_v1(input_shape=(56, 56, 3), num_classes=200):
    K.clear_session()
    
    inp = layers.Input(shape=input_shape)
    
    # Stem
    stem_out = stem(inp)
    
    # Inception blocks
    inc_3a = inception(stem_out, [(64,),(96,128),(16,32),(32,)])
    inc_3b = inception(inc_3a, [(128,),(128,192),(32,96),(64,)])
    
    maxpool_3 = layers.MaxPool2D((3,3), strides=(2,2), padding='same')(inc_3b)
    
    inc_4a = inception(maxpool_3, [(192,),(96,208),(16,48),(64,)])
    inc_4b = inception(inc_4a, [(160,),(112,224),(24,64),(64,)])
    inc_4c = inception(inc_4b, [(128,),(128,256),(24,64),(64,)])
    inc_4d = inception(inc_4c, [(112,),(144,288),(32,64),(64,)])
    inc_4e = inception(inc_4d, [(256,),(160,320),(32,128),(128,)])
    
    maxpool_4 = layers.MaxPool2D((3,3), strides=(2,2), padding='same')(inc_4e)
    
    inc_5a = inception(maxpool_4, [(256,),(160,320),(32,128),(128,)])
    inc_5b = inception(inc_5a, [(384,),(192,384),(48,128),(128,)])
    
    # --- Classifiers ---
    
    # Auxiliary Output 1 (from 4a)
    aux_output_1 = aux_out(inc_4a, name='aux1')
    
    # Auxiliary Output 2 (from 4d)
    aux_output_2 = aux_out(inc_4d, name='aux2')
    
    # Main Output (from 5b)
    avgpool_final = layers.AvgPool2D((7,7), strides=(1,1), padding='valid')(inc_5b)
    flat_out = layers.Flatten()(avgpool_final) 
    main_output = layers.Dense(num_classes, activation='softmax', name='final')(flat_out) 
    
    # Create the model
    model = models.Model(inputs=inp, outputs=[main_output, aux_output_1, aux_output_2])
    
    # Compile the model
    # The same loss is applied to each of the 3 outputs; loss_weights scales their contributions.
    model.compile(loss='categorical_crossentropy', 
                  loss_weights={'final': 1.0, 'aux1': 0.3, 'aux2': 0.3}, # As per the paper
                  optimizer='adam', 
                  metrics=['accuracy'])
    return model

model = inception_v1()
model.summary()
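
As a cross-check on the summary, the spatial sizes can be traced by hand from the strides and padding modes used in the model code. This is a rough sketch: `same_out` and `valid_out` are hypothetical helpers implementing the usual output-size rules for `'same'` and `'valid'` padding, and only layers with a stride above 1 or `'valid'` padding change the size.

```python
import math

def same_out(size, stride):
    """Output size of a layer with padding='same'."""
    return math.ceil(size / stride)

def valid_out(size, kernel, stride):
    """Output size of a layer with padding='valid'."""
    return (size - kernel) // stride + 1

size = 56                        # input height/width
size = same_out(size, 1)         # conv1, stride 1
size = same_out(size, 2)         # maxpool2, stride 2
size = same_out(size, 1)         # maxpool7, stride 1 (Inception 3a-3b run at this size)
after_3 = same_out(size, 2)      # maxpool_3 (Inception 4a-4e run at this size)
after_4 = same_out(after_3, 2)   # maxpool_4 (Inception 5a-5b run at this size)

aux_pooled = valid_out(after_3, 5, 3)     # 5x5 average pool, stride 3, in aux heads
final_pooled = valid_out(after_4, 7, 1)   # 7x7 average pool before the main classifier

print(size, after_3, after_4, aux_pooled, final_pooled)
```

The final 7x7 average pool exactly covers the last feature map, which is why a 56x56 input suits this configuration.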

---

## 6.4 Training the model and evaluating performance

With the data generators and the complex model defined, we can now start the training process.

In [None]:
from tensorflow.keras.callbacks import CSVLogger
from tensorflow.keras.models import load_model
import time

def get_steps_per_epoch(n_data, batch_size):
    """Number of batches needed to cover n_data samples (ceiling division)."""
    return (n_data + batch_size - 1) // batch_size

# Create directories for logs and models
os.makedirs('eval', exist_ok=True)
os.makedirs('models', exist_ok=True)

csv_logger = CSVLogger(os.path.join('eval', '1_eval_base.log'))

n_train = len(train_gen.filenames)
n_valid = len(valid_gen.filenames)
n_test = len(test_gen.filenames)

train_steps = get_steps_per_epoch(n_train, batch_size)
valid_steps = get_steps_per_epoch(n_valid, batch_size)
test_steps = get_steps_per_epoch(n_test, batch_size)

# The book trains for 50 epochs. We run only 5 here to keep the runtime short.
epochs_to_run = 5 
print(f"Starting training for {epochs_to_run} epochs (Book uses 50)...")

history = model.fit(
    x=train_gen_aux,
    validation_data=valid_gen_aux,
    steps_per_epoch=train_steps,
    validation_steps=valid_steps,
    epochs=epochs_to_run, 
    callbacks=[csv_logger]
)

print("Training complete.")

# Save the trained model
model_path = os.path.join('models', 'inception_v1_base.h5')
model.save(model_path)
print(f"Model saved to {model_path}")
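
The step counts passed to `fit` come from ceiling division of the sample count by the batch size. The sketch below works through the arithmetic, assuming the 90/10 split yields 90,000 training and 10,000 validation images and that the test set holds 10,000 images. These counts are assumptions for illustration; the notebook reads the actual values from the generators.

```python
import math

def steps_per_epoch(n_data, batch_size):
    """Batches needed so that every sample is seen once per epoch."""
    return math.ceil(n_data / batch_size)

batch_size = 128
# Assumed sample counts, for illustration only
assumed_counts = {"train": 90_000, "valid": 10_000, "test": 10_000}

steps = {name: steps_per_epoch(n, batch_size) for name, n in assumed_counts.items()}
print(steps)
```

The last batch of each epoch is smaller than `batch_size` whenever the count is not an exact multiple.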

In [None]:
# Load the model back and evaluate on the test set
print("Evaluating model on test set...")
loaded_model = load_model(model_path)

# Note: Metric names are prefixed with the output layer names we set (e.g., 'final_accuracy').
test_res = loaded_model.evaluate(test_gen_aux, steps=test_steps)
test_res_dict = dict(zip(loaded_model.metrics_names, test_res))

print("Test Results:")
print(test_res_dict)
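
The overall `loss` in these results is the weighted sum of the three per-output losses, using the `loss_weights` given at compile time. The NumPy sketch below reproduces that combination on made-up predictions. The `categorical_crossentropy` function is a simplified stand-in for the Keras loss, and the prediction values are hypothetical.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy over a batch of one-hot labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

# Two samples, three classes; values chosen only for illustration
y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype=np.float64)
preds = {
    "final": np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]),
    "aux1": np.array([[0.5, 0.3, 0.2], [0.3, 0.4, 0.3]]),
    "aux2": np.array([[0.6, 0.2, 0.2], [0.2, 0.6, 0.2]]),
}
loss_weights = {"final": 1.0, "aux1": 0.3, "aux2": 0.3}

per_output = {name: categorical_crossentropy(y_true, p) for name, p in preds.items()}
total = sum(loss_weights[name] * per_output[name] for name in per_output)

print(per_output)
print(total)
```

Because the auxiliary heads carry a weight of 0.3 each, the main output dominates the combined loss.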

*(Book Observation)*: The model in the book achieves high training accuracy (~94%) but low validation/test accuracy (~30%). This is a classic sign of **overfitting**. Chapter 7 will explore techniques like data augmentation, dropout, and using a better-suited architecture (Minception) to combat this.