# 🖼️ Teaching Machines to See: Image Classification with CNNs
**Source: TensorFlow in Action, Chapter 6**

Chapter 6 walks through a complete image classification pipeline: exploratory data analysis (EDA) on the Tiny ImageNet dataset, building a data pipeline with `ImageDataGenerator`, and implementing the **Inception v1** architecture with the Keras Functional API to train a large-scale CNN.


In [1]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from PIL import Image
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers, models


## 🔎 Exploratory Data Analysis (Tiny ImageNet)

**Theory**: EDA matters because it reveals the folder structure, the number of classes, the distribution of samples per class, and image attributes (size, quality, noise) before any model is built.
The **tiny-imagenet-200** dataset has 200 classes with 500 training images each, all 64×64 RGB; class labels are WordNet IDs (`wnid`), with human-readable descriptions in `words.txt`.


In [4]:
import os
import requests
import zipfile

os.makedirs("data", exist_ok=True)
zip_path = os.path.join("data", "tiny-imagenet-200.zip")
data_dir = os.path.join("data", "tiny-imagenet-200")

if not os.path.exists(data_dir):
    if not os.path.exists(zip_path):
        url = "http://cs231n.stanford.edu/tiny-imagenet-200.zip"
        print("Downloading Tiny ImageNet...")
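        # Stream the archive to disk in chunks instead of holding it all in memory
        with requests.get(url, stream=True, timeout=60) as r:
            r.raise_for_status()  # fail early on HTTP errors
            with open(zip_path, "wb") as f:
                for chunk in r.iter_content(chunk_size=1 << 20):
                    f.write(chunk)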
    print("Extracting...")
    with zipfile.ZipFile(zip_path, "r") as zf:
        zf.extractall("data")
else:
    print("Dataset folder already exists:", data_dir)


Downloading Tiny ImageNet...
Extracting...


In [5]:
data_dir = os.path.join("data", "tiny-imagenet-200")
wnids_path = os.path.join(data_dir, "wnids.txt")
words_path = os.path.join(data_dir, "words.txt")

def get_tiny_imagenet_classes(wnids_path, words_path):
    # Read wnids as a one-column DataFrame, then take column 0 as a Series
    wnids_df = pd.read_csv(wnids_path, header=None)
    wnids = wnids_df[0]

    # Read the wnid → description mapping
    words = pd.read_csv(words_path, sep="\t", index_col=0, header=None)

    # Keep only the 200 wnids used by Tiny ImageNet
    words200 = words.loc[wnids].rename({1: "class"}, axis=1)
    words200.index.name = "wnid"
    return words200.reset_index()

labels = get_tiny_imagenet_classes(wnids_path, words_path)
labels.head()


| | wnid | class |
|---|---|---|
| 0 | n02124075 | Egyptian cat |
| 1 | n04067472 | reel |
| 2 | n04540053 | volleyball |
| 3 | n04099969 | rocking chair, rocker |
| 4 | n07749582 | lemon |


In [6]:
def get_image_count(img_dir):
    return len([f for f in os.listdir(img_dir) if f.lower().endswith("jpeg")])

labels["n_train"] = labels["wnid"].apply(
    lambda wid: get_image_count(os.path.join(data_dir, "train", wid, "images"))
)
labels["n_train"].describe()


count    200.0
mean     500.0
std        0.0
min      500.0
25%      500.0
50%      500.0
75%      500.0
max      500.0
Name: n_train, dtype: float64

In [7]:
# Example image-size statistics (use a subset of classes to keep it fast)
image_sizes = []
for wnid in labels["wnid"].iloc[:25]:
    img_dir = os.path.join(data_dir, "train", wnid, "images")
    for fname in os.listdir(img_dir):
        if fname.lower().endswith("jpeg"):
            # Close each file promptly; PIL reads the size from the header
            with Image.open(os.path.join(img_dir, fname)) as img:
                image_sizes.append(img.size)  # (width, height)

imgdf = pd.DataFrame.from_records(image_sizes, columns=["width", "height"])
imgdf.describe()


| | width | height |
|---|---|---|
| count | 12500.0 | 12500.0 |
| mean | 64.0 | 64.0 |
| std | 0.0 | 0.0 |
| min | 64.0 | 64.0 |
| 25% | 64.0 | 64.0 |
| 50% | 64.0 | 64.0 |
| 75% | 64.0 | 64.0 |
| max | 64.0 | 64.0 |


## 🚚 Data Pipeline with ImageDataGenerator

**Theory**: `ImageDataGenerator.flow_from_directory` reads batches of images from a folder organized by class, and handles resizing, normalization, and the train/validation split.

The dataset is divided into:
- Train: used to fit the model parameters.
- Validation: used to evaluate performance after each epoch and detect overfitting.
- Test: used only after training finishes, to estimate generalization.


In [8]:
random_seed = 4321
batch_size = 128
target_size = (56, 56)  # 64→56: 224 = 4 × 56, so the 224×224 Inception design adapts more easily

imagegen = ImageDataGenerator(
    samplewise_center=True,
    validation_split=0.1
)

from functools import partial

partial_flow = partial(
    imagegen.flow_from_directory,
    directory=os.path.join(data_dir, "train"),
    target_size=target_size,
    classes=None,
    class_mode="categorical",
    batch_size=batch_size,
    shuffle=True,
    seed=random_seed,
)

train_gen = partial_flow(subset="training")
valid_gen = partial_flow(subset="validation")


Found 90000 images belonging to 200 classes.
Found 10000 images belonging to 200 classes.
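
The two counts above follow from `validation_split=0.1` being applied inside each class folder. A minimal sketch of the arithmetic, assuming the fraction is taken per class and truncated to an integer (the variable names are illustrative, not part of the notebook):

```python
# Per-class split implied by validation_split=0.1 on Tiny ImageNet's train folder.
num_classes = 200
images_per_class = 500
validation_split = 0.1

# Assumption: the fraction is applied per class folder and truncated to an integer.
val_per_class = int(validation_split * images_per_class)
train_per_class = images_per_class - val_per_class

n_train = num_classes * train_per_class
n_valid = num_classes * val_per_class
print(n_train, n_valid)  # 90000 10000
```

Because the split is per class, both subsets stay perfectly balanced: 450 training and 50 validation images for every class.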


In [9]:
# Build the test generator from val/val_annotations.txt
def get_test_labels_df(ann_path):
    df = pd.read_csv(ann_path, sep="\t", header=None)
    df = df.iloc[:, [0, 1]].rename({0: "filename", 1: "class"}, axis=1)
    return df

test_df = get_test_labels_df(
    os.path.join(data_dir, "val", "val_annotations.txt")
)

test_gen = imagegen.flow_from_dataframe(
    dataframe=test_df,
    directory=os.path.join(data_dir, "val", "images"),
    x_col="filename",
    y_col="class",
    target_size=target_size,
    class_mode="categorical",
    batch_size=batch_size,
    shuffle=False
)


Found 10000 validated image filenames belonging to 200 classes.


## 🧠 Inception v1: Architecture Intuition

**Theory**: Inception v1 (GoogLeNet) is a deep, parameter-efficient CNN with three main components: the **stem**, **Inception blocks**, and **auxiliary classifiers**.

- An Inception block runs several convolutions in parallel (1×1, 3×3, 5×5, plus pooling) and then concatenates the resulting features, so the model can "see" at multiple scales without an explosion in parameters.
- 1×1 convolutions perform **channel dimensionality reduction**, so the 3×3/5×5 convolutions that follow operate on a smaller feature space and use fewer parameters.
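
To make the parameter saving concrete, here is a small worked count. It is a sketch that assumes the channel widths of the first Inception block defined later in this notebook (192 input channels, a 1×1 reduction to 16 channels, then 32 filters of size 5×5); `conv_params` is a helper introduced only for this illustration.

```python
def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a standard 2D convolution with a square kernel."""
    return kernel * kernel * in_channels * filters + filters

in_channels = 192  # channels entering block 3a (output of the stem)

# 5x5 branch without a bottleneck: 5x5 conv applied straight to 192 channels.
direct = conv_params(5, in_channels, 32)

# Same branch with a 1x1 reduction to 16 channels first.
reduced = conv_params(1, in_channels, 16) + conv_params(5, 16, 32)

print(direct, reduced)  # 153632 15920
```

Under these assumptions the bottleneck cuts this one branch to roughly a tenth of its unreduced size, and the same effect repeats in every block.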


In [10]:
from tensorflow.keras.layers import Conv2D, MaxPool2D, Lambda, Input

def stem(inp):
    # conv 7x7
    x = Conv2D(
        64, (7, 7),
        strides=(1, 1),  # adjusted for the smaller input (the paper uses stride 2)
        activation="relu",
        padding="same"
    )(inp)
    x = MaxPool2D((3, 3), strides=(2, 2), padding="same")(x)
    x = Lambda(lambda t: tf.nn.local_response_normalization(t))(x)

    x = Conv2D(64, (1, 1), strides=(1, 1), padding="same")(x)
    x = Conv2D(192, (3, 3), strides=(1, 1), activation="relu", padding="same")(x)
    x = Lambda(lambda t: tf.nn.local_response_normalization(t))(x)
    x = MaxPool2D((3, 3), strides=(1, 1), padding="same")(x)  # differs slightly from the original paper (stride 2 there)

    return x
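
A quick way to sanity-check the stem is to trace the spatial size by hand. The sketch below assumes the usual rule for `padding="same"` (output size is the input size divided by the stride, rounded up); `same_out` is a helper introduced only for this illustration.

```python
import math

def same_out(size, stride):
    """Output height/width of a conv or pooling layer with padding="same"."""
    return math.ceil(size / stride)

size = 56                   # input height/width
size = same_out(size, 1)    # 7x7 conv, stride 1
size = same_out(size, 2)    # 3x3 max pool, stride 2
size = same_out(size, 1)    # 1x1 conv, 3x3 conv and final max pool all use stride 1
print(size)  # 28

# First conv layer: 7x7 kernel, 3 input channels, 64 filters, plus 64 biases.
first_conv_params = 7 * 7 * 3 * 64 + 64
print(first_conv_params)  # 9472
```

So the stem turns a 56×56×3 image into a 28×28×192 feature map, and the 9472 parameters of the first convolution agree with the `model.summary()` output further down.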


In [11]:
from tensorflow.keras.layers import Concatenate, AveragePooling2D

def inception_block(x, f1x1, f3x3_reduce, f3x3, f5x5_reduce, f5x5, fpool_proj):
    # branch 1: 1x1
    b1 = Conv2D(f1x1, (1, 1), activation="relu", padding="same")(x)

    # branch 2: 1x1 → 3x3
    b2 = Conv2D(f3x3_reduce, (1, 1), activation="relu", padding="same")(x)
    b2 = Conv2D(f3x3, (3, 3), activation="relu", padding="same")(b2)

    # branch 3: 1x1 → 5x5
    b3 = Conv2D(f5x5_reduce, (1, 1), activation="relu", padding="same")(x)
    b3 = Conv2D(f5x5, (5, 5), activation="relu", padding="same")(b3)

    # branch 4: 3x3 pool → 1x1
    b4 = MaxPool2D((3, 3), strides=(1, 1), padding="same")(x)
    b4 = Conv2D(fpool_proj, (1, 1), activation="relu", padding="same")(b4)

    x = Concatenate(axis=-1)([b1, b2, b3, b4])
    return x
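
Because the four branches are concatenated along the channel axis, the number of output channels of a block is the sum of the four branch widths; the two `*_reduce` arguments only affect the inside of a branch. A small sketch using the block configurations that appear in the model cell below (`out_channels` is a helper introduced only for this illustration):

```python
# (f1x1, f3x3_reduce, f3x3, f5x5_reduce, f5x5, fpool_proj) per block
blocks = {
    "3a": (64, 96, 128, 16, 32, 32),
    "3b": (128, 128, 192, 32, 96, 64),
    "4a": (192, 96, 208, 16, 48, 64),
    "4e": (256, 160, 320, 32, 128, 128),
    "5b": (384, 192, 384, 48, 128, 128),
}

def out_channels(cfg):
    """Channels after Concatenate: only the last layer of each branch counts."""
    f1x1, _, f3x3, _, f5x5, fpool_proj = cfg
    return f1x1 + f3x3 + f5x5 + fpool_proj

for name, cfg in blocks.items():
    print(name, out_channels(cfg))
```

The spatial size is unchanged inside a block, since every layer uses stride 1 with `padding="same"`; only the channel count grows.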


## 🧷 Auxiliary Classifier

**Theory**: An auxiliary classifier placed partway through the network helps gradients reach the early layers and acts as a regularizer by contributing an additional loss signal during training.

The auxiliary head takes an intermediate feature map and applies:
1. `AveragePooling2D` with a large stride.
2. A small `Conv2D`.
3. `Flatten` → `Dense` → `Dropout` → `Dense(num_classes, softmax)`.
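
The steps above can be checked with a little shape arithmetic. This sketch assumes the head is attached to a 14×14 feature map (which is what a 56×56 input gives at the 4a/4d blocks in this notebook) and uses the 5×5 window, stride 3, and 128-filter 1×1 convolution from the cell below; `valid_out` is a helper introduced only for this illustration.

```python
def valid_out(size, window, stride):
    """Output height/width of a pooling layer with padding="valid"."""
    return (size - window) // stride + 1

feature_map = 14                        # assumed spatial size where the aux heads attach
pooled = valid_out(feature_map, 5, 3)   # AveragePooling2D((5, 5), strides=(3, 3))
flat = pooled * pooled * 128            # after the 128-filter 1x1 conv and Flatten
print(pooled, flat)  # 4 2048
```

The aggressive pooling keeps the head cheap: the first `Dense` layer sees 2048 inputs rather than the 14 × 14 × 512 values of the raw feature map.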


In [12]:
from tensorflow.keras.layers import AveragePooling2D, Flatten, Dense, Dropout

def aux_classifier(x, num_classes, name_prefix):
    x = AveragePooling2D((5, 5), strides=(3, 3))(x)
    x = Conv2D(128, (1, 1), activation="relu", padding="same")(x)
    x = Flatten()(x)
    x = Dense(1024, activation="relu")(x)
    x = Dropout(0.7)(x)
    x = Dense(num_classes, activation="softmax", name=name_prefix)(x)
    return x


In [13]:
num_classes = 200

inp = Input(shape=(56, 56, 3))
x = stem(inp)

# Inception blocks 3a-5b, with the filter counts from the original paper
x = inception_block(x, 64, 96, 128, 16, 32, 32)   # 3a
x = inception_block(x, 128, 128, 192, 32, 96, 64) # 3b
x = MaxPool2D((3, 3), strides=(2, 2), padding="same")(x)

x = inception_block(x, 192, 96, 208, 16, 48, 64)  # 4a
aux1 = aux_classifier(x, num_classes, "aux1")

x = inception_block(x, 160, 112, 224, 24, 64, 64) # 4b
x = inception_block(x, 128, 128, 256, 24, 64, 64) # 4c
x = inception_block(x, 112, 144, 288, 32, 64, 64) # 4d
aux2 = aux_classifier(x, num_classes, "aux2")

x = inception_block(x, 256, 160, 320, 32, 128, 128) # 4e
x = MaxPool2D((3, 3), strides=(2, 2), padding="same")(x)

x = inception_block(x, 256, 160, 320, 32, 128, 128) # 5a
x = inception_block(x, 384, 192, 384, 48, 128, 128) # 5b

x = AveragePooling2D((7, 7), strides=(1, 1), padding="valid")(x)
x = Flatten()(x)
x = Dropout(0.4)(x)
out_main = Dense(num_classes, activation="softmax", name="final")(x)

model = models.Model(
    inputs=inp,
    outputs=[out_main, aux1, aux2]
)

model.summary()


Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 56, 56, 3)]  0           []                               
                                                                                                  
 conv2d (Conv2D)                (None, 56, 56, 64)   9472        ['input_1[0][0]']                
                                                                                                  
 max_pooling2d (MaxPooling2D)   (None, 28, 28, 64)   0           ['conv2d[0][0]']                 
                                                                                                  
 lambda (Lambda)                (None, 28, 28, 64)   0           ['max_pooling2d[0][0]']          
                                                                                              

## 🎯 Training with Three Outputs

**Theory**: During training, the total loss is a linear combination of the main loss and the two *auxiliary losses*, for example:
$$
\mathcal{L} = \mathcal{L}_{\text{final}} + \alpha \,\mathcal{L}_{\text{aux1}} + \beta \,\mathcal{L}_{\text{aux2}}
$$

where \(\alpha\) and \(\beta\) are small (for example 0.3), so the *auxiliary heads* act as weak *regularizers*.
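
As a small numeric illustration of the formula, the sketch below computes the combined loss for a single sample. The three-class probabilities are made up for the example, and `cross_entropy` and `total_loss` are helpers introduced here, not functions from the notebook or from Keras.

```python
import math

def cross_entropy(true_index, probs):
    """Categorical cross-entropy for one sample with a one-hot label."""
    return -math.log(probs[true_index])

def total_loss(true_index, p_final, p_aux1, p_aux2, alpha=0.3, beta=0.3):
    """L = L_final + alpha * L_aux1 + beta * L_aux2."""
    return (cross_entropy(true_index, p_final)
            + alpha * cross_entropy(true_index, p_aux1)
            + beta * cross_entropy(true_index, p_aux2))

# Hypothetical predictions of the three heads for a sample whose true class is index 1.
loss = total_loss(1, [0.1, 0.8, 0.1], [0.3, 0.5, 0.2], [0.2, 0.6, 0.2])
print(round(loss, 4))
```

With \(\alpha = \beta = 0.3\), a mistake in an auxiliary head costs less than a third of the same mistake in the final head, so the main output still dominates the gradient.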


In [14]:
# Use loss weights [1.0, 0.3, 0.3]
model.compile(
    optimizer="adam",
    loss={
        "final": "categorical_crossentropy",
        "aux1": "categorical_crossentropy",
        "aux2": "categorical_crossentropy",
    },
    loss_weights={
        "final": 1.0,
        "aux1": 0.3,
        "aux2": 0.3,
    },
    metrics={"final": "accuracy", "aux1": "accuracy", "aux2": "accuracy"}
)

# The generator must yield (x, {"final": y, "aux1": y, "aux2": y}): the same labels for every head
def datagen_aux(gen):
    for x, y in gen:
        yield x, {"final": y, "aux1": y, "aux2": y}

train_gen_aux = datagen_aux(train_gen)
valid_gen_aux = datagen_aux(valid_gen)
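
To see what the wrapper produces without touching the real image pipeline, it can be run on a stand-in generator. In this sketch `fake_batches` and its string "images" are hypothetical placeholders for `train_gen`; the dictionary keys have to match the `name=` of the three output layers.

```python
def datagen_aux(gen):
    # Same wrapper as in the cell above: reuse one label batch for all three heads.
    for x, y in gen:
        yield x, {"final": y, "aux1": y, "aux2": y}

def fake_batches():
    # Hypothetical stand-in for train_gen: two tiny (images, labels) batches.
    yield ["img_a", "img_b"], [[1, 0], [0, 1]]
    yield ["img_c", "img_d"], [[0, 1], [1, 0]]

x, y = next(datagen_aux(fake_batches()))
print(sorted(y))                 # ['aux1', 'aux2', 'final']
print(y["final"] is y["aux1"])   # True: the labels are shared, not copied
```

Sharing one label array across heads keeps the wrapper free of extra memory cost per batch.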


In [16]:
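# Evaluate on the held-out set; "final" is the output that matters, the aux heads are reported too
test_gen_aux = datagen_aux(test_gen)
test_steps = test_gen.samples // batch_size  # floor division skips the last partial batch (16 of 10000 images)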

results = model.evaluate(
    test_gen_aux,
    steps=test_steps
)
results




[8.47822093963623,
 5.2984137535095215,
 5.300570011138916,
 5.298788547515869,
 0.004907852504402399,
 0.005108173005282879,
 0.0052083334885537624]
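
Read in the usual Keras order for a multi-output model, the list above would be: the weighted total loss, the `final`, `aux1`, and `aux2` losses, then the `final`, `aux1`, and `aux2` accuracies. That ordering is an assumption here, since the notebook does not print `model.metrics_names`. The values sit almost exactly at chance level for 200 classes, which is what an untrained network produces. No `model.fit` call appears in the cells above. A quick check of the chance-level figures:

```python
import math

num_classes = 200
chance_loss = math.log(num_classes)   # cross-entropy of a uniform prediction
chance_acc = 1 / num_classes          # accuracy of random guessing

# Weighted total with the loss weights passed to compile(): 1.0, 0.3, 0.3
chance_total = (1.0 + 0.3 + 0.3) * chance_loss

print(round(chance_loss, 4), round(chance_total, 4), chance_acc)  # 5.2983 8.4773 0.005
```

These line up with the reported per-head losses of about 5.30, the total of about 8.48, and accuracies of about 0.005, so the numbers should be treated as a pre-training baseline, not as trained performance.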

## ✅ Chapter 6 Summary

**Theory**: Chapter 6 shows how to perform EDA on an image dataset, build a data pipeline with `ImageDataGenerator`, and then implement and train a large architecture such as Inception v1 in TensorFlow/Keras.

Inception combines multi-scale convolutions, 1×1 convolutions for dimensionality reduction, and auxiliary classifiers to produce a model that is deep yet parameter-efficient, and more stable to train on large datasets such as Tiny ImageNet.
