# 🧠 Teaching Machines to See Better: Improving CNNs and Making Them Confess
**Source: TensorFlow in Action, Chapter 7**

This chapter covers three things: reducing the overfitting of Inception v1 on Tiny ImageNet, designing a new architecture called Minception (in the style of Inception-ResNet), and explaining CNN predictions with Grad-CAM.

In [1]:
import os
import numpy as np
import tensorflow as tf

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers, models
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, CSVLogger


## 📈 Reducing Overfitting with Data Augmentation

**Theory**: Data augmentation adds variation to the training data without requiring new labels, so the model generalizes better and overfits less.

Augmentations used:
- Random rotation and horizontal/vertical shifts
- Changes in brightness, shear, and zoom
- Horizontal flips, with empty areas filled using `reflect`.


In [3]:
random_seed = 4321
batch_size = 128
target_size = (56, 56)  # Tiny ImageNet images are 64×64; the generators resize them to 56×56

imagegen_aug = ImageDataGenerator(
    samplewise_center=False,       # normalization is done manually later
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    brightness_range=(0.5, 1.5),
    shear_range=5,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode="reflect",
    validation_split=0.1
)

imagegen = ImageDataGenerator(samplewise_center=False)
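
To make the `reflect` fill option more concrete, here is a minimal NumPy sketch of a horizontal shift whose exposed pixels are filled by mirroring the image edge, followed by a horizontal flip. It is not Keras's own implementation: the helper name `shift_right_reflect` is hypothetical, and `np.pad(..., mode="symmetric")` is used as a stand-in for the mirroring behaviour on a tiny integer image.

```python
import numpy as np

def shift_right_reflect(img, pixels):
    """Shift a 2-D image right by `pixels`, mirroring the left edge into the gap."""
    # "symmetric" padding repeats the edge pixel first: [b, a | a, b, c, d]
    padded = np.pad(img, ((0, 0), (pixels, 0)), mode="symmetric")
    # Keep the original width, dropping the columns pushed off the right side.
    return padded[:, :img.shape[1]]

img = np.arange(12).reshape(3, 4)        # rows: [0 1 2 3], [4 5 6 7], [8 9 10 11]
shifted = shift_right_reflect(img, 2)    # first row becomes [1, 0, 0, 1]
flipped = img[:, ::-1]                   # first row becomes [3, 2, 1, 0]

print(shifted[0], flipped[0])
```

Both variants keep the original label, which is why augmentation adds variety without any new annotation work.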


## 🚚 Data Pipeline: Train, Validation, Test

**Theory**: The training set is augmented, while the validation and test sets are not, so that they measure performance on the original data distribution.


In [4]:
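data_dir = os.path.join("data", "tiny-imagenet-200")

# Arguments shared by the training and validation iterators.
flow_kwargs = dict(
    directory=os.path.join(data_dir, "train"),
    target_size=target_size,
    classes=None,
    class_mode="categorical",
    batch_size=batch_size,
    shuffle=True,
    seed=random_seed,
)

# Training batches come from the augmenting generator.
train_gen = imagegen_aug.flow_from_directory(subset="training", **flow_kwargs)

# Validation batches must not be augmented, so they come from a plain generator.
# Using the same validation_split is assumed to select the same held-out files,
# since the split is taken from the sorted file listing rather than drawn at random.
imagegen_valid = ImageDataGenerator(samplewise_center=False, validation_split=0.1)
valid_gen = imagegen_valid.flow_from_directory(subset="validation", **flow_kwargs)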


Found 90000 images belonging to 200 classes.
Found 10000 images belonging to 200 classes.


In [5]:
import pandas as pd

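def get_test_labels_df(ann_path):
    """Read val_annotations.txt and keep only the filename and class-ID columns."""
    # The file is tab-separated with no header row; any columns after the
    # first two are not needed for classification.
    df = pd.read_csv(ann_path, sep="\t", header=None)
    df = df.iloc[:, [0, 1]].rename({0: "filename", 1: "class"}, axis=1)
    return df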

test_df = get_test_labels_df(
    os.path.join(data_dir, "val", "val_annotations.txt")
)

test_gen = imagegen.flow_from_dataframe(
    dataframe=test_df,
    directory=os.path.join(data_dir, "val", "images"),
    x_col="filename",
    y_col="class",
    target_size=target_size,
    class_mode="categorical",
    batch_size=batch_size,
    shuffle=False
)


Found 10000 validated image filenames belonging to 200 classes.


## 🔀 Multi-output Generator for Inception v1

**Theory**: Inception v1 has one main output plus two auxiliary classifiers, so the generator must yield `{"final": y, "aux1": y, "aux2": y}` for each input batch.


In [6]:
def preprocess_batch(x):
    """Normalize each image in the batch to zero mean and unit variance."""
    # Example: manual sample-wise normalization
    x = x.astype("float32")
    mean = np.mean(x, axis=(1, 2, 3), keepdims=True)
    std = np.std(x, axis=(1, 2, 3), keepdims=True) + 1e-6
    return (x - mean) / std

def datagen_augmented_inceptionv1(gen):
    """Wrap a Keras iterator so each label batch is repeated for all three outputs."""
    for x, y in gen:
        x = preprocess_batch(x)
        yield x, {"final": y, "aux1": y, "aux2": y}
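
As a quick sanity check of the wrapper, the sketch below feeds it a fake iterator instead of a real Keras one. `fake_image_batches` is a hypothetical stand-in that produces random images and one-hot labels; the two functions from the cell above are repeated only so that the snippet runs on its own.

```python
import numpy as np

def preprocess_batch(x):
    """Sample-wise normalization, as in the cell above."""
    x = x.astype("float32")
    mean = np.mean(x, axis=(1, 2, 3), keepdims=True)
    std = np.std(x, axis=(1, 2, 3), keepdims=True) + 1e-6
    return (x - mean) / std

def datagen_augmented_inceptionv1(gen):
    for x, y in gen:
        yield preprocess_batch(x), {"final": y, "aux1": y, "aux2": y}

def fake_image_batches(n_batches=2, batch_size=4, num_classes=200, seed=0):
    """Hypothetical stand-in for a Keras iterator: random images, one-hot labels."""
    rng = np.random.default_rng(seed)
    for _ in range(n_batches):
        x = rng.uniform(0, 255, size=(batch_size, 56, 56, 3))
        labels = rng.integers(0, num_classes, size=batch_size)
        yield x, np.eye(num_classes, dtype="float32")[labels]

x, targets = next(datagen_augmented_inceptionv1(fake_image_batches()))
print(x.shape, sorted(targets))                      # one image batch, three target keys
print(np.abs(x.mean(axis=(1, 2, 3))).max() < 1e-3)   # each image is centred near zero
```

All three dictionary entries point at the same label array, so the auxiliary classifiers are trained against exactly the same targets as the main output.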


## ⏹️ Early Stopping and LR Scheduling

**Theory**: Early stopping halts training once the validation loss stops improving; the LR scheduler lowers the learning rate when the validation loss plateaus, which makes training more stable and avoids severe overfitting. The callbacks below monitor `val_final_loss`, the validation loss of the main (`final`) output.


In [7]:
early_stop = EarlyStopping(
    monitor="val_final_loss",
    patience=5,
    min_delta=0.01,
    restore_best_weights=True
)

lr_sched = ReduceLROnPlateau(
    monitor="val_final_loss",
    factor=0.1,
    patience=3,
    verbose=1
)

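# CSVLogger may not create missing directories, so make sure "eval" exists first.
os.makedirs("eval", exist_ok=True)
csv_logger = CSVLogger("eval/inceptionv1_improved.log", append=False)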
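
To see how `patience` and `min_delta` interact, here is a simplified pure-Python model of the patience rule. It is a sketch rather than Keras's actual implementation, and `simulate_early_stopping` is a hypothetical helper: an epoch counts as an improvement only if the loss falls more than `min_delta` below the best value so far.

```python
def simulate_early_stopping(val_losses, patience=5, min_delta=0.01):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    Simplified model of the patience rule: an epoch counts as an improvement only
    if the loss drops by more than `min_delta` below the best value seen so far.
    """
    best, best_epoch, wait = float("inf"), None, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch   # the stopping rule never triggered

losses = [2.0, 1.5, 1.2, 1.195, 1.199, 1.21, 1.19, 1.2]
stop_epoch, best_epoch = simulate_early_stopping(losses, patience=3, min_delta=0.01)
print(stop_epoch, best_epoch)   # 5 2
```

The loss of 1.195 at epoch 3 is lower than 1.2 but not by more than `min_delta`, so it does not reset the counter. Training stops at epoch 5, and with `restore_best_weights=True` the weights from epoch 2 would be the ones kept.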


## 🧩 Minception: an Inception-ResNet-style Network for Tiny ImageNet

**Theory**: Minception is an architecture inspired by Inception-ResNet v2, scaled down for Tiny ImageNet. Its main components:

- **Stem**: the initial convolutional layers, including some parallel conv branches, with batch normalization between each convolution and its activation.
- **Inception-ResNet A/B blocks**: several conv branches (1×1, 3×3, factorized 5×5) whose combined output is added back to the input through a **residual connection**.
- **Reduction blocks**: reduce the spatial size while increasing the channel depth.
- **Head**: average pooling → flatten → dropout → Dense(200, softmax). In the code, `GlobalAveragePooling2D` performs the pooling and flattening in one step.


In [8]:
init = "glorot_uniform"

def stem(inp, activation="relu", bn=True):
    x = layers.Conv2D(32, (3, 3), strides=(2, 2),
                      padding="same", activation=None,
                      kernel_initializer=init)(inp)
    if bn:
        x = layers.BatchNormalization()(x)
    x = layers.Activation(activation)(x)

    x = layers.Conv2D(32, (3, 3), strides=(1, 1),
                      padding="same", activation=None,
                      kernel_initializer=init)(x)
    if bn:
        x = layers.BatchNormalization()(x)
    x = layers.Activation(activation)(x)

    x = layers.Conv2D(64, (3, 3), strides=(1, 1),
                      padding="same", activation=None,
                      kernel_initializer=init)(x)
    if bn:
        x = layers.BatchNormalization()(x)
    x = layers.Activation(activation)(x)

    branch1 = layers.MaxPool2D((3, 3), strides=(2, 2), padding="same")(x)
    branch2 = layers.Conv2D(96, (3, 3), strides=(2, 2),
                            padding="same", activation=None,
                            kernel_initializer=init)(x)
    if bn:
        branch2 = layers.BatchNormalization()(branch2)
    branch2 = layers.Activation(activation)(branch2)

    x = layers.Concatenate(axis=-1)([branch1, branch2])

    b1 = layers.Conv2D(64, (1, 1), padding="same",
                       activation=None, kernel_initializer=init)(x)
    if bn:
        b1 = layers.BatchNormalization()(b1)
    b1 = layers.Activation(activation)(b1)
    b1 = layers.Conv2D(96, (3, 3), padding="same",
                       activation=None, kernel_initializer=init)(b1)
    if bn:
        b1 = layers.BatchNormalization()(b1)
    b1 = layers.Activation(activation)(b1)

    b2 = layers.Conv2D(64, (1, 1), padding="same",
                       activation=None, kernel_initializer=init)(x)
    if bn:
        b2 = layers.BatchNormalization()(b2)
    b2 = layers.Activation(activation)(b2)
    b2 = layers.Conv2D(64, (7, 1), padding="same",
                       activation=None, kernel_initializer=init)(b2)
    if bn:
        b2 = layers.BatchNormalization()(b2)
    b2 = layers.Conv2D(64, (1, 7), padding="same",
                       activation=None, kernel_initializer=init)(b2)
    if bn:
        b2 = layers.BatchNormalization()(b2)
    b2 = layers.Activation(activation)(b2)
    b2 = layers.Conv2D(96, (3, 3), padding="same",
                       activation=None, kernel_initializer=init)(b2)
    if bn:
        b2 = layers.BatchNormalization()(b2)
    b2 = layers.Activation(activation)(b2)

    x = layers.Concatenate(axis=-1)([b1, b2])

    b3 = layers.MaxPool2D((3, 3), strides=(2, 2), padding="same")(x)
    b4 = layers.Conv2D(192, (3, 3), strides=(2, 2),
                       padding="same", activation=None,
                       kernel_initializer=init)(x)
    if bn:
        b4 = layers.BatchNormalization()(b4)
    b4 = layers.Activation(activation)(b4)

    x = layers.Concatenate(axis=-1)([b3, b4])
    return x


In [9]:
def inception_resnet_block_A(inp, scale=0.1, activation="relu", bn=True):
    init_x = inp

    b1 = layers.Conv2D(32, (1, 1), padding="same",
                       activation=None, kernel_initializer=init)(inp)
    if bn:
        b1 = layers.BatchNormalization()(b1)
    b1 = layers.Activation(activation)(b1)

    b2 = layers.Conv2D(32, (1, 1), padding="same",
                       activation=None, kernel_initializer=init)(inp)
    if bn:
        b2 = layers.BatchNormalization()(b2)
    b2 = layers.Activation(activation)(b2)
    b2 = layers.Conv2D(32, (3, 3), padding="same",
                       activation=None, kernel_initializer=init)(b2)
    if bn:
        b2 = layers.BatchNormalization()(b2)
    b2 = layers.Activation(activation)(b2)

    b3 = layers.Conv2D(32, (1, 1), padding="same",
                       activation=None, kernel_initializer=init)(inp)
    if bn:
        b3 = layers.BatchNormalization()(b3)
    b3 = layers.Activation(activation)(b3)
    b3 = layers.Conv2D(48, (3, 3), padding="same",
                       activation=None, kernel_initializer=init)(b3)
    if bn:
        b3 = layers.BatchNormalization()(b3)
    b3 = layers.Activation(activation)(b3)
    b3 = layers.Conv2D(64, (3, 3), padding="same",
                       activation=None, kernel_initializer=init)(b3)
    if bn:
        b3 = layers.BatchNormalization()(b3)
    b3 = layers.Activation(activation)(b3)

    mixed = layers.Concatenate(axis=-1)([b1, b2, b3])
    up = layers.Conv2D(inp.shape[-1], (1, 1), padding="same",
                       activation=None, kernel_initializer=init)(mixed)

    x = layers.Lambda(lambda z: z[0] + scale * z[1])([init_x, up])
    if bn:
        x = layers.BatchNormalization()(x)
    x = layers.Activation(activation)(x)
    return x
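
The residual step at the end of the block can be isolated in a few lines of NumPy. This is only an illustrative sketch: `scaled_residual` is a hypothetical helper, and the tensor shape and constant values are made up. Block A's three branches concatenate to 32 + 32 + 64 = 128 channels, so the 1×1 `up` convolution has to project back to the input's channel count before the addition is valid.

```python
import numpy as np

def scaled_residual(x, branch_out, scale=0.1):
    """Add a down-weighted branch output back onto the block input."""
    return x + scale * branch_out

x = np.ones((1, 7, 7, 384), dtype="float32")   # block input (shape chosen for illustration)
branch_out = np.full_like(x, 10.0)              # stand-in for the 1×1 "up" projection
y = scaled_residual(x, branch_out)

print(y.shape, float(y[0, 0, 0, 0]))
```

With `scale=0.1`, the branch contributes only a tenth of its magnitude, so each block starts out close to an identity mapping; small residual scales of this kind are commonly used to keep Inception-ResNet-style training stable.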


In [10]:
def reduction_block(inp, activation="relu", bn=True):
    b1 = layers.MaxPool2D((3, 3), strides=(2, 2), padding="same")(inp)

    b2 = layers.Conv2D(192, (3, 3), strides=(2, 2),
                       padding="same", activation=None,
                       kernel_initializer=init)(inp)
    if bn:
        b2 = layers.BatchNormalization()(b2)
    b2 = layers.Activation(activation)(b2)

    x = layers.Concatenate(axis=-1)([b1, b2])
    return x

def inception_resnet_block_B(inp, scale=0.1, activation="relu", bn=True):
    init_x = inp

    b1 = layers.Conv2D(128, (1, 1), padding="same",
                       activation=None, kernel_initializer=init)(inp)
    if bn:
        b1 = layers.BatchNormalization()(b1)
    b1 = layers.Activation(activation)(b1)

    b2 = layers.Conv2D(128, (1, 1), padding="same",
                       activation=None, kernel_initializer=init)(inp)
    if bn:
        b2 = layers.BatchNormalization()(b2)
    b2 = layers.Activation(activation)(b2)
    b2 = layers.Conv2D(128, (1, 7), padding="same",
                       activation=None, kernel_initializer=init)(b2)
    if bn:
        b2 = layers.BatchNormalization()(b2)
    b2 = layers.Activation(activation)(b2)
    b2 = layers.Conv2D(128, (7, 1), padding="same",
                       activation=None, kernel_initializer=init)(b2)
    if bn:
        b2 = layers.BatchNormalization()(b2)
    b2 = layers.Activation(activation)(b2)

    mixed = layers.Concatenate(axis=-1)([b1, b2])
    up = layers.Conv2D(inp.shape[-1], (1, 1), padding="same",
                       activation=None, kernel_initializer=init)(mixed)

    x = layers.Lambda(lambda z: z[0] + scale * z[1])([init_x, up])
    if bn:
        x = layers.BatchNormalization()(x)
    x = layers.Activation(activation)(x)
    return x


In [11]:
num_classes = 200

inp = layers.Input(shape=(56, 56, 3))
x = stem(inp)

x = inception_resnet_block_A(x)
x = reduction_block(x)
x = inception_resnet_block_B(x)
x = inception_resnet_block_B(x)

x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.5)(x)
out = layers.Dense(num_classes, activation="softmax")(x)

minception = models.Model(inputs=inp, outputs=out)
minception.summary()


Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 56, 56, 3)]  0           []                               
                                                                                                  
 conv2d (Conv2D)                (None, 28, 28, 32)   896         ['input_1[0][0]']                
                                                                                                  
 batch_normalization (BatchNorm  (None, 28, 28, 32)  128         ['conv2d[0][0]']                 
 alization)                                                                                       
                                                                                                  
 activation (Activation)        (None, 28, 28, 32)   0           ['batch_normalization[0][0]']
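
The spatial sizes in the summary can be checked by hand. The sketch below assumes the usual rule for `padding="same"` layers, output size = ceil(input size / stride), and traces only the stride-2 layers; stride-1 layers and the residual blocks leave the size unchanged. `same_padding_out` is a hypothetical helper, and the channel counts are read off the `Concatenate` calls in the cells above.

```python
import math

def same_padding_out(size, stride):
    """Spatial size after a conv/pool layer with padding="same"."""
    return math.ceil(size / stride)

size = 56
size = same_padding_out(size, 2)       # first stem conv, stride 2            -> 28
size = same_padding_out(size, 2)       # stem max-pool / conv pair, stride 2  -> 14
size = same_padding_out(size, 2)       # last stem max-pool / conv pair       -> 7
stem_channels = 192 + 192              # pooled 192 channels + 192-filter conv branch

size = same_padding_out(size, 2)       # reduction block, stride 2            -> 4
final_channels = stem_channels + 192   # pooled input + 192-filter conv branch

print(size, final_channels)            # 4 576
```

Under these assumptions the global average pooling layer therefore receives a 4×4×576 tensor and hands a 576-dimensional vector to the dropout and dense layers.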

In [13]:
def wrap_gen(gen):
    """Yield (normalized images, labels) batches for the single-output Minception model.

    Reuses preprocess_batch (sample-wise normalization) defined earlier.
    """
    while True:
        x, y = next(gen)
        yield preprocess_batch(x), y


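## 🔄 Transfer Learning with Inception-ResNet v2

**Theory**: An Inception-ResNet v2 pretrained on ImageNet is used as a feature extractor; its head is replaced with Dense(200) and the top layers are fine-tuned on Tiny ImageNet, reaching about 79% test accuracy. Note that the base model below is built for 224×224 inputs, so the data generators must be recreated with `target_size=(224, 224)` rather than the 56×56 used earlier.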

In [15]:
base_model = tf.keras.applications.InceptionResNetV2(
    include_top=False,
    weights="imagenet",
    input_shape=(224, 224, 3),
    pooling="avg"
)

x = base_model.output
x = layers.Dropout(0.5)(x)
preds = layers.Dense(num_classes, activation="softmax")(x)

irv2_model = models.Model(inputs=base_model.input, outputs=preds)

# Freeze all but the last 50 layers of the base model; only those 50 and the new head are trained.
for layer in base_model.layers[:-50]:
    layer.trainable = False

irv2_model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss="categorical_crossentropy",
    metrics=["accuracy"]
)


Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/inception_resnet_v2/inception_resnet_v2_weights_tf_dim_ordering_tf_kernels_notop.h5


## 🔍 Grad-CAM: Making CNNs "Confess"

**Theory**: Grad-CAM projects the gradient of a class score onto the feature maps of the last convolutional layer to produce a heatmap of the locations that contributed most to the prediction.

Core formulas:
- Compute the gradient of the class score \(y^c\) with respect to each feature map \(A^k\).
- Obtain the channel weights by averaging that gradient over all \(Z\) spatial positions \((i, j)\):
  $$
  \alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A^k_{ij}}
  $$
- The Grad-CAM map:
  $$
  L_{\text{Grad-CAM}}^c = \text{ReLU}\Big(\sum_k \alpha_k^c A^k\Big)
  $$
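
The two formulas can be worked through by hand on a toy example. The NumPy sketch below uses made-up 2×2 feature maps with two channels and made-up gradients (nothing here comes from a real network), computes the channel weights by spatial averaging, and applies the ReLU to the weighted sum.

```python
import numpy as np

# Toy feature maps A with shape (H, W, K) = (2, 2, 2) and made-up gradients dy^c/dA.
A = np.stack([np.array([[1., 2.], [3., 4.]]),
              np.array([[4., 3.], [2., 1.]])], axis=-1)
grads = np.stack([np.full((2, 2), 0.5),
                  np.full((2, 2), -1.0)], axis=-1)

alpha = grads.mean(axis=(0, 1))                      # channel weights: 0.5 and -1.0
heatmap = np.maximum((alpha * A).sum(axis=-1), 0)    # ReLU of the weighted sum

print(alpha)
print(heatmap)   # only the bottom-right cell stays positive
```

Before the ReLU the weighted sum is `[[-3.5, -2.0], [-0.5, 1.0]]`. The negative entries mark locations that argue against the class, and the ReLU discards them so the map shows only supporting evidence.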


In [16]:
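def make_gradcam_heatmap(img_array, model, last_conv_layer_name, pred_index=None):
    """Return a Grad-CAM heatmap in [0, 1] for a batch containing a single image."""
    # Model that maps the input to both the chosen conv feature maps and the predictions.
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output]
    )

    with tf.GradientTape() as tape:
        conv_outputs, preds = grad_model(img_array)
        if pred_index is None:
            pred_index = tf.argmax(preds[0])  # default to the top predicted class
        class_channel = preds[:, pred_index]

    # Gradient of the class score w.r.t. the feature maps, averaged over space (alpha_k^c).
    grads = tape.gradient(class_channel, conv_outputs)
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))

    # Weighted sum of the feature maps over the channel axis.
    conv_outputs = conv_outputs[0]
    heatmap = tf.reduce_sum(tf.multiply(pooled_grads, conv_outputs), axis=-1)

    # ReLU, then scale to [0, 1]; stay in TensorFlow so that .numpy() is valid.
    heatmap = tf.maximum(heatmap, 0)
    heatmap = heatmap / (tf.reduce_max(heatmap) + 1e-8)
    return heatmap.numpy()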
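
The heatmap has the resolution of the last conv layer, so it is usually enlarged and blended over the input image before inspection. The following is a minimal NumPy sketch of that step, not the book's visualization code: `overlay_heatmap` is a hypothetical helper that assumes the image size is an exact multiple of the heatmap size, upsamples by pixel repetition, and paints the heat into the red channel only.

```python
import numpy as np

def overlay_heatmap(image, heatmap, alpha=0.4):
    """Blend a low-resolution heatmap in [0, 1] onto an RGB image in [0, 1].

    Hypothetical helper: upsamples by pixel repetition (nearest neighbour) and
    paints the heatmap into the red channel only.
    """
    factor_h = image.shape[0] // heatmap.shape[0]
    factor_w = image.shape[1] // heatmap.shape[1]
    up = np.repeat(np.repeat(heatmap, factor_h, axis=0), factor_w, axis=1)
    colored = np.zeros_like(image)
    colored[..., 0] = up                     # red channel carries the heat
    return (1 - alpha) * image + alpha * colored

image = np.full((56, 56, 3), 0.5)            # flat grey stand-in for a real image
heatmap = np.zeros((7, 7))
heatmap[3, 3] = 1.0                          # a single "hot" cell in the centre
blended = overlay_heatmap(image, heatmap)

print(blended.shape, blended[28, 28], blended[0, 0])
```

Pixels under the hot cell shift toward red, while the rest of the image is only dimmed. For real use, a smoother interpolation and a proper colormap would likely look better.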


## ✅ Chapter 7 Summary

**Theory**: This chapter shows how to reduce CNN overfitting with augmentation, dropout, and early stopping, and then how designing Minception and applying transfer learning raise accuracy on Tiny ImageNet considerably further.

Grad-CAM provides a visual way to verify that the model focuses on the relevant objects, which helps with debugging and builds trust before real-world deployment.
