# Part 2. Simple age estimator

## Data loading and preparation

Import the dataset from Kaggle

In [None]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("frabbisw/facial-age")

print("Path to dataset files:", path)

In [None]:
from google.colab import drive
drive.mount('/content/drive')
models_dir = '/content/drive/MyDrive/models'

Check the structure of the dataset

In [None]:
from pathlib import Path

path = Path(path)

def hierarchy(root: Path):
    return { child: hierarchy(child) for child in root.iterdir() } if root.is_dir() else None

path_dict = hierarchy(path)
path_dict

In [None]:
path_dict = dict(list(path_dict.items())[0:])

path_dict

In [None]:
data = {}

for directory in path_dict:
    for age in path_dict[directory]:
        for image in path_dict[directory][age]:
            if image.is_file():
                data[image] = age.name

print(data)

In [None]:
import pandas as pd


df = pd.DataFrame(data = {'file' : data.keys(), 'age' : data.values()})
print(df.sample(5))

After manually inspecting the dataset, it was decided to drop some of the examples: they were either corrupted files, showed body parts other than the face, or had the wrong age label.

In [None]:
print(len(df))
files_to_drop = [3829, 4313, 7034, 7326, 9378, 1490]
names_to_drop = {f"{filename}.png" for filename in files_to_drop}
# Compare the exact file name, so "3829.png" cannot also match e.g. "13829.png"
mask = df['file'].map(lambda p: Path(p).name in names_to_drop)
for file in df.loc[mask, 'file']:
    print(file)
df = df[~mask]

print(len(df))

In [None]:
df.to_csv('facial-age.csv')

We can categorize ages into bins to simplify the classification task.
To start with, we can simply create bins with an approximately equal number of examples in each.
The downside is that the age estimate is less precise for the wider bins (the ones covering ages with few examples). The upside is that every bin holds approximately the same number of examples, which in theory should let us predict the category more accurately.
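
To make the trade-off concrete, here is a minimal, library-free sketch of equal-frequency binning on made-up ages. The `ages` list and the helper name `equal_frequency_bins` are illustrative assumptions, not part of the dataset or of this notebook. It cuts at the quartiles, which is the same idea `pd.qcut` applies in the cells below, although the exact edge conventions may differ.

```python
from bisect import bisect_left
from collections import Counter
from statistics import quantiles


def equal_frequency_bins(values, n_bins):
    """Return (cut_points, bin_index_per_value) for quantile-based bins."""
    cut_points = quantiles(values, n=n_bins, method="inclusive")
    # bisect_left puts a value equal to a cut point into the lower bin,
    # so every bin is closed on the right.
    return cut_points, [bisect_left(cut_points, v) for v in values]


# Made-up, skewed sample: many children, few adults.
ages = [age for age in range(1, 11) for _ in range(6)] + list(range(12, 92, 4))

cut_points, bin_ids = equal_frequency_bins(ages, n_bins=4)
counts = Counter(bin_ids)
for b in sorted(counts):
    members = [a for a, i in zip(ages, bin_ids) if i == b]
    print(f"bin {b}: ages {min(members)}-{max(members)}, {counts[b]} examples")
```

The counts come out roughly rather than exactly equal, because examples with the same age cannot be split across bins, and the last bin spans a far wider age range than the others.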

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

sns.countplot(data=df, x='age')

In [None]:
df['age'] = df['age'].astype(int)
df['age_bins'] = pd.qcut(x=df['age'], q=8, precision=0)
print(df.sample(5))

In [None]:
sns.countplot(data=df, x='age_bins')

In [None]:
df['age_bins'] = df['age_bins'].astype(str)
df.info()

In [None]:
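import numpy as np
# Shuffle once, then slice into 75/15/10; .copy() avoids SettingWithCopyWarning later
shuffled = df.sample(frac=1)
n_train, n_validate = int(.75*len(df)), int(.9*len(df))
train, validate, test = (part.copy() for part in
                         (shuffled.iloc[:n_train], shuffled.iloc[n_train:n_validate], shuffled.iloc[n_validate:]))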

print(len(train), len(validate), len(test))

In [None]:
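fig, axes = plt.subplots(1, 3, figsize=(18, 4))  # one panel per split instead of overlaying them
for ax, (name, subset) in zip(axes, [('train', train), ('validate', validate), ('test', test)]):
    sns.countplot(data=subset, x='age_bins', ax=ax)
    ax.set_title(name)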

The proportions might not be exactly equal, but we can attribute that to randomness in the sampling.
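
If the proportions had to match by construction, one option would be a stratified split, where every class is shuffled and cut separately. The sketch below is an alternative, not what this notebook does. It works on a plain list of labels, and the function name `stratified_split` and the toy labels are assumptions made up for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_end=0.75, validate_end=0.9, seed=0):
    """Split indices into train/validate/test, cutting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, validate, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Same cumulative cut points as the random split: 75% and 90%.
        n_train = int(train_end * len(indices))
        n_validate = int(validate_end * len(indices))
        train += indices[:n_train]
        validate += indices[n_train:n_validate]
        test += indices[n_validate:]
    return train, validate, test


# Toy labels standing in for the 'age_bins' column.
labels = ["young"] * 40 + ["middle"] * 40 + ["old"] * 20
train_idx, validate_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(validate_idx), len(test_idx))
```

Each split then contains the same share of every class, up to rounding, regardless of the random seed.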

Giving up and starting to use LLMs to have a chance of getting things done before the deadline.

## Experimenting with CNNs

I was confused about how a CNN would work with the images, considering I only have file paths for them. It turns out we need to (or at least can) use image data generators for this task.

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Ensure columns are strings
train['file'] = train['file'].astype(str)
train['age_bins'] = train['age_bins'].astype(str)
validate['file'] = validate['file'].astype(str)
validate['age_bins'] = validate['age_bins'].astype(str)
test['file'] = test['file'].astype(str)
test['age_bins'] = test['age_bins'].astype(str)

# Create ImageDataGenerator
datagen = ImageDataGenerator(rescale=1./255)

# Setup generators
# class_mode='sparse' produces integer labels, matching 'sparse_categorical_crossentropy'
train_generator = datagen.flow_from_dataframe(
    dataframe=train,
    x_col='file',
    y_col='age_bins',
    target_size=(64, 64),
    batch_size=32,
    class_mode='sparse'
)

validation_generator = datagen.flow_from_dataframe(
    dataframe=validate,
    x_col='file',
    y_col='age_bins',
    target_size=(64, 64),
    batch_size=32,
    class_mode='sparse'
)

test_generator = datagen.flow_from_dataframe(
    dataframe=test,
    x_col='file',
    y_col='age_bins',
    target_size=(64, 64),
    batch_size=32,
    class_mode='sparse'
)

In [None]:
!pip install tensorflow

Returning to writing code manually because Gemini is too slow today to be useful. Returning to Gemini to ask how to work with image data instead of path strings.

Haha, that is a somewhat low accuracy. Let's try other activations, then maybe a different number of epochs, and if that doesn't help, maybe read something about different combinations of layers.

I could say it fits my mental model after the lecture that tanh converges faster. However, what I don't understand is why the starting accuracy is way higher with this model. Is it just random luck, or is there something else I don't understand?

Ah, so training accuracy is way higher than validation accuracy. I wonder how I can prevent overfitting here... :)

Now combining the three of them in one loop to avoid copy-pasted code.

## Different hidden layers activation functions

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt
import os
import pickle
import numpy as np

activations = ['sigmoid', 'tanh', 'relu']
results = {}

for activation in activations:
    print(f"\n{'='*60}\nProcessing model with {activation} activation\n{'='*60}")

    # 1. Define Model
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation=activation, input_shape=(64, 64, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation=activation),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation=activation),
        layers.Flatten(),
        layers.Dense(64, activation=activation),
        layers.Dense(8, activation='softmax')
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # 2. Load or Train
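    # os.path.join adds the path separator that plain string concatenation would omit
    model_filename = os.path.join(models_dir, f'model_{activation}.keras')
    history_filename = os.path.join(models_dir, f'history_{activation}.pkl')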

    if os.path.exists(model_filename) and os.path.exists(history_filename):
        print(f"Loading existing model and history for {activation}...")
        model = keras.models.load_model(model_filename)
        with open(history_filename, 'rb') as f:
            history = pickle.load(f)
    else:
        print(f"Training model with {activation}...")
        history_obj = model.fit(
            train_generator,
            epochs=10,
            validation_data=validation_generator,
            verbose=1
        )
        history = history_obj.history

        # Save model and history
        model.save(model_filename)
        with open(history_filename, 'wb') as f:
            pickle.dump(history, f)
        print(f"Saved model and history for {activation}.")

    # 3. Evaluate on Test Data
    print(f"Evaluating {activation} model on test data...")
    test_loss, test_acc = model.evaluate(test_generator, verbose=0)
    print(f"Test Accuracy: {test_acc:.4f}")
    print(f"Test Loss: {test_loss:.4f}")
    results[activation] = {'test_acc': test_acc, 'test_loss': test_loss}

    # 4. Plotting (3 Graphs: Accuracy, Loss, Val vs Test)
    acc = history['accuracy']
    val_acc = history['val_accuracy']
    loss = history['loss']
    val_loss = history['val_loss']
    epochs_range = range(1, len(acc) + 1)

    plt.figure(figsize=(18, 5))

    # -- Graph 1: Accuracy --
    plt.subplot(1, 3, 1)
    plt.plot(epochs_range, acc, 'bo-', label='Training acc')
    plt.plot(epochs_range, val_acc, 'b', label='Validation acc')
    plt.title(f'{activation.capitalize()} - Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.ylim(0, 1)
    plt.legend()

    # -- Graph 2: Loss --
    plt.subplot(1, 3, 2)
    plt.plot(epochs_range, loss, 'bo-', label='Training loss')
    plt.plot(epochs_range, val_loss, 'b', label='Validation loss')
    plt.title(f'{activation.capitalize()} - Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()

    # -- Graph 3: Val vs Test Comparison --
    plt.subplot(1, 3, 3)
    metrics_names = ['Accuracy', 'Loss']
    val_values = [val_acc[-1], val_loss[-1]]
    test_values = [test_acc, test_loss]

    x_pos = np.arange(len(metrics_names))
    width = 0.35

    rects1 = plt.bar(x_pos - width/2, val_values, width, label='Validation')
    rects2 = plt.bar(x_pos + width/2, test_values, width, label='Test')

    plt.ylabel('Score')
    plt.title(f'{activation.capitalize()} - Val vs Test')
    plt.xticks(x_pos, metrics_names)
    plt.legend()

    # Add labels
    plt.bar_label(rects1, padding=3, fmt='%.2f')
    plt.bar_label(rects2, padding=3, fmt='%.2f')

    plt.tight_layout()
    plt.show()

Prompt: What are different kernel sizes that can be used in CNNs?

Response:
### Kernel Sizes in Convolutional Neural Networks (CNNs)

The **kernel size** (or filter size) determines the dimensions of the sliding window that scans over the input image. Here are the most common sizes and their characteristics:

1.  **3x3 Kernels (The Standard)**:
    *   **Usage**: By far the most popular choice (used in architectures like VGG).
    *   **Why**: It captures local features effectively with a small number of parameters. Stacking multiple 3x3 layers allows the network to learn complex patterns with a large receptive field but fewer parameters than a single large kernel.

2.  **5x5 and 7x7 Kernels**:
    *   **Usage**: Often used in the very first layer of a network.
    *   **Why**: These are useful for capturing larger, more general features (like basic shapes or blobs) directly from the raw image. However, they are computationally more expensive.

3.  **1x1 Kernels (Pointwise Convolution)**:
    *   **Usage**: Popularized by the Inception network.
    *   **Why**: They don't look at neighboring pixels. Instead, they are used to **change the number of channels** (depth) of the feature map (dimensionality reduction) or to add non-linearity without changing the spatial dimensions.

4.  **Why Odd Numbers?** (3x3, 5x5, etc.)
    *   Kernels are almost always odd-sized (e.g., 3, 5, 7) so that they have a **center pixel**. This makes it easier to align the filter with the input pixels and apply padding symmetrically.

Good point about odd numbers, I didn't know that. And I got a slightly better understanding of kernel sizes overall.
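
The claim about stacked 3x3 layers having fewer parameters than a single large kernel can be checked with a little arithmetic. This is a minimal sketch assuming square kernels, stride 1, and the same number of input and output channels in every layer. The helper names and the choice of 64 channels are illustrative assumptions.

```python
def conv2d_params(kernel_size, in_channels, out_channels):
    """Weights plus one bias per output channel for a square Conv2D kernel."""
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def stacked_receptive_field(kernel_size, n_layers):
    """Receptive field of n stacked stride-1 convolutions with the same kernel."""
    return 1 + n_layers * (kernel_size - 1)


channels = 64  # assumed: same number of input and output channels everywhere

comparison = {}
for n_layers, big_kernel in [(2, 5), (3, 7)]:
    # Two 3x3 layers see a 5x5 window, three see a 7x7 window.
    assert stacked_receptive_field(3, n_layers) == big_kernel
    stacked = n_layers * conv2d_params(3, channels, channels)
    single = conv2d_params(big_kernel, channels, channels)
    comparison[big_kernel] = (stacked, single)
    print(f"{n_layers} x (3x3): {stacked:,} params | "
          f"1 x ({big_kernel}x{big_kernel}): {single:,} params")
```

For the same receptive field the stacked 3x3 version needs fewer parameters, and it also applies a non-linearity between the layers.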

## Experimenting with different kernel sizes for convolution layers

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt
import os
import pickle
import numpy as np

kernels = [(3, 3), (5, 5), (7, 7)]
results_kernels = {}

for kernel_size in kernels:
    print(f"\n{'='*60}\nProcessing model with kernel size: {kernel_size[0]}x{kernel_size[1]}\n{'='*60}")

    # 1. Define Model
    model = models.Sequential([
        layers.Conv2D(32, kernel_size, activation='relu', input_shape=(64, 64, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, kernel_size, activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, kernel_size, activation='relu'),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(8, activation='softmax')
    ])

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # 2. Load or Train
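    # os.path.join adds the path separator that plain string concatenation would omit
    model_filename = os.path.join(models_dir, f'model_kernel_{kernel_size[0]}x{kernel_size[1]}.keras')
    history_filename = os.path.join(models_dir, f'history_kernel_{kernel_size[0]}x{kernel_size[1]}.pkl')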

    if os.path.exists(model_filename) and os.path.exists(history_filename):
        print(f"Loading existing model and history for kernel {kernel_size[0]}x{kernel_size[1]}...")
        model = keras.models.load_model(model_filename)
        with open(history_filename, 'rb') as f:
            history = pickle.load(f)
    else:
        print(f"Training model with kernel {kernel_size[0]}x{kernel_size[1]}...")
        history_obj = model.fit(
            train_generator,
            epochs=10,
            validation_data=validation_generator,
            verbose=1
        )
        history = history_obj.history

        # Save model and history
        model.save(model_filename)
        with open(history_filename, 'wb') as f:
            pickle.dump(history, f)
        print(f"Saved model and history for kernel {kernel_size[0]}x{kernel_size[1]}.")

    # 3. Evaluate on Test Data
    print(f"Evaluating kernel {kernel_size[0]}x{kernel_size[1]} model on test data...")
    test_loss, test_acc = model.evaluate(test_generator, verbose=0)
    print(f"Test Accuracy: {test_acc:.4f}")
    print(f"Test Loss: {test_loss:.4f}")
    results_kernels[f'{kernel_size[0]}x{kernel_size[1]}'] = {'test_acc': test_acc, 'test_loss': test_loss}

    # 4. Plotting (3 Graphs: Accuracy, Loss, Val vs Test)
    acc = history['accuracy']
    val_acc = history['val_accuracy']
    loss = history['loss']
    val_loss = history['val_loss']
    epochs_range = range(1, len(acc) + 1)

    plt.figure(figsize=(18, 5))

    # -- Graph 1: Accuracy --
    plt.subplot(1, 3, 1)
    plt.plot(epochs_range, acc, 'bo-', label='Training acc')
    plt.plot(epochs_range, val_acc, 'b', label='Validation acc')
    plt.title(f'Kernel {kernel_size[0]}x{kernel_size[1]} - Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.ylim(0, 1)
    plt.legend()

    # -- Graph 2: Loss --
    plt.subplot(1, 3, 2)
    plt.plot(epochs_range, loss, 'bo-', label='Training loss')
    plt.plot(epochs_range, val_loss, 'b', label='Validation loss')
    plt.title(f'Kernel {kernel_size[0]}x{kernel_size[1]} - Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()

    # -- Graph 3: Val vs Test Comparison --
    plt.subplot(1, 3, 3)
    metrics_names = ['Accuracy', 'Loss']
    val_values = [val_acc[-1], val_loss[-1]]
    test_values = [test_acc, test_loss]

    x_pos = np.arange(len(metrics_names))
    width = 0.35

    rects1 = plt.bar(x_pos - width/2, val_values, width, label='Validation')
    rects2 = plt.bar(x_pos + width/2, test_values, width, label='Test')

    plt.ylabel('Score')
    plt.title(f'Kernel {kernel_size[0]}x{kernel_size[1]} - Val vs Test')
    plt.xticks(x_pos, metrics_names)
    plt.legend()

    # Add labels
    plt.bar_label(rects1, padding=3, fmt='%.2f')
    plt.bar_label(rects2, padding=3, fmt='%.2f')

    plt.tight_layout()
    plt.show()

Looks like smaller kernels allow slightly faster convergence. I'm not sure that will always be the case, though: in theory, for some data larger kernels might capture features that discriminate better.

## Experimenting with different model depths

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt
import os
import pickle
import numpy as np

results_depth = {}
model_depths = [1, 2, 3] # Representing shallow, medium, deep architectures

for depth in model_depths:
    print(f"\n{'='*60}\nProcessing model with {depth} convolutional blocks\n{'='*60}")

    # 1. Define Model based on depth
    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)))
    model.add(layers.MaxPooling2D((2, 2)))

    if depth >= 2:
        model.add(layers.Conv2D(64, (3, 3), activation='relu'))
        model.add(layers.MaxPooling2D((2, 2)))

    if depth >= 3:
        model.add(layers.Conv2D(64, (3, 3), activation='relu'))
        model.add(layers.MaxPooling2D((2, 2)))

    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(8, activation='softmax'))

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # 2. Load or Train
    model_filename = os.path.join(models_dir, f'model_depth_{depth}.keras')
    history_filename = os.path.join(models_dir, f'history_depth_{depth}.pkl')

    if os.path.exists(model_filename) and os.path.exists(history_filename):
        print(f"Loading existing model and history for depth {depth}...")
        model = keras.models.load_model(model_filename)
        with open(history_filename, 'rb') as f:
            history = pickle.load(f)
    else:
        print(f"Training model with depth {depth}...")
        history_obj = model.fit(
            train_generator,
            epochs=10,
            validation_data=validation_generator,
            verbose=1
        )
        history = history_obj.history

        # Save model and history
        model.save(model_filename)
        with open(history_filename, 'wb') as f:
            pickle.dump(history, f)
        print(f"Saved model and history for depth {depth}.")

    # 3. Evaluate on Test Data
    print(f"Evaluating depth {depth} model on test data...")
    test_loss, test_acc = model.evaluate(test_generator, verbose=0)
    print(f"Test Accuracy: {test_acc:.4f}")
    print(f"Test Loss: {test_loss:.4f}")
    results_depth[depth] = {'test_acc': test_acc, 'test_loss': test_loss}

    # 4. Plotting (3 Graphs: Accuracy, Loss, Val vs Test)
    acc = history['accuracy']
    val_acc = history['val_accuracy']
    loss = history['loss']
    val_loss = history['val_loss']
    epochs_range = range(1, len(acc) + 1)

    plt.figure(figsize=(18, 5))

    # -- Graph 1: Accuracy --
    plt.subplot(1, 3, 1)
    plt.plot(epochs_range, acc, 'bo-', label='Training acc')
    plt.plot(epochs_range, val_acc, 'b', label='Validation acc')
    plt.title(f'Depth {depth} - Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.ylim(0, 1)
    plt.legend()

    # -- Graph 2: Loss --
    plt.subplot(1, 3, 2)
    plt.plot(epochs_range, loss, 'bo-', label='Training loss')
    plt.plot(epochs_range, val_loss, 'b', label='Validation loss')
    plt.title(f'Depth {depth} - Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()

    # -- Graph 3: Val vs Test Comparison --
    plt.subplot(1, 3, 3)
    metrics_names = ['Accuracy', 'Loss']
    val_values = [val_acc[-1], val_loss[-1]]
    test_values = [test_acc, test_loss]

    x_pos = np.arange(len(metrics_names))
    width = 0.35

    rects1 = plt.bar(x_pos - width/2, val_values, width, label='Validation')
    rects2 = plt.bar(x_pos + width/2, test_values, width, label='Test')

    plt.ylabel('Score')
    plt.title(f'Depth {depth} - Val vs Test')
    plt.xticks(x_pos, metrics_names)
    plt.legend()

    # Add labels
    plt.bar_label(rects1, padding=3, fmt='%.2f')
    plt.bar_label(rects2, padding=3, fmt='%.2f')

    plt.tight_layout()
    plt.show()

## Experimenting with pooling

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
import os
import pickle

results_pooling_comparison = {}

# Define the 'pooling' CNN model
model_pooling = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(8, activation='softmax')
])

model_pooling.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])

# Define the 'no_pooling' CNN model
model_no_pooling = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(8, activation='softmax')
])

model_no_pooling.compile(optimizer='adam',
                         loss='sparse_categorical_crossentropy',
                         metrics=['accuracy'])

models_to_train = {
    'pooling': model_pooling,
    'no_pooling': model_no_pooling
}

for model_name_key, model_obj in models_to_train.items():
    print(f"\n{'='*60}\nProcessing {model_name_key} model\n{'='*60}")

    model_filename = os.path.join(models_dir, f'model_{model_name_key}.keras')
    history_filename = os.path.join(models_dir, f'history_{model_name_key}.pkl')

    if os.path.exists(model_filename) and os.path.exists(history_filename):
        print(f"Loading existing model and history for {model_name_key}...")
        model = keras.models.load_model(model_filename)
        with open(history_filename, 'rb') as f:
            history = pickle.load(f)
    else:
        print(f"Training {model_name_key} model...")
        history_obj = model_obj.fit(
            train_generator,
            epochs=10,
            validation_data=validation_generator,
            verbose=1
        )
        history = history_obj.history

        # Save model and history
        model_obj.save(model_filename)
        with open(history_filename, 'wb') as f:
            pickle.dump(history, f)
        print(f"Saved model and history for {model_name_key}.")

    # Store history for later plotting
    results_pooling_comparison[model_name_key] = history

print("Models defined, trained (or loaded), and histories saved/stored.")

In [None]:
test_results = {}

# Construct the full path to the saved 'pooling' model file
model_filename_pooling = os.path.join(models_dir, 'model_pooling.keras')

# Load the 'pooling' model
model_pooling = keras.models.load_model(model_filename_pooling)

# Evaluate the loaded 'pooling' model on the test_generator
test_loss_pooling, test_acc_pooling = model_pooling.evaluate(test_generator, verbose=0)

# Store the obtained test accuracy and test loss in the test_results dictionary
test_results['pooling'] = {'test_acc': test_acc_pooling, 'test_loss': test_loss_pooling}

print(f"Pooling Model - Test Accuracy: {test_acc_pooling:.4f}")
print(f"Pooling Model - Test Loss: {test_loss_pooling:.4f}")
print("Test results for 'pooling' model stored.")

In [None]:
model_filename_no_pooling = os.path.join(models_dir, 'model_no_pooling.keras')

# Load the 'no_pooling' model
model_no_pooling = keras.models.load_model(model_filename_no_pooling)

# Evaluate the loaded 'no_pooling' model on the test_generator
test_loss_no_pooling, test_acc_no_pooling = model_no_pooling.evaluate(test_generator, verbose=0)

# Store the obtained test accuracy and test loss in the test_results dictionary
test_results['no_pooling'] = {'test_acc': test_acc_no_pooling, 'test_loss': test_loss_no_pooling}

print(f"No-Pooling Model - Test Accuracy: {test_acc_no_pooling:.4f}")
print(f"No-Pooling Model - Test Loss: {test_loss_no_pooling:.4f}")
print("Test results for 'no_pooling' model stored.")

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import os
import pickle
from tensorflow import keras

# Make sure models_dir and test_generator are defined from previous cells
# models_dir = '/content/drive/MyDrive/models'
# test_generator (already available)
# results_pooling_comparison (already available from previous cell)

# test_results dictionary already populated from previous steps

for model_name_key in results_pooling_comparison.keys():
    print(f"\n{'='*60}\nEvaluating and Plotting {model_name_key} model\n{'='*60}")

    model_filename = os.path.join(models_dir, f'model_{model_name_key}.keras')
    model = keras.models.load_model(model_filename)

    # 1. Evaluate on Test Data (Re-evaluating to get the values in this block for consistency, though already done)
    print(f"Evaluating {model_name_key} model on test data...")
    test_loss, test_acc = model.evaluate(test_generator, verbose=0)
    print(f"Test Accuracy: {test_acc:.4f}")
    print(f"Test Loss: {test_loss:.4f}")
    test_results[model_name_key] = {'test_acc': test_acc, 'test_loss': test_loss}

    # Retrieve history for plotting
    history = results_pooling_comparison[model_name_key]

    # 2. Plotting (Accuracy, Loss over Epochs, Val vs Test Comparison)
    acc = history['accuracy']
    val_acc = history['val_accuracy']
    loss = history['loss']
    val_loss = history['val_loss']
    epochs_range = range(1, len(acc) + 1)

    plt.figure(figsize=(18, 5))

    # -- Graph 1: Accuracy over Epochs --
    plt.subplot(1, 3, 1)
    plt.plot(epochs_range, acc, 'bo-', label='Training Accuracy')
    plt.plot(epochs_range, val_acc, 'r^-', label='Validation Accuracy')
    plt.title(f'{model_name_key.replace("_", " ").capitalize()} Model - Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.ylim(0, 1) # Ensure y-axis range from 0 to 1 for accuracy
    plt.legend()
    plt.grid(True)

    # -- Graph 2: Loss over Epochs --
    plt.subplot(1, 3, 2)
    plt.plot(epochs_range, loss, 'bo-', label='Training Loss')
    plt.plot(epochs_range, val_loss, 'r^-', label='Validation Loss')
    plt.title(f'{model_name_key.replace("_", " ").capitalize()} Model - Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True)

    # -- Graph 3: Validation vs Test Comparison --
    plt.subplot(1, 3, 3)
    metrics_names = ['Accuracy', 'Loss']
    val_values = [val_acc[-1], val_loss[-1]]
    current_test_values = [test_acc, test_loss]

    x_pos = np.arange(len(metrics_names))
    width = 0.35

    rects1 = plt.bar(x_pos - width/2, val_values, width, label='Validation')
    rects2 = plt.bar(x_pos + width/2, current_test_values, width, label='Test')

    plt.ylabel('Score')
    plt.title(f'{model_name_key.replace("_", " ").capitalize()} - Val vs Test')
    plt.xticks(x_pos, metrics_names)
    plt.legend()

    # Add labels
    plt.bar_label(rects1, padding=3, fmt='%.4f')
    plt.bar_label(rects2, padding=3, fmt='%.4f')
    plt.grid(axis='y')

    plt.tight_layout()
    plt.show()

print("\nComparison of test metrics for both models:")
for model_name, metrics in test_results.items():
    print(f"{model_name.capitalize()} Model: Test Accuracy = {metrics['test_acc']:.4f}, Test Loss = {metrics['test_loss']:.4f}")

# Summarize Comparison
print("\n--- Summary of Pooling vs. No-Pooling Models ---")
pooling_acc = test_results['pooling']['test_acc']
pooling_loss = test_results['pooling']['test_loss']
no_pooling_acc = test_results['no_pooling']['test_acc']
no_pooling_loss = test_results['no_pooling']['test_loss']

print(f"Pooling Model: Test Accuracy = {pooling_acc:.4f}, Test Loss = {pooling_loss:.4f}")
print(f"No-Pooling Model: Test Accuracy = {no_pooling_acc:.4f}, Test Loss = {no_pooling_loss:.4f}")

if pooling_acc > no_pooling_acc:
    print(f"\nThe Pooling Model achieved higher test accuracy ({pooling_acc:.4f}) compared to the No-Pooling Model ({no_pooling_acc:.4f}).")
    print("This suggests that pooling layers were beneficial in extracting robust features and reducing overfitting.")
elif no_pooling_acc > pooling_acc:
    print(f"\nThe No-Pooling Model achieved higher test accuracy ({no_pooling_acc:.4f}) compared to the Pooling Model ({pooling_acc:.4f}).")
    print("However, looking at the training and validation curves, the no-pooling model shows significant overfitting (high training accuracy, low validation accuracy), which is a common issue when pooling is absent, as it retains too much spatial information.")
else:
    print("\nBoth models achieved similar test accuracy.")

print("\nKey observations from plots:")
print("- **Pooling Model**: The training and validation accuracy/loss curves show a relatively stable learning process, with validation metrics generally following training metrics, indicating less overfitting.")
print("- **No-Pooling Model**: The training accuracy quickly reaches very high levels (near 1.0) while validation accuracy plateaus or even decreases, and validation loss increases significantly. This is a clear indication of severe overfitting, where the model learns the training data too well but fails to generalize to unseen data.")
print("\nConclusion: Pooling layers (MaxPooling2D in this case) are crucial for image classification tasks. They help in down-sampling feature maps, reducing computational cost, and providing translation invariance, which significantly improves the model's generalization capabilities and prevents overfitting, as evidenced by the much better performance and convergence behavior of the pooling model.")


Now that was insightful. Here I can clearly see that removing pooling leads to heavy overfitting to the training data, even though it theoretically preserves all the spatial information on its way through the network.
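
One plausible contributor to that overfitting (not a proven cause) is the size of the first dense layer. The sketch below traces the feature-map side length through the two architectures defined above, assuming 3x3 kernels with 'valid' padding, 2x2 max pooling, and 64 channels in the last convolution. The helper names are made up for illustration.

```python
def conv_valid(size, kernel=3):
    """Output side length of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def max_pool(size, pool=2):
    """Output side length of non-overlapping max pooling (floor division)."""
    return size // pool


def flatten_features(input_size, use_pooling, n_conv=3, last_channels=64):
    """Number of values reaching Flatten after n_conv convolution blocks."""
    size = input_size
    for _ in range(n_conv):
        size = conv_valid(size)
        if use_pooling:
            size = max_pool(size)
    return size * size * last_channels


def dense_params(n_inputs, n_units=64):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


for name, use_pooling in [("pooling", True), ("no_pooling", False)]:
    features = flatten_features(64, use_pooling)
    print(f"{name}: {features:,} flattened features, "
          f"{dense_params(features):,} Dense(64) parameters")
```

Under these assumptions the dense layer of the no-pooling model has roughly 90 times more parameters than that of the pooling model, which gives it far more capacity to memorize the training set.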

Googling: what is skip connection in CNN?
Result: Skip connections, or residual connections, in CNNs are shortcuts that bypass one or more layers to feed the input of an earlier layer directly to the output of a later layer. This helps train deeper networks by addressing the vanishing gradient problem, allowing gradients to flow more easily through the network during backpropagation.
Ahh, I think I remember now. Do I need a large number of layers to get pronounced vanishing gradients? I haven't used very deep nets so far.
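
A back-of-the-envelope way to approach that question: during backpropagation the gradient is multiplied by one activation derivative per layer, and the sigmoid derivative never exceeds 0.25, while tanh and ReLU can reach 1. The sketch below only bounds this activation factor and deliberately ignores the weights, so it is a rough illustration rather than a prediction for the models in this notebook. The function name is an assumption.

```python
# Largest possible value of each activation's derivative.
MAX_DERIVATIVE = {"sigmoid": 0.25, "tanh": 1.0, "relu": 1.0}


def activation_gradient_bound(activation, n_layers):
    """Upper bound on the product of activation derivatives across n layers.

    Weights are ignored, so this is only the activation's contribution.
    """
    return MAX_DERIVATIVE[activation] ** n_layers


for n_layers in (4, 10, 20):
    bound = activation_gradient_bound("sigmoid", n_layers)
    print(f"{n_layers:>2} sigmoid layers: activation factor <= {bound:.2e}")
```

With four activated layers, as in the models above, the sigmoid factor is already below 0.004, so the effect is noticeable even in shallow nets with sigmoid, while ReLU does not have this upper bound problem for active units. A skip connection computes `x + f(x)`, whose derivative is `1 + f'(x)`, so the gradient always has a path that is not scaled down.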

## Experimenting with skip connections

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
import os
import pickle

results_skip_comparison = {}

# Define the 'sequential' CNN model
model_sequential = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(8, activation='softmax')
])

model_sequential.compile(optimizer='adam',
                         loss='sparse_categorical_crossentropy',
                         metrics=['accuracy'])

# Define the 'skip_connection' CNN model using Keras Functional API
input_tensor = keras.Input(shape=(64, 64, 3))

# First Conv block
x = layers.Conv2D(32, (3, 3), activation='relu')(input_tensor)
x = layers.MaxPooling2D((2, 2))(x)
skip_connection_branch = x  # Output: (31, 31, 32) (assuming 'valid' padding for Conv2D)

# Second Conv block with skip connection
# Main path
y = layers.Conv2D(64, (3, 3), activation='relu')(x)  # Output: (29, 29, 64)

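# Skip path: needs to match shape of y (29, 29, 64)
# Apply a Conv2D with kernel_size=(3,3) and 'valid' padding to reduce spatial dimensions
# and increase channels to match 'y'. Activation can be linear or relu for the skip path.
# Note: this makes the "skip" branch a second learned 3x3 convolution running in parallel
# with the main path, not an identity shortcut as in ResNet-style residual blocks.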
skip_connection_branch_processed = layers.Conv2D(64, (3, 3), activation='relu')(skip_connection_branch)

# Add the main path and skip path
z = layers.add([y, skip_connection_branch_processed])
z = layers.MaxPooling2D((2, 2))(z) # Output: (14, 14, 64)

# Third Conv block
z = layers.Conv2D(64, (3, 3), activation='relu')(z)
z = layers.MaxPooling2D((2, 2))(z) # Output: (6, 6, 64)

# Classifier head
z = layers.Flatten()(z)
z = layers.Dense(64, activation='relu')(z)
output_tensor = layers.Dense(8, activation='softmax')(z)

model_skip_connection = keras.Model(inputs=input_tensor, outputs=output_tensor, name='skip_connection_model')

model_skip_connection.compile(optimizer='adam',
                              loss='sparse_categorical_crossentropy',
                              metrics=['accuracy'])

models_to_train = {
    'sequential': model_sequential,
    'skip_connection': model_skip_connection
}

for model_name_key, model_obj in models_to_train.items():
    print(f"\n{'='*60}\nProcessing {model_name_key} model\n{'='*60}")

    model_filename = os.path.join(models_dir, f'model_{model_name_key}.keras')
    history_filename = os.path.join(models_dir, f'history_{model_name_key}.pkl')

    if os.path.exists(model_filename) and os.path.exists(history_filename):
        print(f"Loading existing model and history for {model_name_key}...")
        # Only the history is needed here: the evaluation cell reloads the saved
        # model from disk, so the freshly built model_obj is simply left unused.
        with open(history_filename, 'rb') as f:
            history = pickle.load(f)
    else:
        print(f"Training {model_name_key} model...")
        history_obj = model_obj.fit(
            train_generator,
            epochs=10,
            validation_data=validation_generator,
            verbose=1
        )
        history = history_obj.history

        # Save model and history
        model_obj.save(model_filename)
        with open(history_filename, 'wb') as f:
            pickle.dump(history, f)
        print(f"Saved model and history for {model_name_key}.")

    # Store history for later plotting
    results_skip_comparison[model_name_key] = history

print("Models defined, trained (or loaded), and histories saved/stored in results_skip_comparison.")

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import os
import pickle
from tensorflow import keras

# Ensure models_dir and test_generator are defined from previous cells
# models_dir = '/content/drive/MyDrive/models'
# test_generator (already available)
# results_skip_comparison (already available from previous cell)

# Initialize test_results for skip connection comparison if not already done
# (It might have been populated by the previous 'pooling' comparison, but it's safer to re-initialize or ensure proper keys)
comparison_test_results = {}

for model_name_key in results_skip_comparison.keys():
    print(f"\n{'='*60}\nEvaluating and Plotting {model_name_key} model\n{'='*60}")

    model_filename = os.path.join(models_dir, f'model_{model_name_key}.keras')
    model = keras.models.load_model(model_filename)

    # 1. Evaluate on Test Data
    print(f"Evaluating {model_name_key} model on test data...")
    test_loss, test_acc = model.evaluate(test_generator, verbose=0)
    print(f"Test Accuracy: {test_acc:.4f}")
    print(f"Test Loss: {test_loss:.4f}")
    comparison_test_results[model_name_key] = {'test_acc': test_acc, 'test_loss': test_loss}

    # Retrieve history for plotting
    history = results_skip_comparison[model_name_key]

    # 2. Plotting (Accuracy, Loss over Epochs, Val vs Test Comparison)
    acc = history['accuracy']
    val_acc = history['val_accuracy']
    loss = history['loss']
    val_loss = history['val_loss']
    epochs_range = range(1, len(acc) + 1)

    plt.figure(figsize=(18, 5))

    # -- Graph 1: Accuracy over Epochs --
    plt.subplot(1, 3, 1)
    plt.plot(epochs_range, acc, 'bo-', label='Training Accuracy')
    plt.plot(epochs_range, val_acc, 'r^-', label='Validation Accuracy')
    plt.title(f'{model_name_key.replace("_", " ").capitalize()} Model - Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.ylim(0, 1) # Ensure y-axis range from 0 to 1 for accuracy
    plt.legend()
    plt.grid(True)

    # -- Graph 2: Loss over Epochs --
    plt.subplot(1, 3, 2)
    plt.plot(epochs_range, loss, 'bo-', label='Training Loss')
    plt.plot(epochs_range, val_loss, 'r^-', label='Validation Loss')
    plt.title(f'{model_name_key.replace("_", " ").capitalize()} Model - Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True)

    # -- Graph 3: Validation vs Test Comparison --
    plt.subplot(1, 3, 3)
    metrics_names = ['Accuracy', 'Loss']
    val_values = [val_acc[-1], val_loss[-1]]
    current_test_values = [test_acc, test_loss]

    x_pos = np.arange(len(metrics_names))
    width = 0.35

    rects1 = plt.bar(x_pos - width/2, val_values, width, label='Validation')
    rects2 = plt.bar(x_pos + width/2, current_test_values, width, label='Test')

    plt.ylabel('Score')
    plt.title(f'{model_name_key.replace("_", " ").capitalize()} - Val vs Test')
    plt.xticks(x_pos, metrics_names)
    plt.legend()

    # Add labels
    plt.bar_label(rects1, padding=3, fmt='%.4f')
    plt.bar_label(rects2, padding=3, fmt='%.4f')
    plt.grid(axis='y')

    plt.tight_layout()
    plt.show()

print("\nComparison of test metrics for both models:")
for model_name, metrics in comparison_test_results.items():
    print(f"{model_name.replace('_', ' ').title()} Model: Test Accuracy = {metrics['test_acc']:.4f}, Test Loss = {metrics['test_loss']:.4f}")

# Summarize Comparison
print("\n--- Summary of Sequential vs. Skip Connection Models ---")
sequential_acc = comparison_test_results['sequential']['test_acc']
sequential_loss = comparison_test_results['sequential']['test_loss']
skip_acc = comparison_test_results['skip_connection']['test_acc']
skip_loss = comparison_test_results['skip_connection']['test_loss']

print(f"Sequential Model: Test Accuracy = {sequential_acc:.4f}, Test Loss = {sequential_loss:.4f}")
print(f"Skip Connection Model: Test Accuracy = {skip_acc:.4f}, Test Loss = {skip_loss:.4f}")

if skip_acc > sequential_acc:
    print(f"\nThe Skip Connection Model achieved higher test accuracy ({skip_acc:.4f}) compared to the Sequential Model ({sequential_acc:.4f}).")
    print("This suggests that incorporating skip connections can be beneficial for improving model performance and potentially alleviating vanishing gradient problems in deeper networks.")
elif sequential_acc > skip_acc:
    print(f"\nThe Sequential Model achieved higher test accuracy ({sequential_acc:.4f}) compared to the Skip Connection Model ({skip_acc:.4f}).")
    print("With only three convolutional blocks this is not necessarily surprising; small gaps can also come from run-to-run variance, hyperparameter choices, or dataset characteristics.")
else:
    print("\nBoth models achieved similar test accuracy.")

print("\nKey observations from plots (based on typical behavior, may vary with specific training runs):")
print("- **Sequential Model**: Observe the trends in training and validation accuracy/loss to identify signs of overfitting or underfitting.")
print("- **Skip Connection Model**: Observe the trends, typically expecting smoother learning curves and better generalization due to skip connections facilitating gradient flow.")
print("\nFurther analysis might involve:")
print("- Examining the complexity of the skip connection branch to ensure it's not adding unnecessary parameters or complexity.")
print("- Experimenting with different skip connection architectures (e.g., identity mapping, convolutional layers with 1x1 kernels).")
print("- Adjusting learning rates, optimizers, or regularization techniques to optimize both models.")


Skip connections don't seem to make much of a difference compared to the sequential model in this case. Probably need way more layers to see the difference?
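
A quick sanity check of the shape comments in the skip-connection model, using plain arithmetic instead of Keras. This is only a sketch: the helper names `conv_valid` and `max_pool` are made up for illustration, and it assumes stride-1 convolutions with 'valid' padding and non-overlapping 2x2 pooling, as in the cell above.

```python
def conv_valid(size, kernel=3):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def max_pool(size, pool=2):
    """Spatial size after non-overlapping max pooling (odd sizes round down)."""
    return size // pool


size = 64
after_block1 = max_pool(conv_valid(size))          # 64 -> 62 -> 31
main_path = conv_valid(after_block1)               # 31 -> 29
skip_path = conv_valid(after_block1)               # 31 -> 29, same as the main path
after_block2 = max_pool(main_path)                 # 29 -> 14
after_block3 = max_pool(conv_valid(after_block2))  # 14 -> 12 -> 6

shapes = [after_block1, main_path, skip_path, after_block2, after_block3]
print(shapes)  # [31, 29, 29, 14, 6]
```

The main path and the skip path end up with the same spatial size, which is what `layers.add` needs. Note that the skip branch here is itself a 3x3 convolution, so the model is closer to two parallel conv branches than to an identity shortcut.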

# Experimenting with different numbers of hidden layers

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
import matplotlib.pyplot as plt
import os
import pickle
import numpy as np

# Define the specific layer counts requested
dense_layer_counts = [1, 5, 9]
results_dense_layers = {}

for count in dense_layer_counts:
    print(f"\n{'='*60}\nProcessing model with {count} Hidden Dense Layer(s)\n{'='*60}")

    # 1. Define Model
    model = models.Sequential()
    # Fixed Convolutional Base
    model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())

    # Variable Number of Fully Connected (Dense) Layers
    for _ in range(count):
        model.add(layers.Dense(64, activation='relu'))

    # Output Layer
    model.add(layers.Dense(8, activation='softmax'))

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    # 2. Load or Train
    model_name = f'dense_layers_{count}'
    model_filename = os.path.join(models_dir, f'model_{model_name}.keras')
    history_filename = os.path.join(models_dir, f'history_{model_name}.pkl')

    if os.path.exists(model_filename) and os.path.exists(history_filename):
        print(f"Loading existing model and history for {model_name}...")
        model = keras.models.load_model(model_filename)
        with open(history_filename, 'rb') as f:
            history = pickle.load(f)
    else:
        print(f"Training {model_name}...")
        history_obj = model.fit(
            train_generator,
            epochs=10,
            validation_data=validation_generator,
            verbose=1
        )
        history = history_obj.history

        # Save model and history
        model.save(model_filename)
        with open(history_filename, 'wb') as f:
            pickle.dump(history, f)
        print(f"Saved model and history for {model_name}.")

    # 3. Evaluate on Test Data
    print(f"Evaluating {model_name} on test data...")
    test_loss, test_acc = model.evaluate(test_generator, verbose=0)
    print(f"Test Accuracy: {test_acc:.4f}")
    print(f"Test Loss: {test_loss:.4f}")
    results_dense_layers[count] = {'test_acc': test_acc, 'test_loss': test_loss, 'history': history}

    # 4. Plotting Individual Model Performance
    acc = history['accuracy']
    val_acc = history['val_accuracy']
    loss = history['loss']
    val_loss = history['val_loss']
    epochs_range = range(1, len(acc) + 1)

    plt.figure(figsize=(12, 5))

    # Accuracy Plot
    plt.subplot(1, 2, 1)
    plt.plot(epochs_range, acc, 'bo-', label='Training acc')
    plt.plot(epochs_range, val_acc, 'r^-', label='Validation acc')
    plt.title(f'{count} Hidden Dense Layer(s) - Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.ylim(0, 1)
    plt.legend()
    plt.grid(True)

    # Loss Plot
    plt.subplot(1, 2, 2)
    plt.plot(epochs_range, loss, 'bo-', label='Training loss')
    plt.plot(epochs_range, val_loss, 'r^-', label='Validation loss')
    plt.title(f'{count} Hidden Dense Layer(s) - Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True)

    plt.tight_layout()
    plt.show()

# 5. Compare All Models
print(f"\n{'='*60}\nComparison of Dense Layer Configurations (1, 5, 9)\n{'='*60}")

metrics_names = ['Accuracy', 'Loss']
labels = [f'{c} Layer(s)' for c in dense_layer_counts]
accuracies = [results_dense_layers[c]['test_acc'] for c in dense_layer_counts]
losses = [results_dense_layers[c]['test_loss'] for c in dense_layer_counts]

x = np.arange(len(labels))
width = 0.35

plt.figure(figsize=(10, 6))
rects1 = plt.bar(x - width/2, accuracies, width, label='Test Accuracy')
rects2 = plt.bar(x + width/2, losses, width, label='Test Loss')

plt.ylabel('Scores')
plt.title('Test Metrics by Number of Hidden Dense Layers')
plt.xticks(x, labels)
plt.legend()

plt.bar_label(rects1, padding=3, fmt='%.4f')
plt.bar_label(rects2, padding=3, fmt='%.4f')

plt.tight_layout()
plt.show()

# Print Summary Text
for count in dense_layer_counts:
    print(f"{count} Hidden Dense Layer(s): Test Accuracy = {results_dense_layers[count]['test_acc']:.4f}, Test Loss = {results_dense_layers[count]['test_loss']:.4f}")

I can see somewhat better results in minimising loss with a bigger number of layers.
If I have some time left, I want to return to this exercise to see how deep I can go >.>
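
To get a feeling for how much capacity the extra dense layers actually add, here is a rough parameter count done by hand. It is a sketch under the assumptions of the cell above (two 3x3 'valid' conv layers with 32 and 64 filters, 64x64x3 input, 64-unit hidden layers, 8 output classes); the function names are mine, not part of Keras.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * in_channels * out_channels + out_channels


def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out


def total_params(hidden_dense_layers, units=64, classes=8):
    # Conv base: 64 -> 62 -> 31 -> 29 -> 14 spatially, 64 channels at the end
    flattened = 14 * 14 * 64
    total = conv_params(3, 3, 32) + conv_params(3, 32, 64)
    total += dense_params(flattened, units)                          # first hidden layer
    total += (hidden_dense_layers - 1) * dense_params(units, units)  # remaining hidden layers
    total += dense_params(units, classes)                            # output layer
    return total


counts = {n: total_params(n) for n in (1, 5, 9)}
for n, params in counts.items():
    print(f"{n} hidden dense layer(s): {params:,} parameters")
```

Almost all parameters sit in the first dense layer after `Flatten`, so going from 1 to 9 hidden layers only adds about 4% more parameters. The models differ in depth much more than in size.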

# Dataset variations

## 1 channel vs 3 channels

Google: what is a channel in a dataset?
Result: In a dataset, a channel represents a layer of information for a single data point, such as a single pixel in an image. A 1-channel dataset, like a grayscale image, has a single value per pixel to represent its intensity, while a 3-channel dataset, like a standard RGB color image, has three values per pixel for the red, green, and blue components.


Oh I see, I was a bit confused because I didn't work with images before. I get it now.
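
A tiny NumPy sketch of what "1 channel vs 3 channels" means for the array shapes. The luma weights below are the common ITU-R 601 recipe for turning RGB into grayscale; whether the image loader uses exactly these weights is an assumption on my side, and the random image is just a stand-in for a real photo.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for one 64x64 RGB image with pixel values in 0..255
rgb = rng.integers(0, 256, size=(64, 64, 3)).astype(np.float32)

# Weighted sum over the channel axis, then restore a channel axis of size 1
weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
gray = (rgb @ weights)[..., np.newaxis]

print(rgb.shape, gray.shape)

# Parameters of the first Conv2D(32, (3, 3)) layer for each input type
first_conv_rgb = 3 * 3 * 3 * 32 + 32
first_conv_gray = 3 * 3 * 1 * 32 + 32
print(first_conv_rgb, first_conv_gray)
```

Only the first convolution changes size between the two setups; everything after it sees 32 feature maps either way.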

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import os
import pickle
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras import layers, models

# 1. Define Generators
datagen = ImageDataGenerator(rescale=1./255)

# RGB Generators
print("Creating RGB Generators...")
train_gen_rgb = datagen.flow_from_dataframe(
    train, x_col='file', y_col='age_bins', target_size=(64, 64),
    batch_size=32, class_mode='sparse', color_mode='rgb'
)
val_gen_rgb = datagen.flow_from_dataframe(
    validate, x_col='file', y_col='age_bins', target_size=(64, 64),
    batch_size=32, class_mode='sparse', color_mode='rgb'
)
test_gen_rgb = datagen.flow_from_dataframe(
    test, x_col='file', y_col='age_bins', target_size=(64, 64),
    batch_size=32, class_mode='sparse', color_mode='rgb'
)

# Grayscale Generators
print("Creating Grayscale Generators...")
train_gen_gray = datagen.flow_from_dataframe(
    train, x_col='file', y_col='age_bins', target_size=(64, 64),
    batch_size=32, class_mode='sparse', color_mode='grayscale'
)
val_gen_gray = datagen.flow_from_dataframe(
    validate, x_col='file', y_col='age_bins', target_size=(64, 64),
    batch_size=32, class_mode='sparse', color_mode='grayscale'
)
test_gen_gray = datagen.flow_from_dataframe(
    test, x_col='file', y_col='age_bins', target_size=(64, 64),
    batch_size=32, class_mode='sparse', color_mode='grayscale'
)

# 2. Define Model Creation Function
def create_model(input_shape):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=input_shape),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(8, activation='softmax')
    ])
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    return model

# 3. Experiment Loop
configs = [
    {'name': 'rgb', 'shape': (64, 64, 3), 'train': train_gen_rgb, 'val': val_gen_rgb, 'test': test_gen_rgb},
    {'name': 'grayscale', 'shape': (64, 64, 1), 'train': train_gen_gray, 'val': val_gen_gray, 'test': test_gen_gray}
]

results_channels = {}

for config in configs:
    name = config['name']
    print(f"\n{'='*60}\nProcessing {name} model\n{'='*60}")

    model_filename = os.path.join(models_dir, f'model_channel_{name}.keras')
    history_filename = os.path.join(models_dir, f'history_channel_{name}.pkl')

    if os.path.exists(model_filename) and os.path.exists(history_filename):
        print(f"Loading existing model and history for {name}...")
        model = models.load_model(model_filename)
        with open(history_filename, 'rb') as f:
            history = pickle.load(f)
    else:
        print(f"Training {name} model...")
        model = create_model(config['shape'])
        history_obj = model.fit(
            config['train'],
            epochs=10,
            validation_data=config['val'],
            verbose=1
        )
        history = history_obj.history

        # Save
        model.save(model_filename)
        with open(history_filename, 'wb') as f:
            pickle.dump(history, f)
        print(f"Saved model and history for {name}.")

    # Evaluate
    print(f"Evaluating {name} model...")
    loss, acc = model.evaluate(config['test'], verbose=0)
    print(f"Test Accuracy: {acc:.4f}, Test Loss: {loss:.4f}")
    results_channels[name] = {'acc': acc, 'loss': loss, 'history': history}

# 4. Plotting Results
# Bar Chart Comparison
names = list(results_channels.keys())
accs = [results_channels[n]['acc'] for n in names]
losses = [results_channels[n]['loss'] for n in names]

x = np.arange(len(names))
width = 0.35

plt.figure(figsize=(10, 5))
rects1 = plt.bar(x - width/2, accs, width, label='Accuracy')
rects2 = plt.bar(x + width/2, losses, width, label='Loss')
plt.xlabel('Channel Configuration')
plt.ylabel('Score')
plt.title('RGB vs Grayscale Test Metrics')
plt.xticks(x, names)
plt.legend()
plt.bar_label(rects1, padding=3, fmt='%.4f')
plt.bar_label(rects2, padding=3, fmt='%.4f')
plt.tight_layout()
plt.show()

# Line Plots Comparison
plt.figure(figsize=(14, 6))

# Accuracy Plot
plt.subplot(1, 2, 1)
for name in names:
    hist = results_channels[name]['history']
    plt.plot(hist['accuracy'], label=f'{name} Train')
    plt.plot(hist['val_accuracy'], linestyle='--', label=f'{name} Val')
plt.title('Training & Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)

# Loss Plot
plt.subplot(1, 2, 2)
for name in names:
    hist = results_channels[name]['history']
    plt.plot(hist['loss'], label=f'{name} Train')
    plt.plot(hist['val_loss'], linestyle='--', label=f'{name} Val')
plt.title('Training & Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()

I'm not sure why it needed to reinitialize the RGB dataset. Anyway, I'd probably spend more time arguing with it, so I'll just roll with it for now.

RGB training and validation accuracy look a little bit better. I'm not surprised, to be fair, given that three channels convey more information.
Anyway, it was a good exercise just to try it out myself.

Prompt: what are the ways to augment image data?

Response:
## Data Augmentation

Data augmentation helps prevent overfitting by exposing the model to more variations of the training data. We can use `ImageDataGenerator` to apply transformations on the fly during training.

Below, we define a generator with:
*   Rotation
*   Width/Height Shifts
*   Shear
*   Zoom
*   Horizontal Flip
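
To make two of these transformations concrete, here is a simplified NumPy version of a horizontal flip and a width shift with 'nearest' filling. This is not how `ImageDataGenerator` implements them (it applies random affine transforms with interpolation); the functions below are hypothetical helpers that only illustrate the idea on whole-pixel shifts.

```python
import numpy as np


def horizontal_flip(img):
    """Mirror an (H, W, C) image left to right."""
    return img[:, ::-1, :]


def shift_right_nearest(img, pixels):
    """Shift an (H, W, C) image right, repeating the left edge column in the gap."""
    if pixels == 0:
        return img.copy()
    shifted = np.empty_like(img)
    shifted[:, pixels:, :] = img[:, :-pixels, :]
    shifted[:, :pixels, :] = img[:, :1, :]  # 'nearest' style fill
    return shifted


# A 2x4 single-channel toy image so the effect is easy to read
img = np.arange(8).reshape(2, 4, 1)
flipped = horizontal_flip(img)
shifted = shift_right_nearest(img, 1)

print(img[0, :, 0])      # [0 1 2 3]
print(flipped[0, :, 0])  # [3 2 1 0]
print(shifted[0, :, 0])  # [0 0 1 2]
```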

In [None]:
from tensorflow.keras.preprocessing.image import load_img, img_to_array

# 1. Select a sample image
sample_file = train['file'].iloc[0]
print(f"Visualizing augmentations for: {sample_file}")

# Load and convert to array
img = load_img(sample_file, target_size=(64, 64))
x = img_to_array(img)
x = x.reshape((1,) + x.shape)  # Reshape to (1, 64, 64, 3)

# 2. Define Augmentation Generator
aug_datagen = ImageDataGenerator(
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

# 3. Generate and Plot Augmented Images
i = 0
plt.figure(figsize=(12, 6))
for batch in aug_datagen.flow(x, batch_size=1):
    plt.subplot(2, 4, i + 1)
    imgplot = plt.imshow(batch[0].astype('uint8'))
    plt.axis('off')
    i += 1
    if i % 8 == 0:
        break

plt.suptitle("Example of Data Augmentations")
plt.show()

In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# 1. Define Augmented Training Generator
# Note: rescale=1./255 is applied to both augmented and validation/test generators
train_datagen_aug = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

# 2. Define Validation/Test Generator (No Augmentation, only rescale)
valid_test_datagen = ImageDataGenerator(rescale=1./255)

print("Setting up Augmented Training Generator...")
train_generator_aug = train_datagen_aug.flow_from_dataframe(
    dataframe=train,
    x_col='file',
    y_col='age_bins',
    target_size=(64, 64),
    batch_size=32,
    class_mode='sparse',
    color_mode='rgb'
)

print("Setting up Validation and Test Generators...")
validation_generator_aug = valid_test_datagen.flow_from_dataframe(
    dataframe=validate,
    x_col='file',
    y_col='age_bins',
    target_size=(64, 64),
    batch_size=32,
    class_mode='sparse',
    color_mode='rgb'
)

test_generator_aug = valid_test_datagen.flow_from_dataframe(
    dataframe=test,
    x_col='file',
    y_col='age_bins',
    target_size=(64, 64),
    batch_size=32,
    class_mode='sparse',
    color_mode='rgb'
)

### Training a model on augmented data

In [None]:
import matplotlib.pyplot as plt
import os
import pickle
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, models

# 1. Define Model Architecture (Same as RGB baseline)
model_aug = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(8, activation='softmax')
])

model_aug.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

# 2. Load or Train Check
model_filename = os.path.join(models_dir, 'model_aug.keras')
history_filename = os.path.join(models_dir, 'history_aug.pkl')

if os.path.exists(model_filename) and os.path.exists(history_filename):
    print("Loading existing augmented model and history...")
    model_aug = keras.models.load_model(model_filename)
    with open(history_filename, 'rb') as f:
        history_aug = pickle.load(f)
else:
    print("Training augmented model...")
    history_obj = model_aug.fit(
        train_generator_aug,
        epochs=10,
        validation_data=validation_generator_aug,
        verbose=1
    )
    history_aug = history_obj.history

    # Save
    model_aug.save(model_filename)
    with open(history_filename, 'wb') as f:
        pickle.dump(history_aug, f)
    print("Saved augmented model and history.")

# 3. Evaluate on Test Data
print("Evaluating augmented model on test data...")
test_loss_aug, test_acc_aug = model_aug.evaluate(test_generator_aug, verbose=0)

# 4. Comparison Visualization
print(f"\n{'='*60}\nComparison: Augmented vs Baseline RGB\n{'='*60}")

# Retrieve Baseline Results
baseline_acc = results_channels['rgb']['acc']
baseline_loss = results_channels['rgb']['loss']

labels = ['Baseline RGB', 'Augmented']
accuracies = [baseline_acc, test_acc_aug]
losses = [baseline_loss, test_loss_aug]

x = np.arange(len(labels))
width = 0.35

plt.figure(figsize=(10, 6))
rects1 = plt.bar(x - width/2, accuracies, width, label='Test Accuracy')
rects2 = plt.bar(x + width/2, losses, width, label='Test Loss')

plt.ylabel('Scores')
plt.title('Test Metrics: Baseline RGB vs Augmented')
plt.xticks(x, labels)
plt.legend()
plt.bar_label(rects1, padding=3, fmt='%.4f')
plt.bar_label(rects2, padding=3, fmt='%.4f')
plt.show()

# 5. Learning Curves for Augmented Model
plt.figure(figsize=(14, 5))

# Accuracy
plt.subplot(1, 2, 1)
plt.plot(history_aug['accuracy'], label='Train Accuracy')
plt.plot(history_aug['val_accuracy'], linestyle='--', label='Val Accuracy')
plt.title('Augmented Model - Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)

# Loss
plt.subplot(1, 2, 2)
plt.plot(history_aug['loss'], label='Train Loss')
plt.plot(history_aug['val_loss'], linestyle='--', label='Val Loss')
plt.title('Augmented Model - Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()

# 6. Print Summary
print(f"Baseline RGB Model: Test Accuracy = {baseline_acc:.4f}, Test Loss = {baseline_loss:.4f}")
print(f"Augmented Model:    Test Accuracy = {test_acc_aug:.4f}, Test Loss = {test_loss_aug:.4f}")

The model trained on augmented data shows lower accuracy and higher loss after training. That is interesting. I guess augmentation can still be useful when a model trained on a standardized dataset is later used for inference on non-standardized data.

# Trying out regression inference of age (as opposed to categorical)

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import os
import pickle
from tensorflow import keras
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# 1. Setup Regression Generators (Target: 'age')
# We use the 'raw' class_mode for regression targets
datagen = ImageDataGenerator(rescale=1./255)

print("Setting up Regression Generators...")
# Using 'age' column which contains integer values
train_gen_reg = datagen.flow_from_dataframe(
    dataframe=train, x_col='file', y_col='age',
    target_size=(64, 64), batch_size=32, class_mode='raw', color_mode='rgb'
)

val_gen_reg = datagen.flow_from_dataframe(
    dataframe=validate, x_col='file', y_col='age',
    target_size=(64, 64), batch_size=32, class_mode='raw', color_mode='rgb'
)

test_gen_reg = datagen.flow_from_dataframe(
    dataframe=test, x_col='file', y_col='age',
    target_size=(64, 64), batch_size=32, class_mode='raw', color_mode='rgb'
)

# 2. Define Regression Model
def create_regression_model():
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        # Output layer: 1 unit with linear activation for regression
        layers.Dense(1, activation='linear')
    ])
    model.compile(optimizer='adam',
                  loss='mse',  # Mean Squared Error for regression
                  metrics=['mae']) # Mean Absolute Error
    return model

# 3. Train or Load
model_name = 'regression'
model_filename = os.path.join(models_dir, f'model_{model_name}.keras')
history_filename = os.path.join(models_dir, f'history_{model_name}.pkl')

if os.path.exists(model_filename) and os.path.exists(history_filename):
    print(f"Loading existing model and history for {model_name}...")
    model_reg = keras.models.load_model(model_filename)
    with open(history_filename, 'rb') as f:
        history_reg = pickle.load(f)
else:
    print(f"Training {model_name} model...")
    model_reg = create_regression_model()
    history_obj = model_reg.fit(
        train_gen_reg,
        epochs=10,
        validation_data=val_gen_reg,
        verbose=1
    )
    history_reg = history_obj.history

    # Save
    model_reg.save(model_filename)
    with open(history_filename, 'wb') as f:
        pickle.dump(history_reg, f)
    print(f"Saved model and history for {model_name}.")

# 4. Evaluate
print("Evaluating regression model on test data...")
test_mse, test_mae = model_reg.evaluate(test_gen_reg, verbose=0)
print(f"Test MSE: {test_mse:.4f}")
print(f"Test MAE: {test_mae:.4f}")

# 5. Visualization
# History
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(history_reg['mae'], label='Train MAE')
plt.plot(history_reg['val_mae'], label='Val MAE')
plt.title('Model MAE')
plt.xlabel('Epochs')
plt.ylabel('MAE (Years)')
plt.legend()
plt.grid(True)

plt.subplot(1, 2, 2)
plt.plot(history_reg['loss'], label='Train MSE')
plt.plot(history_reg['val_loss'], label='Val MSE')
plt.title('Model Loss (MSE)')
plt.xlabel('Epochs')
plt.ylabel('MSE')
plt.legend()
plt.grid(True)
plt.show()

# Predictions vs Actual
# Create a separate generator without shuffling to match predictions with true values
test_gen_reg_no_shuffle = datagen.flow_from_dataframe(
    dataframe=test, x_col='file', y_col='age',
    target_size=(64, 64), batch_size=32, class_mode='raw', color_mode='rgb', shuffle=False
)

predictions = model_reg.predict(test_gen_reg_no_shuffle)
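# Take targets from the generator itself so they stay aligned with the predictions
# even if flow_from_dataframe dropped rows with invalid filenames
true_values = test_gen_reg_no_shuffle.labels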

plt.figure(figsize=(8, 8))
plt.scatter(true_values, predictions, alpha=0.5)
# Plot ideal prediction line
max_age = max(max(true_values), max(predictions.flatten()))
plt.plot([0, max_age], [0, max_age], 'r--', label='Ideal Fit')
plt.xlabel('True Age')
plt.ylabel('Predicted Age')
plt.title('True vs Predicted Age (Regression)')
plt.legend()
plt.grid(True)
plt.show()

# Trying out different final layer activation functions

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import os
import pickle

# 1. Initialize Generators
datagen = ImageDataGenerator(rescale=1./255)

print("Setting up Generators...")
train_gen = datagen.flow_from_dataframe(
    dataframe=train, x_col='file', y_col='age_bins',
    target_size=(64, 64), batch_size=32, class_mode='sparse', color_mode='rgb'
)

val_gen = datagen.flow_from_dataframe(
    dataframe=validate, x_col='file', y_col='age_bins',
    target_size=(64, 64), batch_size=32, class_mode='sparse', color_mode='rgb'
)

test_gen = datagen.flow_from_dataframe(
    dataframe=test, x_col='file', y_col='age_bins',
    target_size=(64, 64), batch_size=32, class_mode='sparse', color_mode='rgb'
)

# 2. Define Model Function
def create_model(final_activation):
    model = models.Sequential([
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation='relu'),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        # Final layer with specified activation
        layers.Dense(8, activation=final_activation)
    ])
    return model

# 3. Configurations
activation_configs = [
    {'name': 'softmax', 'from_logits': False},
    {'name': 'sigmoid', 'from_logits': False},
    {'name': 'linear', 'from_logits': True}
]

results_final_act = {}

# 4. Loop through configurations
for config in activation_configs:
    act_name = config['name']
    from_logits = config['from_logits']
    print(f"\n{'='*60}\nProcessing model with final activation: {act_name}\n{'='*60}")

    model_filename = os.path.join(models_dir, f'model_act_{act_name}.keras')
    history_filename = os.path.join(models_dir, f'history_act_{act_name}.pkl')

    if os.path.exists(model_filename) and os.path.exists(history_filename):
        print(f"Loading existing model and history for {act_name}...")
        model = keras.models.load_model(model_filename)
        with open(history_filename, 'rb') as f:
            history = pickle.load(f)
    else:
        print(f"Training {act_name} model...")
        model = create_model(act_name)
        model.compile(optimizer='adam',
                      loss=keras.losses.SparseCategoricalCrossentropy(from_logits=from_logits),
                      metrics=['accuracy'])

        history_obj = model.fit(
            train_gen,
            epochs=10,
            validation_data=val_gen,
            verbose=1
        )
        history = history_obj.history

        # Save
        model.save(model_filename)
        with open(history_filename, 'wb') as f:
            pickle.dump(history, f)
        print(f"Saved model and history for {act_name}.")

    # 5. Evaluate
    print(f"Evaluating {act_name} model on test data...")
    test_loss, test_acc = model.evaluate(test_gen, verbose=0)
    print(f"Test Accuracy: {test_acc:.4f}")
    print(f"Test Loss: {test_loss:.4f}")

    results_final_act[act_name] = {'acc': test_acc, 'loss': test_loss, 'history': history}

Oh god, I just can't be friends with LLMs. Will it initialize generators every time now?

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Ensure results_final_act is available (from previous step)
if 'results_final_act' in locals():
    print(f"{'='*60}\nComparison of Final Layer Activations\n{'='*60}")

    # 1. Bar Chart Comparison
    labels = [name.capitalize() for name in results_final_act.keys()]
    accuracies = [results_final_act[name]['acc'] for name in results_final_act.keys()]
    losses = [results_final_act[name]['loss'] for name in results_final_act.keys()]

    x = np.arange(len(labels))
    width = 0.35

    plt.figure(figsize=(10, 6))
    rects1 = plt.bar(x - width/2, accuracies, width, label='Test Accuracy')
    rects2 = plt.bar(x + width/2, losses, width, label='Test Loss')

    plt.ylabel('Score')
    plt.title('Test Metrics by Final Layer Activation')
    plt.xticks(x, labels)
    plt.legend()
    plt.bar_label(rects1, padding=3, fmt='%.4f')
    plt.bar_label(rects2, padding=3, fmt='%.4f')
    plt.tight_layout()
    plt.show()

    # 2. Learning Curves Comparison
    plt.figure(figsize=(14, 6))

    # Accuracy Plot
    plt.subplot(1, 2, 1)
    for name in results_final_act:
        acc = results_final_act[name]['history']['accuracy']
        val_acc = results_final_act[name]['history']['val_accuracy']
        plt.plot(acc, label=f'{name.capitalize()} Train')
        plt.plot(val_acc, linestyle='--', label=f'{name.capitalize()} Val')
    plt.title('Accuracy Comparison')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.legend()
    plt.grid(True)

    # Loss Plot
    plt.subplot(1, 2, 2)
    for name in results_final_act:
        loss = results_final_act[name]['history']['loss']
        val_loss = results_final_act[name]['history']['val_loss']
        plt.plot(loss, label=f'{name.capitalize()} Train')
        plt.plot(val_loss, linestyle='--', label=f'{name.capitalize()} Val')
    plt.title('Loss Comparison')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.legend()
    plt.grid(True)

    plt.tight_layout()
    plt.show()

    # Summary Text
    print("\nDetailed Metrics:")
    for name in results_final_act:
        print(f"{name.capitalize()}: Test Acc = {results_final_act[name]['acc']:.4f}, Loss = {results_final_act[name]['loss']:.4f}")
else:
    print("Error: 'results_final_act' dictionary not found. Please run the training/evaluation step first.")

Hmm, it was actually around the same accuracy/loss for all three, which I didn't expect. Why was that?
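
One thing that could explain part of this, assuming two of the compared variants were a softmax output and a plain linear output trained with `from_logits=True` (the cell above passes a `from_logits` flag, but the list of activations is not shown here): softmax does not change which class scores highest, and cross-entropy computed from logits is the same quantity as cross-entropy computed from the softmax probabilities. A minimal pure-Python sketch with made-up logits:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_probs(probs, label):
    """Cross-entropy when the model already outputs probabilities."""
    return -math.log(probs[label])

def cross_entropy_from_logits(logits, label):
    """Cross-entropy computed directly from raw scores (log-sum-exp form)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]

# Made-up raw outputs of a final Dense layer for one image (illustrative only)
logits = [2.0, -1.0, 0.5, 3.2]
label = 3
probs = softmax(logits)

# The predicted class is the same with or without softmax
pred_from_logits = logits.index(max(logits))
pred_from_probs = probs.index(max(probs))

# The loss value is the same as well
loss_probs = cross_entropy_from_probs(probs, label)
loss_logits = cross_entropy_from_logits(logits, label)

print(pred_from_logits, pred_from_probs)
print(abs(loss_probs - loss_logits) < 1e-9)
```

If that assumption holds, those two variants describe the same function and differ only in where the softmax is applied, so similar accuracy and loss would be expected. It does not say anything about other activations.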

## Saving the test dataset as a CSV file

In [None]:
test.to_csv(models_dir + '/test.csv', index=False)

Oh hell, I lost part of Part 3 because I didn't commit my notebook, and Colab doesn't save files opened from GitHub by default.

# Part 3. Backbone models and fine tuning

## Split training set

In [None]:
training_block_1 = train.sample(frac=0.5, random_state=42)
training_block_2 = train.drop(training_block_1.index)

print(f"Training block 1 size: {len(training_block_1)}")
print(f"Training block 2 size: {len(training_block_2)}")

There was a note about using LLMs, but I lost it in an unequal fight with Colab.

In [None]:
# models_dir has no trailing slash, so add the separator explicitly
training_block_1.to_csv(models_dir + '/training_block_1.csv')
training_block_2.to_csv(models_dir + '/training_block_2.csv')

## Autoencoder Modelling: For Block 1 Images

# Task
Define a wrapper function `autoencoder_generator` that takes an existing `ImageDataGenerator` iterator and yields `(image, image)` pairs instead of `(image, label)` pairs, as autoencoders require the input to be the target. Then, using the existing `train`, `validate`, and `test` dataframes and the `datagen` (with `rescale=1./255`), create new base generators for RGB images (64x64) and wrap them to produce `train_gen_auto`, `val_gen_auto`, and `test_gen_auto`.

## Prepare Autoencoder Data Generators

### Subtask:
Define a generator wrapper and prepare data iterators for autoencoder training.


**Reasoning**:
Define the `autoencoder_generator` function to adapt data for autoencoder training (input equals output) and create the corresponding data generators using the train, validation, and test dataframes.
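
A wrapper of this kind loops forever, so the number of batches per epoch has to be given to `fit` explicitly. The sketch below shows both points on a toy iterator. The name `pairs_for_autoencoder`, the toy batches, and the sample count are made up for illustration and are not part of the notebook.

```python
import math
from itertools import cycle

def pairs_for_autoencoder(batches):
    """Turn an endless iterator of (x, y) batches into (x, x) pairs."""
    while True:
        x, _ = next(batches)  # the label batch is ignored
        yield x, x            # the input is also the target

# Toy stand-in for a Keras iterator: two (images, labels) batches repeated forever
toy_batches = cycle([([0.1, 0.2], [3]), ([0.3, 0.4], [5])])
gen = pairs_for_autoencoder(toy_batches)

first_input, first_target = next(gen)
print(first_input is first_target)  # input and target are the same object

# The wrapper never stops, so the epoch length is computed by hand.
# Hypothetical numbers: 1000 samples, batch size 32.
n_samples, batch_size = 1000, 32
steps_per_epoch = math.ceil(n_samples / batch_size)
print(steps_per_epoch)
```

Using `math.ceil` covers the last partial batch; floor division (`//`) drops it instead. Either works as long as it is computed from the dataframe that the generator actually reads.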



In [None]:
# 1. Define generator wrapper for Autoencoder (Target = Input)
def autoencoder_generator(generator):
    while True:
        # Extract batch. flow_from_dataframe yields (x, y)
        # We disregard y (labels) and yield (x, x)
        x, y = next(generator)
        yield x, x

# 2. Create Base Generators (using existing 'datagen')
print("Setting up Base Generators for Autoencoder...")
train_gen_base = datagen.flow_from_dataframe(
    dataframe=training_block_1,
    x_col='file',
    y_col='age_bins',
    target_size=(64, 64),
    batch_size=32,
    class_mode='sparse',
    color_mode='rgb'
)

val_gen_base = datagen.flow_from_dataframe(
    dataframe=validate,
    x_col='file',
    y_col='age_bins',
    target_size=(64, 64),
    batch_size=32,
    class_mode='sparse',
    color_mode='rgb'
)

test_gen_base = datagen.flow_from_dataframe(
    dataframe=test,
    x_col='file',
    y_col='age_bins',
    target_size=(64, 64),
    batch_size=32,
    class_mode='sparse',
    color_mode='rgb'
)

# 3. Create Autoencoder Iterators
train_gen_auto = autoencoder_generator(train_gen_base)
val_gen_auto = autoencoder_generator(val_gen_base)
test_gen_auto = autoencoder_generator(test_gen_base)

print("Autoencoder generators initialized.")

## Define and Train Autoencoder

### Subtask:
Define the Convolutional Autoencoder architecture (encoder and decoder), compile it with MSE loss, and train it using the prepared data generators.


**Reasoning**:
Define the Convolutional Autoencoder architecture, compile it with MSE loss, and train it using the previously prepared autoencoder data generators.



In [None]:
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.models import Model
from tensorflow import keras  # needed for keras.models.load_model below
import os
import pickle

# 1. Define Autoencoder Architecture
input_img = Input(shape=(64, 64, 3))

# Encoder
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# Decoder
x = Conv2D(128, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(64, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(3, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='mean_squared_error')
autoencoder.summary()

# 2. Train the Model
# Calculate steps per epoch manually: the wrapped generators are infinite,
# so Keras cannot infer their length. train_gen_auto reads training_block_1.
steps_per_epoch = len(training_block_1) // 32
validation_steps = len(validate) // 32

model_filename = os.path.join(models_dir, 'autoencoder.keras')
history_filename = os.path.join(models_dir, 'autoencoder_history.pkl')

if os.path.exists(model_filename) and os.path.exists(history_filename):
    print("Loading existing autoencoder model and history...")
    autoencoder = keras.models.load_model(model_filename)
    with open(history_filename, 'rb') as f:
        history_auto = pickle.load(f)
else:
    print("Training autoencoder...")
    history_obj = autoencoder.fit(
        train_gen_auto,
        epochs=15,
        steps_per_epoch=steps_per_epoch,
        validation_data=val_gen_auto,
        validation_steps=validation_steps,
        verbose=1
    )
    history_auto = history_obj.history

    # Save model and history
    autoencoder.save(model_filename)
    with open(history_filename, 'wb') as f:
        pickle.dump(history_auto, f)
    print("Saved autoencoder model and history.")

# Task
Visualize the autoencoder's performance by selecting a batch of test images from `test_gen_auto`, generating their reconstructions, and plotting them side-by-side. Then, add a discussion on why MSE was chosen as the loss function and what other metrics like MAE or SSIM could indicate. Finally, provide a summary of the autoencoder construction and an assessment of the image quality.

## Visualize Reconstruction

### Subtask:
Generate reconstructions for a batch of test images and visualize them alongside the originals.


**Reasoning**:
Retrieve a batch of test images, generate reconstructions using the trained autoencoder, and visualize the original vs. reconstructed images to assess performance.



In [None]:
import matplotlib.pyplot as plt

# 1. Retrieve a single batch of test images
# The generator yields (x, x), so we take the first element
x_test, _ = next(test_gen_auto)

# 2. Generate reconstructed images
decoded_imgs = autoencoder.predict(x_test)

# 3. Visualize a subset of images
n = 10  # Number of images to display
plt.figure(figsize=(20, 4))
for i in range(n):
    # Display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i])
    plt.title("Original")
    plt.axis("off")

    # Display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i])
    plt.title("Reconstructed")
    plt.axis("off")

plt.suptitle("Autoencoder Reconstruction Results")
plt.show()

We can see that the reconstructed image quality is somewhat worse than the original. I think that is normal and attribute it to pooling and upsampling, where some information can be lost.
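
To make the pooling/upsampling point concrete, here is a small pure-Python sketch on a made-up 4x4 single-channel "image". It only imitates 2x2 max pooling followed by nearest-neighbour upsampling; the real autoencoder has learned `Conv2D` layers in between, so this is an illustration of the lossy step, not of the whole model.

```python
def max_pool_2x2(img):
    """Keep only the largest value of every 2x2 block."""
    h, w = len(img), len(img[0])
    return [[max(img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1])
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]

def upsample_2x2(img):
    """Nearest-neighbour upsampling: repeat every value in a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

# Made-up pixel values, chosen only for illustration
original = [
    [0.1, 0.9, 0.2, 0.2],
    [0.3, 0.4, 0.8, 0.1],
    [0.5, 0.5, 0.0, 0.7],
    [0.6, 0.2, 0.3, 0.3],
]

pooled = max_pool_2x2(original)
restored = upsample_2x2(pooled)

# Count the pixels that no longer match the original
changed = sum(o != r
              for orig_row, rest_row in zip(original, restored)
              for o, r in zip(orig_row, rest_row))
print(pooled)
print(changed, "of 16 pixels changed")
```

Only the maximum of each block survives pooling, so upsampling alone cannot bring the other three values back.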

### Discussion and Summary

**Choice of Loss Function (MSE):**
Mean Squared Error (MSE) is a standard loss function for autoencoders performing image reconstruction. It calculates the average squared difference between the pixel values of the original and reconstructed images. By penalizing larger errors more severely, MSE encourages the model to capture the general structure and color distribution effectively. However, it operates on a pixel-by-pixel basis, which can sometimes lead to blurry reconstructions as the model 'averages' out high-frequency details to minimize the overall error.
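
Both properties mentioned above can be checked on tiny made-up pixel lists. The first comparison shows that one large error costs more under MSE than several small ones with the same total absolute error. The second shows why an uncertain pixel gets "averaged": predicting the middle value is cheaper under MSE than committing to either extreme. All values are illustrative.

```python
def mse(a, b):
    """Mean squared error between two equally long lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def mae(a, b):
    """Mean absolute error between two equally long lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

target = [0.5, 0.5, 0.5, 0.5]
spread = [0.6, 0.6, 0.6, 0.6]        # four small errors of 0.1
concentrated = [0.9, 0.5, 0.5, 0.5]  # one large error of 0.4

print("MAE:", mae(target, spread), mae(target, concentrated))  # about equal
print("MSE:", mse(target, spread), mse(target, concentrated))  # large error costs more

# A pixel on a sharp edge that is 0.0 in half of the cases and 1.0 in the other half
edge_outcomes = [0.0, 1.0]
hedge_loss = mse([0.5, 0.5], edge_outcomes)   # always predict the average
commit_loss = mse([0.0, 0.0], edge_outcomes)  # always predict one extreme
print(hedge_loss, commit_loss)
```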

**Alternative Metrics:**
*   **MAE (Mean Absolute Error):** Calculates the average absolute difference. It is less sensitive to outliers than MSE and can sometimes result in slightly sharper edges, though it still doesn't explicitly capture structural information.
*   **SSIM (Structural Similarity Index):** A perceptual metric that measures the similarity between two images based on luminance, contrast, and structure. Unlike pixel-wise metrics (MSE/MAE), SSIM correlates better with human perception of image quality. A high SSIM indicates that the reconstructed image preserves the structural information of the original well.
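
The difference between pixel-wise and structural metrics can be sketched with NumPy. The `global_ssim` helper below is a simplified assumption: it applies the SSIM formula once over the whole image, whereas library implementations such as `tf.image.ssim` average it over local windows. The random test image and both distortions are made up, and values are left unclipped for simplicity.

```python
import numpy as np

def global_ssim(x, y, max_val=1.0):
    """Simplified SSIM over the whole image (single window, no Gaussian filter)."""
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

rng = np.random.default_rng(0)
original = 0.2 + 0.6 * rng.random((64, 64))  # made-up grayscale image

# Distortion 1: uniform brightness shift (structure untouched)
shifted = original + 0.1

# Distortion 2: zero-mean noise scaled to give the same MSE as the shift
noise = rng.standard_normal((64, 64))
noise -= noise.mean()
noise *= 0.1 / np.sqrt(np.mean(noise ** 2))
noisy = original + noise

mse_shift = np.mean((original - shifted) ** 2)
mse_noise = np.mean((original - noisy) ** 2)
ssim_shift = global_ssim(original, shifted)
ssim_noise = global_ssim(original, noisy)

print(f"MSE : shift={mse_shift:.4f}  noise={mse_noise:.4f}")
print(f"SSIM: shift={ssim_shift:.4f}  noise={ssim_noise:.4f}")
```

MSE rates the two distortions the same, while the structural score separates them: the brightness shift keeps the structure and the noise does not.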

**Autoencoder Construction Summary:**
The model is a Convolutional Autoencoder designed for 64x64 RGB images.
*   **Encoder:** Compresses the input into a lower-dimensional latent representation using a series of `Conv2D` layers (relu activation) followed by `MaxPooling2D` for downsampling.
*   **Decoder:** Reconstructs the image from the latent representation using `Conv2D` layers and `UpSampling2D` to recover the spatial dimensions, ending with a `sigmoid` activation to output pixel values in the [0, 1] range.
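
The size of the latent representation follows from the layer list: each `MaxPooling2D((2, 2), padding='same')` halves the spatial side (rounding up), and each `Conv2D` sets the channel count. A short arithmetic sketch, with the helper name `same_pool` chosen here for illustration:

```python
import math

def same_pool(size, factor=2):
    """Output side length of a pooling layer with padding='same'."""
    return math.ceil(size / factor)

side, channels = 64, 3
input_values = side * side * channels

# Three encoder blocks: Conv2D(filters) followed by 2x2 max pooling
for filters in (32, 64, 128):
    channels = filters      # the convolution sets the channel count
    side = same_pool(side)  # the pooling halves height and width

latent_shape = (side, side, channels)
latent_values = side * side * channels
ratio = input_values / latent_values

print(latent_shape)
print(input_values, latent_values, ratio)
```

So the bottleneck is 8x8x128, which holds 8192 values against 12288 in the input, a reduction of only 1.5x in raw value count.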

**Assessment of Image Quality:**
Visually, the reconstructed images likely capture the global facial features, pose, and skin tone of the original images. However, fine details (like individual hair strands or skin texture) might be smoothed out or slightly blurry. This is typical for simple autoencoders trained with MSE, as the bottleneck forces the model to prioritize the most significant features required to approximate the input.

I'm curious to try an SSIM loss function if I have time.

# Task
Use `training_block_2` to create a new `train_gen_transfer` data generator with the same settings as before (RGB, 64x64, sparse). Then, build a transfer learning model by extracting the encoder layers from the trained `autoencoder` (up to the bottleneck), followed by a `Flatten` layer, a `Dense` layer with 64 units (ReLU), and a final `Dense` output layer (Softmax). Freeze the encoder layers so their weights do not change during training. Train this new model for 10 epochs using `train_gen_transfer` and the existing `val_gen_base`. Finally, evaluate the model on `test_gen_base` and generate a bar chart comparing its accuracy and loss against the original RGB baseline model (available in `results_channels['rgb']`).

## Prepare Data Generators for Block 2

### Subtask:
Create a data generator for the second training block to be used for transfer learning.


**Reasoning**:
Create the `train_gen_transfer` generator using the second training block dataframe and the existing datagen object.



In [None]:
print("Setting up Transfer Learning Generator (Block 2)...")
train_gen_transfer = datagen.flow_from_dataframe(
    dataframe=training_block_2,
    x_col='file',
    y_col='age_bins',
    target_size=(64, 64),
    batch_size=32,
    class_mode='sparse',
    color_mode='rgb'
)

print("Transfer learning generator initialized.")

## Build Transfer Learning Model

### Subtask:
Construct the transfer learning model by extracting the encoder from the autoencoder, adding classification layers, and freezing the encoder weights.


**Reasoning**:
Extract the encoder part from the pre-trained autoencoder, freeze its layers, and add classification layers to create a transfer learning model.
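
To get a feeling for what freezing means here, the parameter counts can be worked out by hand. The sketch assumes the encoder described earlier (three 3x3 `Conv2D` layers with 32, 64 and 128 filters on a 64x64x3 input, an 8x8x128 bottleneck) and a head of `Dense(64)` followed by an 8-class output. The helper names are made up for illustration.

```python
def conv2d_params(kernel, in_channels, out_channels):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return kernel * kernel * in_channels * out_channels + out_channels

def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out

# Frozen part: the three encoder convolutions (pooling layers have no weights)
frozen = (conv2d_params(3, 3, 32)
          + conv2d_params(3, 32, 64)
          + conv2d_params(3, 64, 128))

# Trainable part: Flatten(8x8x128) -> Dense(64) -> Dense(8)
flattened = 8 * 8 * 128
trainable = dense_params(flattened, 64) + dense_params(64, 8)

print("frozen encoder parameters:", frozen)
print("trainable head parameters:", trainable)
```

Under these assumptions, `model_transfer.summary()` should report the same split between non-trainable and trainable parameters.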



In [None]:
# 1. Extract Encoder
# The bottleneck is at index 6 (MaxPooling2D after 3rd Conv block)
encoder_output = autoencoder.layers[6].output
encoder_model = keras.Model(inputs=autoencoder.input, outputs=encoder_output, name='encoder')

# 2. Freeze Encoder
encoder_model.trainable = False

# 3. Build Transfer Learning Model
model_transfer = models.Sequential([
    encoder_model,
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(8, activation='softmax')
], name='transfer_learning_model')

# 4. Compile
model_transfer.compile(optimizer='adam',
                       loss='sparse_categorical_crossentropy',
                       metrics=['accuracy'])

# 5. Summary
model_transfer.summary()

**Reasoning**:
Train the transfer learning model (or load if available), evaluate it on the test set, and visualize the performance comparison against the baseline RGB model.



In [None]:
import matplotlib.pyplot as plt
import numpy as np
import os
import pickle
from tensorflow import keras

# 1. Train or Load Model
model_filename = os.path.join(models_dir, 'model_transfer.keras')
history_filename = os.path.join(models_dir, 'history_transfer.pkl')

if os.path.exists(model_filename) and os.path.exists(history_filename):
    print("Loading existing transfer learning model and history...")
    model_transfer = keras.models.load_model(model_filename)
    with open(history_filename, 'rb') as f:
        history_transfer = pickle.load(f)
else:
    print("Training transfer learning model...")
    history_obj = model_transfer.fit(
        train_gen_transfer,
        epochs=10,
        validation_data=val_gen_base,
        verbose=1
    )
    history_transfer = history_obj.history

    # Save
    model_transfer.save(model_filename)
    with open(history_filename, 'wb') as f:
        pickle.dump(history_transfer, f)
    print("Saved transfer learning model and history.")

# 2. Evaluate on Test Data
print("Evaluating transfer learning model on test data...")
test_loss_transfer, test_acc_transfer = model_transfer.evaluate(test_gen_base, verbose=0)
print(f"Transfer Model - Test Accuracy: {test_acc_transfer:.4f}")
print(f"Transfer Model - Test Loss: {test_loss_transfer:.4f}")

# 3. Comparison Visualization
print(f"\n{'='*60}\nComparison: Transfer Learning vs Baseline RGB\n{'='*60}")

# Retrieve Baseline Results
# Assuming results_channels['rgb'] exists from previous steps
if 'results_channels' in locals() and 'rgb' in results_channels:
    baseline_acc = results_channels['rgb']['acc']
    baseline_loss = results_channels['rgb']['loss']
else:
    # Fallback if the baseline results are no longer in memory: plot 0.0 as a
    # placeholder. Re-run the RGB baseline evaluation to get the real values.
    print("Warning: Baseline results not found in memory. Using 0.0 placeholders.")
    baseline_acc = 0.0
    baseline_loss = 0.0

labels = ['Baseline RGB', 'Transfer Learning']
accuracies = [baseline_acc, test_acc_transfer]
losses = [baseline_loss, test_loss_transfer]

x = np.arange(len(labels))
width = 0.35

plt.figure(figsize=(10, 6))
rects1 = plt.bar(x - width/2, accuracies, width, label='Test Accuracy')
rects2 = plt.bar(x + width/2, losses, width, label='Test Loss')

plt.ylabel('Scores')
plt.title('Test Metrics: Baseline vs Transfer Learning')
plt.xticks(x, labels)
plt.legend()
plt.bar_label(rects1, padding=3, fmt='%.4f')
plt.bar_label(rects2, padding=3, fmt='%.4f')
plt.show()

# 4. Plot Learning Curves for Transfer Model
plt.figure(figsize=(14, 5))

# Accuracy
plt.subplot(1, 2, 1)
plt.plot(history_transfer['accuracy'], label='Train Accuracy')
plt.plot(history_transfer['val_accuracy'], linestyle='--', label='Val Accuracy')
plt.title('Transfer Learning Model - Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)

# Loss
plt.subplot(1, 2, 2)
plt.plot(history_transfer['loss'], label='Train Loss')
plt.plot(history_transfer['val_loss'], linestyle='--', label='Val Loss')
plt.title('Transfer Learning Model - Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)

plt.tight_layout()
plt.show()

Haha, it didn't work out the way I thought it would. I mean, it's around the same model? We've taken only the encoder, so we didn't attempt to reconstruct the image and didn't lose information there. I wonder why the results for transfer learning are worse, then? Only because there were fewer examples to train on (block 2 only)?

Asking Gemini.


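It says that, additionally, we compiled the autoencoder with the 'MSE' loss function as opposed to 'sparse_categorical_crossentropy' for the original model, so the frozen encoder features were optimized for reconstructing pixels rather than for separating age bins. That seems like it might be true, right.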

## Backbone Model Re-Use