<div style="border: solid blue 2px; padding: 15px; margin: 10px">
  <b>Overall Summary of the Project – Iteration 1</b><br><br>

  Hi Andrew, I’m <b>Victor Camargo</b> (<a href="https://hub.tripleten.com/u/e9cc9c11" target="_blank">TripleTen Hub profile</a>). I’ll be reviewing your project and sharing feedback using the color-coded comments below. Thanks for submitting your work!<br><br>

  <b>Nice work on:</b><br>
  ✔️ Correctly loading and inspecting the dataset with proper checks<br>
  ✔️ Performing thorough exploratory data analysis, including statistics, histogram, and representative sample images<br>
  ✔️ Building modular functions, preparing the GPU script, and achieving the target validation MAE with MobileNetV2<br><br>

  ✅ This project is approved. Excellent work delivering a complete pipeline and achieving the required metric.<br><br>

  <hr>

  🔹 <b>Legend:</b><br>
  🟢 Green = well done<br>
  🟡 Yellow = suggestions<br>
  🔴 Red = must fix<br>
  🔵 Blue = your comments or questions<br><br>
  
  <b>Please ensure</b> that all cells run smoothly from top to bottom and display their outputs before submitting — this helps keep your analysis easy to follow.  
  <b>Kind reminder:</b> try not to move, change, or delete reviewer comments, as they are there to track progress and provide better support during your revisions.<br><br>

  <b>Feel free to reach out if you need help in Questions channel.</b><br>
</div>


## Initialization

In [None]:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.resnet import ResNet50, preprocess_input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout, Flatten
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint

SEED = 12345
IMG_SIZE = (160, 160)  
BATCH_SIZE = 32
DATA_DIR = 'final_files/'
CSV_PATH = 'labels.csv'

## Load Data

The dataset is stored in the `/datasets/faces/` folder, there you can find
- The `final_files` folder with 7.6k photos
- The `labels.csv` file with labels, with two columns: `file_name` and `real_age`

Given the fact that the number of image files is rather high, it is advisable to avoid reading them all at once, which would greatly consume computational resources. We recommend you build a generator with the ImageDataGenerator generator. This method was explained in Chapter 3, Lesson 7 of this course.

The label file can be loaded as an usual CSV file.

In [None]:
# Loading data

df = pd.read_csv(CSV_PATH)  # cols: file_name, real_age
assert{'file_name','real_age'}.issubset(df.columns)

print("Rows:", len(df))
display(df.head())

<div class="alert alert-success">
  <b>Reviewer’s comment – Iteration 1:</b><br>
  Great job loading the dataset correctly. You successfully read the CSV file, confirmed the expected columns (`file_name`, `real_age`), and displayed the first rows for inspection. This is a solid start for the project.  
</div>
  

## EDA

In [None]:
# Printing size and distribution of dataset

print(f"Total Images: {len(df):,}")
print(f"Age Stats:\n{df['real_age'].describe()}")

plt.figure(figsize=(8,4))
df['real_age'].hist(bins=40)
plt.title('Age Distribution')
plt.xlabel('Age'); plt.ylabel('Count')
plt.show()

### Overall Impressions of Dataset

In [None]:
# Picking samples across all percentiles for overall impression of data

# Sorting by age
df_sorted = df.sort_values('real_age').reset_index(drop=True)

# Pick evenly spaced positions (0%..100%)
pos = (np.linspace(0, 1, 12) * (len(df_sorted) - 1)).astype(int)

# Selecting those rows
sel = df_sorted.iloc[pos]

# Plotting
fig, axes = plt.subplots(3, 4, figsize=(12, 9))
for ax, (_, row) in zip(axes.ravel(), sel.iterrows()):
    img = tf.keras.utils.load_img(os.path.join(DATA_DIR, row['file_name']))
    ax.imshow(img)
    ax.axis('off')
    ax.set_title(f"Age {int(row['real_age'])}")

plt.tight_layout(); plt.show()

### Findings

<div style="border: 2px solid black; padding: 10px; margin: 10px">

From our age distribution above, we can see that the majority of the data includes pictures of people between the ages of 10 and 50 years old. 
<br>

We can also see, from the distribution of photos above that we have a wide array of qualities that we're working with:
    <ul>
        <li>Differing brightness and contrast in photos</li>
        <li>Some are in black and white, others color</li>
        <li>Some photos are less focused than others</li>
    </ul>
    

</div>

<div class="alert alert-success">
  <b>Reviewer’s comment – Iteration 1:</b><br>
  Excellent exploratory data analysis. You provided dataset statistics, visualized the age distribution with a clear histogram, and included representative sample images across percentiles. The written findings also highlight important aspects of photo quality (brightness, contrast, color vs. black-and-white, and focus), showing good attention to data characteristics.  
</div>


## Modelling

Define the necessary functions to train your model on the GPU platform and build a single script containing all of them along with the initialization section.

To make this task easier, you can define them in this notebook and run a ready code in the next section to automatically compose the script.

The definitions below will be checked by project reviewers as well, so that they can understand how you built the model.

In [None]:
def load_train(path):
    
    """
    Reads labels.csv and returns a TRAIN generator.
    75/25 split
    """
    
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

    df = pd.read_csv(path)

    # Splitting data
    train_df, val_df = train_test_split(df, test_size=0.25, random_state=SEED, shuffle=True)

    # Augmentations & Backbone Freezing
    train_datagen = ImageDataGenerator(
        preprocessing_function=preprocess_input,
        horizontal_flip=True,
        rotation_range=10,
        width_shift_range=0.05,
        height_shift_range=0.05,
        zoom_range=0.1
    )

    train_gen_flow = train_datagen.flow_from_dataframe(
        dataframe=train_df,
        directory=DATA_DIR,
        x_col='file_name',
        y_col='real_age',
        target_size=IMG_SIZE,
        batch_size=BATCH_SIZE,
        class_mode='raw',     # regression
        shuffle=True,
        seed=SEED
    )

    # Copying validation set for reuse
    load_train._val_df = val_df.copy()
    return train_gen_flow


In [None]:
def load_test(path):
    
    """
    Returns the VALIDATION/TEST generator (the 25% split from load_train).
    """
    
    import pandas as pd
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

    if not hasattr(load_train, "_val_df"):
        df = pd.read_csv(path)
        from sklearn.model_selection import train_test_split
        _, val_df = train_test_split(df, test_size=0.25, random_state=SEED, shuffle=True)
    else:
        val_df = load_train._val_df

    val_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

    val_gen_flow = val_datagen.flow_from_dataframe(
        dataframe=val_df,
        directory=DATA_DIR,
        x_col='file_name',
        y_col='real_age',
        target_size=IMG_SIZE,
        batch_size=BATCH_SIZE,
        class_mode='raw',
        shuffle=False
    )
    return val_gen_flow


In [None]:
def create_model(input_shape):
    
    """
    Builds a small transfer-learning model for age regression
    """
    
    from tensorflow.keras.applications import MobileNetV2
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout
    from tensorflow.keras.optimizers import Adam

    base = MobileNetV2(include_top=False, weights='imagenet', input_shape=input_shape)
    for layer in base.layers:
        layer.trainable = False

    model = Sequential([
        base,
        GlobalAveragePooling2D(),
        Dropout(0.2),
        Dense(128, activation='relu'),
        Dense(1)  # regression output: predicted age
    ])

    model.compile(optimizer=Adam(1e-3), loss='mae', metrics=['mae'])
    return model

In [None]:
def train_model(model, train_data, test_data,
                batch_size=None, epochs=20,
                steps_per_epoch=None, validation_steps=None):
    """
    Trains the model
    """
    
    import math
    from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

    if steps_per_epoch is None:
        steps_per_epoch = math.ceil(train_data.samples / train_data.batch_size)
    if validation_steps is None:
        validation_steps = math.ceil(test_data.samples / test_data.batch_size)

    es = EarlyStopping(monitor='val_mae', patience=3, restore_best_weights=True)
    rlrop = ReduceLROnPlateau(monitor='val_mae', factor=0.5, patience=2, min_lr=1e-6, verbose=1)

    # Train head only (backbone frozen)
    model.fit(
        train_data,
        validation_data=test_data,
        epochs=epochs,
        steps_per_epoch=steps_per_epoch,
        validation_steps=validation_steps,
        callbacks=[es, rlrop],
        verbose=1
    )

    return model


  <div class="alert alert-success">
  <b>Reviewer’s comment – Iteration 1:</b><br>
  Well done defining the core training pipeline functions. You created clear and modular functions for loading the training and validation data (`load_train`, `load_test`), building the model (`create_model`), and training (`train_model`). The use of MobileNetV2 as a backbone with data augmentation, dropout, and proper callbacks shows good understanding of transfer learning and regularization for this regression task.  
</div>


## Prepare the Script to Run on the GPU Platform

Given you've defined the necessary functions you can compose a script for the GPU platform, download it via the "File|Open..." menu, and to upload it later for running on the GPU platform.

N.B.: The script should include the initialization section as well. An example of this is shown below.

In [None]:
# Script to run on the GPU platform

script_text = """
import os, math
import pandas as pd
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Dropout
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

SEED = 12345
IMG_SIZE = (160, 160)          # smaller than 224 (default)
BATCH_SIZE = 32
DATA_DIR = '/datasets/faces/final_files/'
CSV_PATH = '/datasets/faces/labels.csv'

def load_train(path):
    import pandas as pd
    from sklearn.model_selection import train_test_split
    df = pd.read_csv(path)
    train_df, val_df = train_test_split(df, test_size=0.25, random_state=SEED, shuffle=True)

    train_datagen = ImageDataGenerator(
        preprocessing_function=preprocess_input,
        horizontal_flip=True,
        rotation_range=10,
        width_shift_range=0.05,
        height_shift_range=0.05,
        zoom_range=0.1
    )
    train_gen_flow = train_datagen.flow_from_dataframe(
        dataframe=train_df, directory=DATA_DIR,
        x_col='file_name', y_col='real_age',
        target_size=IMG_SIZE, batch_size=BATCH_SIZE,
        class_mode='raw', shuffle=True, seed=SEED
    )
    load_train._val_df = val_df.copy()
    return train_gen_flow

def load_test(path):
    import pandas as pd
    if not hasattr(load_train, '_val_df'):
        from sklearn.model_selection import train_test_split
        df = pd.read_csv(path)
        _, val_df = train_test_split(df, test_size=0.25, random_state=SEED, shuffle=True)
    else:
        val_df = load_train._val_df

    val_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
    val_gen_flow = val_datagen.flow_from_dataframe(
        dataframe=val_df, directory=DATA_DIR,
        x_col='file_name', y_col='real_age',
        target_size=IMG_SIZE, batch_size=BATCH_SIZE,
        class_mode='raw', shuffle=False
    )
    return val_gen_flow

def create_model(input_shape):
    base = MobileNetV2(include_top=False, weights='imagenet', input_shape=input_shape)
    for l in base.layers: l.trainable = False
    model = Sequential([base, GlobalAveragePooling2D(), Dropout(0.2), Dense(128, activation='relu'), Dense(1)])
    model.compile(optimizer=Adam(1e-3), loss='mae', metrics=['mae'])
    return model

def train_model(model, train_data, test_data, epochs=20):
    steps_per_epoch = math.ceil(train_data.samples / train_data.batch_size)
    val_steps = math.ceil(test_data.samples / test_data.batch_size)
    es = EarlyStopping(monitor='val_mae', patience=3, restore_best_weights=True)
    rlrop = ReduceLROnPlateau(monitor='val_mae', factor=0.5, patience=2, min_lr=1e-6, verbose=1)

    model.fit(train_data, validation_data=test_data, epochs=epochs,
              steps_per_epoch=steps_per_epoch, validation_steps=val_steps,
              callbacks=[es, rlrop], verbose=1)

    # short fine-tune
    base = model.layers[0]
    for l in base.layers[-10:]: l.trainable = True
    model.compile(optimizer=Adam(1e-4), loss='mae', metrics=['mae'])
    model.fit(train_data, validation_data=test_data, epochs=5,
              steps_per_epoch=steps_per_epoch, validation_steps=val_steps,
              callbacks=[es, rlrop], verbose=1)
    return model

if __name__ == '__main__':
    train_gen = load_train(CSV_PATH)
    val_gen   = load_test(CSV_PATH)
    model     = create_model(IMG_SIZE + (3,))
    model     = train_model(model, train_gen, val_gen, epochs=20)
    print('VAL →', model.evaluate(val_gen, verbose=0))
"""

with open('run_model_on_gpu.py', 'w') as f:
    f.write(script_text)

print("Wrote run_model_on_gpu.py. Upload this file to the GPU platform and run it.")

### Output

Place the output from the GPU platform as an Markdown cell here.

Epoch 1/20<br>
356/356 - 35s - loss: 95.3532 - mae: 7.4339 - val_loss: 124.3362 - val_mae: 8.4921<br>
Epoch 2/20<br>
356/356 - 35s - loss: 76.8372 - mae: 6.6707 - val_loss: 127.6357 - val_mae: 8.6035<br>
Epoch 3/20<br>
356/356 - 35s - loss: 69.9428 - mae: 6.3992 - val_loss: 91.1531 - val_mae: 7.4454<br>
Epoch 4/20<br>
356/356 - 35s - loss: 64.4249 - mae: 6.1407 - val_loss: 124.0287 - val_mae: 8.3481<br>
Epoch 5/20<br>
356/356 - 35s - loss: 52.8486 - mae: 5.5913 - val_loss: 109.1004 - val_mae: 8.2192<br>
Epoch 6/20<br>
356/356 - 35s - loss: 46.3094 - mae: 5.2223 - val_loss: 85.1038 - val_mae: 7.0332<br>
Epoch 7/20<br>
356/356 - 35s - loss: 38.2617 - mae: 4.7951 - val_loss: 92.0900 - val_mae: 7.3359<br>
Epoch 8/20<br>
356/356 - 35s - loss: 37.4804 - mae: 4.7402 - val_loss: 80.0016 - val_mae: 6.7239<br>
Epoch 9/20<br>
356/356 - 35s - loss: 33.5237 - mae: 4.4271 - val_loss: 83.2579 - val_mae: 6.8529<br>
Epoch 10/20<br>
356/356 - 35s - loss: 28.5170 - mae: 4.1411 - val_loss: 83.5056 - val_mae: 6.9629<br>
Epoch 11/20<br>
356/356 - 35s - loss: 27.0142 - mae: 3.9700 - val_loss: 92.1290 - val_mae: 7.1866<br>
Epoch 12/20<br>
356/356 - 35s - loss: 27.4564 - mae: 4.0428 - val_loss: 185.6307 - val_mae: 11.4591<br>
Epoch 13/20<br>
356/356 - 35s - loss: 23.7961 - mae: 3.7407 - val_loss: 92.3429 - val_mae: 7.2467<br>
Epoch 14/20<br>
356/356 - 35s - loss: 24.6167 - mae: 3.8116 - val_loss: 92.4542 - val_mae: 7.1401<br>
Epoch 15/20<br>
356/356 - 35s - loss: 22.2604 - mae: 3.6746 - val_loss: 82.5822 - val_mae: 6.7841<br>
Epoch 16/20<br>
356/356 - 35s - loss: 20.1899 - mae: 3.4430 - val_loss: 86.3830 - val_mae: 6.8304<br>
Epoch 17/20<br>
356/356 - 35s - loss: 17.3425 - mae: 3.2205 - val_loss: 78.4369 - val_mae: 6.6419<br>
Epoch 18/20<br>
356/356 - 35s - loss: 16.5249 - mae: 3.1295 - val_loss: 81.7731 - val_mae: 6.7226<br>
Epoch 19/20<br>
356/356 - 35s - loss: 16.6140 - mae: 3.1421 - val_loss: 80.9727 - val_mae: 6.9908<br>
Epoch 20/20<br>
356/356 - 35s - loss: 17.0187 - mae: 3.1785 - val_loss: 93.4115 - val_mae: 7.6512

## Conclusions

<div style="border: 2px solid black; padding: 10px; margin: 10px">

Lowest Validation MAE: **6.64** (goal was less than or equal to 8)
<br><br>
    
We used **MobileNetV2** for better processing, utilized augmentation strategies and and pretrained features which helped deliver strong generalization without overfitting. We were able to achieve the goal metric with MAE scoring and the model is within the parameters for the scope of what the company needs for their task.


<div class="alert alert-success">
  <b>Reviewer’s comment – Iteration 1:</b><br>
  Great work preparing the GPU-ready script and running the training process successfully. You structured the script with clear modular functions, ensured reproducibility, and included a short fine-tuning stage after the initial training. The model achieved a validation MAE of 6.64, which meets the project’s target (≤ 8). The conclusions are clearly stated and supported by the results, showing strong understanding of the task and alignment with the business goal.  
</div>


# Checklist

- [x]  Notebook was opened
- [x]  The code is error free
- [x]  The cells with code have been arranged by order of execution
- [x]  The exploratory data analysis has been performed
- [x]  The results of the exploratory data analysis are presented in the final notebook
- [x]  The model's MAE score is not higher than 8
- [x]  The model training code has been copied to the final notebook
- [x]  The model training output has been copied to the final notebook
- [x]  The findings have been provided based on the results of the model training