## Age Prediction From Facial Images (CNN Regression)

Given images of people aged 20 to 50, let's try to predict the **age** of the person in a given image.

We will train a small TensorFlow/Keras CNN as a regressor, treating age as a continuous target.

Data source: https://www.kaggle.com/datasets/mariafrenti/age-prediction?resource=download-directory

### Importing Libraries

In [1]:
import numpy as np
import pandas as pd
from pathlib import Path
import os.path

from sklearn.model_selection import train_test_split

import tensorflow as tf

from sklearn.metrics import r2_score



In [2]:
image_dir = Path("archive/20-50/20-50")

### Create File DataFrame

In [17]:
filepaths = pd.Series(list(image_dir.glob(r'**/*.jpg')), name='Filepath').astype(str)
filepaths

0        archive/20-50/20-50/train/50/101453.jpg
1        archive/20-50/20-50/train/50/124365.jpg
2        archive/20-50/20-50/train/50/161895.jpg
3        archive/20-50/20-50/train/50/147106.jpg
4        archive/20-50/20-50/train/50/120820.jpg
                          ...                   
40435      archive/20-50/20-50/test/30/43585.jpg
40436      archive/20-50/20-50/test/30/40997.jpg
40437      archive/20-50/20-50/test/30/42987.jpg
40438      archive/20-50/20-50/test/30/41229.jpg
40439      archive/20-50/20-50/test/30/41971.jpg
Name: Filepath, Length: 40440, dtype: object

In [8]:
# The name of the folder that contains an image is its age label
os.path.split(os.path.split(filepaths.values[0])[0])[1]

'50'

In [13]:
ages = pd.Series(filepaths.apply(lambda x: os.path.split(os.path.split(x)[0])[1]), name='Age').astype(int)
ages

0        50
1        50
2        50
3        50
4        50
         ..
40435    30
40436    30
40437    30
40438    30
40439    30
Name: Age, Length: 40440, dtype: int64
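
The nested `os.path.split` calls above pull the parent folder name out of each path. As a minimal sketch of the same idea, assuming POSIX-style paths like the ones listed above, `pathlib` expresses it more directly. The helper name `age_from_path` is hypothetical and not part of the notebook.

```python
from pathlib import PurePosixPath


def age_from_path(filepath: str) -> int:
    """Return the age encoded in the parent folder name of an image path.

    Hypothetical helper: assumes the layout .../<split>/<age>/<file>.jpg
    seen in the file listing.
    """
    return int(PurePosixPath(filepath).parent.name)


# Paths copied from the listing above
print(age_from_path("archive/20-50/20-50/train/50/101453.jpg"))
print(age_from_path("archive/20-50/20-50/test/30/43585.jpg"))
```

With this helper, the `Age` column could be built as `filepaths.apply(age_from_path)`, which should give the same labels as the `os.path.split` version.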

In [21]:
images = pd.concat([filepaths, ages], axis=1).sample(frac=1.0, random_state=1).reset_index(drop=True)
images

                                      Filepath  Age
0      archive/20-50/20-50/train/36/157214.jpg   36
1      archive/20-50/20-50/train/22/151160.jpg   22
2      archive/20-50/20-50/train/20/142010.jpg   20
3      archive/20-50/20-50/train/41/170361.jpg   41
4      archive/20-50/20-50/train/35/159922.jpg   35
...                                        ...  ...
40435  archive/20-50/20-50/train/34/147485.jpg   34
40436  archive/20-50/20-50/train/30/174724.jpg   30
40437  archive/20-50/20-50/train/40/172530.jpg   40
40438  archive/20-50/20-50/train/44/170297.jpg   44


In [22]:
# Let's only use 5000 images to speed up training time
image_df = images.sample(5000, random_state=1).reset_index(drop=True)
image_df

                                     Filepath  Age
0     archive/20-50/20-50/train/30/178764.jpg   30
1     archive/20-50/20-50/train/28/146677.jpg   28
2     archive/20-50/20-50/train/36/153656.jpg   36
3     archive/20-50/20-50/train/49/161853.jpg   49
4     archive/20-50/20-50/train/23/162384.jpg   23
...                                       ...  ...
4995  archive/20-50/20-50/train/28/148801.jpg   28
4996  archive/20-50/20-50/train/39/154481.jpg   39
4997  archive/20-50/20-50/train/35/175360.jpg   35
4998  archive/20-50/20-50/train/42/154401.jpg   42


In [23]:
train_df, test_df = train_test_split(image_df, train_size=0.7, shuffle=True, random_state=1)

In [25]:
len(train_df), len(test_df)

(3500, 1500)

### Loading Images

In [26]:
train_gen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale = 1./255,
    validation_split=0.2
)

test_gen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale = 1./255
)

In [28]:
train_images = train_gen.flow_from_dataframe(
    dataframe = train_df,
    x_col = 'Filepath',
    y_col = 'Age',
    target_size = (120, 120),
    color_mode = 'rgb',
    class_mode = 'raw',
    batch_size = 32,
    shuffle = True,
    seed = 42,
    subset = 'training'
)

val_images = train_gen.flow_from_dataframe(
    dataframe = train_df,
    x_col = 'Filepath',
    y_col = 'Age',
    target_size = (120, 120),
    color_mode = 'rgb',
    class_mode = 'raw',
    batch_size = 32,
    shuffle = True,
    seed = 42,
    subset = 'validation'
)

test_images = test_gen.flow_from_dataframe(
    dataframe = test_df,
    x_col = 'Filepath',
    y_col = 'Age',
    target_size = (120, 120),
    color_mode = 'rgb',
    class_mode = 'raw',
    batch_size = 32,
    shuffle = False
)

Found 2800 validated image filenames.
Found 700 validated image filenames.
Found 1500 validated image filenames.
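
The three counts above follow from the 70/30 split plus `validation_split=0.2`. The sketch below reproduces that arithmetic, assuming the generator reserves `int(n * validation_split)` rows of `train_df` for validation. The function name `generator_split_sizes` is hypothetical.

```python
def generator_split_sizes(n_train_total: int, validation_split: float):
    """Return (n_training, n_validation) for a given validation fraction.

    Assumption: the validation subset holds int(n * validation_split) rows
    and the training subset holds the rest.
    """
    n_val = int(n_train_total * validation_split)
    return n_train_total - n_val, n_val


# 3500 and 1500 come from len(train_df) and len(test_df) above
n_train, n_val = generator_split_sizes(3500, 0.2)
n_test = 1500
print(n_train, n_val, n_test)  # 2800 700 1500
```

So the network is fit on 2800 images, early stopping watches 700, and the 1500 test images are held back for the final evaluation.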


### Training

In [42]:
inputs = tf.keras.Input(shape=(120, 120, 3))
x = tf.keras.layers.Conv2D(filters=16, kernel_size=(3,3), activation='relu')(inputs)
x = tf.keras.layers.MaxPool2D()(x)
x = tf.keras.layers.Conv2D(filters=32, kernel_size=(3,3), activation='relu')(x)
x = tf.keras.layers.MaxPool2D()(x)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dense(64, activation='relu')(x)
x = tf.keras.layers.Dense(64, activation='relu')(x)
outputs = tf.keras.layers.Dense(1, activation='linear')(x)

model = tf.keras.Model(inputs=inputs, outputs=outputs)

model.compile(
    optimizer = 'adam',
    loss = 'mse'
)

history = model.fit(
    train_images,
    validation_data = val_images,
    epochs=100,
    callbacks = [
        tf.keras.callbacks.EarlyStopping(
            monitor = 'val_loss',
            patience = 5,
            restore_best_weights = True
        )
    ]
)

Epoch 1/100




Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100


In [36]:
inputs, x

(<KerasTensor: shape=(None, 120, 120, 3) dtype=float32 (created by layer 'input_3')>,
 <KerasTensor: shape=(None, 118, 118, 16) dtype=float32 (created by layer 'conv2d_3')>)

In [39]:
x

<KerasTensor: shape=(None, 28, 28, 32) dtype=float32 (created by layer 'max_pooling2d_3')>

In [40]:
tf.keras.layers.Flatten()(x)

<KerasTensor: shape=(None, 25088) dtype=float32 (created by layer 'flatten')>
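
The shapes printed above can be checked by hand. Here is a small sketch of the arithmetic, assuming the Keras defaults used in the model: `Conv2D` with `'valid'` padding and stride 1, and `MaxPool2D` with a 2x2 window. The helper names are hypothetical.

```python
def conv_valid(size: int, kernel: int = 3) -> int:
    """Output side length of a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def max_pool(size: int, pool: int = 2) -> int:
    """Output side length of a non-overlapping pool (incomplete windows are dropped)."""
    return size // pool


size = 120
size = conv_valid(size)  # 118, matches (None, 118, 118, 16)
size = max_pool(size)    # 59
size = conv_valid(size)  # 57
size = max_pool(size)    # 28, matches (None, 28, 28, 32)

flattened = size * size * 32
print(size, flattened)  # 28 25088
```

`Flatten` would hand 25088 features to the first `Dense` layer, whereas the `GlobalAveragePooling2D` layer used in the model averages each of the 32 feature maps to a single value and passes on only 32 features.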

### Results

In [43]:
predicted_ages = np.squeeze(model.predict(test_images))





In [44]:
true_ages = test_images.labels
rmse = np.sqrt(model.evaluate(test_images, verbose=0))
print("Test RMSE: {:.5f}".format(rmse))

r2 = r2_score(true_ages, predicted_ages)
print("Test R2 Score: {:.5f}".format(r2))



Test RMSE: 8.91531
Test R2 Score: 0.00358


In [45]:
np.mean(true_ages)

34.660666666666664

In [47]:
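# Baseline: RMSE of always predicting the mean test age
# (this is the population standard deviation of the true ages)
np.sqrt(np.sum((true_ages - np.mean(true_ages))**2)/len(true_ages))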

8.93130372466541
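
The last number is the RMSE of a baseline that always predicts the mean test age. It ties the two reported metrics together, since R² compares the model's squared error with that baseline's squared error. The sketch below checks this using the values printed above. It assumes both quantities are computed on the same test set, and the function name `r2_from_rmse` is hypothetical.

```python
def r2_from_rmse(model_rmse: float, baseline_rmse: float) -> float:
    """R^2 = 1 - MSE_model / MSE_baseline, where the baseline predicts the mean."""
    return 1.0 - (model_rmse / baseline_rmse) ** 2


model_rmse = 8.91531              # Test RMSE printed above
baseline_rmse = 8.93130372466541  # standard deviation of the true test ages

r2 = r2_from_rmse(model_rmse, baseline_rmse)
print("R2 implied by the two RMSE values: {:.5f}".format(r2))
```

The result agrees with the reported R² of 0.00358 to five decimal places. The model's error (8.92 years) is almost the same as the baseline's (8.93 years), so on this 5000-image subset the network has learned little beyond predicting the average age of roughly 34.7.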