## Age and Gender Recognition 

Age and gender classification from facial images is an important task in computer vision, with various applications in advertising, healthcare and security. 

This notebook uses the UTKFace dataset, a diverse collection of over 20,000 facial images annotated with age, gender and ethnicity.

Dataset: https://www.kaggle.com/datasets/jangedoo/utkface-new/code

Here, I will use convolutional neural networks (CNNs), which perform well on image classification tasks, to predict age and gender from these images. The goal of this exercise is to develop and optimise a model that predicts age and gender accurately, whilst identifying and addressing its shortcomings.


## Import Modules

First, import the necessary libraries for data processing, model building and evaluation. 

In [18]:
import pandas as pd
import numpy as np
import os 
import warnings
import matplotlib.pyplot as plt 
import seaborn as sns 
from tqdm.notebook import tqdm 

warnings.filterwarnings('ignore')
%matplotlib inline

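import tensorflow as tf
# Import Keras consistently through TensorFlow rather than mixing
# the standalone `keras` package with `tensorflow.keras`
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Conv2D, Dropout, Flatten, MaxPooling2D, Input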

## Load the Dataset 

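Download the UTKFace dataset from Kaggle using the `opendatasets` library. Files that are already present locally are skipped.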

In [2]:
import opendatasets as od 
dataset = 'https://www.kaggle.com/datasets/jangedoo/utkface-new/data'

Downloading the dataset from https://www.kaggle.com/datasets/jangedoo/utkface-new/data

In [3]:
od.download(dataset)

Skipping, found downloaded files in ".\utkface-new" (use force=True to force download)


In [11]:
DATA_DIR = './utkface-new/UTKFace/'

In [15]:
# Initialize lists to store image paths, ages, and genders
image_files = []
ages = []
genders = []

# Process each file in the directory
for file in tqdm(os.listdir(DATA_DIR)):
    full_path = os.path.join(DATA_DIR, file)  # Get full file path
    parts = file.split('_')  # Split filename based on the underscore delimiter
    image_age = int(parts[0])  # Age is the first part
    image_gender = int(parts[1])  # Gender is the second part

    # Append extracted data to respective lists
    image_files.append(full_path)
    ages.append(image_age)
    genders.append(image_gender)


100%|██████████| 23708/23708 [00:00<00:00, 192705.53it/s]
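The loop above assumes that every entry in `DATA_DIR` follows the UTKFace naming convention `[age]_[gender]_[race]_[date&time].jpg`; any file that does not (a stray non-image file, for instance) either raises a `ValueError` or is silently mislabelled. A minimal, more defensive sketch, assuming that convention and a 0/1 gender encoding, validates each name and collects rejects instead of failing. The helpers `parse_utk_filename` and `collect_labels` are hypothetical, not part of any library.

```python
import os
from typing import List, Optional, Tuple


def parse_utk_filename(filename: str) -> Optional[Tuple[int, int]]:
    """Return (age, gender) from a UTKFace-style filename, or None if it does not parse.

    Assumes the convention [age]_[gender]_[race]_[date&time].jpg with gender
    encoded as 0 or 1 (hypothetical helper, not a library API).
    """
    parts = os.path.basename(filename).split('_')
    # Require at least age, gender and one further field
    if len(parts) < 3:
        return None
    try:
        age, gender = int(parts[0]), int(parts[1])
    except ValueError:
        return None
    # Reject values outside the expected label ranges
    if age < 0 or gender not in (0, 1):
        return None
    return age, gender


def collect_labels(filenames: List[str]):
    """Split filenames into (name, age, gender) rows and a list of rejected names."""
    rows, rejected = [], []
    for name in filenames:
        parsed = parse_utk_filename(name)
        if parsed is None:
            rejected.append(name)
        else:
            rows.append((name, *parsed))
    return rows, rejected


if __name__ == '__main__':
    sample = [
        '100_0_0_20170112213500903.jpg.chip.jpg',
        '25_1_2_20170116174525125.jpg.chip.jpg',
        'Thumbs.db',
        '30_x_1_20170116.jpg',
    ]
    rows, rejected = collect_labels(sample)
    print(rows)
    print(rejected)
```

In the notebook, `rows, rejected = collect_labels(os.listdir(DATA_DIR))` could stand in for the loop, with `os.path.join(DATA_DIR, name)` applied to each accepted name; checking `len(rejected)` shows how many files were skipped.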


In [16]:
# Convert the extracted lists into a DataFrame (one row per image)
df = pd.DataFrame({'image': image_files, 'age': ages, 'gender': genders})
df.head()

|   | image | age | gender |
|---|-------|-----|--------|
| 0 | ./utkface-new/UTKFace/100_0_0_2017011221350090... | 100 | 0 |
| 1 | ./utkface-new/UTKFace/100_0_0_2017011221524034... | 100 | 0 |
| 2 | ./utkface-new/UTKFace/100_1_0_2017011018372639... | 100 | 1 |
| 3 | ./utkface-new/UTKFace/100_1_0_2017011221300198... | 100 | 1 |
| 4 | ./utkface-new/UTKFace/100_1_0_2017011221330369... | 100 | 1 |
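Before building a model, it can help to check the label distributions, since gender imbalance and a long age tail affect how accuracy figures should be read. A minimal sketch, assuming the UTKFace encoding in which gender 0 is male and 1 is female; the `GENDER_NAMES` mapping and `summarise_labels` function are hypothetical names introduced here.

```python
import pandas as pd

# Assumed UTKFace gender encoding: 0 = male, 1 = female
GENDER_NAMES = {0: 'Male', 1: 'Female'}


def summarise_labels(df: pd.DataFrame) -> dict:
    """Return simple label statistics for a DataFrame with 'age' and 'gender' columns."""
    gender_counts = df['gender'].map(GENDER_NAMES).value_counts()
    return {
        'n_images': len(df),
        'gender_counts': gender_counts.to_dict(),
        'age_min': int(df['age'].min()),
        'age_max': int(df['age'].max()),
        'age_median': float(df['age'].median()),
    }


if __name__ == '__main__':
    # Small toy frame standing in for the real UTKFace DataFrame
    toy = pd.DataFrame({
        'image': ['a.jpg', 'b.jpg', 'c.jpg', 'd.jpg'],
        'age': [100, 25, 30, 1],
        'gender': [0, 1, 1, 0],
    })
    print(summarise_labels(toy))
```

Calling `summarise_labels(df)` on the DataFrame above, alongside a plot such as `sns.histplot(df['age'])`, gives a quick view of the age range and gender balance.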
