# Gender Classification MiniProject

This mini-project fine-tunes the pre-trained EfficientNetB3 model as the backbone of a Convolutional Neural Network (CNN) that recognizes gender from images. It is an example of applying image classification to predict an attribute from image data. The objective is to build an automated system that accurately determines the gender of the person in an image.

## Step 1: Prepare data

**Data source:** https://www.kaggle.com/datasets/ashishjangra27/gender-recognition-200k-images-celeba

**Data preprocessing:** Load each photo as a numeric array and resize it to 64x64 to fit the current configuration and the EfficientNetB3 model. Only images from the Train and Test folders are used. They are merged into one set and split again later with a custom ratio.

In [1]:
# Import the necessary libraries for data preprocessing
import matplotlib.pyplot as plt
import os
import numpy as np
from PIL import Image

### Preprocessing Female images

In [2]:
# Collect the image paths from both folders, sorted by the number in each file name
folders = [
    "/kaggle/input/gender-recognition-200k-images-celeba/Dataset/Train/Female",
    "/kaggle/input/gender-recognition-200k-images-celeba/Dataset/Test/Female"
]

female_images_paths = []

for folder in folders:
    file_names = os.listdir(folder)
    sorted_file_names = sorted(file_names, key=lambda x: int(''.join(filter(str.isdigit, x))))
    image_paths = [os.path.join(folder, filename) for filename in sorted_file_names]
    female_images_paths.extend(image_paths)

len(female_images_paths)

104387
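
The sort key in the cell above orders files by the digits in their names rather than alphabetically. The sketch below illustrates the difference on a few hypothetical file names (the real names in the dataset may differ); `numeric_key` is a helper name introduced here only for illustration.

```python
def numeric_key(file_name: str) -> int:
    """Join every digit in the file name and read the result as one integer."""
    return int("".join(filter(str.isdigit, file_name)))


# Hypothetical file names, chosen only to show the ordering difference.
names = ["10.jpg", "9.jpg", "100.jpg"]

lexicographic = sorted(names)             # plain string comparison
numeric = sorted(names, key=numeric_key)  # same key as the notebook cell

print(lexicographic)  # ['10.jpg', '100.jpg', '9.jpg']
print(numeric)        # ['9.jpg', '10.jpg', '100.jpg']
```

Note that this key raises `ValueError` for a name without any digits and also picks up digits from the extension (for example `.mp4`), so it only suits folders where every file is named like `<number>.jpg`.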

In [3]:
# Convert and re-size images
female_data_x = []

for image_path in female_images_paths:
    img = Image.open(image_path)
    img = img.resize((64, 64))
    img = np.array(img, dtype="uint8")
    female_data_x.append(img)

print("Number of pictures:", len(female_data_x))

Number of pictures: 104387


In [4]:
# Create labels for images (female will be labeled as 1)
female_data_y = np.ones(len(female_data_x))
len(female_data_y)

104387

### Preprocessing Male images

In [5]:
# Collect the image paths from both folders, sorted by the number in each file name
folders = [
    "/kaggle/input/gender-recognition-200k-images-celeba/Dataset/Train/Male",
    "/kaggle/input/gender-recognition-200k-images-celeba/Dataset/Test/Male"
]

male_images_paths = []

for folder in folders:
    file_names = os.listdir(folder)
    sorted_file_names = sorted(file_names, key=lambda x: int(''.join(filter(str.isdigit, x))))
    image_paths = [os.path.join(folder, filename) for filename in sorted_file_names]
    male_images_paths.extend(image_paths)

len(male_images_paths)

75614

In [6]:
# Convert and re-size images
male_data_x = []

for image_path in male_images_paths:
    img = Image.open(image_path)
    img = img.resize((64, 64))
    img = np.array(img, dtype="uint8")
    male_data_x.append(img)

print("Number of pictures:", len(male_data_x))

Number of pictures: 75614


In [7]:
# Create labels for images (male will be labeled as 0)
male_data_y = np.zeros(len(male_data_x))
len(male_data_y)

75614

### Merge data

In [8]:
# Merge features data
df_x = np.concatenate((female_data_x, male_data_x))
len(df_x)

180001

In [9]:
# Merge labels data
df_y = np.concatenate((female_data_y, male_data_y))
len(df_y)

180001
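
The two classes are not the same size, which matters when reading accuracy figures later. As a rough sketch using only the counts printed above, the class shares and the accuracy of a trivial "always predict the larger class" baseline can be computed as follows. The variable names are illustrative and not part of the notebook.

```python
# Counts taken from the cell outputs above.
n_female = 104387  # label 1
n_male = 75614     # label 0

n_total = n_female + n_male
female_share = n_female / n_total
male_share = n_male / n_total

# Accuracy reached by always predicting the larger class.
majority_baseline = max(n_female, n_male) / n_total

print(n_total)  # 180001
print(f"female share: {female_share:.4f}")
print(f"male share: {male_share:.4f}")
print(f"majority baseline accuracy: {majority_baseline:.4f}")
```

A model is only useful if its validation accuracy is clearly above this baseline of roughly 58%.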

### Split data

In [10]:
from sklearn.model_selection import train_test_split

# 90% of the data will be used for training, the remaining 10% will be used for validation
train_data, val_data, train_labels, val_labels = train_test_split(df_x, df_y, test_size=0.1, random_state=42)

print('length train_data:', len(train_data))
print('length val_data:', len(val_data))
print('length train_labels:', len(train_labels))
print('length val_labels:', len(val_labels))
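
The output of this cell is not shown, but the expected sizes can be estimated by hand. The sketch below assumes that, for a fractional `test_size`, scikit-learn rounds the held-out count up and gives the rest to training. That matches my understanding of `train_test_split`, but treat the numbers as an estimate rather than recorded output.

```python
import math

n_total = 180001  # merged data set size from the cells above
test_size = 0.1   # same fraction as in the notebook

# Assumption: the held-out count is rounded up, the remainder is used for training.
n_val = math.ceil(test_size * n_total)
n_train = n_total - n_val

print(n_train)  # 162000
print(n_val)    # 18001
```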



## Step 2: Train the model

**Pre-trained model:** EfficientNetB3

**Optimization algorithm:** Adam

**Epochs:** 5

In [11]:
# Run this cell the first time you run this notebook
!pip install tensorflow==2.9

Collecting tensorflow==2.9
  Downloading tensorflow-2.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (511.7 MB)
Collecting flatbuffers<2,>=1.12 (from tensorflow==2.9)
  Downloading flatbuffers-1.12-py2.py3-none-any.whl (15 kB)
Collecting keras<2.10.0,>=2.9.0rc0 (from tensorflow==2.9)
  Downloading keras-2.9.0-py2.py3-none-any.whl (1.6 MB)
Collecting keras-preprocessing>=1.1.1 (from tensorflow==2.9)
  Downloading Keras_Preprocessing-1.1.2-py2.py3-none-any.whl (42 kB)
Collecting tensorboard<2.10,>=2.9 (from tensorflow==2.9)
  Downloading tensorboard-2.9.1-py3-none-any.whl (5.8 MB)

In [12]:
# Import the necessary libraries for training the model
import tensorflow as tf
import tensorflow.keras.optimizers as optimizers
from sklearn.model_selection import train_test_split
from tensorflow.keras import datasets, layers, models
from tensorflow.keras.applications.efficientnet import preprocess_input
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.optimizers.legacy import Adam
from tensorflow.keras.preprocessing import image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

### Augment the training data with ImageDataGenerator

In [13]:
datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=False,
    fill_mode='nearest'
)

batch_size = 512
augmented_data_generator = datagen.flow(train_data, train_labels, batch_size=batch_size)
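
With a batch size of 512, the number of batches the generator yields per epoch follows from the training set size. The sketch below assumes 162,000 training images (90% of the 180,001 merged images; the notebook does not print this number) and that the last, smaller batch is kept rather than dropped.

```python
import math

n_train = 162000  # assumption: 90% of 180001, not printed in the notebook
batch_size = 512  # same value as in the cell above

# One epoch covers every training image once; the final batch may be smaller.
steps_per_epoch = math.ceil(n_train / batch_size)
last_batch_size = n_train - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)  # 317
print(last_batch_size)  # 208
```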

### Set up a checkpoint to save the best model

In [14]:
checkpoint = ModelCheckpoint(
    'best_model_checkpoint.h5',
    monitor='val_accuracy',
    save_best_only=True,
    mode='max',
    verbose=1
)
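
To make the effect of `save_best_only=True` with `mode='max'` concrete, here is a simplified pure-Python model of the decision rule: the file is overwritten only when the monitored value is strictly higher than the best value seen so far. This is an illustrative sketch, not the Keras implementation, and the accuracy values are hypothetical.

```python
def epochs_that_save(val_accuracies):
    """Return the 1-based epochs at which a 'save best only' rule would save."""
    best = float("-inf")  # nothing has been seen before the first epoch
    saved_epochs = []
    for epoch, accuracy in enumerate(val_accuracies, start=1):
        if accuracy > best:  # strict improvement is required
            best = accuracy
            saved_epochs.append(epoch)
    return saved_epochs


# Hypothetical validation accuracies: epoch 3 gets worse, epoch 5 only ties.
example_history = [0.90, 0.93, 0.92, 0.95, 0.95]

print(epochs_that_save(example_history))  # [1, 2, 4]
```

Under this rule the checkpoint file always holds the weights from the best epoch so far, even if later epochs get worse.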

### Build the model structure

In [15]:
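pre_model = models.Sequential()

# Load EfficientNetB3 without its classification head, using ImageNet weights
base_model = tf.keras.applications.EfficientNetB3(
    include_top=False,
    weights="imagenet",
)

# Keep every backbone layer trainable so the whole network is fine-tuned
for layer in base_model.layers:
    layer.trainable = True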

pre_model.add(base_model)
pre_model.add(layers.GlobalAveragePooling2D())
pre_model.add(layers.Flatten())
pre_model.add(layers.Dense(256, activation='relu'))
pre_model.add(layers.Dropout(0.2))
pre_model.add(layers.Dense(128, activation='relu'))
pre_model.add(layers.Dropout(0.2))
pre_model.add(layers.Dense(1, activation='sigmoid'))

pre_model.summary()

Downloading data from https://storage.googleapis.com/keras-applications/efficientnetb3_notop.h5
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 efficientnetb3 (Functional)  (None, None, None, 1536)  10783535 
                                                                 
 global_average_pooling2d (G  (None, 1536)             0         
 lobalAveragePooling2D)                                          
                                                                 
 flatten (Flatten)           (None, 1536)              0         
                                                                 
 dense (Dense)               (None, 256)               393472    
                                                                 
 dropout (Dropout)           (None, 256)               0         
                                                                 
 dense_1 (Dense)          
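
The Dense-layer parameter counts can be checked by hand: a fully connected layer has `inputs * units` weights plus `units` biases. The sketch below applies this to the three Dense layers defined in the cell above. The first value matches the 393,472 shown for `dense`; the other two are computed from the layer definitions and are not read from the printed summary. `dense_params` is a helper name introduced here for illustration.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units


# (inputs, units) for each Dense layer of the classification head.
head_layers = [
    (1536, 256),  # pooled EfficientNetB3 features -> Dense(256)
    (256, 128),   # Dense(256) -> Dense(128)
    (128, 1),     # Dense(128) -> sigmoid output
]

counts = [dense_params(n_in, n_out) for n_in, n_out in head_layers]

print(counts)       # [393472, 32896, 129]
print(sum(counts))  # 426497
```

The head adds fewer than half a million parameters on top of the roughly 10.8 million in the EfficientNetB3 backbone.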

### Compile and fit the model

In [16]:
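pre_model.compile(
    optimizer=Adam(learning_rate=0.0001),
    loss='binary_crossentropy',
    metrics=['accuracy']
)

# Training batches come from the augmented generator; the model is validated
# on the untouched validation split after every epoch
history = pre_model.fit(
    augmented_data_generator,
    validation_data=(val_data, val_labels),
    epochs=5,
    batch_size=512,
    callbacks=[checkpoint]
)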

Epoch 1/5
Epoch 1: val_accuracy improved from -inf to 0.93456, saving model to best_model_checkpoint.h5
Epoch 2/5
Epoch 2: val_accuracy improved from 0.93456 to 0.94950, saving model to best_model_checkpoint.h5
Epoch 3/5
Epoch 3: val_accuracy improved from 0.94950 to 0.95672, saving model to best_model_checkpoint.h5
Epoch 4/5
Epoch 4: val_accuracy improved from 0.95672 to 0.96078, saving model to best_model_checkpoint.h5
Epoch 5/5
Epoch 5: val_accuracy improved from 0.96078 to 0.96506, saving model to best_model_checkpoint.h5
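
Because the last layer is a single sigmoid unit, the model outputs one probability per image. With the labels used in this notebook (female = 1, male = 0), a prediction can be turned into a class name as sketched below. The 0.5 threshold and the example probabilities are assumptions made for illustration; `to_label` is not part of the notebook.

```python
def to_label(probability: float, threshold: float = 0.5) -> str:
    """Map a sigmoid output to a class name (female = 1, male = 0)."""
    return "female" if probability > threshold else "male"


# Hypothetical sigmoid outputs for three images.
example_probabilities = [0.97, 0.08, 0.51]

labels = [to_label(p) for p in example_probabilities]
print(labels)  # ['female', 'male', 'female']
```

Raising or lowering the threshold trades errors on one class for errors on the other, which can be worth tuning given that the two classes are not the same size.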
