# Predicting Ethnicities, Age, and Gender with Neural Networks

## Table of Contents:
1. *Introduction*
2. *Data Preparation*
    * 2.1 Data Cleaning/Preparation
    * 2.2 Data Loading
    * 2.3 Data Preprocessing
    * 2.4 Data Exploration
3. *Model Architecture*
    * 3.1 Neural Network Design
    * 3.2 Model Compilation
4. *Model Training*
    * 4.1 Training Process
5. *Model Evaluation*
    * 5.1 Model Performance
    * 5.2 Confusion Matrix
6. *Model Deployment*
    * 6.1 Model Saving
7. *Conclusion*
8. *References*


## 1. Introduction
In this notebook, we build a neural network model that predicts ethnicity, age, and gender from face images in the FairFace dataset. Such predictions have applications in demographic analysis and social studies, among others. We follow a step-by-step approach covering data preparation, model architecture design, training, evaluation, and deployment.

## 2. Data Preparation

### 2.1 Data Cleaning/Preparation

The FairFace dataset ships with a training set and a validation set, but no test set. In this section, I will:
- Combine all the images into one folder
- Rename the images sequentially, from 1 to the total number of images
- Combine the CSV files into a single file that uses the new image names
- Split the images and the CSV into training, validation, and test sets
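
The final step in the list above, the three-way split, could look roughly like the following. This is a minimal sketch rather than the notebook's actual implementation: the helper name `split_names`, the 80/10/10 ratios, and the fixed seed are all assumptions chosen for illustration.

```python
import random


def split_names(names, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle file names and split them into train/val/test lists.

    Hypothetical helper: the fractions and the seed are illustrative
    assumptions. Whatever is left after train and val goes to test.
    """
    shuffled = sorted(names)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Example with ten made-up file names
names = [f"{i}.jpg" for i in range(1, 11)]
train, val, test = split_names(names)
print(len(train), len(val), len(test))
```

Splitting the list of file names first, and only then moving files and filtering CSV rows by those names, keeps each image paired with its label row.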

In [25]:
import os
import shutil
import csv

In [24]:
# Moving the training images to the all images folder
temp_train_path = "C:\\Users\\mashe\\Downloads\\temp\\train"
all_images = "C:\\Users\\mashe\\Downloads\\temp\\all"

# Create the destination folder if it does not exist yet
os.makedirs(all_images, exist_ok=True)

train_files = os.listdir(temp_train_path)

for file_name in train_files:
    train_path = os.path.join(temp_train_path, file_name)
    all_path = os.path.join(all_images, file_name)
    shutil.move(train_path, all_path)

In [42]:
# Copying the contents of the train CSV into a general CSV file
# The file column is reduced to the bare file name (the part after the last '/')
temp_train_csv = "C:\\Users\\mashe\\Downloads\\temp_csv\\images_train.csv"
general_csv = "C:\\Users\\mashe\\Downloads\\temp_csv\\all.csv"

new_rows = []
with open(temp_train_csv, 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        modified_row = [row[0].split('/')[-1]] + row[1:]
        new_rows.append(modified_row)

with open(general_csv, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(new_rows)

In [23]:
# Name for the first validation image: one past the number of
# training images already moved into the all images folder
img_name = str(len(os.listdir(all_images)) + 1) + ".jpg"
img_name

'86745.jpg'

In [None]:
# Moving the validation images
temp_val_path = "C:\\Users\\mashe\\Downloads\\temp\\val"
all_images = "C:\\Users\\mashe\\Downloads\\temp\\all"

val_files = os.listdir(temp_val_path)

# Plan for each validation image:
#   1. take the image and get its current name
#   2. rename both the image and its entry in the CSV
#   3. move the image to the all images folder
# At the end, append the updated validation rows to the general CSV
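
The plan in the comments above could be sketched as follows. This is a hedged sketch under stated assumptions, not the notebook's final code: the function `merge_validation` and its parameters are hypothetical, and it assumes the validation CSV has a header row and a first column holding a path whose last component matches a file in the validation folder.

```python
import csv
import os
import shutil


def merge_validation(val_dir, all_dir, val_csv, general_csv, start_index):
    """Renumber validation images, move them into all_dir, and append
    their CSV rows (with the new names) to general_csv.

    Hypothetical helper. Assumes val_csv has a header row and that its
    first column is a path whose last component is a file in val_dir.
    """
    os.makedirs(all_dir, exist_ok=True)
    next_index = start_index
    new_rows = []

    with open(val_csv, "r", newline="") as csvfile:
        reader = csv.reader(csvfile)
        next(reader, None)  # skip the header; general_csv already has one
        for row in reader:
            old_name = row[0].split("/")[-1]
            ext = os.path.splitext(old_name)[1]
            new_name = f"{next_index}{ext}"
            # Rename and move in one step
            shutil.move(os.path.join(val_dir, old_name),
                        os.path.join(all_dir, new_name))
            new_rows.append([new_name] + row[1:])
            next_index += 1

    # Append, so the training rows already in general_csv are kept
    with open(general_csv, "a", newline="") as csvfile:
        csv.writer(csvfile).writerows(new_rows)

    return next_index - 1  # number given to the last moved image


# Possible call with the notebook's variables (temp_val_csv is an assumed name):
# merge_validation(temp_val_path, all_images, temp_val_csv, general_csv, 86745)
```

Iterating over the CSV rows instead of the directory listing means each image is renamed together with its own label row, so the two cannot drift apart.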

### 2.2 Data Loading

### 2.3 Data Preprocessing

### 2.4 Data Exploration

## 3. Model Architecture

### 3.1 Neural Network Design

### 3.2 Model Compilation

## 4. Model Training

### 4.1 Training Process

## 5. Model Evaluation

### 5.1 Model Performance

### 5.2 Confusion Matrix

## 6. Model Deployment

### 6.1 Model Saving

## 7. Conclusion

## 8. References

The dataset was retrieved from the FairFace study.

Karkkainen, Kimmo and Joo, Jungseock. (2021). FairFace: Face Attribute Dataset for Balanced Race, Gender, and Age for Bias Measurement and Mitigation. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1548-1558. doi:10.1109/WACV48630.2021.00159