# Gender Classification Project

In this notebook we use the [Gender Classification Dataset](https://www.kaggle.com/cashutosh/gender-classification-dataset) from Kaggle. The dataset consists of cropped images labeled `male` or `female`, split into a training directory and a validation directory. The training directory contains ~23,000 images of each class and the validation directory contains ~5,500 images of each class. Our *goal* is to correctly predict the gender label of each image: `male` or `female`.

To achieve better results, we will use variants of the [ResNet](https://arxiv.org/abs/1512.03385) (Residual Network) model. ResNet was designed for the [ImageNet challenge](http://www.image-net.org/challenges/LSVRC/), which it won in 2015.


## Data Processing

We start by importing all the necessary modules. A few of them deserve a note:
 - `lr_scheduler` - for using the one cycle learning rate schedule, from [this paper](https://arxiv.org/abs/1803.09820)
 - `namedtuple` - for handling ResNet configurations
 - `os` and `shutil` - for handling custom datasets

In [3]:
import numpy as np
import matplotlib.pyplot as plt
import copy 
from collections import namedtuple
import os
import random
import shutil
import time

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler
from torch.optim.lr_scheduler import _LRScheduler
import torch.utils.data as data

import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
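
The one cycle schedule mentioned above can be sketched with PyTorch's built-in `OneCycleLR` scheduler. This is a minimal illustration, not necessarily how the rest of the notebook configures training: the stand-in `nn.Linear` model, the `max_lr` of 0.1, and the 100 total steps are all assumptions chosen for the example.

```python
import torch.nn as nn
import torch.optim as optim
import torch.optim.lr_scheduler as lr_scheduler

# Stand-in model and optimizer (assumptions for illustration only)
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Hypothetical schedule length and peak learning rate
total_steps = 100
scheduler = lr_scheduler.OneCycleLR(optimizer, max_lr=0.1, total_steps=total_steps)

# Record the learning rate used at every step of the cycle
lrs = []
for _ in range(total_steps):
    optimizer.step()  # a real loop would compute a loss and call backward() first
    lrs.append(optimizer.param_groups[0]['lr'])
    scheduler.step()

print(f'start: {lrs[0]:.4f}, peak: {max(lrs):.4f}, end: {lrs[-1]:.7f}')
```

With the scheduler's default settings, the learning rate starts at `max_lr / 25`, rises to `max_lr` over roughly the first 30% of the steps, and then anneals to a much smaller value.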

We set the random seeds for reproducibility.

In [4]:
SEED = 1234

random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
torch.cuda.manual_seed(SEED)
torch.backends.cudnn.deterministic = True
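
As an optional variation, the same seeding steps can be wrapped in a helper so they can be re-applied before each experiment. The function name `seed_everything` is hypothetical, and the two extra settings (`manual_seed_all` for multi-GPU machines and disabling the cuDNN autotuner) are additions that the cell above does not use.

```python
import random

import numpy as np
import torch


def seed_everything(seed):
    """Seed Python, NumPy, and PyTorch random number generators (hypothetical helper)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Seeds every visible GPU; silently ignored when CUDA is unavailable
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    # Addition: stop cuDNN from benchmarking and picking algorithms per run
    torch.backends.cudnn.benchmark = False


# Re-seeding with the same value reproduces the same random draws
seed_everything(1234)
first = torch.rand(3)
seed_everything(1234)
second = torch.rand(3)
print(torch.equal(first, second))
```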

## Load and Augment Data

In [5]:
data_dir = 'D:/datasets/gender_kaggle'
train_dir = os.path.join(data_dir, 'Training')
test_dir = os.path.join(data_dir, 'Validation')
classes = sorted(os.listdir(train_dir))  # sorted so the order does not depend on the filesystem

In [9]:
classes

['female', 'male']

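### Normalizing the data

We compute the per-channel mean and standard deviation of the training images so that they can be used to normalize the inputs.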

In [11]:
train_data = datasets.ImageFolder(root=train_dir, transform = transforms.ToTensor())

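# Accumulate per-image channel statistics over the training set
means = torch.zeros(3)
stds = torch.zeros(3)

for img, _ in train_data:
    means += torch.mean(img, dim = (1,2))  # per-channel mean over height and width
    stds += torch.std(img, dim = (1,2))    # per-channel std over height and width

# Average the per-image statistics
means /= len(train_data)
stds /= len(train_data)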

print(f'Calculated means: {means}')
print(f'Calculated stds: {stds}')

Calculated means: tensor([0.6527, 0.4830, 0.4047])
Calculated stds: tensor([0.2061, 0.1797, 0.1678])
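
The loop above averages per-image statistics, which only approximates the dataset-wide standard deviation (and, if images differ in size, weights each image equally rather than each pixel). The sketch below compares that approach with exact per-channel statistics accumulated over all pixels. It runs on a small synthetic batch rather than the real dataset, and every variable name in it is an assumption for illustration.

```python
import torch

torch.manual_seed(0)
# Synthetic stand-in for the dataset: 8 RGB images of 16x16 pixels in [0, 1]
images = torch.rand(8, 3, 16, 16)

# Approach used in the notebook: average the per-image channel statistics
avg_means = torch.zeros(3)
avg_stds = torch.zeros(3)
for img in images:
    avg_means += img.mean(dim=(1, 2))
    avg_stds += img.std(dim=(1, 2))
avg_means /= len(images)
avg_stds /= len(images)

# Alternative: accumulate per-channel sums over all pixels for exact statistics
pixel_sum = torch.zeros(3, dtype=torch.float64)
pixel_sq_sum = torch.zeros(3, dtype=torch.float64)
n_pixels = 0
for img in images:
    img64 = img.double()
    pixel_sum += img64.sum(dim=(1, 2))
    pixel_sq_sum += (img64 ** 2).sum(dim=(1, 2))
    n_pixels += img.shape[1] * img.shape[2]
exact_means = pixel_sum / n_pixels
exact_stds = (pixel_sq_sum / n_pixels - exact_means ** 2).sqrt()

# Per-channel normalization of one image, the same arithmetic that
# transforms.Normalize(mean, std) applies: (x - mean) / std
normalized = (images[0] - avg_means[:, None, None]) / avg_stds[:, None, None]

print(f'Averaged means: {avg_means}, exact means: {exact_means}')
print(f'Averaged stds:  {avg_stds}, exact stds:  {exact_stds}')
```

On real photographs the two standard deviation estimates can differ more than on this uniform random data, because brightness varies from image to image.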
