<img src=https://brand.uark.edu/_resources/images/UA_Logo_Horizontal.jpg width="400" height="96">

### _Artificial intelligence for image processing and analysis._

# Notebook 4.1 Predicting Age from Faces
---
##### The purpose of this notebook is to introduce the UTKFace dataset and give an example of how to use the `Dataset` class in PyTorch.



### Required packages
---
##### **_NOTE: This notebook will require the use of GPU hardware acceleration. Please refer to notebook 2.4_ParallelProcessing if you need a refresher on how to do this._**
##### **_Run this code chunk first. If you encounter an error when trying to run code chunks in this notebook, then first try re-running this chunk._**


In [None]:
# Import all of the necessary packages
import numpy as np
import imageio
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torchvision.transforms as T
import time
import pandas as pd
from IPython.display import clear_output
import ipywidgets as widgets

# UTK Face Dataset
---
##### For this notebook, we will be using the [UTK Face](https://susanqq.github.io/UTKFace/) dataset, which contains roughly 23,000 images of people's faces of varying ages, genders, and ethnicities.
<img src="https://susanqq.github.io/UTKFace/icon/samples.png" alt="samples" width="574" height="330">

##### In this notebook, we will be using this dataset to predict people's ages based only on photos of their faces.




### Downloading the dataset
---
##### Downloading this dataset works a little differently than in previous notebooks, but keep in mind that **these files are not stored on your computer and may have to be re-downloaded if you close and re-open this notebook.**
##### To make training an order of magnitude faster, all of the images and their ages are pre-loaded into memory and reused later.
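##### As a rough illustration of the pre-loading step, the sketch below parses one space-separated pixel string into a normalized array. The 2 by 2 `pixel_string` is a made-up stand-in for the 48 by 48 strings stored in the CSV's `pixels` column; only the parsing pattern is meant to carry over.

```python
import numpy as np

# Hypothetical 2x2 "image" stored the same way as the CSV's pixels column:
# one string of space-separated grey levels between 0 and 255
pixel_string = "0 255 51 102"
side = 2  # the real images use side = 48

# Split the string into numbers, reshape into a square, and scale to the range [0, 1]
image = np.array(pixel_string.split(), dtype="float64").reshape(side, side) / 255.0
print(image)
```

##### Doing this conversion once up front means the strings never have to be parsed again during training.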

In [None]:
import os

# Check to see if file exists
filename = "/content/age_gender.csv"
if not os.path.isfile(filename):
  # If it does not, then download it
  !gdown --id 1IqVy6z09vymy4KJb4AcJtV3Q5hnyFLFB

# Load the dataset
data = pd.read_csv(filename)

# Move important information out of the data variable since pandas does not like to hold on to information like...
 
# Age of people in images
data_ages = np.expand_dims(np.array(data['age']),1) / 1.

# Images of people
print('Breaking up dataset...')
data_images = np.zeros((48,48,len(data)))
for i in range(len(data)):
  data_images[:,:,i] = np.array(data['pixels'][i].split(),'float64').reshape(48,48) / 255.
print('...Done')

### Observing the dataset
---
##### To make network training easier, the original images, which were RGB and varied in size, have been converted to greyscale, the faces have been centered and cropped, and the images have been downsized from 200 by 200 pixels to 48 by 48 pixels. In the code chunk below, a random set of 5 images from the dataset is displayed.

In [None]:
# Display a random set of five images in the dataset
random_int = np.random.randint(low = 0, high = len(data),size=5)
fig = plt.figure(figsize=(20,20))
for i in range(5):
  fig.add_subplot(1,5,i+1)
  plt.imshow(data_images[:,:,random_int[i]],cmap='gray')
  plt.gca().set_title('Age: ' + str(data_ages[random_int[i]][0]))

# Datasets in the wild
---
##### Although some datasets have become standard, such as the MNIST digits dataset used previously, most have not, and anyone can create one.
##### As a result, naming conventions and data layouts are often inconsistent or confusing, and some form of documentation is needed to understand them.
##### Because of these inconsistencies, we need an easy and reliable way to read data from a dataset. `torch` provides a general `Dataset` class that we can build upon so that data is returned in a consistent form for training.
##### The following code chunk contains all of the code needed to turn the arrays of images and ages created earlier into a dataset that can be used for training.
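
##### Before looking at the real class, here is a minimal sketch of the same pattern on made-up data. `ToySquaresDataset` is a hypothetical example, not part of `torch`; it only shows that a class implementing `__len__` and `__getitem__` can be handed to a `DataLoader`, which then groups samples into batches.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToySquaresDataset(Dataset):
    """Hypothetical dataset: sample i is the number i, labelled with i squared."""

    def __init__(self, n):
        self.n = n

    # Number of samples in the dataset
    def __len__(self):
        return self.n

    # One sample, returned as a dict in the same style as the face dataset
    def __getitem__(self, idx):
        return {
            'x': torch.tensor([float(idx)]),
            'y': torch.tensor([float(idx ** 2)])
        }

toy = ToySquaresDataset(10)

# The DataLoader stacks individual samples into batches, key by key
loader = DataLoader(toy, batch_size=4, shuffle=False)
first_batch = next(iter(loader))
print(len(toy), first_batch['x'].shape, first_batch['y'].squeeze(1))
```

##### Because every sample comes back in the same dict layout, the training code does not need to know how the underlying data is stored.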

In [None]:
class UTKFaceDataset(torch.utils.data.Dataset):
  # Define what will be run at initialization of the UTKFaceDataset class
  def __init__(self, image_array, age_array):
    # Attach the list of images and ages to the class
    # These are attached to the class so that they can be accessed by other methods in the class
    self.images = image_array
    self.ages = age_array

    # Initialize a transform that will be used later
    self.tform = T.ToTensor()                       

  # There are two required methods for a class that inherits from torch.utils.data.Dataset:
  # __len__()
  # __getitem__()

  # Return the length of the dataset
  def __len__(self):
    return len(self.ages)

  # Return a single image from the dataset, as well as the age associated with the image
  def __getitem__(self,idx):
    # Return a single variable (dict) that contains both the image and the age
    out = {
        'image': self.tform(self.images[:,:,idx]).float(),
        'age': torch.tensor(self.ages[idx]).float()
    }
    return out

### Checking our class
---
##### The following code chunk verifies that our class works the way we expect.

In [None]:
# Initialize the dataset
dataset = UTKFaceDataset(data_images,data_ages)

# Display the same five random images as before, this time read through the dataset class
fig = plt.figure(figsize=(20,20))
for i in range(5):
  # Use a separate name so the `data` DataFrame loaded earlier is not overwritten
  sample = dataset[random_int[i]]
  fig.add_subplot(1,5,i+1)
  plt.imshow(sample['image'].squeeze(0),cmap='gray')
  plt.gca().set_title('Age: ' + str(sample['age'][0].numpy()))

# Manually predicting age
---
##### In the following code chunk, you will be shown images of faces, and your goal is to predict the age of each face. The first image is an untimed, unscored warm-up; the 10 images that follow are timed and scored.
##### When you are ready, submit your warm-up guess by entering an age and pressing enter. Once you press enter, you will be timed until you complete the task.
##### Here is the output from when I tried predicting the age:
`Your final average loss is: 39.9`

`You have an average prediction error of 6.32 years.`

`On average, you took 3.47 seconds to guess the age.`
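
##### The "average loss" reported above is a mean squared error (MSE), and the "average prediction error" is its square root, so it is a root-mean-square error rather than a mean absolute error. The sketch below works through the arithmetic on three made-up guesses; the numbers in `guesses` and `true_ages` are hypothetical.

```python
import numpy as np

# Hypothetical guesses and true ages for three faces
guesses = np.array([25.0, 40.0, 8.0])
true_ages = np.array([30.0, 38.0, 10.0])

# Mean squared error: average of the squared differences, in years squared
mse = float(np.mean((guesses - true_ages) ** 2))

# Taking the square root brings the error back to units of years
rmse = mse ** 0.5

print(round(mse, 2), round(rmse, 2))
```

##### The same relationship links the two numbers in the sample output: the square root of 39.9 is about 6.32.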

In [None]:
#@title --- Hidden code (double-click to show code) ---
# Pick 11 random images: one untimed warm-up image plus the 10 that are scored
test_ind = np.random.randint(low = 0, high = len(dataset),size=11)

# Initialize the interface that will be used
plt.figure(figsize=(5,5))
# Use a separate name so the `data` DataFrame loaded earlier is not overwritten
sample = dataset[test_ind[0]]
plt.imshow(sample['image'].squeeze(0),cmap='gray');
plt.show()
age_guess = input("Guess the age:")
guesses = []

# When the first guess is inputted, the timer starts
time_elapsed = []

for i in range(1,11):
  clear_output(wait=True)
  start_time = time.time()
  plt.figure(figsize=(5,5))
  # Use a separate name so the `data` DataFrame loaded earlier is not overwritten
  sample = dataset[test_ind[i]]
  plt.imshow(sample['image'].squeeze(0),cmap='gray');
  plt.show()
  age_guess = input("Guess the age:")
  guesses.append(int(age_guess))
  end_time = time.time()
  time_elapsed.append((end_time-start_time))

clear_output(wait=True)
avg_mse_error = torch.mean((torch.tensor(guesses).unsqueeze(1) - dataset.__getitem__(test_ind[1:11])['age'])**2).item()
avg_time = torch.mean(torch.tensor(time_elapsed)).item()
print('Your final average loss is: ' + str(round(avg_mse_error,2)))
print('You have an average prediction error of ' + str(round((avg_mse_error)**0.5,2)) + ' years.')
print('On average, you took ' + str(round(avg_time,2)) + ' seconds to guess the age.')