# Read Features of the Datasets

The dataset we use consists of pictures of different people. The name of each picture contains the person's date of birth and the year the photo was taken. The 'wiki.mat' file records additional information such as gender, name, and so on. This step reads the information in this file.

In [1]:
import scipy.io
mat = scipy.io.loadmat('./wiki_crop/wiki.mat')
print(mat)

{'wiki': array([[(array([[723671, 703186, 711677, ..., 720620, 723893, 713846]], dtype=int32), array([[2009, 1964, 2008, ..., 2013, 2011, 2008]], dtype=uint16), array([[array(['17/10000217_1981-05-05_2009.jpg'], dtype='<U31'),
        array(['48/10000548_1925-04-04_1964.jpg'], dtype='<U31'),
        array(['12/100012_1948-07-03_2008.jpg'], dtype='<U29'), ...,
        array(['09/9998109_1972-12-27_2013.jpg'], dtype='<U30'),
        array(['00/9999400_1981-12-13_2011.jpg'], dtype='<U30'),
        array(['80/999980_1954-06-11_2008.jpg'], dtype='<U29')]],
      dtype=object), array([[1., 1., 1., ..., 1., 1., 0.]]), array([[array(['Sami Jauhojärvi'], dtype='<U15'),
        array(['Dettmar Cramer'], dtype='<U14'),
        array(['Marc Okrand'], dtype='<U11'), ...,
        array(['Michael Wiesinger'], dtype='<U17'),
        array(['Johann Grugger'], dtype='<U14'),
        array(['Greta Van Susteren'], dtype='<U18')]], dtype=object), array([[array([[111.29109473, 111.29109473, 252.66993082, 25
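
The printed structure is hard to read, so it can help to list the field names before pulling values out. The following is a minimal sketch on a tiny stand-in file rather than the real 'wiki.mat'; the file name `demo.mat` and its two fields with three-record-style values are made up for illustration, and it assumes the struct is loaded as a 1 x 1 structured array, as the output above suggests.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Build a tiny stand-in for wiki.mat (hypothetical file, two fields only).
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, 'demo.mat')
scipy.io.savemat(demo_path, {'wiki': {'dob': np.array([723671, 703186]),
                                      'photo_taken': np.array([2009, 1964])}})

demo = scipy.io.loadmat(demo_path)
record = demo['wiki'][0, 0]           # the single struct record
field_names = record.dtype.names      # names of the struct fields
for name in field_names:
    print(name, record[name].shape)   # each field is stored as a 1 x N array
```

Running the same three lines after `loadmat` on the real file should show which fields are available and how many records each one holds.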

# Create a Dataframe

Convert the information in the 'wiki.mat' file, including the fields we need ("photo_taken", "full_path", "gender", "face_score"), into a pandas DataFrame.

In [2]:
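import pandas as pd

# number of records: every field is stored as a 1 x N array
instances = mat['wiki'][0][0][0].shape[1]

# column names, assumed to be in the same order as the fields of the 'wiki' struct
columns = ["dob", "photo_taken", "full_path", "gender", "name", "face_location", "face_score", "second_face_score"]

df = pd.DataFrame(index = range(0,instances), columns = columns)

# mat['wiki'][0][0] is the single struct record; copy each field into its column
current_array = mat['wiki'][0][0]
for j in range(len(current_array)):
    df[columns[j]] = pd.DataFrame(current_array[j][0])
print(df)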

          dob  photo_taken                          full_path  gender  \
0      723671         2009  [17/10000217_1981-05-05_2009.jpg]     1.0   
1      703186         1964  [48/10000548_1925-04-04_1964.jpg]     1.0   
2      711677         2008    [12/100012_1948-07-03_2008.jpg]     1.0   
3      705061         1961  [65/10001965_1930-05-23_1961.jpg]     1.0   
4      720044         2012  [16/10002116_1971-05-31_2012.jpg]     0.0   
5      716189         2012  [02/10002702_1960-11-09_2012.jpg]     0.0   
6      707745         1971  [41/10003541_1937-09-27_1971.jpg]     1.0   
7      695763         1982    [39/100039_1904-12-07_1982.jpg]     1.0   
8      711000         2007  [13/10004113_1946-08-26_2007.jpg]     1.0   
9      723987         2011  [22/10004122_1982-03-17_2011.jpg]     1.0   
10     697114         1950  [99/10004299_1908-08-19_1950.jpg]     1.0   
11     706177         1969   [56/1000456_1933-06-12_1969.jpg]     1.0   
12     725873         2010  [82/10004882_1987-05-16
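
As an alternative to matching fields by position, the columns can be selected by name. This is only a sketch: `fields_to_frame` and `demo_record` are hypothetical names, the stand-in record is a plain dict holding the first three values shown above, and each field is assumed to be stored as a 1 x N array.

```python
import numpy as np
import pandas as pd

def fields_to_frame(record, fields):
    """Build a DataFrame with one column per requested field.

    Each field is assumed to be a 1 x N array, so row 0 holds the values.
    """
    return pd.DataFrame({name: np.asarray(record[name])[0] for name in fields})

# Hypothetical stand-in for mat['wiki'][0][0], using three records.
demo_record = {
    'dob': np.array([[723671, 703186, 711677]]),
    'photo_taken': np.array([[2009, 1964, 2008]]),
    'gender': np.array([[1.0, 1.0, 1.0]]),
}
demo_df = fields_to_frame(demo_record, ['dob', 'photo_taken', 'gender'])
print(demo_df)
```

Selecting by name avoids relying on the order of the `columns` list matching the order of the fields in the file.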

# Calculate the Age 

We use the "dob" field, which stores the date of birth as a MATLAB serial date number, to calculate the year of birth of each person with the following function. (The same date of birth also appears in "full_path".) The function returns the calendar year as an integer, so a person born on 07/01/1995 gets 1995 rather than a fractional value such as 1995.50.

In [3]:
from datetime import datetime, timedelta

def datenum_to_datetime(datenum):
    # convert a MATLAB serial date number to a calendar year
    frac_day = datenum % 1                # fractional part of the day
    hours = frac_day * 24
    minutes = hours % 1 * 60
    seconds = minutes % 1 * 60
    # MATLAB day numbers are 366 larger than Python ordinals for the same date
    exact_date = datetime.fromordinal(int(datenum)) \
        + timedelta(hours=int(hours)) \
        + timedelta(minutes=int(minutes)) + timedelta(seconds=round(seconds)) \
        - timedelta(days=366)
    return exact_date.year
 
df['date_of_birth'] = df['dob'].apply(datenum_to_datetime)
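
A quick way to check the conversion is to compare it with the date of birth embedded in the file name. The sketch below uses the first record shown earlier (dob 723671, path '17/10000217_1981-05-05_2009.jpg'); `datenum_to_date` and `dob_from_path` are hypothetical helpers, and the date numbers are assumed to be whole days.

```python
from datetime import date

def datenum_to_date(datenum):
    """Convert a whole-number MATLAB serial date to a Python date."""
    # 366 is the offset between MATLAB day numbers and Python ordinals.
    return date.fromordinal(int(datenum) - 366)

def dob_from_path(full_path):
    """Read the date of birth from a name like '17/10000217_1981-05-05_2009.jpg'."""
    dob_text = full_path.split('_')[1]
    year, month, day = (int(part) for part in dob_text.split('-'))
    return date(year, month, day)

print(datenum_to_date(723671))                            # 1981-05-05
print(dob_from_path('17/10000217_1981-05-05_2009.jpg'))   # 1981-05-05
```

Both routes give the same date for this record, which suggests the 366-day offset is applied correctly.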

Obtain the age from the year of birth calculated above and the year the photo was taken.

In [4]:
df['age'] = df['photo_taken'] - df['date_of_birth']

# Remove "Noisy" Pictures

Some pictures in the wiki dataset do not include a face, and some records have wrong or missing information; for example, age information is missing for some records. Some pictures are also too blurry for a face to be detected. These pictures may confuse the model, so we should ignore or drop them. In the following steps, we remove the pictures that contain no face, contain more than one face, have a low face score, or have no gender information.

In [5]:
import numpy as np
# remove pictures that do not include a face
df = df[df['face_score'] != -np.inf]
 
# some pictures include more than one face; remove them
df = df[df['second_face_score'].isna()]
 
# keep only pictures whose face score is at least 3
df = df[df['face_score'] >= 3]
 
# some records have no gender information; remove them
df = df[~df['gender'].isna()]
 
# drop the columns that are no longer needed
df = df.drop(columns = ['name','face_score','second_face_score','date_of_birth','face_location'])
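
To see what each filter does, the same conditions can be applied to a small hand-made table. This is a sketch with made-up values; `clean_faces` is a hypothetical helper that combines the four conditions into one mask, and it assumes that a higher "face_score" means a more reliable detection.

```python
import numpy as np
import pandas as pd

def clean_faces(frame, min_face_score=3):
    """Keep rows with one detected face, a sufficient score, and a known gender."""
    keep = (
        (frame['face_score'] != -np.inf)             # a face was detected
        & frame['second_face_score'].isna()          # no second face
        & (frame['face_score'] >= min_face_score)    # score meets the threshold
        & frame['gender'].notna()                    # gender is recorded
    )
    return frame[keep]

demo = pd.DataFrame({
    'face_score':        [4.5, -np.inf, 5.0, 2.0, 3.5],
    'second_face_score': [np.nan, np.nan, 2.1, np.nan, np.nan],
    'gender':            [1.0, 0.0, 1.0, 0.0, np.nan],
})
kept = clean_faces(demo)
print(kept.index.tolist())   # [0]
```

Only the first row survives: the others fail, in order, the no-face, second-face, threshold, and gender checks.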

Remove the images with ages greater than 100 or less than 0.

In [6]:
# some people appear to be older than 100; some of these images are paintings, so remove them
df = df[df['age'] <= 100]
 
# some people appear to be unborn (negative age) in the data set; remove them
df = df[df['age'] >= 0]
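
The two bounds can also be checked in one step with `Series.between`, which includes both endpoints by default. A short sketch on made-up ages:

```python
import pandas as pd

demo = pd.DataFrame({'age': [-1, 0, 35, 100, 101]})
in_range = demo[demo['age'].between(0, 100)]   # both bounds are inclusive
print(in_range['age'].tolist())                # [0, 35, 100]
```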

# Reclassify Data  

We create 101 folders, one for each age from 0 to 100, under both './train' and './test'. Each folder name is the zero-padded age of the people in its images (for example, folder 020 contains all images of people aged 20). This step puts the data into a uniform layout so that we can use the data generator in the following training process.

In [7]:
import os,shutil
def mycopyfile(df):
    dst = './test'
    dir_exists = os.path.isdir(dst)
    if not dir_exists:
        os.mkdir(dst)
        print("Making directory %s" % dst)
    dst = './train'
    dir_exists = os.path.isdir(dst)
    if not dir_exists:
        os.mkdir(dst)
        print("Making directory %s" % dst)
        
    df_age1 = np.empty(101)
    for i in range(0,101):
        df_age1[i] = df[df.age == i].shape[0]
        dst = './train/{0:03d}'.format(i)
        dir_exists = os.path.isdir(dst)
        if not dir_exists:
            os.mkdir(dst)
            print("Making directory %s" % dst)
        else:
            print("Will store images in directory %s" % dst)
        dst = './test/{0:03d}'.format(i)
        dir_exists = os.path.isdir(dst)
        if not dir_exists:
            os.mkdir(dst)
            print("Making directory %s" % dst)
        else:
            print("Will store images in directory %s" % dst)
            
    # per-age quota of images for the test set: about a quarter of each age group
    df_age2 = np.around(df_age1/4, 0)
    
    for i,ind in enumerate (df.index):
        srcfile = './wiki_crop/{0:s}'.format(df.loc[ind,'full_path'][0])
        if df_age2[df.loc[ind,'age']] >= 1:
            dstfile = './test/{1:03d}/{2:05d}_{0:.0f}_{1:03d}.jpg'.format(df.loc[ind,'gender'],df.loc[ind,'age'],i)
            df_age2[df.loc[ind,'age']] = df_age2[df.loc[ind,'age']]-1
            dst = './test/{0:03d}'.format(df.loc[ind,'age'])
        else:
            dstfile = './train/{1:03d}/{2:05d}_{0:.0f}_{1:03d}.jpg'.format(df.loc[ind,'gender'],df.loc[ind,'age'],i)
            dst = './train/{0:03d}'.format(df.loc[ind,'age'])
        dir_exists = os.path.isdir(dst)
        if not dir_exists:
            os.mkdir(dst)
            print("Making directory %s" % dst)
        else:
            print("Will store images in directory %s" % dst)
        
        if not os.path.isfile(srcfile):
            print ("%s does not exist!"%(srcfile))
        else:
            fpath,fname=os.path.split(dstfile)    # split the path into directory and file name
            if not os.path.exists(fpath):
                os.makedirs(fpath)                # create the directory
            shutil.copyfile(srcfile,dstfile)      # copy the file
            print ("copy %s -> %s"%( srcfile,dstfile))
        
mycopyfile(df)

Making directory ./test
Making directory ./train
Making directory ./train/000
Making directory ./test/000
Making directory ./train/001
Making directory ./test/001
Making directory ./train/002
Making directory ./test/002
Making directory ./train/003
Making directory ./test/003
Making directory ./train/004
Making directory ./test/004
Making directory ./train/005
Making directory ./test/005
Making directory ./train/006
Making directory ./test/006
Making directory ./train/007
Making directory ./test/007
Making directory ./train/008
Making directory ./test/008
Making directory ./train/009
Making directory ./test/009
Making directory ./train/010
Making directory ./test/010
Making directory ./train/011
Making directory ./test/011
Making directory ./train/012
Making directory ./test/012
Making directory ./train/013
Making directory ./test/013
Making directory ./train/014
Making directory ./test/014
Making directory ./train/015
Making directory ./test/015
Making directory ./train/016
Making dir
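
The split rule inside `mycopyfile` can be looked at separately from the file copying. The sketch below is not the notebook's code: `assign_splits` and `destination` are hypothetical helpers that reproduce the rule on a plain list of ages, where roughly a quarter of each age group (rounded with `np.around`) goes to the test set and the rest to the training set.

```python
from collections import Counter

import numpy as np

def assign_splits(ages):
    """Return 'test' or 'train' for each age, filling each age's test quota first."""
    counts = Counter(ages)
    # Quota per age: a quarter of the group, rounded the way np.around rounds.
    quota = {age: int(np.around(n / 4)) for age, n in counts.items()}
    splits = []
    for age in ages:
        if quota[age] >= 1:
            quota[age] -= 1
            splits.append('test')
        else:
            splits.append('train')
    return splits

def destination(split, gender, age, position):
    """Build the target path in the same pattern as mycopyfile."""
    return './{0}/{2:03d}/{3:05d}_{1:.0f}_{2:03d}.jpg'.format(split, gender, age, position)

demo_ages = [20, 20, 20, 20, 30, 30]
demo_splits = assign_splits(demo_ages)
print(demo_splits)
print(destination(demo_splits[0], 1.0, 20, 0))
```

Two things stand out in this form. The test images for each age are simply the first ones encountered, so the split follows row order rather than a random draw. And because `np.around` rounds halves to the nearest even number, an age group with only two images contributes none to the test set.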