# Identifying the faces of Nogizaka 46 with CNN
<p>Student ID: 71745242<br>
Name: Shozen Dan <br>
Class: Heuristic Computing <br>
Instructor: Takefuji Yoshiyasu</p>

## 1. Objective
<p>The other day, while surfing the internet for interesting machine learning ideas, I discovered a blog post on Aidemy about facial identification[1]. The author is a fan of the popular idol group Nogizaka 46 and wanted to create a program that would identify the faces of his 5 favorite members. The steps he took are simple. He obtained 100 images of each target member using Google's Custom Search API. Next, he used the image processing library OpenCV to find the faces in each picture and crop them out. At this point only about 70 images remained per member, so he augmented the data to increase its size 8 times. Finally, he built a network using Keras and TensorFlow and achieved an accuracy of about 70 percent. I found another blog[2] tackling the same problem, whose author achieved an accuracy of 75~80 percent. Although the problem itself is rather frivolous and has no practical applications, the steps involved in solving it can be applied widely. Thus, the objectives of my project are as follows:</p>
<ol>
    <li>Learn the methods involved in dealing with small datasets for deep learning</li>
    <li>Achieve an accuracy of more than 80 percent</li>
</ol>

## 2. Obtaining Data
<p>The first step is to obtain the images. Since there is no existing database, I need to scrape the images from the internet. For this I use Google's Custom Search API, which returns the search results for specific keywords. For more on how to use the API, please visit the link in the citation[2].</p>

### Google Custom Search API

Importing the necessary Python libraries

In [None]:
import httplib2
import json
import os 
import urllib.request
import glob
import cv2
import numpy as np
import math
import shutil
from scipy import ndimage
from urllib.parse import quote

Setting up basic configuration parameters. The API returns links to at most 10 images per query, and the free tier accepts only 100 queries per day. In addition, the total number of images requested for one search cannot exceed 100; otherwise the API returns an error.
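To make the paging rules concrete, here is a minimal sketch of how a request for `total` images can be broken into pages of at most 10 results, using the same query parameters (`key`, `cx`, `q`, `searchType`, `num`, `start`) as `getImageUrl` below. The helper `build_query_urls` is hypothetical, and it only builds the URLs; it does not call the API.

```python
from urllib.parse import urlencode


def build_query_urls(keyword, total, api_key, cx):
    """Return one Custom Search request URL per page of (at most) 10 image results."""
    if total > 100:
        raise ValueError('the API serves at most 100 results per search')
    urls = []
    for start in range(1, total + 1, 10):  # start is 1-based: 1, 11, 21, ...
        num = min(10, total - start + 1)   # the last page may be partial
        params = {'key': api_key, 'cx': cx, 'q': keyword,
                  'searchType': 'image', 'num': num, 'start': start}
        urls.append('https://www.googleapis.com/customsearch/v1?' + urlencode(params))
    return urls
```

For example, 25 images need three requests (`start` = 1, 11, 21 with `num` = 10, 10, 5), and the 100-image default needs ten, which is 10 of the 100 free daily queries per member.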

In [None]:
API_KEY=os.environ.get('GOOGLE_API_KEY', '')  # hypothetical variable name: read the key from the environment instead of hard-coding it
CUSTOM_SEARCH_ENGINE='018321335581135235938:lyu926pg63g'
KEYWORDS=["生田絵梨花","齋藤飛鳥","白石麻衣","西野七瀬","橋本奈々未"]
NUM_OF_IMAGES=100

<p>This function takes a search keyword and the number of images to fetch. It uses Google's Custom Search API to obtain the link to each image and returns the links as a list.</p>

In [None]:
def getImageUrl(search_item: str, total_num: int):
    img_list = []
    i = 0
    while i < total_num:
        num = min(10, total_num - i)  # the API returns at most 10 results per request
        query_img = ("https://www.googleapis.com/customsearch/v1?key=" + API_KEY
                     + "&cx=" + CUSTOM_SEARCH_ENGINE + "&num=" + str(num)
                     + "&start=" + str(i + 1) + "&q=" + quote(search_item) + "&searchType=image")
        res = urllib.request.urlopen(query_img)
        data = json.loads(res.read().decode('utf-8'))
        # 'items' is absent when a page has no results
        for item in data.get('items', []):
            img_list.append(item['link'])
        i += 10
    return img_list

This function takes a single search keyword, the list of image links, and the path of the base directory for the downloaded images. It creates a subdirectory for the keyword and saves the downloaded images in it.

In [None]:
def getImage(search_item: str, img_list: list, base_dir_name='images'):
    item_dir = os.path.join(base_dir_name, search_item)
    os.makedirs(item_dir, exist_ok=True)  # creates the base dir too, so repeated calls do not fail
    http = httplib2.Http(".cache")  # HTTP client with an on-disk cache
    for i, url in enumerate(img_list):
        try:
            response, content = http.request(url)
            if response.status != 200:
                print('Error: failed to download image {}'.format(url))
                continue
            filename = os.path.join(item_dir, search_item + '.' + str(i) + '.jpg')
            with open(filename, 'wb') as f:
                f.write(content)
        except Exception:
            print('Error: failed to download image {}'.format(url))

In [None]:
for j in range(len(KEYWORDS)):
    print('=== downloading images for {} ==='.format(KEYWORDS[j]))
    img_list=getImageUrl(KEYWORDS[j],NUM_OF_IMAGES)
    getImage(KEYWORDS[j], img_list)

## 3. Preprocessing
<p>The next step in the process is pre-processing the data. The first step is to use a face detection algorithm to find the faces in each picture, then crop and resize them to a format the neural network can accept as input. The second step is to divide the data into training and test sets. The third step is augmenting the training data, and the final step is converting the images into tensors (the form of data a TensorFlow network accepts as input).</p>

### Recognize, Crop, and Resize with OpenCV
<p>Here I use the image processing library OpenCV to detect, crop, and resize the faces in each image. The detector is based on Haar cascades; more on Haar cascades can be found via the link in the citations[3].</p>

In [None]:
root="./images/*" # The directory where the downloaded images are housed
dst_dir="./cropped" # The directory to place the cropped and resized images
os.mkdir(dst_dir)
src_dir=glob.glob(root)

In [None]:
# Load the Haar cascade once (this path depends on where OpenCV is installed)
cascade=cv2.CascadeClassifier("/usr/local/opt/opencv/share/OpenCV/haarcascades/haarcascade_frontalface_alt.xml")

# Crop and resize the downloaded images using OpenCV and place the results in the destination directory declared above
for path in src_dir:
    dst = os.path.join(dst_dir, os.path.basename(path))
    os.mkdir(dst)
    for img in os.listdir(path):
        image = cv2.imread(os.path.join(path, img))
        if image is None:
            continue  # skip files OpenCV cannot decode
        image_grey=cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # Convert image to greyscale
        face_list=cascade.detectMultiScale(image_grey, scaleFactor=1.1, minNeighbors=2, minSize=(64,64)) # Face detection
        if len(face_list) == 0:
            continue  # no face detected
        # Keep the first detected face, cropped from the original image
        x, y, width, height = face_list[0]
        face = image[y:y+height, x:x+width]
        face = cv2.resize(face, (64,64))
        fileName=os.path.join(dst, img)
        cv2.imwrite(fileName, face) # Save image

### Dividing the Dataset into Training, Validation, and Test Subsets
<p>I will be using the deep learning framework Keras for this project. It is common practice in deep learning to divide the data into three subsets: training, validation, and test. In this notebook I only create training and test subsets; the test set also serves as the validation data during training. The code below mostly creates the directories and copies the cropped pictures into them.</p>

In [None]:
# Path to the directory that will hold the data
base_dir='./input_data'
os.mkdir(base_dir)

In [None]:
# The directories to house the training images and the test images
train_dir = os.path.join(base_dir, 'train')
os.mkdir(train_dir)
test_dir = os.path.join(base_dir, 'test')
os.mkdir(test_dir)

In [None]:
# Creating the directory to house the training images
train_erika_dir = os.path.join(train_dir, 'erika')
os.mkdir(train_erika_dir)
train_asuka_dir = os.path.join(train_dir, 'asuka')
os.mkdir(train_asuka_dir)
train_mai_dir = os.path.join(train_dir, 'mai')
os.mkdir(train_mai_dir)
train_nanase_dir = os.path.join(train_dir, 'nanase')
os.mkdir(train_nanase_dir)
train_nanami_dir = os.path.join(train_dir, 'nanami')
os.mkdir(train_nanami_dir)

In [None]:
# The directory to house the testing images
test_erika_dir = os.path.join(test_dir, 'erika')
os.mkdir(test_erika_dir)
test_asuka_dir = os.path.join(test_dir, 'asuka')
os.mkdir(test_asuka_dir)
test_mai_dir = os.path.join(test_dir, 'mai')
os.mkdir(test_mai_dir)
test_nanase_dir = os.path.join(test_dir, 'nanase')
os.mkdir(test_nanase_dir)
test_nanami_dir = os.path.join(test_dir, 'nanami')
os.mkdir(test_nanami_dir)

The following cells copy the cropped images into the directories created above. For each member, the first 70 percent of the files returned by `glob` are used for training and the remaining 30 percent for testing.

In [None]:
fnames = glob.glob("./cropped/生田絵梨花/*") 
train_len = math.floor(len(fnames) * 0.7)
for i in range(len(fnames)):
    if i < train_len:
        src = fnames[i]
        dst = os.path.join(train_erika_dir, 'erika.{}.jpg'.format(i))
        shutil.copyfile(src, dst)
    else: 
        src = fnames[i]
        dst = os.path.join(test_erika_dir, 'erika.{}.jpg'.format(i))
        shutil.copyfile(src, dst)

In [None]:
fnames = glob.glob("./cropped/齋藤飛鳥/*")
train_len = math.floor(len(fnames) * 0.7)
for i in range(len(fnames)):
    if i < train_len:
        src = fnames[i]
        dst = os.path.join(train_asuka_dir, 'asuka.{}.jpg'.format(i))
        shutil.copyfile(src, dst)
    else: 
        src = fnames[i]
        dst = os.path.join(test_asuka_dir, 'asuka.{}.jpg'.format(i))
        shutil.copyfile(src, dst)

In [None]:
fnames = glob.glob("./cropped/白石麻衣/*")
train_len = math.floor(len(fnames) * 0.7)
for i in range(len(fnames)):
    if i < train_len:
        src = fnames[i]
        dst = os.path.join(train_mai_dir, 'mai.{}.jpg'.format(i))
        shutil.copyfile(src, dst)
    else: 
        src = fnames[i]
        dst = os.path.join(test_mai_dir, 'mai.{}.jpg'.format(i))
        shutil.copyfile(src, dst)

In [None]:
fnames = glob.glob("./cropped/橋本奈々未/*")
train_len = math.floor(len(fnames) * 0.7)
for i in range(len(fnames)):
    if i < train_len:
        src = fnames[i]
        dst = os.path.join(train_nanase_dir, 'nanami.{}.jpg'.format(i))
        shutil.copyfile(src, dst)
    else: 
        src = fnames[i]
        dst = os.path.join(test_nanase_dir, 'nanami.{}.jpg'.format(i))
        shutil.copyfile(src, dst)

In [None]:
fnames = glob.glob("./cropped/西野七瀬/*")
train_len = math.floor(len(fnames) * 0.7)
for i in range(len(fnames)):
    if i < train_len:
        src = fnames[i]
        dst = os.path.join(train_nanami_dir, 'nanase.{}.jpg'.format(i))
        shutil.copyfile(src, dst)
    else: 
        src = fnames[i]
        dst = os.path.join(test_nanami_dir, 'nanase.{}.jpg'.format(i))
        shutil.copyfile(src, dst)
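The five cells above repeat the same logic, and because `glob` returns files in arbitrary order, the split is neither random nor guaranteed to be reproducible. Here is a minimal sketch of a reusable alternative, assuming a shuffled split with a fixed seed is acceptable. The helper `split_files` and its `seed` parameter are hypothetical.

```python
import math
import random


def split_files(fnames, train_frac=0.7, seed=0):
    """Shuffle file paths reproducibly and split them into (train, test) lists.

    The first floor(len * train_frac) shuffled paths go to training.
    """
    fnames = sorted(fnames)              # glob order is arbitrary, so sort first
    random.Random(seed).shuffle(fnames)  # seeded shuffle -> same split every run
    train_len = math.floor(len(fnames) * train_frac)
    return fnames[:train_len], fnames[train_len:]
```

Each member's cell would then reduce to `train, test = split_files(glob.glob(...))` followed by one `shutil.copyfile` loop per list.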

### Augmentation

<p>After filtering out the defective images by hand, we only have 35~50 images per member for training and about 20 for testing. I could train a model on the data at hand, but because the dataset is so small, the model starts overfitting after a handful of epochs. Overfitting is caused by having too few samples to learn from, which prevents us from training a model that generalizes to new data. To address this, I use a method called data augmentation. Data augmentation generates more training data from the existing training samples by "augmenting" them with transformations that yield believable-looking images. When augmentation is applied on the fly, the model ideally never sees the exact same picture twice; here the augmented images are generated once and saved to disk, so the expanded set is fixed, but it is still far more varied than the original. In the cell below, each training image is rotated by -10, 0, and 10 degrees, and each rotation is saved as is, thresholded, and blurred, producing nine additional files per original image. This exposes the model to more aspects of the data and helps it generalize better.</p>
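For comparison with the offline approach, here is a minimal NumPy sketch of on-the-fly augmentation, where every batch is re-augmented so the model rarely sees exactly the same image twice. The transformations (horizontal flip, small shift, brightness jitter) and the helper names are illustrative assumptions. Keras' `ImageDataGenerator` offers a library version of the same idea.

```python
import numpy as np


def random_augment(img, rng, max_shift=4):
    """Randomly flip, shift, and brighten an RGB image with values in [0, 1]."""
    out = img
    if rng.random() < 0.5:
        out = out[:, ::-1]                               # horizontal flip
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))      # translation (wraps at the border)
    out = out * rng.uniform(0.8, 1.2)                    # brightness jitter
    return np.clip(out, 0.0, 1.0)


def augmented_batches(x, y, batch_size, rng):
    """Yield endless (images, labels) batches, augmenting each image anew every time."""
    n = len(x)
    while True:
        idx = rng.choice(n, size=batch_size, replace=False)
        yield np.stack([random_augment(x[i], rng) for i in idx]), y[idx]
```

The trade-off is compute: the offline cell pays the augmentation cost once, while a generator pays it every epoch but never repeats a sample exactly.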

In [None]:
names = ["asuka","mai","erika","nanami","nanase"]
for name in names:
    in_dir = "./input_data/train/"+name+"/*"
    out_dir = "./input_data/train/"+name
    in_jpg=glob.glob(in_dir)
    img_file_name_list=os.listdir("./input_data/train/"+name+"/")
    for i in range(len(in_jpg)):
        img = cv2.imread(str(in_jpg[i]))
        # Rotate Images
        for ang in [-10,0,10]:
            img_rot = ndimage.rotate(img,ang)
            img_rot = cv2.resize(img_rot,(64,64))
            fileName=os.path.join(out_dir,str(i)+"_"+str(ang)+".jpg")
            cv2.imwrite(str(fileName),img_rot)
            # Threshold
            img_thr = cv2.threshold(img_rot, 100, 255, cv2.THRESH_TOZERO)[1]
            fileName=os.path.join(out_dir,str(i)+"_"+str(ang)+"thr.jpg")
            cv2.imwrite(str(fileName),img_thr)
            # Filter Images
            img_filter = cv2.GaussianBlur(img_rot, (5, 5), 0)
            fileName=os.path.join(out_dir,str(i)+"_"+str(ang)+"filter.jpg")
            cv2.imwrite(str(fileName),img_filter)

### Validity Check

In [None]:
print('total training erika images: ', len(os.listdir(train_erika_dir)))
print('total test erika images: ', len(os.listdir(test_erika_dir)))

In [None]:
print('total training asuka images: ', len(os.listdir(train_asuka_dir)))
print('total test asuka images: ', len(os.listdir(test_asuka_dir)))

In [None]:
print('total training mai images: ', len(os.listdir(train_mai_dir)))
print('total test mai images: ', len(os.listdir(test_mai_dir)))

In [None]:
print('total training nanase images: ', len(os.listdir(train_nanase_dir)))
print('total test nanase images: ', len(os.listdir(test_nanase_dir)))

In [None]:
print('total training nanami images: ', len(os.listdir(train_nanami_dir)))
print('total test nanami images: ', len(os.listdir(test_nanami_dir)))

### Converting the Data into Tensors
<p>Currently the images are stored as JPEG files, which the network cannot take as input directly. The data must be converted into appropriately pre-processed floating-point tensors before being fed into the network. The required steps are as follows.
<ol>
    <li>Read the picture files.
    <li>Decode the JPEG content to RGB grids of pixels.
    <li>Convert these into floating point tensors.
    <li>Rescale the pixel values (between 0 and 255) to the [0, 1] interval (neural networks prefer to deal with small input values).
</ol>
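The four steps above can be sketched in a few lines of NumPy, assuming the images have already been read with `cv2.imread` (which returns BGR `uint8` arrays). The function name `to_input_tensor` is hypothetical.

```python
import numpy as np


def to_input_tensor(bgr_images):
    """Convert a list of HxWx3 uint8 BGR images into a float32 tensor of
    shape (N, H, W, 3) holding RGB values in [0, 1]."""
    batch = np.stack(bgr_images)            # (N, H, W, 3), uint8
    rgb = batch[..., ::-1]                  # reverse channel order: BGR -> RGB
    return rgb.astype(np.float32) / 255.0   # rescale [0, 255] -> [0, 1]
```

Using `float32` rather than NumPy's default `float64` halves the memory footprint and matches the dtype Keras uses by default.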

In [None]:
name = ["asuka","mai","erika","nanami","nanase"]

In [None]:
# Loading and labeling the training data (the label is the index of the member in `name`)
train_dir="./input_data/train/"
X_train = []
Y_train = []
for i in range(len(name)):
    img_file_name_list=os.listdir(train_dir+name[i])
    print('Found {} training images for {}'.format(len(img_file_name_list), name[i]))
    for fname in img_file_name_list:  # iterate over every file in the class directory
        img = cv2.imread(os.path.join(train_dir, name[i], fname))
        if img is None:
            continue  # skip non-image files such as .DS_Store
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert to RGB
        # Rescale pixel values from [0, 255] to [0, 1]
        img = np.divide(img, 255)
        X_train.append(img)
        Y_train.append(i)
In [None]:
# Loading and labeling the test data (also used as validation data during training)
validation_dir="./input_data/test/"
X_test = []
Y_test = []
for i in range(len(name)):
    img_file_name_list=os.listdir(validation_dir+name[i])
    print('Found {} testing images for {}'.format(len(img_file_name_list), name[i]))
    for fname in img_file_name_list:  # iterate over every file in the class directory
        img = cv2.imread(os.path.join(validation_dir, name[i], fname))
        if img is None:
            continue  # skip non-image files such as .DS_Store
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert to RGB
        # Rescale pixel values from [0, 255] to [0, 1]
        img = np.divide(img, 255)
        X_test.append(img)
        Y_test.append(i)
X_train=np.array(X_train)
X_test=np.array(X_test)

In [None]:
from keras.utils.np_utils import to_categorical
y_train = to_categorical(Y_train)
y_test = to_categorical(Y_test)

## 4. Testing the Original Model

This is the model used by the author of the original article. It consists of 4 sets of convolution/pooling layers, two hidden dense layers, and a softmax output layer. ReLU is usually used as the activation function, but according to the article, sigmoid is the better choice in this case (perhaps because the network is shallow). Note that in this code only the hidden dense layers use sigmoid; the convolution layers have no `activation` argument and therefore use Keras' default linear activation. The loss function is categorical cross-entropy, and the optimizer is stochastic gradient descent (`'sgd'`).
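As a sanity check on this architecture, the sketch below counts its trainable parameters by hand. It assumes 'same' padding with stride 1 (so each convolution keeps the spatial size) and Keras' default 2x2 max pooling with stride 2. The helper names are illustrative, and the total should agree with what `model.summary()` reports.

```python
def conv2d_params(kernel_h, kernel_w, in_ch, out_ch):
    # one weight per kernel element per (input, output) channel pair, plus one bias per filter
    return kernel_h * kernel_w * in_ch * out_ch + out_ch


def dense_params(in_units, out_units):
    return in_units * out_units + out_units


# Four 2x2 convolutions, each followed by 2x2 max pooling, on 64x64x3 inputs
convs = [(3, 32), (32, 32), (32, 32), (32, 128)]
size = 64
total = 0
for in_ch, out_ch in convs:
    total += conv2d_params(2, 2, in_ch, out_ch)
    size //= 2  # 'same' convolution keeps the size; pooling halves it

flat = size * size * convs[-1][1]  # 4 * 4 * 128 after Flatten
for in_units, out_units in [(flat, 256), (256, 128), (128, 5)]:
    total += dense_params(in_units, out_units)

print(flat, total)  # 2048 583269
```

About 90 percent of these parameters (524,544) sit in the first dense layer, which is typical of small convnets that flatten a feature map directly into a wide dense layer.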

In [None]:
from keras import layers, Input
from keras.models import Model, Sequential

In [None]:
model = Sequential()
model.add(layers.Conv2D(32, (2, 2), input_shape=(64,64,3), strides=(1,1), padding='same'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(32, (2, 2), strides=(1,1), padding='same'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(32, (2, 2), strides=(1,1), padding='same'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(128, (2, 2), strides=(1,1), padding='same'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='sigmoid'))
model.add(layers.Dense(128, activation='sigmoid'))
model.add(layers.Dense(5, activation='softmax'))

In [None]:
model.summary()

### Compilation

In [None]:
from keras import optimizers

In [None]:
model.compile(loss='categorical_crossentropy',
             optimizer='sgd',
             metrics=['accuracy'])

###  Training

In [None]:
# Train directly on the in-memory arrays; the test set is passed as validation data
history = model.fit(X_train, y_train, 
                    batch_size=32,
                    epochs=100,
                    validation_data=(X_test, y_test),
                    verbose=1
                    )

### Saving the Model

In [None]:
model.save('ngz46_1.h5')

### Visualizing the Result

In [None]:
import matplotlib.pyplot as plt

In [None]:
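# Older Keras versions log 'acc'/'val_acc'; newer ones log 'accuracy'/'val_accuracy'
acc = history.history.get('acc', history.history.get('accuracy'))
val_acc = history.history.get('val_acc', history.history.get('val_accuracy'))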
loss = history.history['loss']
val_loss = history.history['val_loss']

In [None]:
epochs = range(1, len(acc) + 1)

In [None]:
# Plot the accuracy
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and Validation Accuracy')
plt.legend()
plt.savefig('./orig_acc.png')

plt.figure()

# Plot the loss value
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and Validation Loss')
plt.legend()
plt.savefig('./orig_loss.png')
plt.show()

### Consideration

<p>The two graphs above show the results of the original network. After training the network several times, I found that the maximum test accuracy was 80 percent (75 percent on average). This is not an excellent result. In addition, the validation accuracy is very unstable.</p>
<p>The most likely reason for this is the amount and quality of the data. Since we only have 300~500 images per person (most of which are augmented) and 20 or fewer to validate the results on, we cannot expect good results without improving either the amount or the quality of the data.</p>
<p>The second possible improvement concerns the network. The original network is very simple and not designed for face recognition. By using a model better suited to the task, we can expect higher accuracy.</p>

## 5. Improving the Dataset

<p>The three steps commonly involved in face recognition are as follows[3]:</p>
<ol>
    <li>Face Detection</li>
    <li>Face Alignment</li>
    <li>Face Recognition</li>
</ol>
<p>In the previous experiment with the original model, I skipped the second step. Therefore, before optimizing the network, I will begin by properly aligning the images we have. I will not go into depth about face alignment algorithms here, but will instead use the model and algorithm implemented by dlib. The great thing about dlib is that it handles detection, alignment, and cropping all at once, reducing the amount of code we have to write. The original code can be found at dlib.net via the link in the citation[4].</p>
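To illustrate what "alignment" means geometrically, here is a simplified NumPy sketch that builds a 2x3 similarity transform from two eye centers. The transform rotates and scales the face so that the eyes lie on a horizontal line a fixed distance apart. This is not dlib's implementation: dlib's `get_face_chip` fits its transform to the detected landmarks. The eye coordinates and helper names here are hypothetical.

```python
import numpy as np


def eye_alignment_matrix(left_eye, right_eye, desired_eye_dist):
    """2x3 similarity transform that maps the eye midpoint to (0, 0) and puts
    both eyes on the x-axis, desired_eye_dist pixels apart."""
    left_eye = np.asarray(left_eye, dtype=float)
    right_eye = np.asarray(right_eye, dtype=float)
    dx, dy = right_eye - left_eye
    angle = np.arctan2(dy, dx)                     # tilt of the line through the eyes
    scale = desired_eye_dist / np.hypot(dx, dy)
    c, s = scale * np.cos(-angle), scale * np.sin(-angle)
    rot = np.array([[c, -s], [s, c]])              # rotate by -angle and scale
    center = (left_eye + right_eye) / 2.0
    t = -rot @ center                              # translate the eye midpoint to the origin
    return np.hstack([rot, t[:, None]])


def apply_affine(m, point):
    return m[:, :2] @ np.asarray(point, dtype=float) + m[:, 2]
```

To use such a matrix with `cv2.warpAffine`, one would also add the desired output position of the eye midpoint to the translation column so the face lands in the middle of the output image.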

In [5]:
import dlib
import cv2
print('Using dlib version: {}'.format(dlib.__version__))
print('Using cv2 version: {}'.format(cv2.__version__))

Using dlib version: 19.16.0
Using cv2 version: 3.4.1


In [None]:
def FaceAligner(face_file_path, output_path, predictor_path='./shape_predictor_5_face_landmarks.dat'):
    # Load all the models we need: a detector to find the faces, a shape predictor
    # to find face landmarks so we can precisely localize the face
    detector = dlib.get_frontal_face_detector()
    sp = dlib.shape_predictor(predictor_path)
    
    img=cv2.imread(face_file_path)
    if img is None:
        return
    
    b,g,r = cv2.split(img)
    img = cv2.merge([r,g,b])
    
    # Ask the detector to find the bounding boxes of each face. The 1 in the
    # second argument indicates that we should upsample the image 1 time. This
    # will make everything bigger and allow us to detect more faces.
    dets = detector(img, 1)
    num_faces = len(dets)

    if num_faces == 0:
        return
    
    # Find the 5 face landmarks we need to do the alignment.
    faces = dlib.full_object_detections()
    for detection in dets:
        faces.append(sp(img, detection))

    # Save Image
    image = dlib.get_face_chip(img, faces[0])
    dlib.save_image(image, output_path)

In [None]:
# Testing
FaceAligner('./images/橋本奈々未/橋本奈々未.3.jpg', './align_test.jpg')

In [None]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

In [None]:
original_img = mpimg.imread('./images/橋本奈々未/橋本奈々未.3.jpg')
aligned_img = mpimg.imread('./align_test.jpg')

In [None]:
plt.subplot(1, 2, 1)
plt.axis("off")
plt.title('Original Image')
plt.imshow(original_img)

plt.subplot(1, 2, 2)
plt.axis("off")
plt.title('Aligned Image')
plt.imshow(aligned_img)

plt.show()

In [None]:
import os
import glob

In [None]:
root="./images/*" # The directory where the downloaded images are housed
dst_dir="./aligned" # The directory to place the cropped and resized images
os.mkdir(dst_dir)
src_dir=glob.glob(root)

# Will crop and resize the downloaded images using OpenCV and place the results in the destination directory declared above
for path in src_dir:
    dst = os.path.join(dst_dir, path.split('/')[2])
    os.mkdir(dst)
    for img in os.listdir(path):
        FaceAligner(os.path.join(path, img), os.path.join(dst, img))

In [None]:
# Path to the directory that will hold the data
base_dir='./input_data2'
os.mkdir(base_dir)

In [None]:
# The directories to house the training images and the test images
train_dir = os.path.join(base_dir, 'train')
os.mkdir(train_dir)
test_dir = os.path.join(base_dir, 'test')
os.mkdir(test_dir)

In [None]:
# Creating the directory to house the training images
train_erika_dir = os.path.join(train_dir, 'erika')
os.mkdir(train_erika_dir)
train_asuka_dir = os.path.join(train_dir, 'asuka')
os.mkdir(train_asuka_dir)
train_mai_dir = os.path.join(train_dir, 'mai')
os.mkdir(train_mai_dir)
train_nanase_dir = os.path.join(train_dir, 'nanase')
os.mkdir(train_nanase_dir)
train_nanami_dir = os.path.join(train_dir, 'nanami')
os.mkdir(train_nanami_dir)

In [None]:
# The directory to house the testing images
test_erika_dir = os.path.join(test_dir, 'erika')
os.mkdir(test_erika_dir)
test_asuka_dir = os.path.join(test_dir, 'asuka')
os.mkdir(test_asuka_dir)
test_mai_dir = os.path.join(test_dir, 'mai')
os.mkdir(test_mai_dir)
test_nanase_dir = os.path.join(test_dir, 'nanase')
os.mkdir(test_nanase_dir)
test_nanami_dir = os.path.join(test_dir, 'nanami')
os.mkdir(test_nanami_dir)

In [None]:
import math
import shutil

In [None]:
fnames = glob.glob("./aligned/生田絵梨花/*") 
train_len = math.floor(len(fnames) * 0.7)
for i in range(len(fnames)):
    if i < train_len:
        src = fnames[i]
        dst = os.path.join(train_erika_dir, 'erika.{}.jpg'.format(i))
        shutil.copyfile(src, dst)
    else: 
        src = fnames[i]
        dst = os.path.join(test_erika_dir, 'erika.{}.jpg'.format(i))
        shutil.copyfile(src, dst)

In [None]:
fnames = glob.glob("./aligned/齋藤飛鳥/*")
train_len = math.floor(len(fnames) * 0.7)
for i in range(len(fnames)):
    if i < train_len:
        src = fnames[i]
        dst = os.path.join(train_asuka_dir, 'asuka.{}.jpg'.format(i))
        shutil.copyfile(src, dst)
    else: 
        src = fnames[i]
        dst = os.path.join(test_asuka_dir, 'asuka.{}.jpg'.format(i))
        shutil.copyfile(src, dst)

In [None]:
fnames = glob.glob("./aligned/白石麻衣/*")
train_len = math.floor(len(fnames) * 0.7)
for i in range(len(fnames)):
    if i < train_len:
        src = fnames[i]
        dst = os.path.join(train_mai_dir, 'mai.{}.jpg'.format(i))
        shutil.copyfile(src, dst)
    else: 
        src = fnames[i]
        dst = os.path.join(test_mai_dir, 'mai.{}.jpg'.format(i))
        shutil.copyfile(src, dst)

In [None]:
fnames = glob.glob("./aligned/橋本奈々未/*")
train_len = math.floor(len(fnames) * 0.7)
for i in range(len(fnames)):
    if i < train_len:
        src = fnames[i]
        dst = os.path.join(train_nanase_dir, 'nanami.{}.jpg'.format(i))
        shutil.copyfile(src, dst)
    else: 
        src = fnames[i]
        dst = os.path.join(test_nanase_dir, 'nanami.{}.jpg'.format(i))
        shutil.copyfile(src, dst)

In [None]:
fnames = glob.glob("./aligned/西野七瀬/*")
train_len = math.floor(len(fnames) * 0.7)
for i in range(len(fnames)):
    if i < train_len:
        src = fnames[i]
        dst = os.path.join(train_nanami_dir, 'nanase.{}.jpg'.format(i))
        shutil.copyfile(src, dst)
    else: 
        src = fnames[i]
        dst = os.path.join(test_nanami_dir, 'nanase.{}.jpg'.format(i))
        shutil.copyfile(src, dst)

In [None]:
import numpy as np

In [None]:
name = ["asuka","mai","erika","nanami","nanase"]

In [None]:
# Loading and labeling the training data (the label is the index of the member in `name`)
train_dir="./input_data2/train/"
X_train = []
Y_train = []
for i in range(len(name)):
    img_file_name_list=os.listdir(train_dir+name[i])
    print('Found {} training images for {}'.format(len(img_file_name_list), name[i]))
    for fname in img_file_name_list:  # iterate over every file in the class directory
        img = cv2.imread(os.path.join(train_dir, name[i], fname))
        if img is None:
            continue  # skip non-image files such as .DS_Store
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert to RGB
        # Rescale pixel values from [0, 255] to [0, 1]
        img = np.divide(img, 255)
        X_train.append(img)
        Y_train.append(i)

In [None]:
# Loading and labeling the test data (also used as validation data during training)
validation_dir="./input_data2/test/"
X_test = []
Y_test = []
for i in range(len(name)):
    img_file_name_list=os.listdir(validation_dir+name[i])
    print('Found {} testing images for {}'.format(len(img_file_name_list), name[i]))
    for fname in img_file_name_list:  # iterate over every file in the class directory
        img = cv2.imread(os.path.join(validation_dir, name[i], fname))
        if img is None:
            continue  # skip non-image files such as .DS_Store
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV loads BGR; convert to RGB
        # Rescale pixel values from [0, 255] to [0, 1]
        img = np.divide(img, 255)
        X_test.append(img)
        Y_test.append(i)
X_train=np.array(X_train)
X_test=np.array(X_test)

In [None]:
from keras.utils.np_utils import to_categorical
y_train = to_categorical(Y_train)
y_test = to_categorical(Y_test)

## 6. My Own Model

Based on the results above, I created a new network. It differs from the previous network in two ways: it has one more set of convolution/pooling layers, and its feature maps are deeper. I also changed the optimizer to RMSprop.

### Consideration

Compared to the original network, this one learns faster, reaching an accuracy of roughly 75 percent after only 25 epochs. At the same time, it starts overfitting much sooner, and its accuracy peaks at about the same level as the previous network. To reduce overfitting, I added a dropout layer (with a dropout rate of 0.5) to the network, but instead of improving, the results were catastrophic. I adjusted the dropout rate and also tried a few other regularization techniques such as L2 and L3 regularization, but all these methods did was slow down the overfitting. This suggests that it would be very difficult to go any higher just by training our own convnet from scratch, simply because we have so little data to work with. As a next step to improve accuracy on this problem, we will have to leverage a pretrained model, which is the focus of the next two sections.
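For reference, here is a minimal NumPy sketch of the two regularizers mentioned above. In Keras they correspond to `layers.Dropout(rate)` and a `kernel_regularizer=regularizers.l2(...)` argument. The function names below are illustrative, and the sketch shows the standard "inverted dropout" formulation rather than Keras' internal code.

```python
import numpy as np


def inverted_dropout(x, rate, rng, training=True):
    """Zero each activation with probability `rate` during training and scale
    the survivors by 1 / (1 - rate) so the expected activation is unchanged.
    At inference time the input passes through untouched."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)


def l2_penalty(weights, lam):
    """L2 regularization term added to the loss: lam * sum of squared weights."""
    return lam * sum(float(np.sum(w ** 2)) for w in weights)
```

With a rate of 0.5, half of the units are silenced on every step, which is a strong constraint for a small network trained on a few hundred images. This may be one reason the results degraded so sharply.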

### Using Pretrained Networks

<p>A common and highly effective approach to deep learning on small image datasets is to leverage a pretrained network. A pretrained network is simply a saved network previously trained on a large dataset, typically on a large-scale image classification task. If the original dataset is large and general enough, the features learned by the network can act as an effective model of the visual world, so they can prove useful for many different computer vision problems, even if these new problems involve completely different classes from the original task. For instance, one might train a network on ImageNet (where classes are mostly animals and everyday objects) and then reuse it to identify furniture items in images. Such portability of learned features across different problems is a key advantage of deep learning compared to many older shallow learning approaches, and it makes deep learning very effective for small-data problems.</p>

<p>For this problem I will use a large convnet trained on the ImageNet dataset (1.4 million labeled images and 1000 different classes). I must note, however, that ImageNet is a dataset for object recognition, not face recognition, so it is not ideal. However, Keras provides easy access to networks trained on this dataset, which is preferable to training a vast network from scratch myself.</p>

<p>For my first attempt I will use an architecture called VGG16, developed by Karen Simonyan and Andrew Zisserman in 2014. The network is widely used and simple to understand. It is a somewhat older model and cannot compete with today's heavier state-of-the-art models, but I prefer it to the others because I can understand the underlying concepts.</p>

### Feature Extraction

<p>There are two ways to leverage a pre-trained network: feature extraction and fine-tuning. I will start with feature extraction. Feature extraction consists of using the representations learned by a previous network to extract interesting features from new samples. These features are then run through a new classifier, which is trained from scratch.</p>

<p>As the earlier examples show, convnets used for image classification consist of two parts: a series of convolution and pooling layers, followed by a densely-connected classifier. The first part is called the "convolutional base". In feature extraction, we simply take the convolutional base of a previously trained network, run the new data through it, and train a new classifier on top of its output.</p>

<p>We reuse only the convolutional base because its representations are likely to be more generic and therefore more reusable: the feature maps of a convnet are presence maps of generic concepts over a picture, which are likely to be useful regardless of the computer vision problem at hand. The representations learned by the classifier, on the other hand, contain information specific to the set of classes the model was trained on. Additionally, representations found in densely-connected layers no longer contain any information about where objects are located in the input image: these layers discard the notion of space, whereas convolutional feature maps still describe object location. For problems where object location matters, densely-connected features would be largely useless.</p>
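In Keras, feature extraction boils down to calling `conv_base.predict(...)` once on the images and then training a small densely-connected classifier on the resulting arrays. The NumPy sketch below shows the same two-step structure without any deep learning library. A fixed random projection followed by ReLU stands in for the pretrained convolutional base (a hypothetical substitute, not VGG16), and only a newly initialized softmax classifier is trained on toy data.

```python
import numpy as np


def frozen_base(x, w_base):
    """Hypothetical stand-in for a pretrained convolutional base: a fixed
    projection followed by ReLU. Its weights are never updated."""
    return np.maximum(x @ w_base, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_classifier(features, labels, num_classes, lr=0.05, epochs=200):
    """Train a new softmax classifier from scratch on precomputed features with
    full-batch gradient descent; returns the weights, bias, and loss history."""
    n, d = features.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    losses = []
    for _ in range(epochs):
        probs = softmax(features @ w + b)
        losses.append(float(-np.mean(np.sum(onehot * np.log(probs + 1e-12), axis=1))))
        grad = (probs - onehot) / n  # gradient of the mean cross-entropy w.r.t. the logits
        w -= lr * (features.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b, losses


# Toy data: 100 samples, 20 raw inputs, 5 classes
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 20))
labels = rng.integers(0, 5, size=100)
x[np.arange(100), labels] += 3.0          # make each class stand out in one input
w_base = rng.normal(size=(20, 32)) / np.sqrt(20)

features = frozen_base(x, w_base)                     # step 1: run the data through the base once
w, b, losses = train_classifier(features, labels, 5)  # step 2: train only the new classifier
```

Because the base runs only once, this variant is cheap. The price is that it cannot be combined with on-the-fly data augmentation, since every augmented image would need a fresh pass through the base.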

In [1]:
from keras.engine import  Model
from keras.layers import Flatten, Dense, Input
from keras_vggface.vggface import VGGFace

Using TensorFlow backend.


In [2]:
conv_base = VGGFace(include_top=False, input_shape=(150, 150, 3))

In [3]:
conv_base.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 150, 150, 3)       0         
_________________________________________________________________
conv1_1 (Conv2D)             (None, 150, 150, 64)      1792      
_________________________________________________________________
conv1_2 (Conv2D)             (None, 150, 150, 64)      36928     
_________________________________________________________________
pool1 (MaxPooling2D)         (None, 75, 75, 64)        0         
_________________________________________________________________
conv2_1 (Conv2D)             (None, 75, 75, 128)       73856     
_________________________________________________________________
conv2_2 (Conv2D)             (None, 75, 75, 128)       147584    
_________________________________________________________________
pool2 (MaxPooling2D)         (None, 37, 37, 128)       0         
__________

<p>The output above summarizes the VGG16 architecture loaded with VGGFace weights. Because we passed <code>include_top=False</code>, the fully-connected classification layers were dropped on import, so we are looking only at the convolutional base. The architecture is simple and easy to follow: 5 blocks of convolution layers, each block followed by a max-pooling layer, with a total of 14,714,688 parameters, which is a large number. The classifier I will be adding on top has about 4.5 million more parameters, almost all of them in the first dense layer, which connects the 4 × 4 × 512 = 8,192 flattened features to 512 units.</p>
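<p>These parameter counts can be checked by hand. The sketch below (helper names are illustrative) applies the standard formulas: a Conv2D layer with a k × k kernel has k·k·c_in·c_out weights plus c_out biases, and a Dense layer has n_in·n_out weights plus n_out biases. It assumes the VGG16 layout of 2, 2, 3, 3 and 3 convolution layers per block, 3 × 3 kernels, and 2 × 2 max pooling with floor division, and it uses the 512-512-5 classifier defined in the cells below.</p>

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k Conv2D layer."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a Dense layer."""
    return n_in * n_out + n_out

# VGG16 convolutional base: (number of conv layers, output channels) per block
blocks = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

base_total, c_in = 0, 3  # RGB input
for n_layers, c_out in blocks:
    for _ in range(n_layers):
        base_total += conv_params(3, c_in, c_out)
        c_in = c_out

# Spatial size after the five 2x2 max-pooling layers ('valid' pooling floors odd sizes)
size = 150
for _ in blocks:
    size //= 2
flat = size * size * 512

# Classifier: Flatten -> Dense(512) -> Dense(512) -> Dense(5)
head_total = dense_params(flat, 512) + dense_params(512, 512) + dense_params(512, 5)
print(base_total, size, flat, head_total)  # → 14714688 4 8192 4460037
```

<p>With a 150 × 150 input, the feature map shrinks to 4 × 4 × 512 after pool5, so the first classifier layer alone accounts for 8,192 × 512 + 512 = 4,194,816 of the roughly 4.46 million classifier parameters.</p>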

<p>Before we compile and train the model, however, we must freeze the convolutional base. "Freezing" a layer or set of layers means preventing their weights from being updated during training. Without this, the representations previously learned by the convolutional base would be modified during training: since the Dense layers on top are randomly initialized, very large weight updates would propagate through the network and effectively destroy those representations.</p>
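<p>One way to confirm that freezing works is to train briefly and check which weights moved. The sketch below assumes <code>tf.keras</code> and a tiny, randomly initialized stand-in network (the layer names, shapes, learning rate, and data are hypothetical). It freezes the base layer before compiling, fits for one step, and compares the weights before and after.</p>

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical stand-in: a small conv "base" plus a randomly initialized dense head
inputs = tf.keras.Input(shape=(16, 16, 3))
x = layers.Conv2D(4, 3, activation="relu", name="base_conv")(inputs)
x = layers.Flatten(name="flatten")(x)
outputs = layers.Dense(3, activation="softmax", name="head")(x)
model = tf.keras.Model(inputs, outputs)

# Freeze the base layer before compile() so the optimizer skips its weights
for layer in model.layers:
    if layer.name.startswith("base_"):
        layer.trainable = False

model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
              loss="categorical_crossentropy")

# Snapshot the weights before training
base_before = [w.copy() for w in model.get_layer("base_conv").get_weights()]
head_before = [w.copy() for w in model.get_layer("head").get_weights()]

rng = np.random.default_rng(0)
x_train = rng.random((8, 16, 16, 3)).astype("float32")
y_train = tf.keras.utils.to_categorical(rng.integers(0, 3, size=8), num_classes=3)
model.fit(x_train, y_train, epochs=1, batch_size=8, verbose=0)  # one update step

base_unchanged = all(np.array_equal(a, b) for a, b in
                     zip(base_before, model.get_layer("base_conv").get_weights()))
head_changed = any(not np.array_equal(a, b) for a, b in
                   zip(head_before, model.get_layer("head").get_weights()))
print(len(model.trainable_weights), base_unchanged, head_changed)
```

<p>The trainable flags should be set before <code>compile()</code>; if they are changed afterwards, the model has to be compiled again for the change to take effect.</p>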

<p>Note that the level of generality (and therefore reusability) of the representations extracted by a convolution layer depends on its depth in the model. Earlier layers extract local, highly generic feature maps (such as visual edges, colors, and textures), while higher layers extract more abstract concepts. So if the new dataset differs a lot from the dataset the original model was trained on, it may be better to use only the first few layers of the model for feature extraction, rather than the entire convolutional base.</p>
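<p>In Keras, keeping only the lower part of a base amounts to building a new <code>Model</code> whose output is an intermediate layer's output; for the VGGFace base above this would be something like <code>Model(conv_base.input, conv_base.get_layer('pool3').output)</code>. The sketch below shows the idea with <code>tf.keras</code> on a hypothetical three-block stand-in base, so it runs without downloading any weights.</p>

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical stand-in base with VGG-style layer names (random weights)
inputs = tf.keras.Input(shape=(64, 64, 3))
x = inputs
for block, filters in enumerate([16, 32, 64], start=1):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu",
                      name=f"conv{block}_1")(x)
    x = layers.MaxPooling2D(2, name=f"pool{block}")(x)
base = tf.keras.Model(inputs, x, name="toy_base")

# Keep only the first two blocks, whose features are more generic
truncated = tf.keras.Model(base.input, base.get_layer("pool2").output)

images = np.random.default_rng(0).random((2, 64, 64, 3)).astype("float32")
full_shape = base.predict(images, verbose=0).shape        # (2, 8, 8, 64)
trunc_shape = truncated.predict(images, verbose=0).shape  # (2, 16, 16, 32)
print(full_shape, trunc_shape)
```

<p>The truncated model shares its layers, and therefore its weights, with the original base, so nothing is copied.</p>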

<p>In our case, the ImageNet class set does contain people, but the dataset is not designed for face recognition, so it is best to discard the original classifier and also adapt the top layers of the convolutional base to our data. That is where the second method, fine-tuning, comes in.</p>

### Fine Tuning

<p>Fine-tuning is another technique for model reuse, complementary to feature extraction. Whereas feature extraction reuses the weights of the pretrained model by freezing its layers, fine-tuning unfreezes a few of the top layers of the base and trains them jointly with the new classifier. It is called "fine-tuning" because it slightly adjusts the more abstract representations of the reused model to make them more relevant to the problem at hand.</p>

<p>As explained above, the convolutional base must be frozen in order to train a randomly initialized classifier on top of it. For the same reason, the top layers of the convolutional base can only be fine-tuned once the classifier on top has already been trained. The steps for fine-tuning a network are therefore as follows:
<ol>
    <li>Add your custom network on top of an already trained base network.</li>
    <li>Freeze the base network.</li>
    <li>Train the part you added.</li>
    <li>Unfreeze some layers in the base network.</li>
    <li>Jointly train both these layers and the part you added.</li>
</ol>
</p>
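<p>Steps 4 and 5 can be sketched as follows, assuming <code>tf.keras</code> and a hypothetical stand-in model whose layer names mimic VGG (<code>conv4_1</code>, <code>conv5_1</code>, <code>fc8</code>). The illustrative helper <code>unfreeze_from</code> leaves every layer before a chosen starting layer frozen and unfreezes that layer and everything after it; the model is then recompiled with a small learning rate so that updates to the pretrained representations stay small. The cut-off <code>conv5_1</code> and the learning rate <code>1e-4</code> are assumptions for illustration, not values from this notebook.</p>

```python
import tensorflow as tf
from tensorflow.keras import layers

def unfreeze_from(model, first_trainable):
    """Freeze every layer before `first_trainable`; unfreeze it and all later layers."""
    trainable = False
    for layer in model.layers:
        if layer.name == first_trainable:
            trainable = True
        layer.trainable = trainable

# Hypothetical stand-in for "pretrained base + already-trained classifier"
inputs = tf.keras.Input(shape=(32, 32, 3))
x = inputs
for block, filters in [(4, 16), (5, 32)]:
    x = layers.Conv2D(filters, 3, padding="same", activation="relu",
                      name=f"conv{block}_1")(x)
    x = layers.MaxPooling2D(2, name=f"pool{block}")(x)
x = layers.Flatten(name="flatten")(x)
outputs = layers.Dense(5, activation="softmax", name="fc8")(x)
model = tf.keras.Model(inputs, outputs)

# Step 4: unfreeze the top block of the base (the classifier after it stays trainable)
unfreeze_from(model, "conv5_1")

# Recompile so the new trainable flags take effect; keep the learning rate small
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])

trainable_names = [layer.name for layer in model.layers if layer.trainable]
print(trainable_names)
```

<p>Step 5 is then simply another call to <code>fit</code>. Applied to this notebook, the same idea would mean unfreezing from <code>conv5_1</code> (or a later layer) of the VGGFace base only after the classifier has been trained once.</p>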

In [None]:
# Custom parameters
nb_class = 5      # number of members to classify
hidden_dim = 512  # units in each hidden Dense layer

# Attach a new classifier to the output of the last pooling layer of the base
last_layer = conv_base.get_layer('pool5').output
x = Flatten(name='flatten')(last_layer)
x = Dense(hidden_dim, activation='relu', name='fc6')(x)
x = Dense(hidden_dim, activation='relu', name='fc7')(x)
out = Dense(nb_class, activation='softmax', name='fc8')(x)
model = Model(conv_base.input, out)

In [None]:
model.summary()

In [None]:
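# Freezing the network.
# `model` was built from the base's layers directly (conv_base is not nested inside it),
# and some Keras versions do not propagate conv_base.trainable to those layers,
# so freeze each layer explicitly.
print('This is the number of trainable weights '
      'before freezing the conv base:', len(model.trainable_weights))
for layer in conv_base.layers:
    layer.trainable = False
print('This is the number of trainable weights '
      'after freezing the conv base:', len(model.trainable_weights))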

In [None]:
from keras import optimizers

In [None]:
model.compile(loss='categorical_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

In [None]:
# Train the model on the in-memory training arrays (no generator is used here)
history = model.fit(X_train, y_train, 
                    batch_size=29,
                    epochs=25,
                    validation_data=(X_test, y_test),
                    verbose=1
                    )

In [None]:
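# Older Keras versions record accuracy as 'acc'; newer ones use 'accuracy'
acc_key = 'acc' if 'acc' in history.history else 'accuracy'
acc = history.history[acc_key]
val_acc = history.history['val_' + acc_key]
loss = history.history['loss']
val_loss = history.history['val_loss']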

In [None]:
epochs = range(1, len(acc) + 1)

In [None]:
import matplotlib.pyplot as plt

# Plot the accuracy
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

# Plot the loss value
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

## Citations
<p>
    <ol>
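        <li>“機械学習で乃木坂46を顏分類してみた” [I tried classifying the faces of Nogizaka46 with machine learning]. Aidemy Blog, 株式会社アイデミー (Aidemy Inc.), 4 Dec. 2018, blog.aidemy.net/entry/2017/12/17/214715.</li>
        <li>nirs_kd56. “乃木坂メンバーの顔をCNNで分類” [Classifying the faces of Nogizaka members with a CNN]. Qiita, 26 Sept. 2018, qiita.com/nirs_kd56/items/bc78bf2c3164a6da1ded.</li>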
        <li>Lian, Qianli. A Summary of Deep Models for Face Recognition. cs.wellesley.edu/~vision/slides/Qianli_summary_deep_face_models.pdf.</li>
        <li>“Face Alignment - Dlib.” Dlib C++ Library, dlib.net/face_alignment.py.html.</li>
        <li>Raghuvanshi, Arushi, and Vivek Choksi. Facial Expression Recognition with Convolutional Neural Networks. cs231n.stanford.edu/reports/2016/pdfs/023_Report.pdf.</li>
        <li>Parkhi, Omkar M., Andrea Vedaldi, and Andrew Zisserman. Deep Face Recognition. 2015. www.robots.ox.ac.uk/~vgg/publications/2015/Parkhi15/parkhi15.pdf.</li>
    </ol>
</p>