# Identifying the faces of Nogizaka 46 with CNN
<p>Student ID: 71745242<br>
Name: Shozen Dan <br>
Class: Heuristic Computing <br>
Instructor: Takefuji Yoshiyasu</p>

## 1. Objective
<p>The other day, while surfing the internet for interesting machine learning ideas, I discovered a blog post on Aidemy about facial identification[1]. The author is a fan of the popular idol group Nogizaka 46 and wanted to create a program that would identify the faces of his five favorite members. The steps he took are as follows. He obtained 100 images for each of the target members using Google's Custom Search API. Next, he used the image processing library OpenCV to find the faces in each picture and crop them out. At this point only about 70 images remained for each member, so he augmented the data, increasing the dataset to 8 times its size. Finally, he created a network using Keras and TensorFlow and achieved an accuracy of about 70 percent. I found another blog post[2] tackling the same problem, whose author achieved an accuracy of 75 to 80 percent. Although the problem itself is rather frivolous and has no practical applications, the steps involved in solving it can be applied widely. Thus the objectives of my project are as follows:</p>
<ol>
    <li>Learn the methods involved in dealing with small datasets for deep learning</li>
    <li>Achieve an accuracy of more than 80 percent</li>
</ol>

## 2. Obtaining Data

### Google Custom Search
<p>Since the original authors did not provide a dataset, the first thing we need to do is scrape the images from the internet. For this we are going to use Google Custom Search and its JSON API.</p>
<p>__Google Custom Search__ enables you to create a search engine for your website, your blog, or a collection of websites. You can configure your engine to search both web pages and images. You can fine-tune the ranking, add your own promotions and customize the look and feel of the search results. You can monetize the search by connecting your engine to your Google AdSense account[3].</p> 
<p>__Custom Search JSON API__ lets you develop websites and applications to retrieve and display search results from Google Custom Search programmatically. With this API, you can use RESTful requests to get either web search or image search results in JSON format[4].</p>

<p>The API returns at most 10 image URLs per query, and the free edition only accepts 100 queries per day. In addition, the number of images requested in one search cannot exceed 100, or the API will return an error. For more details on how to use the API, please follow the link in the citations[5]. The source code used to obtain the images for this project can be found in the home directory under GetImage.py. Below are some examples of the obtained images.</p>

<table>
    <tr>
        <td><img src="Images/橋本奈々未/橋本奈々未.3.jpg" alt="橋本奈々未.3.jpg" style="width: 250px;"/></td>
        <td><img src="Images/生田絵梨花/生田絵梨花.1.jpg" alt="生田絵梨花.1.jpg" style="width: 250px;"/></td>
        <td><img src="Images/白石麻衣/白石麻衣.0.jpg" alt="白石麻衣.0.jpg" style="width: 250px;"/></td>
        <td><img src="Images/西野七瀬/西野七瀬.5.jpg" alt="西野七瀬.5.jpg" style="width: 250px;"/></td>
        <td><img src="Images/齋藤飛鳥/齋藤飛鳥.6.jpg" alt="齋藤飛鳥.6.jpg" style="width: 250px;"/></td>
    </tr>
</table>
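
<p>The paging limits described above (10 results per request, at most 100 results per search) can be sketched as follows. This is a minimal sketch and not the code in GetImage.py: the function name <code>build_image_queries</code> and the placeholder credentials are hypothetical, while <code>key</code>, <code>cx</code>, <code>q</code>, <code>searchType</code>, <code>num</code>, and <code>start</code> are query parameters of the Custom Search JSON API.</p>

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"
PAGE_SIZE = 10       # the API returns at most 10 results per request
MAX_RESULTS = 100    # a single search cannot go beyond 100 results


def build_image_queries(keyword, total, api_key, engine_id):
    """Return the request URLs needed to collect `total` image results."""
    if not 1 <= total <= MAX_RESULTS:
        raise ValueError("total must be between 1 and 100")
    urls = []
    for start in range(1, total + 1, PAGE_SIZE):
        params = {
            "key": api_key,          # API key (placeholder)
            "cx": engine_id,         # search engine ID (placeholder)
            "q": keyword,
            "searchType": "image",
            "num": min(PAGE_SIZE, total - start + 1),
            "start": start,          # 1-based index of the first result
        }
        urls.append(API_ENDPOINT + "?" + urlencode(params))
    return urls


urls = build_image_queries("白石麻衣", 100, "YOUR_API_KEY", "YOUR_ENGINE_ID")
print(len(urls))  # 10 requests of 10 images each
```

<p>Each URL would then be fetched and the image links read from the <code>items</code> list of the JSON response. Collecting 100 images for each of the five members takes 50 requests, which fits within the free daily quota of 100 queries.</p>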

## 3. Data Pre-processing
<p>Since we cannot and should not feed raw data directly into our network, the next step in the process is pre-processing the obtained images.
</p>
<ol>
    <li>First we will use a face detection algorithm to find the faces in each picture. Then we will crop and resize them to a format that the neural network can accept.</li>
    <li>The second step will be to divide the data into training and testing groups.</li>
    <li>Because we are dealing with such a small amount of data, the third step will be data augmentation.</li>
    <li>Finally, since we are going to use TensorFlow and Keras, we will convert the images into tensors (a form of data that TensorFlow accepts as input).</li>
</ol>

### Detect, Crop, and Resize with OpenCV
<p>For the detection, cropping, and resizing process we will use the image processing library OpenCV. OpenCV comes with a face detection method called Haar Cascades. More details on Haar Cascades and their implementation can be found at the link in the citations[6]. Once the faces have been found, they are cropped and resized to 64 x 64 pixels. The code for this process is based on the code written by the original author[1] and can be found in the home directory under Preprocessing.py. Below are some examples of the cropped images.</p>

<table>
    <tr>
        <td><img src="Cropped/橋本奈々未/橋本奈々未.2.jpg" alt="橋本奈々未.2.jpg" style="width: 250px;"/></td>
        <td><img src="Cropped/生田絵梨花/生田絵梨花.1.jpg" alt="生田絵梨花.1.jpg" style="width: 250px;"/></td>
        <td><img src="Cropped/白石麻衣/白石麻衣.2.jpg" alt="白石麻衣.2.jpg" style="width: 250px;"/></td>
        <td><img src="Cropped/西野七瀬/西野七瀬.5.jpg" alt="西野七瀬.5.jpg" style="width: 250px;"/></td>
        <td><img src="Cropped/齋藤飛鳥/齋藤飛鳥.6.jpg" alt="齋藤飛鳥.6.jpg" style="width: 250px;"/></td>
    </tr>
</table>

### Dividing the Dataset into Training and Test Subsets
<p>We will be using the deep learning framework Keras with TensorFlow as the backend for this project. It is common practice in deep learning to divide the data into two subsets: training and testing. We first create a root directory called "Input_Data". In this directory we create the training and testing directories, and within each of those we create five directories, one for each member. Of the total data, 70% was used for training and 30% for testing. The code for this process can be found in the home directory under Preprocessing.py.</p>
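
<p>A 70/30 split of one member's images could look like the sketch below. The function name <code>split_train_test</code>, the fixed seed, and the file names are hypothetical and only illustrate the idea; the project's own version lives in Preprocessing.py.</p>

```python
import random


def split_train_test(file_names, train_ratio=0.7, seed=0):
    """Shuffle file names and split them into (train, test) lists."""
    shuffled = sorted(file_names)          # fixed starting order
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Ten hypothetical cropped images of one member.
files = ["face_{}.jpg".format(i) for i in range(10)]
train, test = split_train_test(files)
print(len(train), len(test))  # 7 3
```

<p>Each list would then be copied into the member's folder under the training or testing directory, for example with <code>shutil.copy</code>. Splitting before augmentation keeps augmented copies of a test image out of the training set.</p>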

### Data Augmentation

<p>After filtering out the defective images by hand, we only have 35 to 50 images per member for training and about 20 for testing. While we can train a model using the data at hand, the dataset is simply too small, and the model will start overfitting after a handful of epochs. Overfitting is caused by having too few samples to learn from, rendering us unable to train a model that can generalize to new data. To mitigate this problem, we are going to use a method called data augmentation. Data augmentation takes the approach of generating more training data from existing training samples, by "augmenting" the samples via a number of random transformations that yield believable-looking images. The goal is that at training time, our model would never see the exact same picture twice. This helps the model get exposed to more aspects of the data and generalize better[7].</p>
<p>The images were augmented in 3 ways:
<ol>
    <li>Rotated by -10 or +10 degrees</li>
    <li>A Threshold Filter was applied</li>
    <li>A Gaussian Blur Filter with a kernel of (5, 5) was applied</li>
</ol>
</p>
<p>
This increases the amount of training data to 9 times its original size. Note that we only augment the training data; the testing data should not be altered under any circumstances. The augmentation code is based on the following blog[2] and can be found in the home directory under Preprocessing.py. Below is an example of the augmented pictures.
</p>

<table>
    <tr>
        <td><img src="Input_Data/train/nanami/0_0.jpg" alt="0_0.jpg" style="width: 250px;"/></td>
        <td><img src="Input_Data/train/nanami/0_-10.jpg" alt="0_-10.jpg" style="width: 250px;"/></td>
        <td><img src="Input_Data/train/nanami/0_10.jpg" alt="0_10.jpg" style="width: 250px;"/></td>
        <td><img src="Input_Data/train/nanami/0_0thr.jpg" alt="0_0thr.jpg" style="width: 250px;"/></td>
        <td><img src="Input_Data/train/nanami/0_0filter.jpg" alt="0_0filter.jpg" style="width: 250px;"/></td>
    </tr>
</table>

### Converting the Data into Tensors
<p>Currently all the data are in JPEG format, which the network cannot take as input. The data needs to be formatted into floating point tensors before being fed into our network. The required steps are as follows.</p>
<ol>
    <li>Read the image files.</li>
    <li>Decode the JPEG content to RGB grids of pixels.</li>
    <li>Convert these into floating point tensors.</li>
    <li>Rescale the pixel values (between 0 and 255) to the [0, 1] interval (neural networks prefer to deal with small input values).</li>
</ol>
<p>The code is based on the following blog[2] and can be found in the home directory under Basic_CNN.py.</p>
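
<p>The last two steps (conversion to floating point and rescaling) can be illustrated with NumPy alone. This is a minimal sketch that assumes the images have already been read and decoded into <code>uint8</code> RGB arrays of equal size; the helper name <code>to_tensor</code> is hypothetical.</p>

```python
import numpy as np


def to_tensor(images_uint8):
    """Stack decoded RGB images into a float32 tensor scaled to [0, 1]."""
    batch = np.stack(images_uint8).astype("float32")  # (N, H, W, 3)
    return batch / 255.0


# Two hypothetical decoded 64 x 64 RGB images: all black and all white.
black = np.zeros((64, 64, 3), dtype=np.uint8)
white = np.full((64, 64, 3), 255, dtype=np.uint8)
tensor = to_tensor([black, white])
print(tensor.shape, tensor.dtype, tensor.min(), tensor.max())
# (2, 64, 64, 3) float32 0.0 1.0
```

<p>In Keras, all four steps are commonly handled by an <code>ImageDataGenerator</code> created with <code>rescale=1./255</code> and its <code>flow_from_directory</code> method, which reads images from one sub-directory per class.</p>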

## 4. Testing the Original CNN Model

In [1]:
from keras import layers, optimizers, models

Using TensorFlow backend.


### Architecture
<p>This is the model used by the author of the original article. It consists of 4 sets of convolution and pooling layers followed by two hidden dense layers and a softmax output layer. Usually, relu is used as the activation function for the dense layers; however, according to the original article, sigmoid was the better choice in this case (perhaps sigmoid is better for shallow networks). As can be seen in the summary below, this network has an input shape of (None, 64, 64, 3) and an output shape of (None, 5), with a total of 583,269 trainable parameters.</p>

In [2]:
# Four convolution + max-pooling blocks; each pooling step halves the width and height.
model = models.Sequential()
model.add(layers.Conv2D(32, (2, 2), input_shape=(64,64,3), strides=(1,1), padding='same'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(32, (2, 2), strides=(1,1), padding='same'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(32, (2, 2), strides=(1,1), padding='same'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(128, (2, 2), strides=(1,1), padding='same'))
model.add(layers.MaxPooling2D((2,2)))
# Flatten the 4 x 4 x 128 feature maps and classify with dense layers.
model.add(layers.Flatten())
model.add(layers.Dense(256, activation='sigmoid'))
model.add(layers.Dense(128, activation='sigmoid'))
# One softmax output per member.
model.add(layers.Dense(5, activation='softmax'))

In [3]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_1 (Conv2D)            (None, 64, 64, 32)        416       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 32, 32, 32)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 32, 32, 32)        4128      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 16, 16, 32)        0         
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 16, 16, 32)        4128      
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 8, 8, 32)          0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 8, 8, 128)         16512     
__________
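
<p>The total of 583,269 trainable parameters quoted above can be checked by hand from the layer definitions. The sketch below uses the usual counting rules (kernel weights plus one bias per filter for a convolution, inputs plus one bias per unit for a dense layer); the helper names are hypothetical.</p>

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel Conv2D layer."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    """Weights plus one bias per unit for a Dense layer."""
    return (inputs + 1) * units


size, channels, total = 64, 3, 0
for filters in (32, 32, 32, 128):
    total += conv2d_params(2, channels, filters)
    channels = filters
    size //= 2                      # each 2 x 2 max-pooling halves the size

features = size * size * channels   # length of the flattened vector
for units in (256, 128, 5):
    total += dense_params(features, units)
    features = units

print(total)  # 583269
```

<p>Most of the parameters (524,544) sit in the first dense layer, which connects the 2,048 flattened features to 256 units.</p>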

### Compilation and Training

<p>The model is compiled with:
    <ul>
        <li>__Loss Function__: Categorical Crossentropy</li>
        <li>__Optimizer__: Stochastic Gradient Descent</li>
        <li>__Metrics__: Accuracy</li>
    </ul>
</p>
<p>The model is trained with:
    <ul>
        <li>__Batch Size__: 32</li>
        <li>__Epochs__: 100</li>
        <li>__Training Data__: 2270</li>
        <li>__Testing Data__: 102</li>
    </ul>
</p>
<p>The code for this section and the previous section can be found in the home directory under BasicCNN.py</p>

### Results and Consideration
<table>
    <tr>
        <td><img src="BasicCNN_acc.png" alt="BasicCNN_acc.png" style="width: 500px;"/></td>
        <td><img src="BasicCNN_loss.png" alt="BasicCNN_loss.png" style="width: 500px;"/></td>
    </tr>
</table>

<p>The two graphs above show the results of the original network. After testing the network several times, I found that the maximum accuracy on the test data was 80 percent, with an average of 75 percent. This result was expected since nothing was changed from the original article.</p>
<p>The model has an unusual learning curve, showing no significant improvement during the first 20 epochs and then rising dramatically. The model starts overfitting after about 40 epochs: the validation accuracy refuses to rise above 80 percent, while the training accuracy shows signs of continuing to grow even after 100 epochs. The same story can be seen in the second graph, which shows the training and validation loss and makes it easier to see where the model starts to overfit.</p>
<p>Although we were relatively successful in reproducing the experiment from the original article[1], this is certainly not a very good model. The most likely reason for the heavy overfitting is the amount and quality of the data. Since we only have 300 to 500 images per person, most of which are augmented, and 20 or fewer per person to validate on, we cannot expect good results without improving either the amount or the quality of the data.</p>
<p>To increase the accuracy, the model architecture needs to be updated as well. The original network is very simple and not optimized for face recognition. By using a model that is better suited to the task, we can hope for a rise in accuracy.</p>
<p>I conducted tests with slightly deeper models and introduced regularization methods such as Dropout, L1, and L2 to reduce overfitting, but still obtained the same results. I therefore reached the conclusion that we simply cannot train a good model from scratch with the data at hand. After all, regularization only reduces overfitting and cannot raise the maximum attainable accuracy.</p>

## 5. Improving the Dataset

<p>The three steps commonly involved in face recognition are as follows[8]:</p>
<ol>
    <li>Face Detection</li>
    <li>Face Alignment</li>
    <li>Face Recognition</li>
</ol>
<p>In the previous experiment with the original model, we skipped the second step. Therefore, before moving on to optimizing the network, we will begin by properly aligning the images we have. I will not go into depth about face alignment algorithms here, but rather use the model and algorithm implemented by dlib. The great thing about dlib is that it accomplishes detection, alignment, and cropping all at once, reducing the amount of code we have to write. The original code can be found at dlib.net via the link in the citations[9]. The code for this step can be found in the home directory under FaceAligner.py. Below are an example of an original picture and the aligned picture.</p>
<table>
    <tr>
        <td><img src="Images/橋本奈々未/橋本奈々未.3.jpg" alt="橋本奈々未.3.jpg" style="width: 300px;"/></td>
        <td><img src="align_test.jpg" alt="align_test.jpg" style="width: 300px;"/></td>
    </tr>
</table>
<p>As we can see, the aligner finds the face in the image, rotates it so that the eyes are level, and crops out the face as a 150 x 150 pixel image. In the previous experiment, the cropped images had a dimension of 64 x 64. No reason was given for that size, but by making the images larger we hope that the network can extract features more easily.</p>
<p>At this point one might notice that we have a dilemma. The aligner rotated the image so that the two eyes are level, which makes it easier for the neural network to extract the features related to the face. However, we only have about 50 images per person. We could augment the data using the same algorithm we used previously, but since that involves rotating the image, it would undo the work that the aligner has done for us. One solution is to train both with and without augmentation and see which returns the better result. Another is to take the middle path and change the augmentation algorithm so that it does not rotate the pictures. We will test all three approaches to see which returns the best results. The code for the preprocessing can be found in the home directory under PreprocessingNoChange.py, PreprocessingRotate.py, and PreprocessingNoRotate.py.</p>

## 6. Original Model

### Using Pretrained Networks

<p>In the previous experiment, I reached the conclusion that it is very difficult to increase the accuracy when learning from scratch with the current amount of data. A common and highly effective approach to deep learning on small image datasets is to leverage a pretrained network. A pretrained network is simply a saved network previously trained on a large dataset, typically on a large-scale image classification task. If the original dataset is large enough and general enough, the features learned by the network can act as a generic model of the visual world, and can therefore prove useful for many different computer vision problems, even though these new problems might involve completely different classes from the original task. For instance, one might train a network on ImageNet (where classes are mostly animals and everyday objects) and then reuse this network for identifying furniture items in images. Such portability of learned features across different problems is a key advantage of deep learning compared to many older shallow learning approaches, and it makes deep learning very effective for small-data problems.</p>

<p>For this problem we will use a convnet called VGGFace. It is built on an architecture called VGG16, which was developed by Karen Simonyan and Andrew Zisserman of the Visual Geometry Group at Oxford University. VGG16 consists of 16 weight layers (13 convolution layers and 3 fully connected layers), hence the name. This architecture is simple, widely used, and easy to understand. VGG16 was originally used for object recognition and was trained on the ImageNet dataset (1.4 million labeled images and 1000 different classes). VGGFace is the VGG16 convolutional network trained for the task of face recognition[11].</p>

### Feature Extraction

<p>There are two ways to leverage a pre-trained network: feature extraction and fine-tuning. We will start with feature extraction. Feature extraction consists of using the representations learned by a previous network to extract interesting features from new samples. These features are then run through a new classifier, which is trained from scratch.</p>

<p>As can be seen in the earlier examples, convnets used for image classification are composed of two parts: a series of convolution and pooling layers, and a densely-connected classifier. The first part is called the "convolutional base". In the case of feature extraction, we simply take the convolutional base of a previously-trained network, run the new data through it, and train a new classifier on top of the output.</p>

<p>The reason we only reuse the convolutional base is that the representations it learns are likely to be more generic and therefore more reusable: the feature maps of a convnet are presence maps of generic concepts over a picture, which are likely to be useful regardless of the computer vision problem at hand. On the other hand, the representations learned by the classifier will only contain information specific to the set of classes that the model was trained on. Additionally, representations found in densely-connected layers no longer contain any information about where objects are located in the input image: these layers get rid of the notion of space, whereas the object location is still described by convolutional feature maps. For problems where object location matters, densely-connected features would be largely useless.</p>

In [1]:
from keras.engine import  Model
from keras.layers import Flatten, Dense, Input
from keras_vggface.vggface import VGGFace

Using TensorFlow backend.


In [2]:
conv_base = VGGFace(include_top=False, input_shape=(150, 150, 3))

In [3]:
conv_base.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 150, 150, 3)       0         
_________________________________________________________________
conv1_1 (Conv2D)             (None, 150, 150, 64)      1792      
_________________________________________________________________
conv1_2 (Conv2D)             (None, 150, 150, 64)      36928     
_________________________________________________________________
pool1 (MaxPooling2D)         (None, 75, 75, 64)        0         
_________________________________________________________________
conv2_1 (Conv2D)             (None, 75, 75, 128)       73856     
_________________________________________________________________
conv2_2 (Conv2D)             (None, 75, 75, 128)       147584    
_________________________________________________________________
pool2 (MaxPooling2D)         (None, 37, 37, 128)       0         
__________

<p>The output above shows a summary of the VGGFace (VGG16) model. The classification layers have been removed from the model upon import, so we are only looking at the convolutional base. It is composed of 5 blocks of convolution and pooling layers and has a total of 14,714,688 parameters.</p>
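
<p>The parameter total can be reproduced by hand. The sketch below assumes the standard VGG16 layout of thirteen 3 x 3 convolutions arranged in five blocks. The filter counts of the first two blocks appear in the summary above; those of the remaining blocks are assumed from the standard architecture, because the summary is cut off.</p>

```python
# Filters per convolution in each of the five blocks of a VGG16-style base.
BLOCKS = [(64, 64), (128, 128), (256, 256, 256), (512, 512, 512), (512, 512, 512)]

total, channels = 0, 3          # RGB input
for block in BLOCKS:
    for filters in block:
        # 3 x 3 kernel weights plus one bias per filter.
        total += (3 * 3 * channels + 1) * filters
        channels = filters

conv_layers = sum(len(block) for block in BLOCKS)
print(conv_layers)  # 13
print(total)        # 14714688
```

<p>The result matches the 14,714,688 parameters reported for the convolutional base, which is consistent with a 13-convolution (VGG16-style) network.</p>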

<p>However, before we can compile and train our model, it is very important to freeze the convolutional base. "Freezing" a layer or set of layers means preventing their weights from being updated during training. If we don't do this, the representations previously learned by the convolutional base will be modified during training. Since the Dense layers on top are randomly initialized, very large weight updates would be propagated through the network, effectively destroying the representations previously learned.</p>

<p>We must note here that the level of generality (and therefore reusability) of the representations extracted by specific convolution layers depends on the depth of the layer in the model. Layers that come earlier in the model extract local, highly generic feature maps (such as visual edges, colors, and textures), while layers higher up extract more abstract concepts. So if the new dataset differs a lot from the dataset that the original model was trained on, it might be better to use only the first few layers of the model for feature extraction, rather than the entire convolutional base.</p>

<p>Since VGGFace was trained on face images, we can reuse the deeper layers of the base as well and hope to achieve relatively good results. However, its original classifier was trained to tell apart a different set of identities[11], so we will create a new classifier from scratch.</p>

### Fine Tuning

<p>Fine-tuning is another technique for model reuse that is complementary to feature extraction. While feature extraction reuses the weights of the pre-trained model by freezing all of its layers, fine-tuning unfreezes a few of the top layers of the frozen base and trains them jointly with the newly added classifier. It is called "fine-tuning" because it slightly adjusts the more abstract representations of the model being reused, in order to make them more relevant for the problem at hand.</p>

<p>As explained previously, it is necessary to freeze the convolutional base in order to train a randomly initialized classifier on top. For the same reason, it is only possible to fine-tune the top layers of the convolutional base once the classifier on top has already been trained. Thus the steps for fine-tuning a network are as follows:
<ol>
    <li>Add your custom network on top of an already trained base network.</li>
    <li>Freeze the base network.</li>
    <li>Train the part you added.</li>
    <li>Unfreeze some layers in the base network.</li>
    <li>Jointly train both these layers and the part you added.</li>
</ol>
</p>
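
<p>The five steps can be sketched end to end on a toy model. This is not the code in VGGFace.py: a tiny two-layer network with hypothetical layer names (<code>conv_a</code>, <code>conv_b</code>, <code>head</code>) stands in for the pretrained base, and random arrays stand in for the face images. It only shows how the <code>trainable</code> attribute controls which weights are updated.</p>

```python
import numpy as np
from keras import Input, layers, models, optimizers

# Tiny stand-in for a pretrained convolutional base (hypothetical layers).
inputs = Input(shape=(16, 16, 3))
x = layers.Conv2D(4, (3, 3), activation="relu", name="conv_a")(inputs)
x = layers.Conv2D(4, (3, 3), activation="relu", name="conv_b")(x)
base = models.Model(inputs, x, name="base")

# Step 1: add a new classifier on top of the base.
y = layers.Flatten()(base.output)
outputs = layers.Dense(5, activation="softmax", name="head")(y)
model = models.Model(base.input, outputs)

# Step 2: freeze every layer of the base.
for layer in base.layers:
    layer.trainable = False

# Step 3: train only the new classifier (random data, one epoch).
images = np.random.rand(8, 16, 16, 3).astype("float32")
labels = np.eye(5, dtype="float32")[np.random.randint(0, 5, size=8)]
model.compile(loss="categorical_crossentropy", optimizer=optimizers.SGD(),
              metrics=["accuracy"])
frozen_count = len(model.trainable_weights)   # kernel + bias of the head
model.fit(images, labels, batch_size=4, epochs=1, verbose=0)

# Step 4: unfreeze the last layer of the base.
base.get_layer("conv_b").trainable = True

# Step 5: recompile so the change takes effect, then train jointly.
model.compile(loss="categorical_crossentropy", optimizer=optimizers.SGD(),
              metrics=["accuracy"])
tuned_count = len(model.trainable_weights)    # head + conv_b
model.fit(images, labels, batch_size=4, epochs=1, verbose=0)

print(frozen_count, tuned_count)  # 2 4
```

<p>Recompiling after changing <code>trainable</code> matters, because otherwise the change may not be picked up by training. A lower learning rate is often used in the second phase so that the unfrozen layers are adjusted only slightly.</p>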

In [5]:
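#custom parameters
NB_CLASS = 5        # one output per member
HIDDEN_DIM = 512    # units in each new fully connected layer
# Take the output of the last pooling layer of the convolutional base.
last_layer = conv_base.get_layer('pool5').output
# New classifier: flatten, two hidden dense layers, and a softmax output.
x = Flatten(name='flatten')(last_layer)
x = Dense(HIDDEN_DIM, activation='relu', name='fc6')(x)
x = Dense(HIDDEN_DIM, activation='relu', name='fc7')(x)
out = Dense(NB_CLASS, activation='softmax', name='fc8')(x)
# Join the convolutional base and the new classifier into one model.
model = Model(conv_base.input, out)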

In [6]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 150, 150, 3)       0         
_________________________________________________________________
conv1_1 (Conv2D)             (None, 150, 150, 64)      1792      
_________________________________________________________________
conv1_2 (Conv2D)             (None, 150, 150, 64)      36928     
_________________________________________________________________
pool1 (MaxPooling2D)         (None, 75, 75, 64)        0         
_________________________________________________________________
conv2_1 (Conv2D)             (None, 75, 75, 128)       73856     
_________________________________________________________________
conv2_2 (Conv2D)             (None, 75, 75, 128)       147584    
_________________________________________________________________
pool2 (MaxPooling2D)         (None, 37, 37, 128)       0         
__________

<p>The model is compiled and trained with:
    <ul>
        <li>__Loss Function__: Categorical Crossentropy</li>
        <li>__Optimizer__: Stochastic Gradient Descent</li>
        <li>__Metrics__: Accuracy</li>
        <li>__Batch Size__: 29</li>
        <li>__Epochs__: 50</li>
    </ul>
</p>
<p>The model is fine tuned with:
        <ul>
        <li>__Loss Function__: Categorical Crossentropy</li>
        <li>__Optimizer__: Stochastic Gradient Descent</li>
        <li>__Metrics__: Accuracy</li>
        <li>__Batch Size__: 29</li>
        <li>__Epochs__: 50</li>
    </ul>
</p>
<p>The code for this section can be found in the home directory under VGGFace.py. It is highly recommended that a GPU machine is used for this task.</p>

### Results
<p>The tables below show the training and validation results of leveraging the pretrained VGGFace network. The first table shows the feature extraction stage and the second table shows the fine-tuning stage. In each table, the first column shows the results for no data augmentation, the second column the results for augmentation without rotation, and the third column the results for augmentation with rotation. The top row shows accuracy and the bottom row shows loss.</p>

#### Feature Extraction
<table>
    <tr>
        <th>No Augmentation</th>
        <th>Augmentation without Rotation</th>
        <th>Augmentation with Rotation</th>
    </tr>
    <tr>
        <td><img src="pret_acc_nc.png" alt="pret_acc_nc.png" style="width: 300px;"/></td>
        <td><img src="pret_acc.png" alt="pret_acc.png" style="width: 300px;"/></td>
        <td><img src="pret_acc_r.png" alt="pret_acc_r.png" style="width: 300px;"/></td>
    </tr>
        <tr>
        <td><img src="pret_loss_nc.png" alt="pret_loss_nc.png" style="width: 300px;"/></td>
        <td><img src="pret_loss.png" alt="pret_loss.png" style="width: 300px;"/></td>
        <td><img src="pret_loss_r.png" alt="pret_loss_r.png" style="width: 300px;"/></td>
    </tr>
</table>

#### Fine Tuning
<table>
    <tr>
        <th>No Augmentation</th>
        <th>Augmentation without Rotation</th>
        <th>Augmentation with Rotation</th>
    </tr>
    <tr>
        <td><img src="pret_acc_ft_nc.png" alt="pret_acc_ft_nc.png" style="width: 300px;"/></td>
        <td><img src="pret_acc_ft.png" alt="pret_acc_ft.png" style="width: 300px;"/></td>
        <td><img src="pret_acc_ft_r.png" alt="pret_acc_ft_r.png" style="width: 300px;"/></td>
    </tr>
        <tr>
        <td><img src="pret_loss_ft_nc.png" alt="pret_loss_ft_nc.png" style="width: 300px;"/></td>
        <td><img src="pret_loss_ft.png" alt="pret_loss_ft.png" style="width: 300px;"/></td>
        <td><img src="pret_loss_ft_r.png" alt="pret_loss_ft_r.png" style="width: 300px;"/></td>
    </tr>
</table>

### Consideration
<p>Beginning with the feature extraction stage, it seems that training with an augmented dataset helps speed up the learning process. The validation accuracy for both the rotated and non-rotated datasets reaches 90% within 10 epochs, while the non-augmented dataset peaks at about 87%. However, training with augmented data also increases the likelihood of overfitting. Looking at the validation loss, we can see that the models using augmented data start overfitting after about 10 epochs.</p>
<p>In the fine-tuning stage, all models exceed a validation accuracy of 90%. However, looking at the training and validation loss graphs, we can see that the models start overfitting after only one or two epochs, while the accuracy remains between 87 and 94 percent. It seems that fine-tuning does not make significant improvements to the model. The highest validation accuracy, 95%, was achieved by training on the non-augmented data.</p>
<p>All in all, it seems that augmenting the data does not increase the accuracy in this case. Thus, since it uses the least data, we can say that the non-augmented approach is the best choice for this particular problem. This is interesting because data augmentation is common practice when training a CNN. Perhaps it is less effective for face recognition than it is for object detection.</p>

## 7. Conclusion
<p>The goals of this project were as follows:</p>
<ol>
    <li>Learn the methods involved in dealing with small datasets for deep learning</li>
    <li>Achieve an accuracy of more than 80 percent</li>
</ol>
<p>The problem we were trying to solve here is a rather frivolous one. Identifying the faces of Nogizaka46 members with a CNN has no practical use in the real world (at least at the current time). Also, when it comes to face identification, large firms such as Google and Facebook can achieve higher accuracies with their complicated networks trained on vast amounts of data. However, we were able to learn and apply many deep learning methods during the process and, with them, create a model that largely exceeds the benchmark.</p>

## 8. Citations
<p>
    <ol>
        <li>“機械学習で乃木坂46を顏分類してみた.” Aidemy Blog, 株式会社アイデミー, 4 Dec. 2018, blog.aidemy.net/entry/2017/12/17/214715.</li>
        <li>nirs_kd56. “乃木坂メンバーの顔をCNNで分類.” Qiita, 26 Sept. 2018, qiita.com/nirs_kd56/items/bc78bf2c3164a6da1ded.</li>
        <li>Google Custom Search Website. developers.google.com/custom-search/</li>
        <li>Google Custom Search JSON API. developers.google.com/custom-search/v1/overview</li>
        <li>@onlyzs. "Google Custom Search APIを使って画像収集". Qiita, 24 Nov. 2016, qiita.com/onlyzs/items/c56fb76ce43e45c12339</li>
        <li>“Face Detection Using Haar Cascades.” Cascade Classification - OpenCV 2.4.13.7 Documentation, docs.opencv.org/3.4/d7/d8b/tutorial_py_face_detection.html.</li>
        <li>Fchollet. “Fchollet/Deep-Learning-with-Python-Notebooks.” GitHub, github.com/fchollet/deep-learning-with-python-notebooks/blob/master/5.2-using-convnets-with-small-datasets.ipynb.</li>
        <li>Lian, Qianli. A Summary of Deep Models for Face Recognition. cs.wellesley.edu/~vision/slides/Qianli_summary_deep_face_models.pdf.</li>
        <li>“Face Alignment - Dlib.” Dlib C++ Library, dlib.net/face_alignment.py.html.</li>
        <li>Raghuvanshi, Arushi, and Vivek Choksi. Facial Expression Recognition with Convolutional Neural Networks. cs231n.stanford.edu/reports/2016/pdfs/023_Report.pdf.</li>
        <li>Parkhi, Omkar M., Andrea Vedaldi, and Andrew Zisserman. Deep Face Recognition. 2015. www.robots.ox.ac.uk/~vgg/publications/2015/Parkhi15/parkhi15.pdf.</li>
        <li>Fchollet. “Fchollet/Deep-Learning-with-Python-Notebooks.” GitHub, github.com/fchollet/deep-learning-with-python-notebooks/blob/master/5.3-using-a-pretrained-convnet.ipynb.</li>
    </ol>
</p>