<a href="https://colab.research.google.com/github/DrAlexSanz/Faces/blob/master/Face_recognition.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Welcome to the first assignment of week 4! Here you will build a face recognition system. Many of the ideas presented here are from FaceNet. In lecture, we also talked about DeepFace.

**Face recognition** problems commonly fall into two categories:

- **Face Verification** - "is this the claimed person?". For example, at some airports, you can pass through customs by letting a system scan your passport and then verifying that you (the person carrying the passport) are the correct person. A mobile phone that unlocks using your face is also using face verification. This is a 1:1 matching problem.
- **Face Recognition** - "who is this person?". For example, the video lecture showed a face recognition [video](https://www.youtube.com/watch?v=wr4rx0Spihs) of Baidu employees entering the office without needing to otherwise identify themselves. This is a 1:K matching problem.

FaceNet learns a neural network that encodes a face image into a vector of 128 numbers. By comparing two such vectors, you can then determine if two pictures are of the same person.

In this assignment, you will:

- Implement the triplet loss function
- Use a pretrained model to map face images into 128-dimensional encodings
- Use these encodings to perform face verification and face recognition

In this exercise, we will be using a pre-trained model which represents ConvNet activations using a "channels first" convention, as opposed to the "channels last" convention used in lecture and previous programming assignments. In other words, a batch of images will be of shape $(m, n_C, n_H, n_W)$ instead of $(m, n_H, n_W, n_C)$. Both of these conventions have a reasonable amount of traction among open-source implementations; there isn't a uniform standard yet within the deep learning community.
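To make the two conventions concrete, here is a minimal NumPy sketch of converting a batch between them. The batch size of 4 and the array contents are made up for illustration; only the axis permutation matters.

```python
import numpy as np

# Hypothetical batch of 4 RGB images, 96x96, stored "channels last": (m, n_H, n_W, n_C).
batch_last = np.arange(4 * 96 * 96 * 3, dtype=np.float32).reshape(4, 96, 96, 3)

# Move the channel axis to position 1 to get "channels first": (m, n_C, n_H, n_W).
batch_first = np.transpose(batch_last, (0, 3, 1, 2))

# The inverse permutation restores the original layout.
restored = np.transpose(batch_first, (0, 2, 3, 1))

print(batch_first.shape)
print(restored.shape)
```

Note that `np.transpose` returns a view with the axes permuted; the pixel values themselves are unchanged, so element `[i, c, h, w]` of the new array is element `[i, h, w, c]` of the old one.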

Let's load the required packages.

In [12]:
import numpy as np
import keras
import tensorflow as tf

import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = [18, 12]
import h5py

from PIL import Image

from keras import layers, optimizers
from keras.layers import Input, Dense, Conv2D, Activation, ZeroPadding2D, BatchNormalization, Flatten, Add
from keras.layers import AveragePooling2D, MaxPooling2D, Dropout, GlobalMaxPooling2D, GlobalAveragePooling2D

from keras.layers import Concatenate, Lambda, Layer
from keras.initializers import glorot_uniform
from keras import backend as K
K.set_image_data_format('channels_first')

from keras.models import Model, Sequential

from keras.preprocessing import image

from keras.utils import layer_utils, plot_model, to_categorical

from keras.callbacks import History, ModelCheckpoint


%matplotlib inline

print("Everything imported correctly")

Everything imported correctly


In [13]:
!rm -rf Faces

!git clone https://github.com/DrAlexSanz/Faces.git

Cloning into 'Faces'...
remote: Enumerating objects: 47, done.

In [14]:
%cd "/content/Faces"

/content/Faces


In [0]:
from fr_utils import *
from inception_blocks_v2 import *

In Face Verification, you're given two images and you have to tell if they are of the same person. The simplest way to do this is to compare the two images pixel-by-pixel. If the distance between the raw images is less than a chosen threshold, it may be the same person!

Of course, this algorithm performs really poorly, since the pixel values change dramatically due to variations in lighting, orientation of the person's face, even minor changes in head position, and so on.
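The weakness of the pixel-by-pixel approach can be seen with a small sketch. It uses random arrays as stand-ins for real photos (an assumption made purely for illustration), and `pixel_distance` is a hypothetical helper, not part of `fr_utils`.

```python
import numpy as np

def pixel_distance(img_a, img_b):
    """Euclidean (L2) distance between two images of identical shape."""
    a = np.asarray(img_a, dtype=np.float64)
    b = np.asarray(img_b, dtype=np.float64)
    return float(np.linalg.norm(a - b))

rng = np.random.default_rng(0)

# Stand-in "face": random pixel intensities in [0, 0.5), channels first.
face = rng.uniform(0.0, 0.5, size=(3, 96, 96))

# The same image under brighter lighting (every pixel shifted by 0.3).
brighter = face + 0.3

# A completely unrelated random image.
other = rng.uniform(0.0, 0.5, size=(3, 96, 96))

d_same = pixel_distance(face, brighter)
d_other = pixel_distance(face, other)

print("same image, brighter:", d_same)
print("unrelated image:     ", d_other)
```

In this toy setup, a uniform lighting change moves the image further away in pixel space than swapping in an unrelated image does, so no single threshold on the raw distance could separate the two cases.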

You'll see that rather than using the raw image, you can learn an encoding $f(img)$ so that element-wise comparisons of this encoding give more accurate judgements as to whether two pictures are of the same person.

#1 - Encoding face images into a 128-dimensional vector


##1.1 - Using a ConvNet to compute encodings
The FaceNet model takes a lot of data and a long time to train, so following common practice in applied deep learning settings, let's just load weights that someone else has already trained. The network architecture follows the Inception model from Szegedy et al. We have provided an Inception network implementation. You can look in the file inception_blocks_v2.py to see how it is implemented.

The key things you need to know are:

- This network uses 96x96 RGB images as its input. Specifically, it takes a face image (or a batch of $m$ face images) as a tensor of shape $(m, n_C, n_H, n_W) = (m, 3, 96, 96)$.
- It outputs a matrix of shape $(m, 128)$ that encodes each input face image into a 128-dimensional vector.

Run the cell below to create the model for face images.

In [16]:
FRmodel = faceRecoModel(input_shape=(3, 96, 96))




Let's see how many parameters I have now.

In [17]:
print("Total Params:", FRmodel.count_params())

Total Params: 3743280


Not bad. So now I have pictures and a pretrained Inception network that produces a 128-dimensional vector for each picture. Encodings of the same person in different situations should lie within a reasonable threshold of each other, while encodings of different people should be farther apart.

To explain it: I pass 2 pictures through the network, then compare the two outputs (a distance, a subtraction, or whatever). With this I decide whether they are the same person or not.
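The comparison step can be sketched as follows. The encodings here are random toy vectors rather than real outputs of `FRmodel`, `is_same_person` is a hypothetical helper, and the threshold of 0.7 is an assumed placeholder that would need tuning on real data.

```python
import numpy as np

def is_same_person(enc_a, enc_b, threshold=0.7):
    """Return (distance, decision) for two encodings.

    threshold=0.7 is an assumed placeholder value, not a tuned one.
    """
    dist = float(np.linalg.norm(np.asarray(enc_a) - np.asarray(enc_b)))
    return dist, dist < threshold

rng = np.random.default_rng(1)

# Toy 128-dimensional unit vector standing in for the encoding of one picture.
anchor = rng.normal(size=128)
anchor /= np.linalg.norm(anchor)

# "Same person": the anchor plus a small perturbation.
same = anchor + 0.01 * rng.normal(size=128)

# "Different person": an independent random unit vector.
different = rng.normal(size=128)
different /= np.linalg.norm(different)

d1, ok1 = is_same_person(anchor, same)
d2, ok2 = is_same_person(anchor, different)

print("same person?     ", ok1, "distance:", d1)
print("different person?", ok2, "distance:", d2)
```

The decision is just a distance compared against a threshold; all the difficulty lives in learning encodings for which such a threshold exists.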

Suppose I have a picture A in my database. Then I get another picture of the same person, A', and a picture of a different person, B. A good encoding keeps the distance between A and A' small and the distance between A and B large. Careful: this distance is measured in the 128-dimensional encoding space. A is usually called the anchor.
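This anchor idea is what the triplet loss from the FaceNet paper formalizes: for each triplet it penalizes $\max(\lVert f(A) - f(A') \rVert^2 - \lVert f(A) - f(B) \rVert^2 + \alpha, 0)$. Below is a minimal NumPy sketch of that formula, not the Keras implementation used later in the assignment. The margin `alpha=0.2` and the 2-dimensional toy encodings are assumptions chosen so the arithmetic is easy to follow.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Sum of hinged triplet losses over a batch of encodings.

    f_a, f_p, f_n: arrays of shape (m, d) for anchor (A), positive (A')
    and negative (B). alpha is the margin (assumed value).
    """
    pos_dist = np.sum((f_a - f_p) ** 2, axis=-1)  # squared distance A to A'
    neg_dist = np.sum((f_a - f_n) ** 2, axis=-1)  # squared distance A to B
    return float(np.sum(np.maximum(pos_dist - neg_dist + alpha, 0.0)))

# Two toy triplets in 2 dimensions (real encodings would have 128).
f_a = np.array([[0.0, 0.0], [1.0, 0.0]])
f_p = np.array([[0.0, 0.1], [1.0, 0.0]])
f_n = np.array([[1.0, 0.0], [1.0, 0.1]])

loss = triplet_loss(f_a, f_p, f_n)
print("triplet loss:", loss)
```

The first triplet contributes nothing because B is already much farther from A than A' is; the second contributes about 0.19 because B sits almost on top of A, violating the margin.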