# Face Recognition

## 1. Problem Description

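Face recognition problems fall into two main categories:
* **Face verification**: a one-to-one comparison that predicts whether two face images belong to the same person
* **Face recognition**: a one-to-many comparison of a face against a database that predicts the person's identity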

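If you know how neural networks process images, you know that neurons at different depths compute features at different levels of abstraction. We can therefore perform face verification or face recognition by comparing the similarity of the high-level features that the network extracts from face images.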

## 2. Inception Network (V1)

We use the Inception Network as the image feature extractor, so this section covers the basics of Inception Network (V1) (reference: [Inception V1](https://arxiv.org/pdf/1503.03832.pdf)).

### a. Overview

* Scales up the network in a "Network in Network" fashion
* Uses "$1\times1$ Conv" layers for dimensionality reduction to cut the computational cost

### b. Motivation

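The basic way to improve a neural network is to increase its depth and the number of neurons per layer, but this brings two problems. First, a larger network has more parameters to train, which in supervised learning demands a training set with much broader coverage; otherwise the network overfits severely. Second, the computational cost of a large network grows sharply, and much of that computation is wasted if many of the added weights end up close to zero.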

Inception Network targets both problems, making it possible to build a larger network while keeping the computational cost under control.

### c. Details

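The basic idea of Inception Network comes from [Provable Bounds for Learning Some Deep Representations](https://arxiv.org/pdf/1310.6343.pdf): if the probability distribution of a dataset can be represented by a large, very sparse deep neural network, then the optimal network topology can be constructed layer by layer by analyzing the correlation statistics of the activations of the last layer and clustering neurons with highly correlated outputs. This resembles the "Hebbian principle" (neurons that fire together, wire together).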

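However, non-uniform sparse network structures are more complex and computationally very expensive; the computational cost is the main problem.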

#### $1\times1$ Conv

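Inception's approach here is inspired by a paper titled "On Two-Dimensional Sparse Matrix Partitioning: Models, Methods, and a Recipe", which concludes that sparse matrix multiplication runs faster in practice if the sparse matrix is first clustered into relatively dense submatrices before computing.

Inception Network therefore uses a sparse structure, which allows a larger network, while relying on "1x1 Conv" to cut the amount of computation, which improves performance.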
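
A rough multiplication count shows why a 1x1 bottleneck saves so much computation. The sketch below assumes an illustrative 28 x 28 x 192 input, 32 output filters of size 5 x 5, and a 16-channel 1x1 bottleneck (these sizes are not taken from this document), uses "same" padding, and ignores biases and additions.

```python
def conv_mults(h, w, c_in, c_out, k):
    """Multiplications for a 'same'-padded k x k convolution with an h x w x c_out output."""
    return h * w * c_out * k * k * c_in

# Direct 5x5 convolution: 28x28x192 input -> 28x28x32 output
direct = conv_mults(28, 28, 192, 32, 5)

# Bottleneck: 1x1 convolution down to 16 channels, then the 5x5 convolution
bottleneck = conv_mults(28, 28, 192, 16, 1) + conv_mults(28, 28, 16, 32, 5)

print(f"direct 5x5: {direct:,}")        # direct 5x5: 120,422,400
print(f"1x1 + 5x5:  {bottleneck:,}")    # 1x1 + 5x5:  12,443,648
print(f"reduction:  {direct / bottleneck:.1f}x")
```

With these assumed sizes the bottleneck version needs roughly a tenth of the multiplications while producing an output of the same shape.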

### d. Model

* Naive Inception Module

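  Inception uses fixed "1 x 1", "3 x 3", and "5 x 5" convolutions in parallel and concatenates their outputs to combine features at different levels. Shallow layers mainly capture low-level features, which "1 x 1" filters can cover; deep layers mainly capture high-level features, which "5 x 5" filters can cover. These filter sizes are fixed only for convenience and could be replaced by other sizes. However, this sparse module is computationally inefficient, so it cannot meet the compute budget.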

<img src="images/NaiveInceptionModule.png" style="width:380px;height:150px;">
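
Concatenating the branches along the channel axis is what makes the naive module expensive to stack. The sketch below does only the channel bookkeeping; it assumes, as in the GoogLeNet paper's naive module, a fourth 3 x 3 max-pooling branch alongside the three convolutions, stride 1 with "same" padding everywhere, and filter counts chosen only for illustration.

```python
def naive_inception_channels(c_in, n1x1, n3x3, n5x5):
    """Output depth of a naive Inception module.

    Assumes every branch keeps the input's height and width ('same' padding,
    stride 1), so the branch outputs can be concatenated along channels.
    The 3x3 max-pooling branch passes all c_in input channels through.
    """
    return n1x1 + n3x3 + n5x5 + c_in

# Stack three naive modules with the same (illustrative) filter counts
c = 192
for layer in range(3):
    c = naive_inception_channels(c, n1x1=64, n3x3=128, n5x5=32)
    print(f"after module {layer + 1}: {c} channels")
# after module 1: 416 channels
# after module 2: 640 channels
# after module 3: 864 channels
```

Because the pooling branch keeps every input channel, the depth can only grow from module to module, and each later 5 x 5 convolution runs over an ever deeper input.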

* Dimension Reduction Inception Module
  
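  To keep most of the network's connections sparse while lowering the computational cost, "1 x 1 Conv" is used for dimensionality reduction: a "1 x 1 Conv" is applied before the "3 x 3" and "5 x 5" convolutions, and the other operations follow.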
  
<img src="images/DimReductionInceptionModule.png" style="width:380px;height:150px;">
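
A 1 x 1 convolution has no spatial extent, so it reduces to the same matrix multiplication applied to every pixel's channel vector; that is why it can shrink the depth cheaply before the larger convolutions. The NumPy sketch below illustrates this on a channels-last array (the Keras code in this notebook uses `channels_first`), with random weights and a 192 → 16 reduction chosen only for illustration. The ReLU follows the GoogLeNet paper's use of rectified linear activations after the reduction layers.

```python
import numpy as np

def conv1x1(x, weights, bias):
    """1x1 convolution followed by ReLU on a channels-last feature map.

    x:       (H, W, C_in) feature map
    weights: (C_in, C_out) kernel; a 1x1 kernel is just a matrix
             applied to the channel vector at every pixel
    bias:    (C_out,)
    Returns an (H, W, C_out) map.
    """
    out = x @ weights + bias  # matmul broadcasts over the H and W axes
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((28, 28, 192))
w = rng.standard_normal((192, 16)) * 0.05
b = np.zeros(16)

y = conv1x1(x, w, b)
print(x.shape, "->", y.shape)  # (28, 28, 192) -> (28, 28, 16)
```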

* GoogLeNet

<img src="images/GoogLeNet.jpeg" style="width:380px;height:240px;">

## 3. Triplet Loss

Given an image $x$ (figure from deeplearning.ai), we denote the encoding (feature vector) that the network computes for it as $f(x)$.

<img src="images/f_x.png" style="width:380px;height:150px;">

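To train the network we use triplets of images $(A, P, N)$:

- A "Anchor image": a picture of a person's face
- P "Positive image": a picture of the same person as the "Anchor image"
- N "Negative image": a picture of a different person from the "Anchor image"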

We want the distance between $A$ and $P$ to be smaller than the distance between $A$ and $N$ by at least a margin $\alpha$:

$$\mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2 + \alpha < \mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2$$

So the network is trained to minimize the following "triplet loss":

$$\mathcal{J} = \sum^{m}_{i=1} \large[ \small \underbrace{\mid \mid f(A^{(i)}) - f(P^{(i)}) \mid \mid_2^2}_\text{(1)} - \underbrace{\mid \mid f(A^{(i)}) - f(N^{(i)}) \mid \mid_2^2}_\text{(2)} + \alpha \large ] \small_+$$

Here $[z]_+$ denotes $\max(z, 0)$.

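The loss can be checked by hand with a small NumPy sketch that follows the formula term by term. The function name `triplet_loss_np` and the toy 2-dimensional encodings are made up for illustration; the real encodings are 128-dimensional.

```python
import numpy as np

def triplet_loss_np(f_a, f_p, f_n, alpha=0.2):
    """Triplet loss summed over a batch.

    f_a, f_p, f_n: arrays of shape (m, d) with the anchor, positive and
    negative encodings of m triplets.
    """
    dist_ap = np.sum((f_a - f_p) ** 2, axis=-1)  # term (1): ||f(A) - f(P)||_2^2
    dist_an = np.sum((f_a - f_n) ** 2, axis=-1)  # term (2): ||f(A) - f(N)||_2^2
    # Apply [.]_+ to each triplet, then sum over i = 1..m
    return np.sum(np.maximum(dist_ap - dist_an + alpha, 0.0))

f_a = np.array([[0.0, 0.0], [0.0, 0.0]])
f_p = np.array([[0.3, 0.4], [0.1, 0.0]])
f_n = np.array([[0.6, 0.8], [0.4, 0.0]])
print(f"{triplet_loss_np(f_a, f_p, f_n):.2f}")  # 0.05
```

The first triplet already satisfies the margin ($0.25 + 0.2 < 1.0$) and contributes nothing; the second does not ($0.01 + 0.2 > 0.16$) and contributes $0.01 - 0.16 + 0.2 = 0.05$.
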
## 4. Recognition

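The recognition process is:
- Feed the image to be recognized into the network to extract its encoding
- Compute the distance between this encoding and the encoding of a person's face image in the database
- Use a threshold to decide whether the image matches that person in the database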
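
The steps above can be sketched with NumPy on encodings that have already been computed (step 1 is the network's job, so the sketch skips it). The helper names, the toy 3-dimensional encodings, and the names "alice" and "bob" are hypothetical; the threshold 0.7 matches the value used by `verify` and `recognition` in Section 5, though in practice it would be tuned on labeled pairs.

```python
import numpy as np

THRESHOLD = 0.7

def l2_distance(u, v):
    """Step 2: Euclidean distance between two encodings."""
    return float(np.linalg.norm(u - v))

def verify_encoding(query, reference, threshold=THRESHOLD):
    """Face verification (1:1): is the query close enough to one reference?"""
    return l2_distance(query, reference) < threshold

def identify_encoding(query, database, threshold=THRESHOLD):
    """Face recognition (1:N): closest database entry, or None if it is too far."""
    best_name, best_dist = None, float("inf")
    for name, ref in database.items():
        d = l2_distance(query, ref)
        if d < best_dist:
            best_name, best_dist = name, d
    # Step 3: apply the threshold to the best candidate only
    if best_dist < threshold:
        return best_name, best_dist
    return None, best_dist

database = {"alice": np.array([1.0, 0.0, 0.0]), "bob": np.array([0.0, 1.0, 0.0])}
query = np.array([0.9, 0.1, 0.0])

print(verify_encoding(query, database["bob"]))                     # False
print(identify_encoding(query, database)[0])                       # alice
print(identify_encoding(np.array([0.0, 0.0, 1.0]), database)[0])   # None
```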

```python
from keras.models import Model
from keras.layers import Conv2D, ZeroPadding2D, Activation, Input, concatenate
from keras.layers.normalization import BatchNormalization
from keras.layers.pooling import MaxPooling2D, AveragePooling2D
from keras.layers.core import Lambda, Flatten, Dense
from keras.initializers import glorot_uniform
from keras.engine.topology import Layer
from keras import backend as K
K.set_image_data_format('channels_first')
import os
import sys
import numpy as np
from numpy import genfromtxt
import pandas as pd
import tensorflow as tf
from fr_utils import *
from inception_blocks_v2 import *

# Let TensorFlow use at most ~33% of the GPU memory
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.333
session = tf.Session(config=config)

# Make Keras run on this session
K.set_session(session)

%matplotlib inline
%load_ext autoreload
%autoreload 2

# Print full arrays (recent NumPy versions reject threshold=np.nan)
np.set_printoptions(threshold=sys.maxsize)
```



## 5. Implementation

### a. Model

Here we build the application on a pre-trained model (https://github.com/iwantooxxoox/Keras-OpenFace, Deeplearning.ai).

```python
model = faceRecoModel(input_shape=(3, 96, 96))
model.summary()
```

```
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            (None, 3, 96, 96)    0                                            
__________________________________________________________________________________________________
zero_padding2d_24 (ZeroPadding2 (None, 3, 102, 102)  0           input_2[0][0]                    
__________________________________________________________________________________________________
conv1 (Conv2D)                  (None, 64, 48, 48)   9472        zero_padding2d_24[0][0]          
__________________________________________________________________________________________________
bn1 (BatchNormalization)        (None, 64, 48, 48)   256         conv1[0][0]                      
__________________________________________________________________________________________________
activation
```

### b. Triplet Loss

```python
# Keras loss functions must have the signature (y_true, y_pred), so the
# three encodings are passed together in y_pred and y_true is ignored.
def triplet_loss(y_true, y_pred, alpha=0.2):
    '''
    Arguments:
        y_true -- true labels; required by the Keras loss signature, unused here
        y_pred -- sequence of three tensors:
                  f_a -- encodings of the anchor images, shape (None, 128)
                  f_p -- encodings of the positive images, shape (None, 128)
                  f_n -- encodings of the negative images, shape (None, 128)
        alpha -- margin
    Returns:
        triplet_loss -- loss summed over the batch
    '''
    f_a, f_p, f_n = y_pred[0], y_pred[1], y_pred[2]

    # Squared L2 distances: term (1) and term (2) of J
    dist_ap = tf.reduce_sum(tf.square(tf.subtract(f_a, f_p)), axis=-1)
    dist_an = tf.reduce_sum(tf.square(tf.subtract(f_a, f_n)), axis=-1)

    # [(1) - (2) + alpha]_+ for each triplet, summed over the batch
    dist_margin = dist_ap - dist_an + alpha
    triplet_loss = tf.reduce_sum(tf.maximum(dist_margin, 0))

    return triplet_loss
```

```python
with tf.Session() as test:
    tf.set_random_seed(1)
    y_true = (None, None, None)
    y_pred = (tf.random_normal([3, 128], mean=6, stddev=0.1, seed=1),
              tf.random_normal([3, 128], mean=1, stddev=1, seed=1),
              tf.random_normal([3, 128], mean=3, stddev=4, seed=1))
    loss = triplet_loss(y_true, y_pred)

    print("loss = " + str(loss.eval()))
```

```
loss = 528.1427
```


### c. Load the Weights

```python
# Compile the model with the triplet loss, then load the pre-trained FaceNet weights
model.compile(optimizer='adam', loss=triplet_loss, metrics=['accuracy'])
load_weights_from_FaceNet(model)
```

### d. Build a Simple Database

```python
# Map each allowed person's name to the encoding of one reference photo
database = {}
database["danielle"] = img_to_encoding("images/danielle.png", model)
database["younes"] = img_to_encoding("images/younes.jpg", model)
database["tian"] = img_to_encoding("images/tian.jpg", model)
database["andrew"] = img_to_encoding("images/andrew.jpg", model)
database["kian"] = img_to_encoding("images/kian.jpg", model)
database["dan"] = img_to_encoding("images/dan.jpg", model)
database["sebastiano"] = img_to_encoding("images/sebastiano.jpg", model)
database["bertrand"] = img_to_encoding("images/bertrand.jpg", model)
database["kevin"] = img_to_encoding("images/kevin.jpg", model)
database["felix"] = img_to_encoding("images/felix.jpg", model)
database["benoit"] = img_to_encoding("images/benoit.jpg", model)
database["arnaud"] = img_to_encoding("images/arnaud.jpg", model)
```

### e. Face Verification

```python
def verify(image_path, target_name, database, model):
    """
    Verify whether the person in the image at image_path is target_name.

    Arguments:
    image_path -- path to an image
    target_name -- string, name of the person whose identity you want to verify
    database -- python dictionary mapping the names of allowed people (strings) to their encodings (vectors)
    model -- your Inception model instance in Keras

    Returns:
    dist -- distance between the encoding of image_path and the encoding of target_name in the database
    match -- True or False
    """
    # Step 1: encode the query image
    f_t = img_to_encoding(image_path, model)

    # Step 2: L2 distance to the stored encoding of target_name
    dist = np.linalg.norm(f_t - database[target_name])

    # Step 3: open the door if dist < 0.7, otherwise refuse
    if dist < 0.7:
        print("It's " + str(target_name) + ", welcome home!")
        match = True
    else:
        print("It's not " + str(target_name) + ", please go away")
        match = False

    return dist, match
```

```python
verify("images/camera_0.jpg", "younes", database, model)
```

```
It's younes, welcome home!

(0.6710067, True)
```

### f. Face Recognition

```python
def recognition(image_path, database, model):
    """
    Recognize the person in an image by finding the closest encoding in the database.

    Arguments:
        image_path -- path to an image
        database -- python dictionary mapping the names of allowed people (strings) to their encodings (vectors)
        model -- your Inception model instance in Keras

    Returns:
        min_dist -- smallest distance between the encoding of image_path and the encodings in the database
        identity -- name of the closest match (returned even when min_dist exceeds the threshold)
    """
    f_t = img_to_encoding(image_path, model)

    # Find the database entry closest to the query encoding
    min_dist = float('inf')
    identity = ''
    for name, encoding in database.items():
        dist = np.linalg.norm(f_t - encoding)
        if dist < min_dist:
            identity = name
            min_dist = dist

    # Reject the match if even the closest encoding is too far away
    if min_dist > 0.7:
        print("No Match")
    else:
        print("it's " + str(identity) + ", the distance is " + str(min_dist))

    return min_dist, identity
```

```python
recognition("images/camera_0.jpg", database, model)
```

```
it's younes, the distance is 0.6710067

(0.6710067, 'younes')
```
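
For a larger database, the Python loop in `recognition` can be replaced by a single broadcasted NumPy operation. The sketch below is an alternative, not the notebook's code: `recognition_vectorized` is a hypothetical name, it takes an already computed query encoding (for example the output of `img_to_encoding`) instead of an image path, and unlike `recognition` it returns `None` as the identity when the closest distance exceeds the threshold. It flattens every encoding, so `(1, d)` and `(d,)` shapes both work.

```python
import numpy as np

def recognition_vectorized(f_t, database, threshold=0.7):
    """1:N lookup over a {name: encoding} dict with one NumPy operation.

    Returns (min_dist, identity); identity is None if min_dist > threshold.
    """
    names = list(database)
    # Stack the stored encodings into an (N, d) matrix
    encodings = np.stack([np.ravel(database[n]) for n in names])
    # Broadcasting subtracts the query from every row; norm over axis 1 gives N distances
    dists = np.linalg.norm(encodings - np.ravel(f_t), axis=1)
    best = int(np.argmin(dists))
    min_dist = float(dists[best])
    identity = names[best] if min_dist <= threshold else None
    return min_dist, identity

toy_db = {"a": np.array([[1.0, 0.0]]), "b": np.array([[0.0, 1.0]])}
print(recognition_vectorized(np.array([[0.9, 0.1]]), toy_db)[1])    # a
print(recognition_vectorized(np.array([[-1.0, -1.0]]), toy_db)[1])  # None
```

If the database does not change between queries, the stacked matrix could be built once and reused across lookups.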