# Convolutional Neural Networks
## Face Recognition

Author: Binghen Wang

Last Updated: 21 Dec, 2022

<nav>
    <b>Deep learning navigation:</b> <a href="./Deep Learning Basics.ipynb">Deep Learning Basics</a> |
    <a href="./Deep Learning Optimization.ipynb">Optimization</a>
    <br>
    <b>CNN navigation:</b> <a href="./Convolutional Neural Networks.ipynb">CNN Basics</a> |
    <a href="./Object Detection.ipynb">Object Detection</a>
</nav>

---
<nav>
    <a href="../Machine%20Learning.ipynb">Machine Learning</a> |
    <a href="../Supervised Learning/Supervised%20Learning.ipynb">Supervised Learning</a>
</nav>

---

## Content
- [Face Verification vs Face Recognition](#FVvsFR)
- [Siamese Neural Network](#SNN)
- [Triplet Loss](#TL)
- [Face Verification as Binary Classification](#FVasBC)

<a name = "FVvsFR"></a>
## Face Verification vs Face Recognition

### Face Verification
**Training**: pre-train the system using images of different individuals (each person in the training set should ideally have at least two images).<br>
**Database/Storage**: images of $K$ people and their IDs<br>
**Input**: image, name/ID<br>
**Output**: whether the image matches the claimed person

### Face Recognition
**Training**: pre-train the system using images of different individuals (each person in the training set should ideally have at least two images).<br>
**Database/Storage**: images of $K$ people and their IDs<br>
**Input**: image<br>
**Output**: the person's ID if the image matches someone in the database; 'not recognized' otherwise

### One-Shot Learning
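Both face verification and face recognition are **one-shot learning** problems: the system must learn from a single example of a person to recognize that person again. A conventional network that ends in a softmax layer does not work well here. One image per person is too little data to train a classifier, and the output layer would have to be resized and retrained every time someone is added to or removed from the database (e.g. staff turnover). A better approach is to learn a **similarity function** $\mathrm{d}$ that outputs the degree of difference between two images and to compare that value against a threshold $\tau$.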

$$
\begin{align}
\mathrm{d}(\mathrm{img1}, \mathrm{img2}) \leq \tau &\implies \text{same person} \\
\mathrm{d}(\mathrm{img1}, \mathrm{img2}) > \tau &\implies \text{different people}
\end{align}
$$
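
The decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `verify`, `toy_distance`, and the dictionary `database` are hypothetical names, and the toy distance on plain numbers stands in for a learned $\mathrm{d}$ on images.

```python
def verify(image, claimed_id, database, distance, tau):
    """Face verification: does `image` match the stored image of `claimed_id`?

    `database` maps an ID to one stored image; `distance` plays the role of d.
    """
    if claimed_id not in database:
        return False  # unknown ID, nothing to compare against
    return distance(image, database[claimed_id]) <= tau


# Toy stand-in for a learned similarity function (an assumption for this demo).
def toy_distance(a, b):
    return (a - b) ** 2


database = {"alice": 1.0, "bob": 5.0}
print(verify(1.2, "alice", database, toy_distance, tau=0.5))  # True
print(verify(1.2, "bob", database, toy_distance, tau=0.5))    # False
```

In practice $\tau$ is a tuned hyperparameter: a smaller value makes the system stricter (fewer false accepts, more false rejects).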

<a name = "SNN"></a>
## Siamese Neural Network
<blockquote>
    A <b>Siamese neural network</b> (sometimes called a twin neural network) is an artificial neural network that uses the same weights while working in tandem on two different input vectors to compute comparable output vectors.
    <div style = "text-align: right">– Siamese neural network, <a href = "https://en.wikipedia.org/wiki/Siamese_neural_network">Wikipedia</a></div>
</blockquote>
    
<div style = "text-align: center;">
    <img src="./images/Siamese Neural Network.png" style="width:70%;" >
</div>

The idea is to take a standard CNN and remove the output layer (e.g. softmax). The last flattened layer then outputs an **encoding** $f(x^{(i)})$ of an input image $x^{(i)}$. In the illustration above, the encoding is a vector of length 128. Both images pass through the same network with the same weights, and the two encodings are fed into a **similarity function** that estimates how different the images are. This difference is used in the **loss** calculation, the result of which drives backpropagation and the update of the network's weights.

Here, the similarity function is defined as the squared **Euclidean distance** (squared $L_2$ norm of the difference) between the two encodings.

$$
d(x^{(1)}, x^{(2)}) = \left\Vert f(x^{(1)}) - f(x^{(2)})\right\Vert^2_2
$$

$d(x^{(1)}, x^{(2)})$ should be small if $x^{(1)}$ and $x^{(2)}$ represent the same person and large otherwise. For the above illustration, we expect it to be large.
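
As a minimal sketch, the distance between two encodings can be computed with NumPy as follows. The function name `squared_l2_distance` and the three-dimensional toy encodings are assumptions for illustration; real encodings would come from the network $f$.

```python
import numpy as np


def squared_l2_distance(enc1, enc2):
    """Squared L2 norm of the difference between two encoding vectors."""
    diff = np.asarray(enc1, dtype=float) - np.asarray(enc2, dtype=float)
    return float(np.sum(diff ** 2))


# Toy encodings (hypothetical values, length 3 instead of 128).
f_x1 = np.array([0.0, 1.0, 2.0])
f_x2 = np.array([0.0, 1.0, 2.0])   # identical to f_x1
f_x3 = np.array([3.0, 1.0, -2.0])  # far from f_x1

print(squared_l2_distance(f_x1, f_x2))  # 0.0
print(squared_l2_distance(f_x1, f_x3))  # 25.0  (9 + 0 + 16)
```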

<a name = "TL"></a>
## Triplet Loss

To learn the parameters for the above network, we need to specify a suitable loss function. One commonly used candidate is the **triplet loss** function.

<div style = "text-align: center;">
    <img src="./images/triplet loss.png" style="width:90%;" >
</div>

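Each training example is a triplet: an anchor image $A$, a positive image $P$ of the same person, and a negative image $N$ of a different person. We want the anchor to be closer to the positive than to the negative by at least a margin $\alpha > 0$, i.e. $\left\Vert f(A) - f(P) \right\Vert^2 + \alpha \leq \left\Vert f(A) - f(N) \right\Vert^2$. Rearranging this inequality gives: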
$$
\left\Vert f(A) - f(P) \right\Vert^2 - \left\Vert f(A) - f(N) \right\Vert^2 + \alpha \leq 0
$$

The triplet loss is defined as:
$$
L(A,P,N) = \max\left(\left\Vert f(A) - f(P) \right\Vert^2 - \left\Vert f(A) - f(N) \right\Vert^2 + \alpha, 0\right)
$$

The cost function for a training set of $m$ triplets $(A,P,N)$ is:
$$
J = \sum_{i=1}^m L(A^{(i)},P^{(i)},N^{(i)})
$$
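
The loss and cost above translate almost directly into NumPy. The sketch below assumes the encodings of all $m$ triplets are already stacked into arrays of shape `(m, n_dims)`; the function names, the default margin, and the toy values are illustrative assumptions.

```python
import numpy as np


def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Per-triplet loss L(A, P, N) for encodings of shape (m, n_dims)."""
    pos_dist = np.sum((f_a - f_p) ** 2, axis=-1)  # ||f(A) - f(P)||^2
    neg_dist = np.sum((f_a - f_n) ** 2, axis=-1)  # ||f(A) - f(N)||^2
    return np.maximum(pos_dist - neg_dist + alpha, 0.0)


def triplet_cost(f_a, f_p, f_n, alpha=0.2):
    """Cost J: the sum of the per-triplet losses."""
    return float(np.sum(triplet_loss(f_a, f_p, f_n, alpha)))


# Two toy triplets with 2-dimensional encodings (hypothetical values).
f_a = np.array([[0.0, 0.0], [0.0, 0.0]])
f_p = np.array([[0.5, 0.0], [1.0, 0.0]])
f_n = np.array([[1.0, 0.0], [0.5, 0.0]])

losses = triplet_loss(f_a, f_p, f_n, alpha=0.25)
# Triplet 1: 0.25 - 1.00 + 0.25 = -0.5 -> clipped to 0 (margin satisfied)
# Triplet 2: 1.00 - 0.25 + 0.25 =  1.0 -> positive loss (margin violated)
print(losses)
print(triplet_cost(f_a, f_p, f_n, alpha=0.25))  # 1.0
```

The `max(·, 0)` means a triplet that already satisfies the margin contributes nothing to the cost, so only violating triplets produce gradients.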

<div class = "alert alert-block alert-success"><b>Note:</b> For the training of a face recognition algorithm, we need <b>multiple images</b> (at least 2) for the same person. This is important in the formation of training triplets. For the application of the algorithm, we can keep a <b>single image</b> for a person in the database. <b>Mind the distinction between training and application of a face recognition algorithm.</b></div>

<div class = "alert alert-block alert-warning"><b>Training tip:</b> To make the training more effective, when forming the training set, we want to <b>select triplets that are difficult to train on</b>. (E.g. select negatives that look similar to the anchors, select positives from different angles, select anchors under different settings.)</div>
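
One possible way to pick a difficult negative, sketched below, is to choose the image of a *different* person whose encoding is currently closest to the anchor. This is only one interpretation of "difficult" and not necessarily the selection scheme intended here; `hardest_negative_index` and the toy data are hypothetical.

```python
import numpy as np


def hardest_negative_index(f_anchor, f_candidates, candidate_ids, anchor_id):
    """Index of the candidate from another person that is closest to the anchor."""
    dists = np.sum((f_candidates - f_anchor) ** 2, axis=1)
    is_negative = np.asarray(candidate_ids) != anchor_id
    if not is_negative.any():
        raise ValueError("no candidate belongs to a different person")
    # Images of the anchor's own identity can never be negatives.
    dists = np.where(is_negative, dists, np.inf)
    return int(np.argmin(dists))


f_anchor = np.array([0.0, 0.0])
f_candidates = np.array([[0.1, 0.0], [0.5, 0.0], [3.0, 0.0]])
candidate_ids = ["a", "b", "c"]

# Candidate 0 is closest but belongs to person "a", so candidate 1 is chosen.
print(hardest_negative_index(f_anchor, f_candidates, candidate_ids, "a"))  # 1
```
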
<div class = "alert alert-block alert-warning"><b>Application tip:</b> To save time on inference, images can be <b>encoded ahead of time</b> with their <b>encodings stored in the database</b>. Therefore, when a new image is given, the encoding process is conducted only once. The encoding of the new image can then be compared with existing encodings in the database for verification/recognition. </div>
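
A minimal sketch of recognition against precomputed encodings follows. It assumes the database is held as a list of IDs plus a matrix with one stored encoding per row; `recognize`, `stored_ids`, and `stored_encodings` are illustrative names, and the encodings are toy values.

```python
import numpy as np


def recognize(new_encoding, stored_ids, stored_encodings, tau):
    """Return the ID of the closest stored encoding, or 'not recognized'."""
    # One vectorized pass over all stored encodings; the new image is
    # encoded once before this call.
    dists = np.sum((stored_encodings - new_encoding) ** 2, axis=1)
    best = int(np.argmin(dists))
    if dists[best] <= tau:
        return stored_ids[best]
    return "not recognized"


# Encodings computed ahead of time (hypothetical 2-dimensional values).
stored_ids = ["alice", "bob"]
stored_encodings = np.array([[0.0, 0.0], [4.0, 0.0]])

print(recognize(np.array([0.5, 0.0]), stored_ids, stored_encodings, tau=1.0))  # alice
print(recognize(np.array([2.0, 2.0]), stored_ids, stored_encodings, tau=1.0))  # not recognized
```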

<a name = "FVasBC"></a>
## Face Verification as Binary Classification

Instead of using the triplet loss function, an alternative is to train the face verification/recognition algorithm as a binary classification problem on **pairs of images**. The training process takes **two input images** at a time and uses a Siamese neural network to encode each image separately with the same weights. The two branches are then joined, and the network outputs a binary classification (whether the two images show the same person or not).

<div style = "text-align: center;">
    <img src="./images/face verification as binary classification.png" style="width:90%;" >
</div>
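
One common way to join the two branches, assumed here rather than read off the figure, is to feed the element-wise absolute difference of the two encodings into a single logistic unit. In the sketch below, `w` and `b` are hypothetical parameters that would be learned together with the network weights.

```python
import numpy as np


def same_person_probability(f_x1, f_x2, w, b):
    """Logistic unit on |f(x1) - f(x2)|; returns an estimate of P(same person)."""
    features = np.abs(f_x1 - f_x2)  # element-wise absolute difference
    z = np.dot(w, features) + b
    return float(1.0 / (1.0 + np.exp(-z)))


# Hand-picked toy parameters: large differences push the output toward 0.
w = np.array([-2.0, -2.0])
b = 2.0

p_same = same_person_probability(np.array([1.0, 1.0]), np.array([1.0, 1.0]), w, b)
p_diff = same_person_probability(np.array([0.0, 0.0]), np.array([3.0, 3.0]), w, b)
print(p_same > 0.5, p_diff < 0.5)  # True True
```

With labels of 1 for "same person" and 0 for "different people", such an output can be trained with the usual binary cross-entropy loss.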

An example of a training set using this approach looks like the following:

<div style = "text-align: center;">
    <img src="./images/face verification binary training set.png" style="width:30%;" >
</div>
