## Face Recognition 

#### What is face recognition? 

* Face verification vs. face recognition 
   * Verification - given an image and name/ID, output whether the image is the claimed person 
   * Recognition - given a database of $K$ persons and their respective names/IDs, take an input image and output the ID if the image shows any of the $K$ persons (or "not recognized" otherwise)
   
#### One Shot Learning 

* One-shot learning - recognize a person given just one image of that person's face
    * aka learn from just **one** example 
    * to make it work, we learn a "similarity" function 
        * $d(\text{img1}, \text{img2}) =$ degree of difference between images
            * If $d(\text{img1}, \text{img2}) \leq \tau \rightarrow$ "same", else "different", where $\tau$ is a chosen threshold
            * To recognize a face, do a pairwise comparison between the input image and each of the $K$ persons in the database
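
The pairwise comparison above can be sketched in a few lines of Python. This is only a minimal illustration: `recognize`, `toy_d`, and the two-number "images" are made-up names and data, and returning the closest match below the threshold is one reasonable design choice rather than something the notes prescribe.

```python
def recognize(query, database, d, tau):
    """Return the ID of the closest database entry, or None if not recognized.

    query    -- the input image (any representation that `d` accepts)
    database -- dict mapping person ID -> one reference image
    d        -- function d(img1, img2) returning a degree of difference
    tau      -- threshold; d <= tau counts as "same person"
    """
    best_id, best_dist = None, float("inf")
    for person_id, reference in database.items():
        dist = d(query, reference)
        if dist < best_dist:
            best_id, best_dist = person_id, dist
    return best_id if best_dist <= tau else None


# Toy stand-in for d (assumption): "images" are short lists of numbers and
# d is the sum of squared differences. A real system would learn d.
def toy_d(img1, img2):
    return sum((a - b) ** 2 for a, b in zip(img1, img2))


database = {"alice": [0.0, 1.0], "bob": [1.0, 0.0]}
print(recognize([0.1, 0.9], database, toy_d, tau=0.5))  # alice
print(recognize([5.0, 5.0], database, toy_d, tau=0.5))  # None
```

Because only one reference image per person is stored, adding a new person means adding one database entry; nothing has to be retrained.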
            
#### Siamese Network 

* Taigman et al., 2014. DeepFace: Closing the gap to human-level performance in face verification
* Siamese Network - running two **identical CNNs** (same architecture, same parameters) on two different inputs 
* Siamese Network explained: Suppose you have images $x^{(1)}$ and $x^{(2)}$ such that
    * $x^{(1)} \rightarrow$ ConvNet until last FC layer $\rightarrow f(x^{(1)}) =$ "encoding of $x^{(1)}$"
    * $x^{(2)} \rightarrow$ ConvNet until last FC layer $\rightarrow f(x^{(2)}) =$ "encoding of $x^{(2)}$"
    * Then, we can define $d(x^{(1)}, x^{(2)}) = \Vert {f(x^{(1)}) -  f(x^{(2)})}\Vert^2_2$ such that:
        * $\Vert {f(x^{(i)}) -  f(x^{(j)})}\Vert^2_2$ is small if $x^{(i)}$ and  $x^{(j)}$ are the same person 
        * $\Vert {f(x^{(i)}) -  f(x^{(j)})}\Vert^2_2$ is large if $x^{(i)}$ and  $x^{(j)}$ are different people 
        * Train the parameters of the ConvNet with backprop so that both of these conditions hold
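
As a rough sketch of the idea, the snippet below uses a single linear layer as a stand-in for the ConvNet. The `encode` function, the `weights` matrix, and the input vectors are all hypothetical; the point is only that both inputs pass through the same parameters before the squared distance is taken.

```python
def encode(x, weights):
    """Hypothetical stand-in for f(x): one linear layer (weights = list of rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def distance(x1, x2, weights):
    """d(x1, x2) = ||f(x1) - f(x2)||_2^2, using the SAME weights for both inputs."""
    f1, f2 = encode(x1, weights), encode(x2, weights)
    return sum((a - b) ** 2 for a, b in zip(f1, f2))


# Maps a 3-number input to a 2-number encoding and ignores the third input.
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
same_a, same_b = [1.0, 2.0, 9.0], [1.0, 2.0, -4.0]
other = [4.0, 6.0, 9.0]
print(distance(same_a, same_b, weights))  # 0.0
print(distance(same_a, other, weights))   # 25.0
```

In a trained network the weights would be learned so that the encoding ignores nuisance factors (lighting, pose) the way this toy encoder ignores the third input.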


            
#### Triplet Loss 

* Schroff et al. 2015, FaceNet: A unified embedding for face recognition and clustering 
* One way to learn parameters that give a good encoding for faces is to define a triplet loss function and apply gradient descent to it
* Suppose $A$ = anchor image, $P$ = positive (same person), $N$ = negative (different person)
    * Want $\Vert f(A) - f(P)\Vert^2 \leq \Vert f(A) - f(N)\Vert^2$ 
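        * Problem: $\Vert f(A) - f(P)\Vert^2 - \Vert f(A) - f(N)\Vert^2 \leq 0$ has a trivial solution. If the network outputs the same encoding for every image (e.g. all zeros), both distances are $0$ and the inequality holds. To rule this out, we add a margin $\alpha > 0$ such that: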
           *  $\Vert f(A) - f(P)\Vert^2 - \Vert f(A) - f(N)\Vert^2 + \alpha \leq 0 $
        * Thus, we have $\Vert f(A) - f(P)\Vert^2 + \alpha \leq \Vert f(A) - f(N)\Vert^2$
        
* Triplet loss function: Given 3 images $A, P, N$:
    * Define $L(A,P,N) = \max(\Vert f(A) - f(P)\Vert^2 - \Vert f(A) - f(N)\Vert^2 + \alpha, 0)$
    * Thus, overall cost is $J = \sum\limits_{i=1}^{m}{L(A^{(i)},P^{(i)},N^{(i)})}$
    * Training set can be, e.g., 10k pictures of 1k people; you need several pictures of at least some people so that you can form pairs $A$ and $P$ of the same person 
        * During training, if $A,P,N$ are chosen randomly, then $d(A,P) + \alpha \leq d(A,N)$ is easily satisfied (where $d(A,P) = \Vert f(A) - f(P)\Vert^2$), so gradient descent learns little from those triplets
        * So, choose triplets that are "hard" to train on s.t.:
            * $d(A,P) \approx d(A,N)$, so that the model has to work "extra hard" to push $d(A,N)$ at least a margin $\alpha$ above $d(A,P)$
        * Use gradient descent to minimize the cost as usual 
* Some companies have trained on datasets on the order of 100 million images, so it is often more practical to reuse the parameters they trained than to train from scratch 
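
The loss and cost defined above can be written out directly. The sketch below works on encodings that are assumed to have been computed already; the helper names (`sq_dist`, `is_hard`), the value of `alpha`, and the example vectors are illustrative choices, and "hard" is taken here to mean simply that the loss is still positive.

```python
def sq_dist(u, v):
    """Squared Euclidean distance ||u - v||^2 between two encodings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(f_a, f_p, f_n, alpha):
    """L(A, P, N) = max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + alpha, 0)."""
    return max(sq_dist(f_a, f_p) - sq_dist(f_a, f_n) + alpha, 0.0)


def cost(triplets, alpha):
    """J = sum of L over all (f(A), f(P), f(N)) triplets."""
    return sum(triplet_loss(a, p, n, alpha) for a, p, n in triplets)


def is_hard(f_a, f_p, f_n, alpha):
    """Assumed definition: a triplet is "hard" if it still violates the margin."""
    return triplet_loss(f_a, f_p, f_n, alpha) > 0.0


alpha = 0.2
easy = ([0.0, 0.0], [0.0, 0.5], [2.0, 0.0])  # d(A,P) = 0.25, d(A,N) = 4.0
hard = ([0.0, 0.0], [0.0, 1.0], [1.0, 0.0])  # d(A,P) = 1.0,  d(A,N) = 1.0
zero = [0.0, 0.0]

print(triplet_loss(*easy, alpha))            # 0.0, margin already satisfied
print(triplet_loss(*hard, alpha))            # 0.2, gradient descent has work to do
print(triplet_loss(zero, zero, zero, alpha)) # 0.2, the all-zeros encoding is penalized
```

The last line shows why the margin matters: with $\alpha = 0$ the all-zeros encoding would have zero loss, while with $\alpha > 0$ it does not.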
    

#### Face Verification and Binary Classification

* Take a Siamese NN and feed the two encodings $f(x^{(i)})$ and $f(x^{(j)})$ into a logistic regression unit that makes a prediction $\hat{y}$, where $\hat{y} = 1$ if they are the same person and $0$ if they are not.
    * Alternative to the triplet loss 
    * This turns face recognition into a binary classification problem!
    * Let's formulate $\hat{y}$ as follows. Say the encoding has $h$ features and $x^{(i)}$ and $x^{(j)}$ are the 2 inputs (faces). Then 
        * $\hat{y} = \sigma\left(\sum\limits_{k=1}^{h} w_k \vert f(x^{(i)})_k - f(x^{(j)})_k \vert + b\right)$
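        * Alternatively, replace the absolute difference with the chi-square formula $\frac{(f(x^{(i)})_k - f(x^{(j)})_k)^2}{f(x^{(i)})_k + f(x^{(j)})_k}$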
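    * Precompute the encodings of the faces in the database, so that at prediction time you only have to compute the encoding of the new image 
    * Train the Siamese NN on pairs of images, labeled $1$ if the pair shows the same person and $0$ if it shows different people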
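
A minimal sketch of the prediction step, assuming the encodings are already available. The weights `w`, bias `b`, and stored encodings are hand-picked for illustration rather than learned; negative weights are used so that large differences push $\hat{y}$ toward $0$.

```python
import math


def predict_same(f_i, f_j, w, b):
    """y_hat = sigmoid(sum_k w[k] * |f_i[k] - f_j[k]| + b)."""
    z = sum(w_k * abs(u - v) for w_k, u, v in zip(w, f_i, f_j)) + b
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters; in practice w and b are trained on labeled pairs.
w, b = [-4.0, -4.0], 2.0

# Encodings of database faces are computed once and stored.
precomputed = {"alice": [0.0, 1.0]}

new_encoding = [0.1, 0.9]  # only the new image has to be encoded
print(predict_same(new_encoding, precomputed["alice"], w, b) > 0.5)  # True

stranger = [1.0, 0.0]
print(predict_same(stranger, precomputed["alice"], w, b) > 0.5)      # False
```

Thresholding $\hat{y}$ at $0.5$ is an assumption of this sketch; the threshold can be tuned depending on which kind of error matters more.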

## Neural Style Transfer 

#### What is neural style transfer?

* Given a content image $C$ and a style image $S$, output a generated image $G$ that is in the style of $S$ and has the contents of $C$