# FaceNet
In this notebook, we are going to analyse the FaceNet paper(Schroff et al.). First let's embed the pdf and try to summarise it:

In [1]:
from IPython.display import IFrame
IFrame("https://arxiv.org/pdf/1503.03832.pdf", width=980, height=900)

## Summary:

1. The paper introduces a novel loss function, called triplet loss function, motivated specifically by the task at hand: irrespective of pose/illumination, similar faces should be closer to each other than dissimmilar faces.
2. To this end, a CNN based on Zeiler-Fergus paper is trained (with additional tweakings)
3. The output of the CNN is then L2-normalised so that the image lies in an embedded space: on a surface of a 128-dimensional hypersphere.

4. These embeddings of images are used to construct a triplet loss function, only those triplets which can improve learning are used for optimization: a= embedding of an anchor, p= embedding of a positive, n= embedding of a negative. Whenever $$\mathbb{L}_2(a-n) < \mathbb{L}_2(a-p), \qquad for \  a \  triple \  (a,p,n) \in \mathbb{\tau}$$ it is used to calculate loss:$$\mathscr{L} = \sum_i^N [\mathbb{L}_2^2(a-p)+\alpha-\mathbb{L}_2^2(a-n)]_{+}$$ 
5. The optimizer then tries to minimize the error by forcing n **away** from a and by **bringing** p closer to a at least by a margin of alpha.
6. Once the whole network along with the Euclidean embedding is optimized, a simple L2 distance threshold is used to classify faces as same or different.

                ############################ END of summary ####################################

## Ideally, what can we do about the FaceNet?

1. We can implement the NNx/NNSx architecture to minimize the embedding loss according to the triple loss and learn the hyperparameter D, the L2 distance separator of faces.
2. Using the weights and D, we can verify whether two faces are same or not.
3. We can recognize a face by k-NN
4. We can cluster similar faces using standard clustering techniques.

But the most interesting extension would be applying t-SNE on embedding to visualise the results, to verify whether similar face are really visually similar.

The following sections cover a simplistic approach by which the model could be created, trained and used.

## Create NN1

In this section we will create the NN1 architecture. The architecture here is slightly different that the one used in paper, for we implement ReLU+dropout instead of **maxout** for all the fully-connected layers.

In [1]:
from functools import reduce
import tensorflow as tf

class FaceNet(object):
    def __init__(self):
        pass
    def convoluteR(self, inputs, filters, kernel_size, strides,padding="same", activation=tf.nn.relu ):
        #conv+ReLU
        return tf.layers.conv2d(inputs=inputs, filters=filters, kernel_size=kernel_size,\
                                strides=strides, padding=padding, activation=activation)
    
    def pool(self, inputs, pool_size, strides, padding="same"):
        return tf.layers.max_pooling2d(inputs=inputs, pool_size=pool_size,\
                                       strides=strides, padding=padding)
    
    def lrn(self, inputs, radius=1,alpha=2e-05, beta=0.75, bias=1.0):
        return tf.nn.local_response_normalization(inputs, depth_radius = radius,\
                                                  alpha = alpha, beta = beta, bias = bias)
    def fc(self, inputs, units, activation=tf.nn.relu, drop=0.5, mode=True):
        inputs = tf.reshape(inputs, [-1, reduce(lambda x, y: x*y, inputs.get_shape().as_list()[1:])])
        _fc = tf.layers.dense(inputs=inputs, units=reduce(lambda x, y: x*y, units), activation=activation)
        out = tf.layers.dropout(inputs=_fc, rate=0.5, training=mode)
        return tf.reshape(out, [-1]+units)
    def create(self, inputs):
        #layer1
        conv1 = self.convoluteR(inputs=inputs, filters=64, kernel_size=[7,7], strides=2)
        pool1 = self.pool(inputs=conv1,pool_size=[3,3],strides=2)
        rnorm1 = self.lrn(inputs=pool1)
        
        #layer2
        conv2a = self.convoluteR(inputs=rnorm1, filters=64, kernel_size=[1,1], strides=1)
        conv2 = self.convoluteR(inputs=conv2a, filters=192, kernel_size=[3,3], strides=1)
        rnorm2 = self.lrn(inputs=conv2)
        pool2 = self.pool(inputs=rnorm2, pool_size=[3,3], strides=2)
        
        #layer3
        conv3a = self.convoluteR(inputs=pool2, filters=192, kernel_size=[1,1], strides=1)
        conv3 = self.convoluteR(inputs=conv3a, filters=384, kernel_size=[3,3], strides=1)
        pool3 = self.pool(inputs=conv3, pool_size=[3,3], strides=2)
        
        #layer4
        conv4a = self.convoluteR(inputs=pool3, filters=384, kernel_size=[1,1], strides=1)
        conv4 = self.convoluteR(inputs=conv4a, filters=256, kernel_size=[3,3], strides=1)
        
        #layer5
        conv5a = self.convoluteR(inputs=conv4,filters=256, kernel_size=[1,1], strides=1)
        conv5 = self.convoluteR(inputs=conv5a, filters=256, kernel_size=[3,3], strides=1)
        
        #layer6
        conv6a = self.convoluteR(inputs=conv5, filters=256, kernel_size=[1,1], strides=1)
        conv6 = self.convoluteR(inputs=conv6a, filters=256, kernel_size=[3,3], strides=1)
        pool4 = self.pool(inputs=conv6, pool_size=[3,3], strides=2)
        
        #fc
        f1 = self.fc(pool4, units=[1,32,128])
        f2 = self.fc(f1,units=[1,32,128])
        f3 = self.fc(f2,units=[1,1,128])
        
        #l2 sphere
        l2 = tf.divide(f3, tf.sqrt(tf.reduce_sum(tf.square(f3))))
        #self.embeddings = l2
        return l2
    
    def triplet_loss(self, triplet, alpha=0.2):
        anchor, positive, negative = triplet
        p_diff = tf.reduce_sum(tf.square(tf.add(anchor, tf.negative(positive))), axis=-1)
        n_diff = tf.reduce_sum(tf.square(tf.add(anchor, tf.negative(negative))), axis=-1)
        diff = tf.add(tf.add(p_diff,tf.negative(n_diff)), alpha)
        loss = tf.reduce_sum(tf.reduce_sum(tf.maximum(diff, 0.0), axis=0))
        return loss
    #not for training
    def infer(self, inputs):
        return self.create(inputs)
    def train(self,loss):
        train_op = tf.contrib.layers.optimize_loss(loss=loss,global_step=tf.contrib.framework.get_global_step(),\
                                                learning_rate=0.001,optimizer="Adam")

In [2]:
url = '/path/to/train/folder/' #contains images from datasets like LFW dataset

In [4]:
import os, random, itertools
import numpy as np
def get_triplets(url):
    faces = random.sample(os.listdir(url),2)
    pos = os.path.join(url, faces[0],'face')
    neg = os.path.join(url, faces[1],'face')
    url_triplet = [[pos+'/'+x[0], pos+'/'+x[1],neg+'/'+random.sample(os.listdir(neg),1)[0]] for x in itertools.combinations(random.sample(os.listdir(pos),6), 2)]
    url_triplet = sum(url_triplet,[])
    return pos, neg, url_triplet

In [5]:
pos, neg, triplet = get_triplets(url)

In [6]:
len(triplet)

45

In [7]:
from PIL import Image

In [8]:
def read_img(triplet):
    #triplet is a list of urls
    arr = []
    for x in triplet:
        arr.append(np.array(Image.open(x).resize((220,220))))
    return np.array(arr, dtype=np.float32)

In [9]:
arr = read_img(triplet[:15])

In [10]:
arr.shape

(15, 220, 220, 3)

In [16]:
p, q, r = tf.placeholder(tf.float32, shape=[None,220,220,3]),tf.placeholder(tf.float32, shape=[None,220,220,3]),tf.placeholder(tf.float32, shape=[None,220,220,3])

In [17]:
nn = FaceNet()

In [18]:
anchor = nn.create(tf.reshape(p,[-1,220,220,3]))
positive = nn.create(tf.reshape(q,[-1,220,220,3]))
negative = nn.create(tf.reshape(r,[-1,220,220,3]))

In [19]:
loss = nn.triplet_loss((anchor, positive, negative))

In [20]:
train_step = tf.train.AdamOptimizer(0.01).minimize(loss)

## Train

In [None]:
sess = tf.Session()
sess.run(tf.global_variables_initializer())

In [None]:
for i in range(1024):
    _,_, triplet = get_triplets(url)
    arr = read_img(triplet[:15])
    sess.run(train_step, feed_dict={p:arr[0::3], q:arr[1::3], r:arr[2::3]})
    if i%100==0:
        print("current batch loss %s" % sess.run(loss, feed_dict={p:arr[0::3], q:arr[1::3], r:arr[2::3]}))

current batch loss [[ 0.18999195]]
