# Inception Net, Face Verification and Recognition

<p style="text-align:justify"> Welcome to the third exercise of this CNN Tutorial! This exercise is similar to the first assignment of week 4 in course 4 of the Deep Learning Specialization by Professor Andrew Ng [https://www.coursera.org/learn/convolutional-neural-networks]. Students there were not asked to code Inception Net, but we will do so here, in PyTorch. Inception Net was introduced by [Szegedy et al.](https://arxiv.org/pdf/1409.4842.pdf) and won the ImageNet Large-Scale Visual Recognition Challenge 2014 ([ILSVRC14](http://www.image-net.org/challenges/LSVRC/2014/)). Our goal in this exercise is twofold: build an Inception Net, and use it for face verification and recognition. The Inception Net we are going to build has 75 layers, excluding ReLU and zero-padding layers. It is similar to the one used in the FaceNet paper by [Schroff et al.](https://arxiv.org/pdf/1503.03832.pdf).

In [1]:
# Run this cell
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
from torch.utils.data import Dataset, DataLoader

import numpy as np
import matplotlib.pyplot as plt
import h5py
import time
import glob

from sklearn import decomposition
from sklearn.preprocessing import StandardScaler
from facenet_utils import *

import matplotlib
from mpl_toolkits.mplot3d import Axes3D

%load_ext autoreload
%autoreload 2
%matplotlib inline

## Motivation for Inception Net

<div style="text-align: justify"> To improve performance, we tend to make networks "big" in terms of depth (i.e., number of levels) and width (i.e., number of units in each layer). This gives rise to two issues: (i) a larger number of parameters, which makes the network more prone to overfitting, and (ii) higher computational cost. It is noted in [[1]](#references_cell) that when two convolutional layers are chained, a uniform increase in the number of their filters increases the computation quadratically. A fundamental way to address both issues is to move from fully connected to sparsely connected structures, even inside convolutions. Inception Net does this by finding an approximately optimal local sparse construction at each level as it moves from the input layer to the output layer. The idea is that, in a layer close to the input, a large number of groups of activated neurons correspond to local regions. These can be covered by a larger number of small-scale filters, say of size 1x1 and 3x3. There may also be a few groups of activated neurons corresponding to larger regions. These can be covered by a small number of, say, 5x5 or 7x7 filters. Concatenating the outputs of the 1x1, 3x3, 5x5, etc. filters captures the optimal sparse construction at that level. This arrangement of multiple convolution units, with the number of feature maps and the filter sizes chosen according to the depth and with their outputs concatenated, forms an Inception module. As we move away from the input layer, the inception modules have a larger number of feature maps at larger scales and a smaller number at smaller scales. An Inception Net is basically a stack of these inception modules. One point to keep in mind is that, as we move deeper, many larger-scale filters have to be applied to a very large number of feature maps, which increases the computational cost. To avoid this, a 1x1 convolution is applied before the larger-scale filters to reduce the number of feature maps.
For historical reasons, occasional pooling layers are also introduced. An example of an inception module is shown in Figure 1.</div>

<img src="images/inception_module.png" style="width:450px;height:250px;">
<caption><center> <u> <font color='purple'> **Figure 1** </u><font color='purple'>  : **An Inception Module (see bottom up) [(source)](https://medium.com/initialized-capital/we-need-to-go-deeper-a-practical-guide-to-tensorflow-and-inception-50e66281804f)** </center></caption>
<br>
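
To make the saving from the 1x1 reduction concrete, here is a minimal sketch that counts learnable parameters with and without it. The channel counts (256, 32, 64) and the helper `count_params` are assumptions chosen for illustration; they are not taken from the figure.

```python
import torch.nn as nn


def count_params(module):
    """Total number of learnable parameters in a module (hypothetical helper)."""
    return sum(p.numel() for p in module.parameters())


# Hypothetical channel counts, chosen only for illustration.
in_channels, reduced_channels, out_channels = 256, 32, 64

# 5x5 convolution applied directly to all incoming feature maps.
direct = nn.Conv2d(in_channels, out_channels, kernel_size=5, padding=2)

# 1x1 reduction first, then the 5x5 convolution on fewer feature maps.
reduced = nn.Sequential(
    nn.Conv2d(in_channels, reduced_channels, kernel_size=1),
    nn.Conv2d(reduced_channels, out_channels, kernel_size=5, padding=2),
)

print("direct 5x5  :", count_params(direct))   # 256*64*25 + 64 = 409664
print("1x1 then 5x5:", count_params(reduced))  # 8224 + 51264 = 59488
```

Under these assumed sizes the reduced branch needs roughly one seventh of the parameters, and the number of multiplications per output position shrinks in the same proportion.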


## Inception_3a Module
<div style="text-align: justify"> The inception net we are going to build starts with 3 standard convolutional layers with batch normalization, ReLU activation and a couple of pooling layers. This is followed by a stack of 7 inception modules, after which come an average pooling layer and a fully connected layer. The seven inception modules are named Inception 3a, 3b, 3c, 4a, 4e, 5a and 5b. You will be asked to build Inception 3a in the next cell. The others are coded in a similar way and we have done them for you. You need to go through those cells, work out the details of each inception module for yourself (this can be done outside the tutorial session so that you can try the other assignments) and run them. 

The Inception 3a module is shown in Figure 2.</div>

<img src="images/inception3a.png" style="width:550px;height:550px;">
<caption><center> <u> <font color='purple'> **Figure 2** </u><font color='purple'>  : **Inception 3a Module (see bottom up)** </center></caption>
<br>


<div style = "text-align:justify"> Read Figure 2 bottom-up. There are 4 sections in the Inception 3a module: the 3x3, 5x5, pool and 1x1 sections. Each of them operates on the incoming input x from the previous layer/module. For example, the 3x3 section in Figure 2 has two layers of convolution + batch normalization + ReLU, the first being a 1x1 convolution for feature map reduction and the second a 3x3 convolution, as the name of the section suggests. The other sections are similar. At the end, the outputs of all the sections are concatenated along the feature map dimension (dimension 1 in PyTorch) and passed on to the next layer/module.</div> 
<br>
<div style = "text-align:justify">*conv(ic, oc, f, s, p)* in the figure denotes a convolution layer with incoming_channels = ic, outgoing_channels = oc, kernel_size = (f, f), stride = (s, s) and padding = (p, p). Here p = 0 means no padding. *MaxPool(f, s)* denotes a max pooling layer with kernel_size = (f, f) and stride = (s, s). *ZeroPad(l, r, t, b)* denotes padding the left, right, top and bottom boundaries with l, r, t and b zeros respectively. You may use [nn.ZeroPad2d](http://pytorch.org/docs/master/nn.html#zeropad2d) to code this padding. To concatenate the outputs of the different sections, you may use [torch.cat](http://pytorch.org/docs/master/torch.html?highlight=torch%20cat#torch.cat). With this information, and the experience of having completed two exercises, you should be able to complete the *Inception_3a* class in the following cell. You are expected to replace *None* on the right-hand side with your code.</div>
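
The notation above can be tried out on a toy input before tackling the exercise. The sketch below uses made-up values (8 input channels, an 8x8 input, padding of 2 and 3 zeros) that are not the Inception 3a values from Figure 2; it only shows how each piece of notation maps to a PyTorch layer and how two branches of equal spatial size are concatenated.

```python
import torch
import torch.nn as nn

# conv(ic, oc, f, s, p) with toy values ic=8, oc=4, f=3, s=1, p=1
conv = nn.Conv2d(8, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))

# MaxPool(f, s) with toy values f=3, s=2
pool = nn.MaxPool2d(kernel_size=(3, 3), stride=(2, 2))

# ZeroPad(l, r, t, b): nn.ZeroPad2d takes a tuple in (left, right, top, bottom) order
pad = nn.ZeroPad2d((2, 3, 2, 3))

x = torch.randn(1, 8, 8, 8)       # (batch, channels, height, width)

a = conv(x)                       # padding 1 keeps the 8x8 size: (1, 4, 8, 8)
b = pool(x)                       # floor((8 - 3) / 2) + 1 = 3:   (1, 8, 3, 3)
b = pad(b)                        # 3 + 2 + 3 = 8:                (1, 8, 8, 8)

# Concatenate along dimension 1, the feature map (channel) dimension.
out = torch.cat((a, b), dim=1)    # 4 + 8 = 12 channels
print(out.size())
```

The spatial sizes of all branches must agree before `torch.cat` is called along the channel dimension, which is why the pooled branch is zero-padded back to 8x8 here.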

In [3]:
class Inception_3a(nn.Module):
    
    def __init__(self):
        super().__init__()
        
        self.inception_3a_3x3_conv1 = None
        self.inception_3a_3x3_bn1 = None
        self.inception_3a_3x3_relu1 = None
        self.inception_3a_3x3_conv2 = None
        self.inception_3a_3x3_bn2 = None
        self.inception_3a_3x3_relu2 = None
        
        self.inception_3a_5x5_conv1 = None
        self.inception_3a_5x5_bn1 = None
        self.inception_3a_5x5_relu1 = None
        self.inception_3a_5x5_conv2 = None
        self.inception_3a_5x5_bn2 = None
        self.inception_3a_5x5_relu2 = None
        
        self.inception_3a_max_pool = None
        self.inception_3a_pool_conv = None
        self.inception_3a_pool_bn = None
        self.inception_3a_pool_relu = None
        self.inception_3a_pool_pad = None
    
        self.inception_3a_1x1_conv = None
        self.inception_3a_1x1_bn = None
        self.inception_3a_1x1_relu = None
        
    def forward(self, x):
        
        x_3x3 = self.inception_3a_3x3_conv1(x)
        x_3x3 = self.inception_3a_3x3_bn1(x_3x3)
        x_3x3 = self.inception_3a_3x3_relu1(x_3x3)
        x_3x3 = self.inception_3a_3x3_conv2(x_3x3)
        x_3x3 = self.inception_3a_3x3_bn2(x_3x3)
        x_3x3 = self.inception_3a_3x3_relu2(x_3x3)
        
        x_5x5 = self.inception_3a_5x5_conv1(None)
        x_5x5 = self.inception_3a_5x5_bn1(x_5x5)
        x_5x5 = self.inception_3a_5x5_relu1(x_5x5)
        x_5x5 = self.inception_3a_5x5_conv2(x_5x5)
        x_5x5 = self.inception_3a_5x5_bn2(x_5x5)
        x_5x5 = self.inception_3a_5x5_relu2(x_5x5)
        
        x_pool = self.inception_3a_max_pool(None)
        x_pool = self.inception_3a_pool_conv(x_pool)
        x_pool = self.inception_3a_pool_bn(x_pool)
        x_pool = self.inception_3a_pool_relu(x_pool)
        x_pool = self.inception_3a_pool_pad(x_pool)
        
        x_1x1 = self.inception_3a_1x1_conv(None)
        x_1x1 = self.inception_3a_1x1_bn(x_1x1)
        x_1x1 = self.inception_3a_1x1_relu(x_1x1)
        
        inception_3a = None # concat o/p from all sections; across which dimension??? use torch.cat 
    
        return inception_3a

In [17]:
# To check your code run this cell to see if you get the expected output
torch.manual_seed(23)
x = torch.randn(1, 64, 12, 12)
in_3a = Inception_3a()
op = in_3a(Variable(x))
print("Size of output:", op.size())
print("op[0, 0, 0:3, 0:3] :\n{}".format(op[0, 0, 0:3, 0:3]))

<b> Expected Output:</b>
<br>
<span style="color:green">
<br>
Size of output: torch.Size([1, 256, 12, 12])
<br>
op[0, 0, 0:3, 0:3] :
<br>
Variable containing:
<br>
 0.0000  1.7229  1.2050
 <br>
 0.9036  0.0000  0.0000
 <br>
 0.0000  0.7581  0.5902
 <br>
[torch.FloatTensor of size 3x3]</span>

### Xavier Initialization
<div style = "text-align:justify"> We will digress a bit to show how to use the *init* module of the *nn* package in PyTorch, in case you want to initialize weights with other standard methods. Suppose we do not want the default random initialization of weights (which happens automatically when a layer is instantiated) for the convolutional layers in the *Inception_3a* module. Instead we want to initialize them using Xavier's method [[3]](#references_cell). We can use 
[xavier_uniform](http://pytorch.org/docs/master/nn.html#torch-nn-init) or [xavier_normal](http://pytorch.org/docs/master/nn.html#torch-nn-init) (named *xavier_uniform_* and *xavier_normal_*, with a trailing underscore, in PyTorch 0.4 and later). Note that the weight tensor needs to be passed as input to these methods. If *conv1* is an instance of the *Conv2d* class, then its 4d weight tensor is available in conv1.weight.data. </div>
<br>
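
As a minimal sketch of the call on a single layer, the cell below initializes one stand-alone convolution and checks the result against the Xavier uniform bound, sqrt(6 / (fan_in + fan_out)). The layer sizes are toy values chosen for illustration, and the sketch assumes a PyTorch version that provides the in-place `nn.init.xavier_uniform_`.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy layer, not one of the Inception_3a layers.
conv1 = nn.Conv2d(32, 64, kernel_size=(5, 5))

# In-place Xavier (Glorot) uniform initialization of the 4d weight tensor.
nn.init.xavier_uniform_(conv1.weight)

# For a conv layer, fan_in = in_channels * f * f and fan_out = out_channels * f * f.
fan_in = 32 * 5 * 5
fan_out = 64 * 5 * 5
bound = math.sqrt(6.0 / (fan_in + fan_out))   # sqrt(6 / 2400) = 0.05

print(conv1.weight.size())                    # (out_channels, in_channels, 5, 5)
print("bound:", bound)
print("largest |weight|:", conv1.weight.abs().max().item())
```

Every weight falls inside [-bound, bound]; the bias is left at its default initialization.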
<div style = "text-align:justify"> Now, there are many convolution layers in the *Inception_3a* module. Instead of initializing the weights of each layer separately, we will write a loop that does this. For this we need the method *named_children*, which *Inception_3a* inherits from *nn.Module* (see [named_children](http://pytorch.org/docs/master/nn.html#module)). This method returns an iterator over the immediate children of the module on which it is invoked, yielding each child's name together with the child module itself. Let us use this to do our job.</div>
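
To see what *named_children* yields, here is a small sketch on a hypothetical three-layer module (`TinyBlock` and its attribute names are made up for illustration). It only prints the names and layer types; selecting the convolution layers and initializing them is left to the exercise.

```python
import torch.nn as nn


class TinyBlock(nn.Module):
    """Hypothetical module used only to illustrate named_children."""

    def __init__(self):
        super().__init__()
        self.tiny_conv = nn.Conv2d(3, 8, kernel_size=(3, 3), padding=(1, 1))
        self.tiny_bn = nn.BatchNorm2d(8)
        self.tiny_relu = nn.ReLU(inplace=True)


block = TinyBlock()

names = []
for name, child_module in block.named_children():
    # name is the attribute name used in __init__; child_module is the layer itself
    names.append(name)
    print(name, "->", type(child_module).__name__)
```

The children come back in the order in which they were assigned in `__init__`, and the names are the attribute names, which is why a test such as `'conv' in name` can pick out the convolution layers.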

**You are required to replace pass statement inside the if statement in the for loop of the following cell**

In [40]:
# Xavier initialization; fill in the body of the if statement
torch.manual_seed(23)
in_3a = Inception_3a()
for name, child_module in in_3a.named_children():
    if 'conv' in name:
        # replace pass with 1 line of code 
        pass
        

In [38]:
# You can run this cell to see how some weights have got initialized
print(in_3a.inception_3a_5x5_conv2.weight.data[0, 0])

**Expected Output:** (Your output should be similar to this, but may not be an exact replica)
 <br>
 <span style = "color:green">
 <br>
 1.00000e-02 *
 <br>
 -1.5621  1.5904 -3.3996 -4.9784  4.7155
 <br>
  9.1694 -0.2622 -0.8654 -1.5381  5.9182
  <br>
 -5.5133  1.8418 -4.7256 -0.5185  2.6104
 <br>
 -1.8518 -3.7201 -1.5192 -2.4632  2.9022
 <br>
 -0.0992 -0.6217 -1.0555 -1.0972  3.7040
 <br>
[torch.FloatTensor of size 5x5]</span>    

<div style = "text-align:justify"> Back to Inception Net: the remaining six inception modules are coded below for you. Have a look at them, and run all of these cells. </div>
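
One detail worth noticing in the modules below is the pooling branch: pooling shrinks the spatial size, and a zero-padding layer restores it so that the branch can be concatenated with the others. The following sketch replays the pooling branch of *Inception_3b* on its own, with layer settings copied from the next cell (batch normalization and ReLU are omitted, since they do not change the shape). The 12x12 input size is an assumption, matching the size used in the *Inception_3a* check above.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 256, 12, 12)   # assumed input: 256 feature maps of size 12x12

avg_pool = nn.AvgPool2d(kernel_size=(3, 3), stride=(3, 3))
pool_conv = nn.Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
pool_pad = nn.ZeroPad2d(4)        # 4 zeros on every side

pooled = avg_pool(x)              # floor((12 - 3) / 3) + 1 = 4 -> (1, 256, 4, 4)
reduced = pool_conv(pooled)       # 1x1 conv changes channels only -> (1, 64, 4, 4)
padded = pool_pad(reduced)        # 4 + 4 + 4 = 12 -> (1, 64, 12, 12)

print(pooled.size(), reduced.size(), padded.size())
```

After padding, the branch output is 12x12 again, so it can be concatenated along dimension 1 with the 3x3, 5x5 and 1x1 branches.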

In [4]:
class Inception_3b(nn.Module):
    
    def __init__(self):
        super().__init__()
        
        self.inception_3b_3x3_conv1 = nn.Conv2d(256, 96, kernel_size = (1, 1), stride = (1, 1))
        self.inception_3b_3x3_bn1 = nn.BatchNorm2d(96)
        self.inception_3b_3x3_relu1 = nn.ReLU(inplace = True)
        self.inception_3b_3x3_conv2 = nn.Conv2d(96, 128, kernel_size = (3, 3), stride = (1, 1), padding = (1, 1))
        self.inception_3b_3x3_bn2 = nn.BatchNorm2d(128)
        self.inception_3b_3x3_relu2 = nn.ReLU(inplace = True)
        
        self.inception_3b_5x5_conv1 = nn.Conv2d(256, 32, kernel_size = (1, 1), stride = (1, 1))
        self.inception_3b_5x5_bn1 = nn.BatchNorm2d(32)
        self.inception_3b_5x5_relu1 = nn.ReLU(inplace = True)
        self.inception_3b_5x5_conv2 = nn.Conv2d(32, 64, kernel_size = (5, 5), stride = (1, 1), padding = (2, 2))
        self.inception_3b_5x5_bn2 = nn.BatchNorm2d(64)
        self.inception_3b_5x5_relu2 = nn.ReLU(inplace = True)
        
        self.inception_3b_avg_pool = nn.AvgPool2d(kernel_size = (3, 3), stride = (3, 3))
        self.inception_3b_pool_conv = nn.Conv2d(256, 64, kernel_size = (1, 1), stride = (1, 1))
        self.inception_3b_pool_bn = nn.BatchNorm2d(64)
        self.inception_3b_pool_relu = nn.ReLU(inplace = True)
        self.inception_3b_pool_pad = nn.ZeroPad2d(4)
            
        self.inception_3b_1x1_conv = nn.Conv2d(256, 64, kernel_size = (1, 1), stride = (1, 1))
        self.inception_3b_1x1_bn = nn.BatchNorm2d(64)
        self.inception_3b_1x1_relu = nn.ReLU(inplace = True)
        
    def forward(self, x):
        
        x_3x3 = self.inception_3b_3x3_conv1(x)
        x_3x3 = self.inception_3b_3x3_bn1(x_3x3)
        x_3x3 = self.inception_3b_3x3_relu1(x_3x3)
        x_3x3 = self.inception_3b_3x3_conv2(x_3x3)
        x_3x3 = self.inception_3b_3x3_bn2(x_3x3)
        x_3x3 = self.inception_3b_3x3_relu2(x_3x3)
    
        x_5x5 = self.inception_3b_5x5_conv1(x)
        x_5x5 = self.inception_3b_5x5_bn1(x_5x5)
        x_5x5 = self.inception_3b_5x5_relu1(x_5x5)
        x_5x5 = self.inception_3b_5x5_conv2(x_5x5)
        x_5x5 = self.inception_3b_5x5_bn2(x_5x5)
        x_5x5 = self.inception_3b_5x5_relu2(x_5x5)
        
        x_pool = self.inception_3b_avg_pool(x)
        x_pool = self.inception_3b_pool_conv(x_pool)
        x_pool = self.inception_3b_pool_bn(x_pool)
        x_pool = self.inception_3b_pool_relu(x_pool)
        x_pool = self.inception_3b_pool_pad(x_pool)
        
        x_1x1 = self.inception_3b_1x1_conv(x)
        x_1x1 = self.inception_3b_1x1_bn(x_1x1)
        x_1x1 = self.inception_3b_1x1_relu(x_1x1)
   
        inception_3b = torch.cat((x_3x3, x_5x5, x_pool, x_1x1), dim = 1)
    
        return inception_3b

In [5]:
class Inception_3c(nn.Module):
    
    def __init__(self):
        super().__init__()
        
        self.inception_3c_3x3_conv1 = nn.Conv2d(320, 128, kernel_size = (1, 1), stride = (1, 1))
        self.inception_3c_3x3_bn1 = nn.BatchNorm2d(128)
        self.inception_3c_3x3_relu1 = nn.ReLU(inplace = True)
        self.inception_3c_3x3_conv2 = nn.Conv2d(128, 256, kernel_size = (3, 3), stride = (2, 2), padding = (1, 1))
        self.inception_3c_3x3_bn2 = nn.BatchNorm2d(256)
        self.inception_3c_3x3_relu2 = nn.ReLU(inplace = True)
        
        self.inception_3c_5x5_conv1 = nn.Conv2d(320, 32, kernel_size = (1, 1), stride = (1, 1))
        self.inception_3c_5x5_bn1 = nn.BatchNorm2d(32)
        self.inception_3c_5x5_relu1 = nn.ReLU(inplace = True)
        self.inception_3c_5x5_conv2 = nn.Conv2d(32, 64, kernel_size = (5, 5), stride = (2, 2), padding = (2, 2))
        self.inception_3c_5x5_bn2 = nn.BatchNorm2d(64)
        self.inception_3c_5x5_relu2 = nn.ReLU(inplace = True)
        
        self.inception_3c_max_pool = nn.MaxPool2d(kernel_size = (3, 3), stride = (2, 2))
        self.inception_3c_pool_pad = nn.ZeroPad2d((0, 1, 0, 1))
        
    def forward(self, x):

        x_3x3 = self.inception_3c_3x3_conv1(x)
        x_3x3 = self.inception_3c_3x3_bn1(x_3x3)
        x_3x3 = self.inception_3c_3x3_relu1(x_3x3)
        x_3x3 = self.inception_3c_3x3_conv2(x_3x3)
        x_3x3 = self.inception_3c_3x3_bn2(x_3x3)
        x_3x3 = self.inception_3c_3x3_relu2(x_3x3)
    
        x_5x5 = self.inception_3c_5x5_conv1(x)
        x_5x5 = self.inception_3c_5x5_bn1(x_5x5)
        x_5x5 = self.inception_3c_5x5_relu1(x_5x5)
        x_5x5 = self.inception_3c_5x5_conv2(x_5x5)
        x_5x5 = self.inception_3c_5x5_bn2(x_5x5)
        x_5x5 = self.inception_3c_5x5_relu2(x_5x5)
        
        x_pool = self.inception_3c_max_pool(x)
        x_pool = self.inception_3c_pool_pad(x_pool)
                
        inception_3c = torch.cat((x_3x3, x_5x5, x_pool), dim = 1)
    
        return inception_3c

In [6]:
class Inception_4a(nn.Module):
    
    def __init__(self):
        super().__init__()
        
        self.inception_4a_3x3_conv1 = nn.Conv2d(640, 96, kernel_size = (1, 1), stride = (1, 1))
        self.inception_4a_3x3_bn1 = nn.BatchNorm2d(96)
        self.inception_4a_3x3_relu1 = nn.ReLU(inplace = True)
        self.inception_4a_3x3_conv2 = nn.Conv2d(96, 192, kernel_size = (3, 3), stride = (1, 1), padding = (1, 1))
        self.inception_4a_3x3_bn2 = nn.BatchNorm2d(192)
        self.inception_4a_3x3_relu2 = nn.ReLU(inplace = True)
        
        self.inception_4a_5x5_conv1 = nn.Conv2d(640, 32, kernel_size = (1, 1), stride = (1, 1))
        self.inception_4a_5x5_bn1 = nn.BatchNorm2d(32)
        self.inception_4a_5x5_relu1 = nn.ReLU(inplace = True)
        self.inception_4a_5x5_conv2 = nn.Conv2d(32, 64, kernel_size = (5, 5), stride = (1, 1), padding = (2, 2))
        self.inception_4a_5x5_bn2 = nn.BatchNorm2d(64)
        self.inception_4a_5x5_relu2 = nn.ReLU(inplace = True)
        
        self.inception_4a_avg_pool = nn.AvgPool2d(kernel_size = (3, 3), stride = (3, 3))
        self.inception_4a_pool_conv = nn.Conv2d(640, 128, kernel_size = (1, 1), stride = (1, 1))
        self.inception_4a_pool_bn = nn.BatchNorm2d(128)
        self.inception_4a_pool_relu = nn.ReLU(inplace = True)
        self.inception_4a_pool_pad = nn.ZeroPad2d(2)
        
        self.inception_4a_1x1_conv = nn.Conv2d(640, 256, kernel_size = (1, 1), stride = (1, 1))
        self.inception_4a_1x1_bn = nn.BatchNorm2d(256)
        self.inception_4a_1x1_relu = nn.ReLU(inplace = True)
        
    def forward(self, x):

        x_3x3 = self.inception_4a_3x3_conv1(x)
        x_3x3 = self.inception_4a_3x3_bn1(x_3x3)
        x_3x3 = self.inception_4a_3x3_relu1(x_3x3)
        x_3x3 = self.inception_4a_3x3_conv2(x_3x3)
        x_3x3 = self.inception_4a_3x3_bn2(x_3x3)
        x_3x3 = self.inception_4a_3x3_relu2(x_3x3)
    
        x_5x5 = self.inception_4a_5x5_conv1(x)
        x_5x5 = self.inception_4a_5x5_bn1(x_5x5)
        x_5x5 = self.inception_4a_5x5_relu1(x_5x5)
        x_5x5 = self.inception_4a_5x5_conv2(x_5x5)
        x_5x5 = self.inception_4a_5x5_bn2(x_5x5)
        x_5x5 = self.inception_4a_5x5_relu2(x_5x5)
        
        x_pool = self.inception_4a_avg_pool(x)
        x_pool = self.inception_4a_pool_conv(x_pool)
        x_pool = self.inception_4a_pool_bn(x_pool)
        x_pool = self.inception_4a_pool_relu(x_pool)
        x_pool = self.inception_4a_pool_pad(x_pool)
        
        x_1x1 = self.inception_4a_1x1_conv(x)
        x_1x1 = self.inception_4a_1x1_bn(x_1x1)
        x_1x1 = self.inception_4a_1x1_relu(x_1x1)
       
        inception_4a = torch.cat((x_3x3, x_5x5, x_pool, x_1x1), dim = 1)
    
        return inception_4a

In [7]:
class Inception_4e(nn.Module):
    
    def __init__(self):
        super().__init__()
        
        self.inception_4e_3x3_conv1 = nn.Conv2d(640, 160, kernel_size = (1, 1), stride = (1, 1))
        self.inception_4e_3x3_bn1 = nn.BatchNorm2d(160)
        self.inception_4e_3x3_relu1 = nn.ReLU(inplace = True)
        self.inception_4e_3x3_conv2 = nn.Conv2d(160, 256, kernel_size = (3, 3), stride = (2, 2), padding = (1, 1))
        self.inception_4e_3x3_bn2 = nn.BatchNorm2d(256)
        self.inception_4e_3x3_relu2 = nn.ReLU(inplace = True)
        
        self.inception_4e_5x5_conv1 = nn.Conv2d(640, 64, kernel_size = (1, 1), stride = (1, 1))
        self.inception_4e_5x5_bn1 = nn.BatchNorm2d(64)
        self.inception_4e_5x5_relu1 = nn.ReLU(inplace = True)
        self.inception_4e_5x5_conv2 = nn.Conv2d(64, 128, kernel_size = (5, 5), stride = (2, 2), padding = (2, 2))
        self.inception_4e_5x5_bn2 = nn.BatchNorm2d(128)
        self.inception_4e_5x5_relu2 = nn.ReLU(inplace = True)
        
        self.inception_4e_max_pool = nn.MaxPool2d(kernel_size = (3, 3), stride = (2, 2))
        self.inception_4e_pool_pad = nn.ZeroPad2d((0, 1, 0, 1))        
        
        
    def forward(self, x):

        x_3x3 = self.inception_4e_3x3_conv1(x)
        x_3x3 = self.inception_4e_3x3_bn1(x_3x3)
        x_3x3 = self.inception_4e_3x3_relu1(x_3x3)
        x_3x3 = self.inception_4e_3x3_conv2(x_3x3)
        x_3x3 = self.inception_4e_3x3_bn2(x_3x3)
        x_3x3 = self.inception_4e_3x3_relu2(x_3x3)
    
        x_5x5 = self.inception_4e_5x5_conv1(x)
        x_5x5 = self.inception_4e_5x5_bn1(x_5x5)
        x_5x5 = self.inception_4e_5x5_relu1(x_5x5)
        x_5x5 = self.inception_4e_5x5_conv2(x_5x5)
        x_5x5 = self.inception_4e_5x5_bn2(x_5x5)
        x_5x5 = self.inception_4e_5x5_relu2(x_5x5)
        
        x_pool = self.inception_4e_max_pool(x)
        x_pool = self.inception_4e_pool_pad(x_pool)        
        
        inception_4e = torch.cat((x_3x3, x_5x5, x_pool), dim = 1)
    
        return inception_4e

In [8]:
class Inception_5a(nn.Module):
    
    def __init__(self):
        super().__init__()
        
        self.inception_5a_3x3_conv1 = nn.Conv2d(1024, 96, kernel_size = (1, 1), stride = (1, 1))
        self.inception_5a_3x3_bn1 = nn.BatchNorm2d(96)
        self.inception_5a_3x3_relu1 = nn.ReLU(inplace = True)
        self.inception_5a_3x3_conv2 = nn.Conv2d(96, 384, kernel_size = (3, 3), stride = (1, 1), padding = (1, 1))
        self.inception_5a_3x3_bn2 = nn.BatchNorm2d(384)
        self.inception_5a_3x3_relu2 = nn.ReLU(inplace = True)       
        
        self.inception_5a_avg_pool = nn.AvgPool2d(kernel_size = (3, 3), stride = (3, 3))
        self.inception_5a_pool_conv = nn.Conv2d(1024, 96, kernel_size = (1, 1), stride = (1, 1))
        self.inception_5a_pool_bn = nn.BatchNorm2d(96)
        self.inception_5a_pool_relu = nn.ReLU(inplace = True)
        self.inception_5a_pool_pad = nn.ZeroPad2d(1) 
        
        self.inception_5a_1x1_conv = nn.Conv2d(1024, 256, kernel_size = (1, 1), stride = (1, 1))
        self.inception_5a_1x1_bn = nn.BatchNorm2d(256)
        self.inception_5a_1x1_relu = nn.ReLU(inplace = True)
        
        
    def forward(self, x):

        x_3x3 = self.inception_5a_3x3_conv1(x)
        x_3x3 = self.inception_5a_3x3_bn1(x_3x3)
        x_3x3 = self.inception_5a_3x3_relu1(x_3x3)
        x_3x3 = self.inception_5a_3x3_conv2(x_3x3)
        x_3x3 = self.inception_5a_3x3_bn2(x_3x3)
        x_3x3 = self.inception_5a_3x3_relu2(x_3x3)       
        
        x_pool = self.inception_5a_avg_pool(x)
        x_pool = self.inception_5a_pool_conv(x_pool)
        x_pool = self.inception_5a_pool_bn(x_pool)
        x_pool = self.inception_5a_pool_relu(x_pool)
        x_pool = self.inception_5a_pool_pad(x_pool)  
        
        x_1x1 = self.inception_5a_1x1_conv(x)
        x_1x1 = self.inception_5a_1x1_bn(x_1x1)
        x_1x1 = self.inception_5a_1x1_relu(x_1x1)
        
        inception_5a = torch.cat((x_3x3, x_pool, x_1x1), dim = 1)
    
        return inception_5a

In [9]:
class Inception_5b(nn.Module):
    
    def __init__(self):
        super().__init__()
        
        self.inception_5b_3x3_conv1 = nn.Conv2d(736, 96, kernel_size = (1, 1), stride = (1, 1))
        self.inception_5b_3x3_bn1 = nn.BatchNorm2d(96)
        self.inception_5b_3x3_relu1 = nn.ReLU(inplace = True)
        self.inception_5b_3x3_conv2 = nn.Conv2d(96, 384, kernel_size = (3, 3), stride = (1, 1), padding = (1, 1))
        self.inception_5b_3x3_bn2 = nn.BatchNorm2d(384)
        self.inception_5b_3x3_relu2 = nn.ReLU(inplace = True)       
        
        self.inception_5b_max_pool = nn.MaxPool2d(kernel_size = (3, 3), stride = (2, 2))
        self.inception_5b_pool_conv = nn.Conv2d(736, 96, kernel_size = (1, 1), stride = (1, 1))
        self.inception_5b_pool_bn = nn.BatchNorm2d(96)
        self.inception_5b_pool_relu = nn.ReLU(inplace = True)
        self.inception_5b_pool_pad = nn.ZeroPad2d(1) 
        
        self.inception_5b_1x1_conv = nn.Conv2d(736, 256, kernel_size = (1, 1), stride = (1, 1))
        self.inception_5b_1x1_bn = nn.BatchNorm2d(256)
        self.inception_5b_1x1_relu = nn.ReLU(inplace = True)
        
        
    def forward(self, x):

        x_3x3 = self.inception_5b_3x3_conv1(x)
        x_3x3 = self.inception_5b_3x3_bn1(x_3x3)
        x_3x3 = self.inception_5b_3x3_relu1(x_3x3)
        x_3x3 = self.inception_5b_3x3_conv2(x_3x3)
        x_3x3 = self.inception_5b_3x3_bn2(x_3x3)
        x_3x3 = self.inception_5b_3x3_relu2(x_3x3)       
        
        x_pool = self.inception_5b_max_pool(x)
        x_pool = self.inception_5b_pool_conv(x_pool)
        x_pool = self.inception_5b_pool_bn(x_pool)
        x_pool = self.inception_5b_pool_relu(x_pool)
        x_pool = self.inception_5b_pool_pad(x_pool)  
    
        x_1x1 = self.inception_5b_1x1_conv(x)
        x_1x1 = self.inception_5b_1x1_bn(x_1x1)
        x_1x1 = self.inception_5b_1x1_relu(x_1x1)
        
        inception_5b = torch.cat((x_3x3, x_pool, x_1x1), dim = 1)
    
        return inception_5b

## Face Verification and Recognition
With all the inception modules ready, our face recognition model, called *FaceModel*, is an inception net that consists of:
- initial 3 standard convolutional layers with batch normalization, relu activation and a couple of pooling layers
- then the stack of the 7 inception modules we implemented above 
- followed by an average pooling layer 
- finally a fully connected layer

See the code cell below.
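
Before reading the code, it can help to trace how the spatial size shrinks through the stack listed above. The sketch below applies the standard output-size formula, floor((n + 2p - f) / s) + 1, using the kernel, stride and padding values from the *FaceModel* and inception module cells in this notebook. The 96x96 input size is an assumption (it is not stated in this section), and `out_size` is a hypothetical helper.

```python
def out_size(n, f, s, p=0):
    """Spatial output size of a conv/pool layer: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * p - f) // s + 1


sizes = {}
n = 96                                   # assumed input height/width
n = out_size(n, f=7, s=2, p=3)           # conv1
sizes["conv1"] = n
n = out_size(n, f=3, s=2, p=1)           # max_pool1
sizes["max_pool1"] = n
# conv2 (1x1) and conv3 (3x3, padding 1) keep the size unchanged
n = out_size(n, f=3, s=2, p=1)           # max_pool2
sizes["max_pool2"] = n
# Inception 3a and 3b keep the size unchanged
n = out_size(n, f=3, s=2, p=1)           # Inception 3c (stride-2 3x3 conv branch)
sizes["inception_3c"] = n
# Inception 4a keeps the size unchanged
n = out_size(n, f=3, s=2, p=1)           # Inception 4e (stride-2 3x3 conv branch)
sizes["inception_4e"] = n
# Inception 5a and 5b keep the size unchanged
n = out_size(n, f=3, s=1, p=0)           # final average pooling
sizes["avg_pool"] = n

for layer, size in sizes.items():
    print("{:<13s} {}x{}".format(layer, size, size))

# Channels after Inception 5b: 3x3 branch + pool branch + 1x1 branch
print("channels after 5b:", 384 + 96 + 256)
```

Under the 96x96 assumption the final average pooling leaves a 1x1 map with 736 channels, which matches the 736 inputs expected by the fully connected layer.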

In [10]:
class FaceModel(nn.Module):
    
    def __init__(self):
        super().__init__()
        torch.manual_seed(1)
        
        #First Block
        self.conv1 = nn.Conv2d(3, 64, kernel_size = (7, 7), stride = (2, 2), padding = (3, 3))
        self.bn1 = nn.BatchNorm2d(64)
        self.relu1 = nn.ReLU(inplace = True)
        self.max_pool1 = nn.MaxPool2d(kernel_size = (3, 3), stride = (2, 2), padding = (1, 1))
        
        # Second Block
        self.conv2 = nn.Conv2d(64, 64, kernel_size = (1, 1), stride = (1, 1))
        self.bn2 = nn.BatchNorm2d(64)
        self.relu2 = nn.ReLU(inplace = True)
        self.conv3 = nn.Conv2d(64, 192, kernel_size = (3, 3), stride = (1, 1), padding = (1, 1))
        self.bn3 = nn.BatchNorm2d(192)  # must match the 192 output channels of conv3
        self.relu3 = nn.ReLU(inplace = True)
        self.max_pool2 = nn.MaxPool2d(kernel_size = (3, 3), stride = (2, 2), padding = (1, 1))
        
        #Inception Block 3
        self.inception_3a = Inception_3a()
        self.inception_3b = Inception_3b()
        self.inception_3c = Inception_3c()
        
        #Inception Block 4
        self.inception_4a = Inception_4a()
        self.inception_4e = Inception_4e()
        
        #Inception Block 5
        self.inception_5a = Inception_5a()
        self.inception_5b = Inception_5b()
        
        # Top layer
        self.avg_pool = nn.AvgPool2d(kernel_size = (3, 3), stride = (1, 1))
        self.dense_layer = nn.Linear(736, 128)
    
    def forward(self, x):
        
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu1(x)
        x = self.max_pool1(x)
        
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu2(x)
        x = self.conv3(x)
        x = self.bn3(x)
        x = self.relu3(x)
        x = self.max_pool2(x)
        
        x = self.inception_3a(x)
        x = self.inception_3b(x)
        x = self.inception_3c(x)
        
        x = self.inception_4a(x)
        x = self.inception_4e(x)
        
        x = self.inception_5a(x)
        x = self.inception_5b(x)
        
        x = self.avg_pool(x)
        x = x.view(-1, 736)
        x = self.dense_layer(x)
        
        x_norm = torch.sqrt(torch.sum(x ** 2, 1) + 1e-6)
        x = torch.div(x, x_norm.view(-1, 1).expand_as(x))
        
        return x
   

<div style = "text-align:justify"> The important point to note about our *FaceModel* is the dimension of the output of the final dense layer: it is a 128-dimensional vector. This vector is $l2$ normalized, as is clear from the last 2 lines before the return statement in the *forward()* method of *FaceModel*. This vector describes a face. It has been shown in [[2]](#references_cell) that this face descriptor is very robust to 'wild' variations in faces. Of course, the authors of [[2]](#references_cell) trained their model on around 100M faces corresponding to 8M identities using triplet loss. We will not be able to replicate such training here. Instead we will directly use the parameters of a pre-trained model, obtained with help from [https://www.coursera.org/learn/convolutional-neural-networks]. So, let's load the pre-trained parameters. This will take around 3 to 4 minutes.</div>
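
The normalization step can be checked in isolation. The sketch below applies the same two lines used in *forward()* to a random batch standing in for the dense layer output (the batch of 4 random vectors is an assumption for illustration) and compares the result with PyTorch's built-in `F.normalize`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 128)   # stand-in for the dense layer output of 4 images

# Same two lines as in FaceModel.forward
x_norm = torch.sqrt(torch.sum(x ** 2, 1) + 1e-6)
x_unit = torch.div(x, x_norm.view(-1, 1).expand_as(x))

# Each row now has (almost exactly) unit Euclidean length.
print(x_unit.norm(dim=1))

# The built-in equivalent, up to the small 1e-6 stabilizer.
x_builtin = F.normalize(x, p=2, dim=1)
print(torch.allclose(x_unit, x_builtin, atol=1e-5))
```

Because every descriptor has unit length, the Euclidean distance between two descriptors always lies between 0 and 2, which is what makes a single fixed distance threshold meaningful.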

In [13]:
# Run this cell to load parameters
torch.manual_seed(23)
face_model = FaceModel()
load_weights_from_FaceNet(face_model)

<div style = "text-align:justify"> For face verification, given a pair of faces, we forward propagate each of them through the *FaceModel* to obtain two 128d vectors. Then we compute the distance between these two vectors. If the distance is smaller than a threshold, the pair is a match; otherwise it is a mismatch.</div>
<br>
<div style = "text-align:justify"> For face recognition, given an input image (which may or may not be a face), its 128d descriptor is obtained by forward propagating it through *FaceModel*. Then the distance between this descriptor and the pre-computed descriptor of every face in the database is calculated. The input is recognized as the face with the smallest distance, provided that distance is below a threshold. If the smallest distance is greater than the threshold, the input is deemed not to be present in the database.</div>
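
The recognition rule described above can be sketched as a nearest-neighbour search over the database. The function `recognize`, the toy names and the random unit vectors below are hypothetical stand-ins for real face descriptors; the sketch also assumes that descriptors are torch tensors and reuses 0.7 as the threshold, which is an assumption borrowed from the verification exercise.

```python
import torch
import torch.nn.functional as F


def recognize(encoding, database, threshold=0.7):
    """Return (name, distance) of the closest database entry,
    or (None, distance) if even the closest one is above the threshold."""
    best_name, best_dist = None, float("inf")
    for name, db_encoding in database.items():
        dist = torch.norm(encoding - db_encoding).item()
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return None, best_dist
    return best_name, best_dist


torch.manual_seed(0)

# Toy database: random unit vectors standing in for 128d face descriptors.
toy_database = {
    name: F.normalize(torch.randn(1, 128), dim=1)
    for name in ["alice", "bob", "carol"]
}

# A slightly perturbed copy of a stored descriptor, and an unrelated one.
query_known = F.normalize(toy_database["bob"] + 0.01 * torch.randn(1, 128), dim=1)
query_unknown = F.normalize(torch.randn(1, 128), dim=1)

print(recognize(query_known, toy_database))
print(recognize(query_unknown, toy_database))
```

The perturbed query stays close to its source and is recognized, while an unrelated random unit vector lies far from every entry and is reported as not present in the database.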

<div style = "text-align:justify"> Certain utility functions, such as *img_to_encoding*, which outputs the 128d face descriptor, and a set of face images that form a mini database are also borrowed from [https://www.coursera.org/learn/convolutional-neural-networks]. Below we pre-compute the face descriptors for the images in the database. *database* is a dictionary whose keys are names and whose values are 128d descriptors. The face images in the database, with their names, are shown in Figure 3.</div>

In [None]:
database = {}
database["danielle"] = img_to_encoding("images/danielle.png", face_model)
database["younes"] = img_to_encoding("images/younes.jpg", face_model)
database["tian"] = img_to_encoding("images/tian.jpg", face_model)
database["andrew"] = img_to_encoding("images/andrew.jpg", face_model)
database["kian"] = img_to_encoding("images/kian.jpg", face_model)
database["dan"] = img_to_encoding("images/dan.jpg", face_model)
database["sebastiano"] = img_to_encoding("images/sebastiano.jpg", face_model)
database["bertrand"] = img_to_encoding("images/bertrand.jpg", face_model)
database["kevin"] = img_to_encoding("images/kevin.jpg", face_model)
database["felix"] = img_to_encoding("images/felix.jpg", face_model)
database["benoit"] = img_to_encoding("images/benoit.jpg", face_model)
database["arnaud"] = img_to_encoding("images/arnaud.jpg", face_model)

<img src="images/database.png" style="width:450px;height:450px;">
<caption><center> <u> <font color='purple'> **Figure 3** </u><font color='purple'>  : **Face Database** </center></caption>
<br>

**You are required to complete the function *verify* below.**

In [11]:
# Replace None in the rhs by your code

def verify(image_path, identity, database, model):
    """
    Function that verifies whether the person in the image at "image_path" is "identity".
    
    Arguments:
    image_path -- path to an image
    identity -- string, name of the person whose identity you'd like to verify. Must be a key in the database.
    database -- Python dictionary mapping allowed people's names (strings) to their encodings (vectors).
    model -- your Inception model, i.e. a FaceModel instance
    """   
    
    # Step 1: Compute the encoding for the image. Use img_to_encoding(), which takes
    #         two parameters: the image path and the face model.
    encoding = None
    
    # Step 2: Compute the distance to the stored encoding of "identity". Use torch.norm
    dist = None   
    
    # Step 3: Declare a match if dist < 0.7, otherwise no match
    if dist < 0.7:
        match = True
        print("Face pair verified. It's matching. Distance is {}".format(dist))        
    else:
        match = False
        print("Face pair verified. It's not matching. Distance is {}".format(dist))
       
    return 

<div style = "text-align:justify"> Let's test your code on the following pairs (left, right) of images (see Figure 4). In each pair, the left image is verified against the right one, which is in the database.
<img src="images/verify_pairs.png" style="width:150px;height:400px;">
<caption><center> <u> <font color='purple'> **Figure 4** </u><font color='purple'>  : **Face pairs for verification** </center></caption>
<br>

In [64]:
# Run this cell to check if you get the expected output
verify("images/camera_5.jpg", "arnaud", database, face_model)
verify("images/camera_0.jpg", "younes", database, face_model)
verify("images/camera_2.jpg", "benoit", database, face_model)
verify("images/camera_3.jpg", "bertrand", database, face_model)
verify("images/dan.jpg", "danielle", database, face_model)

**Expected Output:**
<span style = "color:green">
<br>
Face pair verified. It's matching. Distance is 0.6057422377967915
<br>
Face pair verified. It's matching. Distance is 0.659650283722772
<br>
Face pair verified. It's matching. Distance is 0.23167025167812438
<br>
Face pair verified. It's matching. Distance is 0.35519912376121293
<br>
Face pair verified. It's not matching. Distance is 1.3087521768296115</span>

<div style = "text-align:justify"> The results show that the face descriptor copes well with variations in illumination, pose, expression etc. We also did a simple experiment to check the quality of the descriptor. We took ten images of each of five different individuals, computed their descriptors and projected them onto a three-dimensional space using PCA. The result is shown in Figure 5. </div>

<img src="images/3dprojection.png" style="width:400px;height:400px;">
<caption><center> <u> <font color='purple'> **Figure 5** </u><font color='purple'>  : **3d projection of face descriptors** </center></caption>
<br>

Even with only three dimensions, we can see some grouping of the descriptors by individual.
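A projection of this kind can be sketched as follows. This is an assumption-laden illustration rather than the code used to produce Figure 5: it computes PCA directly with NumPy's SVD (no feature scaling, no scikit-learn), and it uses synthetic descriptors, five random centres with ten noisy copies each, in place of real *FaceModel* outputs.

```python
import numpy as np

def project_to_3d(descriptors):
    """Project (n_samples, n_features) descriptors onto their top 3 principal components."""
    X = np.asarray(descriptors, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:3].T

# Synthetic stand-in for 5 individuals x 10 images of 128d descriptors
rng = np.random.default_rng(0)
centres = rng.normal(size=(5, 128))
fake = np.concatenate([c + 0.05 * rng.normal(size=(10, 128)) for c in centres])

points = project_to_3d(fake)
print(points.shape)  # (50, 3)
```

The rows of `points` could then be passed to a 3d scatter plot, coloured by individual, to obtain a figure similar in spirit to Figure 5.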

<div style = "text-align:justify">For face recognition, we need to compare the 128d descriptor of the input with the 128d descriptor of every image in the database, and then decide, based on the distances, whether the input is a face in the database and whose face it is. Since the code is similar to that for verification and the exercise is simple, we have done it for you in the function *who_is_it*.</div>

In [67]:
# Run this cell
who_is_it("images/camera_0.jpg", database, face_model)

<div style = "text-align:justify">
<span style = "color:purple">Congratulations on completing exercise 3! By now you should be familiar with basic CNNs, ResNet and Inception Net. You also know how to create convolution, pooling, batch normalization and linear layers in PyTorch, and how to initialize parameters using standard, popular methods. Further, you should now feel comfortable building and training a net, since you are familiar with loss criteria and optimizers. On top of all that, porting to GPU in PyTorch is simple. What is still missing from your repertoire is data loading and preprocessing. We will try to help you with this in the next exercise of this CNN tutorial. </span></div>

But before we go to next exercise, we will do one more experiment.

Consider the following pair of images.
<img src="images/bala.png" style="width:200px;height:150px;">
<caption><center> <u> <font color='purple'> **Figure 6** </u><font color='purple'>  : **Which is a face?** </center></caption>
<br>
The right image is definitely not a face! The left eye is in the place of the mouth and vice versa. Let's add the left image to our database and see whether *FaceModel* verifies the pair, and whether it recognizes the right image when the left image is in the database.

In [None]:
#run this cell
database["bala"] = img_to_encoding("images/bala96g.png", face_model)
verify("images/bala96.png", "bala", database, face_model)
who_is_it("images/bala96.png", database, face_model)

<div style = "text-align:justify"> You will see that it verifies the pair as matching and also recognizes the right image as the face in the left image. The distance is also quite small compared to the threshold. That's the problem with CNNs: they look only for the mere presence of parts like eyes, nose etc. but do not encode their spatial relationships. So the presence of eyes, nose etc., irrespective of where they are, can fool the conv net into believing the image is a face. A proposed solution to this problem is Hinton's **Capsule Nets**, which are only about a month old at the time of writing. This topic may be covered in the optional night session, depending on the number of participants willing to attend. Now you can go ahead to the next exercise.
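The point about spatial relationships can be illustrated with a deliberately simplified toy, which is not how *FaceModel* works internally. It assumes a one-dimensional "image", two hand-made part detectors (`eye` and `mouth`, hypothetical patterns chosen for this sketch) and a descriptor formed by global max pooling of each detector's response. Real networks retain some positional information through their intermediate layers, so treat this only as an intuition for why the presence of parts alone can dominate.

```python
def correlate_valid(signal, kernel):
    """Slide kernel over signal (no padding) and return each window's dot product."""
    k = len(kernel)
    return [sum(s * w for s, w in zip(signal[i:i + k], kernel))
            for i in range(len(signal) - k + 1)]

def describe(signal, kernels):
    """One number per kernel: its strongest response anywhere (global max pooling)."""
    return tuple(max(correlate_valid(signal, kernel)) for kernel in kernels)

# Hypothetical part patterns separated by empty background
eye, mouth, gap = [2, 0, 2], [0, 3, 0], [0, 0, 0]
face = gap + eye + gap + mouth + gap        # eye first, then mouth
scrambled = gap + mouth + gap + eye + gap   # same parts, swapped positions

kernels = [eye, mouth]
print(describe(face, kernels), describe(scrambled, kernels))  # identical descriptors
```

Because the pooled descriptor records only how strongly each part was detected, and not where, the two different inputs end up at distance zero from each other.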

<a id='references_cell'></a>
### References
1. [Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke and Andrew Rabinovich - Going Deeper with Convolutions (2015)](https://arxiv.org/pdf/1409.4842.pdf)

2. [F. Schroff, D. Kalenichenko, and J. Philbin - FaceNet: A unified embedding for face recognition and clustering (2015)](https://arxiv.org/pdf/1503.03832.pdf)

3. [Glorot, X. & Bengio, Y. - Understanding the difficulty of training deep feedforward neural networks (2010)](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.207.2059&rep=rep1&type=pdf)
