<br>
<h2 style = "font-size:40px; font-family:Garamond ; font-weight : normal; background-color: #007580; color :#fed049   ; text-align: center; border-radius: 5px 5px; padding: 5px"> Celebrity Face Recognition using VGGFace Model </h2>
<br>

In [None]:
from google.colab import drive
drive.mount('/content/drive')

[![Celebrity.png](https://i.postimg.cc/5yp4715n/Celebrity.png)](https://postimg.cc/xNJV8w4z)

<a id = '0'></a>
<h2 style = "font-family: garamond; font-size: 40px; font-style: normal; letter-spcaing: 3px; background-color: #007580; color : #fed049; border-radius: 5px 5px; text-align:center; font-weight: bold" >Table of Contents</h2>

1. [Overview](#1.0)
2. [Import the necessary libraries](#2.0)
3. [Data Collection](#3.0)
4. [Feature Engineering](#4.0)
	- [VGG Face model](#4.1)
	- [Generate embeddings for each image in the dataset](#4.2)
	- [Plot images and get distance between the pairs](#4.3)
	- [Create train and test sets](#4.4)
	- [Reduce dimensions using PCA](#4.5)
5. [Model Building and Validation](#5.0)
    - [Build a Machine Learning Classifier](#5.1)
    - [Validate Celebrity Images](#5.2)
6. [Conclusion](#6.0)

<a id = '1.0'></a>
<h2 style = "font-size:35px; font-family:Garamond ; font-weight : normal; background-color: #007580; color :#fed049   ; text-align: center; border-radius: 5px 5px; padding: 5px"> 1. Overview </h2>

### Project Description:

The goal of this hands-on project is to build a face identification model that recognizes a person from an image of their face.

### Data Description:

**Aligned Face Dataset from Pinterest**

This dataset contains 17534 images of 100 people. All images were taken from Pinterest and aligned using the dlib library.

### Objective:

In this problem, we use a model pre-trained for face recognition to recognize similar faces. We are particularly interested in deciding whether two given faces belong to the same person.

<a id = '2.0'></a>
<h2 style = "font-size:35px; font-family:Garamond ; font-weight : normal; background-color: #007580; color :#fed049   ; text-align: center; border-radius: 5px 5px; padding: 5px"> 2. Import the necessary libraries </h2>

In [None]:
import h5py
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# used to suppress display of warnings
import warnings

from sklearn.metrics import precision_recall_curve,accuracy_score,f1_score,precision_score,recall_score

#### Setting Options

In [None]:
# suppress display of warnings
warnings.filterwarnings('ignore')

<a id = '3.0'></a>
<h2 style = "font-size:35px; font-family:Garamond ; font-weight : normal; background-color: #007580; color :#fed049   ; text-align: center; border-radius: 5px 5px; padding: 5px"> 3. Data Collection </h2>

In [None]:
import os
source_dir = '/content/drive/MyDrive/105_classes_pins_dataset'

<p style = "font-size:20px; color: #007580 "><strong> Function to load images </strong></p>
- Define a function to collect the image files from the extracted folder and map each image to a person id


In [None]:
class IdentityMetadata():
    def __init__(self, base, name, file):
        self.base = base
        # identity name
        self.name = name
        # image file name
        self.file = file

    def __repr__(self):
        return self.image_path()

    def image_path(self):
        return os.path.join(self.base, self.name, self.file)

def load_metadata(path):
    metadata = []
    for i in os.listdir(path):
        for f in os.listdir(os.path.join(path, i)):
            # Check file extension. Allow only '.jpg' and '.jpeg' files.
            ext = os.path.splitext(f)[1]
            if ext == '.jpg' or ext == '.jpeg':
                metadata.append(IdentityMetadata(path, i, f))
    return np.array(metadata)

# metadata = load_metadata('images')
metadata = load_metadata(source_dir)

In [None]:
print('metadata shape :', metadata.shape)

<p style = "font-size:20px; color: #007580 "><strong> Define a function to load an image </strong></p>
- Define a function to load an image from a path stored in the metadata

In [None]:
import cv2
def load_image(path):
    img = cv2.imread(path, 1)
    # OpenCV loads images with color channels
    # in BGR order. So we need to reverse them
    return img[...,::-1]
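
The channel reversal in `load_image` can be checked on a tiny synthetic array. This is only an illustrative sketch; the single pixel in `bgr` is made up and does not come from the dataset.

```python
import numpy as np

# A hypothetical 1x1 "image" holding one pixel in OpenCV's BGR order.
bgr = np.array([[[255, 128, 0]]], dtype=np.uint8)  # B=255, G=128, R=0

# Reversing the last axis swaps the channel order from BGR to RGB.
rgb = bgr[..., ::-1]

print(rgb[0, 0])  # channels are now ordered R, G, B
```

The height and width axes are untouched; only the order of the three colour channels changes, which is what matplotlib expects when displaying the image.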

<p style = "font-size:20px; color: #007580 "><strong> Load a sample image</strong></p>
- Load one image using the function "load_image"

In [None]:
load_image('/content/drive/MyDrive/105_classes_pins_dataset/pins_Emilia Clarke/Emilia Clarke247_998.jpg')

<a id = '4.0'></a>
<h2 style = "font-size:35px; font-family:Garamond ; font-weight : normal; background-color: #007580; color :#fed049   ; text-align: center; border-radius: 5px 5px; padding: 5px"> 4. Feature Engineering </h2>

<a id = '4.1'></a>
<p style = "font-size:20px; color: #007580 "><strong> 4.1 VGG Face model </strong></p>
- The predefined architecture of the VGG Face model is given below

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import ZeroPadding2D, Convolution2D, MaxPooling2D, Dropout, Flatten, Activation

def vgg_face():
    model = Sequential()
    model.add(ZeroPadding2D((1,1),input_shape=(224,224, 3)))
    model.add(Convolution2D(64, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(256, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1,1)))
    model.add(Convolution2D(512, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2,2), strides=(2,2)))

    model.add(Convolution2D(4096, (7, 7), activation='relu'))
    model.add(Dropout(0.5))
    model.add(Convolution2D(4096, (1, 1), activation='relu'))
    model.add(Dropout(0.5))
    model.add(Convolution2D(2622, (1, 1)))
    model.add(Flatten())
    model.add(Activation('softmax'))
    return model

<p style = "font-size:20px; color: #007580 "><strong> Load the model </strong></p>

- Load the model defined above
- Then load the given weight file named "vgg_face_weights.h5"

In [None]:
model = vgg_face()

model.load_weights('../input/vgg-face-weights/vgg_face_weights.h5')

<p style = "font-size:20px; color: #007580 "><strong> Get vgg_face_descriptor </strong></p>

In [None]:
model.layers[0], model.layers[-2]

In [None]:
from tensorflow.keras.models import Model
vgg_face_descriptor = Model(inputs=model.layers[0].input, outputs=model.layers[-2].output)

In [None]:
type(vgg_face_descriptor)

In [None]:
vgg_face_descriptor.inputs, vgg_face_descriptor.outputs

<a id = '4.2'></a>
<p style = "font-size:20px; color: #007580 "><strong> 4.2 Generate embeddings for each image in the dataset </strong></p>

- Given below is an example to load the first image in the metadata and get its embedding vector from the pre-trained model.

In [None]:
# Get embedding vector for first image in the metadata using the pre-trained model
img_path = metadata[0].image_path()
img = load_image(img_path)

# Normalising pixel values from [0-255] to [0-1]: scale RGB values to interval [0,1]
img = (img / 255.).astype(np.float32)
img = cv2.resize(img, dsize = (224,224))
print(img.shape)

# Obtain embedding vector for an image
# Get the embedding vector for the above image using vgg_face_descriptor model and print the shape
embedding_vector = vgg_face_descriptor.predict(np.expand_dims(img, axis=0))[0]
print(embedding_vector.shape)

In [None]:
embedding_vector[0], type(embedding_vector), type(embedding_vector[0])

In [None]:
embedding_vector[2], embedding_vector[98], embedding_vector[-2]

<p style = "font-size:20px; color: #007580 "><strong> Generate embeddings for all images </strong></p>

- Write code to iterate through metadata and create an embedding for each image using `vgg_face_descriptor.predict()`, storing the results in an array named `embeddings`

- If there is an error in reading any image in the dataset, fill the embedding vector of that image with 2622 zeros, since the final embedding from the model has length 2622.

In [None]:
total_images = len(metadata)

print('total_images :', total_images)

In [None]:
embeddings = np.zeros((metadata.shape[0], 2622))
for i, m in enumerate(metadata):
    try:
        img = load_image(m.image_path())
        img = (img / 255.).astype(np.float32)
        img = cv2.resize(img, dsize = (224,224))
        embeddings[i] = vgg_face_descriptor.predict(np.expand_dims(img, axis=0))[0]
    except Exception as e:
        # unreadable image: the row keeps its all-zero embedding
        print(f'Could not embed {m.image_path()}: {e}')
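
Calling `predict()` once per image can be slow on a dataset of this size. One possible alternative is to embed the images in batches. The sketch below is an assumption-laden illustration, not the notebook's method: `embed_in_batches` and `fake_embed` are hypothetical helpers, and `fake_embed` merely stands in for a real model call such as `vgg_face_descriptor.predict`.

```python
import numpy as np

def embed_in_batches(images, embed_fn, batch_size=32):
    """Embed `images` of shape (N, H, W, C) in chunks using `embed_fn`.

    `embed_fn` is a hypothetical callable mapping a batch of images to a
    (batch, D) array, for example a Keras model's predict method.
    """
    chunks = []
    for start in range(0, len(images), batch_size):
        batch = images[start:start + batch_size]
        chunks.append(np.asarray(embed_fn(batch)))
    return np.concatenate(chunks, axis=0)

# Stand-in for the real model: averages each image into a 3-value vector.
def fake_embed(batch):
    return batch.mean(axis=(1, 2))

images = np.random.rand(70, 8, 8, 3).astype(np.float32)  # synthetic images
embeddings_demo = embed_in_batches(images, fake_embed, batch_size=32)
print(embeddings_demo.shape)
```

This variant assumes the preprocessed images already sit in one array, which may not fit in memory for the full dataset; loading and embedding one chunk of files at a time would follow the same pattern.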

In [None]:
print('embeddings shape :', embeddings.shape)

In [None]:
embeddings[0], embeddings[988], embeddings[988].shape

In [None]:
embeddings[8275]

<p style = "font-size:20px; color: #007580 "><strong> Function to calculate the distance between a given pair of images </strong></p>

- Use "Squared L2 distance" as the distance metric
- Squared L2 distance between two points (x1, y1) and (x2, y2) = (x1-x2)^2 + (y1-y2)^2

In [None]:
def distance(emb1, emb2):
    return np.sum(np.square(emb1 - emb2))
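
To connect the distance to the stated objective (deciding whether two faces belong to the same person), one common approach is to compare the distance against a cut-off. The sketch below is hedged: `squared_l2` mirrors `distance` above, while `same_person` and its `threshold` argument are hypothetical and the threshold values are placeholders that would need tuning on labelled pairs.

```python
import numpy as np

def squared_l2(emb1, emb2):
    # Same metric as `distance` above: sum of squared component differences.
    return float(np.sum(np.square(emb1 - emb2)))

def same_person(emb1, emb2, threshold):
    # `threshold` is a hypothetical cut-off, not a value from this notebook.
    return squared_l2(emb1, emb2) < threshold

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
print(squared_l2(a, b))  # (1-4)^2 + (2-6)^2 = 25.0
print(same_person(a, b, threshold=30.0))
```

With real embeddings the vectors have 2622 components instead of 2, but the computation is identical.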

<a id = '4.3'></a>
<p style = "font-size:20px; color: #007580 "><strong> 4.3 Plot images and get distance between the pairs </strong></p>

- Compare the pairs (900, 901) and (900, 1001)
- Compare the pairs (1100, 1101) and (1100, 1300)
- Compare the pairs (1407, 1408) and (1408, 1602)

In [None]:
def show_pair(idx1, idx2):
    plt.figure(figsize=(8,3))
    plt.suptitle(f'Distance between {idx1} & {idx2}= {distance(embeddings[idx1], embeddings[idx2]):.2f}')
    plt.subplot(121)
    plt.imshow(load_image(metadata[idx1].image_path()))
    plt.subplot(122)
    plt.imshow(load_image(metadata[idx2].image_path()));

show_pair(900, 901)
show_pair(900, 1001)

In [None]:
show_pair(1100, 1101)
show_pair(1100, 1300)

In [None]:
show_pair(1407, 1408)
show_pair(1408, 1602)

<a id = '4.4'></a>
<p style = "font-size:20px; color: #007580 "><strong> 4.4 Create train and test sets </strong></p>
- Create X_train, X_test and y_train, y_test
- Use train_idx to separate out training features and labels
- Use test_idx to separate out testing features and labels

In [None]:
train_idx = np.arange(metadata.shape[0]) % 9 != 0     #every 9th example goes in test data and rest go in train data
test_idx = np.arange(metadata.shape[0]) % 9 == 0

# training embeddings: 8 out of every 9 examples
X_train = embeddings[train_idx]

# test embeddings: every 9th example
X_test = embeddings[test_idx]
targets = np.array([m.name for m in metadata])

#train labels
y_train = targets[train_idx]

#test labels
y_test = targets[test_idx]
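
The effect of the modulo-based masks is easier to see on a small index range. This is a toy sketch; `n = 20` is an arbitrary made-up size, and `is_test` / `is_train` play the roles of `test_idx` / `train_idx`.

```python
import numpy as np

n = 20  # hypothetical number of samples
idx = np.arange(n)

is_test = idx % 9 == 0   # True at positions 0, 9, 18
is_train = ~is_test      # True everywhere else

print(idx[is_test])
print(is_train.sum(), is_test.sum())
```

Because the metadata is built folder by folder, any identity with at least nine consecutive images gets at least one image in the test set under this scheme. The split is deterministic, so rerunning the notebook yields the same partition as long as the file listing order does not change.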

In [None]:
print('X_train shape : ({0},{1})'.format(X_train.shape[0], X_train.shape[1]))
print('y_train shape : ({0},)'.format(y_train.shape[0]))
print('X_test shape : ({0},{1})'.format(X_test.shape[0], X_test.shape[1]))
print('y_test shape : ({0},)'.format(y_test.shape[0]))

In [None]:
y_test[0], y_train[988]

In [None]:
len(np.unique(y_test)), len(np.unique(y_train))

<p style = "font-size:20px; color: #007580 "><strong> Encode the Labels </strong></p>
- Encode the targets
- Use LabelEncoder

In [None]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y_train_encoded = le.fit_transform(y_train)

In [None]:
print(le.classes_)
y_test_encoded = le.transform(y_test)

In [None]:
print('y_train_encoded : ', y_train_encoded)
print('y_test_encoded : ', y_test_encoded)
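
`LabelEncoder` assigns integer codes to the sorted distinct labels. The same mapping can be illustrated with plain NumPy; this is only a sketch for intuition, and the label strings below are made up.

```python
import numpy as np

names = np.array(["pins_b", "pins_a", "pins_b", "pins_c"])  # made-up labels

# np.unique returns the sorted distinct labels and, with return_inverse=True,
# the integer code of each original entry.
classes, codes = np.unique(names, return_inverse=True)

print(classes)         # sorted distinct labels
print(codes)           # integer code per sample
print(classes[codes])  # indexing by code recovers the original names
```

Indexing `classes` by the codes corresponds to what `le.inverse_transform` does when mapping predictions back to folder names.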

<p style = "font-size:20px; color: #007580 "><strong> Standardize the feature values </strong></p>
- Scale the features using StandardScaler

In [None]:
# Standardize features
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)

In [None]:
X_test_std = scaler.transform(X_test)
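
The scaler is fitted on the training features only, and the same statistics are then applied to the test features. A minimal NumPy sketch of that idea on synthetic data follows; the array names and sizes are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
train_feats = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # synthetic
holdout_feats = rng.normal(loc=5.0, scale=2.0, size=(20, 3))  # synthetic

# Statistics come from the training data only.
mean = train_feats.mean(axis=0)
std = train_feats.std(axis=0)
std[std == 0] = 1.0  # guard against constant columns

train_scaled = (train_feats - mean) / std
holdout_scaled = (holdout_feats - mean) / std  # reuses training mean and std

print(train_scaled.mean(axis=0).round(6), train_scaled.std(axis=0).round(6))
```

Reusing the training statistics keeps information about the test set out of the preprocessing step, so the held-out data is treated the same way unseen images would be.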

<a id = '4.5'></a>
<p style = "font-size:20px; color: #007580 "><strong> 4.5 Reduce dimensions using PCA </strong></p>
- Reduce feature dimensions using Principal Component Analysis
- Set the parameter n_components=128

In [None]:
print('X_train_std shape : ({0},{1})'.format(X_train_std.shape[0], X_train_std.shape[1]))
print('y_train_encoded shape : ({0},)'.format(y_train_encoded.shape[0]))
print('X_test_std shape : ({0},{1})'.format(X_test_std.shape[0], X_test_std.shape[1]))
print('y_test_encoded shape : ({0},)'.format(y_test_encoded.shape[0]))

In [None]:
from sklearn.decomposition import PCA

pca = PCA(n_components=128)
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)
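
To get a feel for what the projection keeps, it can help to look at the share of variance captured by the leading components. The sketch below computes PCA through an SVD in plain NumPy on synthetic data; `X`, `k`, and the scaling of the first column are assumptions made for the illustration, not values from this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))   # synthetic stand-in for standardized features
X[:, 0] *= 5.0                   # give one direction most of the variance

Xc = X - X.mean(axis=0)          # PCA operates on centred data
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)

explained_ratio = s**2 / np.sum(s**2)  # share of variance per component
k = 3                                  # hypothetical number of components
X_reduced = Xc @ Vt[:k].T              # project onto the top-k directions

print(X_reduced.shape)
print(explained_ratio[:k].sum())
```

Applied to the real features, the cumulative sum of these ratios indicates how much variance a choice such as `n_components=128` retains.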

<a id = '5.0'></a>
<h2 style = "font-size:35px; font-family:Garamond ; font-weight : normal; background-color: #007580; color :#fed049   ; text-align: center; border-radius: 5px 5px; padding: 5px"> 5. Model Building and Validation </h2>

<a id = '5.1'></a>
<p style = "font-size:20px; color: #007580 "><strong> 5.1 Build a Machine Learning Classifier </strong></p>

- Use SVM Classifier to predict the person in the given image
- Fit the classifier and print the score

In [None]:
from sklearn.svm import SVC

clf = SVC(C=5., gamma=0.001)
clf.fit(X_train_pca, y_train_encoded)

In [None]:
y_predict = clf.predict(X_test_pca)

In [None]:
print('y_predict : ',y_predict)
print('y_test_encoded : ',y_test_encoded)

In [None]:
y_predict_encoded = le.inverse_transform(y_predict)

In [None]:
print('y_predict_encoded : ',y_predict_encoded)

In [None]:
print('y_predict shape : ', y_predict.shape)
print('y_test_encoded shape : ', y_test_encoded.shape)

In [None]:
y_test_encoded[32:49]

In [None]:
# Find the classification accuracy
accuracy_score(y_test_encoded, y_predict)
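
Overall accuracy can hide classes that are recognized poorly. As a rough sketch, accuracy and per-class recall can be computed by hand as below; the label arrays are made up for illustration and are not outputs of this notebook.

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 2, 2])  # made-up encoded labels
y_pred = np.array([0, 1, 1, 1, 2, 0])  # made-up predictions

# Accuracy: fraction of samples whose prediction matches the true label.
accuracy = np.mean(y_true == y_pred)

# Per-class recall: of the samples that truly belong to a class,
# the fraction predicted as that class.
recall = {int(c): float(np.mean(y_pred[y_true == c] == c))
          for c in np.unique(y_true)}

print(accuracy)
print(recall)
```

On the real predictions, the same idea applies with `y_test_encoded` and `y_predict` in place of the toy arrays.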

**Accuracy Score: 96.455%**

<a id = '5.2'></a>
<p style = "font-size:20px; color: #007580 "><strong> 5.2 Validate Celebrity Images </strong></p>

- Take the image at index 401 of the test set and plot it
- Report which person (folder name in the dataset) the image is identified as

In [None]:
example_idx = 401

example_image = load_image(metadata[test_idx][example_idx].image_path())
example_prediction = y_predict[example_idx]
example_identity =  y_predict_encoded[example_idx]

plt.imshow(example_image)
plt.title(f'Identified as {example_identity}');

In [None]:
example_idx = 900

example_image = load_image(metadata[test_idx][example_idx].image_path())
example_prediction = y_predict[example_idx]
example_identity =  y_predict_encoded[example_idx]

plt.imshow(example_image)
plt.title(f'Identified as {example_identity}');

In [None]:
example_idx = 317

example_image = load_image(metadata[test_idx][example_idx].image_path())
example_prediction = y_predict[example_idx]
example_identity =  y_predict_encoded[example_idx]

plt.imshow(example_image)
plt.title(f'Identified as {example_identity}');

In [None]:
example_idx = -27

example_image = load_image(metadata[test_idx][example_idx].image_path())
example_prediction = y_predict[example_idx]
example_identity =  y_predict_encoded[example_idx]

plt.imshow(example_image)
plt.title(f'Identified as {example_identity}');

<a id = '6.0'></a>
<h2 style = "font-size:35px; font-family:Garamond ; font-weight : normal; background-color: #007580; color :#fed049   ; text-align: center; border-radius: 5px 5px; padding: 5px"> 6. Conclusion </h2>

1. The dataset contains 17534 images of 100 people. All images were taken from Pinterest and aligned using the dlib library.
2. Generated embeddings for all images using the pre-trained VGG Face model.
3. Used "Squared L2 distance" to calculate the distance between the images of a given pair.
4. Encoded the target labels, standardized the features and reduced dimensions using PCA.
5. Used an SVM classifier to predict the celebrity in a given image and achieved 96.455% accuracy.

- Reference link for the template used in this notebook: https://www.kaggle.com/bhuvanchennoju/ancient-roots-of-agriculture-a-data-overview

<p style = "font-size:30px; color: #007580 ;background-color:  ; text-align: left; border-radius: 5px 5px; padding: 5px" ><strong> Thanks for reading 🙂</strong></p>