# COS 429 Final Project
## VGG Face

Initial setup:
- Create instance (p2.xlarge)
- `scp` the .caffemodel and .prototxt files over
- Create ssl cert and password for Jupyter notebook

To get this up and running on AWS (after initial setup):
- `sudo ssh -i thesis.pem -L 443:127.0.0.1:8888 ubuntu@...`
- Open `127.0.0.1` in a browser (local port 443 is forwarded to port 8888 on the instance)
- Password: cos429_russakovsky
- `source activate theano_p36`
- `conda install -c anaconda pillow`
- `conda install h5py`
- `conda install scikit-learn`
- `jupyter notebook`
- `scp -i cos429.pem *.py ubuntu@...:~/cos429/`

This notebook uses Keras weights for VGG_FACE, because the caffemodel and t7 files were hard to get working with caffe2/pytorch. The weights were converted from the vgg-face matconvnet model as shown here: https://gist.github.com/EncodeTS/6bbe8cb8bebad7a672f0d872561782d9.

Before stopping the instance, remember to download the latest .ipynb file for the GitHub repository. Terminate the instance to delete all files.

In [2]:
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

import time
import os
import sys  
os.environ['THEANO_FLAGS'] = "device=gpu1"    
import theano

from keras.models import Model
from keras.layers import Input, Convolution2D, ZeroPadding2D, MaxPooling2D, Flatten, Dense, Dropout, Activation

from keras import backend as K
K.set_image_dim_ordering('th')

from PIL import Image

Using Theano backend.


In [3]:
weights_path = 'vgg-face-keras.h5'

In [4]:
# This network architecture is derived from Table 3 of the CNN described in Parkhi et al. 
# and based on Keras code provided in https://gist.github.com/EncodeTS/6bbe8cb8bebad7a672f0d872561782d9

def vgg_face(weights_path=None):
    img = Input(shape=(3, 224, 224))

    pad1_1 = ZeroPadding2D(padding=(1, 1))(img)
    conv1_1 = Convolution2D(64, (3, 3), activation='relu', name='conv1_1')(pad1_1)
    pad1_2 = ZeroPadding2D(padding=(1, 1))(conv1_1)
    conv1_2 = Convolution2D(64, (3, 3), activation='relu', name='conv1_2')(pad1_2)
    pool1 = MaxPooling2D((2, 2), strides=(2, 2))(conv1_2)

    pad2_1 = ZeroPadding2D((1, 1))(pool1)
    conv2_1 = Convolution2D(128, (3, 3), activation='relu', name='conv2_1')(pad2_1)
    pad2_2 = ZeroPadding2D((1, 1))(conv2_1)
    conv2_2 = Convolution2D(128, (3, 3), activation='relu', name='conv2_2')(pad2_2)
    pool2 = MaxPooling2D((2, 2), strides=(2, 2))(conv2_2)

    pad3_1 = ZeroPadding2D((1, 1))(pool2)
    conv3_1 = Convolution2D(256, (3, 3), activation='relu', name='conv3_1')(pad3_1)
    pad3_2 = ZeroPadding2D((1, 1))(conv3_1)
    conv3_2 = Convolution2D(256, (3, 3), activation='relu', name='conv3_2')(pad3_2)
    pad3_3 = ZeroPadding2D((1, 1))(conv3_2)
    conv3_3 = Convolution2D(256, (3, 3), activation='relu', name='conv3_3')(pad3_3)
    pool3 = MaxPooling2D((2, 2), strides=(2, 2))(conv3_3)

    pad4_1 = ZeroPadding2D((1, 1))(pool3)
    conv4_1 = Convolution2D(512, (3, 3), activation='relu', name='conv4_1')(pad4_1)
    pad4_2 = ZeroPadding2D((1, 1))(conv4_1)
    conv4_2 = Convolution2D(512, (3, 3), activation='relu', name='conv4_2')(pad4_2)
    pad4_3 = ZeroPadding2D((1, 1))(conv4_2)
    conv4_3 = Convolution2D(512, (3, 3), activation='relu', name='conv4_3')(pad4_3)
    pool4 = MaxPooling2D((2, 2), strides=(2, 2))(conv4_3)

    pad5_1 = ZeroPadding2D((1, 1))(pool4)
    conv5_1 = Convolution2D(512, (3, 3), activation='relu', name='conv5_1')(pad5_1)
    pad5_2 = ZeroPadding2D((1, 1))(conv5_1)
    conv5_2 = Convolution2D(512, (3, 3), activation='relu', name='conv5_2')(pad5_2)
    pad5_3 = ZeroPadding2D((1, 1))(conv5_2)
    conv5_3 = Convolution2D(512, (3, 3), activation='relu', name='conv5_3')(pad5_3)
    pool5 = MaxPooling2D((2, 2), strides=(2, 2))(conv5_3)

    # fc6, fc7 and fc8 are the fully connected layers, written here as convolutions.
    # fc8 classifies the 2,622 individuals in the original VGG Face paper's dataset;
    # the 4096-dimensional face descriptor is the output of fc7 (see partial_vgg_face).
    fc6 = Convolution2D(4096, (7, 7), activation='relu', name='fc6')(pool5)
    fc6_drop = Dropout(0.5)(fc6)
    fc7 = Convolution2D(4096, (1, 1), activation='relu', name='fc7')(fc6_drop)
    fc7_drop = Dropout(0.5)(fc7)
    fc8 = Convolution2D(2622, (1, 1), name='fc8')(fc7_drop)
    flat = Flatten()(fc8)
    out = Activation('softmax')(flat)

    model = Model(inputs=img, outputs=out)

    if weights_path:
        model.load_weights(weights_path)

    return model

# Returns a model whose output is the 4096-dimensional face descriptor (the fc7 activations)
def partial_vgg_face():
    model = vgg_face(weights_path)
    layer_name = 'fc7'
    partial_model = Model(inputs=model.input,
                                 outputs=model.get_layer(layer_name).output)
    return partial_model
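
The layer sizes above can be checked by hand. The following is a small sketch of that arithmetic, not part of the original notebook; `conv_out` is a hypothetical helper introduced only for illustration. It follows one spatial axis through the five convolution blocks and shows why `fc6`, a 7x7 convolution without padding, reduces the feature map to 1x1.

```python
def conv_out(size, kernel, pad=0, stride=1):
    """Output length of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * pad - kernel) // stride + 1


size = 224
# The five blocks contain 2, 2, 3, 3 and 3 padded 3x3 convolutions,
# and each block ends with a 2x2 max-pool with stride 2.
for n_convs in (2, 2, 3, 3, 3):
    for _ in range(n_convs):
        size = conv_out(size, kernel=3, pad=1)   # padding of 1 keeps the size unchanged
    size = conv_out(size, kernel=2, stride=2)    # pooling halves the size

pool5_size = size                          # 224 -> 112 -> 56 -> 28 -> 14 -> 7
fc6_size = conv_out(pool5_size, kernel=7)  # a 7x7 kernel on a 7x7 map gives 1x1
print(pool5_size, fc6_size)
```

Because `fc6` and `fc7` each produce a 1x1 map, the `fc7` output has shape `(None, 4096, 1, 1)` in channels-first ordering, which is why the descriptors are passed through `np.squeeze` later in the notebook.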

In [None]:
# Test the model by passing an image through it
im = Image.open('A.J._Buckley.jpg')
im = im.resize((224,224))
im = np.array(im).astype(np.float32)
# im[:,:,0] -= 129.1863
# im[:,:,1] -= 104.7624
# im[:,:,2] -= 93.5940
im = im.transpose((2,0,1))
im = np.expand_dims(im, axis=0)
print('Shape:', im.shape)

model = vgg_face(weights_path)
out = model.predict(im)
print(out[0][0])

In [None]:
# Test the partial model by passing an image through it
model = partial_vgg_face()
im = Image.open('A.J._Buckley.jpg')
im = im.resize((224,224))
im = np.array(im).astype(np.float32)
im = im.transpose((2,0,1))
im = np.expand_dims(im, axis=0)

descriptor = model.predict(im)
print(descriptor.shape)
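
The two test cells above repeat the same preprocessing steps. A minimal sketch of how they could be wrapped in one function is shown below. `to_network_input` and `MEAN_FACE` are hypothetical names, and the per-channel means are the ones used in the `get_lfw_dataset` cell of this notebook. The means are subtracted in whatever channel order the array already has; the notebook itself leaves the RGB/BGR question open, so this sketch does not reorder channels.

```python
import numpy as np

# Per-channel means taken from the get_lfw_dataset cell of this notebook.
MEAN_FACE = np.array([129.1863, 104.7624, 93.5940], dtype=np.float32)


def to_network_input(image_hwc, mean=MEAN_FACE):
    """Turn one (H, W, 3) image into a mean-subtracted (1, 3, H, W) batch."""
    x = np.asarray(image_hwc, dtype=np.float32) - mean  # broadcasts over the last axis
    x = x.transpose((2, 0, 1))                          # channels-last -> channels-first
    return np.expand_dims(x, axis=0)                    # add the batch axis


# A synthetic gray image stands in for a real photo here.
dummy = np.full((224, 224, 3), 128, dtype=np.uint8)
batch = to_network_input(dummy)
print(batch.shape)
```

Note that the test cells above feed the image without mean subtraction, while the LFW pipeline subtracts the means, so the two paths are not directly comparable.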

In [6]:
model = vgg_face(weights_path)
print(model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 3, 224, 224)       0         
_________________________________________________________________
zero_padding2d_1 (ZeroPaddin (None, 3, 226, 226)       0         
_________________________________________________________________
conv1_1 (Conv2D)             (None, 64, 224, 224)      1792      
_________________________________________________________________
zero_padding2d_2 (ZeroPaddin (None, 64, 226, 226)      0         
_________________________________________________________________
conv1_2 (Conv2D)             (None, 64, 224, 224)      36928     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 64, 112, 112)      0         
_________________________________________________________________
zero_padding2d_3 (ZeroPaddin (None, 64, 114, 114)      0         
__________

In [7]:
%load_ext autoreload
%aimport experiment
%aimport manipulations
%autoreload 1

from sklearn.datasets import fetch_lfw_people
from scipy import ndimage

import manipulations
import experiment
from manipulations import ManipulationInfo

In [20]:
# Manipulations
# Based on Cathy's manipulations.py

def perform_manipulation(data, manipulation_info: ManipulationInfo):
    manipulation_type = manipulation_info.type
    manipulation_parameters = manipulation_info.parameters
    if manipulation_type == "none":
        return data
    elif manipulation_type == "occlude_lfw":
        occlusion_size = manipulation_parameters["occlusion_size"]
        return occlude_lfw_dataset(data, occlusion_size)
    elif manipulation_type == "radial_distortion":
        k = manipulation_parameters["k"]
        return radially_distort_lfw_dataset(data, k)
    elif manipulation_type == "blur":
        blurwindow_size = manipulation_parameters["blurwindow_size"]
        return blur_lfw_dataset(data, blurwindow_size)
    else:
        raise Exception("UNKNOWN MANIPULATION.")
        
def occlude_lfw_dataset(data, occlusion_size):
    num_images = data.shape[0]
    dataset_images = [add_occlusion(data[i], occlusion_size) for i in range(num_images)]
    return np.asarray(dataset_images)

def add_occlusion(input_image, occlusion_size):
    """Randomly selects an occlusion_size-by-occlusion_size square in the image
    and sets the pixels to random values between 0 and 256. Same value for all 3 channels."""
    max_value = 256
    max_i, max_j, ch = input_image.shape
    input_image = np.copy(input_image)
    start_i = np.random.randint(0, max_i-occlusion_size)
    start_j = np.random.randint(0, max_j-occlusion_size)
    occlusion_square = np.random.rand(occlusion_size, occlusion_size)*max_value
    input_image[start_i:start_i+occlusion_size,start_j:start_j+occlusion_size,0]=occlusion_square
    input_image[start_i:start_i+occlusion_size,start_j:start_j+occlusion_size,1]=occlusion_square
    input_image[start_i:start_i+occlusion_size,start_j:start_j+occlusion_size,2]=occlusion_square
    return input_image

def radially_distort_lfw_dataset(data, k):
    lfw_imageshape = (224, 224)
    distortion_array_i, distortion_array_j = create_radial_distortion_array(k, lfw_imageshape)
    num_images = data.shape[0]
    dataset_images = [radial_distortion(data[i], distortion_array_i, distortion_array_j) for i in range(num_images)]
    return np.asarray(dataset_images)

def radial_distortion(input_image, distortion_array_i, distortion_array_j):
    distorted_image = np.empty(input_image.shape)
    for i in range(distorted_image.shape[0]):
        for j in range(distorted_image.shape[1]):
            input_i = distortion_array_i[i, j]
            input_j = distortion_array_j[i, j]
            distorted_image[i,j,0] = input_image[input_i, input_j,0]
            distorted_image[i,j,1] = input_image[input_i, input_j,1]
            distorted_image[i,j,2] = input_image[input_i, input_j,2]
    return distorted_image

def create_radial_distortion_array(k, input_image_shape):
    # http://sprg.massey.ac.nz/pdfs/2003_IVCNZ_408.pdf
    # x_d = x_u / (1+kr_d^2)
    # Negative k for pincushion, positive k for barrel.
    i_max, j_max = input_image_shape
    i0 = int(i_max / 2)
    j0 = int(j_max / 2)
    distortion_array_i = np.zeros(input_image_shape, dtype='int')
    distortion_array_j = np.zeros(input_image_shape, dtype='int')
    for i in range(i_max):
        for j in range(j_max):
            i_bar = i - i0
            j_bar = j - j0
            r_squared = (i_bar * i_bar) + (j_bar * j_bar)
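            # The scale factor is computed from the centered radius, but it is applied
            # to the raw indices (i, j), so the image is scaled about the top-left corner.
            i_input = i / (1+k*r_squared)
            j_input = j / (1+k*r_squared)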
            if i_input < i_max and j_input < j_max and i_input >= 0 and j_input >= 0:
                distortion_array_i[i, j] = i_input
                distortion_array_j[i, j] = j_input
    return distortion_array_i, distortion_array_j

def blur_lfw_dataset(data, blurwindow_size):
    num_images = data.shape[0]
    dataset_images = [blur(data[i], blurwindow_size) for i in range(num_images)]
    return np.asarray(dataset_images)

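def blur(input_image, blurwindow_size):
    """Applies a blurwindow_size-by-blurwindow_size median filter to each channel
    (a percentile of -50 is equivalent to the 50th percentile)."""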
    blurred_image = np.empty(input_image.shape)
    blurred_image[:,:,0] = ndimage.percentile_filter(input_image[:,:,0], -50, blurwindow_size)
    blurred_image[:,:,1] = ndimage.percentile_filter(input_image[:,:,1], -50, blurwindow_size)
    blurred_image[:,:,2] = ndimage.percentile_filter(input_image[:,:,2], -50, blurwindow_size)
    return blurred_image

def blur_slow(input_image, blurwindow_size):
    blurred_image = np.empty(input_image.shape)
    blurwindow_halflength = blurwindow_size / 2
    image_imax, image_jmax = input_image.shape[:2]  # input is (height, width, channels)
    for i in range(image_imax):
        for j in range(image_jmax):
            blurwindow_imin = int(max(0, i-blurwindow_halflength))
            blurwindow_jmin = int(max(0, j-blurwindow_halflength))
            blurwindow_imax = int(min(image_imax, i+blurwindow_halflength))
            blurwindow_jmax = int(min(image_jmax, j+blurwindow_halflength))
            blurred_image[i,j,0] = np.mean(input_image[blurwindow_imin:blurwindow_imax,blurwindow_jmin:blurwindow_jmax,0])
            blurred_image[i,j,1] = np.mean(input_image[blurwindow_imin:blurwindow_imax,blurwindow_jmin:blurwindow_jmax,1])
            blurred_image[i,j,2] = np.mean(input_image[blurwindow_imin:blurwindow_imax,blurwindow_jmin:blurwindow_jmax,2])
    return blurred_image
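
The radial distortion formula quoted in the comments, x_d = x_u / (1 + k r^2), is stated for coordinates measured from the image center. The sketch below is one possible vectorized variant that applies the scale factor to the centered offsets and then shifts back. It is an illustration under that assumption, not the code that produced the results in this notebook, and `centered_distortion_map` and `apply_distortion_map` are hypothetical names. Unlike the loop version, it rounds to the nearest pixel instead of truncating.

```python
import numpy as np


def centered_distortion_map(k, shape):
    """Source-index arrays for x_d = x_u / (1 + k * r^2), measured from the image center."""
    i_max, j_max = shape
    i0, j0 = i_max // 2, j_max // 2
    i_bar, j_bar = np.meshgrid(np.arange(i_max) - i0,
                               np.arange(j_max) - j0, indexing="ij")
    scale = 1.0 / (1.0 + k * (i_bar ** 2 + j_bar ** 2))
    src_i = np.rint(i0 + i_bar * scale).astype(int)
    src_j = np.rint(j0 + j_bar * scale).astype(int)
    valid = (src_i >= 0) & (src_i < i_max) & (src_j >= 0) & (src_j < j_max)
    # Out-of-range samples fall back to pixel (0, 0), as in the loop version.
    return np.where(valid, src_i, 0), np.where(valid, src_j, 0)


def apply_distortion_map(image, src_i, src_j):
    """Gather source pixels with fancy indexing; works for (H, W) and (H, W, C) arrays."""
    return image[src_i, src_j]


# With k = 0 the map is the identity, which makes a convenient sanity check.
image = np.arange(8 * 8 * 3).reshape(8, 8, 3)
src_i, src_j = centered_distortion_map(0.0, (8, 8))
print(np.array_equal(apply_distortion_map(image, src_i, src_j), image))
```

Building the index arrays once and gathering with fancy indexing avoids the per-pixel Python loop in `radial_distortion`, which matters when the same map is applied to every image in the dataset.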

In [None]:
# Test manipulations

im = Image.open('A.J._Buckley.jpg')
im = im.resize((224,224))
im = np.array(im).astype(np.float32)
im = np.expand_dims(im, axis=0)

#         ManipulationInfo("none", {}),
#         ManipulationInfo("occlude_lfw", {"occlusion_size": 20}),
#         ManipulationInfo("occlude_lfw", {"occlusion_size": 10}),
#         ManipulationInfo("occlude_lfw", {"occlusion_size": 30}),
#         ManipulationInfo("occlude_lfw", {"occlusion_size": 40}),
#         ManipulationInfo("radial_distortion", {"k": 0.00015}),
#         ManipulationInfo("radial_distortion", {"k": -0.00015}),
#         ManipulationInfo("radial_distortion", {"k": 0.0003}),
#         ManipulationInfo("radial_distortion", {"k": -0.0003}),
#         ManipulationInfo("radial_distortion", {"k": 0.0005}),
#         ManipulationInfo("radial_distortion", {"k": -0.0005}),
#         ManipulationInfo("blur", {"blurwindow_size": 5}),
#         ManipulationInfo("blur", {"blurwindow_size": 10})

imM = perform_manipulation(im, ManipulationInfo("radial_distortion", {"k": 0.00003}))
plt.imshow(imM[0])

In [21]:
# Get LFW dataset

def get_lfw_dataset(min_faces_per_person, manipulation_info: ManipulationInfo):
    dataset = fetch_lfw_people(
        min_faces_per_person=min_faces_per_person, 
        color=True, 
        slice_=(slice(0, 250, None), slice(0, 250, None)), 
        resize=0.896)
    data = dataset.images
    # data = manipulations.perform_manipulation(data, manipulation_info)
    # mean_face = np.mean(data, axis=0)
    # data = data - mean_face

    train_indices, test_indices = experiment.split_traintest(dataset.target)
    train_data = data[train_indices,:]
    train_targets = dataset.target[train_indices]
    test_data = perform_manipulation(data[test_indices,:], manipulation_info)
    test_targets = dataset.target[test_indices]

    # test_data = normalize(test_data, axis=1)
    # train_data = normalize(train_data, axis=1)
    # train_data, test_data, train_targets, test_targets = train_test_split(data, dataset.target)
    
    mean_face = [129.1863, 104.7624, 93.5940] # BGR
    
    train_data = train_data.transpose((0,3,1,2))
    train_data[:,0,:,:] = train_data[:,0,:,:] - mean_face[0]
    train_data[:,1,:,:] = train_data[:,1,:,:] - mean_face[1]
    train_data[:,2,:,:] = train_data[:,2,:,:] - mean_face[2]
    # train_data = train_data[:,::-1,:,:] # Flip to RGB?
    # Confusing because fetch_lfw_people() seems to alter the original image:
    # the coloring is off and the pixels are 0-255, not 0-1 as stated in its documentation/code.
    
    test_data = test_data.transpose((0,3,1,2))
    test_data[:,0,:,:] = test_data[:,0,:,:] - mean_face[0]
    test_data[:,1,:,:] = test_data[:,1,:,:] - mean_face[1]
    test_data[:,2,:,:] = test_data[:,2,:,:] - mean_face[2]
    # test_data = test_data[:,::-1,:,:] # Flip to RGB?
    
    return train_data, train_targets, test_data, test_targets

In [22]:
def get_descriptors(model, data):    
    descriptors = model.predict(data, verbose=1)
    return np.squeeze(descriptors)

In [23]:
def predict(mean_descriptors, descriptors, threshold=None):
    predictions = []
    for d in descriptors:
        distances = [np.linalg.norm(mean_descriptors[i] - d) for i in range(len(mean_descriptors))]
        predictions.append(np.argmin(distances))
    return np.asarray(predictions)
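
The `predict` function above is a nearest-mean classifier: each descriptor is assigned the index of the closest mean descriptor in Euclidean distance. A possible vectorized form is sketched below; `predict_vectorized` is a hypothetical name. It uses the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b so that the full difference tensor is never materialized, and it compares squared distances, which gives the same argmin.

```python
import numpy as np


def predict_vectorized(mean_descriptors, descriptors):
    """Index of the nearest mean descriptor (Euclidean) for each row of `descriptors`."""
    means = np.asarray(mean_descriptors, dtype=np.float64)
    desc = np.asarray(descriptors, dtype=np.float64)
    # Squared distances, shape (num_descriptors, num_means).
    sq_dist = ((desc ** 2).sum(axis=1)[:, None]
               + (means ** 2).sum(axis=1)[None, :]
               - 2.0 * desc @ means.T)
    return np.argmin(sq_dist, axis=1)


means = np.array([[0.0, 0.0], [10.0, 10.0]])
queries = np.array([[1.0, -1.0], [9.0, 12.0], [4.0, 4.0]])
print(predict_vectorized(means, queries))  # [0 1 0]
```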

In [24]:
# Parkhi's paper "learns" a threshold value to determine whether f1 and f2 have the same identity
# We technically just need to find the face pairs with the smallest distance?
def find_threshold(mean_descriptors, descriptors):
    num_faces = len(mean_descriptors)
    num_examples_per_face = int(len(descriptors) / num_faces)
    
    distances = [np.linalg.norm(mean_descriptors[i // num_examples_per_face] - descriptors[i]) for i in range(len(descriptors))]
    threshold = max(distances)
    return threshold

In [25]:
def run_experiment(manipulation_info: ManipulationInfo):
    print('Loading model')
    model = partial_vgg_face()
    
    print('Loading dataset')
    min_faces_per_person = 20
    train_data, train_targets, test_data, test_targets = get_lfw_dataset(
        min_faces_per_person, manipulation_info=manipulation_info)
    
    # Train
    print('Training')
    time1 = time.process_time()  # CPU time of this process, in seconds
    num_faces = len(np.unique(train_targets))
    num_examples_per_face = int(len(train_targets) / num_faces)
    train_descriptors = get_descriptors(model, train_data)
    test_descriptors = get_descriptors(model, test_data)
    # The reshape assumes the training set is sorted by identity and has exactly
    # num_examples_per_face examples for every identity.
    mean_train_descriptors = np.mean(np.reshape(train_descriptors, (-1, num_examples_per_face, 4096)), axis=1)
    # threshold = find_threshold(mean_train_descriptors, train_descriptors)
    time2 = time.process_time()
    train_time = time2 - time1
    
    # Test
    print('Testing')
    time1 = time.process_time()
    train_predictions = predict(mean_train_descriptors, train_descriptors)
    train_accuracy = experiment.compute_accuracy(train_predictions, train_targets)
    # Predict test_descriptors
    test_predictions = predict(mean_train_descriptors, test_descriptors)
    test_accuracy = experiment.compute_accuracy(test_predictions, test_targets)
    time2 = time.process_time()
    test_time = time2 - time1
    
    # Print results.
    num_faces = len(np.unique(train_targets))
    model_name = 'VGG_FACE'
    print("Manipulation info: %s" % str(manipulation_info))
    print("Recognition Algorithm: %s" % model_name)
    print("Number of distinct faces: %d" % num_faces)
    print("Chance rate: %f" % (1 / num_faces))
    print("Train accuracy: %f" % train_accuracy)
    print("Test accuracy: %f" % test_accuracy)
    print("Training Time: %s sec" % train_time)
    print("Testing Time: %s sec" % test_time)
    print("\n")

    
    return {
        "Manipulation Type": manipulation_info.type,
        "Manipulation Parameters": manipulation_info.parameters,
        "Recognition Algorithm": model_name,
        "Min Faces Per Person": min_faces_per_person,
        "Number of Distinct Faces": num_faces,
        "Chance Rate": (1 / num_faces),
        "Train Accuracy": train_accuracy,
        "Test Accuracy": test_accuracy,
        "Training Time": train_time,
        "Testing Time": test_time,
    }
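
The mean descriptors in `run_experiment` are computed with a reshape, which only works when the training descriptors are grouped by identity with the same number of examples each. As a hedged alternative, the sketch below groups descriptors by label instead, so it also handles unsorted or unbalanced training sets. `class_mean_descriptors` is a hypothetical helper, not part of the original notebook or of the `experiment` module.

```python
import numpy as np


def class_mean_descriptors(descriptors, targets):
    """Mean descriptor per label; returns (labels, means) with one row of `means` per label."""
    descriptors = np.asarray(descriptors)
    targets = np.asarray(targets)
    labels = np.unique(targets)  # sorted unique labels
    means = np.stack([descriptors[targets == label].mean(axis=0) for label in labels])
    return labels, means


# Unsorted, unbalanced toy data: label 1 has three examples, label 0 has one.
toy_descriptors = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0], [4.0, 4.0]])
toy_targets = np.array([1, 0, 1, 1])
labels, means = class_mean_descriptors(toy_descriptors, toy_targets)
print(labels)
print(means)
```

Row `i` of `means` belongs to `labels[i]`, so an index returned by `predict` would be mapped back with `labels[index]`. When the labels are already 0 to `num_faces - 1`, the index and the label coincide.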

In [26]:
manipulation_infos = [
        # ManipulationInfo("none", {}),
        ManipulationInfo("occlude_lfw", {"occlusion_size": 82}), # Adjusted for new image size
        ManipulationInfo("occlude_lfw", {"occlusion_size": 41}),
        ManipulationInfo("occlude_lfw", {"occlusion_size": 124}),
        ManipulationInfo("occlude_lfw", {"occlusion_size": 165}),
        ManipulationInfo("radial_distortion", {"k": 0.000008}), # Adjusted for new image size
        ManipulationInfo("radial_distortion", {"k": -0.000008}),
        ManipulationInfo("radial_distortion", {"k": 0.000016}),
        ManipulationInfo("radial_distortion", {"k": -0.000016}),
        ManipulationInfo("radial_distortion", {"k": 0.00003}),
        ManipulationInfo("radial_distortion", {"k": -0.00003}),
        ManipulationInfo("blur", {"blurwindow_size": 5}),
        ManipulationInfo("blur", {"blurwindow_size": 10}),
        ManipulationInfo("none", {})
    ]

for manipulation in manipulation_infos:
    stats = run_experiment(manipulation)
    print(stats)
    print()

Loading model
Loading dataset
Training
Testing
Manipulation info: ManipulationInfo(type='occlude_lfw', parameters={'occlusion_size': 82})
Recognition Algorithm: VGG_FACE
Number of distinct faces: 62
Chance rate: 0.016129
Train accuracy: 1.000000
Test accuracy: 0.406768
Training Time: 2427.5211830000003 sec
Testing Time: 1.958480999999665 sec


{'Manipulation Type': 'occlude_lfw', 'Manipulation Parameters': {'occlusion_size': 82}, 'Recognition Algorithm': 'VGG_FACE', 'Min Faces Per Person': 20, 'Number of Distinct Faces': 62, 'Chance Rate': 0.016129032258064516, 'Train Accuracy': 1.0, 'Test Accuracy': 0.4067677123722242, 'Training Time': 2427.5211830000003, 'Testing Time': 1.958480999999665}

Loading model
Loading dataset
Training
Testing
Manipulation info: ManipulationInfo(type='occlude_lfw', parameters={'occlusion_size': 41})
Recognition Algorithm: VGG_FACE
Number of distinct faces: 62
Chance rate: 0.016129
Train accuracy: 1.000000
Test accuracy: 0.773000
Training Time: 2430.146885 se

Loading dataset
Training
Testing
Manipulation info: ManipulationInfo(type='blur', parameters={'blurwindow_size': 5})
Recognition Algorithm: VGG_FACE
Number of distinct faces: 62
Chance rate: 0.016129
Train accuracy: 1.000000
Test accuracy: 0.816003
Training Time: 2405.032998999999 sec
Testing Time: 1.8945559999992838 sec


{'Manipulation Type': 'blur', 'Manipulation Parameters': {'blurwindow_size': 5}, 'Recognition Algorithm': 'VGG_FACE', 'Min Faces Per Person': 20, 'Number of Distinct Faces': 62, 'Chance Rate': 0.016129032258064516, 'Train Accuracy': 1.0, 'Test Accuracy': 0.81600281988015511, 'Training Time': 2405.032998999999, 'Testing Time': 1.8945559999992838}

Loading model
Loading dataset
Training
Testing
Manipulation info: ManipulationInfo(type='blur', parameters={'blurwindow_size': 10})
Recognition Algorithm: VGG_FACE
Number of distinct faces: 62
Chance rate: 0.016129
Train accuracy: 1.000000
Test accuracy: 0.515686
Training Time: 2405.975219 sec
Testing Time: 1.948157999999239