FaceNet

FaceNet is a face recognition system developed in 2015 by researchers at Google that achieved state-of-the-art results at the time. It maps face images to a Euclidean space and is trained so that the L2 distance between embeddings reflects face similarity. Our paper notes on FaceNet can be found here.

This is a PyTorch implementation of the FaceNet paper with ResNet as the backbone architecture. The implementation was first done on the AT&T Dataset of Faces, then on the LFW dataset. Triplets were selected with online triplet mining, meaning they are chosen from each training batch rather than precomputed.
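As a rough illustration of online triplet mining, the sketch below uses the batch-hard strategy: for each anchor in a batch it picks the farthest same-identity sample and the closest different-identity sample. The README does not say which mining strategy the repository uses, so batch-hard is an assumption here, and the function name `batch_hard_triplet_loss` is hypothetical. The margin of 1 matches the AT&T training configuration.

```python
import torch


def batch_hard_triplet_loss(embeddings, labels, margin=1.0):
    """Batch-hard online triplet mining (illustrative; not the repo's code)."""
    # Pairwise Euclidean (L2) distances between all embeddings: shape (B, B).
    dist = torch.cdist(embeddings, embeddings, p=2)

    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    not_self = ~torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos_mask = same & not_self  # same identity, different sample
    neg_mask = ~same            # different identity

    # Hardest positive: the farthest sample sharing the anchor's label.
    hardest_pos = dist.masked_fill(~pos_mask, float("-inf")).max(dim=1).values
    # Hardest negative: the closest sample with a different label.
    hardest_neg = dist.masked_fill(~neg_mask, float("inf")).min(dim=1).values

    # Only anchors with at least one positive and one negative contribute.
    valid = pos_mask.any(dim=1) & neg_mask.any(dim=1)
    loss = torch.relu(hardest_pos - hardest_neg + margin)
    return loss[valid].mean()


# Toy batch: four 2-D embeddings, two identities (made-up values).
embeddings = torch.tensor([[0.0, 0.0], [2.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
labels = torch.tensor([0, 0, 1, 1])
loss = batch_hard_triplet_loss(embeddings, labels, margin=1.0)
```

Because the triplets are formed from whatever identities land in the batch, each batch needs several images per identity for the mining to find positives.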

W&B

Weights & Biases (W&B) was used throughout this part of the project for metric tracking, hyperparameter sweeps, and visualization.


(a) Metrics of ResNet18 on LFW. (b) Sweeps of ResNet18 on AT&T.

AT&T Faces

The dataset's 40 subjects were split into 35 training classes and 5 test classes, so the test identities are never seen during training.
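A class-disjoint split like this can be sketched as follows. Which five subjects were held out is not stated in this README, so the random selection, the seed, and the helper name `split_by_class` are assumptions made only for illustration.

```python
import random


def split_by_class(class_ids, n_test, seed=0):
    """Split identities (not images) into disjoint train and test sets."""
    ids = sorted(set(class_ids))
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    test_ids = set(ids[:n_test])
    train_ids = set(ids[n_test:])
    return train_ids, test_ids


# 40 subjects in total; hold out 5 of them for testing.
train_ids, test_ids = split_by_class(range(40), n_test=5)
```

Splitting by identity rather than by image means the test metrics measure how well the embedding generalizes to unseen faces.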

Training

Parameter Value
Architecture ResNet18
Embedding Dimension 64
No. of Learnable Parameters 11,209,344
Epochs 200
Learning Rate 0.0002
Optimizer Adam
Batch Size 100
Margin 1

Results

Metric Train Set Test Set
Accuracy 1.0 0.984
Recall 1.0 0.978
Precision 1.0 0.936
ROC Area Under Curve 1.0 0.981
Euclidean Distance Threshold 0.91 0.89
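To make the role of the distance threshold concrete, here is a minimal sketch of how accuracy, precision, and recall could be computed for face pairs. It assumes a pair is predicted "same person" when the Euclidean distance between the two embeddings is below the threshold. The exact evaluation code is not shown in this README, the helper name is hypothetical, and the distances below are made up.

```python
def verification_metrics(distances, is_same, threshold):
    """Accuracy, precision and recall for 'same person' decisions."""
    tp = fp = tn = fn = 0
    for distance, same in zip(distances, is_same):
        predicted_same = distance < threshold
        if predicted_same and same:
            tp += 1
        elif predicted_same and not same:
            fp += 1
        elif not predicted_same and same:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up pair distances; the first three pairs show the same person.
distances = [0.3, 0.7, 0.95, 1.2, 1.5, 0.8]
is_same = [True, True, True, False, False, False]
metrics = verification_metrics(distances, is_same, threshold=0.89)
```

With these toy values two of the three genuine pairs fall below the threshold and one impostor pair does too, so all three metrics come out at 2/3.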

Plots

(a) Epoch Loss. (b) EER Curve. (c) t-SNE Embeddings.
(d) ROC Curve on train set. (e) ROC Curve on test set.

LFW

The deep-funneled set of LFW images was used for training and evaluation.

Faces were extracted with a center crop and then resized to match the network's input shape. The images were then normalized using the mean and standard deviation computed over the whole dataset.

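import torch
from torchvision import transforms

# Per-channel mean and standard deviation computed over the dataset
MEAN = torch.tensor([0.5929, 0.4496, 0.3654])
STD = torch.tensor([0.2287, 0.1959, 0.1876])
transform = transforms.Compose([
    transforms.CenterCrop((128, 98)),   # crop the face region from the image center
    transforms.Resize((224, 224)),      # resize to the model's input shape
    transforms.ToTensor(),              # PIL image -> float tensor in [0, 1]
    transforms.Normalize(mean=MEAN, std=STD),
])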

LFWDataset.py contains the custom dataset classes for loading LFW data in all configurations. These dataset classes were later contributed to the torchvision library.

Training Configuration

Parameter Value
Architecture ResNet-18
Embedding Dimension 128
No. of Learnable Parameters 11,242,176
Epochs 200
Learning Rate 0.002 (reduced by a factor of 2 every 50 epochs)
Batch Size 256
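The learning-rate schedule in this configuration (start at 0.002, halve every 50 epochs) can be written as a one-line step-decay rule. The helper `learning_rate` below is hypothetical and only illustrates the arithmetic. PyTorch's `torch.optim.lr_scheduler.StepLR` with `step_size=50` and `gamma=0.5` applies the same rule, though this README does not say which scheduler class the repository actually uses.

```python
def learning_rate(epoch, base_lr=0.002, step_size=50, gamma=0.5):
    """Step decay: multiply the rate by `gamma` every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Learning rate at a few epochs of the 200-epoch run.
schedule = {epoch: learning_rate(epoch) for epoch in (0, 49, 50, 100, 150, 199)}
```

Over 200 epochs this gives four plateaus: 0.002, 0.001, 0.0005, and 0.00025.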

To train, run

train.py --config configs/resnet18lfw.yml --data_dir ../datasets/lfw  --wandb true

To resume training

train.py --config configs/resnet18lfw.yml --data_dir ../datasets/lfw  --wandb true --resume "checkpoints/model_resnet18_triplet_epoch_120_08-Dec 15:57.pt" 

Model State Dict

# Contents of each saved checkpoint
state = {
    'epoch': epoch + 1,
    'embedding_dimension': p.fc_layer_size,
    'batch_size_training': p.batch_size,
    'model_state_dict': model.state_dict(),
    'model_architecture': p.backbone,
    'optimizer_state_dict': optimizer.state_dict(),
    'scheduler_state_dict': scheduler.state_dict(),
    'best_distance_threshold': best_threshold,
    'accuracy': accuracy,
}
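A checkpoint with these keys could be restored roughly as shown below. This is a sketch under assumptions, not the repository's resume logic: the tiny `nn.Linear` model, the Adam and StepLR settings, the placeholder metric values, and the in-memory buffer standing in for a `.pt` file are all chosen only to keep the example self-contained.

```python
import io

import torch
from torch import nn

# Stand-in model, optimizer and scheduler (the repo builds these from its config).
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.002)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5)

# Save a checkpoint using a subset of the keys listed above (placeholder values).
state = {
    "epoch": 121,
    "model_state_dict": model.state_dict(),
    "optimizer_state_dict": optimizer.state_dict(),
    "scheduler_state_dict": scheduler.state_dict(),
    "best_distance_threshold": 1.1,
    "accuracy": 0.88,
}
buffer = io.BytesIO()  # in-memory stand-in for a checkpoint file
torch.save(state, buffer)
buffer.seek(0)

# Restore everything needed to continue training where it stopped.
checkpoint = torch.load(buffer, map_location="cpu")
model.load_state_dict(checkpoint["model_state_dict"])
optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
scheduler.load_state_dict(checkpoint["scheduler_state_dict"])
start_epoch = checkpoint["epoch"]  # stored as epoch + 1, i.e. the next epoch to run
```

Restoring the optimizer and scheduler state along with the weights keeps the Adam moment estimates and the learning-rate decay position consistent with the interrupted run.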

Results

Metric Value
Accuracy 88.35%
Precision 88.46%
Recall 88.23%
ROC Area Under Curve 0.9508
Euclidean Distance Threshold 1.104
TAR @ FAR=1e-2 61.07%
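TAR @ FAR=1e-2 is the true accept rate at the distance threshold where at most 1% of different-identity pairs are wrongly accepted. The README does not describe how this number was computed, so the following is one common way to obtain it, with a hypothetical helper name and made-up distances.

```python
def tar_at_far(genuine, impostor, far_target=1e-2):
    """True accept rate at the loosest threshold whose FAR <= far_target.

    genuine:  distances for same-identity pairs
    impostor: distances for different-identity pairs
    A pair is accepted when its distance is strictly below the threshold.
    """
    impostor_sorted = sorted(impostor)
    # Number of impostor pairs that may be accepted without exceeding the target.
    allowed = int(far_target * len(impostor_sorted))
    # Thresholding at the next impostor distance accepts at most `allowed` of them.
    if allowed < len(impostor_sorted):
        threshold = impostor_sorted[allowed]
    else:
        threshold = float("inf")
    tar = sum(d < threshold for d in genuine) / len(genuine)
    return tar, threshold


# Made-up distances: 100 impostor pairs and 4 genuine pairs.
impostor = [1.0 + i / 100 for i in range(100)]
genuine = [0.5, 0.9, 1.005, 1.2]
tar, threshold = tar_at_far(genuine, impostor, far_target=1e-2)
far = sum(d < threshold for d in impostor) / len(impostor)
```

A strict FAR target forces a tighter threshold than the accuracy-optimal one, which is why TAR at a fixed FAR can sit well below the overall accuracy.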

Plots

(a) Epoch Loss. (b) ROC Curve. (c) Accuracy. (d) Learning Rate.