# Automatic Group Photography Enhancement
Have you ever experienced this? After a group photograph is taken, we often find, disappointingly, that some people are looking away, some have their eyes closed, and some are wearing an unhappy expression. Inspired by this paper, our project aims to synthesize a perfect group photograph automatically from a given set of group photos.
The system pipeline is as follows:
Note: You can see a better-formatted report here
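As a rough illustration of the core selection step in the pipeline (the function names and score weighting below are assumptions, not the project's actual code): each person's face is scored in every candidate photo using the eye-closure and smile predictions, and the face is taken from the best-scoring source photo.

```python
# Illustrative sketch only: combine per-face predictions into a quality
# score and pick, for each person, the photo where that face looks best.
# All names and the weights here are hypothetical, not the repo's API.

def face_score(eye_open_prob, smile_prob, w_eye=0.6, w_smile=0.4):
    """Combine eye-openness and smile probabilities into one score."""
    return w_eye * eye_open_prob + w_smile * smile_prob

def pick_best_sources(scores):
    """scores[person] is a list of quality scores, one per photo.
    Returns {person: index of the photo to copy that face from}."""
    return {person: max(range(len(s)), key=s.__getitem__)
            for person, s in scores.items()}
```

For example, with two photos where Alice looks better in the second and Bob in the first, the selector picks photo 1 for Alice and photo 0 for Bob; the chosen faces are then blended into one base image.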
## Requires

TensorFlow (see: TensorFlow). Please select the appropriate version (GPU or CPU-only) according to your machine.
Libraries you might not have:
Python packages you might not have:
## Installation (for Faster R-CNN)
- Clone the repository

```
# Make sure to clone with --recursive
git clone --recursive git@github.com:Yuliang-Zou/Automatic_Group_Photography_Enhancement.git
```
- Build the Cython modules

```
cd $ROOT/lib
make
```
- Eye-closure and smile model: ckpt
NOTE: You can use the `npy` files as initialization, while the `ckpt` files are used to test and perform certain tasks. `ckpt` files can be transformed into `npy`; please check the code in
- Facial landmark model of dlib: dat
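For reference, a hedged sketch of the dict-of-arrays layout commonly used by VGGnet-style `.npy` weight files (the layer names and shapes here are made up for illustration; check the actual files for the real keys):

```python
import os
import tempfile

import numpy as np

# Hypothetical layer names/shapes showing the dict-of-arrays layout often
# used for VGGnet .npy weight files; the repo's exact keys may differ.
weights = {
    "conv1_1": {"weights": np.zeros((3, 3, 3, 64), np.float32),
                "biases": np.zeros((64,), np.float32)},
}

# A .npy file holding such a dict is simply a pickled object,
# so loading it back requires allow_pickle=True.
path = os.path.join(tempfile.mkdtemp(), "vgg_weights.npy")
np.save(path, weights)
restored = np.load(path, allow_pickle=True).item()
```

Checkpoint (`.ckpt`) files, in contrast, store the full TensorFlow variable state and are what the test and enhancement scripts consume.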
In order to run the automatic enhancement code, you need to:
1. Finish the Requires and Installation sections above.
2. Create a directory named `model` under the repository root, and download the eye-closure and smile model into it.
3. Download the facial landmark model and put it under the root directory.
Then you can run:

```
python tools/enhance.py --model model/VGGnet_fast_rcnn_full_eye_smile_1e-4_iter_70000.ckpt
```
And you will find the synthesized output under the root directory.
In order to use the data iterator of VOC2007, we provide annotations of both datasets:
- WIDER: [Google Drive]
- FDDB (face detection only): [Google Drive]
- FDDB (with eye-closure and smile labels): [Google Drive]
**NOTE:** Some images in FDDB contain too many faces to annotate with eye-closure and smile labels; we simply ignore them.
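The annotations follow the PASCAL VOC XML layout, extended with the extra labels. A hedged sketch of reading such a file (the `eye_closed`/`smile` tag names are assumptions; check the provided annotation files for the real field names):

```python
import xml.etree.ElementTree as ET

# A VOC-style annotation with assumed <eye_closed>/<smile> extensions.
xml_text = """
<annotation>
  <object>
    <name>face</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>50</xmax><ymax>60</ymax></bndbox>
    <eye_closed>0</eye_closed>
    <smile>1</smile>
  </object>
</annotation>
"""

def parse_faces(text):
    """Return a list of face dicts: bounding box plus the extra labels."""
    faces = []
    for obj in ET.fromstring(text).iter("object"):
        box = obj.find("bndbox")
        faces.append({
            "bbox": tuple(int(box.find(t).text)
                          for t in ("xmin", "ymin", "xmax", "ymax")),
            "eye_closed": int(obj.findtext("eye_closed", "0")),
            "smile": int(obj.findtext("smile", "0")),
        })
    return faces
```

Keeping the VOC layout is what lets the stock VOC2007 data iterator consume WIDER and FDDB without code changes.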
## How to use the dataset
- Download the training, validation, and test data, and the VOCdevkit

```
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
```
- Extract all of these tars into one directory named `VOCdevkit`

```
tar xvf VOCtrainval_06-Nov-2007.tar
tar xvf VOCtest_06-Nov-2007.tar
tar xvf VOCdevkit_08-Jun-2007.tar
```
- It should have this basic structure

```
$VOCdevkit2007/                   # development kit
$VOCdevkit2007/VOCcode/           # VOC utility code
$VOCdevkit2007/VOC2007            # image sets, annotations, etc.
# ... and several other directories ...
```
- Create symlinks for the PASCAL VOC dataset

```
cd $FRCN_ROOT/data
ln -s $VOCdevkit VOCdevkit2007
```
- Create folders for WIDER and FDDB
- Move the `images/` folder of WIDER to `VOCdevkit2007/VOC2007/JPEGImages/`, and rename it as …
- Move the downloaded annotations of WIDER to `VOCdevkit2007/VOC2007/Annotations` (the folder should be named as …)
- Move the two folders (…, `2003/`) of FDDB to …
- Move the downloaded annotations of FDDB to `VOCdevkit2007/VOC2007/Annotations` (you can't use the old annotations and the new annotations at the same time)
- Don't forget to set the training/val/test splits in `VOCdevkit2007/VOC2007/ImageSets/Main/`. (We provide examples here, which you can download along with the annotation files.)
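The split files under `ImageSets/Main/` are plain text with one image ID per line. A minimal sketch of generating them (the image IDs and the `trainval.txt` file name below follow the VOC convention but are illustrative):

```python
import os
import tempfile

def write_split(image_ids, path):
    """Write one image ID per line -- the VOC ImageSets/Main text format."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write("\n".join(image_ids) + "\n")

# Example with made-up image IDs under a temporary VOCdevkit2007 tree.
root = tempfile.mkdtemp()
split_path = os.path.join(root, "VOCdevkit2007/VOC2007/ImageSets/Main/trainval.txt")
write_split(["000001", "000002"], split_path)
```

The data iterator reads these files to decide which images belong to the train, validation, and test sets.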
## Training and Testing
If you want to train and test the face detector, you can clone the repository of the TensorFlow version of Faster R-CNN, and modify some functions in `$ROOT/lib/` to do this.
If you want to train and test the eye-closure and smile utilities, you can run the following commands.

Training:

```
cd $FRCN_ROOT
python tools/train_net.py --weights model/VGGnet_fast_rcnn_wider_iter_70000.npy --imdb voc_2007_trainval --iters 100000 --cfg experiments/cfgs/faster_rcnn_end2end.yml --network VGGnet_train
```

Testing:

```
cd $FRCN_ROOT
python tools/test_yl.py --model model/VGGnet_fast_rcnn_full_eye_smile_1e-4_iter_70000.ckpt --net VGGnet_test
```
- Face detector

The AP on the WIDER training set is `0.328`. The AP on the whole FDDB dataset is …
(Green box: ground truth, red box: prediction)
- Eye-closure and smile classification
- Other results will be updated later...
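The AP figures above follow the standard PASCAL VOC protocol. A simplified sketch of the 11-point interpolated AP used by VOC 2007 (not the repo's actual evaluation code):

```python
import numpy as np

def voc_ap_11point(recall, precision):
    """11-point interpolated average precision (PASCAL VOC 2007 style).

    recall/precision: arrays over ranked detections, recall non-decreasing.
    At each of the 11 recall thresholds, take the maximum precision
    achieved at that recall or higher, then average.
    """
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        mask = recall >= t
        p = float(precision[mask].max()) if mask.any() else 0.0
        ap += p / 11.0
    return ap
```

A perfect detector (precision 1.0 at every recall level up to 1.0) scores an AP of 1.0 under this scheme.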