A multi-task face detection algorithm that uses a cascade of CNNs. This repository contains training and testing code built on TensorFlow.
Quick overview of requirements:
- Linux (tested on Ubuntu 16.04)
- tensorflow 1.0.1 (or higher)
- python 2.7
- opencv-python
- easydict

Datasets:
- WIDER-FACE
- 300W-LP + Menpo-3D or (CelebA + AFLW)
Positive, part, and negative sample label: image_name class_id bounding_box_regressor
- class_id: +1 (positive), -1 (part), 0 (negative)
- bounding_box_regressor: x1 y1 x2 y2 (relative to the ground truth)
- Negative bounding_box_regressor: 0 0 0 0 (any values work, but all 4 numbers must be present)

Landmark sample label: image_name class_id landmark_regressor

Pose sample label: image_name class_id pose_regressor
- class_id: -2 (landmark), -3 (pose)
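The label layouts above can be read back with a few lines of Python. This is only a minimal sketch: `parse_label_line` and `CLASS_NAMES` are hypothetical names that do not come from the repository, and it assumes fields are separated by whitespace and that image names contain no spaces. The length of the landmark and pose regressors is not stated above, so it is not checked.

```python
# Hypothetical helper (not part of the repository): parse one label line
# in the "image_name class_id regressor..." layout described above.
CLASS_NAMES = {1: "positive", -1: "part", 0: "negative",
               -2: "landmark", -3: "pose"}


def parse_label_line(line):
    """Return (image_name, sample_kind, regressor_values) for one line."""
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected at least image_name and class_id")
    image_name = fields[0]
    class_id = int(fields[1])  # int() accepts "+1" as well as "1"
    if class_id not in CLASS_NAMES:
        raise ValueError("unknown class_id: %d" % class_id)
    regressor = [float(v) for v in fields[2:]]
    # Positive, part and negative samples always carry exactly 4 numbers,
    # even when the values are unused (negative samples).
    if class_id in (1, -1, 0) and len(regressor) != 4:
        raise ValueError("bounding_box_regressor must have 4 numbers")
    return image_name, CLASS_NAMES[class_id], regressor
```

For example, `parse_label_line("pos/0001.jpg +1 0.05 -0.1 0.02 0.08")` yields the image name, the string `"positive"`, and the four offsets as floats.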
- Modify data_root_dir and save_dir_root in "prepare_data/gen_pnet_train_data.py" and "prepare_data/gen_pnet_val_data.py".
- Select suitable parameters.
- Main options:
  - IoU threshold
  - number of negative samples per image
  - pos_aug_ratio (the model already performs well without augmentation)
- Run ./scripts/make_pnet_train_val_data.sh
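To make the role of the IoU threshold concrete, here is a small sketch of how a random crop could be assigned a class_id from its overlap with the ground-truth boxes. The function names are hypothetical, and the default thresholds (negative below 0.3, part from 0.4, positive from 0.65) are the values given in the MTCNN paper [1]; they are an assumption here and may differ from what the scripts in this repository use. Boxes are treated as continuous (x1, y1, x2, y2) coordinates without the "+1 pixel" convention.

```python
def iou(box, gt):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box[0], gt[0]), max(box[1], gt[1])
    ix2, iy2 = min(box[2], gt[2]), min(box[3], gt[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_gt = (gt[2] - gt[0]) * (gt[3] - gt[1])
    union = area_box + area_gt - inter
    return inter / float(union) if union > 0 else 0.0


def label_crop(box, gt_boxes, neg_max=0.3, part_min=0.4, pos_min=0.65):
    """Return +1, -1 or 0 for a crop, or None if it should be discarded.

    Threshold defaults are assumptions taken from the MTCNN paper.
    """
    best = max([iou(box, gt) for gt in gt_boxes] or [0.0])
    if best >= pos_min:
        return 1      # positive
    if best >= part_min:
        return -1     # part
    if best < neg_max:
        return 0      # negative
    return None       # ambiguous overlap, not used for training
```

Crops whose best overlap falls between the negative and part thresholds are dropped in this sketch, which keeps the three classes clearly separated.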
- Modify "prepare_data/gen_imglist.py": set **save_data_dir** and netSize.
- Design the label layout for your own task in _set_single_example of "prepare_data/multithread_create_tfrecords.py".
- Modify "scripts/make_tfrecords.sh" and run it.
- Adjust "scripts/train_cls.sh" and run it.
- Modify "prepare_data/gen_hard_sample.py" and set your own folder root.
  - stage = 1: get the PNet detection results and save them as a pickle file.
  - stage = 2: crop and save the image patches.
- Generate tfrecords the same way as for PNet.
- Adjust "scripts/train_cls.sh" and run it.
- Modify "prepare_data/gen_hard_sample.py", set your own folder root, and generate the classification samples.
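The two stages of hard sample generation described above (save the detections as a pickle, then crop patches from them) can be pictured with the sketch below. It is not the code in "prepare_data/gen_hard_sample.py": the function names and the layout of the detection dictionary (image name mapped to a list of `(x1, y1, x2, y2, score)` tuples) are assumptions made for illustration.

```python
import pickle


def save_detections(detections, path):
    """Stage 1: store detection results so stage 2 can reuse them."""
    with open(path, "wb") as f:
        # Protocol 2 can be read from both Python 2.7 and Python 3.
        pickle.dump(detections, f, protocol=2)


def load_detections(path):
    """Stage 2: read the detection results written by stage 1."""
    with open(path, "rb") as f:
        return pickle.load(f)


def clip_box(box, width, height):
    """Truncate a box to integers and clip it to the image bounds."""
    x1, y1, x2, y2 = box
    x1 = max(0, min(int(x1), width))
    y1 = max(0, min(int(y1), height))
    x2 = max(0, min(int(x2), width))
    y2 = max(0, min(int(y2), height))
    return x1, y1, x2, y2
```

With an image loaded by OpenCV as an array `img`, a clipped box would then select the patch `img[y1:y2, x1:x2]`, which can be resized to the input size of the next network before saving.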
Select a model file and suitable hyperparameters, then run:
python ./demo/mtcnn.py
- lr = 0.01, lr_decay_scale = 0.1; the learning rate decays at epochs 7 and 13.
- Small batch size (BS)
  - A small BS gives higher recall than a large BS on FDDB.
- Small L2 regularizer
  - Lowering the weight from the usual 5e-5 to 1e-5 constrains the network less.
- Add part samples in the training stage
  - They help bounding box regression and indirectly improve face classification.
- Fewer channels
  - Using more channels brings negligible improvement.
- Optimizer
  - Momentum optimizer with momentum 0.9.
- OHEM is implemented, with an OHEM ratio of 0.7
  - Backpropagate only the top 70% of losses. Part samples are excluded from this loss.
- Focal loss vs. SoftmaxWithLoss
  - SoftmaxWithLoss gives fewer false positives and higher recall than focal loss.
- ERC (early reject classifier) and DR layer
  - A new strategy from [2], but it adds more parameters.
  - PNet uses a stride-2 conv1 in place of max pooling and relu6 in place of prelu. However, relu6 and removing max pooling lead to a higher false positive rate and lower recall.
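Two of the tips above, the step learning-rate schedule and OHEM with a ratio of 0.7, are easy to state in plain Python. The sketch below is framework-free and only illustrates the idea; it is not the TensorFlow code used for training. The function names are hypothetical, the decay is assumed to apply from the listed epoch onward, the number of kept samples is assumed to be rounded up, and the kept losses are assumed to be averaged.

```python
import math


def learning_rate(epoch, base_lr=0.01, decay_scale=0.1, decay_epochs=(7, 13)):
    """Step schedule: multiply the rate by decay_scale at each listed epoch."""
    n_decays = sum(1 for e in decay_epochs if epoch >= e)
    return base_lr * (decay_scale ** n_decays)


def ohem_classification_loss(losses, class_ids, keep_ratio=0.7):
    """Average the hardest keep_ratio fraction of per-sample losses.

    Only positive (+1) and negative (0) samples contribute; part samples
    (-1) and any other class_id are left out of the classification loss.
    """
    valid = [l for l, c in zip(losses, class_ids) if c in (0, 1)]
    if not valid:
        return 0.0
    keep = int(math.ceil(len(valid) * keep_ratio))  # rounded up (assumption)
    hardest = sorted(valid, reverse=True)[:keep]    # largest losses first
    return sum(hardest) / float(keep)
```

In a real training graph the same selection would be done on tensors, so that gradients flow only through the kept samples; the list version here just shows which samples are chosen.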
- K. Zhang, Z. Zhang, Z. Li and Y. Qiao. Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks. Signal Processing Letters, 23(10):1499–1503, 2016.
- K. Zhang, Z. Zhang, Z. Li and Y. Qiao. Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks. Signal Processing Letters, 23(10):1499–1503, 2016.
- V. Jain and E. Learned-Miller. FDDB: A benchmark for face detection in unconstrained settings. In Technical Report UMCS-2010-009, 2010.
- S. Yang, P. Luo, C.-C. Loy, and X. Tang. Wider face: A face detection benchmark. In Conference on Computer Vision and Pattern Recognition (CVPR), 2016.