This is an implementation of the paper *Additive Margin Softmax for Face Verification*.
The training logic is heavily inspired by Sandberg's facenet; check it out if you are interested.
The model structure can be found in `./models/resface.py`, and the loss head in `AM-softmax.py`.
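For readers who want the gist before opening `AM-softmax.py`: the head L2-normalizes both the features and the class weights so the logits become cosines, subtracts a fixed margin m from the target-class cosine only, and scales by s before the usual softmax cross-entropy. Below is a minimal TF1-style sketch of that idea, using the paper's defaults m=0.35 and s=30; it is an illustration, not the repository's exact code:

```python
import tensorflow as tf

def am_softmax_loss(embeddings, labels, n_classes, m=0.35, s=30.0):
    """Minimal AM-softmax head (sketch, not the repo's exact implementation)."""
    emb_dim = embeddings.get_shape().as_list()[-1]
    weights = tf.get_variable('am_weights', [emb_dim, n_classes],
                              initializer=tf.glorot_uniform_initializer())
    # Normalize features and weights so the logits become cos(theta)
    norm_emb = tf.nn.l2_normalize(embeddings, axis=1)
    norm_w = tf.nn.l2_normalize(weights, axis=0)
    cos_theta = tf.matmul(norm_emb, norm_w)            # [batch, n_classes]
    # Additive margin: subtract m from the target-class cosine, then scale by s
    one_hot = tf.one_hot(labels, depth=n_classes)
    logits = s * (cos_theta - m * one_hot)
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                       logits=logits))
```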
Step 1: Align Dataset

See the folder `align`, which is forked entirely from insightface. The default image size is (112, 96); all trained faces in this repository share that size. Use the align code to align both your training data and your validation data (e.g., LFW) first. `align_lfw.py` can align both the training set and LFW, so don't worry about the others like `align_insight` and `align_dlib`.

```bash
python align_lfw.py --input-dir [train data dir] --output-dir [aligned output dir]
```

Step 2: Train AM-Softmax
Read the `parse_arguments()` function carefully to configure the parameters. If you are new to face recognition, simply run the command below after aligning your dataset; the default settings will handle the rest.

```bash
python train.py --data_dir [aligned train data] --random_flip --learning_rate -1 --learning_rate_schedule_file ./data/learning_rate_AM_softmax.txt --lfw_dir [aligned lfw data] --keep_probability 0.8 --weight_decay 5e-4
```
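Passing `--learning_rate -1` makes the script read the rate from `--learning_rate_schedule_file` instead, following facenet's convention of `epoch: learning_rate` lines. The values below are illustrative assumptions, not the actual contents of `./data/learning_rate_AM_softmax.txt`:

```
# Illustrative schedule only (assumed values); maps epoch to learning rate.
0:   0.1
40:  0.01
60:  0.001
80:  0.0001
```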
Also note that the accuracy on LFW is not obtained via cross-validation; read the source code for more detail. Thanks again to Sandberg for his extraordinary code.
- Currently it only reaches 97.6%. There might be some bugs or some irregular preprocessing; once it reaches >99%, the detailed configuration will be posted here.
- Accuracy on LFW now reaches 99.3% using only Resface36 and flipped-concatenate validation (a sketch of this trick appears below).
- After fixing bugs in the training code, Resface20 can finally reach 99.33%, taking only 4 hours to converge. Note: this model is trained on VGGFace2 without removing the overlap between VGGFace2 and LFW, so the performance is slightly higher than the 98.98% (m=0.35) reported in the original paper, which was trained on CASIA with the LFW overlap removed.
- Using L-ResNet50E-IR, which was proposed in the ArcFace paper, accuracy reaches 99.42%. I also noticed that the alignment method is crucial to accuracy; the quality of the alignment algorithm may be the bottleneck of modern face recognition systems.
- Just for fun, I tried m=0.2 with Resface20; accuracy on LFW reaches 99.47%. All experiments I've done used AdamOptimizer without weight decay; SGD (with/without momentum) and RMSProp actually performed really badly in my experiments. My assumption is that this comes from differences in optimizer implementations across frameworks (e.g., Caffe vs. TensorFlow).
- Added training logic and align code.
- Fixed bugs in the evaluation code. Uploaded the new, deeper model LResNet50E-IR proposed in the ArcFace paper, which performs better than Resface20 and Resface36.
- Recently I revisited this code and found that the `weight_decay` setting for the last FC layer was wrong, which led to the previous weird experimental conclusions; it has now been fixed. To follow the standard evaluation protocol on LFW, the evaluation code has also been modified. The latest experimental result: Resface20 (BN) + VGGFace2 + weight_decay 5e-4 + batch_size 256 + momentum achieves 0.995±0.003 on LFW. Furthermore, with this code it is easy to push deeper models past 99.7% on LFW. One big problem with this code is that it loads the name list of all images into memory at the beginning, which takes a huge amount of space; the dataset also consists of many small image files, which makes loading and transmitting them inefficient. TFRecord is therefore recommended to speed up training (a minimal writer sketch appears below).
Adam w/o weight_decay:
Momentum with weight_decay:
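The "flipped-concatenate validation" mentioned in the changelog embeds each face and its horizontal mirror, then concatenates the two vectors before computing distances. A minimal sketch, assuming facenet-style placeholders (the placeholder and argument names here are assumptions, not the repo's exact ones):

```python
import numpy as np

def flipped_concat_embeddings(sess, images, images_ph, embeddings_op, phase_ph):
    """Embed a batch and its horizontally flipped copy, then concatenate.

    images is assumed to be an NHWC float array; phase_ph is the usual
    phase_train placeholder set to False for evaluation.
    """
    emb = sess.run(embeddings_op, {images_ph: images, phase_ph: False})
    # Flip along the width axis to get the mirrored faces
    emb_flip = sess.run(embeddings_op,
                        {images_ph: images[:, :, ::-1, :], phase_ph: False})
    return np.concatenate([emb, emb_flip], axis=1)   # [batch, 2 * dim]
```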
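As for the TFRecord recommendation: packing the many small image files into a few large record files avoids per-file open/read overhead during training. A minimal writer sketch (the `image`/`label` field names are my assumptions, not a format this repo defines):

```python
import tensorflow as tf

def write_shard(image_paths, labels, out_path):
    """Pack small JPEG files into one TFRecord shard (illustrative sketch)."""
    with tf.python_io.TFRecordWriter(out_path) as writer:
        for path, label in zip(image_paths, labels):
            # Store the encoded bytes directly; decode with tf.image at read time
            with open(path, 'rb') as f:
                img_bytes = f.read()
            example = tf.train.Example(features=tf.train.Features(feature={
                'image': tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[img_bytes])),
                'label': tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[label])),
            }))
            writer.write(example.SerializeToString())
```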
My Chinese blog about face recognition systems
It covers the experimental details of this repo. You are welcome to share your advice!