- mxnet dataset to tfrecords
- backbone network architectures [vgg16, vgg19, resnet]
- backbone network architectures [resnet-se, resnext]
- LResNet50E-IR
- LResNet100E-IR
- Additive Angular Margin Loss
- CosineFace Loss
- train network code
- add validate during training
- multi-gpu training
- combine losses (contributed by RogerLo)
- evaluate code
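The two margin losses in the list above differ only in how they change the target-class logit before softmax. The following is a minimal pure-Python sketch for a single sample. It assumes the inputs are already cosine similarities between an L2-normalized embedding and L2-normalized class weights. The function name `margin_logits` and the default values of `s` and `m` are illustrative assumptions, not taken from this repository's code.

```python
import math


def margin_logits(cosines, label, s=64.0, m=0.5, kind="arcface"):
    """Return scaled logits with a margin applied to the target class only.

    cosines: list of cos(theta_j) values, one per class (hypothetical input).
    label:   index of the ground-truth class.
    s, m:    scale and margin (illustrative defaults, not the repo's settings).
    """
    out = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # keep acos in its valid domain
        if j == label:
            if kind == "arcface":
                # additive angular margin: cos(theta + m)
                c = math.cos(math.acos(c) + m)
            elif kind == "cosface":
                # additive cosine margin: cos(theta) - m
                c = c - m
            else:
                raise ValueError("kind must be 'arcface' or 'cosface'")
        out.append(s * c)
    return out


logits = margin_logits([0.8, 0.1, -0.3], label=0)
```

This sketch omits the special handling that practical implementations usually add for the case where `theta + m` exceeds pi, and it leaves out the softmax cross-entropy that is applied to the returned logits.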
- If you can't use a large batch size (>128), use a small learning rate.
- If you can't use a large batch size (>128), you can try batch renormalization (file `L_Resnet_E_IR_RBN.py`).
- If you use multiple GPUs, keep at least 16 images per GPU.
- Try Group Normalization; you can use the code in `L_Resnet_E_IR_GBN.py`.
- Using the current model and the lr schedule in `train_nets.py`, you can get the results of model C.
- The bug that made the model size 1.6G has been fixed, based on issue #9. If you want a small model, use `L_Resnet_E_IR_fix_issues9.py`.
- The bug in the multi-GPU training code has been fixed. If you want the correct version, use `train_nets_mgpu_new.py`.
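The batch-size tips above can be turned into a quick sanity check before launching a run. The sketch below is only an illustration: the linear learning-rate scaling rule is a common heuristic and an assumption here (the tips only say to use a small learning rate), and the function names and the example values `0.1` and `128` are hypothetical.

```python
def per_gpu_batch(total_batch, num_gpus, min_per_gpu=16):
    """Split a global batch evenly and enforce the per-GPU minimum from the tips."""
    if num_gpus < 1 or total_batch % num_gpus != 0:
        raise ValueError("total_batch must divide evenly across the GPUs")
    per_gpu = total_batch // num_gpus
    if per_gpu < min_per_gpu:
        raise ValueError(
            "only %d images per GPU, need at least %d" % (per_gpu, min_per_gpu)
        )
    return per_gpu


def scaled_lr(base_lr, base_batch, batch):
    """Linear scaling heuristic (assumption): shrink lr in proportion to batch size."""
    return base_lr * batch / base_batch


print(per_gpu_batch(64, 4))          # 16 images on each of 4 GPUs
print(scaled_lr(0.1, 128, 16))       # smaller lr for a batch of 16
```

The scaled value is only a starting point; the schedule in `train_nets.py` remains the reference for the reported results.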
model name | depth | normalization layer | batch size | total_steps | download | password |
---|---|---|---|---|---|---|
model A | 50 | group normalization | 16 | 1060k | model a | 2q72 |

dbname | accuracy |
---|---|
lfw | 0.9897 |
cfp_ff | 0.9876 |
cfp_fp | 0.84357 |
age_db30 | 0.914 |

model name | depth | normalization layer | batch size | total_steps | download | password |
---|---|---|---|---|---|---|
model B | 50 | batch normalization | 16 | 1100k | model_b | h6ai |

dbname | accuracy |
---|---|
lfw | 0.9933 |
cfp_ff | 0.99357 |
cfp_fp | 0.8766 |
age_db30 | 0.9342 |

model name | depth | normalization layer | batch size | total_steps | download | password |
---|---|---|---|---|---|---|
model C | 50 | batch normalization | 16 | 1950k | model_c | 8mdi |

dbname | accuracy |
---|---|
lfw | 0.9963 |
cfp_ff | 0.99586 |
cfp_fp | 0.9087 |
age_db30 | 0.96367 |

model name | depth | normalization layer | batch size | total_steps | model_size | download | password |
---|---|---|---|---|---|---|---|
model D | 50 | batch normalization | 136 | 710k | 348.9MB | model_d | amdt |

dbname | accuracy |
---|---|
lfw | 0.9968 |
cfp_ff | 0.9973 |
cfp_fp | 0.9271 |
age_db30 | 0.9725 |

- TensorFlow 1.4 or 1.6
- TensorLayer 1.7
- CUDA 8 & cuDNN 6, or CUDA 9 & cuDNN 7
- Python 3

GPU | CUDA | cuDNN | TensorFlow | TensorLayer | MXNet | Gluon |
---|---|---|---|---|---|---|
Titan xp | 9.0 | 7.0 | 1.6 | 1.7 | 1.1.0 | 1.1.0 |

DL Tools | Max BatchSize (without bn and prelu) | Max BatchSize (with bn only) | Max BatchSize (with prelu only) | Max BatchSize (with bn and prelu) |
---|---|---|---|---|
TensorLayer | (8000, 9000) | (5000, 6000) | (3000, 4000) | (2000, 3000) |
MXNet | (40000, 50000) | (20000, 30000) | (20000, 30000) | (10000, 20000) |
Gluon | (7000, 8000) | (3000, 4000) | no official method | None |

(8000, 9000): a batch size of 8000 runs without OOM; 9000 raises an OOM error.

TensorLayer | MXNet | Gluon |
---|---|---|
tensorlayer_batchsize_test.py | mxnet_batchsize_test.py | gluon_batchsize_test.py |
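The bracket notation in the batch-size table above (largest size that ran, first size that hit OOM) can be produced by a simple stepping search. The sketch below is a hypothetical illustration of that idea, not the logic of the listed test scripts: `fits` stands in for whatever probe runs a forward pass at a given batch size and reports whether it completed, and it is simulated here with a fixed threshold.

```python
def batch_size_bracket(fits, candidates):
    """Return (last size that fit, first size that failed).

    fits:       callable taking a batch size and returning True when the
                run completes without an out-of-memory error (hypothetical).
    candidates: increasing batch sizes to try, in order.
    """
    last_ok = None
    for size in candidates:
        if fits(size):
            last_ok = size
        else:
            return (last_ok, size)
    return (last_ok, None)  # never failed within the candidate range


# Simulated probe: pretend memory runs out above 8500 images.
bracket = batch_size_bracket(lambda n: n <= 8500, range(1000, 20001, 1000))
print(bracket)  # (8000, 9000)
```

In a real probe, `fits` would typically catch the framework's out-of-memory exception and return `False`, ideally running each attempt in a fresh process so that memory from a failed attempt is released.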