
Mismatch Results of DNA_c #10

Closed
hongyuanyu opened this issue Mar 22, 2020 · 3 comments

@hongyuanyu

Hi,

Thanks for sharing the training code.
I tried to retrain DNA_c with this config:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 ~/imagenet --model DNA_c --epochs 500 --warmup-epochs 5 --batch-size 128 --lr 0.064 --opt rmsproptf --opt-eps 0.001 --sched step --decay-epochs 3 --decay-rate 0.963 --color-jitter 0.06 --drop 0.2 -j 8 --num-classes 1000 --model-ema
After 500 epochs of training, the best top-1 accuracy is 77.2%, which is 0.6% lower than reported in the paper.
*** Best metric: 77.19799990478515 (epoch 458)

@jiefengpeng
Collaborator

Hi, hongyuanyu.
We trained with 32x RTX 2080 Ti GPUs, a batch size of 64 per GPU, and an optimizer step every 2 iterations, which guarantees a total batch size of 4096 and an initial learning rate of 0.256, as suggested for EfficientNets. A smaller batch size and initial learning rate might reduce the final performance. You can try an optimizer step every 4 iterations with a batch size of 128 per GPU and lr 0.256 to keep the effective batch size large.
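For reference, the "optimizer step every N iterations" trick is plain gradient accumulation. Below is a minimal sketch of it, assuming a standard PyTorch training loop; the names (model, loader, criterion, optimizer) and the accumulation factor are illustrative, not code from this repo:

```python
# Minimal sketch of gradient accumulation (illustrative, not the repo's code).
# With 8 GPUs x 128 images/GPU and accum_steps = 4, the effective batch size
# is 8 * 128 * 4 = 4096, matching the lr 0.256 setting described above.
def train_one_epoch(model, loader, criterion, optimizer, accum_steps=4):
    model.train()
    optimizer.zero_grad()
    for step, (images, targets) in enumerate(loader):
        loss = criterion(model(images), targets) / accum_steps  # scale so the
        loss.backward()          # accumulated gradient matches one large batch
        if (step + 1) % accum_steps == 0:
            optimizer.step()     # update weights every accum_steps iterations
            optimizer.zero_grad()
```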

@changlin31
Owner


Hi,

As for ImageNet retraining of the searched models, we used a protocol similar to EfficientNet [30], i.e., a batch size of 4,096, an RMSprop optimizer with momentum 0.9, and an initial learning rate of 0.256 which decays by 0.97 every 2.4 epochs.
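As a quick sanity check of that schedule (my own arithmetic, not code from the repo), the step decay can be written as lr(epoch) = 0.256 * 0.97 ** (epoch // 2.4):

```python
# Rough check of the EfficientNet-style step decay (arithmetic only, not repo code).
def step_lr(epoch, base_lr=0.256, decay_rate=0.97, decay_epochs=2.4):
    return base_lr * decay_rate ** (epoch // decay_epochs)

for e in (0, 100, 300, 500):
    print(e, round(step_lr(e), 5))

# The --decay-epochs 3 --decay-rate 0.963 flags in the command below give an
# almost identical per-epoch decay, since 0.963 ** (1/3) ≈ 0.97 ** (1/2.4).
```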

Our training config is:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 ~/imagenet --model DNA_c --epochs 500 --warmup-epochs 5 --batch-size 64 --lr 0.256 --opt rmsproptf --opt-eps 0.001 --sched step --decay-epochs 3 --decay-rate 0.963 --color-jitter 0.06 --drop 0.2 -j 8 --num-classes 1000 --model-ema
run on 4 nodes, i.e., 32 GPUs, and we step the optimizer every 2 training iterations to simulate a large training batch.
We achieved the highest top-1 accuracy of 77.77% at epoch 351.

The difference is the total batch size: 32 x 2 x 64 = 4096 vs. 8 x 128 = 1024. In the suggested setting we decrease the learning rate using the linear scaling rule: lr = 0.256 x 1024/4096 = 0.064. This smaller total batch size was intended to make reproduction easier, but we cannot guarantee the same performance with it.
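In code form, the linear scaling rule amounts to the following (a sketch; the constants are just the ones quoted in this thread):

```python
# Linear learning-rate scaling (sketch; constants taken from this thread).
base_lr, base_batch = 0.256, 4096   # EfficientNet-style reference setting
total_batch = 8 * 128               # 8 GPUs x 128 images/GPU = 1024
scaled_lr = base_lr * total_batch / base_batch
print(scaled_lr)                    # 0.064, the --lr used in the 8-GPU run
```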

You can try enlarging your total batch size or stepping your optimizer less frequently, as suggested by @jiefengpeng.

@hongyuanyu
Author

Thanks!
