Mismatch Results of DNA_c #10
Comments
Hi, hongyuanyu.
Hi,

For ImageNet retraining of the searched models, we used a protocol similar to EfficientNet [30]: a batch size of 4,096, an RMSprop optimizer with momentum 0.9, and an initial learning rate of 0.256 that decays by 0.97 every 2.4 epochs.

The difference between our training config and the suggested setting is the total batch size: 32×2×64 = 4,096 vs. 8×128 = 1,024. Accordingly, we decrease the learning rate following the linear scaling rule: lr = 0.256 × 1024/4096 = 0.064 in the suggested setting. The smaller total batch size was intended to make reproduction easier, but we cannot guarantee the same performance. You can try enlarging your total batch size, or stepping your optimizer less frequently, as suggested by @jiefengpeng.
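The linear scaling computation above can be sketched in a few lines (a minimal illustration; `scale_lr` is a hypothetical helper, not part of the DNA codebase):

```python
def scale_lr(base_lr: float, base_batch: int, batch: int) -> float:
    """Linearly scale a learning rate with the total batch size."""
    return base_lr * batch / base_batch

# EfficientNet-style reference: lr 0.256 at total batch size 4096.
# At 8 GPUs x 128 per GPU = 1024, the scaled lr matches the value above.
print(scale_lr(0.256, 4096, 8 * 128))  # -> 0.064
```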
Thanks!
Hi,
Thanks for sharing the training code.
I tried to retrain DNA_c with this config:
```
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 ~/imagenet --model DNA_c \
  --epochs 500 --warmup-epochs 5 --batch-size 128 --lr 0.064 --opt rmsproptf --opt-eps 0.001 \
  --sched step --decay-epochs 3 --decay-rate 0.963 --color-jitter 0.06 --drop 0.2 -j 8 \
  --num-classes 1000 --model-ema
```
After 500 epochs of training, the best top-1 accuracy is 77.2%, which is 0.6% lower than the paper.
*** Best metric: 77.19799990478515 (epoch 458)
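For reference, the two step schedules being compared (the paper's 0.97 every 2.4 epochs vs. the command's `--decay-rate 0.963 --decay-epochs 3`) can be sketched numerically. This is a minimal sketch; `step_lr` is a hypothetical helper, not part of the training code:

```python
def step_lr(base_lr: float, rate: float, period: float, epoch: float) -> float:
    """Piecewise-constant (step) decay: lr(e) = base_lr * rate ** floor(e / period)."""
    return base_lr * rate ** (epoch // period)

# The two schedules imply nearly identical per-epoch decay factors:
# 0.97 ** (1 / 2.4) ~= 0.9874  vs.  0.963 ** (1 / 3) ~= 0.9875,
# so the difference is unlikely to explain a 0.6% accuracy gap by itself.
print(step_lr(0.064, 0.963, 3, 30))  # lr after 30 epochs under the command above
```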