Resume from checkpoint #49
Comments
Sorry, typo in the above. It should read: "But loss is not decreasing and Precision is also not improving as shown below."
The reason is that we don't save the optimizer state in the checkpoint. Light CNN is trained with SGD with momentum. If you resume from the saved model, the weights are loaded from the checkpoint, but the optimizer state (including the momentum buffers) is not. You can modify these lines to save the optimizer state as well.
You also need to modify the lines that load the checkpoint so the optimizer state is restored.
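The fix described above can be sketched as follows. This is a minimal illustration, not the repo's actual code: the key names (`state_dict`, `optimizer`, `epoch`) and file path are assumptions chosen to mirror common PyTorch checkpointing conventions.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Toy model standing in for Light CNN (illustrative only).
model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# One training step so the optimizer accumulates momentum buffers.
loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()

# Saving: include the optimizer state alongside the model weights.
checkpoint = {
    'epoch': 14,
    'state_dict': model.state_dict(),
    'optimizer': optimizer.state_dict(),  # momentum buffers live here
}
torch.save(checkpoint, 'checkpoint.pth')

# Resuming: restore both the weights and the optimizer state.
ckpt = torch.load('checkpoint.pth')
model.load_state_dict(ckpt['state_dict'])
optimizer.load_state_dict(ckpt['optimizer'])  # restores momentum buffers
start_epoch = ckpt['epoch']
```

Without the `optimizer.load_state_dict(...)` call, training resumes with freshly zeroed momentum buffers, which can temporarily push the loss back up, as seen in the logs below.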
Thanks a lot. Sairam.
Hi, I trained lightccnn_29_v2 on the MS-Celeb DB for 14 epochs. The learning rate decayed from 0.001 to 0.0004575 with a step size of 10. Validation accuracy improved from 86 to 95.95, and average loss dropped from 11 to 0.28 after 14 epochs.
Now, when I resume from the saved model, it starts well, as shown below:
Test set: Average loss: 0.28508767582333855, Accuracy: (95.95295308032168)
But loss is decreasing and Precision is also not improving as shown below:
Epoch: [14][0/38671] Loss 0.1796 (0.1796) Prec@1 94.531 (94.531) Prec@5 97.656 (97.656)
Epoch: [14][100/38671] Loss 0.2190 (0.2752) Prec@1 93.750 (93.379) Prec@5 99.219 (98.337)
Epoch: [14][200/38671] Loss 0.4000 (0.3149) Prec@1 89.062 (92.405) Prec@5 96.875 (98.084)
......
Epoch: [14][6100/38671] Loss 0.8118 (0.5597) Prec@1 85.938 (86.939) Prec@5 92.188 (95.961)
Epoch: [14][6200/38671] Loss 0.7565 (0.5611) Prec@1 80.469 (86.915) Prec@5 93.750 (95.945)
Epoch: [14][6300/38671] Loss 0.6364 (0.5625) Prec@1 84.375 (86.893) Prec@5 96.875 (95.927)
Did you face this issue? Could you help me understand what the problem could be?
Thanks.
Darshan, SSSIHL.