learning_rate becomes nan in rtho.py #7

stevenyu530 · 2021-05-05T11:29:32Z

Team,

I am trying to run your sample script ./bin/rtho.py, using the sample command in README.
python bin/rtho.py --network vgg --dataset cifar_10 --optimizer sgd --momentum 0.9 --hyper-lr 1e-8

After val_accuracy increased over the first few epochs, learning_rate became nan and training accuracy got stuck at 10.0.
Sample output is attached at the end.

Running baseline.py with the provided command works fine, and the best val_accuracy reaches a good value.

Can you advise what is causing the learning rate to become nan in rtho.py?
Is this behaviour expected for this method?

orchidmajumder · 2021-06-16T21:57:21Z

This is most likely due to the other hyperparameters. I recommend using lower values for them.
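When trying lower values, it can help to know at which step the learning rate first stops being finite. The following is only a minimal sketch and is not taken from the repository: the helper name `first_non_finite` and the `lr_trace` list are hypothetical, and it assumes the learning rate has been logged once per step into a plain Python list.

```python
import math


def first_non_finite(values):
    """Return the index of the first NaN or infinite value, or None if all are finite."""
    for step, value in enumerate(values):
        if not math.isfinite(value):
            return step
    return None


# Hypothetical learning-rate trace, one entry per training step (made-up numbers).
lr_trace = [1e-3, 2e-3, 5e-1, float("inf"), float("nan")]

# The learning rate overflows to inf at index 3 before it turns into nan.
print(first_non_finite(lr_trace))
```

If the value grows rapidly in the steps just before that index, the learning rate is blowing up rather than the data being bad, which fits the suggestion to lower the hyperparameters.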
