math.sqrt gets a negative argument #30
Comments
Hi, thanks for bringing this up. Please use the file https://github.com/LiyuanLucasLiu/RAdam/blob/master/radam.py as the optimizer; this file should be bug-free and fix your error. As for language-model training with sampled softmax, you need a sparse-aware version of the optimizer. Maybe you can use RAdam for the dense part and SparseAdam for the sparse part. Also, regarding performance: if RAdam performs worse than vanilla Adam, please tune the hyper-parameters (mainly the learning rate). This is especially important for tasks that already have well-tuned hyper-parameters. Hope it helps :-)
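The dense/sparse split suggested above can be sketched as follows (an illustration only, assuming a recent PyTorch where torch.optim.RAdam is available as a stand-in for the repo's radam.py; the model and sizes are made up):

```python
# Sketch: one optimizer per gradient type. SparseAdam handles parameters
# that receive sparse gradients (here, an Embedding with sparse=True),
# while RAdam handles the dense ones.
import torch
import torch.nn as nn

emb = nn.Embedding(1000, 32, sparse=True)   # produces sparse gradients
head = nn.Linear(32, 10)                    # produces dense gradients

dense_opt = torch.optim.RAdam(head.parameters(), lr=1e-3)
sparse_opt = torch.optim.SparseAdam(emb.parameters(), lr=1e-3)

x = torch.randint(0, 1000, (16,))
loss = head(emb(x)).sum()
loss.backward()

# Step both optimizers; each only touches its own parameter group.
dense_opt.step()
sparse_opt.step()
dense_opt.zero_grad()
sparse_opt.zero_grad()
```

A single optimizer would fail here because Adam-style dense updates cannot be applied to sparse gradients, and SparseAdam rejects dense ones.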
Thanks for getting back! Would you recommend https://github.com/LiyuanLucasLiu/RAdam/blob/master/radam.py over https://github.com/LiyuanLucasLiu/RAdam/blob/master/cifar_imagenet/utils/radam.py for CIFAR training as well?
They should basically be the same function. If you check the code, the plain RAdam is the one used in the CIFAR training.
Also, I've fixed the bug in https://github.com/LiyuanLucasLiu/RAdam/blob/master/language-model/model_word_ada/radam.py#L67 :-)
Thanks so much! I used the other script (https://github.com/LiyuanLucasLiu/RAdam/blob/master/radam.py) to train the TransformerXL (base) language model on wt103 and am able to get better performance (better train/val perplexity after an equal number of iterations) than Adam without any tuning of the learning rate!
Hi! I have been trying to train the TransformerXL language model (https://github.com/kimiyoung/transformer-xl/blob/master/pytorch/run_wt103_base.sh) with RAdam and I get *** ValueError: math domain error.
This happens because the argument to math.sqrt is negative here:
https://github.com/LiyuanLucasLiu/RAdam/blob/master/language-model/model_word_ada/radam.py#L67
What would be the right fix for this? I tried wrapping the argument in abs(), i.e. math.sqrt(abs(...)), but that performs worse than Adam.
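For context, the failing sqrt comes from RAdam's variance-rectification term, which is only well-defined once the approximated SMA length rho_t exceeds 4; in the first few steps it does not, so the sqrt's argument goes negative. A minimal sketch of the guard (my own simplified scalar form, not the repository's exact code; the function name rectification is hypothetical):

```python
import math

def rectification(step, beta2=0.999):
    """Return RAdam's rectification term r_t for a given step, or None
    when rho_t <= 4, in which case the un-adapted momentum update
    should be used instead of the adaptive one."""
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        # Early steps: (rho_t - 4) is negative, so the sqrt below would
        # raise "ValueError: math domain error" -- this branch is the fix.
        return None
    return math.sqrt(
        ((rho_t - 4.0) * (rho_t - 2.0) * rho_inf)
        / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t)
    )

print(rectification(1))   # None -> fall back to the unrectified step
print(rectification(10))  # a small positive rectification factor
```

This also shows why math.sqrt(abs(...)) hurts: it silently applies an adaptive step during the early iterations where the variance estimate is unreliable, which is exactly what the rho_t <= 4 branch is meant to avoid.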