
math.sqrt gets a negative argument #30

Closed
akhileshgotmare opened this issue Sep 15, 2019 · 5 comments

Comments

@akhileshgotmare
Contributor

Hi! I have been trying to train the TransformerXL language model (https://github.com/kimiyoung/transformer-xl/blob/master/pytorch/run_wt103_base.sh) with RAdam, and I get ValueError: math domain error:

Traceback (most recent call last):
  File "train.py", line 543, in <module>
    train()
  File "train.py", line 463, in train
    optimizer.step()
  File "/transformer-xl/pytorch/radam.py", line 69, in step
    N_sma * N_sma_max / (N_sma_max - 2))) / beta1_t
ValueError: math domain error

This is because the argument to math.sqrt is negative here:
https://github.com/LiyuanLucasLiu/RAdam/blob/master/language-model/model_word_ada/radam.py#L67

What would be the right fix for this? I tried wrapping the argument in math.sqrt(abs(...)), but that performs worse than Adam.
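For context, the RAdam rectification term is only well defined once N_sma exceeds 4, so the usual fix is to guard the math.sqrt call rather than take an absolute value. A minimal sketch of that guard (variable names follow the traceback above; the concrete constants and the surrounding optimizer-state bookkeeping are assumptions for illustration):

import math

# Assumed example values; in the optimizer these come from per-parameter state.
beta1, beta2 = 0.9, 0.999
step = 10  # current iteration count

beta2_t = beta2 ** step
N_sma_max = 2.0 / (1.0 - beta2) - 1.0
N_sma = N_sma_max - 2.0 * step * beta2_t / (1.0 - beta2_t)

if N_sma >= 5:
    # Variance of the adaptive term is tractable: apply the rectified step size.
    # Every factor under the sqrt is non-negative once N_sma >= 5.
    step_size = math.sqrt(
        (1 - beta2_t)
        * (N_sma - 4) / (N_sma_max - 4)
        * (N_sma - 2) / N_sma
        * N_sma_max / (N_sma_max - 2)
    ) / (1 - beta1 ** step)
else:
    # Too few effective samples early in training: fall back to an
    # un-rectified (momentum-only) step instead of calling sqrt on a negative value.
    step_size = 1.0 / (1 - beta1 ** step)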

@LiyuanLucasLiu
Copy link
Owner

Hi, thanks for bringing this up. Please use https://github.com/LiyuanLucasLiu/RAdam/blob/master/radam.py as the optimizer; that file should be bug-free and fix your error.

As for language model training with sampled softmax, you need a sparse-aware optimizer. Maybe you can use RAdam for the dense parameters and SparseAdam for the sparse ones. Also, regarding performance: if RAdam performs worse than vanilla Adam, please tune the hyper-parameters (mainly the learning rate). This is especially important for tasks that already have well-tuned hyper-parameters.

Hope it helps :-)
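A minimal sketch of the dense/sparse split suggested above, assuming the repo-root radam.py is importable and using a hypothetical toy model (TinyLM is made up for illustration) whose embedding produces sparse gradients:

import torch
import torch.nn as nn
from radam import RAdam  # the maintained radam.py from the repo root

class TinyLM(nn.Module):
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim, sparse=True)  # sparse gradients
        self.proj = nn.Linear(dim, vocab)                  # dense gradients

    def forward(self, x):
        return self.proj(self.emb(x))

model = TinyLM()

# Route sparse-gradient parameters to SparseAdam and everything else to RAdam.
sparse_params = list(model.emb.parameters())
dense_params = [p for p in model.parameters()
                if not any(p is q for q in sparse_params)]

dense_opt = torch.optim.SparseAdam(sparse_params, lr=1e-3) if False else RAdam(dense_params, lr=1e-3)
sparse_opt = torch.optim.SparseAdam(sparse_params, lr=1e-3)

# In the training loop, step and zero both optimizers:
#   loss.backward()
#   dense_opt.step(); sparse_opt.step()
#   dense_opt.zero_grad(); sparse_opt.zero_grad()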

@akhileshgotmare
Contributor Author

Thanks for getting back! Would you recommend https://github.com/LiyuanLucasLiu/RAdam/blob/master/radam.py over https://github.com/LiyuanLucasLiu/RAdam/blob/master/cifar_imagenet/utils/radam.py for CIFAR training as well?

@LiyuanLucasLiu
Owner

They should be basically the same function. If you check the code, PlainRAdam is the one used in the CIFAR training.

@LiyuanLucasLiu
Owner

LiyuanLucasLiu commented Sep 16, 2019

Also, I've fixed the bug in https://github.com/LiyuanLucasLiu/RAdam/blob/master/language-model/model_word_ada/radam.py#L67 :-)

@akhileshgotmare
Contributor Author

akhileshgotmare commented Sep 16, 2019

Thanks so much! I used the other script (https://github.com/LiyuanLucasLiu/RAdam/blob/master/radam.py) for training the TransformerXL (base) language model on wt103, and I get better performance (better train/val perplexity after an equal number of iterations) than Adam without any tuning of the learning rate!
