
math.sqrt gets a negative argument #30

Closed
akhileshgotmare opened this issue Sep 15, 2019 · 5 comments

Comments

@akhileshgotmare
Contributor

Hi! I have been trying to train the TransformerXL language model (https://github.com/kimiyoung/transformer-xl/blob/master/pytorch/run_wt103_base.sh) with RAdam, and I get ValueError: math domain error:

Traceback (most recent call last):
  File "train.py", line 543, in <module>
    train()
  File "train.py", line 463, in train
    optimizer.step()
  File "/transformer-xl/pytorch/radam.py", line 69, in step
    N_sma * N_sma_max / (N_sma_max - 2))) / beta1_t
ValueError: math domain error

This is because the argument to math.sqrt is negative here:
https://github.com/LiyuanLucasLiu/RAdam/blob/master/language-model/model_word_ada/radam.py#L67

What would be the right fix for this? I tried wrapping the argument in math.sqrt(abs(...)), but that performs worse than Adam.
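For context, the RAdam rectification term is only well defined once N_sma exceeds 4, so the usual fix is to guard the math.sqrt call rather than take an absolute value. A minimal sketch of that guard (variable names follow the traceback above; the concrete constants and the surrounding optimizer-state bookkeeping are assumptions for illustration):

import math

# Assumed example values; in the optimizer these come from per-parameter state.
beta1, beta2 = 0.9, 0.999
step = 10  # current iteration count

beta2_t = beta2 ** step
N_sma_max = 2.0 / (1.0 - beta2) - 1.0
N_sma = N_sma_max - 2.0 * step * beta2_t / (1.0 - beta2_t)

if N_sma >= 5:
    # Variance of the adaptive term is tractable: apply the rectified step size.
    # Every factor under the sqrt is non-negative once N_sma >= 5.
    step_size = math.sqrt(
        (1 - beta2_t)
        * (N_sma - 4) / (N_sma_max - 4)
        * (N_sma - 2) / N_sma
        * N_sma_max / (N_sma_max - 2)
    ) / (1 - beta1 ** step)
else:
    # Too few effective samples early in training: fall back to an
    # un-rectified (momentum-only) step instead of calling sqrt on a negative value.
    step_size = 1.0 / (1 - beta1 ** step)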

@LiyuanLucasLiu
Copy link
Owner

Hi, thanks for bringing this up. Please use https://github.com/LiyuanLucasLiu/RAdam/blob/master/radam.py as the optimizer; that file should be bug-free and fix your error.

As for language model training with sampled softmax, you need a sparse-aware optimizer. Maybe you can use RAdam for the dense parameters and SparseAdam for the sparse ones. Also, regarding performance: if RAdam performs worse than vanilla Adam, please tune the hyper-parameters (mainly the learning rate). This is especially important for tasks that already have well-tuned hyper-parameters.

Hope it helps :-)
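A minimal sketch of the dense/sparse split suggested above, assuming the repo-root radam.py is importable and using a hypothetical toy model (TinyLM is made up for illustration) whose embedding produces sparse gradients:

import torch
import torch.nn as nn
from radam import RAdam  # the maintained radam.py from the repo root

class TinyLM(nn.Module):
    def __init__(self, vocab=1000, dim=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim, sparse=True)  # sparse gradients
        self.proj = nn.Linear(dim, vocab)                  # dense gradients

    def forward(self, x):
        return self.proj(self.emb(x))

model = TinyLM()

# Route sparse-gradient parameters to SparseAdam and everything else to RAdam.
sparse_params = list(model.emb.parameters())
dense_params = [p for p in model.parameters()
                if not any(p is q for q in sparse_params)]

dense_opt = torch.optim.SparseAdam(sparse_params, lr=1e-3) if False else RAdam(dense_params, lr=1e-3)
sparse_opt = torch.optim.SparseAdam(sparse_params, lr=1e-3)

# In the training loop, step and zero both optimizers:
#   loss.backward()
#   dense_opt.step(); sparse_opt.step()
#   dense_opt.zero_grad(); sparse_opt.zero_grad()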

@akhileshgotmare
Contributor Author

Thanks for getting back! Would you recommend https://github.com/LiyuanLucasLiu/RAdam/blob/master/radam.py over https://github.com/LiyuanLucasLiu/RAdam/blob/master/cifar_imagenet/utils/radam.py for CIFAR training as well?

@LiyuanLucasLiu
Owner

They should be basically the same function. If you check the code, PlainRAdam is the one used in the CIFAR training.

@LiyuanLucasLiu
Owner

LiyuanLucasLiu commented Sep 16, 2019

Also, I've fixed the bug in https://github.com/LiyuanLucasLiu/RAdam/blob/master/language-model/model_word_ada/radam.py#L67 :-)

@akhileshgotmare
Contributor Author

akhileshgotmare commented Sep 16, 2019

Thanks so much! I used the other script (https://github.com/LiyuanLucasLiu/RAdam/blob/master/radam.py) for training the TransformerXL (base) language model on wt103, and I get better performance (better train/val perplexity after an equal number of iterations) than Adam without any tuning of the learning rate!
