Training fails with the following error #3
Comments
@FLC777 Is there a recommended CUDA and PyTorch version for reproducing this? Thanks.
With CUDA_LAUNCH_BLOCKING=1 set for debugging, the error is raised at the following line: GLAT/glat_plugins/criterions/glat_loss.py, line 63 in 6929c10
The same dataset trains with levenshtein_transformer without any issue.
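For anyone reproducing this, the CUDA_LAUNCH_BLOCKING=1 setting mentioned above makes CUDA kernel launches synchronous, so the Python traceback points at the call that actually failed instead of a later, unrelated one. A minimal sketch of setting it from inside a script follows; the helper name `enable_blocking_cuda_launches` is hypothetical, and the variable has to be set before CUDA is initialized, which in practice means before `torch` is imported.

```python
import os


def enable_blocking_cuda_launches(env=None):
    """Set CUDA_LAUNCH_BLOCKING=1 in the given mapping (default: os.environ).

    Hypothetical helper for illustration; it only writes the environment
    variable and does not touch CUDA itself.
    """
    env = os.environ if env is None else env
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Call this first, then import torch / fairseq afterwards so that the
# setting is already in place when the CUDA context is created.
enable_blocking_cuda_launches()
```

Prefixing the training command in the shell with `CUDA_LAUNCH_BLOCKING=1` has the same effect. Either way, expect slower training, so it is best used only while debugging.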
Problem resolved by using the latest fairseq commit for PyTorch 1.8.1+: facebookresearch/fairseq@9549e7f
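Since the fix depends on running PyTorch 1.8.1 or newer, a quick guard at the top of a training script can catch a mismatched environment early. This is only a sketch: `version_tuple` and `meets_minimum` are hypothetical helpers, not part of fairseq or GLAT, and the parsing deliberately ignores build suffixes such as `+cu111`.

```python
import re


def version_tuple(version):
    """Parse a version string such as '1.8.1+cu111' into (1, 8, 1).

    Only the leading major.minor[.patch] digits are read; any local or
    pre-release suffix is ignored, and a missing patch number counts as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(version, minimum="1.8.1"):
    """Return True if `version` is at least `minimum` (numeric comparison)."""
    return version_tuple(version) >= version_tuple(minimum)
```

In a real script this would be called as `meets_minimum(torch.__version__)` after importing `torch`, raising an error if it returns `False`.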
Even after fixing the above error by rebasing onto facebookresearch/fairseq@9549e7f with PyTorch 1.8.1+, another problem appears. From the second epoch onward the training loss becomes very poor (as if the parameters had been reset to their initial values) and stops changing, although the first epoch works as expected. Is there any suggestion or another way to solve this issue? cc @FLC777