Transfer learning fails and cannot be restarted #44
Comments
Let me check with our PyTorch frameworks team. I've never seen this before. Any chance I can get you to run on a PyTorch Docker container with CUDA 9.0?
Also, what version of PyTorch are you using?
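For anyone reproducing this, here is a minimal sketch for collecting the version details asked about above. It assumes PyTorch may or may not be installed in the current environment; `describe_environment` is a hypothetical helper name for illustration, not part of this project.

```python
import importlib


def describe_environment():
    """Collect PyTorch and CUDA version info; fields stay empty if torch is missing."""
    info = {
        "torch_installed": False,
        "torch_version": None,
        "cuda_version": None,
        "cuda_available": False,
    }
    try:
        # Import lazily so the script still runs where PyTorch is absent.
        torch = importlib.import_module("torch")
    except ImportError:
        return info

    info["torch_installed"] = True
    info["torch_version"] = str(torch.__version__)
    # CUDA version the installed build was compiled against (None for CPU-only builds).
    info["cuda_version"] = torch.version.cuda
    # Whether a usable GPU and driver are visible right now.
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into a report like this one saves a round trip when maintainers ask about versions.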
I am using PyTorch 0.4.1. It seems to be related to Automatic Suspend in Ubuntu. I've disabled it and it has been training without error since last night. I will try with CUDA 9.0 as soon as it either fails or finishes (likely tomorrow), but I don't want to mess with it right now.
OK, thanks for letting us know. Gonna close this; hopefully not too many people have automatic suspend enabled.
Thanks. Sorry I didn't get to more testing. I'll try to find a solution to this in the future and create a pull request. Anyway, the workaround seems solid.
I have trained a model on my text corpus (full_model.pt) and now want to see how well it does on a labeled dataset. So I labeled the data and ran the following:
When I try to restart the training, it fails immediately with this error:
Some more details:
Any ideas?