Out of memory when restarting training process #106

Closed
lewfish opened this issue Aug 29, 2017 · 1 comment

Contributor

lewfish commented Aug 29, 2017

In theory, if you run train_ec2.sh, exit before training completes, and then restart the job, it should pick up where it left off. In practice this doesn't work: on the second run, TF emits an out-of-memory error. We should isolate the exact conditions under which this occurs and file an issue in the TF Object Detection repo, after first checking whether one already exists there.
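
One way to start isolating the conditions is to confirm which checkpoint a restarted job would resume from. The following is a minimal sketch, not part of this repo. It assumes the training directory holds files named `model.ckpt-<step>.index`, which is the usual TensorFlow naming but has not been confirmed for this setup. The helper name `latest_checkpoint_step` and the demo directory are hypothetical.

```python
import os
import re
import tempfile

# Assumption: checkpoints use the common TensorFlow naming
# "model.ckpt-<step>.index"; adjust the pattern if the job writes other names.
CKPT_PATTERN = re.compile(r"^model\.ckpt-(\d+)\.index$")


def latest_checkpoint_step(train_dir):
    """Return the highest checkpoint step found in train_dir, or None."""
    steps = []
    for name in os.listdir(train_dir):
        match = CKPT_PATTERN.match(name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


if __name__ == "__main__":
    # Hypothetical demo directory standing in for the real training output.
    demo_dir = tempfile.mkdtemp()
    for name in ("model.ckpt-0.index", "model.ckpt-1500.index"):
        open(os.path.join(demo_dir, name), "w").close()
    print(latest_checkpoint_step(demo_dir))
```

Recording this step before the first run exits and again before the restart would show whether the out-of-memory error depends on a checkpoint being present.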

lewfish changed the title from "Fix problem with restarting training process" to "Out of memory when restarting training process" on Aug 29, 2017
lewfish added the bug label on Aug 30, 2017
lewfish added and removed the backlog label on Jun 5, 2018
Contributor Author

lewfish commented Jul 26, 2018

This appears to be working locally now. We should check this on EC2.
