
encode_images.py runs out of memory on image sequence #3

Closed
sam598 opened this issue Feb 26, 2019 · 2 comments

Comments

sam598 commented Feb 26, 2019

When encoding a large number of images, the time it takes to set reference images on the perceptual model keeps growing, and eventually the script crashes with the following error:

2019-02-25 21:02:48.244097: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 320.00MiB. Current allocation summary follows.
2019-02-25 21:02:48.252031: W tensorflow/core/common_runtime/bfc_allocator.cc:275] ****************************______
2019-02-25 21:02:48.257082: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at conv_grad_input_ops.cc:937 : Resource exhausted: OOM when allocating tensor with shape[5,16,1024,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[5,16,1024,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node gradients_10/G_synthesis_1/_Run/G_synthesis/ToRGB_lod0/Conv2D_grad/Conv2DBackpropInput}} = Conv2DBackpropInput[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](gradients_10/G_synthesis_1/_Run/G_synthesis/ToRGB_lod0/Conv2D_grad/ShapeN, G_synthesis_1/_Run/G_synthesis/ToRGB_lod0/mul, gradients_10/G_synthesis_1/_Run/G_synthesis/ToRGB_lod0/add_grad/tuple/control_dependency)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

When training on batches of 1 image at a time, it takes about 250 images before it crashes. When training on batches of 5 images at a time, it crashes on the 10th batch (50 images).

Is the perceptual model holding onto previous images? Could there be a memory leak somewhere? As far as I can tell, the crash happens on the self.sess.run call in the optimize method. I also tried removing tqdm from the script, but it still crashes during training.
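(For context, here is a minimal TF 1.x sketch of the kind of graph-growth leak being described, not the repo's actual code; the variable `var` is a hypothetical stand-in for the perceptual model's reference-image buffer. If ops are created inside the per-batch loop, the graph node count keeps climbing, each sess.run gets slower, and the allocator eventually OOMs.)

```python
import numpy as np
import tensorflow as tf

# Stand-in for the model's reference-image buffer (hypothetical name).
var = tf.Variable(np.zeros((4, 4), dtype=np.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(3):
        # Leaky pattern: tf.assign() here adds new nodes to the graph every iteration.
        sess.run(tf.assign(var, np.ones((4, 4), dtype=np.float32) * step))
        # The node count keeps climbing if the loop is creating ops.
        print(step, len(tf.get_default_graph().as_graph_def().node))
```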

sam598 commented Feb 26, 2019

I found this thread: tensorflow/tensorflow#4151, which I think solves the issue.

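(A minimal sketch of the kind of fix that thread points at, assuming the root cause is ops being created inside the per-batch loop; the names `batch_ph` and `set_batch_op` are hypothetical, not the repo's actual code. The idea is to build the assign op once with a placeholder and only feed data inside the loop, so the graph stays a fixed size no matter how many images are encoded.)

```python
import numpy as np
import tensorflow as tf

# Stand-in for the model's reference-image buffer (hypothetical name).
var = tf.Variable(np.zeros((4, 4), dtype=np.float32))

# Build the placeholder and assign op ONCE, outside the loop.
batch_ph = tf.placeholder(tf.float32, shape=(4, 4))
set_batch_op = tf.assign(var, batch_ph)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.graph.finalize()  # optional: raises immediately if anything still adds ops per batch
    for step in range(3):
        # Only feed data at run time; no new graph nodes are created here.
        sess.run(set_batch_op, feed_dict={batch_ph: np.full((4, 4), step, np.float32)})
```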
SystemErrorWang commented

> I found this thread: tensorflow/tensorflow#4151, which I think solves the issue.

I ran into a similar problem and tried your modification, but it failed: other parts of my code are heavily modified and it created conflicts.
