When encoding a large number of images, the time it takes to set the reference images for the perceptual model keeps growing, and eventually the script crashes with the following error:
2019-02-25 21:02:48.244097: W tensorflow/core/common_runtime/bfc_allocator.cc:271] Allocator (GPU_0_bfc) ran out of memory trying to allocate 320.00MiB. Current allocation summary follows.
2019-02-25 21:02:48.252031: W tensorflow/core/common_runtime/bfc_allocator.cc:275] ****************************______
2019-02-25 21:02:48.257082: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at conv_grad_input_ops.cc:937 : Resource exhausted: OOM when allocating tensor with shape[5,16,1024,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[5,16,1024,1024] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node gradients_10/G_synthesis_1/_Run/G_synthesis/ToRGB_lod0/Conv2D_grad/Conv2DBackpropInput}} = Conv2DBackpropInput[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](gradients_10/G_synthesis_1/_Run/G_synthesis/ToRGB_lod0/Conv2D_grad/ShapeN, G_synthesis_1/_Run/G_synthesis/ToRGB_lod0/mul, gradients_10/G_synthesis_1/_Run/G_synthesis/ToRGB_lod0/add_grad/tuple/control_dependency)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
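For reference, the hint above refers to passing a RunOptions proto into the run call that fails. A minimal sketch, assuming TF 1.x and using a toy graph rather than anything from this repo:

```python
import tensorflow as tf  # TF 1.x

# report_tensor_allocations_upon_oom makes TF dump the live tensors when an OOM hits,
# which helps tell a genuine leak apart from a single oversized allocation.
run_options = tf.RunOptions(report_tensor_allocations_upon_oom=True)

with tf.Session() as sess:
    # Stand-in graph for illustration; in the real script, pass `options`
    # to the sess.run call that is running out of memory.
    x = tf.random_normal([4, 1024, 1024])
    y = tf.reduce_sum(x)
    print(sess.run(y, options=run_options))
```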
With a batch size of 1, it takes about 250 images before it crashes; with a batch size of 5, it crashes on the 10th batch (50 images).
Is the perceptual model holding onto previous images? Could there be a memory leak somewhere? As far as I can tell, the crash happens on the self.sess.run call in the optimize method. I also tried removing tqdm from the script, but it still crashes during training.
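One common cause of exactly this symptom in TF 1.x scripts (an assumption about the cause, not taken from this repo's code) is building new ops inside the per-image loop: the graph grows on every batch, so GPU memory climbs until the allocator OOMs. A hedged sketch of the leaky pattern and a safer alternative, with made-up names:

```python
import numpy as np
import tensorflow as tf  # TF 1.x

# Hypothetical stand-in for a "reference image" variable in a perceptual model.
ref_image = tf.get_variable('ref_image', shape=[1, 256, 256, 3], dtype=tf.float32)

# Leaky pattern: tf.assign builds a brand-new op on every call, so each
# image permanently enlarges the graph and its memory footprint.
def set_reference_leaky(sess, image_batch):
    sess.run(tf.assign(ref_image, image_batch))

# Safer pattern: create the placeholder and assign op once, then only feed data.
image_ph = tf.placeholder(tf.float32, shape=[1, 256, 256, 3])
assign_op = tf.assign(ref_image, image_ph)

def set_reference(sess, image_batch):
    sess.run(assign_op, feed_dict={image_ph: image_batch})

with tf.Session() as sess:
    for _ in range(3):
        set_reference(sess, np.zeros([1, 256, 256, 3], dtype=np.float32))
```

Calling tf.get_default_graph().finalize() once the model is fully built is one way to catch this kind of growth: any code that later tries to add ops raises immediately instead of leaking.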