This repository has been archived by the owner on Apr 3, 2022. It is now read-only.

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[16,100,80,48] #2

Closed
subzerofun opened this issue Feb 3, 2017 · 1 comment

Comments

@subzerofun

Hey, thanks for the code – the results look great! Can't wait to try it myself :-)

I managed to build TensorFlow from source with Bazel, with CUDA 8 and cuDNN 5 support (on macOS 10.12 with an NVIDIA GTX 780 with 6 GB VRAM).

I can run the TensorFlow test files without errors; CUDA is detected and works.

The CelebA dataset and txt files are in the dataset directory and appear to load successfully.

But after starting dm_main.py, these errors appear:

$ python3 dm_main.py --run train
   69714 source images selected
   78610 target images selected

Generator input (feature) size is 100 x 80 x 3 = 24000
Generator has 0.59M parameters

Discriminator input (feature) size is 100 x 80 x 3 = 24000
Discriminator has 0.84M parameters

Building testing model...
Done.

Model training...
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1022, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1004, in _run_fn
    status, run_metadata)
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,100,80,48]
	 [[Node: BiasAdd_16 = BiasAdd[T=DT_FLOAT, data_format="NHWC", _device="/job:localhost/replica:0/task:0/gpu:0"](Conv2D_16, GENE/L052/bias/read)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dm_main.py", line 169, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "dm_main.py", line 160, in main
    dm_train.train_model(train_data)
  File "/Users/david/github/image-manipulation/deep-makeover/dm_train.py", line 99, in train_model
    td.sess.run(minimize_ops)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,100,80,48]
	 [[Node: BiasAdd_16 = BiasAdd[T=DT_FLOAT, data_format="NHWC", _device="/job:localhost/replica:0/task:0/gpu:0"](Conv2D_16, GENE/L052/bias/read)]]

Caused by op 'BiasAdd_16', defined at:
  File "dm_main.py", line 169, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "dm_main.py", line 159, in main
    train_data = _get_train_data()
  File "dm_main.py", line 106, in _get_train_data
    train_model  = dm_model.create_model(sess, source_images, target_images, annealing, verbose=True)
  File "/Users/david/github/image-manipulation/deep-makeover/dm_model.py", line 174, in create_model
    gene          = _generator_model(sess, source_images)
  File "/Users/david/github/image-manipulation/deep-makeover/dm_model.py", line 103, in _generator_model
    _residual_block(model, nunits, mapsize)
  File "/Users/david/github/image-manipulation/deep-makeover/dm_model.py", line 65, in _residual_block
    model.add_conv2d(num_units, mapsize=1, stride=1)
  File "/Users/david/github/image-manipulation/deep-makeover/dm_arch.py", line 235, in add_conv2d
    out    = tf.nn.bias_add(out, bias)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 1316, in bias_add
    return gen_nn_ops._bias_add(value, bias, data_format=data_format, name=name)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 281, in _bias_add
    data_format=data_format, name=name)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2402, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
    self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[16,100,80,48]
	 [[Node: BiasAdd_16 = BiasAdd[T=DT_FLOAT, data_format="NHWC", _device="/job:localhost/replica:0/task:0/gpu:0"](Conv2D_16, GENE/L052/bias/read)]]

Does the "ResourceExhaustedError" mean the GPU is not available for processing? Or that I don't have enough free VRAM?
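As a sanity check, the single activation tensor named in the error is only modest in size; the OOM comes from the sum of all activations, weights and gradients held at once, not from this one tensor:

```python
# Size of the tensor TensorFlow failed to allocate: shape [16, 100, 80, 48], float32.
import math

shape = (16, 100, 80, 48)          # batch, height, width, channels (NHWC)
bytes_per_float32 = 4

num_elements = math.prod(shape)    # 6,144,000 elements
size_mb = num_elements * bytes_per_float32 / 1024 / 1024
print(f"{size_mb:.1f} MB")         # prints: 23.4 MB
```

So one such tensor is about 23 MB, but training keeps many of them (plus gradients) resident, which is how a 6 GB card runs out if other processes are already holding VRAM.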

How could I reduce the memory required – what would I need to change in your files?
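For what it's worth, two things worth trying (a sketch against the TensorFlow 1.x API this project uses; the `--batch_size` flag name is a guess at how dm_main.py might expose it, so adjust to the actual name in the code):

```python
import tensorflow as tf  # TensorFlow 1.x API, as used by this project

# 1) A smaller batch size shrinks every activation tensor proportionally,
#    e.g. [8, 100, 80, 48] instead of [16, 100, 80, 48].
#    (Hypothetical flag name -- check dm_main.py for the real one.)
#    $ python3 dm_main.py --run train --batch_size 8

# 2) Let TensorFlow grow its GPU allocation on demand instead of
#    reserving (nearly) all free VRAM up front at session creation.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```

The second option doesn't lower the model's peak memory, but it helps when other processes are already holding part of the VRAM.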

@subzerofun (Author)

Sorry, fixed it – I just had to restart the system, and then I had enough VRAM to complete the training.
