This repository has been archived by the owner on Apr 3, 2022. It is now read-only.

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[16,100,80,48] #2

Closed
subzerofun opened this issue Feb 3, 2017 · 1 comment

Comments

@subzerofun

Hey, thanks for the code – the results look great! Can't wait to try it myself :-)

I managed to build TensorFlow from source with Bazel, with CUDA 8 and cuDNN 5 support (on macOS 10.12 with an NVIDIA GTX 780 with 6 GB VRAM).

I can run the TensorFlow test files without errors; CUDA is detected and works.

The CelebA dataset and txt files are in the dataset directory and appear to load successfully.

But after starting dm_main.py, these errors appear:

$ python3 dm_main.py --run train
   69714 source images selected
   78610 target images selected

Generator input (feature) size is 100 x 80 x 3 = 24000
Generator has 0.59M parameters

Discriminator input (feature) size is 100 x 80 x 3 = 24000
Discriminator has 0.84M parameters

Building testing model...
Done.

Model training...
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1022, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1004, in _run_fn
    status, run_metadata)
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/contextlib.py", line 66, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,100,80,48]
	 [[Node: BiasAdd_16 = BiasAdd[T=DT_FLOAT, data_format="NHWC", _device="/job:localhost/replica:0/task:0/gpu:0"](Conv2D_16, GENE/L052/bias/read)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dm_main.py", line 169, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "dm_main.py", line 160, in main
    dm_train.train_model(train_data)
  File "/Users/david/github/image-manipulation/deep-makeover/dm_train.py", line 99, in train_model
    td.sess.run(minimize_ops)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,100,80,48]
	 [[Node: BiasAdd_16 = BiasAdd[T=DT_FLOAT, data_format="NHWC", _device="/job:localhost/replica:0/task:0/gpu:0"](Conv2D_16, GENE/L052/bias/read)]]

Caused by op 'BiasAdd_16', defined at:
  File "dm_main.py", line 169, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "dm_main.py", line 159, in main
    train_data = _get_train_data()
  File "dm_main.py", line 106, in _get_train_data
    train_model  = dm_model.create_model(sess, source_images, target_images, annealing, verbose=True)
  File "/Users/david/github/image-manipulation/deep-makeover/dm_model.py", line 174, in create_model
    gene          = _generator_model(sess, source_images)
  File "/Users/david/github/image-manipulation/deep-makeover/dm_model.py", line 103, in _generator_model
    _residual_block(model, nunits, mapsize)
  File "/Users/david/github/image-manipulation/deep-makeover/dm_model.py", line 65, in _residual_block
    model.add_conv2d(num_units, mapsize=1, stride=1)
  File "/Users/david/github/image-manipulation/deep-makeover/dm_arch.py", line 235, in add_conv2d
    out    = tf.nn.bias_add(out, bias)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/nn_ops.py", line 1316, in bias_add
    return gen_nn_ops._bias_add(value, bias, data_format=data_format, name=name)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 281, in _bias_add
    data_format=data_format, name=name)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2402, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
    self._traceback = _extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[16,100,80,48]
	 [[Node: BiasAdd_16 = BiasAdd[T=DT_FLOAT, data_format="NHWC", _device="/job:localhost/replica:0/task:0/gpu:0"](Conv2D_16, GENE/L052/bias/read)]]

Does the "ResourceExhaustedError" mean the GPU is not available for processing? Or that I don't have enough free VRAM?
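As a sanity check, the single activation tensor named in the error is only modest in size; the OOM comes from the sum of all activations, weights and gradients held at once, not from this one tensor:

```python
# Size of the tensor TensorFlow failed to allocate: shape [16, 100, 80, 48], float32.
import math

shape = (16, 100, 80, 48)          # batch, height, width, channels (NHWC)
bytes_per_float32 = 4

num_elements = math.prod(shape)    # 6,144,000 elements
size_mb = num_elements * bytes_per_float32 / 1024 / 1024
print(f"{size_mb:.1f} MB")         # prints: 23.4 MB
```

So one such tensor is about 23 MB, but training keeps many of them (plus gradients) resident, which is how a 6 GB card runs out if other processes are already holding VRAM.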

How could I reduce the memory required – what would I need to change in your files?
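For what it's worth, two things worth trying (a sketch against the TensorFlow 1.x API this project uses; the `--batch_size` flag name is a guess at how dm_main.py might expose it, so adjust to the actual name in the code):

```python
import tensorflow as tf  # TensorFlow 1.x API, as used by this project

# 1) A smaller batch size shrinks every activation tensor proportionally,
#    e.g. [8, 100, 80, 48] instead of [16, 100, 80, 48].
#    (Hypothetical flag name -- check dm_main.py for the real one.)
#    $ python3 dm_main.py --run train --batch_size 8

# 2) Let TensorFlow grow its GPU allocation on demand instead of
#    reserving (nearly) all free VRAM up front at session creation.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```

The second option doesn't lower the model's peak memory, but it helps when other processes are already holding part of the VRAM.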

@subzerofun (Author)

Sorry, fixed it – I just had to restart the system, and then I had enough VRAM to complete the training.
