GPU memory error doing inference on GPU with some version combination #77
The key to this error is `RESOURCE_EXHAUSTED: OOM when allocating tensor with shape[5,192,512,512]`. A tensor of this size is not requested at any point in the network architecture, which suggests that some memory leakage is occurring in the inference for loop across batches.
For large datasets, the expected use of the `.predict` function in TensorFlow is to feed in the entire dataset and let it loop through the data internally, creating its own batches. However, since our datasets can be exceedingly large (60 GB or more), we can't rely on everyone having 100 GB or more of RAM just to run inference, so I initially broke inference down into batches. See here for the function call:
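As a rough illustration, a manual batching loop of that kind might look like the sketch below. This is only a minimal sketch: the function name, the `model` and `volume` variables, and the batch size are placeholders for illustration, not the project's actual code.

```python
import numpy as np

def predict_in_batches(model, volume, batch_size=4):
    """Run inference batch by batch instead of passing the whole array to .predict."""
    outputs = []
    for start in range(0, volume.shape[0], batch_size):
        batch = volume[start:start + batch_size]
        # model.predict sets up its own internal data pipeline on every call,
        # which is where the per-iteration memory growth was observed.
        outputs.append(model.predict(batch))
    return np.concatenate(outputs, axis=0)
```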
It turns out TensorFlow now has a `predict_on_batch` function, which is intended for exactly these cases. I found that simply dropping this function in at the exact line mentioned above, with TensorFlow 2.7, removes this memory leak error.
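For illustration, the same hypothetical loop with the swap applied could look like this; only the change from `predict` to `predict_on_batch` is the point, the surrounding names are again placeholders:

```python
import numpy as np

def predict_in_batches_fixed(model, volume, batch_size=4):
    """Same batching loop, but predict_on_batch runs the model directly on the
    given batch without rebuilding an internal dataset on every iteration."""
    outputs = []
    for start in range(0, volume.shape[0], batch_size):
        batch = volume[start:start + batch_size]
        outputs.append(model.predict_on_batch(batch))
    return np.concatenate(outputs, axis=0)
```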
I will make a PR with this fix and merge it into the main branch. A package release should follow once it proves to be compatible with older TensorFlow versions.
I ran into the same issue (Windows 10, Python 3.7, TensorFlow 2.4.4). My GPU is not great, though, and could be the actual culprit. When I changed to `predict_on_batch` I got a different error. I don't know whether it is related:

2021-12-07 01:04:37.560685: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
Function call stack:
2021-12-07 01:14:23.500441: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
That error seems to be related to something else; it looks like CUDA has trouble initializing the GPU at all. It may be worth opening a separate issue.
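One quick way to check whether TensorFlow can see and initialize the GPU at all, independent of this project's code, is the standard device listing call:

```python
import tensorflow as tf

# If this prints an empty list, the problem is in the CUDA/driver setup
# rather than in the inference loop.
print(tf.config.list_physical_devices('GPU'))
```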
Might be, sorry if that's not helpful.
Doing long inference with TensorFlow 2.7 and Python 3.9 can cause GPU out-of-memory errors like so: