Unable to get working with GPU: OOM and/or hangs #7

Closed
duncanenman opened this issue Jun 16, 2022 · 2 comments

@duncanenman

First of all, thanks for putting this together! I'm trying to integrate it into my volunteer search & rescue team's RPAS/drone workflow, and we're all very excited about the initial results.

While the results in CPU mode are great, processing is a bit slow for our purposes (we're working with 4000x2250 px images). Our team computer has an RTX 3060 GPU (Windows 11, i7 CPU, 32 GB RAM), which we hope can speed things up, but we're unable to get the tool working in GPU mode.

When running the default GPU Docker install, we get OOM errors:

root@docker-desktop:/home# /bin/bash infer.sh
Using TensorFlow backend.
2022-06-16 02:32:07.062753: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-06-16 02:32:07.068259: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2687995000 Hz
2022-06-16 02:32:07.069773: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x57d5a40 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-06-16 02:32:07.069800: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-06-16 02:32:07.072249: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-06-16 02:32:08.733246: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:969] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-06-16 02:32:08.733462: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x570f500 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-06-16 02:32:08.733513: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce RTX 3060 Laptop GPU, Compute Capability 8.6
2022-06-16 02:32:08.738143: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:969] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-06-16 02:32:08.738184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA GeForce RTX 3060 Laptop GPU major: 8 minor: 6 memoryClockRate(GHz): 1.702
pciBusID: 0000:01:00.0
2022-06-16 02:32:08.738376: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2022-06-16 02:32:08.739257: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2022-06-16 02:32:08.740589: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2022-06-16 02:32:08.740821: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2022-06-16 02:32:08.742371: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2022-06-16 02:32:08.743383: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2022-06-16 02:32:08.745999: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-06-16 02:32:08.746592: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:969] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-06-16 02:32:08.747027: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:969] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-06-16 02:32:08.747050: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2022-06-16 02:32:08.747094: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2022-06-16 02:32:08.747495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-06-16 02:32:08.747514: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2022-06-16 02:32:08.747532: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2022-06-16 02:32:08.748081: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:969] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-06-16 02:32:08.748109: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1387] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2022-06-16 02:32:08.748516: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:969] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-06-16 02:32:08.748577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4857 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
Loading model, this may take a second...
2022-06-16 02:36:14.483391: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 1073741824 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:14.483439: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 1073741824
2022-06-16 02:36:14.576914: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 966367744 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:14.576962: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 966367744
2022-06-16 02:36:14.671003: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 869731072 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:14.671047: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 869731072
2022-06-16 02:36:14.770382: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 782758144 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:14.770424: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 782758144
2022-06-16 02:36:14.864264: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 704482304 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:14.864312: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 704482304
2022-06-16 02:36:14.957833: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 634034176 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:14.957889: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 634034176
2022-06-16 02:36:15.053452: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 570630912 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:15.053498: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 570630912
2022-06-16 02:36:15.150759: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 513568000 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:15.150802: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 513568000
2022-06-16 02:36:15.243685: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 462211328 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:15.243726: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 462211328
2022-06-16 02:36:15.335976: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 415990272 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:15.336028: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 415990272
2022-06-16 02:36:15.427373: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 374391296 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:15.427417: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 374391296
2022-06-16 02:36:15.520526: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 336952320 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:15.520570: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 336952320
2022-06-16 02:36:15.612661: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 303257088 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:15.612720: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 303257088
2022-06-16 02:36:15.704694: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 272931584 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:15.704741: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 272931584
2022-06-16 02:36:15.798110: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 245638656 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:15.798153: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 245638656
2022-06-16 02:36:15.889580: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 221074944 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:15.889628: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 221074944
2022-06-16 02:36:15.984319: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 198967552 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:15.984362: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 198967552
2022-06-16 02:36:16.077288: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 179070976 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:16.077332: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 179070976
2022-06-16 02:36:16.169940: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 161164032 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:16.169981: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 161164032
2022-06-16 02:36:16.264281: E tensorflow/stream_executor/cuda/cuda_driver.cc:893] failed to alloc 145047808 bytes on host: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2022-06-16 02:36:16.264328: W ./tensorflow/core/common_runtime/gpu/gpu_host_allocator.h:44] could not allocate pinned host memory of size: 145047808
tracking <tf.Variable 'Variable:0' shape=(15, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_1:0' shape=(15, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_2:0' shape=(15, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_3:0' shape=(15, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_4:0' shape=(15, 4) dtype=float32> anchors
Running inference on image folder: /home/data/images/test
Running network: N/A% (0 of 138) |                                          | Elapsed Time: 0:00:00 ETA:  --:--:--2022-06-16 02:36:21.094051: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-06-16 02:48:27.323880: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: ptxas exited with non-zero error code 65280, output: ptxas fatal   : Value 'sm_86' is not defined for option 'gpu-name'

Relying on driver to perform ptx compilation. This message will be only logged once.
2022-06-16 02:48:27.773454: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2022-06-16 02:49:29.518638: W tensorflow/core/common_runtime/bfc_allocator.cc:305] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
Traceback (most recent call last):
  File "keras_retinanet/keras_retinanet/bin/infer.py", line 215, in <module>
    main()
  File "keras_retinanet/keras_retinanet/bin/infer.py", line 192, in main
    profile=args.profile
  File "keras_retinanet/keras_retinanet/bin/../../keras_retinanet/utils/eval.py", line 250, in get_detections
    max_inflation_factor=max_inflation_factor
  File "keras_retinanet/keras_retinanet/bin/../../keras_retinanet/utils/eval.py", line 168, in run_inference_on_image
    max_inflation_factor=max_inflation_factor)
  File "keras_retinanet/keras_retinanet/bin/../../keras_retinanet/utils/../../../airutils/mob.py", line 365, in merge_boxes_per_label
    iou_threshold, max_iterations, top_k, max_inflation_factor, merge_mode)
  File "keras_retinanet/keras_retinanet/bin/../../keras_retinanet/utils/../../../airutils/mob.py", line 276, in merge_overlapping_boxes
    X = 1. - compute_overlap(boxes, boxes)
  File "compute_overlap.pyx", line 27, in compute_overlap.compute_overlap
    cdef np.ndarray[double, ndim=2] overlaps = np.zeros((N, K), dtype=np.float64)
MemoryError: Unable to allocate array with shape (100000, 100000) and data type float64
.....

I've tried some extreme configuration options to lessen the memory burden, but we still get the OOM errors:

    --image_min_side 100 
    --image_max_side 150 
    --image_tiling_dim 8 

I found that by setting compile=False on the line below in keras_retinanet/keras_retinanet/bin/infer.py, I avoid the CUDA OOM errors, but the process just hangs for a long time and eventually fails with the same MemoryError.
model = models.load_model(args.model, backbone_name=args.backbone, compile=False)

Complete output with compile=False:

root@docker-desktop:/home# /bin/bash infer.sh
Using TensorFlow backend.
2022-06-16 02:09:13.780103: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-06-16 02:09:13.786143: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2687995000 Hz
2022-06-16 02:09:13.787804: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5b70d30 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-06-16 02:09:13.787838: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-06-16 02:09:13.790658: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2022-06-16 02:09:15.386150: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:969] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-06-16 02:09:15.386309: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x58fbfd0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-06-16 02:09:15.386346: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA GeForce RTX 3060 Laptop GPU, Compute Capability 8.6
2022-06-16 02:09:15.389332: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:969] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-06-16 02:09:15.389373: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA GeForce RTX 3060 Laptop GPU major: 8 minor: 6 memoryClockRate(GHz): 1.702
pciBusID: 0000:01:00.0
2022-06-16 02:09:15.389623: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2022-06-16 02:09:15.390896: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2022-06-16 02:09:15.392376: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2022-06-16 02:09:15.392657: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2022-06-16 02:09:15.395000: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2022-06-16 02:09:15.396610: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2022-06-16 02:09:15.399907: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-06-16 02:09:15.400535: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:969] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-06-16 02:09:15.401007: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:969] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-06-16 02:09:15.401035: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2022-06-16 02:09:15.401082: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2022-06-16 02:09:15.401523: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-06-16 02:09:15.401542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2022-06-16 02:09:15.401581: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2022-06-16 02:09:15.402103: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:969] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-06-16 02:09:15.402132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1387] Could not identify NUMA node of platform GPU id 0, defaulting to 0.  Your kernel may not have been built with NUMA support.
2022-06-16 02:09:15.402663: E tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:969] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
Your kernel may have been built without NUMA support.
2022-06-16 02:09:15.402747: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4857 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3060 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
Loading model, this may take a second...
tracking <tf.Variable 'Variable:0' shape=(15, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_1:0' shape=(15, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_2:0' shape=(15, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_3:0' shape=(15, 4) dtype=float32> anchors
tracking <tf.Variable 'Variable_4:0' shape=(15, 4) dtype=float32> anchors
Running inference on image folder: /home/data/images/test
Running network: N/A% (0 of 138) |                                               | Elapsed Time: 0:00:00 ETA:  --:--:--2022-06-16 02:13:11.539626: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2022-06-16 02:25:21.996183: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: ptxas exited with non-zero error code 65280, output: ptxas fatal   : Value 'sm_86' is not defined for option 'gpu-name'

Relying on driver to perform ptx compilation. This message will be only logged once.
2022-06-16 02:25:22.378005: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2022-06-16 02:26:24.908769: W tensorflow/core/common_runtime/bfc_allocator.cc:305] Garbage collection: deallocate free memory regions (i.e., allocations) so that we can re-allocate a larger region to avoid OOM due to memory fragmentation. If you see this message frequently, you are running near the threshold of the available device memory and re-allocation may incur great performance overhead. You may try smaller batch sizes to observe the performance impact. Set TF_ENABLE_GPU_GARBAGE_COLLECTION=false if you'd like to disable this feature.
Traceback (most recent call last):
  File "keras_retinanet/keras_retinanet/bin/infer.py", line 216, in <module>
  File "keras_retinanet/keras_retinanet/bin/infer.py", line 193, in main
    )
  File "keras_retinanet/keras_retinanet/bin/../../keras_retinanet/utils/eval.py", line 250, in get_detections
    max_inflation_factor=max_inflation_factor
  File "keras_retinanet/keras_retinanet/bin/../../keras_retinanet/utils/eval.py", line 168, in run_inference_on_image
    max_inflation_factor=max_inflation_factor)
  File "keras_retinanet/keras_retinanet/bin/../../keras_retinanet/utils/../../../airutils/mob.py", line 365, in merge_boxes_per_label
    iou_threshold, max_iterations, top_k, max_inflation_factor, merge_mode)
  File "keras_retinanet/keras_retinanet/bin/../../keras_retinanet/utils/../../../airutils/mob.py", line 276, in merge_overlapping_boxes
    X = 1. - compute_overlap(boxes, boxes)
  File "compute_overlap.pyx", line 27, in compute_overlap.compute_overlap
    cdef np.ndarray[double, ndim=2] overlaps = np.zeros((N, K), dtype=np.float64)
MemoryError: Unable to allocate array with shape (100000, 100000) and data type float64

In case it helps:

root@docker-desktop:/home# nvidia-smi
Thu Jun 16 02:29:41 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08    Driver Version: 512.96       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
| N/A   47C    P0    24W /  N/A |      0MiB /  6144MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Any suggestions would be very much appreciated, as I think I'm fully out of ideas and we're really keen to make this work!

@Hyper5phere
Member

Glad to hear you've found my work useful.

This is a new error for me as well. I'm guessing it is caused by that roughly 75 GiB (100000, 100000) float64 NumPy array in the compute_overlap.pyx code, which likely somehow ends up in the GPU memory that has only 6144 MB of capacity. To make this array smaller, try setting --max_detections 1000 and see if the error disappears. Although with MOB postprocessing you'd want a fairly large --max_detections value so that all candidate boxes get merged together properly, the effect might be negligible between 10k and 100k anyway.
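
As a rough sanity check (pure back-of-the-envelope arithmetic, assuming --max_detections roughly caps the N in that np.zeros((N, K)) allocation):

    # Back-of-the-envelope sizing of the pairwise overlap matrix that
    # compute_overlap.pyx allocates as np.zeros((N, K), dtype=np.float64).
    # Assumption: --max_detections roughly caps N (and K) here.
    def overlap_matrix_gib(n_boxes):
        """Size in GiB of an n_boxes x n_boxes float64 array."""
        return n_boxes * n_boxes * 8 / 1024 ** 3

    for n in (100000, 10000, 1000):
        print("max_detections=%6d: ~%.2f GiB" % (n, overlap_matrix_gib(n)))

    # max_detections=100000: ~74.51 GiB  -> far beyond 32 GB RAM, let alone 6 GB VRAM
    # max_detections= 10000: ~0.75 GiB
    # max_detections=  1000: ~0.01 GiB

That would explain why even 32 GB of host RAM isn't enough at the current 100k setting.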

Playing with the other memory options might help too, as you've already tried, although detection performance will likely degrade a lot with such tiny resized images (150x100); I'd rather adjust the --image_tiling_dim option first.
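
For reference, rough tile-size arithmetic for your 4000x2250 px images, assuming --image_tiling_dim N splits each image into an N x N grid (that's an assumption about the option's semantics, so treat the numbers as approximate):

    # Approximate per-tile resolution, assuming an N x N tiling grid
    # (assumption about what --image_tiling_dim controls).
    def tile_size(width, height, tiling_dim):
        return width // tiling_dim, height // tiling_dim

    for dim in (1, 2, 4, 8):
        w, h = tile_size(4000, 2250, dim)
        print("--image_tiling_dim %d: ~%dx%d px per tile" % (dim, w, h))

    # --image_tiling_dim 1: ~4000x2250 px per tile
    # --image_tiling_dim 2: ~2000x1125 px per tile
    # --image_tiling_dim 4: ~1000x562 px per tile
    # --image_tiling_dim 8: ~500x281 px per tile

Under that assumption, a tiling dim of 2-4 would keep each tile far larger than 150x100 while still reducing the per-pass memory load.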

@Hyper5phere
Member

Another possible solution could be to increase your Docker container memory limit to something like 16 GB. I think the compute_overlap computation should happen on the CPU and in main memory regardless of whether the GPU is used, so it's odd that GPU memory would be the limiting factor here.

Nevertheless, decreasing that --max_detections parameter should bring your memory consumption down to an acceptable level regardless of where the bottleneck is.
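
If you go the container-memory route: with Docker Desktop on Windows 11 the effective ceiling is usually the WSL 2 VM's memory rather than a per-container limit (an assumption about your setup, given the docker-desktop hostname in your logs), so something along these lines in %UserProfile%\.wslconfig should raise it:

    # %UserProfile%\.wslconfig  (assumes the Docker Desktop WSL 2 backend)
    [wsl2]
    memory=16GB   # RAM ceiling for the WSL 2 VM that backs docker-desktop
    swap=8GB      # optional extra headroom

After editing, run wsl --shutdown and restart Docker Desktop for it to take effect. If you're on the Hyper-V backend instead, the limit is set in Docker Desktop's Resources settings.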
