
[BUG] #151

Closed
PredyDaddy opened this issue Apr 15, 2024 · 8 comments
Labels
bug — Something isn't working as expected (software, install, documentation)
need more info — Waiting for more information from user

Comments

@PredyDaddy

When I run this command

./scripts/run_samples.sh

it fails with the output below. The same failure happens in both the classification sample and the detection sample:

root@f230e902ede4:/app/cqy/CV-CUDA/samples# ./scripts/run_samples.sh
SAMPLES_DIR: /app/cqy/CV-CUDA/samples
CLASSIFICATION_OUT_DIR: /tmp/classification
SEGMENTATION_OUT_DIR: /tmp/segmentation
DETECTION_OUT_DIR: /tmp/object_detection
DISTANCE_LABEL_OUT_DIR: /tmp/distance_label
[perf_utils:100] 2024-04-15 10:12:26 WARNING perf_utils is used without benchmark.py. Benchmarking mode is turned off.
[perf_utils:104] 2024-04-15 10:12:26 INFO   Using CV-CUDA version: 0.6.0-beta
[pipelines:35] 2024-04-15 10:12:26 INFO   Using CVCUDA as preprocessor.
[nvcodec_utils:532] 2024-04-15 10:12:26 INFO   Using nvImageCodec decoder version: 0.2.0
[pipelines:122] 2024-04-15 10:12:26 INFO   Using CVCUDA as post-processor.
[04/15/2024-10:12:28] [TRT] [I] Using TensorRT version: 8.6.1
[04/15/2024-10:12:28] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 469, GPU 1597 (MiB)
[04/15/2024-10:12:36] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1445, GPU -306, now: CPU 1990, GPU 732 (MiB)
[04/15/2024-10:12:36] [TRT] [I] Using precision : float16
[04/15/2024-10:12:36] [TRT] [I] Loading ONNX file from path /tmp/classification/model.4.224.224.onnx 
[04/15/2024-10:12:36] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/15/2024-10:12:37] [TRT] [I] Graph optimization time: 0.579155 seconds.
[04/15/2024-10:12:37] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[04/15/2024-10:13:16] [TRT] [I] Detected 1 inputs and 1 output network tensors.
[04/15/2024-10:13:16] [TRT] [I] Total Host Persistent Memory: 295504
[04/15/2024-10:13:16] [TRT] [I] Total Device Persistent Memory: 66560
[04/15/2024-10:13:16] [TRT] [I] Total Scratch Memory: 1049088
[04/15/2024-10:13:16] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 48 MiB, GPU 49 MiB
[04/15/2024-10:13:16] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 62 steps to complete.
[04/15/2024-10:13:16] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 0.856465ms to assign 4 blocks to 62 nodes requiring 16257024 bytes.
[04/15/2024-10:13:16] [TRT] [I] Total Activation Memory: 16257024
[04/15/2024-10:13:16] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[04/15/2024-10:13:16] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[04/15/2024-10:13:16] [TRT] [W] Check verbose logs for the list of affected weights.
[04/15/2024-10:13:16] [TRT] [W] - 57 weights are affected by this issue: Detected subnormal FP16 values.
[04/15/2024-10:13:16] [TRT] [W] - 31 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[04/15/2024-10:13:16] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +48, GPU +49, now: CPU 48, GPU 49 (MiB)
[04/15/2024-10:13:17] [TRT] [I] Wrote TensorRT engine file: /tmp/classification/model.4.224.224.trtmodel
[model_inference:230] 2024-04-15 10:13:17 INFO   Using TensorRT as the inference engine.
Traceback (most recent call last):
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 236, in <module>
    main()
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 219, in main
    run_sample(
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 154, in run_sample
    batch = decoder()
  File "/app/cqy/CV-CUDA/samples/common/python/nvcodec_utils.py", line 552, in __call__
    image_list = self.decoder.decode(data_batch, cuda_stream=self.cuda_stream)
RuntimeError: nvImageCodec failure: '#4'
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
-------------------------------------------------------------------
./scripts/run_samples.sh: line 48: 46642 Aborted                 (core dumped) python3 $SAMPLES_DIR/classification/python/main.py -o "$CLASSIFICATION_OUT_DIR"
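For reference, the PyCUDA message at the end is cleanup noise from the primary `RuntimeError`: the sample pushed a CUDA context and aborted before popping it. The discipline the warning asks for can be sketched generically; the stack below is a stand-in for illustration, not the sample's actual code (with real PyCUDA the push is `Device.make_context()` and the pop is `Context.pop()`):

```python
# Sketch of the push/pop discipline the PyCUDA warning refers to: every
# context push must be matched by a pop before interpreter shutdown.
# `context_stack` is a stand-in; it is not PyCUDA's real stack.
context_stack = []

def run_with_context(work):
    """Push a context, run `work`, and always pop, even if `work` raises."""
    context_stack.append("ctx")  # stand-in for cuda.Device(0).make_context()
    try:
        return work()
    finally:
        context_stack.pop()      # stand-in for ctx.pop(); stack ends empty

run_with_context(lambda: None)
assert not context_stack  # empty stack: no "context still active" at exit
```

With this pattern the stack is empty at module cleanup even when the pipeline raises, which is exactly what the "Use Context.pop()" hint is about.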
@PredyDaddy added the bug label on Apr 15, 2024
@bhaefnerNV

Hi @PredyDaddy,
Thank you very much for your interest in CVCUDA.
Did you follow the steps described in the samples' readme here? If not, which steps did you take before executing the run_samples script?

@bhaefnerNV added the need more info label on May 2, 2024
@PredyDaddy
Author

Yes, I followed that readme to install the environment, finishing with this step:

cd /workspace/cvcuda_install/  # Assuming this is where the installation files are
pip install cvcuda_cu12-0.6.0b0-cp310-cp310-linux_x86_64.whl

Then I directly ran

./scripts/run_samples.sh

with the following output:

SAMPLES_DIR: /app/cqy/CV-CUDA/samples
CLASSIFICATION_OUT_DIR: /tmp/classification
SEGMENTATION_OUT_DIR: /tmp/segmentation
DETECTION_OUT_DIR: /tmp/object_detection
DISTANCE_LABEL_OUT_DIR: /tmp/distance_label
[perf_utils:100] 2024-05-06 09:55:49 WARNING perf_utils is used without benchmark.py. Benchmarking mode is turned off.
[perf_utils:104] 2024-05-06 09:55:49 INFO   Using CV-CUDA version: 0.6.0-beta
[pipelines:35] 2024-05-06 09:55:49 INFO   Using CVCUDA as preprocessor.
[nvcodec_utils:532] 2024-05-06 09:55:49 INFO   Using nvImageCodec decoder version: 0.2.0
[pipelines:122] 2024-05-06 09:55:49 INFO   Using CVCUDA as post-processor.
[05/06/2024-09:55:53] [TRT] [I] Using TensorRT version: 8.6.1
[05/06/2024-09:55:53] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 469, GPU 466 (MiB)
[05/06/2024-09:56:02] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1445, GPU +266, now: CPU 1990, GPU 732 (MiB)
[05/06/2024-09:56:02] [TRT] [I] Using precision : float16
[05/06/2024-09:56:02] [TRT] [I] Loading ONNX file from path /tmp/classification/model.4.224.224.onnx 
[05/06/2024-09:56:02] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/06/2024-09:56:03] [TRT] [I] Graph optimization time: 0.947446 seconds.
[05/06/2024-09:56:03] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/06/2024-09:56:54] [TRT] [I] Detected 1 inputs and 1 output network tensors.
[05/06/2024-09:56:55] [TRT] [I] Total Host Persistent Memory: 297920
[05/06/2024-09:56:55] [TRT] [I] Total Device Persistent Memory: 63488
[05/06/2024-09:56:55] [TRT] [I] Total Scratch Memory: 1311744
[05/06/2024-09:56:55] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 48 MiB, GPU 49 MiB
[05/06/2024-09:56:55] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 62 steps to complete.
[05/06/2024-09:56:55] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 2.21112ms to assign 4 blocks to 62 nodes requiring 16257024 bytes.
[05/06/2024-09:56:55] [TRT] [I] Total Activation Memory: 16257024
[05/06/2024-09:56:55] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[05/06/2024-09:56:55] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[05/06/2024-09:56:55] [TRT] [W] Check verbose logs for the list of affected weights.
[05/06/2024-09:56:55] [TRT] [W] - 57 weights are affected by this issue: Detected subnormal FP16 values.
[05/06/2024-09:56:55] [TRT] [W] - 31 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[05/06/2024-09:56:55] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +48, GPU +49, now: CPU 48, GPU 49 (MiB)
[05/06/2024-09:56:55] [TRT] [I] Wrote TensorRT engine file: /tmp/classification/model.4.224.224.trtmodel
[model_inference:230] 2024-05-06 09:56:55 INFO   Using TensorRT as the inference engine.
Traceback (most recent call last):
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 236, in <module>
    main()
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 219, in main
    run_sample(
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 154, in run_sample
    batch = decoder()
  File "/app/cqy/CV-CUDA/samples/common/python/nvcodec_utils.py", line 552, in __call__
    image_list = self.decoder.decode(data_batch, cuda_stream=self.cuda_stream)
RuntimeError: nvImageCodec failure: '#4'
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
-------------------------------------------------------------------
./scripts/run_samples.sh: line 48: 91016 Aborted                 (core dumped) python3 $SAMPLES_DIR/classification/python/main.py -o "$CLASSIFICATION_OUT_DIR"

@bhaefnerNV

Hi @PredyDaddy,
if you would like to execute the run_samples.sh script, you also have to build the C++ samples, since the script executes those as well. To build the C++ samples, CVCUDA for C++ has to be installed, i.e. follow step 6.ii. in the same readme.

If you would like to avoid building the C++ samples or installing CVCUDA for C++, you can also manually execute only the python samples instead of running the run_samples.sh script.
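Running only the Python classification sample by hand boils down to invoking its `main.py` directly, as `run_samples.sh` does. A minimal sketch, assuming the `SAMPLES_DIR` layout from the logs above (the environment-variable fallback is my own convenience, not part of the script):

```python
import os
import subprocess
import sys

# Paths taken from the logs above; override SAMPLES_DIR for a different checkout.
samples_dir = os.environ.get("SAMPLES_DIR", "/app/cqy/CV-CUDA/samples")
out_dir = "/tmp/classification"

cmd = [
    sys.executable,
    os.path.join(samples_dir, "classification", "python", "main.py"),
    "-o", out_dir,
]
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment on a machine with the GPU deps installed
```

This sidesteps the C++ sample build entirely, at the cost of exercising only the Python pipeline.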

@PredyDaddy
Author

Then I got this error; could you help me look at it?

root@f230e902ede4:/app/cqy/CV-CUDA/samples# ./scripts/build_samples.sh 
-- The CXX compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDAToolkit: /usr/local/cuda/include (found version "12.3.107") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
CMake Error at CMakeLists.txt:20 (find_package):
  By not providing "Findnvcv_types.cmake" in CMAKE_MODULE_PATH this project
  has asked CMake to find a package configuration file provided by
  "nvcv_types", but CMake did not find one.

  Could not find a package configuration file provided by "nvcv_types" with
  any of the following names:

    nvcv_typesConfig.cmake
    nvcv_types-config.cmake

  Add the installation prefix of "nvcv_types" to CMAKE_PREFIX_PATH or set
  "nvcv_types_DIR" to a directory containing one of the above files.  If
  "nvcv_types" provides a separate development package or SDK, be sure it has
  been installed.

@bhaefnerNV

This looks like CVCUDA is not installed on your system. I recommend a deb package installation.
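The deb packages install into the default search prefix, so CMake should find `nvcv_types` after that. If CVCUDA were instead installed to a non-standard prefix, CMake can be pointed at it via `CMAKE_PREFIX_PATH`, as the error message suggests; the path below is a placeholder, not where the deb actually installs:

```shell
# Placeholder prefix: substitute wherever nvcv_typesConfig.cmake actually lives.
CVCUDA_PREFIX=/opt/nvidia/cvcuda0
export CMAKE_PREFIX_PATH="${CVCUDA_PREFIX}${CMAKE_PREFIX_PATH:+:$CMAKE_PREFIX_PATH}"
echo "$CMAKE_PREFIX_PATH"
```

After exporting this, re-run `./scripts/build_samples.sh` so `find_package(nvcv_types)` searches the added prefix.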

@PredyDaddy
Author

I installed these packages and then ran the build_samples script:

cd /workspace/cvcuda_install/  # Assuming this is where the installation files are
dpkg -i cvcuda-lib-0.6.0_beta-cuda12-x86_64-linux.deb
dpkg -i cvcuda-dev-0.6.0_beta-cuda12-x86_64-linux.deb
dpkg -i cvcuda-python3.10-0.6.0_beta-cuda12-x86_64-linux.deb

The build now succeeds, but I still hit the same error when running the samples:

root@f230e902ede4:/app/cqy/CV-CUDA/samples# ./scripts/build_samples.sh 
-- The CXX compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDAToolkit: /usr/local/cuda/include (found version "12.3.107") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda (found version "12.3") 
-- Found TensorRT: /usr/lib/x86_64-linux-gnu/libnvinfer.so (found version "8.6.1") 
-- Found TensorRT-8.6.1 libs: nvinfer nvinfer_plugin nvparsers nvonnxparser
-- Configuring done
-- Generating done
-- Build files have been written to: /app/cqy/CV-CUDA/samples/build
[ 14%] Building CXX object common/CMakeFiles/cvcuda_samples_common.dir/TRTUtils.cpp.o
[ 28%] Building CXX object common/CMakeFiles/cvcuda_samples_common.dir/NvDecoder.cpp.o
[ 42%] Linking CXX shared library libcvcuda_samples_common.so
[ 42%] Built target cvcuda_samples_common
[ 57%] Building CXX object classification/CMakeFiles/cvcuda_sample_classification.dir/Main.cpp.o
[ 71%] Linking CXX executable cvcuda_sample_classification
[ 71%] Built target cvcuda_sample_classification
[ 85%] Building CXX object cropandresize/CMakeFiles/cvcuda_sample_cropandresize.dir/Main.cpp.o
[100%] Linking CXX executable cvcuda_sample_cropandresize
[100%] Built target cvcuda_sample_cropandresize
root@f230e902ede4:/app/cqy/CV-CUDA/samples# ./scripts/run_samples.sh
SAMPLES_DIR: /app/cqy/CV-CUDA/samples
CLASSIFICATION_OUT_DIR: /tmp/classification
SEGMENTATION_OUT_DIR: /tmp/segmentation
DETECTION_OUT_DIR: /tmp/object_detection
DISTANCE_LABEL_OUT_DIR: /tmp/distance_label
[perf_utils:100] 2024-05-08 03:46:54 WARNING perf_utils is used without benchmark.py. Benchmarking mode is turned off.
[perf_utils:104] 2024-05-08 03:46:54 INFO   Using CV-CUDA version: 0.6.0-beta
[pipelines:35] 2024-05-08 03:46:55 INFO   Using CVCUDA as preprocessor.
[nvcodec_utils:532] 2024-05-08 03:46:55 INFO   Using nvImageCodec decoder version: 0.2.0
[pipelines:122] 2024-05-08 03:46:55 INFO   Using CVCUDA as post-processor.
[05/08/2024-03:46:58] [TRT] [I] Using TensorRT version: 8.6.1
[05/08/2024-03:46:58] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 469, GPU 466 (MiB)
[05/08/2024-03:47:07] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1445, GPU +266, now: CPU 1990, GPU 732 (MiB)
[05/08/2024-03:47:07] [TRT] [I] Using precision : float16
[05/08/2024-03:47:07] [TRT] [I] Loading ONNX file from path /tmp/classification/model.4.224.224.onnx 
[05/08/2024-03:47:07] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/08/2024-03:47:08] [TRT] [I] Graph optimization time: 0.844108 seconds.
[05/08/2024-03:47:08] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/08/2024-03:47:58] [TRT] [I] Detected 1 inputs and 1 output network tensors.
[05/08/2024-03:47:59] [TRT] [I] Total Host Persistent Memory: 312272
[05/08/2024-03:47:59] [TRT] [I] Total Device Persistent Memory: 41984
[05/08/2024-03:47:59] [TRT] [I] Total Scratch Memory: 1311744
[05/08/2024-03:47:59] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 48 MiB, GPU 49 MiB
[05/08/2024-03:47:59] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 62 steps to complete.
[05/08/2024-03:47:59] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 2.28343ms to assign 4 blocks to 62 nodes requiring 16257024 bytes.
[05/08/2024-03:47:59] [TRT] [I] Total Activation Memory: 16257024
[05/08/2024-03:47:59] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[05/08/2024-03:47:59] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[05/08/2024-03:47:59] [TRT] [W] Check verbose logs for the list of affected weights.
[05/08/2024-03:47:59] [TRT] [W] - 57 weights are affected by this issue: Detected subnormal FP16 values.
[05/08/2024-03:47:59] [TRT] [W] - 31 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[05/08/2024-03:47:59] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +48, GPU +49, now: CPU 48, GPU 49 (MiB)
[05/08/2024-03:47:59] [TRT] [I] Wrote TensorRT engine file: /tmp/classification/model.4.224.224.trtmodel
[model_inference:230] 2024-05-08 03:48:00 INFO   Using TensorRT as the inference engine.
Traceback (most recent call last):
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 236, in <module>
    main()
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 219, in main
    run_sample(
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 154, in run_sample
    batch = decoder()
  File "/app/cqy/CV-CUDA/samples/common/python/nvcodec_utils.py", line 552, in __call__
    image_list = self.decoder.decode(data_batch, cuda_stream=self.cuda_stream)
RuntimeError: nvImageCodec failure: '#4'
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
-------------------------------------------------------------------
./scripts/run_samples.sh: line 48: 138944 Aborted                 (core dumped) python3 $SAMPLES_DIR/classification/python/main.py -o "$CLASSIFICATION_OUT_DIR"

Can you help me look into it?

@wenkaic

wenkaic commented May 13, 2024

It seems that this issue is caused by corrupted image files. You can replace the image files in samples/assets/images with other images to resolve it.
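To confirm the corruption theory before swapping assets, a quick stdlib-only check can flag files whose headers are not valid JPEG/PNG. This is a cheap heuristic of my own, not a full decode, and nvImageCodec may accept more formats than the two listed here:

```python
import pathlib

# Magic bytes for common sample-asset formats (extend as needed).
SIGNATURES = (
    b"\xff\xd8\xff",       # JPEG
    b"\x89PNG\r\n\x1a\n",  # PNG
)

def looks_valid(path: pathlib.Path) -> bool:
    """Cheap heuristic: does the file start with a known image signature?"""
    head = path.read_bytes()[:8]
    return any(head.startswith(sig) for sig in SIGNATURES)

def find_suspect_images(images_dir: str) -> list:
    """List files under images_dir whose headers match no known signature."""
    return [p for p in sorted(pathlib.Path(images_dir).iterdir())
            if p.is_file() and not looks_valid(p)]

# Example: find_suspect_images("samples/assets/images")
```

Any file this flags is worth re-downloading or replacing; files that pass could still be truncated mid-stream, so a clean pass is not a guarantee.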

@wenkaic

wenkaic commented May 13, 2024

This also applies to the video files used by the samples.
