
[BUG] #151

Closed
PredyDaddy opened this issue Apr 15, 2024 · 8 comments
Labels
bug — Something isn't working as expected (software, install, documentation)
need more info — Waiting for more information from user

Comments

@PredyDaddy

When I run this command

./scripts/run_samples.sh

it fails with the output below. The same failure happens in both the classification sample and the detection sample:

root@f230e902ede4:/app/cqy/CV-CUDA/samples# ./scripts/run_samples.sh
SAMPLES_DIR: /app/cqy/CV-CUDA/samples
CLASSIFICATION_OUT_DIR: /tmp/classification
SEGMENTATION_OUT_DIR: /tmp/segmentation
DETECTION_OUT_DIR: /tmp/object_detection
DISTANCE_LABEL_OUT_DIR: /tmp/distance_label
[perf_utils:100] 2024-04-15 10:12:26 WARNING perf_utils is used without benchmark.py. Benchmarking mode is turned off.
[perf_utils:104] 2024-04-15 10:12:26 INFO   Using CV-CUDA version: 0.6.0-beta
[pipelines:35] 2024-04-15 10:12:26 INFO   Using CVCUDA as preprocessor.
[nvcodec_utils:532] 2024-04-15 10:12:26 INFO   Using nvImageCodec decoder version: 0.2.0
[pipelines:122] 2024-04-15 10:12:26 INFO   Using CVCUDA as post-processor.
[04/15/2024-10:12:28] [TRT] [I] Using TensorRT version: 8.6.1
[04/15/2024-10:12:28] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 469, GPU 1597 (MiB)
[04/15/2024-10:12:36] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1445, GPU -306, now: CPU 1990, GPU 732 (MiB)
[04/15/2024-10:12:36] [TRT] [I] Using precision : float16
[04/15/2024-10:12:36] [TRT] [I] Loading ONNX file from path /tmp/classification/model.4.224.224.onnx 
[04/15/2024-10:12:36] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[04/15/2024-10:12:37] [TRT] [I] Graph optimization time: 0.579155 seconds.
[04/15/2024-10:12:37] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[04/15/2024-10:13:16] [TRT] [I] Detected 1 inputs and 1 output network tensors.
[04/15/2024-10:13:16] [TRT] [I] Total Host Persistent Memory: 295504
[04/15/2024-10:13:16] [TRT] [I] Total Device Persistent Memory: 66560
[04/15/2024-10:13:16] [TRT] [I] Total Scratch Memory: 1049088
[04/15/2024-10:13:16] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 48 MiB, GPU 49 MiB
[04/15/2024-10:13:16] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 62 steps to complete.
[04/15/2024-10:13:16] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 0.856465ms to assign 4 blocks to 62 nodes requiring 16257024 bytes.
[04/15/2024-10:13:16] [TRT] [I] Total Activation Memory: 16257024
[04/15/2024-10:13:16] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[04/15/2024-10:13:16] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[04/15/2024-10:13:16] [TRT] [W] Check verbose logs for the list of affected weights.
[04/15/2024-10:13:16] [TRT] [W] - 57 weights are affected by this issue: Detected subnormal FP16 values.
[04/15/2024-10:13:16] [TRT] [W] - 31 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[04/15/2024-10:13:16] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +48, GPU +49, now: CPU 48, GPU 49 (MiB)
[04/15/2024-10:13:17] [TRT] [I] Wrote TensorRT engine file: /tmp/classification/model.4.224.224.trtmodel
[model_inference:230] 2024-04-15 10:13:17 INFO   Using TensorRT as the inference engine.
Traceback (most recent call last):
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 236, in <module>
    main()
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 219, in main
    run_sample(
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 154, in run_sample
    batch = decoder()
  File "/app/cqy/CV-CUDA/samples/common/python/nvcodec_utils.py", line 552, in __call__
    image_list = self.decoder.decode(data_batch, cuda_stream=self.cuda_stream)
RuntimeError: nvImageCodec failure: '#4'
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
-------------------------------------------------------------------
./scripts/run_samples.sh: line 48: 46642 Aborted                 (core dumped) python3 $SAMPLES_DIR/classification/python/main.py -o "$CLASSIFICATION_OUT_DIR"
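For reference, the PyCUDA message at the end is cleanup noise from the primary `RuntimeError`: the sample pushed a CUDA context and aborted before popping it. The discipline the warning asks for can be sketched generically; the stack below is a stand-in for illustration, not the sample's actual code (with real PyCUDA the push is `Device.make_context()` and the pop is `Context.pop()`):

```python
# Sketch of the push/pop discipline the PyCUDA warning refers to: every
# context push must be matched by a pop before interpreter shutdown.
# `context_stack` is a stand-in; it is not PyCUDA's real stack.
context_stack = []

def run_with_context(work):
    """Push a context, run `work`, and always pop, even if `work` raises."""
    context_stack.append("ctx")  # stand-in for cuda.Device(0).make_context()
    try:
        return work()
    finally:
        context_stack.pop()      # stand-in for ctx.pop(); stack ends empty

run_with_context(lambda: None)
assert not context_stack  # empty stack: no "context still active" at exit
```

With this pattern the stack is empty at module cleanup even when the pipeline raises, which is exactly what the "Use Context.pop()" hint is about.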
@PredyDaddy added the bug label on Apr 15, 2024
@bhaefnerNV

Hi @PredyDaddy,
Thank you very much for your interest in CVCUDA.
Did you follow the steps described in the samples' readme here? If not, which steps did you take before executing the run_samples script?

@bhaefnerNV added the need more info label on May 2, 2024
@PredyDaddy
Author

Yes, I followed that readme to install the environment, finishing with this step:

cd /workspace/cvcuda_install/  # Assuming this is where the installation files are
pip install cvcuda_cu12-0.6.0b0-cp310-cp310-linux_x86_64.whl

Then I directly ran

./scripts/run_samples.sh

with the following output:

SAMPLES_DIR: /app/cqy/CV-CUDA/samples
CLASSIFICATION_OUT_DIR: /tmp/classification
SEGMENTATION_OUT_DIR: /tmp/segmentation
DETECTION_OUT_DIR: /tmp/object_detection
DISTANCE_LABEL_OUT_DIR: /tmp/distance_label
[perf_utils:100] 2024-05-06 09:55:49 WARNING perf_utils is used without benchmark.py. Benchmarking mode is turned off.
[perf_utils:104] 2024-05-06 09:55:49 INFO   Using CV-CUDA version: 0.6.0-beta
[pipelines:35] 2024-05-06 09:55:49 INFO   Using CVCUDA as preprocessor.
[nvcodec_utils:532] 2024-05-06 09:55:49 INFO   Using nvImageCodec decoder version: 0.2.0
[pipelines:122] 2024-05-06 09:55:49 INFO   Using CVCUDA as post-processor.
[05/06/2024-09:55:53] [TRT] [I] Using TensorRT version: 8.6.1
[05/06/2024-09:55:53] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 469, GPU 466 (MiB)
[05/06/2024-09:56:02] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1445, GPU +266, now: CPU 1990, GPU 732 (MiB)
[05/06/2024-09:56:02] [TRT] [I] Using precision : float16
[05/06/2024-09:56:02] [TRT] [I] Loading ONNX file from path /tmp/classification/model.4.224.224.onnx 
[05/06/2024-09:56:02] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/06/2024-09:56:03] [TRT] [I] Graph optimization time: 0.947446 seconds.
[05/06/2024-09:56:03] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/06/2024-09:56:54] [TRT] [I] Detected 1 inputs and 1 output network tensors.
[05/06/2024-09:56:55] [TRT] [I] Total Host Persistent Memory: 297920
[05/06/2024-09:56:55] [TRT] [I] Total Device Persistent Memory: 63488
[05/06/2024-09:56:55] [TRT] [I] Total Scratch Memory: 1311744
[05/06/2024-09:56:55] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 48 MiB, GPU 49 MiB
[05/06/2024-09:56:55] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 62 steps to complete.
[05/06/2024-09:56:55] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 2.21112ms to assign 4 blocks to 62 nodes requiring 16257024 bytes.
[05/06/2024-09:56:55] [TRT] [I] Total Activation Memory: 16257024
[05/06/2024-09:56:55] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[05/06/2024-09:56:55] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[05/06/2024-09:56:55] [TRT] [W] Check verbose logs for the list of affected weights.
[05/06/2024-09:56:55] [TRT] [W] - 57 weights are affected by this issue: Detected subnormal FP16 values.
[05/06/2024-09:56:55] [TRT] [W] - 31 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[05/06/2024-09:56:55] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +48, GPU +49, now: CPU 48, GPU 49 (MiB)
[05/06/2024-09:56:55] [TRT] [I] Wrote TensorRT engine file: /tmp/classification/model.4.224.224.trtmodel
[model_inference:230] 2024-05-06 09:56:55 INFO   Using TensorRT as the inference engine.
Traceback (most recent call last):
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 236, in <module>
    main()
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 219, in main
    run_sample(
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 154, in run_sample
    batch = decoder()
  File "/app/cqy/CV-CUDA/samples/common/python/nvcodec_utils.py", line 552, in __call__
    image_list = self.decoder.decode(data_batch, cuda_stream=self.cuda_stream)
RuntimeError: nvImageCodec failure: '#4'
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
-------------------------------------------------------------------
./scripts/run_samples.sh: line 48: 91016 Aborted                 (core dumped) python3 $SAMPLES_DIR/classification/python/main.py -o "$CLASSIFICATION_OUT_DIR"

@bhaefnerNV

Hi @PredyDaddy,
if you would like to execute the run_samples.sh script, you also have to build the C++ samples, since the script executes those as well. To build the C++ samples, CVCUDA for C++ has to be installed, i.e. follow step 6.ii. in the same readme.

If you would like to avoid building the C++ samples or installing CVCUDA for C++, you can also manually execute only the python samples instead of running the run_samples.sh script.
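Running only the Python classification sample by hand boils down to invoking its `main.py` directly, as `run_samples.sh` does. A minimal sketch, assuming the `SAMPLES_DIR` layout from the logs above (the environment-variable fallback is my own convenience, not part of the script):

```python
import os
import subprocess
import sys

# Paths taken from the logs above; override SAMPLES_DIR for a different checkout.
samples_dir = os.environ.get("SAMPLES_DIR", "/app/cqy/CV-CUDA/samples")
out_dir = "/tmp/classification"

cmd = [
    sys.executable,
    os.path.join(samples_dir, "classification", "python", "main.py"),
    "-o", out_dir,
]
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment on a machine with the GPU deps installed
```

This sidesteps the C++ sample build entirely, at the cost of exercising only the Python pipeline.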

@PredyDaddy
Author

Then I got this error; could you help me look at it?

root@f230e902ede4:/app/cqy/CV-CUDA/samples# ./scripts/build_samples.sh 
-- The CXX compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDAToolkit: /usr/local/cuda/include (found version "12.3.107") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
CMake Error at CMakeLists.txt:20 (find_package):
  By not providing "Findnvcv_types.cmake" in CMAKE_MODULE_PATH this project
  has asked CMake to find a package configuration file provided by
  "nvcv_types", but CMake did not find one.

  Could not find a package configuration file provided by "nvcv_types" with
  any of the following names:

    nvcv_typesConfig.cmake
    nvcv_types-config.cmake

  Add the installation prefix of "nvcv_types" to CMAKE_PREFIX_PATH or set
  "nvcv_types_DIR" to a directory containing one of the above files.  If
  "nvcv_types" provides a separate development package or SDK, be sure it has
  been installed.

@bhaefnerNV

This looks like CVCUDA is not installed on your system. I recommend a deb package installation.
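The deb packages install into the default search prefix, so CMake should find `nvcv_types` after that. If CVCUDA were instead installed to a non-standard prefix, CMake can be pointed at it via `CMAKE_PREFIX_PATH`, as the error message suggests; the path below is a placeholder, not where the deb actually installs:

```shell
# Placeholder prefix: substitute wherever nvcv_typesConfig.cmake actually lives.
CVCUDA_PREFIX=/opt/nvidia/cvcuda0
export CMAKE_PREFIX_PATH="${CVCUDA_PREFIX}${CMAKE_PREFIX_PATH:+:$CMAKE_PREFIX_PATH}"
echo "$CMAKE_PREFIX_PATH"
```

After exporting this, re-run `./scripts/build_samples.sh` so `find_package(nvcv_types)` searches the added prefix.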

@PredyDaddy
Author

I installed these packages and then ran the build_samples script:

cd /workspace/cvcuda_install/  # Assuming this is where the installation files are
dpkg -i cvcuda-lib-0.6.0_beta-cuda12-x86_64-linux.deb
dpkg -i cvcuda-dev-0.6.0_beta-cuda12-x86_64-linux.deb
dpkg -i cvcuda-python3.10-0.6.0_beta-cuda12-x86_64-linux.deb

The build now succeeds, but I still hit the same error when running the samples:

root@f230e902ede4:/app/cqy/CV-CUDA/samples# ./scripts/build_samples.sh 
-- The CXX compiler identification is GNU 11.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDAToolkit: /usr/local/cuda/include (found version "12.3.107") 
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda (found version "12.3") 
-- Found TensorRT: /usr/lib/x86_64-linux-gnu/libnvinfer.so (found version "8.6.1") 
-- Found TensorRT-8.6.1 libs: nvinfer nvinfer_plugin nvparsers nvonnxparser
-- Configuring done
-- Generating done
-- Build files have been written to: /app/cqy/CV-CUDA/samples/build
[ 14%] Building CXX object common/CMakeFiles/cvcuda_samples_common.dir/TRTUtils.cpp.o
[ 28%] Building CXX object common/CMakeFiles/cvcuda_samples_common.dir/NvDecoder.cpp.o
[ 42%] Linking CXX shared library libcvcuda_samples_common.so
[ 42%] Built target cvcuda_samples_common
[ 57%] Building CXX object classification/CMakeFiles/cvcuda_sample_classification.dir/Main.cpp.o
[ 71%] Linking CXX executable cvcuda_sample_classification
[ 71%] Built target cvcuda_sample_classification
[ 85%] Building CXX object cropandresize/CMakeFiles/cvcuda_sample_cropandresize.dir/Main.cpp.o
[100%] Linking CXX executable cvcuda_sample_cropandresize
[100%] Built target cvcuda_sample_cropandresize
root@f230e902ede4:/app/cqy/CV-CUDA/samples# ./scripts/run_samples.sh
SAMPLES_DIR: /app/cqy/CV-CUDA/samples
CLASSIFICATION_OUT_DIR: /tmp/classification
SEGMENTATION_OUT_DIR: /tmp/segmentation
DETECTION_OUT_DIR: /tmp/object_detection
DISTANCE_LABEL_OUT_DIR: /tmp/distance_label
[perf_utils:100] 2024-05-08 03:46:54 WARNING perf_utils is used without benchmark.py. Benchmarking mode is turned off.
[perf_utils:104] 2024-05-08 03:46:54 INFO   Using CV-CUDA version: 0.6.0-beta
[pipelines:35] 2024-05-08 03:46:55 INFO   Using CVCUDA as preprocessor.
[nvcodec_utils:532] 2024-05-08 03:46:55 INFO   Using nvImageCodec decoder version: 0.2.0
[pipelines:122] 2024-05-08 03:46:55 INFO   Using CVCUDA as post-processor.
[05/08/2024-03:46:58] [TRT] [I] Using TensorRT version: 8.6.1
[05/08/2024-03:46:58] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 469, GPU 466 (MiB)
[05/08/2024-03:47:07] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1445, GPU +266, now: CPU 1990, GPU 732 (MiB)
[05/08/2024-03:47:07] [TRT] [I] Using precision : float16
[05/08/2024-03:47:07] [TRT] [I] Loading ONNX file from path /tmp/classification/model.4.224.224.onnx 
[05/08/2024-03:47:07] [TRT] [W] onnx2trt_utils.cpp:374: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[05/08/2024-03:47:08] [TRT] [I] Graph optimization time: 0.844108 seconds.
[05/08/2024-03:47:08] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[05/08/2024-03:47:58] [TRT] [I] Detected 1 inputs and 1 output network tensors.
[05/08/2024-03:47:59] [TRT] [I] Total Host Persistent Memory: 312272
[05/08/2024-03:47:59] [TRT] [I] Total Device Persistent Memory: 41984
[05/08/2024-03:47:59] [TRT] [I] Total Scratch Memory: 1311744
[05/08/2024-03:47:59] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 48 MiB, GPU 49 MiB
[05/08/2024-03:47:59] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 62 steps to complete.
[05/08/2024-03:47:59] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 2.28343ms to assign 4 blocks to 62 nodes requiring 16257024 bytes.
[05/08/2024-03:47:59] [TRT] [I] Total Activation Memory: 16257024
[05/08/2024-03:47:59] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.
[05/08/2024-03:47:59] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.
[05/08/2024-03:47:59] [TRT] [W] Check verbose logs for the list of affected weights.
[05/08/2024-03:47:59] [TRT] [W] - 57 weights are affected by this issue: Detected subnormal FP16 values.
[05/08/2024-03:47:59] [TRT] [W] - 31 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.
[05/08/2024-03:47:59] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +48, GPU +49, now: CPU 48, GPU 49 (MiB)
[05/08/2024-03:47:59] [TRT] [I] Wrote TensorRT engine file: /tmp/classification/model.4.224.224.trtmodel
[model_inference:230] 2024-05-08 03:48:00 INFO   Using TensorRT as the inference engine.
Traceback (most recent call last):
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 236, in <module>
    main()
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 219, in main
    run_sample(
  File "/app/cqy/CV-CUDA/samples/classification/python/main.py", line 154, in run_sample
    batch = decoder()
  File "/app/cqy/CV-CUDA/samples/common/python/nvcodec_utils.py", line 552, in __call__
    image_list = self.decoder.decode(data_batch, cuda_stream=self.cuda_stream)
RuntimeError: nvImageCodec failure: '#4'
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.
-------------------------------------------------------------------
./scripts/run_samples.sh: line 48: 138944 Aborted                 (core dumped) python3 $SAMPLES_DIR/classification/python/main.py -o "$CLASSIFICATION_OUT_DIR"

Can you help me look into it?

@wenkaic

wenkaic commented May 13, 2024

It seems that this issue is caused by corrupted image files. You can replace the image files in samples/assets/images with other images to resolve it.
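To confirm the corruption theory before swapping assets, a quick stdlib-only check can flag files whose headers are not valid JPEG/PNG. This is a cheap heuristic of my own, not a full decode, and nvImageCodec may accept more formats than the two listed here:

```python
import pathlib

# Magic bytes for common sample-asset formats (extend as needed).
SIGNATURES = (
    b"\xff\xd8\xff",       # JPEG
    b"\x89PNG\r\n\x1a\n",  # PNG
)

def looks_valid(path: pathlib.Path) -> bool:
    """Cheap heuristic: does the file start with a known image signature?"""
    head = path.read_bytes()[:8]
    return any(head.startswith(sig) for sig in SIGNATURES)

def find_suspect_images(images_dir: str) -> list:
    """List files under images_dir whose headers match no known signature."""
    return [p for p in sorted(pathlib.Path(images_dir).iterdir())
            if p.is_file() and not looks_valid(p)]

# Example: find_suspect_images("samples/assets/images")
```

Any file this flags is worth re-downloading or replacing; files that pass could still be truncated mid-stream, so a clean pass is not a guarantee.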

@wenkaic

wenkaic commented May 13, 2024

This also applies to the video files used by the samples.
