CUDA error: CUBLAS_STATUS_NOT_SUPPORTED #16

animikhaich · 2023-07-30T23:38:04Z

After a standard installation, I tried to run the test steps as outlined here.

I encountered this error:

As suggested by issues #9 and #6 I verified the existence of md_v5a.0.0.pt and md_v5b.0.0.pt in .EcoAssist_files/pretrained_models.

The stdout.txt log dump is given below:

EXECUTED: start_deploy({})

EXECUTED: deploy_model({'path_to_image_folder': '/home/ani/Downloads/test-images', 'selected_options': ['--output_relative_filenames', '--recursive'], 'data_type': 'img'})

EXECUTED: switch_yolov5_git_to({'model_type': 'old models'})

command:

["'/home/ani/.EcoAssist_files/miniforge/envs/ecoassistcondaenv/bin/python' '/home/ani/.EcoAssist_files/cameratraps/detection/run_detector_batch.py' '/home/ani/.EcoAssist_files/pretrained_models/md_v5a.0.0.pt' '--output_relative_filenames' '--recursive' '/home/ani/Downloads/test-images' '/home/ani/Downloads/test-images/image_recognition_file.json'"]


Fusing layers... 
5 image files found in the input directory
PyTorch reports 1 available CUDA devices
GPU available: True
Using PyTorch version 1.10.1
Traceback (most recent call last):
  File "/home/ani/.EcoAssist_files/cameratraps/detection/run_detector_batch.py", line 816, in <module>
    main()
  File "/home/ani/.EcoAssist_files/cameratraps/detection/run_detector_batch.py", line 785, in main
    results = load_and_run_detector_batch(model_file=args.detector_file,
  File "/home/ani/.EcoAssist_files/cameratraps/detection/run_detector_batch.py", line 402, in load_and_run_detector_batch
    detector = load_detector(model_file)
  File "/home/ani/.EcoAssist_files/cameratraps/detection/run_detector.py", line 289, in load_detector
    detector = PTDetector(model_file, force_cpu, USE_MODEL_NATIVE_CLASSES)        
  File "/home/ani/.EcoAssist_files/cameratraps/detection/pytorch_detector.py", line 50, in __init__
    self.model = PTDetector._load_model(model_path, self.device)
  File "/home/ani/.EcoAssist_files/cameratraps/detection/pytorch_detector.py", line 62, in _load_model
    model = checkpoint['model'].float().fuse().eval()  # FP32 model
  File "/home/ani/.EcoAssist_files/yolov5/models/yolo.py", line 231, in fuse
    m.conv = fuse_conv_and_bn(m.conv, m.bn)  # update conv
  File "/home/ani/.EcoAssist_files/yolov5/utils/torch_utils.py", line 205, in fuse_conv_and_bn
    fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.shape))
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_SUPPORTED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
ERROR:
local variable 'elapsed_time' referenced before assignment

DETAILS:
Traceback (most recent call last):
  File "EcoAssist/EcoAssist_GUI.py", line 1232, in start_deploy
    deploy_model(var_choose_folder.get(), additional_img_options, data_type = "img")
  File "EcoAssist/EcoAssist_GUI.py", line 1072, in deploy_model
    progress_stats['text'] = create_md_progress_lbl(elapsed_time = elapsed_time,
UnboundLocalError: local variable 'elapsed_time' referenced before assignment
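The second traceback looks like a knock-on effect of the first: if the detector process dies before reporting any progress, a variable that is only assigned while parsing progress output is never bound. The sketch below is a hypothetical reduction of that pattern, not EcoAssist's actual code; the function name `summarize_progress` and the `elapsed = ...` line format are assumptions made for illustration.

```python
def summarize_progress(lines):
    """Build a progress label from detector output lines.

    Hypothetical reduction: the 'elapsed = ...' format is assumed,
    it is not taken from EcoAssist or MegaDetector.
    """
    elapsed_time = None  # bind a default so the name always exists
    for line in lines:
        if line.startswith("elapsed"):
            # Keep everything after the first '=' as the elapsed time text.
            elapsed_time = line.split("=", 1)[1].strip()
    if elapsed_time is None:
        # The detector crashed (or printed nothing useful) before any progress line.
        return "detector exited before reporting progress"
    return f"elapsed: {elapsed_time}"
```

With a default in place, a crash in the detector surfaces as the original CUDA error alone, rather than being followed by an unrelated-looking `UnboundLocalError`.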

My System Information:

           `.:/ossyyyysso/:.               ani@Arc 
        .:oyyyyyyyyyyyyyyyyyyo:`           ------- 
      -oyyyyyyyodMMyyyyyyyysyyyyo-         OS: Kubuntu 23.04 x86_64 
    -syyyyyyyyyydMMyoyyyydmMMyyyyys-       Host: MS-7D43 1.0 
   oyyysdMysyyyydMMMMMMMMMMMMMyyyyyyyo     Kernel: 6.2.0-26-generic 
 `oyyyydMMMMysyysoooooodMMMMyyyyyyyyyo`    Uptime: 32 mins 
 oyyyyyydMMMMyyyyyyyyyyyysdMMysssssyyyo    Packages: 2819 (dpkg), 13 (snap) 
-yyyyyyyydMysyyyyyyyyyyyyyysdMMMMMysyyy-   Shell: bash 5.2.15 
oyyyysoodMyyyyyyyyyyyyyyyyyyydMMMMysyyyo   Resolution: 2560x1080 
yyysdMMMMMyyyyyyyyyyyyyyyyyyysosyyyyyyyy   DE: Plasma 5.27.4 
yyysdMMMMMyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy   WM: KWin 
oyyyyysosdyyyyyyyyyyyyyyyyyyydMMMMysyyyo   Theme: [Plasma], Breeze [GTK2/3] 
-yyyyyyyydMysyyyyyyyyyyyyyysdMMMMMysyyy-   Icons: [Plasma], Breeze-openSUSE Dark Icons [GTK2/3] 
 oyyyyyydMMMysyyyyyyyyyyysdMMyoyyyoyyyo    Terminal: konsole 
 `oyyyydMMMysyyyoooooodMMMMyoyyyyyyyyo     CPU: 12th Gen Intel i7-12700F (16) @ 4.800GHz 
   oyyysyyoyyyysdMMMMMMMMMMMyyyyyyyyo      GPU: NVIDIA GeForce RTX 3090 Ti 
    -syyyyyyyyydMMMysyyydMMMysyyyys-       Memory: 3879MiB / 31931MiB 
      -oyyyyyyydMMyyyyyyysosyyyyo-
        ./oyyyyyyyyyyyyyyyyyyo/.                                   
           `.:/oosyyyysso/:.`

Nvidia Driver:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  Off |
|  0%   49C    P8    19W / 450W |    415MiB / 24564MiB |     17%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Default CUDA Version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0


agentmorris · 2023-07-31T14:06:29Z

Hmmm... I haven't seen this before. I'm about 80% sure this is not really a dimensionality issue but a CUDA version mismatch, where for some reason the system CUDA environment is being used instead of the Python environment's CUDA environment. A couple of things to try, starting with the easiest:

1. In the shell from which you are launching EcoAssist, try running:

   export LD_LIBRARY_PATH=''

   ...prior to starting EcoAssist. I'm 61% sure this will fix the problem, and if that's the case, we have an easy fix, and I get to grumble about how I wish CUDA installs wouldn't mess with LD_LIBRARY_PATH.
2. It would help debug a little if we could take EcoAssist out of the loop just to remove a level of indirection, so if the person who owns the environment is up for it, it would be great to go through the MegaDetector setup instructions. If we can repro the issue there, we'll have a simpler time debugging.
3. I don't really recommend that the environment owner do this, but FWIW, I think uninstalling CUDA entirely from the system will fix the issue. In principle I'd like to do this as a debugging step, but it's a big hammer to wield if the user is using the system CUDA for other things.
4. I don't think we'll go past (2) just yet, but if (1) doesn't work, and we can repro the problem in a standalone Python environment (i.e., outside of EcoAssist), we can try to upgrade PyTorch in that environment to match the system CUDA version. If that works, we've at least verified that it was really a CUDA version mismatch, then we can decide what to do about it.
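To check the version-mismatch hypothesis before changing anything, a small diagnostic along these lines may help. It is only a sketch: the helper names `cuda_like_entries` and `report` are made up for illustration, the substring match on "cuda" is a rough heuristic, and the PyTorch part is skipped if `torch` cannot be imported. It lists `LD_LIBRARY_PATH` entries that look like a system CUDA install and prints the CUDA version PyTorch was built with.

```python
import os


def cuda_like_entries(ld_library_path):
    """Return LD_LIBRARY_PATH entries that look like system CUDA directories.

    Hypothetical helper: matching on the substring 'cuda' is a heuristic,
    not an exhaustive check.
    """
    entries = [p for p in ld_library_path.split(":") if p]
    return [p for p in entries if "cuda" in p.lower()]


def report():
    """Print what the current process would see."""
    suspects = cuda_like_entries(os.environ.get("LD_LIBRARY_PATH", ""))
    print("CUDA-looking LD_LIBRARY_PATH entries:", suspects or "none")
    try:
        import torch
    except ImportError:
        print("torch is not importable in this environment")
        return
    print("PyTorch version:", torch.__version__)
    print("CUDA version PyTorch was built with:", torch.version.cuda)
    print("GPU available:", torch.cuda.is_available())


if __name__ == "__main__":
    report()
```

Run it with the same Python interpreter that EcoAssist uses (the one in the `ecoassistcondaenv` environment shown in the log), from the same shell, so it sees the same `LD_LIBRARY_PATH`.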

PetervanLunteren · 2023-07-31T15:02:38Z

@agentmorris Thanks for your response!

@animikhaich With regard to option 1, the easiest way to run export LD_LIBRARY_PATH='' prior to opening EcoAssist would be to add this line somewhere before the python command on line 109 in /home/ani/.EcoAssist_files/EcoAssist/open.command.
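For anyone launching the detector from Python rather than from `open.command`, the same idea can be sketched by emptying `LD_LIBRARY_PATH` for the child process only. This is an illustration under assumptions, not how EcoAssist does it: the helper names `env_with_empty_ld_library_path` and `run_with_clean_env` are hypothetical, and the child command here is a stand-in that simply prints what it sees.

```python
import os
import subprocess
import sys


def env_with_empty_ld_library_path(base_env=None):
    """Return a copy of the environment with LD_LIBRARY_PATH set to ''."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_LIBRARY_PATH"] = ""
    return env


def run_with_clean_env(command):
    """Run `command` (a list of arguments) with LD_LIBRARY_PATH emptied.

    The parent process keeps its own environment untouched.
    """
    return subprocess.run(
        command,
        env=env_with_empty_ld_library_path(),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Stand-in child process; replace with the real detector command.
    child = [sys.executable, "-c",
             "import os; print(repr(os.environ['LD_LIBRARY_PATH']))"]
    print(run_with_clean_env(child).stdout.strip())
```

Scoping the change to the child process leaves the system CUDA setup usable by other programs started from the same shell.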

animikhaich · 2023-07-31T15:06:01Z

Thanks @agentmorris and @PetervanLunteren. Option 1 resolved it!

Set environment variable to ensure that the Python environment's CUDA environment is being used instead of the system CUDA environment. Resolves #16 and agentmorris/MegaDetector#106

animikhaich mentioned this issue Jul 30, 2023

[REVIEW]: EcoAssist: A no-code platform to train and deploy custom YOLOv5 object detection models openjournals/joss-reviews#5581

Closed

PetervanLunteren mentioned this issue Jul 31, 2023

CUDA error: CUBLAS_STATUS_NOT_SUPPORTED on Kubuntu 23.04 x86_64 with NVIDIA GeForce RTX 3090 Ti agentmorris/MegaDetector#106

Closed

animikhaich closed this as completed Jul 31, 2023
