[Bug]: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions. #8965

Closed
1 task done
cellifeer opened this issue Mar 26, 2023 · 11 comments
Labels
asking-for-help-with-local-system-issues This issue is asking for help related to local system; please offer assistance

Comments

@cellifeer

cellifeer commented Mar 26, 2023

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What happened?

return torch.cuda.cudart().cudaMemGetInfo(device)
RuntimeError: CUDA error: the launch timed out and was terminated
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Why does it stop on this screen? Nothing is running.
How can I resolve this so that it continues with the next task? It behaves as if it had crashed.

Steps to reproduce the problem

python: 3.10.10  •  torch: 2.0.0+cu118  •  xformers: 0.0.17rc482  •  gradio: 3.16.2  

Commit where the problem happens

general timing

What platforms do you use to access the UI ?

No response

What browsers do you use to access the UI ?

No response

Command Line Arguments

set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
set PYTHON=
set GIT=C:\Program Files\Git\bin\git.exe
set VENV_DIR=venv
set COMMANDLINE_ARGS=--autolaunch --lowvram --xformers --opt-split-attention --opt-channelslast --gradio-queue

List of extensions

no

Console logs

return torch.cuda.cudart().cudaMemGetInfo(device)
RuntimeError: CUDA error: the launch timed out and was terminated
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
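
CUDA errors are often reported asynchronously, so the call named in the traceback (here `cudaMemGetInfo`) is not necessarily the one that failed. A minimal sketch, assuming the goal is only to get a more accurate traceback: PyTorch honors the `CUDA_LAUNCH_BLOCKING` environment variable, which makes kernel launches synchronous. The helper name `debug_cuda_env` is hypothetical and not part of the web UI or PyTorch.

```python
import os


def debug_cuda_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    # Copy so the caller's environment mapping is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    # With CUDA_LAUNCH_BLOCKING=1 each kernel launch waits for completion,
    # so an error surfaces at the call that caused it (at a speed cost).
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env
```

The returned mapping could be passed as `env=` to `subprocess.run` when starting the UI from a script. In the batch file shown under "Command Line Arguments", the equivalent would presumably be one more line, `set CUDA_LAUNCH_BLOCKING=1`. This only helps locate the failing call; it does not fix the timeout itself.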

Additional information

No response

@cellifeer cellifeer added the bug-report Report of a bug, yet to be confirmed label Mar 26, 2023
@pirrimaison

pirrimaison commented Apr 9, 2023

It may be related to the "Mixed Precision" setting; with bf16 this does not happen to me.

[+] xformers version 0.0.18 installed. (xformers is not necessary with torch 2.0.)
[+] torch version 2.0.0+cu118 installed.
[+] torchvision version 0.15.1+cu118 installed.
[+] accelerate version 0.18.0 installed.
[+] diffusers version 0.14.0 installed.
[+] transformers version 4.27.1 installed.
[+] bitsandbytes version 0.35.4 installed.

python: 3.10.6  •  torch: 2.0.0+cu118  •  xformers: N/A  •  gradio: 3.23.0  •  commit:  •  checkpoint: e6415c4892

@Hung-Ching-Lee

Changing the torch version from 2.0.0+cu118 to 2.1.0.dev20230501+cu117 works for me, but I have no idea why.

pip3 install numpy --pre torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu117

@LisaLy123456

pip3 install numpy --pre torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu117

Thank you, your method solved my problem!

@QJShan

QJShan commented Jun 2, 2023

Hung

Why is torch installed repeatedly, multiple times, when this command is executed?

@QJShan

QJShan commented Jun 2, 2023

pip3 install numpy --pre torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu117

Thank you, your method solved my problem!

Why is torch installed repeatedly, multiple times, when this command is executed?

@cscooper2000

Forgive me, I'm a newb. Where do I put that command? (I'm on Windows)
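
One hedged answer to the question above: the command most likely needs to run with the Python that belongs to the web UI's own virtual environment rather than the system Python. The sketch below only builds that command line. It assumes the environment lives in a folder named `venv` next to the launcher (as `set VENV_DIR=venv` in the original post suggests) and that it was opened from a command prompt in that folder; `build_pip_command` is a hypothetical helper, not part of the web UI.

```python
import ntpath

# Index URL quoted from the command posted earlier in this thread.
NIGHTLY_CU117 = "https://download.pytorch.org/whl/nightly/cu117"


def build_pip_command(venv_dir="venv", index_url=NIGHTLY_CU117):
    """Build the pip command from this thread, routed through the venv's Python."""
    # On Windows a virtual environment keeps its interpreter under "Scripts".
    python_exe = ntpath.join(venv_dir, "Scripts", "python.exe")
    return [
        python_exe, "-m", "pip", "install",
        "numpy", "--pre", "torch", "torchvision", "torchaudio",
        "--force-reinstall", "--index-url", index_url,
    ]
```

Joining the returned list with spaces gives a line that can be pasted into `cmd.exe` from the web UI folder. Calling pip as `python -m pip` through the venv interpreter ensures the packages land in the environment the UI actually uses.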

@ChiragGhanshani

pip3 install numpy --pre torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu117

Thank you, your method solved my problem!

Why is torch installed repeatedly, multiple times, when this command is executed?

Looks like it's installing every version of cu117 contained in the nightly folder. There are subtle differences in the filenames.

@catboxanon catboxanon added asking-for-help-with-local-system-issues This issue is asking for help related to local system; please offer assistance and removed bug-report Report of a bug, yet to be confirmed labels Aug 26, 2023
@arlojeremy

arlojeremy commented Oct 29, 2023

Changing the torch version from 2.0.0+cu118 to 2.1.0.dev20230501+cu117 works for me, but I have no idea why.

pip3 install numpy --pre torch torchvision torchaudio --force-reinstall --index-url https://download.pytorch.org/whl/nightly/cu117

When I run this I get this error:

Looking in indexes: https://download.pytorch.org/whl/nightly/cu117
Collecting numpy
Downloading https://download.pytorch.org/whl/nightly/numpy-1.24.1-cp311-cp311-win_amd64.whl (14.8 MB)
---------------------------------------- 14.8/14.8 MB 34.4 MB/s eta 0:00:00
Collecting torch
Downloading https://download.pytorch.org/whl/nightly/cu117/torch-2.1.0.dev20230621%2Bcu117-cp311-cp311-win_amd64.whl (2387.8 MB)
---------------------------------------- 2.4/2.4 GB 434.6 kB/s eta 0:00:00
ERROR: Could not find a version that satisfies the requirement torchvision (from versions: none)
ERROR: No matching distribution found for torchvision
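
The wheel filenames in the log above carry the tag `cp311`, which means pip is running under CPython 3.11, whereas the original report used Python 3.10.10. One possible reading (an assumption, not confirmed anywhere in this thread) is that the index offered no torchvision wheel matching that interpreter at the time. The sketch below shows how to read those tags from a wheel URL; `wheel_tags` is a hypothetical helper name.

```python
from urllib.parse import unquote


def wheel_tags(url):
    """Split a wheel URL into name, version, Python tag, ABI tag and platform tag."""
    # Keep only the filename and decode percent-escapes ("%2B" becomes "+").
    filename = unquote(url.rsplit("/", 1)[-1])
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[: -len(".whl")].split("-")
    # Wheel names follow name-version[-build]-python-abi-platform, so the
    # last three fields are always the compatibility tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }
```

Applied to the torch URL from the log, the `python` field is `cp311`. Running `python --version` inside the web UI's environment is a quick way to check whether that is the interpreter the UI is meant to use.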

@liujingmao

Try running pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2; that works for me!
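
After pinning or reinstalling, it can help to confirm which versions actually ended up in the environment. A minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_versions` is hypothetical, and it must be run with the same Python the web UI uses.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("torch", "torchvision", "torchaudio")):
    """Map each package name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            result[name] = None
    return result
```

Printing `installed_versions()` lists the three packages from the command above. A `None` entry or a mismatched CUDA suffix such as `+cu117` next to `+cu118` would suggest the reinstall did not apply cleanly.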

@arlojeremy

Late comment, but I think my GPU is failing. I underclocked the memory and GPU clocks and no longer get the error.
