A100 Support #27

delbalso · 2022-09-30T23:24:14Z

What can I do to help get A100 support?

TheLastBen · 2022-09-30T23:26:41Z

Thanks. Run:

!pip install git+https://github.com/facebookresearch/xformers@51dd119#egg=xformers

After around 40 minutes, once the installation is done, navigate to /usr/local/lib/python3.7/dist-packages/xformers.

Save the two files, "_C_flashattention.so" and "_C.so", upload them to any host, and send me the link so I can integrate them.

The files might not show up in the Colab file explorer, so you will have to copy them under a different name:

!cp /usr/local/lib/python3.7/dist-packages/xformers/_C.so /usr/local/lib/python3.7/dist-packages/xformers/C.py

!cp /usr/local/lib/python3.7/dist-packages/xformers/_C_flashattention.so /usr/local/lib/python3.7/dist-packages/xformers/C_flashattention.py
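
The copy step above can also be done from a notebook cell in Python. A minimal sketch, assuming the two file names given above; the helper name `export_xformers_binaries` and the destination directory in the usage note are hypothetical, chosen only for illustration:

```python
import shutil
from pathlib import Path

# File names taken from the instructions above.
BINARIES = ("_C.so", "_C_flashattention.so")


def export_xformers_binaries(package_dir, dest_dir):
    """Copy the compiled xformers extensions into dest_dir.

    Returns the list of copied paths. Raises FileNotFoundError if a
    binary is missing, e.g. because the build has not finished.
    """
    package_dir = Path(package_dir)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for name in BINARIES:
        source = package_dir / name
        if not source.is_file():
            raise FileNotFoundError(f"{source} not found; did the build finish?")
        copied.append(Path(shutil.copy2(source, dest_dir / name)))
    return copied
```

It could be called as `export_xformers_binaries("/usr/local/lib/python3.7/dist-packages/xformers", "/content/xformers_binaries")`, where the second path is an arbitrary example.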

delbalso · 2022-10-01T05:30:13Z

https://file.io/UkvT0KEU31MY
I kept them as .py files.

delbalso · 2022-10-01T05:36:01Z

The notebook still doesn't work though. I get this error.

The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 1
--num_machines was set to a value of 1
--mixed_precision was set to a value of 'no'
--num_cpu_threads_per_process was set to 6 to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
Downloading: 100% 543/543 [00:00<00:00, 565kB/s]
Fetching 16 files: 0% 0/16 [00:00<?, ?it/s]
Downloading: 100% 342/342 [00:00<00:00, 359kB/s]
Fetching 16 files: 6% 1/16 [00:00<00:11, 1.36it/s]
Downloading: 100% 4.56k/4.56k [00:00<00:00, 3.96MB/s]
Fetching 16 files: 19% 3/16 [00:01<00:05, 2.31it/s]
Downloading: 0% 0.00/1.22G [00:00<?, ?B/s]
Downloading: 0% 4.71M/1.22G [00:00<00:25, 47.1MB/s]
...
Downloading: 100% 1.22G/1.22G [00:16<00:00, 75.7MB/s]
Fetching 16 files: 25% 4/16 [00:17<01:12, 6.01s/it]
Downloading: 100% 209/209 [00:00<00:00, 193kB/s]
Fetching 16 files: 38% 6/16 [00:19<00:30, 3.01s/it]
Downloading: 100% 592/592 [00:00<00:00, 586kB/s]
Fetching 16 files: 44% 7/16 [00:19<00:20, 2.27s/it]
Downloading: 0% 0.00/492M [00:00<?, ?B/s]
Downloading: 1% 4.79M/492M [00:00<00:10, 47.9MB/s]
...
Downloading: 100% 492M/492M [00:06<00:00, 75.9MB/s]
Fetching 16 files: 50% 8/16 [00:26<00:29, 3.71s/it]
Downloading: 0% 0.00/525k [00:00<?, ?B/s]
Downloading: 100% 525k/525k [00:00<00:00, 3.57MB/s]
Fetching 16 files: 56% 9/16 [00:27<00:19, 2.81s/it]
Downloading: 100% 472/472 [00:00<00:00, 408kB/s]
Fetching 16 files: 62% 10/16 [00:28<00:12, 2.16s/it]
Downloading: 100% 806/806 [00:00<00:00, 793kB/s]
Fetching 16 files: 69% 11/16 [00:28<00:08, 1.70s/it]
Downloading: 0% 0.00/1.06M [00:00<?, ?B/s]
Downloading: 100% 1.06M/1.06M [00:00<00:00, 6.60MB/s]
Fetching 16 files: 75% 12/16 [00:29<00:05, 1.44s/it]
Downloading: 100% 743/743 [00:00<00:00, 641kB/s]
Fetching 16 files: 81% 13/16 [00:30<00:03, 1.20s/it]
Downloading: 0% 0.00/3.44G [00:00<?, ?B/s]
...
Downloading: 100% 3.44G/3.44G [00:45<00:00, 75.0MB/s]
Fetching 16 files: 88% 14/16 [01:16<00:29, 14.77s/it]
Downloading: 100% 522/522 [00:00<00:00, 454kB/s]
Fetching 16 files: 94% 15/16 [01:17<00:10, 10.53s/it]
Downloading: 0% 0.00/335M [00:00<?, ?B/s]
Downloading: 1% 4.71M/335M [00:00<00:07, 47.1MB/s]
...
Downloading: 100% 335M/335M [00:04<00:00, 76.1MB/s]
Fetching 16 files: 100% 16/16 [01:21<00:00, 5.12s/it]
Generating class images: 100% 3/3 [00:28<00:00, 9.61s/it]
Downloading: 100% 1.06M/1.06M [00:00<00:00, 6.61MB/s]
...
Downloading: 100% 492M/492M [00:06<00:00, 75.7MB/s]
Steps: 0% 0/800 [00:00<?, ?it/s]Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=/content/drive/MyDrive/AI/DreamBooth/training_data/mike_pics_training_data', '--class_data_dir=/content/data/guy', '--output_dir=/content/drive/MyDrive/stable_diffusion_weights/sks', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=photo of sks guy', '--class_prompt=photo of a guy', '--seed=1337', '--resolution=512', '--center_crop', '--train_batch_size=1', '--mixed_precision=fp16', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=12', '--sample_batch_size=4', '--max_train_steps=800']' died with <Signals.SIGABRT: 6>.

TheLastBen · 2022-10-01T09:07:00Z

Thank you very much for the files. Have you accepted the terms at https://huggingface.co/CompVis/stable-diffusion-v1-4?

TheLastBen · 2022-10-01T09:20:02Z

It looks like you missed the cell that downloads the model.
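
One way to check whether the model download cell actually ran is to look for the pipeline's index file. A minimal sketch, assuming the model is stored locally in the standard diffusers layout with a `model_index.json` at the top level; the helper name `model_is_downloaded` is hypothetical:

```python
from pathlib import Path


def model_is_downloaded(model_dir):
    """Return True if model_dir looks like a saved diffusers pipeline.

    Only checks for the top-level model_index.json; it does not
    verify that the weight files are present or complete.
    """
    return (Path(model_dir) / "model_index.json").is_file()
```

This only applies when `--pretrained_model_name_or_path` points at a local directory rather than a Hub model ID.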

liangwei191 · 2022-10-01T11:33:02Z

I also got an A100 at first, but found that the cost drains too fast. So I use the menu Runtime -> Factory reset runtime to get a random GPU, repeating until I get a usable one.

delbalso · 2022-10-01T13:53:32Z

> Thank you very much for the files. Have you accepted the terms at https://huggingface.co/CompVis/stable-diffusion-v1-4?

Yep

> It looks like you missed the cell that downloads the model.

Why do you think that? In any case, I just downloaded it again.

I noticed that I copied the precompiled files wrong, but have now fixed them.

BTW, the %%capture magic confused me: it suppressed the cell output, so I didn't see an error.

Here's an update to the error I'm getting:

/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:99: UserWarning: /usr/lib64-nvidia did not contain libcudart.so as expected! Searching further paths...
f'{candidate_env_vars["LD_LIBRARY_PATH"]} did not contain '
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('6000,"kernelManagerProxyHost"'), PosixPath('"172.28.0.3","jupyterArgs"'), PosixPath('"/usr/local/bin/dap_multiplexer","enableLsp"'), PosixPath('["--ip=172.28.0.2"],"debugAdapterMultiplexerPath"'), PosixPath('true}'), PosixPath('{"kernelManagerProxyPort"')}
"WARNING: The following directories listed in your path were found to "
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//ipykernel.pylab.backend_inline'), PosixPath('module')}
"WARNING: The following directories listed in your path were found to "
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
"WARNING: The following directories listed in your path were found to "
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 111
CUDA SETUP: Loading binary /usr/local/lib/python3.7/dist-packages/bitsandbytes/libbitsandbytes_cuda111.so...
Steps: 0% 2/2000 [00:05<1:14:31, 2.24s/it, loss=0.42, lr=5e-6] Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 606, in <module>
main()
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 550, in main
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/accelerate/utils/operations.py", line 507, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/usr/local/lib/python3.7/dist-packages/torch/amp/autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d_condition.py", line 262, in forward
sample = self.mid_block(sample, emb, encoder_hidden_states=encoder_hidden_states)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 375, in forward
hidden_states = attn(hidden_states, encoder_hidden_states)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 167, in forward
hidden_states = block(hidden_states, context=context)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 219, in forward
hidden_states = self.ff(self.norm3(hidden_states)) + hidden_states
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 451, in forward
return self.net(hidden_states)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)
Steps: 0% 2/2000 [00:05<1:36:41, 2.90s/it, loss=0.42, lr=5e-6]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--pretrained_model_name_or_path=/content/gdrive/MyDrive/stable-diffusion-v1-4', '--instance_data_dir=/content/data/mikemdb', '--output_dir=/content/models/mikemdb', '--instance_prompt=photo of mikemdb man', '--seed=12345', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=2000']' returned non-zero exit status 1.

Any ideas?

TheLastBen · 2022-10-01T13:59:04Z

If you're using the A100: I haven't integrated the files into the Colab yet. I'll do it shortly.

delbalso · 2022-10-01T14:04:33Z

Yes, I understand. I just placed the files in the right place manually.

FYI, I think I just got it working by removing

--use_8bit_adam \

and

--mixed_precision="fp16" \
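
The workaround above amounts to launching the training script without those two flags. A minimal sketch that builds the argument list both ways, using flag names from the failing command in the traceback; the helper `build_train_args` is hypothetical, and only a subset of the flags is shown:

```python
def build_train_args(use_8bit_adam=True, mixed_precision="fp16"):
    """Build a partial argument list for train_dreambooth.py.

    Passing mixed_precision=None omits the flag entirely, which is
    what "removing" it means in the comment above.
    """
    args = [
        "--resolution=512",
        "--train_batch_size=1",
        "--gradient_accumulation_steps=1",
        "--learning_rate=5e-6",
        "--max_train_steps=2000",
    ]
    if use_8bit_adam:
        args.append("--use_8bit_adam")
    if mixed_precision is not None:
        args.append(f"--mixed_precision={mixed_precision}")
    return args


# Configuration that crashed on the A100 in this thread.
failing = build_train_args()
# Configuration reported as working: both flags removed.
working = build_train_args(use_8bit_adam=False, mixed_precision=None)
```

Toggling one flag at a time this way makes it easier to tell which of the two actually triggers the error.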

delbalso · 2022-10-01T14:06:25Z

How long does training take on other GPUs? It looks like 2000 steps at 512 resolution on an A100 on Colab take 30 minutes.
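
As a quick sanity check on that figure, 2000 steps in 30 minutes works out to 0.9 seconds per step. A small sketch of the arithmetic; the function name is illustrative:

```python
def seconds_per_step(total_steps, total_minutes):
    """Average wall-clock seconds spent on each training step."""
    return total_minutes * 60 / total_steps


rate = seconds_per_step(2000, 30)
print(f"{rate:.2f} s/step")  # 0.90 s/step
```

For comparison, the progress bar in the failing fp16 run above reported 2.24 s/it, but only after two steps, so the two numbers are not directly comparable.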

TheLastBen · 2022-10-01T14:07:22Z

That's because you removed --use_8bit_adam and --mixed_precision="fp16".
Make sure they really are the cause of the error you're getting.

TheLastBen · 2022-10-01T14:07:46Z

Try leaving --mixed_precision="fp16" \ in.

delbalso · 2022-10-01T14:40:41Z

I'm saying it only started working when I removed --mixed_precision="fp16" \

delbalso · 2022-10-01T14:56:59Z

Should I set train_batch_size to the number of training instances I have?

TheLastBen · 2022-10-01T14:58:20Z

No. That is the per-device batch size, i.e. the number of images processed in each training step, not the number of training instances. Best to keep it at 1 to save time.
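
For reference, the diffusers DreamBooth example script describes --train_batch_size as the per-device batch size for the training dataloader. Under that reading, the number of images processed in a run is the product of steps, batch size, and gradient accumulation steps. A small sketch assuming a single GPU; the function name is illustrative:

```python
def images_processed(max_train_steps, train_batch_size, gradient_accumulation_steps=1):
    """Training images consumed over a run on a single device."""
    return max_train_steps * train_batch_size * gradient_accumulation_steps


# Settings from the command earlier in this thread.
print(images_processed(2000, 1))  # 2000
# The same step count with a batch size of 4 processes four times as many images.
print(images_processed(2000, 4))  # 8000
```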

delbalso · 2022-10-01T15:03:02Z

Thank you

ackl · 2022-10-03T17:34:54Z

I'm not sure this issue should have been closed without some changes to the notebooks. I ran into the exact same issue today: I got an A100, and during training it threw the same CUBLAS_STATUS_EXECUTION_FAILED error right as it reached step 2.

I also resolved it by removing --mixed_precision="fp16" \

TheLastBen · 2022-10-03T18:22:33Z

@ackl I'll make sure A100 users won't face that issue in the future

TheLastBen · 2022-10-03T18:24:46Z

@ackl Try setting it to "no" (--mixed_precision="no" \) instead of removing it.
If that works, the change will be easier for me to implement. Looking forward to your feedback.

TheLastBen · 2022-10-03T19:00:47Z

I have fixed the precision issue for A100s and am waiting for your confirmation to close the issue. Make sure you use the updated Colab notebook.

ackl · 2022-10-03T19:40:23Z

I can confirm it works with the latest commit that uses --mixed_precision="no" when GPU == A100. Thanks for the quick update!
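
The rule described here, disabling mixed precision when the GPU is an A100, can be sketched as follows. This is not the notebook's actual code; the helper names are hypothetical, and the GPU name is assumed to come from `nvidia-smi --query-gpu=name --format=csv,noheader`:

```python
import subprocess


def detect_gpu_name():
    """Return the name of the first GPU as reported by nvidia-smi."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return output.splitlines()[0].strip()


def pick_mixed_precision(gpu_name):
    """Use "no" on an A100, as in the fix above, and "fp16" otherwise."""
    return "no" if "A100" in gpu_name else "fp16"


def mixed_precision_flag(gpu_name):
    """Format the value as a train_dreambooth.py command-line flag."""
    return f"--mixed_precision={pick_mixed_precision(gpu_name)}"
```

On a machine with an NVIDIA driver installed, `mixed_precision_flag(detect_gpu_name())` would give the flag to pass to the training script. Keeping the decision in a pure function makes it easy to test without a GPU.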

TheLastBen · 2022-10-03T19:41:09Z

Thanks for the feedback

delbalso closed this as completed Oct 1, 2022

TheLastBen reopened this Oct 3, 2022

TheLastBen closed this as completed Oct 3, 2022

geocine mentioned this issue Oct 23, 2022

CUDA Error ShivamShrirao/diffusers#6

Open
