
A100 Support #27

delbalso opened this issue Sep 30, 2022 · 22 comments


What can I do to help get A100 support?



Run:

!pip install git+

After around 40 minutes, once the installation is done, navigate to /usr/local/lib/python3.7/dist-packages/xformers

Save the two files "" and "", upload them to any host, and send me the link; I will integrate them.

The files might not show up in the Colab file explorer, so you will have to rename them:

!cp /usr/local/lib/python3.7/dist-packages/xformers/ /usr/local/lib/python3.7/dist-packages/xformers/

!cp /usr/local/lib/python3.7/dist-packages/xformers/ /usr/local/lib/python3.7/dist-packages/xformers/
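The two file names are missing from the comment above. Assuming they are the compiled `.so` extension modules produced by the xformers build (an assumption, not something the thread states), a minimal Python sketch for listing them in the installed package directory could look like this; `find_compiled_extensions` is a hypothetical helper:

```python
from pathlib import Path
from typing import List


def find_compiled_extensions(package_dir: str) -> List[Path]:
    """Return every compiled extension module (*.so) under package_dir, sorted by path."""
    root = Path(package_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{package_dir} is not a directory")
    # rglob also searches subdirectories; on Linux, CPython extensions end in .so
    return sorted(p for p in root.rglob("*.so") if p.is_file())


if __name__ == "__main__":
    # Install location taken from the thread (Colab, Python 3.7); adjust for your runtime.
    xformers_dir = "/usr/local/lib/python3.7/dist-packages/xformers"
    if Path(xformers_dir).is_dir():
        for path in find_compiled_extensions(xformers_dir):
            print(path)
```

Because the hidden-file problem in the Colab explorer does not affect the filesystem itself, listing the files from code sidesteps the need to guess which ones exist.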


delbalso commented Oct 1, 2022
Kept them as .py files.


delbalso commented Oct 1, 2022

The notebook still doesn't work, though. I get this error:

The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 1
--num_machines was set to a value of 1
--mixed_precision was set to a value of 'no'
--num_cpu_threads_per_process was set to 6 to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
Downloading: 100% 543/543 [00:00<00:00, 565kB/s]
Fetching 16 files: 0% 0/16 [00:00<?, ?it/s]
Downloading: 100% 342/342 [00:00<00:00, 359kB/s]
Fetching 16 files: 6% 1/16 [00:00<00:11, 1.36it/s]
Downloading: 100% 4.56k/4.56k [00:00<00:00, 3.96MB/s]
Fetching 16 files: 19% 3/16 [00:01<00:05, 2.31it/s]
Downloading: 0% 0.00/1.22G [00:00<?, ?B/s]
Downloading: 0% 4.71M/1.22G [00:00<00:25, 47.1MB/s]
Downloading: 100% 1.22G/1.22G [00:16<00:00, 75.7MB/s]
Fetching 16 files: 25% 4/16 [00:17<01:12, 6.01s/it]
Downloading: 100% 209/209 [00:00<00:00, 193kB/s]
Fetching 16 files: 38% 6/16 [00:19<00:30, 3.01s/it]
Downloading: 100% 592/592 [00:00<00:00, 586kB/s]
Fetching 16 files: 44% 7/16 [00:19<00:20, 2.27s/it]
Downloading: 0% 0.00/492M [00:00<?, ?B/s]
Downloading: 1% 4.79M/492M [00:00<00:10, 47.9MB/s]
Downloading: 100% 492M/492M [00:06<00:00, 75.9MB/s]
Fetching 16 files: 50% 8/16 [00:26<00:29, 3.71s/it]
Downloading: 0% 0.00/525k [00:00<?, ?B/s]
Downloading: 100% 525k/525k [00:00<00:00, 3.57MB/s]
Fetching 16 files: 56% 9/16 [00:27<00:19, 2.81s/it]
Downloading: 100% 472/472 [00:00<00:00, 408kB/s]
Fetching 16 files: 62% 10/16 [00:28<00:12, 2.16s/it]
Downloading: 100% 806/806 [00:00<00:00, 793kB/s]
Fetching 16 files: 69% 11/16 [00:28<00:08, 1.70s/it]
Downloading: 0% 0.00/1.06M [00:00<?, ?B/s]
Downloading: 100% 1.06M/1.06M [00:00<00:00, 6.60MB/s]
Fetching 16 files: 75% 12/16 [00:29<00:05, 1.44s/it]
Downloading: 100% 743/743 [00:00<00:00, 641kB/s]
Fetching 16 files: 81% 13/16 [00:30<00:03, 1.20s/it]
Downloading: 0% 0.00/3.44G [00:00<?, ?B/s]
Downloading: 100% 3.44G/3.44G [00:45<00:00, 75.0MB/s]
Fetching 16 files: 88% 14/16 [01:16<00:29, 14.77s/it]
Downloading: 100% 522/522 [00:00<00:00, 454kB/s]
Fetching 16 files: 94% 15/16 [01:17<00:10, 10.53s/it]
Downloading: 0% 0.00/335M [00:00<?, ?B/s]
Downloading: 1% 4.71M/335M [00:00<00:07, 47.1MB/s]
Downloading: 100% 335M/335M [00:04<00:00, 76.1MB/s]
Fetching 16 files: 100% 16/16 [01:21<00:00, 5.12s/it]
Generating class images: 100% 3/3 [00:28<00:00, 9.61s/it]
Downloading: 100% 1.06M/1.06M [00:00<00:00, 6.61MB/s]
Downloading: 100% 492M/492M [00:06<00:00, 75.7MB/s]
Steps: 0% 0/800 [00:00<?, ?it/s]Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/", line 43, in main
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/", line 837, in launch_command
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=/content/drive/MyDrive/AI/DreamBooth/training_data/mike_pics_training_data', '--class_data_dir=/content/data/guy', '--output_dir=/content/drive/MyDrive/stable_diffusion_weights/sks', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=photo of sks guy', '--class_prompt=photo of a guy', '--seed=1337', '--resolution=512', '--center_crop', '--train_batch_size=1', '--mixed_precision=fp16', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=12', '--sample_batch_size=4', '--max_train_steps=800']' died with <Signals.SIGABRT: 6>.
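The `died with <Signals.SIGABRT: 6>` part of this message means the training process was killed by signal 6 (SIGABRT) instead of exiting with a status code: on POSIX systems, Python's `subprocess` reports a child terminated by signal N as return code -N. A small sketch of that mapping (the helper name is hypothetical, and it does not explain *why* the process aborted):

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into a readable description.

    On POSIX systems, subprocess reports a child killed by signal N
    as return code -N.
    """
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


print(describe_returncode(-6))  # killed by SIGABRT (Linux)
print(describe_returncode(1))   # exited with status 1
```

By contrast, the later failure in this thread ends with `returned non-zero exit status 1`, which is an ordinary exit after a Python exception rather than a signal.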


Thank you very much for the files. Have you accepted the terms in ?


It looks like you skipped the cell that downloads the model.


I got an A100 at first too, but found that the cost added up too fast, so I used Runtime -> Factory reset runtime to get a random GPU until I got a usable one.
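Before starting a long run on a freshly assigned runtime, one way to see which GPU you got is to query `nvidia-smi`. A minimal sketch, assuming an NVIDIA driver is present and that an A100 reports a name containing "A100"; the helper names are hypothetical:

```python
import subprocess
from typing import List


def parse_gpu_names(nvidia_smi_output: str) -> List[str]:
    """Split `nvidia-smi --query-gpu=name --format=csv,noheader` output into GPU names."""
    return [line.strip() for line in nvidia_smi_output.splitlines() if line.strip()]


def is_a100(gpu_names: List[str]) -> bool:
    """True if any attached GPU reports a name containing 'A100' (assumed naming)."""
    return any("A100" in name for name in gpu_names)


def query_gpu_names() -> List[str]:
    """Ask nvidia-smi for the names of the attached GPUs (needs an NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_names(out)
```

Keeping the parsing separate from the `nvidia-smi` call makes the check easy to test on machines without a GPU.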


delbalso commented Oct 1, 2022

> Thank you very much for the files. Have you accepted the terms in ?
>
> It looks like you skipped the cell that downloads the model.

Why do you think that? In any case, I just downloaded it again.

I noticed that I copied the precompiled files wrong, but have now fixed them.

BTW, the %%capture cell confused me because it hid the output, so I didn't see an error.
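Because `%%capture` hides a cell's output, a failing training command can look as if it simply did nothing. One alternative, sketched here as a hypothetical helper rather than the notebook's own code, is to capture the output yourself and print only the tail of stderr when the command fails:

```python
import subprocess
import sys
from typing import List


def run_and_report(cmd: List[str], tail_lines: int = 20) -> int:
    """Run cmd with its output captured; on failure, print the last lines of stderr.

    Like %%capture, this hides output on success, but it still surfaces errors.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode != 0:
        tail = proc.stderr.splitlines()[-tail_lines:]
        print(f"Command failed with return code {proc.returncode}:", file=sys.stderr)
        print("\n".join(tail), file=sys.stderr)
    return proc.returncode
```

Wrapping the `accelerate launch` command this way keeps the cell quiet on success but shows the traceback on failure. In a notebook, you can also name the capture (`%%capture cap`) and call `cap.show()` in a later cell to display what was hidden.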

Here's an update to the error I'm getting:

/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/ UserWarning: /usr/lib64-nvidia did not contain as expected! Searching further paths...
f'{candidate_env_vars["LD_LIBRARY_PATH"]} did not contain '
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/ UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('6000,"kernelManagerProxyHost"'), PosixPath('"","jupyterArgs"'), PosixPath('"/usr/local/bin/dap_multiplexer","enableLsp"'), PosixPath('["--ip="],"debugAdapterMultiplexerPath"'), PosixPath('true}'), PosixPath('{"kernelManagerProxyPort"')}
"WARNING: The following directories listed in your path were found to "
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/ UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//ipykernel.pylab.backend_inline'), PosixPath('module')}
"WARNING: The following directories listed in your path were found to "
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/ UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
"WARNING: The following directories listed in your path were found to "
CUDA_SETUP: WARNING! not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 111
CUDA SETUP: Loading binary /usr/local/lib/python3.7/dist-packages/bitsandbytes/
Steps: 0% 2/2000 [00:05<1:14:31, 2.24s/it, loss=0.42, lr=5e-6] Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/", line 606, in
File "/content/diffusers/examples/dreambooth/", line 550, in main
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/accelerate/utils/", line 507, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/usr/local/lib/python3.7/dist-packages/torch/amp/", line 12, in decorate_autocast
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/", line 262, in forward
sample = self.mid_block(sample, emb, encoder_hidden_states=encoder_hidden_states)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/", line 375, in forward
hidden_states = attn(hidden_states, encoder_hidden_states)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/", line 167, in forward
hidden_states = block(hidden_states, context=context)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/", line 219, in forward
hidden_states = self.ff(self.norm3(hidden_states)) + hidden_states
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/", line 451, in forward
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/", line 139, in forward
input = module(input)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)
Steps: 0% 2/2000 [00:05<1:36:41, 2.90s/it, loss=0.42, lr=5e-6]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/", line 43, in main
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/", line 837, in launch_command
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/', '--pretrained_model_name_or_path=/content/gdrive/MyDrive/stable-diffusion-v1-4', '--instance_data_dir=/content/data/mikemdb', '--output_dir=/content/models/mikemdb', '--instance_prompt=photo of mikemdb man', '--seed=12345', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=2000']' returned non-zero exit status 1.

Any ideas?


If you're using an A100: I haven't integrated the files into the Colab yet, but I'll do it shortly.


delbalso commented Oct 1, 2022

Yes, I understand; I just placed the files in the right location manually.

FYI, I think I just got it working by removing these two flags:

--use_8bit_adam \

--mixed_precision="fp16" \


delbalso commented Oct 1, 2022

How long does training take on other GPUs? On an A100 in Colab, 2000 steps at 512 resolution seem to take about 30 minutes.
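For reference, the figure reported here works out to just under one second per step. A tiny helper for converting between total run time and per-step speed (names are illustrative):

```python
def seconds_per_step(total_minutes: float, steps: int) -> float:
    """Average wall-clock seconds per training step for a finished run."""
    return total_minutes * 60 / steps


def projected_minutes(steps: int, sec_per_step: float) -> float:
    """Projected wall-clock minutes for `steps` steps at a measured speed."""
    return steps * sec_per_step / 60


print(seconds_per_step(30, 2000))  # 0.9
```

The 2.24 s/it shown at step 2 of the failing fp16 run earlier would project to roughly 75 minutes (`projected_minutes(2000, 2.24)`), but rates measured in the first few steps may include warm-up overhead and are not a reliable comparison.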


It's because you removed --use_8bit_adam and --mixed_precision="fp16".
Make sure those flags are really the cause of the error you're getting.


Try leaving --mixed_precision="fp16" in place.


delbalso commented Oct 1, 2022

I'm saying it only started working when I removed --mixed_precision="fp16" \


delbalso commented Oct 1, 2022

Should I set train_batch_size to the number of training instances I have?


TheLastBen commented Oct 1, 2022

That is the number of models it trains on the same instance; it's best to keep it at 1 to save time.
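For comparison, in the diffusers DreamBooth example script that these notebooks launch, `train_batch_size` is used as the per-device number of instance images per batch, and the script logs a total batch size of `train_batch_size * num_processes * gradient_accumulation_steps`; with `--with_prior_preservation`, each batch also carries the same number of class images. A sketch of that arithmetic, with a hypothetical function name (verify against the script version you actually run):

```python
def images_per_optimizer_step(train_batch_size: int,
                              gradient_accumulation_steps: int = 1,
                              num_processes: int = 1,
                              with_prior_preservation: bool = False) -> int:
    """Images processed per optimizer update, as we understand the DreamBooth example."""
    total = train_batch_size * gradient_accumulation_steps * num_processes
    # With prior preservation, class images are concatenated onto each instance batch.
    return total * 2 if with_prior_preservation else total
```

Under this reading, a larger `train_batch_size` means more images per step (and more GPU memory), not more models.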


delbalso commented Oct 1, 2022

Thank you

delbalso closed this as completed Oct 1, 2022

ackl commented Oct 3, 2022

I'm not sure this issue should have been closed without some changes to the notebooks. I ran into the exact same issue today: I got an A100, and during training it threw the same CUBLAS_STATUS_EXECUTION_FAILED error right as it reached step 2.

I also resolved it by removing --mixed_precision="fp16" \


@ackl I'll make sure A100 users won't face that issue in the future.


TheLastBen commented Oct 3, 2022

@ackl Try setting it to "no" instead of removing it: --mixed_precision="no" \
If it works, that change would be easier for me to implement. Looking forward to your feedback.

TheLastBen reopened this Oct 3, 2022

I have fixed the precision issue for A100s and am waiting for your confirmation before closing the issue. Make sure you use the updated Colab notebook.


ackl commented Oct 3, 2022

I can confirm it works with the latest commit that uses --mixed_precision="no" when GPU == A100. Thanks for the quick update!
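The fix confirmed here switches `--mixed_precision` to "no" when the GPU is an A100. A minimal sketch of that selection logic, assuming the GPU name has already been read (for example from `nvidia-smi`) and using hypothetical helper names and a placeholder script path; this is not the notebook's actual code:

```python
from typing import List


def mixed_precision_for(gpu_name: str) -> str:
    """Pick the --mixed_precision value: 'no' on A100 (the workaround in this thread), else 'fp16'."""
    return "no" if "A100" in gpu_name else "fp16"


def build_launch_cmd(script: str, gpu_name: str, extra_args: List[str]) -> List[str]:
    """Assemble an `accelerate launch` command line; flags after the script go to the script."""
    return [
        "accelerate", "launch", script,
        f"--mixed_precision={mixed_precision_for(gpu_name)}",
        *extra_args,
    ]


# Hypothetical usage: the script path and extra flags are placeholders.
print(build_launch_cmd("path/to/train_script.py", "A100-SXM4-40GB", ["--seed=12345"]))
```

Matching only on "A100" keeps other GPUs on fp16, since the thread reports the CUBLAS failure only on A100s; the flag is placed after the script path, as in the commands shown in the logs above, so it is passed to the training script rather than to `accelerate launch` itself.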


Thanks for the feedback


4 participants