A100 Support #27

Closed
delbalso opened this issue Sep 30, 2022 · 22 comments

Comments

@delbalso

What can I do to help get A100 support?

@TheLastBen
Owner

Thanks. Run:

!pip install git+https://github.com/facebookresearch/xformers@51dd119#egg=xformers

After about 40 minutes, once the installation is done, navigate to /usr/local/lib/python3.7/dist-packages/xformers

Save the two files, "_C_flashattention.so" and "_C.so", upload them to any host, and send me the link. I will integrate them.

The files might not show up in the Colab file explorer, so you will have to copy them under a different name:

!cp /usr/local/lib/python3.7/dist-packages/xformers/_C.so /usr/local/lib/python3.7/dist-packages/xformers/C.py

!cp /usr/local/lib/python3.7/dist-packages/xformers/_C_flashattention.so /usr/local/lib/python3.7/dist-packages/xformers/C_flashattention.py
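The same copy step can be sketched in Python for anyone who prefers not to type the paths by hand. This is only an illustration: the helper name `export_compiled_extensions` and the destination directory are hypothetical, and it assumes the build produced files matching `_C*.so` in the package directory, as described above.

```python
import shutil
from pathlib import Path


def export_compiled_extensions(package_dir, dest_dir, suffix=".py"):
    """Copy every compiled `_C*.so` file to dest_dir under a visible name."""
    package_dir, dest_dir = Path(package_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for so_file in sorted(package_dir.glob("_C*.so")):
        # "_C_flashattention.so" -> "C_flashattention.py", like the cp commands
        target = dest_dir / (so_file.stem.lstrip("_") + suffix)
        shutil.copyfile(so_file, target)
        copied.append(target)
    return copied
```

For example, `export_compiled_extensions("/usr/local/lib/python3.7/dist-packages/xformers", "/content/export")` would place the renamed copies in `/content/export` (an assumed location). Whoever receives the files has to rename them back to `.so`.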

@delbalso
Author

delbalso commented Oct 1, 2022

https://file.io/UkvT0KEU31MY
I kept them as .py files.

@delbalso
Author

delbalso commented Oct 1, 2022

The notebook still doesn't work though. I get this error.

The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 1
--num_machines was set to a value of 1
--mixed_precision was set to a value of 'no'
--num_cpu_threads_per_process was set to 6 to improve out-of-box performance
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
Downloading: 100% 543/543 [00:00<00:00, 565kB/s]
Fetching 16 files: 0% 0/16 [00:00<?, ?it/s]
Downloading: 100% 342/342 [00:00<00:00, 359kB/s]
Fetching 16 files: 6% 1/16 [00:00<00:11, 1.36it/s]
Downloading: 100% 4.56k/4.56k [00:00<00:00, 3.96MB/s]
Fetching 16 files: 19% 3/16 [00:01<00:05, 2.31it/s]
Downloading: 0% 0.00/1.22G [00:00<?, ?B/s]
Downloading: 0% 4.71M/1.22G [00:00<00:25, 47.1MB/s]
...
Downloading: 100% 1.22G/1.22G [00:16<00:00, 75.7MB/s]
Fetching 16 files: 25% 4/16 [00:17<01:12, 6.01s/it]
Downloading: 100% 209/209 [00:00<00:00, 193kB/s]
Fetching 16 files: 38% 6/16 [00:19<00:30, 3.01s/it]
Downloading: 100% 592/592 [00:00<00:00, 586kB/s]
Fetching 16 files: 44% 7/16 [00:19<00:20, 2.27s/it]
Downloading: 0% 0.00/492M [00:00<?, ?B/s]
Downloading: 1% 4.79M/492M [00:00<00:10, 47.9MB/s]
...
Downloading: 100% 492M/492M [00:06<00:00, 75.9MB/s]
Fetching 16 files: 50% 8/16 [00:26<00:29, 3.71s/it]
Downloading: 0% 0.00/525k [00:00<?, ?B/s]
Downloading: 100% 525k/525k [00:00<00:00, 3.57MB/s]
Fetching 16 files: 56% 9/16 [00:27<00:19, 2.81s/it]
Downloading: 100% 472/472 [00:00<00:00, 408kB/s]
Fetching 16 files: 62% 10/16 [00:28<00:12, 2.16s/it]
Downloading: 100% 806/806 [00:00<00:00, 793kB/s]
Fetching 16 files: 69% 11/16 [00:28<00:08, 1.70s/it]
Downloading: 0% 0.00/1.06M [00:00<?, ?B/s]
Downloading: 100% 1.06M/1.06M [00:00<00:00, 6.60MB/s]
Fetching 16 files: 75% 12/16 [00:29<00:05, 1.44s/it]
Downloading: 100% 743/743 [00:00<00:00, 641kB/s]
Fetching 16 files: 81% 13/16 [00:30<00:03, 1.20s/it]
Downloading: 0% 0.00/3.44G [00:00<?, ?B/s]
...
Downloading: 100% 3.44G/3.44G [00:45<00:00, 75.0MB/s]
Fetching 16 files: 88% 14/16 [01:16<00:29, 14.77s/it]
Downloading: 100% 522/522 [00:00<00:00, 454kB/s]
Fetching 16 files: 94% 15/16 [01:17<00:10, 10.53s/it]
Downloading: 0% 0.00/335M [00:00<?, ?B/s]
Downloading: 1% 4.71M/335M [00:00<00:07, 47.1MB/s]
...
Downloading: 100% 335M/335M [00:04<00:00, 76.1MB/s]
Fetching 16 files: 100% 16/16 [01:21<00:00, 5.12s/it]
Generating class images: 100% 3/3 [00:28<00:00, 9.61s/it]
Downloading: 100% 1.06M/1.06M [00:00<00:00, 6.61MB/s]
...
Downloading: 100% 492M/492M [00:06<00:00, 75.7MB/s]
Steps: 0% 0/800 [00:00<?, ?it/s]Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--use_auth_token', '--instance_data_dir=/content/drive/MyDrive/AI/DreamBooth/training_data/mike_pics_training_data', '--class_data_dir=/content/data/guy', '--output_dir=/content/drive/MyDrive/stable_diffusion_weights/sks', '--with_prior_preservation', '--prior_loss_weight=1.0', '--instance_prompt=photo of sks guy', '--class_prompt=photo of a guy', '--seed=1337', '--resolution=512', '--center_crop', '--train_batch_size=1', '--mixed_precision=fp16', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--num_class_images=12', '--sample_batch_size=4', '--max_train_steps=800']' died with <Signals.SIGABRT: 6>.
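A note on reading the last line of this log: "died with <Signals.SIGABRT: 6>" means the training process was killed by a signal (typically an abort in native code), not that Python raised an ordinary exception. On POSIX systems `subprocess` reports this as a negative return code. The helper below is a small sketch of that convention; the function name is made up for illustration.

```python
import signal


def describe_returncode(returncode):
    """Explain a subprocess return code in words."""
    if returncode < 0:
        # On POSIX, a return code of -N means the child was killed by signal N.
        return f"died with signal {signal.Signals(-returncode).name}"
    if returncode == 0:
        return "exited successfully"
    return f"returned non-zero exit status {returncode}"
```

A positive value such as 1 instead indicates that the script itself exited with an error, for example after an uncaught Python exception.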

@TheLastBen
Owner

Thank you very much for the files, have you accepted the terms in https://huggingface.co/CompVis/stable-diffusion-v1-4 ?

@TheLastBen
Owner

It looks like you missed the cell that downloads the model.

@liangwei191

I got an A100 at first too, but found that the cost drained too fast. So I use the menu Runtime -> Factory reset runtime to get a random GPU, and repeat until I get a usable one.
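To see which GPU a fresh runtime was assigned without opening a terminal, something like the following can be run in a cell. It is a sketch that assumes `nvidia-smi` is on the PATH; the helper names are hypothetical, and it returns an empty list when no GPU tooling is found.

```python
import shutil
import subprocess


def parse_gpu_names(smi_output):
    """Turn the CSV output of nvidia-smi into a list of GPU names."""
    return [line.strip() for line in smi_output.splitlines() if line.strip()]


def assigned_gpus():
    """Return the names of the visible GPUs, or [] if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_names(result.stdout) if result.returncode == 0 else []
```

Calling `assigned_gpus()` right after a runtime reset shows whether another reset is needed.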

@delbalso
Author

delbalso commented Oct 1, 2022

> Thank you very much for the files, have you accepted the terms in https://huggingface.co/CompVis/stable-diffusion-v1-4 ?

Yep

> it looks like you missed the cell downloading the model

Why do you think that? In any case, I just downloaded it again.

I noticed that I copied the precompiled files wrong, but have now fixed them.

BTW the %%capture thing confused me because I didn't see an error.
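For context, `%%capture` is an IPython cell magic that swallows the cell's output, so a failed `!cp` or `!pip` inside such a cell leaves nothing visible. One way to keep a setup cell quiet and still notice failures is to run the commands through a small wrapper like this sketch (the name `run_quiet` is hypothetical):

```python
import subprocess


def run_quiet(cmd):
    """Run a command without echoing its output, but fail loudly on error."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"{cmd!r} failed with exit status {result.returncode}:\n{result.stderr}"
        )
    return result.stdout
```

With this, a call such as `run_quiet(["cp", src, dst])` stays silent on success and raises with the command's stderr if the copy fails.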

Here's an update to the error I'm getting:

/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:99: UserWarning: /usr/lib64-nvidia did not contain libcudart.so as expected! Searching further paths...
f'{candidate_env_vars["LD_LIBRARY_PATH"]} did not contain '
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('6000,"kernelManagerProxyHost"'), PosixPath('"172.28.0.3","jupyterArgs"'), PosixPath('"/usr/local/bin/dap_multiplexer","enableLsp"'), PosixPath('["--ip=172.28.0.2"],"debugAdapterMultiplexerPath"'), PosixPath('true}'), PosixPath('{"kernelManagerProxyPort"')}
"WARNING: The following directories listed in your path were found to "
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//ipykernel.pylab.backend_inline'), PosixPath('module')}
"WARNING: The following directories listed in your path were found to "
/usr/local/lib/python3.7/dist-packages/bitsandbytes/cuda_setup/paths.py:21: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/env/python')}
"WARNING: The following directories listed in your path were found to "
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching /usr/local/cuda/lib64...
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 111
CUDA SETUP: Loading binary /usr/local/lib/python3.7/dist-packages/bitsandbytes/libbitsandbytes_cuda111.so...
Steps: 0% 2/2000 [00:05<1:14:31, 2.24s/it, loss=0.42, lr=5e-6] Traceback (most recent call last):
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 606, in <module>
main()
File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 550, in main
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/accelerate/utils/operations.py", line 507, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/usr/local/lib/python3.7/dist-packages/torch/amp/autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_2d_condition.py", line 262, in forward
sample = self.mid_block(sample, emb, encoder_hidden_states=encoder_hidden_states)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/unet_blocks.py", line 375, in forward
hidden_states = attn(hidden_states, encoder_hidden_states)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 167, in forward
hidden_states = block(hidden_states, context=context)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 219, in forward
hidden_states = self.ff(self.norm3(hidden_states)) + hidden_states
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/diffusers/models/attention.py", line 451, in forward
return self.net(hidden_states)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)
Steps: 0% 2/2000 [00:05<1:36:41, 2.90s/it, loss=0.42, lr=5e-6]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
args.func(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--pretrained_model_name_or_path=/content/gdrive/MyDrive/stable-diffusion-v1-4', '--instance_data_dir=/content/data/mikemdb', '--output_dir=/content/models/mikemdb', '--instance_prompt=photo of mikemdb man', '--seed=12345', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--use_8bit_adam', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=2000']' returned non-zero exit status 1.

Any ideas?

@TheLastBen
Owner

If you're using the A100, I haven't integrated the files into the Colab yet. I'll do it shortly.

@delbalso
Author

delbalso commented Oct 1, 2022

Yes I understand, I just placed the files in the right place manually.

FYI, I think I just got it working by removing

--use_8bit_adam \

and

--mixed_precision="fp16" \
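If the launch arguments are kept in a Python list, the two flags can be filtered out programmatically instead of editing the cell by hand. This is a minimal sketch with a hypothetical helper, using flag values taken from the command shown in the traceback:

```python
def drop_flags(args, names):
    """Return args without any flag whose name (the part before '=') is in names."""
    return [arg for arg in args if arg.split("=", 1)[0] not in names]


train_args = [
    "--resolution=512",
    "--mixed_precision=fp16",
    "--train_batch_size=1",
    "--use_8bit_adam",
    "--learning_rate=5e-6",
]
print(drop_flags(train_args, {"--use_8bit_adam", "--mixed_precision"}))
# ['--resolution=512', '--train_batch_size=1', '--learning_rate=5e-6']
```

Dropping one flag at a time makes it easier to tell which of the two actually triggers the error.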

@delbalso
Author

delbalso commented Oct 1, 2022

How long does training take on other GPUs? It looks like 2000 steps on 512 resolution on an A100 on colab takes 30 mins
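As a rough way to compare runs across GPUs, the figure above can be converted to seconds per step. The helper is purely illustrative arithmetic:

```python
def seconds_per_step(total_minutes, steps):
    """Average wall-clock seconds spent on each training step."""
    return total_minutes * 60 / steps


print(seconds_per_step(30, 2000))  # 0.9
```

That is about 0.9 s per step for this run, with both `--use_8bit_adam` and `--mixed_precision="fp16"` removed.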

@TheLastBen
Owner

That's because you removed --use_8bit_adam \ and --mixed_precision="fp16".
Make sure they really are the cause of the error you're getting.

@TheLastBen
Owner

Try leaving --mixed_precision="fp16" \ in.

@delbalso
Author

delbalso commented Oct 1, 2022

I'm saying it only started working when I removed --mixed_precision="fp16" \

@delbalso
Author

delbalso commented Oct 1, 2022

Should I set train_batch_size to the number of training instances I have?

@TheLastBen
Owner

TheLastBen commented Oct 1, 2022

That is the batch size, i.e. the number of images processed per training step, not the number of training instances you have. Best to keep it at one.
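For reference, in the diffusers DreamBooth example script `--train_batch_size` is the per-device batch size for the training dataloader, so it is independent of how many training images there are. The sketch below shows how the two numbers relate; the function name is hypothetical, and the image count in the usage example is made up.

```python
import math


def steps_per_epoch(num_images, train_batch_size, gradient_accumulation_steps=1):
    """Optimizer steps needed to see every training image once."""
    batches = math.ceil(num_images / train_batch_size)
    return math.ceil(batches / gradient_accumulation_steps)
```

With 20 images, `steps_per_epoch(20, 1)` is 20 and `steps_per_epoch(20, 4)` is 5. A larger batch means fewer steps per pass over the data, but more memory per step.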

@delbalso
Author

delbalso commented Oct 1, 2022

Thank you

@delbalso delbalso closed this as completed Oct 1, 2022
@ackl

ackl commented Oct 3, 2022

I'm not sure this issue should have been closed without making some changes to the notebooks. I ran into the exact same issue today: I got an A100, and during training it threw the same CUBLAS_STATUS_EXECUTION_FAILED error right as it got to step 2.

I also resolved it by removing --mixed_precision="fp16" \

@TheLastBen
Owner

@ackl I'll make sure A100 users won't face that issue in the future

@TheLastBen
Owner

TheLastBen commented Oct 3, 2022

@ackl Try setting it to "no", i.e. --mixed_precision="no" \, instead of removing it.
If that works, the change will be easier for me to implement. Looking forward to your feedback.

@TheLastBen TheLastBen reopened this Oct 3, 2022
@TheLastBen
Owner

I have fixed the precision issue for A100s and am waiting for your confirmation to close the issue. Make sure you use the updated Colab notebook.

@ackl

ackl commented Oct 3, 2022

I can confirm it works with the latest commit that uses --mixed_precision="no" when GPU == A100. Thanks for the quick update!
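The behavior confirmed here (no mixed precision on an A100, fp16 otherwise) can be sketched as follows. This is not the notebook's actual code, just an illustration of the rule; the function names are hypothetical and the GPU name strings in the usage note are assumed examples of what the driver might report.

```python
def choose_mixed_precision(gpu_name):
    """Pick the --mixed_precision value: "no" on an A100, "fp16" elsewhere."""
    return "no" if "A100" in gpu_name.upper() else "fp16"


def mixed_precision_flag(gpu_name):
    """Build the command-line flag for the given GPU name."""
    return f"--mixed_precision={choose_mixed_precision(gpu_name)}"
```

For instance, `mixed_precision_flag("NVIDIA A100-SXM4-40GB")` gives `--mixed_precision=no`, while `mixed_precision_flag("Tesla T4")` gives `--mixed_precision=fp16`.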

@TheLastBen
Owner

Thanks for the feedback
