A100 Support #27
Thanks, run :
After around 40 minutes, once the installation is done, navigate to /usr/local/lib/python3.7/dist-packages/xformers and save the two files "_C_flashattention.so" and "_C.so". Upload them to any host and send me the link, and I will integrate them. The files might not show up in the Colab file explorer, in which case you will have to rename them.
|
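The copy-and-rename step described above can be sketched in Python. This is only an illustration, not the maintainer's method: the directory and file names come from the comment, while the helper name `export_compiled_files`, the `.bin` suffix and the destination folder are hypothetical choices.

```python
import shutil
from pathlib import Path

# Directory and file names as given in the comment above.
XFORMERS_DIR = Path("/usr/local/lib/python3.7/dist-packages/xformers")
COMPILED_FILES = ("_C_flashattention.so", "_C.so")


def export_compiled_files(src_dir, dest_dir, suffix=".bin"):
    """Copy the compiled xformers binaries into dest_dir under a new name.

    `suffix` is a hypothetical choice: it is appended so the copies have a
    different extension, in case the file explorer does not list .so files.
    """
    src_dir, dest_dir = Path(src_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in COMPILED_FILES:
        src = src_dir / name
        if not src.is_file():
            raise FileNotFoundError(f"{src} not found; did the build finish?")
        dest = dest_dir / (name + suffix)
        shutil.copy2(src, dest)  # copy contents and metadata
        copied.append(dest)
    return copied
```

In a Colab cell this could be called as `export_compiled_files(XFORMERS_DIR, "/content/xformers_build")`, where the destination folder is again an assumption. The suffix has to be removed again before the files are put back in place.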
https://file.io/UkvT0KEU31MY |
The notebook still doesn't work though. I get this error.
|
Thank you very much for the files. Have you accepted the terms at https://huggingface.co/CompVis/stable-diffusion-v1-4 ? |
it looks like you missed the cell downloading the model |
I got an A100 at first too, but then found that the cost drained too fast. So I use the menu runtime->reset factory runtime to get a random GPU until I land on a usable one. |
Yep
Why do you think that? In any case, I just downloaded it again. I noticed that I copied the precompiled files wrong, but have now fixed them. BTW the Here's an update to the error I'm getting:
Any ideas? |
If you're using the A100, I haven't integrated them into the Colab yet. I'll do it shortly |
Yes, I understand. I just placed the files in the right place manually. FYI, I think I just got it working by killing
and
|
How long does training take on other GPUs? It looks like 2000 steps on 512 resolution on an A100 on colab takes 30 mins |
it's because you removed the --use_8bit_adam \ and --mixed_precision="fp16" |
try leaving the --mixed_precision="fp16" \ |
I'm saying it only started working when I removed |
Should I set |
That is the number of models it trains on the same instance, best to keep it to one to save time |
Thank you |
I'm not sure this issue should have been closed without making some changes in the notebooks. I ran into the exact same issue today: I got an A100, and during training it would throw the same I also resolved it by removing |
@ackl I'll make sure A100 users won't face that issue in the future |
@ackl try and set it to "no" : --mixed_precision="no" \ instead of removing it |
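The workaround discussed in this thread (keep `--mixed_precision="fp16"` on other GPUs, use `--mixed_precision="no"` on an A100) could be automated roughly as follows. This is a minimal sketch, not the change that was actually committed to the notebook. The helper names `detect_gpu_name` and `pick_mixed_precision` are hypothetical, and the rule of matching "A100" in the name reported by `nvidia-smi` is an assumption.

```python
import subprocess


def detect_gpu_name():
    """Return the first GPU's name as reported by nvidia-smi, or None."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        # nvidia-smi is missing or failed: no usable GPU information.
        return None
    lines = [line.strip() for line in out.splitlines() if line.strip()]
    return lines[0] if lines else None


def pick_mixed_precision(gpu_name):
    """Follow the thread's workaround: "no" on an A100, "fp16" otherwise."""
    if gpu_name and "A100" in gpu_name.upper():
        return "no"
    return "fp16"
```

A notebook cell could then build the flag with `f'--mixed_precision="{pick_mixed_precision(detect_gpu_name())}"'` and pass it to the training command in place of the hard-coded value.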
I have fixed the precision issue for A100s, waiting for your confirmation to close the issue. Make sure you use the updated Colab Notebook |
I can confirm it works with the latest commit that uses |
Thanks for the feedback |
What can I do to help get A100 support?