(memory?) issue with stable_diffusion_v2_1_webui_colab when mounting Google Drive #21

Closed
system1system2 opened this issue Dec 28, 2022 · 38 comments

Comments

@system1system2

system1system2 commented Dec 28, 2022

Hi. All your colab notebooks are amazing. Thanks for sharing them with the community.

I have a problem with one of them: stable_diffusion_v2_1_webui_colab

If I create a new cell to mount my Google Drive and run it before your cell that initializes SD2.1, the initialization is interrupted halfway and I get this output:

Python 3.8.16 (default, Dec 7 2022, 01:12:13)
[GCC 7.5.0]
Commit hash: 4af3ca5393151d61363c30eef4965e694eeac15e
Installing gfpgan
Installing clip
Installing open_clip
Cloning Stable Diffusion into repositories/stable-diffusion-stability-ai...
Cloning Taming Transformers into repositories/taming-transformers...
Cloning K-diffusion into repositories/k-diffusion...
Cloning CodeFormer into repositories/CodeFormer...
Cloning BLIP into repositories/BLIP...
Installing requirements for CodeFormer
Installing requirements for Web UI
Launching Web UI with arguments: --share --force-enable-xformers
No module 'xformers'. Proceeding without it.
Cannot import xformers
Traceback (most recent call last):
  File "/content/stable-diffusion-webui/modules/sd_hijack_optimizations.py", line 18, in <module>
    import xformers.ops
ModuleNotFoundError: No module named 'xformers.ops'; 'xformers' is not a package

Loading config from: /content/stable-diffusion-webui/models/Stable-diffusion/v2-1_768-ema-pruned.yaml
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
Downloading: 100% 3.94G/3.94G [00:57<00:00, 69.0MB/s]
^C

I read that the ^C interrupt might indicate the system has run out of memory.

If I do not run the cell that mounts Google Drive, everything works fine.

Also, and this is where it's strange, if I run another of your Colab notebooks, like analog_diffusion_webui_colab, by running the cell that mounts Google Drive first, everything works fine, too.

@camenduru
Owner

--force-enable-xformers is obsolete; please use --xformers

@system1system2
Author

system1system2 commented Dec 29, 2022

Thanks for the quick reply, @camenduru

The issue is not related to the xformers module (even if it appeared in my quoted output).
Even before the flag --force-enable-xformers was rendered obsolete, I had this issue.
And now, after updating the notebook with the new flag --xformers, I still have it:

Python 3.8.16 (default, Dec 7 2022, 01:12:13)
[GCC 7.5.0]
Commit hash: 4af3ca5393151d61363c30eef4965e694eeac15e
Installing gfpgan
Installing clip
Installing open_clip
Cloning Stable Diffusion into repositories/stable-diffusion-stability-ai...
Cloning Taming Transformers into repositories/taming-transformers...
Cloning K-diffusion into repositories/k-diffusion...
Cloning CodeFormer into repositories/CodeFormer...
Cloning BLIP into repositories/BLIP...
Installing requirements for CodeFormer
Installing requirements for Web UI
Launching Web UI with arguments: --share --xformers
Loading config from: /content/stable-diffusion-webui/models/Stable-diffusion/v2-1_768-ema-pruned.yaml
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
Downloading: 100% 3.94G/3.94G [00:56<00:00, 70.3MB/s]
^C

This issue exclusively happens with this particular notebook (again, I don't have it with, for example, the Analog Diffusion module), and only if I try to mount my Google Drive before running the A1111 installation cell.

@camenduru
Owner

This one is working: https://github.com/camenduru/stable-diffusion-webui-colab/blob/main/stable_diffusion_v2_1_webui_colab.ipynb
Please compare your code with stable_diffusion_v2_1_webui_colab.ipynb

@camenduru camenduru reopened this Dec 29, 2022
@camenduru
Owner

Wait, it is not working with gdrive. Interesting 😋

@system1system2
Author

system1system2 commented Dec 29, 2022

I change nothing from your code. All I do is:

  • click on the colab button from your GitHub page. It opens it.
  • ask to copy it to my drive. It does it.
  • double-click on my copy from the Notebook folder in my GDrive. It opens it.
  • ask to mount my GDrive. It creates a new cell under your original cell.
  • move the cell up. It does it.
  • run my cell to mount my GDrive. It executes it correctly in 17s.
  • run your cell to install and launch A1111. Then I get the error.

I do exactly these steps with your colab for Analog Diffusion and it works flawlessly, as expected.

I have no clue why there's this difference in behaviour.

@camenduru
Owner

stable_diffusion_v2_1_webui_colab

with gdrive (crashed): [Screenshot 2022-12-29 144347]
without gdrive: [Screenshot 2022-12-29 144950]

stable_diffusion_1_5_webui_colab with gdrive: [Screenshot 2022-12-29 151319]

stable_diffusion_v2_1 is using too much system RAM.
Without gdrive it works.
With gdrive mounted, stable_diffusion_v2_1 does not fit in system RAM 😨

@camenduru
Owner

If we convert the fp32 v2-1_768-ema-pruned.ckpt (5.21 GB) to fp16 (about 5.21/2 GB), it will probably fit.
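
A minimal sketch of what such a conversion could look like, not the script actually used for the file posted later in this thread. It assumes the checkpoint is a pickled dict with a "state_dict" entry and that PyTorch is installed; the function names are made up for illustration.

```python
def estimate_fp16_size_gb(fp32_size_gb: float) -> float:
    """Rough size after casting fp32 weights (4 bytes) to fp16 (2 bytes)."""
    return fp32_size_gb / 2


def convert_checkpoint_to_fp16(src_path: str, dst_path: str) -> None:
    """Cast every floating-point tensor in a .ckpt state dict to half precision.

    Assumption: the file holds a dict with a "state_dict" key; if it does not,
    the loaded object itself is treated as the state dict.
    """
    import torch  # lazy import so the size estimate works without PyTorch

    # Recent PyTorch releases may need weights_only=False for old pickled checkpoints.
    checkpoint = torch.load(src_path, map_location="cpu")
    state_dict = checkpoint.get("state_dict", checkpoint)
    half_state = {
        key: value.half()
        if torch.is_tensor(value) and value.is_floating_point()
        else value
        for key, value in state_dict.items()
    }
    torch.save({"state_dict": half_state}, dst_path)


# Back-of-the-envelope check of the "5.21/2 GB" figure above.
print(f"{estimate_fp16_size_gb(5.21):.1f} GB")
```

Halving the file size only helps if the loader does not upcast the weights back to fp32 while reading them, so the on-disk size is a lower bound on RAM use, not a guarantee.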

@MitPitt

MitPitt commented Jan 1, 2023

Had the same problem and figured out a workaround: crash the Colab runtime right before launching the UI. This frees up the RAM.

Do this after downloading the models:

import os
os.kill(os.getpid(), 9)

This will crash the runtime.
Now reconnect and run:

%cd /content/stable-diffusion-webui
!python launch.py --share --xformers
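
To see whether the crash really freed anything, one option is to read the available RAM before launching. This is only a sketch: it assumes a Linux runtime that exposes /proc/meminfo (Colab VMs normally do), and the sample text below is made up for illustration, not taken from a real session.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_gib(meminfo: dict) -> float:
    """Convert the MemAvailable field (kB) to GiB; 0.0 if the field is missing."""
    return meminfo.get("MemAvailable", 0) / (1024 ** 2)


def read_available_gib(path: str = "/proc/meminfo") -> float:
    """Read the live value from the running machine."""
    with open(path) as handle:
        return available_gib(parse_meminfo(handle.read()))


# Illustrative sample only (hypothetical numbers).
SAMPLE = "MemTotal:       13297228 kB\nMemAvailable:   12582912 kB\n"
print(f"{available_gib(parse_meminfo(SAMPLE)):.2f} GiB available in the sample")
```

Calling read_available_gib() in a cell right before launch.py, once with and once without the drive mounted, would show how much headroom the mount costs.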

@camenduru
Owner

thanks @MitPitt ❤ good idea 🤩

@system1system2
Author

Thanks, @MitPitt, but I still can't make it work.

I split @camenduru's original notebook into multiple cells as in the screenshot. I executed your recommended os.kill cell.
The environment crashes as expected and reconnects automatically.

Then I proceed to launch A1111, but I still run out of system RAM.

What am I doing wrong?

Screenshot 2023-01-02 at 16 43 25

@camenduru camenduru reopened this Jan 2, 2023
@MitPitt

MitPitt commented Jan 3, 2023

Google Drive takes up RAM as well; I had this problem too. You will have to download any needed files manually, without mounting the drive.
Use this command to download public files from google drive:

!curl -o train_images.zip -L 'https://drive.google.com/uc?export=download&confirm=yes&id=[ID]' # replace [ID] with your file's ID

And you can find your file's ID by looking at the share link: https://drive.google.com/file/d/ABCDEFG/view?usp=share_link
Here, the ID is ABCDEFG
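
The same lookup can be scripted. This is a small sketch that pulls the ID out of a share link of the form shown above and builds the download URL used in the curl command; the helper names are hypothetical, and it only handles links containing /file/d/<ID>/.

```python
import re
from urllib.parse import urlencode


def drive_file_id(share_link: str) -> str:
    """Extract the file ID from a .../file/d/<ID>/view... share link."""
    match = re.search(r"/file/d/([^/?#]+)", share_link)
    if match is None:
        raise ValueError(f"no file ID found in {share_link!r}")
    return match.group(1)


def drive_download_url(file_id: str) -> str:
    """Build the direct-download URL with the query parameters from the curl example."""
    query = urlencode({"export": "download", "confirm": "yes", "id": file_id})
    return f"https://drive.google.com/uc?{query}"


link = "https://drive.google.com/file/d/ABCDEFG/view?usp=share_link"
print(drive_download_url(drive_file_id(link)))
```

As with the curl command, this only works for files that are shared publicly.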

@camenduru
Owner

Hi @system1system2 👋 I converted it to fp16; it is now 2.58 GB. Please use this with gdrive:

https://huggingface.co/ckpt/stable-diffusion-2-1/resolve/main/v2-1_768-ema-pruned-fp16.ckpt

@system1system2
Author

Thank you so much for converting this. Unfortunately, I still have issues:

I have modified your colab notebook to download the correct file and save it with the old file name, so I don't have to rename the yaml file as well:

!wget https://huggingface.co/ckpt/stable-diffusion-2-1/resolve/main/v2-1_768-ema-pruned-fp16.ckpt -O /content/stable-diffusion-webui/models/Stable-diffusion/v2-1_768-ema-pruned.ckpt
!wget https://raw.githubusercontent.com/Stability-AI/stablediffusion/main/configs/stable-diffusion/v2-inference-v.yaml -O /content/stable-diffusion-webui/models/Stable-diffusion/v2-1_768-ema-pruned.yaml

It correctly downloads the half-precision variant (which is saved in the Colab drive as a 2.4G file), but then it insists on loading a 3.4GB file:

Screenshot 2023-01-03 at 10 51 23

and that's where it runs out of memory as usual:

Python 3.8.16 (default, Dec 7 2022, 01:12:13)
[GCC 7.5.0]
Commit hash: 4af3ca5393151d61363c30eef4965e694eeac15e
Installing gfpgan
Installing clip
Installing open_clip
Cloning Stable Diffusion into repositories/stable-diffusion-stability-ai...
Cloning Taming Transformers into repositories/taming-transformers...
Cloning K-diffusion into repositories/k-diffusion...
Cloning CodeFormer into repositories/CodeFormer...
Cloning BLIP into repositories/BLIP...
Installing requirements for CodeFormer
Installing requirements for Web UI
Launching Web UI with arguments: --share --xformers
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Loading config from: /content/stable-diffusion-webui/models/Stable-diffusion/v2-1_768-ema-pruned.yaml
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
Downloading: 100% 3.94G/3.94G [01:04<00:00, 61.2MB/s]
^C

Also notice that during the process, python raises a new error:

A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'

I don't know if it's important or not as I cannot load the UI to test image generation.

@camenduru
Owner

at this point, I am thinking that there may be a memory leak in the code 🤔

@MisoSpree

MisoSpree commented Jan 6, 2023

Agree with all above. I tried just installing the WebUI without connecting my drive. It died the same death as described above.

Launching Web UI with arguments: --share --xformers
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Loading config from: /content/stable-diffusion-webui/models/Stable-diffusion/768-v-ema.yaml
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
Downloading: 100% 3.94G/3.94G [00:56<00:00, 69.6MB/s]
^C

And I switched to this latest version because I can't fix the "RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)" on the older version that I used to be able to fix. (Now, none of the suggested edits to the ddpm file work.) I would so dearly love to train more embeddings but I can't seem to find a version that runs for me on Colab (with a paid account.)

Edit: But I did get the WebUI running from midjourney_v4_diffusion_webui_colab.ipynb before attaching the Google Drive. (Now trying to mount the drive does nothing. No pop-up, no error message, no mount.) And also the runtime error about indices is still a problem. I am really sad about this.

@inu-ai

inu-ai commented Jan 6, 2023

Forking and patching Stability-AI's stablediffusion repository brings RAM usage within 12 GB.
Here is a similar issue.
(Translated with DeepL)

ddPn08/automatic1111-colab#16
ddPn08/automatic1111-colab@2748452

@camenduru
Owner

Thank you, @thx-pw. ❤ ❤

@camenduru
Owner

!sed -i -e '''/prepare_environment()/a\ os.system\(f\"""sed -i -e ''\"s/dict()))/dict())).cuda()/g\"'' /content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/util.py""")''' /content/stable-diffusion-webui/launch.py

@camenduru
Owner

@thx-pw, this one is working too: return get_obj_from_str(config["target"])(**config.get("params", dict())).cuda()
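
For readers who find the nested sed hard to follow, the substitution itself is just a text replacement on ldm/util.py. Here is a rough Python equivalent of that one replacement, as a sketch rather than what the notebook runs. The guard against patching twice is an addition of this sketch, and the function name is hypothetical.

```python
OLD = "dict()))"
NEW = "dict())).cuda()"


def patch_instantiate(source: str) -> str:
    """Append .cuda() to the instantiate call, mirroring s/dict()))/dict())).cuda()/g.

    Unlike the plain sed expression, this returns the text unchanged when it
    already contains the patched form, so applying it twice does not produce
    ".cuda().cuda()".
    """
    if NEW in source:
        return source
    return source.replace(OLD, NEW)


line = 'return get_obj_from_str(config["target"])(**config.get("params", dict()))'
print(patch_instantiate(line))
```

The effect of the patch is that the instantiated object is moved to the GPU immediately instead of sitting in system RAM first.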

@camenduru
Owner

Please check.

@inu-ai

inu-ai commented Jan 6, 2023

You can do it in one line.
That's smarter.

@MisoSpree

MisoSpree commented Jan 6, 2023

@system1system2 please try this https://github.com/camenduru/stable-diffusion-webui-colab/blob/main/stable_diffusion_v2_1_webui_colab.ipynb

I am running. So far, sed is throwing "no such file or directory" errors (for both sed calls). Edit: but apparently it doesn't matter? I couldn't mount my Google drive, but I just uploaded my training images and am now training an embedding.

@camenduru
Owner

Hi @MisoSpree 👋 sed is working. We get this message because we run sed inside sed, before the file has been fetched from the repo. Little trick, hehe.

sed: can't read /content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/util.py: No such file or directory
sed: can't read /content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py: No such file or directory
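
The messages above come from a patch attempt that runs while the target file does not exist yet. A sketch of the same idea in Python, with an explicit check so the early attempt is skipped quietly instead of printing an error, might look like the following. The helper name and the temporary file are hypothetical; this is not what the notebook does.

```python
import tempfile
from pathlib import Path

OLD = "self.logvar[t]"
NEW = "self.logvar[t.item()]"


def patch_file_if_present(path: Path, old: str, new: str) -> bool:
    """Replace old with new in path.

    Returns False (without raising) when the file has not been created yet,
    and True when the file exists and contains the patched text afterwards.
    """
    if not path.is_file():
        return False
    text = path.read_text()
    if new not in text:
        path.write_text(text.replace(old, new))
    return True


# Demo in a throwaway directory standing in for the cloned repository.
with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "ddpm.py"
    print(patch_file_if_present(target, OLD, NEW))  # file not "cloned" yet
    target.write_text("logvar_t = self.logvar[t].to(self.device)\n")
    print(patch_file_if_present(target, OLD, NEW))  # file exists now
    print(target.read_text(), end="")
```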

@MisoSpree

Roger that. Ignoring error messages is right up my alley.

@MisoSpree

Note that when training an embedding, the loss is reported as a NaN:

[Epoch 499: 10/10]loss: nan: 10% 4999/50000 [45:34<6:49:13, 1.83it/s]

And the image put out every N steps is just black. Looks like something is broken still.
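
If it helps anyone watching a long run, the NaN can be spotted automatically from the progress line instead of waiting for a black preview. This is a sketch that assumes the line format shown above; the second sample line is hypothetical.

```python
import math
import re

# Captures the token right after "loss:", stopping at whitespace or the next colon.
LOSS_PATTERN = re.compile(r"loss:\s*([^\s:]+)")


def parse_loss(progress_line: str) -> float:
    """Return the loss value printed in a training progress line."""
    match = LOSS_PATTERN.search(progress_line)
    if match is None:
        raise ValueError("no loss field in progress line")
    return float(match.group(1))


reported = "[Epoch 499: 10/10]loss: nan: 10% 4999/50000 [45:34<6:49:13, 1.83it/s]"
healthy = "[Epoch 1: 10/10]loss: 0.1234: 0% 20/50000"  # hypothetical healthy line

for line in (reported, healthy):
    loss = parse_loss(line)
    print("diverged" if math.isnan(loss) else f"ok ({loss})")
```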

@camenduru
Owner

oh no 😐

@camenduru camenduru reopened this Jan 6, 2023
@system1system2
Author

system1system2 commented Jan 6, 2023

@camenduru believe it or not, it works (at least for ordinary txt2img generations; I didn't try to train an embedding like @MisoSpree). The weird sed trick worked, but you might want to say something about it in the documentation or you'll have an avalanche of people reporting the same "No such file or directory" error that @MisoSpree reported.

Thanks for the patience in fixing this. I'm training without any issues this morning thanks to you.

@ddPn08

ddPn08 commented Jan 6, 2023

In my environment, I had no problem training an embedding.

@camenduru
Owner

Hi @ddPn08, can you train without getting black example output? Please show us how.

@camenduru
Owner

I tried, and I also got a black output 😭

@ddPn08

ddPn08 commented Jan 6, 2023

I created an embedding from the Train tab of AUTOMATIC1111 and trained it without changing any settings.
I tested it on my notebook, so I'll try it on this one too.

@camenduru
Owner

camenduru commented Jan 6, 2023

@MisoSpree

MisoSpree commented Jan 6, 2023

I tried, and I also got a black output 😭

I am glad I am not the only one. Were you seeing loss reported as NaN?

Edit: Just to check, I did this again today. This is in Colab. Today I first connected my Google Drive. (This is different from the last time when I didn't connect the google drive at all.) Then I ran https://github.com/camenduru/stable-diffusion-webui-colab/blob/main/stable_diffusion_v2_1_webui_colab.ipynb. Everything installed. I generated a single text-to-image (which I always do as a test when I get the WebUI open.) That worked fine. Then I created an embedding and ran the training. Still, loss is being reported as NaN and the first output image was all black. Then I stopped the training.

@inu-ai

inu-ai commented Jan 7, 2023

Even after removing the low-RAM patch, I am still getting the NaN error during training.
So I tried everything, and when I changed from SD2.1 to WD1.4e1, the NaN error was gone.
https://huggingface.co/hakurei/waifu-diffusion-v1-4/tree/main
I don't know why.

@system1system2
Author

Just a quick note to let you know, @camenduru, that this new version of the notebook runs out of memory again :)

The problem is the single sed command, in place of the previous two.

If you replace it with the previous two lines below, the notebook works just fine, including triton installation and the new CivitAI extension:

!sed -i -e '''/prepare_environment()/a\ os.system(f"""sed -i -e ''"s/self.logvar\[t\]/self.logvar\[t.item()\]/g"'' /content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py""")''' /content/stable-diffusion-webui/launch.py
!sed -i -e '''/prepare_environment()/a\ os.system(f"""sed -i -e ''"s/dict()))/dict())).cuda()/g"'' /content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/util.py""")''' /content/stable-diffusion-webui/launch.py

@camenduru
Owner

I tested this one and it worked with gdrive. I didn't change anything. Maybe you are getting less RAM? I got 12.68 GB.
https://github.com/camenduru/stable-diffusion-webui-colab/blob/main/stable_diffusion_v2_1_webui_colab.ipynb
Screenshot 2023-01-13 162026

@system1system2
Author

Same amount. Not sure why it works with the two sed lines but fails with a single one.
At this point, it's up to you. We can close this issue as is (as it works for me, at least with this specific configuration) or leave it open.
