(memory?) issue with stable_diffusion_v2_1_webui_colab when mounting Google Drive #21

system1system2 · 2022-12-28T13:36:23Z

Hi. All your colab notebooks are amazing. Thanks for sharing them with the community.

I have a problem with one of them: stable_diffusion_v2_1_webui_colab

If I create a new cell to mount my Google Drive and run it before your cell to initialize SD2.1, the initialization interrupts half way and I get this output:

Python 3.8.16 (default, Dec 7 2022, 01:12:13)
[GCC 7.5.0]
Commit hash: 4af3ca5393151d61363c30eef4965e694eeac15e
Installing gfpgan
Installing clip
Installing open_clip
Cloning Stable Diffusion into repositories/stable-diffusion-stability-ai...
Cloning Taming Transformers into repositories/taming-transformers...
Cloning K-diffusion into repositories/k-diffusion...
Cloning CodeFormer into repositories/CodeFormer...
Cloning BLIP into repositories/BLIP...
Installing requirements for CodeFormer
Installing requirements for Web UI
Launching Web UI with arguments: --share --force-enable-xformers
No module 'xformers'. Proceeding without it.
Cannot import xformers
Traceback (most recent call last):
File "/content/stable-diffusion-webui/modules/sd_hijack_optimizations.py", line 18, in <module>
import xformers.ops
ModuleNotFoundError: No module named 'xformers.ops'; 'xformers' is not a package

Loading config from: /content/stable-diffusion-webui/models/Stable-diffusion/v2-1_768-ema-pruned.yaml
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
Downloading: 100% 3.94G/3.94G [00:57<00:00, 69.0MB/s]
^C

I read that the ^C interrupt might indicate the system has run out of memory.

If I do not run the cell that mounts Google Drive, everything works fine.

Also, and this is where it's strange, if I run another of your Colab notebooks, like analog_diffusion_webui_colab, by running the cell that mounts Google Drive first, everything works fine, too.
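One way to test the out-of-memory suspicion is to look at system RAM before and after mounting Google Drive. The following is a minimal sketch, assuming a Linux runtime that exposes /proc/meminfo (Colab runtimes normally do); the helper names parse_meminfo and report_memory are made up for this illustration and are not part of the notebook.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text; return sizes in GiB for entries given in kB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Only entries with an explicit kB unit are sizes; others are plain counters.
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0]) / 1024**2  # kB -> GiB
    return fields


def report_memory(path="/proc/meminfo"):
    """Print total and available system RAM (assumes a Linux runtime)."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    print(f"total: {info['MemTotal']:.2f} GiB, "
          f"available: {info['MemAvailable']:.2f} GiB")
    return info
```

Calling report_memory() in a cell before mounting the drive, after mounting it, and again just before launching the web UI would show how much headroom is left at each step.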

camenduru · 2022-12-29T00:03:53Z

--force-enable-xformers is obsolete; please use --xformers instead.

system1system2 · 2022-12-29T10:44:21Z

Thanks for the quick reply, @camenduru

The issue is not related to the xformers module (even if it appeared in my quoted output).
Even before the flag --force-enable-xformers was rendered obsolete, I had this issue.
And now, after updating the notebook with the new flag --xformers, I still have it:

Python 3.8.16 (default, Dec 7 2022, 01:12:13)
[GCC 7.5.0]
Commit hash: 4af3ca5393151d61363c30eef4965e694eeac15e
Installing gfpgan
Installing clip
Installing open_clip
Cloning Stable Diffusion into repositories/stable-diffusion-stability-ai...
Cloning Taming Transformers into repositories/taming-transformers...
Cloning K-diffusion into repositories/k-diffusion...
Cloning CodeFormer into repositories/CodeFormer...
Cloning BLIP into repositories/BLIP...
Installing requirements for CodeFormer
Installing requirements for Web UI
Launching Web UI with arguments: --share --xformers
Loading config from: /content/stable-diffusion-webui/models/Stable-diffusion/v2-1_768-ema-pruned.yaml
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
Downloading: 100% 3.94G/3.94G [00:56<00:00, 70.3MB/s]
^C

This issue exclusively happens with this particular notebook (again, I don't have it with, for example, the Analog Diffusion module), and only if I try to mount my Google Drive before running the A1111 installation cell.

camenduru · 2022-12-29T11:03:49Z

this is working https://github.com/camenduru/stable-diffusion-webui-colab/blob/main/stable_diffusion_v2_1_webui_colab.ipynb
please compare your code with stable_diffusion_v2_1_webui_colab.ipynb

camenduru · 2022-12-29T11:12:46Z

wait it is not working with gdrive interesting 😋

system1system2 · 2022-12-29T11:16:28Z

I change nothing in your code. All I do is:

1. Click on the Colab button from your GitHub page. It opens it.
2. Ask to copy it to my drive. It does it.
3. Double-click on my copy from the Notebook folder in my GDrive. It opens it.
4. Ask to mount my GDrive. It creates a new cell under your original cell.
5. Move the cell up. It does it.
6. Run my cell to mount my GDrive. It executes it correctly in 17s.
7. Run your cell to install and launch A1111. Then I get the error.

I do exactly these steps with your colab for Analog Diffusion and it works flawlessly, as expected.

I have no clue why there's this difference in behaviour.

camenduru · 2022-12-29T12:16:11Z

stable_diffusion_v2_1_webui_colab: [screenshots: with gdrive (crashed), without gdrive]

stable_diffusion_1_5_webui_colab: [screenshot: with gdrive]

stable_diffusion_v2_1 is using too much system RAM.
Without gdrive it works.
With gdrive mounted, stable_diffusion_v2_1 does not fit in the system RAM 😨

camenduru · 2022-12-29T12:21:41Z

If we convert the fp32 v2-1_768-ema-pruned.ckpt (5.21 GB) to fp16, it should shrink to about 5.21/2 GB, which probably fits.
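The arithmetic behind that estimate, as a small sketch. It assumes every tensor in the checkpoint is stored as fp32 (4 bytes per weight) and ignores any non-tensor metadata, so the result is only a rough lower bound; the function name is hypothetical.

```python
def estimated_fp16_size_gb(fp32_size_gb):
    """fp32 stores 4 bytes per weight and fp16 stores 2, so the size roughly halves."""
    bytes_per_fp32 = 4
    bytes_per_fp16 = 2
    return fp32_size_gb * bytes_per_fp16 / bytes_per_fp32


print(f"{estimated_fp16_size_gb(5.21):.3f} GB")  # prints "2.605 GB"
```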

MitPitt · 2023-01-01T01:25:13Z

Had the same problem and figured out a workaround: crash the Colab runtime right before launching the UI, which frees up the RAM.

Do this after downloading the models:

import os
os.kill(os.getpid(), 9)

This will crash the runtime.
Now reconnect and run:

%cd /content/stable-diffusion-webui
!python launch.py --share --xformers

camenduru · 2023-01-01T04:18:03Z

thanks @MitPitt ❤ good idea 🤩

system1system2 · 2023-01-02T16:46:19Z

Thanks, @MitPitt, but I still can't make it work.

I split @camenduru's original notebook into multiple cells as in the screenshot. I executed your recommended os.kill cell.
The environment crashes as expected and reconnects automatically.

Then I proceed to launch A1111, but I still run out of system RAM.

What am I doing wrong?

MitPitt · 2023-01-03T00:48:25Z

Google drive is taking RAM as well, I had this problem. You will have to download any needed files manually, without mounting the drive.
Use this command to download public files from google drive:

!curl -o train_images.zip -L 'https://drive.google.com/uc?export=download&confirm=yes&id=[ID]' # replace [ID] with your file's ID

And you can find your file's ID by looking at the share link: https://drive.google.com/file/d/ABCDEFG/view?usp=share_link
Here, the ID is ABCDEFG
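Pulling the ID out of a share link and building the download URL can be scripted. This is a minimal sketch that only handles share links of the /file/d/<ID>/ form shown above; the function names are hypothetical, and the URL template is simply the one from the curl command.

```python
import re


def drive_file_id(share_link):
    """Extract the file ID from a .../file/d/<ID>/view share link."""
    match = re.search(r"/file/d/([^/?#]+)", share_link)
    if match is None:
        raise ValueError(f"no file ID found in {share_link!r}")
    return match.group(1)


def drive_download_url(share_link):
    """Build the direct-download URL used in the curl command above."""
    file_id = drive_file_id(share_link)
    return ("https://drive.google.com/uc"
            f"?export=download&confirm=yes&id={file_id}")


link = "https://drive.google.com/file/d/ABCDEFG/view?usp=share_link"
print(drive_file_id(link))       # prints "ABCDEFG"
print(drive_download_url(link))
```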

camenduru · 2023-01-03T10:05:56Z

hi @system1system2 👋 I converted it to fp16; it is now 2.58 GB. Please use this with gdrive:

https://huggingface.co/ckpt/stable-diffusion-2-1/resolve/main/v2-1_768-ema-pruned-fp16.ckpt

system1system2 · 2023-01-03T10:52:28Z

Thank you so much for converting this. Unfortunately, I still have issues:

I have modified your colab notebook to download the correct file and save it with the old file name, so I don't have to rename the yaml file as well:

!wget https://huggingface.co/ckpt/stable-diffusion-2-1/resolve/main/v2-1_768-ema-pruned-fp16.ckpt -O /content/stable-diffusion-webui/models/Stable-diffusion/v2-1_768-ema-pruned.ckpt
!wget https://raw.githubusercontent.com/Stability-AI/stablediffusion/main/configs/stable-diffusion/v2-inference-v.yaml -O /content/stable-diffusion-webui/models/Stable-diffusion/v2-1_768-ema-pruned.yaml

It correctly downloads the half-precision variant (which is saved in the Colab drive as a 2.4G file), but then it insists on loading a 3.4GB file, and that's where it runs out of memory as usual:

Python 3.8.16 (default, Dec 7 2022, 01:12:13)
[GCC 7.5.0]
Commit hash: 4af3ca5393151d61363c30eef4965e694eeac15e
Installing gfpgan
Installing clip
Installing open_clip
Cloning Stable Diffusion into repositories/stable-diffusion-stability-ai...
Cloning Taming Transformers into repositories/taming-transformers...
Cloning K-diffusion into repositories/k-diffusion...
Cloning CodeFormer into repositories/CodeFormer...
Cloning BLIP into repositories/BLIP...
Installing requirements for CodeFormer
Installing requirements for Web UI
Launching Web UI with arguments: --share --xformers
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Loading config from: /content/stable-diffusion-webui/models/Stable-diffusion/v2-1_768-ema-pruned.yaml
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
Downloading: 100% 3.94G/3.94G [01:04<00:00, 61.2MB/s]
^C

Also notice that during the process, python raises a new error:

A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'

I don't know if it's important or not as I cannot load the UI to test image generation.
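A side note on the two sizes quoted for the fp16 checkpoint (2.58 GB versus 2.4G): one possible explanation, assuming the 2.4G figure came from a tool that reports binary units (as ls -lh and du -h do), is that they are the same size expressed in different units. The helper below is hypothetical and only illustrates the conversion.

```python
def gb_to_gib(size_gb):
    """Convert decimal gigabytes (10**9 bytes) to binary gibibytes (2**30 bytes)."""
    return size_gb * 10**9 / 2**30


print(f"{gb_to_gib(2.58):.2f} GiB")  # prints "2.40 GiB"
```

This says nothing about the separate 3.94G download that appears in the log.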

camenduru · 2023-01-05T23:10:55Z

at this point, I am thinking that there may be a memory leak in the code 🤔

MisoSpree · 2023-01-06T00:46:45Z

Agree with all above. I tried just installing the WebUI without connecting my drive. It died the same death as described above.

Launching Web UI with arguments: --share --xformers
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Loading config from: /content/stable-diffusion-webui/models/Stable-diffusion/768-v-ema.yaml
LatentDiffusion: Running in v-prediction mode
DiffusionWrapper has 865.91 M params.
Downloading: 100% 3.94G/3.94G [00:56<00:00, 69.6MB/s]
^C

And I switched to this latest version because I can't fix the "RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)" on the older version that I used to be able to fix. (Now, none of the suggested edits to the ddpm file work.) I would so dearly love to train more embeddings but I can't seem to find a version that runs for me on Colab (with a paid account.)

Edit: But I did get the WebUI running from midjourney_v4_diffusion_webui_colab.ipynb before attaching the Google Drive. (Now trying to mount the drive does nothing. No pop-up, no error message, no mount.) And also the runtime error about indices is still a problem. I am really sad about this.

inu-ai · 2023-01-06T01:00:31Z

Forking and patching the stablediffusion repository of Stability-AI will bring it within 12 GB.
Here is a similar Issue.
(Translated at DeepL)

ddPn08/automatic1111-colab#16
ddPn08/automatic1111-colab@2748452

camenduru · 2023-01-06T01:07:57Z

Thank you, @thx-pw. ❤ ❤

camenduru · 2023-01-06T01:48:35Z

!sed -i -e '''/prepare_environment()/a\ os.system\(f\"""sed -i -e ''\"s/dict()))/dict())).cuda()/g\"'' /content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/util.py""")''' /content/stable-diffusion-webui/launch.py

camenduru · 2023-01-06T01:52:04Z

@thx-pw, this also works: return get_obj_from_str(config["target"])(**config.get("params", dict())).cuda()
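To make the effect of the inner sed substitution (s/dict()))/dict())).cuda()/g) easier to see, here is the same text replacement as a Python sketch. The unpatched line is inferred by removing .cuda() from the patched line quoted above, so treat it as an assumption about ldm/util.py rather than a verbatim copy; the function name is hypothetical.

```python
def patch_instantiate_line(source):
    """Apply the same literal replacement as the sed expression."""
    return source.replace("dict()))", "dict())).cuda()")


# Assumed unpatched form of the line in ldm/util.py.
original = 'return get_obj_from_str(config["target"])(**config.get("params", dict()))'
print(patch_instantiate_line(original))
# prints: return get_obj_from_str(config["target"])(**config.get("params", dict())).cuda()
```

The replacement is not idempotent: the patched line still contains dict())), so applying it a second time would append a second .cuda().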

camenduru · 2023-01-06T01:53:50Z

Please check.

camenduru · 2023-01-06T02:00:24Z

@system1system2 please try this https://github.com/camenduru/stable-diffusion-webui-colab/blob/main/stable_diffusion_v2_1_webui_colab.ipynb

inu-ai · 2023-01-06T02:10:26Z

You can do it in one line.
That's smarter.

MisoSpree · 2023-01-06T02:48:47Z

@camenduru wrote: "@system1system2 please try this https://github.com/camenduru/stable-diffusion-webui-colab/blob/main/stable_diffusion_v2_1_webui_colab.ipynb"

I am running. So far, sed is throwing "no such file or directory" errors (for both sed calls). Edit: but apparently it doesn't matter? I couldn't mount my Google drive, but I just uploaded my training images and am now training an embedding.

camenduru · 2023-01-06T03:10:48Z

hi @MisoSpree 👋 sed is working. We are getting this message because we are using sed inside sed before getting the file from the repo. Little trick, hehe:

sed: can't read /content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/util.py: No such file or directory
sed: can't read /content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py: No such file or directory
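The "sed inside sed" trick may be easier to follow as a sketch. The outer sed uses the append command (/prepare_environment()/a) to add a new line to launch.py after each line matching prepare_environment(); that new line is an os.system(...) call which runs the inner sed when launch.py executes. The Python below only imitates the outer step on a made-up, heavily simplified launch.py; run_inner_sed() is a hypothetical placeholder for the real os.system(...) line.

```python
def append_after_match(lines, needle, new_line):
    """Return a copy of lines with new_line added after every line containing needle."""
    patched = []
    for line in lines:
        patched.append(line)
        if needle in line:
            patched.append(new_line)
    return patched


# Hypothetical, heavily simplified stand-in for launch.py.
launch_py = [
    'if __name__ == "__main__":',
    "    prepare_environment()",
    "    start()",
]

# Hypothetical placeholder for the os.system(...) line that runs the inner sed.
patch_call = "    run_inner_sed()"

for line in append_after_match(launch_py, "prepare_environment()", patch_call):
    print(line)
```

Note that the match is applied to every line containing the pattern, so a file with more than one such line would receive the appended line more than once.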

MisoSpree · 2023-01-06T03:16:56Z

Roger that. Ignoring error messages is right up my alley.

MisoSpree · 2023-01-06T04:28:35Z

Note that when training an embedding, the loss is reported as a NaN:

[Epoch 499: 10/10]loss: nan: 10% 4999/50000 [45:34<6:49:13, 1.83it/s]

And the image put out every N steps is just black. Looks like something is broken still.
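For anyone who wants to stop a run early instead of waiting for black images, a NaN loss can be detected from the progress line itself. This is a rough sketch that assumes the line keeps the "loss: <value>:" shape shown above; the function name is hypothetical and this is not part of the web UI.

```python
import math
import re


def loss_from_progress_line(line):
    """Return the loss value from a progress line as a float, or None if absent."""
    match = re.search(r"loss:\s*([^\s:]+)", line)
    return float(match.group(1)) if match else None


line = "[Epoch 499: 10/10]loss: nan: 10% 4999/50000 [45:34<6:49:13, 1.83it/s]"
loss = loss_from_progress_line(line)
if loss is not None and math.isnan(loss):
    print("loss is NaN; training has diverged")
```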

camenduru · 2023-01-06T04:32:53Z

oh no 😐

system1system2 · 2023-01-06T09:36:37Z

@camenduru believe it or not, it works (at least for ordinary txt2img generations; I didn't try to train an embedding like @MisoSpree). The weird sed trick worked, but you might want to say something about it in the documentation or you'll have an avalanche of people reporting the same "No such file or directory" error that @MisoSpree reported.

Thanks for the patience in fixing this. I'm training without any issues this morning thanks to you.

ddPn08 · 2023-01-06T14:50:47Z

In my environment, I had no problem training an embedding.

camenduru · 2023-01-06T15:32:08Z

hi @ddPn08 can you train without black example output? please show us how

camenduru · 2023-01-06T15:33:33Z

I tried, and I also got a black output 😭

ddPn08 · 2023-01-06T15:36:38Z

I created an embedding from the train tab of AUTOMATIC1111 and trained it without changing any settings.
I tested it on my notebook, so I'll try it on this one too.

camenduru · 2023-01-06T15:38:01Z

did you use this colab https://github.com/camenduru/stable-diffusion-webui-colab/blob/main/stable_diffusion_v2_1_webui_colab.ipynb

MisoSpree · 2023-01-06T17:32:28Z

@camenduru wrote: "I tried, and I also got a black output 😭"

I am glad I am not the only one. Were you seeing loss reported as NaN?

Edit: Just to check, I did this again today. This is in Colab. Today I first connected my Google Drive. (This is different from the last time when I didn't connect the google drive at all.) Then I ran https://github.com/camenduru/stable-diffusion-webui-colab/blob/main/stable_diffusion_v2_1_webui_colab.ipynb. Everything installed. I generated a single text-to-image (which I always do as a test when I get the WebUI open.) That worked fine. Then I created an embedding and ran the training. Still, loss is being reported as NaN and the first output image was all black. Then I stopped the training.

inu-ai · 2023-01-07T02:23:49Z

Even after removing the low RAM patch, I am still getting the nan error in learning.
So I tried everything and when I changed from SD2.1 to WD1.4e1, the nan error was gone.
https://huggingface.co/hakurei/waifu-diffusion-v1-4/tree/main
I don't know why.

system1system2 · 2023-01-13T11:18:34Z

Just a quick note to let you know, @camenduru, that this new version of the notebook runs out of memory again :)

The problem is the single sed command, in place of the previous two.

If you replace it with the previous two lines below, the notebook works just fine, including triton installation and the new CivitAI extension:

!sed -i -e '''/prepare_environment()/a\ os.system(f"""sed -i -e ''"s/self.logvar\[t\]/self.logvar\[t.item()\]/g"'' /content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py""")''' /content/stable-diffusion-webui/launch.py
!sed -i -e '''/prepare_environment()/a\ os.system(f"""sed -i -e ''"s/dict()))/dict())).cuda()/g"'' /content/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/util.py""")''' /content/stable-diffusion-webui/launch.py

camenduru · 2023-01-13T13:22:53Z

I tested this one and it worked with gdrive. I didn't change anything. Maybe you are getting less RAM; I got 12.68GB.
https://github.com/camenduru/stable-diffusion-webui-colab/blob/main/stable_diffusion_v2_1_webui_colab.ipynb

system1system2 · 2023-01-13T13:49:25Z

Same amount. Not sure why it works with the two sed lines but fails with a single one.
At this point, it's up to you. We can close this issue as is (as it works for me, at least with this specific configuration) or leave it open.

camenduru closed this as completed Dec 29, 2022

camenduru reopened this Dec 29, 2022

camenduru closed this as completed Jan 1, 2023

camenduru reopened this Jan 2, 2023

inu-ai mentioned this issue Jan 6, 2023

RAM Problem TheLastBen/fast-stable-diffusion#823

Open

camenduru closed this as completed Jan 6, 2023

camenduru reopened this Jan 6, 2023

camenduru closed this as completed Jan 19, 2023
