
[Bug]: stable diffusion inside Docker crashes with error: ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed' #13785

Open
l0ggik opened this issue Oct 28, 2023 · 5 comments
Labels
bug-report Report of a bug, yet to be confirmed

Comments

@l0ggik

l0ggik commented Oct 28, 2023

Is there an existing issue for this?

  • I have searched the existing issues and checked the recent builds/commits

What happened?

I tried to use the rocm/pytorch Docker image, but when starting stable-diffusion-webui following the install instructions I get the following error: ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'

I tried downgrading to pytorch_lightning v1.7.7 and v1.6.5, but with no effect.

Has anyone else had this problem and found a solution?

Steps to reproduce the problem

  1. Install stable-diffusion-webui with Docker

What should have happened?

The webui should have started without errors.

Sysinfo

Linux Mint 20.1 Ulyssa
RX 580

What browsers do you use to access the UI ?

No response

Console logs

Python 3.9.5 (default, Nov 23 2021, 15:27:38) 
[GCC 9.3.0]
Version: v1.6.0
Commit hash: 5ef669de080814067961f28357256e8fe27544f4
Launching Web UI with arguments: --precision full --no-half --skip-torch-cuda-test
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
Traceback (most recent call last):
  File "/dockerx/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/dockerx/stable-diffusion-webui/launch.py", line 44, in main
    start()
  File "/dockerx/stable-diffusion-webui/modules/launch_utils.py", line 432, in start
    import webui
  File "/dockerx/stable-diffusion-webui/webui.py", line 13, in <module>
    initialize.imports()
  File "/dockerx/stable-diffusion-webui/modules/initialize.py", line 33, in imports
    from modules import shared_init
  File "/dockerx/stable-diffusion-webui/modules/shared_init.py", line 5, in <module>
    from modules import shared
  File "/dockerx/stable-diffusion-webui/modules/shared.py", line 5, in <module>
    from modules import shared_cmd_options, shared_gradio_themes, options, shared_items, sd_models_types
  File "/dockerx/stable-diffusion-webui/modules/sd_models_types.py", line 1, in <module>
    from ldm.models.diffusion.ddpm import LatentDiffusion
  File "/dockerx/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 20, in <module>
    from pytorch_lightning.utilities.distributed import rank_zero_only
ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'

Additional information

No response

@l0ggik added the bug-report label on Oct 28, 2023
@gsteinLTU

gsteinLTU commented Oct 29, 2023

I had to remove some extensions to get it working again after a similar error on a non-Docker install.

@TheNexter

TheNexter commented Nov 24, 2023

Same here on both the main and dev branches 👍

6600 XT, Ubuntu 23.10, using Docker

no module 'xformers'. Processing without...
Traceback (most recent call last):
  File "/dockerx/stable-diffusion-webui/launch.py", line 48, in <module>
    main()
  File "/dockerx/stable-diffusion-webui/launch.py", line 44, in main
    start()
  File "/dockerx/stable-diffusion-webui/modules/launch_utils.py", line 432, in start
    import webui
  File "/dockerx/stable-diffusion-webui/webui.py", line 13, in <module>
    initialize.imports()
  File "/dockerx/stable-diffusion-webui/modules/initialize.py", line 33, in imports
    from modules import shared_init
  File "/dockerx/stable-diffusion-webui/modules/shared_init.py", line 5, in <module>
    from modules import shared
  File "/dockerx/stable-diffusion-webui/modules/shared.py", line 5, in <module>
    from modules import shared_cmd_options, shared_gradio_themes, options, shared_items, sd_models_types
  File "/dockerx/stable-diffusion-webui/modules/sd_models_types.py", line 1, in <module>
    from ldm.models.diffusion.ddpm import LatentDiffusion
  File "/dockerx/stable-diffusion-webui/repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py", line 20, in <module>
    from pytorch_lightning.utilities.distributed import rank_zero_only
ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'
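
Both tracebacks fail on the same import at line 20 of repositories/stable-diffusion-stability-ai/ldm/models/diffusion/ddpm.py: upstream pytorch-lightning removed the pytorch_lightning.utilities.distributed module, and rank_zero_only now lives in pytorch_lightning.utilities.rank_zero (the exact release that moved it, around the 1.9 line, is an assumption here). Besides downgrading, a minimal sketch of a compatibility patch for that import would be:

```python
# Sketch of a tolerant import for ddpm.py (assumption: in newer
# pytorch-lightning releases rank_zero_only moved to
# pytorch_lightning.utilities.rank_zero).
try:
    # Module layout used by older pytorch-lightning releases
    from pytorch_lightning.utilities.distributed import rank_zero_only
except ImportError:
    # Module layout used by newer releases
    from pytorch_lightning.utilities.rank_zero import rank_zero_only
```

This keeps the pinned-downgrade workaround unnecessary, at the cost of editing a vendored repository that updates may overwrite.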

@axxapy

axxapy commented Dec 28, 2023

Running pip install pytorch-lightning==1.6.5 helped me.

source

@hchasens

hchasens commented Mar 6, 2024

Can confirm it's still present in Docker when using ROCm.

@t3dc

t3dc commented May 7, 2024

Running into this same issue. As suggested, installing pytorch-lightning==1.6.5 helped, but only partly. Then I received:

No module named 'timm'

Running pip install timm got me past that but only to another crash:

AttributeError: 'NoneType' object has no attribute '_id'
Creating model from config: /dockerx/stable-diffusion-webui/configs/v1-inference.yaml
/opt/conda/envs/py_3.9/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
loading stable diffusion model: RuntimeError
Traceback (most recent call last):
  File "/opt/conda/envs/py_3.9/lib/python3.9/threading.py", line 937, in _bootstrap
    self._bootstrap_inner()
  File "/opt/conda/envs/py_3.9/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/opt/conda/envs/py_3.9/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/dockerx/stable-diffusion-webui/modules/initialize.py", line 149, in load_model
    shared.sd_model  # noqa: B018
  File "/dockerx/stable-diffusion-webui/modules/shared_items.py", line 175, in sd_model
    return modules.sd_models.model_data.get_sd_model()
  File "/dockerx/stable-diffusion-webui/modules/sd_models.py", line 620, in get_sd_model
    load_model()
  File "/dockerx/stable-diffusion-webui/modules/sd_models.py", line 748, in load_model
    load_model_weights(sd_model, checkpoint_info, state_dict, timer)
  File "/dockerx/stable-diffusion-webui/modules/sd_models.py", line 393, in load_model_weights
    model.load_state_dict(state_dict, strict=False)
  File "/dockerx/stable-diffusion-webui/modules/sd_disable_initialization.py", line 223, in <lambda>
    module_load_state_dict = self.replace(torch.nn.Module, 'load_state_dict', lambda *args, **kwargs: load_state_dict(module_load_state_dict, *args, **kwargs))
  File "/dockerx/stable-diffusion-webui/modules/sd_disable_initialization.py", line 221, in load_state_dict
    original(module, state_dict, strict=strict)
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2139, in load_state_dict
    load(self, state_dict)
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2127, in load
    load(child, child_state_dict, child_prefix)  # noqa: F821
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2127, in load
    load(child, child_state_dict, child_prefix)  # noqa: F821
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2127, in load
    load(child, child_state_dict, child_prefix)  # noqa: F821
  [Previous line repeated 1 more time]
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2121, in load
    module._load_from_state_dict(
  File "/dockerx/stable-diffusion-webui/modules/sd_disable_initialization.py", line 225, in <lambda>
    linear_load_from_state_dict = self.replace(torch.nn.Linear, '_load_from_state_dict', lambda *args, **kwargs: load_from_state_dict(linear_load_from_state_dict, *args, **kwargs))
  File "/dockerx/stable-diffusion-webui/modules/sd_disable_initialization.py", line 191, in load_from_state_dict
    module._parameters[name] = torch.nn.parameter.Parameter(torch.zeros_like(param, device=device, dtype=dtype), requires_grad=param.requires_grad)
  File "/opt/conda/envs/py_3.9/lib/python3.9/site-packages/torch/_meta_registrations.py", line 4820, in zeros_like
    res.fill_(0)
RuntimeError: HIP error: invalid device function
HIP kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing AMD_SERIALIZE_KERNEL=3.
Compile with TORCH_USE_HIP_DSA to enable device-side assertions.

Anyone fought their way past this? I'm on an RX 6750XT.
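
"HIP error: invalid device function" usually means the ROCm build of PyTorch ships no kernels for the card's GPU ISA. A commonly reported workaround for RDNA2 consumer cards (an assumption here, not verified on this exact card or driver stack) is to spoof the officially supported gfx1030 ISA before launching:

```shell
# Assumption: the RX 6750 XT reports an ISA (gfx1031) that ROCm
# builds of PyTorch ship no kernels for; override it to the closely
# related, officially supported gfx1030 ISA.
export HSA_OVERRIDE_GFX_VERSION=10.3.0
# Then relaunch the webui from the same shell inside the container,
# e.g.:
# python launch.py --precision full --no-half --skip-torch-cuda-test
```

If the override does not help, the card may need a PyTorch build compiled for its ISA.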
