Cannot start DoRA training due to error #2589

Open
mi8m opened this issue Jun 13, 2024 · 6 comments

Comments

@mi8m

mi8m commented Jun 13, 2024

My main settings (https://pastebin.com/0BMs5ft8) work fine. Keeping those settings, I changed the network module to lycoris-locon with DoRA enabled and got this error:

/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(
Traceback (most recent call last):
File "/workspace/kohya_ss/sd-scripts/sdxl_train_network.py", line 185, in <module>
trainer.train(args)
File "/workspace/kohya_ss/sd-scripts/train_network.py", line 864, in train
noise_pred = self.call_unet(
File "/workspace/kohya_ss/sd-scripts/sdxl_train_network.py", line 164, in call_unet
noise_pred = unet(noisy_latents, timesteps, text_embedding, vector_embedding)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 680, in forward
return model_forward(*args, **kwargs)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/utils/operations.py", line 668, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/workspace/kohya_ss/sd-scripts/library/sdxl_original_unet.py", line 1104, in forward
h = call_module(module, h, emb, context)
File "/workspace/kohya_ss/sd-scripts/library/sdxl_original_unet.py", line 1093, in call_module
x = layer(x, emb)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/kohya_ss/sd-scripts/library/sdxl_original_unet.py", line 348, in forward
x = torch.utils.checkpoint.checkpoint(create_custom_forward(self.forward_body), x, emb, use_reentrant=USE_REENTRANT)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
return fn(*args, **kwargs)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 451, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 230, in forward
outputs = run_function(*args)
File "/workspace/kohya_ss/sd-scripts/library/sdxl_original_unet.py", line 344, in custom_forward
return func(*inputs)
File "/workspace/kohya_ss/sd-scripts/library/sdxl_original_unet.py", line 331, in forward_body
h = self.in_layers(x)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/container.py", line 215, in forward
input = module(input)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/lycoris/modules/locon.py", line 246, in forward
weight = self.apply_weight_decompose(weight)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/lycoris/modules/locon.py", line 207, in apply_weight_decompose
return weight * (self.dora_scale / weight_norm)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
steps: 0%| | 0/3525 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/workspace/kohya_ss/venv/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
simple_launcher(args)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/workspace/kohya_ss/venv/bin/python', '/workspace/kohya_ss/sd-scripts/sdxl_train_network.py', '--config_file', '/workspace/stuff/output/config_lora-20240613-115427.toml', '--network_train_unet_only', '--keep_tokens_separator', "'|||'", '--base_weights']' returned non-zero exit status 1.
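In a log this long, the frame that matters is the last one before the `RuntimeError`: `lycoris/modules/locon.py`, line 207, in `apply_weight_decompose`. The failing expression there is `weight * (self.dora_scale / weight_norm)`, which suggests that one of those tensors was still on the CPU while the others were on `cuda:0`. The following is a small sketch, standard library only, of pulling that innermost frame out of a pasted log. The helper name `innermost_frame` and the shortened sample paths are hypothetical and not part of kohya_ss or LyCORIS.

```python
import re

# Matches a frame header such as:
#   File "/path/to/file.py", line 207, in apply_weight_decompose
FRAME_RE = re.compile(r'File "([^"]+)", line (\d+), in (\S+)')
MARKER = "Traceback (most recent call last)"


def innermost_frame(log_text):
    """Return (path, line, function) of the last frame in the first traceback."""
    start = log_text.find(MARKER)
    if start == -1:
        return None
    # Look only at the first traceback; a launcher may append its own afterwards.
    end = log_text.find(MARKER, start + len(MARKER))
    first = log_text[start:end] if end != -1 else log_text[start:]
    frames = FRAME_RE.findall(first)
    if not frames:
        return None
    path, line, func = frames[-1]
    return path, int(line), func


# Shortened, hypothetical sample in the same flattened shape as the pasted log.
sample = (
    'Traceback (most recent call last): '
    'File "/a/train_network.py", line 864, in train noise_pred = self.call_unet( '
    'File "/a/lycoris/modules/locon.py", line 207, in apply_weight_decompose '
    'return weight * (self.dora_scale / weight_norm) '
    'RuntimeError: Expected all tensors to be on the same device '
    'Traceback (most recent call last): '
    'File "/a/accelerate/commands/launch.py", line 637, in simple_launcher raise'
)

print(innermost_frame(sample))
```

The regular expression does not depend on line breaks, so it also works on a log that was flattened into a single line when pasted.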

So I refreshed the page to get the default settings, chose lycoris-loha, enabled DoRA, and this happened:

Traceback (most recent call last):
File "/workspace/kohya_ss/venv/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1017, in launch_command
simple_launcher(args)
File "/workspace/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 637, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/workspace/kohya_ss/venv/bin/python', '/workspace/kohya_ss/sd-scripts/sdxl_train_network.py', '--config_file', '/workspace/stuff/output/config_lora-20240613-122335.toml']' died with <Signals.SIGKILL: 9>.
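This second failure carries no Python exception from the training script itself, only `died with <Signals.SIGKILL: 9>`. SIGKILL is delivered from outside the process and cannot be caught. On Linux a common sender is the kernel's out-of-memory killer, although nothing in this thread confirms that was the cause here. As a minimal sketch of how such an exit shows up to a launcher, the snippet below runs a child that kills itself and decodes the return code. It is POSIX-only, and `describe_exit` is a hypothetical helper, not part of accelerate.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a return code of -N means the child was ended by signal N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# A child that sends SIGKILL to itself, standing in for a process
# that was killed from outside (for example by the OOM killer).
child = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"]
)

print(describe_exit(child.returncode))
```

If memory pressure is suspected, the kernel log on the host is the place to look for a matching out-of-memory entry.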

These are the "default" settings: https://pastebin.com/41Gq3rVA

Sorry for the formatting of the first error and of the last settings paste. It was the only way I could recover them, since I had already closed RunPod and only had them in my clipboard.
I was using the RunPod PyTorch 2.0.1 template.

@mi8m
Author

mi8m commented Jun 14, 2024

The DoRA option on its own also seems to raise RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

@nephi-dev

Same error here. I tried updating lycoris to the latest version, but the error persists.

@nephi-dev

Updating lycoris to the latest dev version fixed it.

@mi8m
Author

mi8m commented Jun 16, 2024

Could you explain the process for updating it?

@nephi-dev

> could you explain the process for updating it?

This should work:

./venv/Scripts/activate && pip install lycoris_lora -U --pre
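Two notes on the command above. `venv/Scripts/activate` is the Windows virtual-environment layout. The tracebacks in this issue show `/workspace/kohya_ss/venv/bin/accelerate`, so on that Linux machine the activate script would be under `venv/bin` instead. And to confirm that pip actually changed the installed package, the version can be read back from inside the same environment. A minimal sketch, assuming the distribution is named `lycoris_lora` as in the pip command; `installed_version` is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Run this with the interpreter of the kohya_ss virtual environment,
# otherwise it reports on a different Python installation.
print(installed_version("lycoris_lora"))
```

A version string containing a pre-release marker such as `dev` would indicate that the `--pre` install took effect.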

@mi8m
Author

mi8m commented Jun 16, 2024

thx, gonna try it later on
