Lycoris extreme slow training after update to latest version #167

Closed
killerciao opened this issue Mar 29, 2024 · 21 comments

@killerciao

After updating to the latest release, LyCORIS/LoCon training (the only algorithm I tested) in kohya_ss is extremely slow. On my 4090, with the same settings loaded, the previous version ran at about 1 it/s for SDXL training; the new one runs at 4-10 s/it.
Nothing has changed in the training data.
Uninstalling 2.2post3 and installing 2.1.0post2 fixes the problem.
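
When comparing releases like this, it helps to confirm which version is actually installed in the training environment. The following is a minimal sketch, assuming the package is installed under the distribution name `lycoris-lora` (an assumption, not stated in this thread; adjust it if your environment differs). The helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "lycoris-lora" is an assumed distribution name; change it if needed.
    print("lycoris-lora:", installed_version("lycoris-lora"))
```

Run it with the Python interpreter inside the kohya_ss virtual environment, so that it reports the version the trainer actually imports.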

@killerciao
Author

killerciao commented Mar 29, 2024

V2.2post3 (screenshot: 2024-03-29 113124)

V2.1.0post2 (screenshot: 2024-03-29 113057)

@KohakuBlueleaf
Owner

Please provide your full configuration.

@killerciao
Author

killerciao commented Mar 29, 2024

These are the settings used in both versions:

{
"LoRA_type": "LyCORIS/LoCon",
"LyCORIS_preset": "full",
"adaptive_noise_scale": 0,
"additional_parameters": "--max_grad_norm=0",
"block_alphas": "",
"block_dims": "",
"block_lr_zero_threshold": "",
"bucket_no_upscale": true,
"bucket_reso_steps": 64,
"bypass_mode": false,
"cache_latents": true,
"cache_latents_to_disk": true,
"caption_dropout_every_n_epochs": 0.0,
"caption_dropout_rate": 0,
"caption_extension": ".txt",
"clip_skip": "1",
"color_aug": false,
"constrain": 0.0,
"conv_alpha": 16,
"conv_block_alphas": "",
"conv_block_dims": "",
"conv_dim": 32,
"dataset_config": "",
"debiased_estimation_loss": false,
"decompose_both": false,
"dim_from_weights": false,
"dora_wd": false,
"down_lr_weight": "",
"enable_bucket": true,
"epoch": 6,
"factor": -1,
"flip_aug": false,
"fp8_base": false,
"full_bf16": false,
"full_fp16": false,
"gpu_ids": "",
"gradient_accumulation_steps": 1,
"gradient_checkpointing": false,
"keep_tokens": "0",
"learning_rate": 1.0,
"log_tracker_config": "",
"log_tracker_name": "",
"logging_dir": "H:/t00nstyle/log",
"lora_network_weights": "",
"lr_scheduler": "cosine",
"lr_scheduler_args": "",
"lr_scheduler_num_cycles": "",
"lr_scheduler_power": "",
"lr_warmup": 0,
"max_bucket_reso": 2048,
"max_data_loader_n_workers": "0",
"max_grad_norm": 1,
"max_resolution": "1024,1024",
"max_timestep": 1000,
"max_token_length": "225",
"max_train_epochs": "",
"max_train_steps": "",
"mem_eff_attn": false,
"mid_lr_weight": "",
"min_bucket_reso": 256,
"min_snr_gamma": 0,
"min_timestep": 0,
"mixed_precision": "bf16",
"model_list": "custom",
"module_dropout": 0,
"multi_gpu": false,
"multires_noise_discount": 0.3,
"multires_noise_iterations": 6,
"network_alpha": 16,
"network_dim": 32,
"network_dropout": 0,
"noise_offset": 0.0357,
"noise_offset_type": "Multires",
"num_cpu_threads_per_process": 2,
"num_machines": 1,
"num_processes": 1,
"optimizer": "Prodigy",
"optimizer_args": "decouple=True weight_decay=0.5 betas=0.9,0.99 use_bias_correction=False",
"output_dir": "H:/t00nstyle/model",
"output_name": "t00nstylev1PonySDXL",
"persistent_data_loader_workers": false,
"pretrained_model_name_or_path": "F:/IA FILES/Models/Stablediffusion/ponyDiffusionV6XL_v6StartWithThisOne.safetensors",
"prior_loss_weight": 1.0,
"random_crop": false,
"rank_dropout": 0,
"rank_dropout_scale": false,
"reg_data_dir": "",
"rescaled": false,
"resume": "",
"sample_every_n_epochs": 0,
"sample_every_n_steps": 250,
"sample_prompts": "score_9, score_8_up, score_7_up,flashing tits, nipples, looking at viewer, tongue out, wink, in pool, bikini, t00nstyle --n low quality, worst quality, bad anatomy,bad composition, poor, low effort --w 1024 --h 1024 --d 1 --l 7 --s 28",
"sample_sampler": "euler_a",
"save_every_n_epochs": 1,
"save_every_n_steps": 0,
"save_last_n_steps": 0,
"save_last_n_steps_state": 0,
"save_model_as": "safetensors",
"save_precision": "bf16",
"save_state": false,
"scale_v_pred_loss_like_noise_pred": false,
"scale_weight_norms": 1,
"sdxl": true,
"sdxl_cache_text_encoder_outputs": false,
"sdxl_no_half_vae": true,
"seed": "12345",
"shuffle_caption": true,
"stop_text_encoder_training_pct": 0,
"text_encoder_lr": 1.0,
"train_batch_size": 1,
"train_data_dir": "H:/t00nstyle/img",
"train_norm": false,
"train_on_input": false,
"training_comment": "",
"unet_lr": 1.0,
"unit": 1,
"up_lr_weight": "",
"use_cp": true,
"use_scalar": false,
"use_tucker": false,
"use_wandb": false,
"v2": false,
"v_parameterization": false,
"v_pred_like_loss": 0,
"vae": "",
"vae_batch_size": 0,
"wandb_api_key": "",
"wandb_run_name": "",
"weighted_captions": false,
"xformers": "xformers"
}

@KohakuBlueleaf
Owner

What's your hardware?

@killerciao
Author

killerciao commented Mar 31, 2024

Windows 11, Nvidia 4090, 64 GB DDR5 RAM, i9-13900K, running from a Gen 4 NVMe drive.

@KohakuBlueleaf
Owner

I ran some tests and cannot reproduce this issue...
But I'm using plain kohya-ss/sd-scripts.
I will try testing with the GUI.

My hardware is almost the same as yours (128 GB of RAM instead of 64 GB), so the results should be comparable.

@whythisusername

@KohakuBlueleaf
Since someone has already reported this issue: I can confirm that something changed specifically after the 2.2.0.dev7 update (I tested the versions before and after it). Speed drops from ~1.5 it/s to ~1.5 s/it when training on a 4090 with this config, which is for easy-scripts.

@KohakuBlueleaf
Owner

@whythisusername Can you try dropout=0 (for all kinds of dropout)?
Maybe something is going wrong with dropout.
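
To check which dropout settings are active before rerunning, something like the sketch below could be used. It assumes the key names `module_dropout`, `network_dropout`, and `rank_dropout` from the kohya_ss GUI config posted earlier in this thread; other front ends such as easy-scripts may name them differently, and the helper names are hypothetical.

```python
# Key names taken from the kohya_ss GUI config posted earlier in this thread;
# other front ends (for example easy-scripts) may use different names.
NETWORK_DROPOUT_KEYS = ("module_dropout", "network_dropout", "rank_dropout")


def active_dropouts(config: dict) -> dict:
    """Return the network dropout settings that are non-zero."""
    return {
        key: config[key]
        for key in NETWORK_DROPOUT_KEYS
        if float(config.get(key, 0) or 0) != 0.0
    }


def without_dropout(config: dict) -> dict:
    """Return a copy of the config with every network dropout set to 0."""
    updated = dict(config)
    for key in NETWORK_DROPOUT_KEYS:
        if key in updated:
            updated[key] = 0
    return updated


# Illustrative values only, not taken from any config in this thread.
example = {"module_dropout": 0.1, "network_dropout": 0, "rank_dropout": 0.25}
print(active_dropouts(example))
print(active_dropouts(without_dropout(example)))
```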

@whythisusername

@KohakuBlueleaf
Yeah, here are some speed measurements for the two versions. Dropout significantly affects the speed on the latest version, and even without dropout it is still a little slower than the old version.

  • 2.2.0.dev7 with dropout: 1321/2500 [17:22<15:30, 1.27it/s, avr_loss=0.107]

  • 2.2.0.dev7 no dropout: 587/2500 [07:32<24:34, 1.30it/s, avr_loss=0.105]

  • 2.3.0.dev6 with dropout: 533/2500 [10:59<40:33, 1.24s/it, avr_loss=0.115]

  • 2.3.0.dev6 no dropout: 721/2500 [10:31<25:57, 1.14it/s, avr_loss=0.0954]
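
Note that the progress bars mix two units: `it/s` (iterations per second) and `s/it` (seconds per iteration), so the numbers are not directly comparable. A minimal sketch for putting them on one scale, using only the rates quoted above (the helper name `seconds_per_iteration` is hypothetical):

```python
import re


def seconds_per_iteration(rate: str) -> float:
    """Convert a tqdm-style rate such as '1.27it/s' or '1.24s/it' to s/it."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*(it/s|s/it)\s*", rate)
    if match is None:
        raise ValueError(f"unrecognized rate: {rate!r}")
    value, unit = float(match.group(1)), match.group(2)
    return value if unit == "s/it" else 1.0 / value


# Rates copied from the measurements above.
runs = {
    "2.2.0.dev7 with dropout": "1.27it/s",
    "2.2.0.dev7 no dropout": "1.30it/s",
    "2.3.0.dev6 with dropout": "1.24s/it",
    "2.3.0.dev6 no dropout": "1.14it/s",
}

baseline = seconds_per_iteration(runs["2.2.0.dev7 with dropout"])
for name, rate in runs.items():
    spi = seconds_per_iteration(rate)
    print(f"{name}: {spi:.3f} s/it ({spi / baseline:.2f}x baseline)")
```

On this scale, 2.3.0.dev6 with dropout takes about 1.57 times as long per step as 2.2.0.dev7 with dropout (1.24 s/it against 1/1.27 s/it).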

@KohakuBlueleaf
Owner

@whythisusername Is 2.3.0.dev10 still this slow?

@whythisusername

@KohakuBlueleaf
Yes, it hasn't changed significantly.

  • 2.3.0.dev10 with dropout: 1021/2500 [20:14<29:18, 1.19s/it, avr_loss=0.102]

  • 2.3.0.dev10 no dropout: 1174/2500 [17:35<19:52, 1.11it/s, avr_loss=0.0961]

@KohakuBlueleaf
Owner

@whythisusername @killerciao Can you try 3.0.0.dev4?
I completely restructured the library.
I also removed some redundant operations. I wonder if it is better now.

@killerciao
Author

Tested with the latest kohya_ss GUI:

  • 4s/it 3.0.0dev4
  • 1.25s/it 2.1.0post2

🤷‍♂️

@KohakuBlueleaf
Owner

Hmm, OK.

I think it is caused by some dtype handling.

@KohakuBlueleaf
Owner

Can you try another algorithm?
LoKr in my environment has the same speed across versions.

@killerciao
Author

LoKr:

  • 1.3s/it 3.0.0dev4
  • Crash 2.1.0post2

Traceback (most recent call last):
  File "H:\kohya_ss\sd-scripts\sdxl_train_network.py", line 185, in <module>
    trainer.train(args)
  File "H:\kohya_ss\sd-scripts\train_network.py", line 864, in train
    noise_pred = self.call_unet(
  File "H:\kohya_ss\sd-scripts\sdxl_train_network.py", line 164, in call_unet
    noise_pred = unet(noisy_latents, timesteps, text_embedding, vector_embedding)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 680, in forward
    return model_forward(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 668, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "H:\kohya_ss\venv\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 1104, in forward
    h = call_module(module, h, emb, context)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 1095, in call_module
    x = layer(x, context)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 750, in forward
    hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 673, in forward
    output = self.forward_body(hidden_states, context, timestep)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 655, in forward_body
    hidden_states = self.ff(self.norm3(hidden_states)) + hidden_states
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 599, in forward
    hidden_states = module(hidden_states)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 577, in forward
    hidden_states, gate = self.proj(hidden_states).chunk(2, dim=-1)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\lycoris\modules\lokr.py", line 342, in forward
    self.org_module[0].weight.data.to(x.device, dtype=self.lokr_w1.dtype)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'LokrModule' object has no attribute 'lokr_w1'. Did you mean: 'lokr_w1_a'?
steps: 0%| | 0/3588 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "H:\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
  File "H:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "H:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "H:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['H:\\kohya_ss\\venv\\Scripts\\python.exe', 'H:/kohya_ss/sd-scripts/sdxl_train_network.py', '--config_file', 'H:/alariko/model/config_lora-20240525-123304.toml', '--max_grad_norm=0']' returned non-zero exit status 1.

@KohakuBlueleaf
Owner

Thanks, it looks like LoCon has some bugs.
LoKr and LoHa may be fine.

@whythisusername

@KohakuBlueleaf
It's much better now with my config, but still a little slower than it was with 2.2.0.dev7.

  • 3.0.0.dev4 with dropout: 1224/2500 [17:51<18:36, 1.14it/s, avr_loss=0.0945]
  • 3.0.0.dev4 no dropout: 1246/2500 [17:51<17:58, 1.16it/s, avr_loss=0.0945]
  • 2.2.0.dev7 with dropout: 1964/2500 [25:01<06:49, 1.31it/s, avr_loss=0.0993]

@KohakuBlueleaf
Owner

Can you try enabling "bypass_mode"?
--network_args "bypass_mode=True"

@whythisusername

@KohakuBlueleaf
Almost on par with old performance now

  • 1983/2500 [27:16<07:06, 1.21it/s, avr_loss=0.124]

@KohakuBlueleaf
Owner

OK, I think the problem is solved.
The reconstruction mode for LoCon is not fast in backpropagation (bp).
Just enable bypass_mode if the speed is slower than you expect (note: LoHa and LoKr with bypass mode will be slower than the default).
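
For readers unfamiliar with the two modes, the difference can be pictured with a toy low-rank layer. This is only a sketch under the usual low-rank assumption (weight update = `down @ up`), written in plain Python with hypothetical names; it is not LyCORIS's actual code. Reconstruction builds the full-size weight update and adds it to the base weight before the matrix product, while bypass keeps the base path and the low-rank path separate and sums their outputs.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def forward_reconstruct(x, w, down, up):
    """Rebuild the full-size weight update first: y = x @ (W + down @ up)."""
    delta_w = matmul(down, up)  # same shape as W
    return matmul(x, add(w, delta_w))


def forward_bypass(x, w, down, up):
    """Keep the low-rank path separate: y = x @ W + (x @ down) @ up."""
    return add(matmul(x, w), matmul(matmul(x, down), up))


# Toy sizes: 1 sample, 3 input features, 2 output features, rank 1.
x = [[1.0, 2.0, 3.0]]
w = [[0.5, -1.0], [0.25, 0.0], [1.0, 2.0]]
down = [[0.1], [0.2], [0.3]]  # 3 x 1
up = [[2.0, -1.0]]            # 1 x 2

y_reconstruct = forward_reconstruct(x, w, down, up)
y_bypass = forward_bypass(x, w, down, up)
print(y_reconstruct, y_bypass)
```

Both functions give the same output up to floating-point rounding; they differ in which intermediate tensors are built, which is where a speed difference in the backward pass can come from.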
