Lycoris extreme slow training after update to latest version #167

killerciao · 2024-03-29T10:29:33Z

After updating to the latest release, LyCORIS/LoCon training (the only algorithm I tested) in kohya_ss is extremely slow. On my 4090, with the same settings loaded, the previous version ran at about 1 it/s for SDXL training; now it runs at 4-10 s/it.
Nothing has changed in the training data.
Uninstalling 2.2post3 and installing 2.1.0post2 fixes the problem.
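
When comparing releases like this, it helps to confirm which build is actually active in the training venv before each run. A minimal sketch using only the standard library; the distribution name `lycoris-lora` is an assumption here (it is not stated in this thread), so adjust it to whatever `pip list` shows in your environment:

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "lycoris-lora" is an assumed distribution name; "torch" is listed only for context.
for name in ("lycoris-lora", "torch"):
    print(f"{name}: {installed_version(name)}")
```

Run it with the same Python interpreter the trainer uses (for example the one inside the kohya_ss venv), otherwise it reports a different environment.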

killerciao · 2024-03-29T10:32:35Z

V2.2post3

V2.1.0post2

KohakuBlueleaf · 2024-03-29T10:35:54Z

Please provide your full configuration.

killerciao · 2024-03-29T10:36:37Z

These are the settings used in both versions:

{
"LoRA_type": "LyCORIS/LoCon",
"LyCORIS_preset": "full",
"adaptive_noise_scale": 0,
"additional_parameters": "--max_grad_norm=0",
"block_alphas": "",
"block_dims": "",
"block_lr_zero_threshold": "",
"bucket_no_upscale": true,
"bucket_reso_steps": 64,
"bypass_mode": false,
"cache_latents": true,
"cache_latents_to_disk": true,
"caption_dropout_every_n_epochs": 0.0,
"caption_dropout_rate": 0,
"caption_extension": ".txt",
"clip_skip": "1",
"color_aug": false,
"constrain": 0.0,
"conv_alpha": 16,
"conv_block_alphas": "",
"conv_block_dims": "",
"conv_dim": 32,
"dataset_config": "",
"debiased_estimation_loss": false,
"decompose_both": false,
"dim_from_weights": false,
"dora_wd": false,
"down_lr_weight": "",
"enable_bucket": true,
"epoch": 6,
"factor": -1,
"flip_aug": false,
"fp8_base": false,
"full_bf16": false,
"full_fp16": false,
"gpu_ids": "",
"gradient_accumulation_steps": 1,
"gradient_checkpointing": false,
"keep_tokens": "0",
"learning_rate": 1.0,
"log_tracker_config": "",
"log_tracker_name": "",
"logging_dir": "H:/t00nstyle/log",
"lora_network_weights": "",
"lr_scheduler": "cosine",
"lr_scheduler_args": "",
"lr_scheduler_num_cycles": "",
"lr_scheduler_power": "",
"lr_warmup": 0,
"max_bucket_reso": 2048,
"max_data_loader_n_workers": "0",
"max_grad_norm": 1,
"max_resolution": "1024,1024",
"max_timestep": 1000,
"max_token_length": "225",
"max_train_epochs": "",
"max_train_steps": "",
"mem_eff_attn": false,
"mid_lr_weight": "",
"min_bucket_reso": 256,
"min_snr_gamma": 0,
"min_timestep": 0,
"mixed_precision": "bf16",
"model_list": "custom",
"module_dropout": 0,
"multi_gpu": false,
"multires_noise_discount": 0.3,
"multires_noise_iterations": 6,
"network_alpha": 16,
"network_dim": 32,
"network_dropout": 0,
"noise_offset": 0.0357,
"noise_offset_type": "Multires",
"num_cpu_threads_per_process": 2,
"num_machines": 1,
"num_processes": 1,
"optimizer": "Prodigy",
"optimizer_args": "decouple=True weight_decay=0.5 betas=0.9,0.99 use_bias_correction=False",
"output_dir": "H:/t00nstyle/model",
"output_name": "t00nstylev1PonySDXL",
"persistent_data_loader_workers": false,
"pretrained_model_name_or_path": "F:/IA FILES/Models/Stablediffusion/ponyDiffusionV6XL_v6StartWithThisOne.safetensors",
"prior_loss_weight": 1.0,
"random_crop": false,
"rank_dropout": 0,
"rank_dropout_scale": false,
"reg_data_dir": "",
"rescaled": false,
"resume": "",
"sample_every_n_epochs": 0,
"sample_every_n_steps": 250,
"sample_prompts": "score_9, score_8_up, score_7_up,flashing tits, nipples, looking at viewer, tongue out, wink, in pool, bikini, t00nstyle --n low quality, worst quality, bad anatomy,bad composition, poor, low effort --w 1024 --h 1024 --d 1 --l 7 --s 28",
"sample_sampler": "euler_a",
"save_every_n_epochs": 1,
"save_every_n_steps": 0,
"save_last_n_steps": 0,
"save_last_n_steps_state": 0,
"save_model_as": "safetensors",
"save_precision": "bf16",
"save_state": false,
"scale_v_pred_loss_like_noise_pred": false,
"scale_weight_norms": 1,
"sdxl": true,
"sdxl_cache_text_encoder_outputs": false,
"sdxl_no_half_vae": true,
"seed": "12345",
"shuffle_caption": true,
"stop_text_encoder_training_pct": 0,
"text_encoder_lr": 1.0,
"train_batch_size": 1,
"train_data_dir": "H:/t00nstyle/img",
"train_norm": false,
"train_on_input": false,
"training_comment": "",
"unet_lr": 1.0,
"unit": 1,
"up_lr_weight": "",
"use_cp": true,
"use_scalar": false,
"use_tucker": false,
"use_wandb": false,
"v2": false,
"v_parameterization": false,
"v_pred_like_loss": 0,
"vae": "",
"vae_batch_size": 0,
"wandb_api_key": "",
"wandb_run_name": "",
"weighted_captions": false,
"xformers": "xformers"
}

KohakuBlueleaf · 2024-03-31T02:55:33Z

What's your hardware?

killerciao · 2024-03-31T06:30:06Z

Win 11, Nvidia 4090, 64gb ddr5 ram, i9 13900k, running from nvme 4th gen

KohakuBlueleaf · 2024-04-05T15:57:03Z

I ran some tests and cannot reproduce this issue...
But I'm using plain kohya-ss/sd-scripts.
I will try testing with the GUI.

I'm using almost the same hardware as yours (128 GB RAM instead of 64 GB), so the hardware should be fine.

whythisusername · 2024-04-08T08:17:59Z

@KohakuBlueleaf
Since someone has already reported this issue, I can confirm that something changed specifically after the 2.2.0.dev7 update (I tested the versions before and after) that drops the speed from ~1.5 it/s to ~1.5 s/it on a 4090 when training with this config, which is for easy-scripts.

KohakuBlueleaf · 2024-04-10T01:00:27Z

@whythisusername Can you try dropout=0 (all kinds of dropout)?
Maybe something is going wrong with dropout.
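
For anyone repeating this test from a saved config, one way to switch off the network-side dropout is sketched below. The key names `network_dropout`, `rank_dropout`, and `module_dropout` are taken from the kohya_ss GUI config posted earlier in this thread; other front ends (such as easy-scripts) may name them differently, and the example values are made up for illustration:

```python
import json

# Network-side dropout keys as they appear in the GUI config posted in this thread.
NETWORK_DROPOUT_KEYS = ("network_dropout", "rank_dropout", "module_dropout")


def disable_network_dropout(config):
    """Return a copy of the config with every network dropout setting forced to 0."""
    patched = dict(config)
    for key in NETWORK_DROPOUT_KEYS:
        if key in patched:
            patched[key] = 0
    return patched


# Hypothetical values, only to show which keys are touched.
example = {
    "network_dropout": 0.1,
    "rank_dropout": 0.05,
    "module_dropout": 0,
    "caption_dropout_rate": 0.2,  # data-side setting, deliberately left alone
}
patched = disable_network_dropout(example)
print(json.dumps(patched, indent=2))
```

Caption dropout is left untouched on purpose, since it acts on the dataset rather than on the LyCORIS modules.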

whythisusername · 2024-04-11T00:44:48Z

@KohakuBlueleaf
Yeah, here are some speed measurements for the two versions. Dropout significantly affects the speed on the latest version, which is still a little slower than the old version even without dropout.

2.2.0.dev7 with dropout: 1321/2500 [17:22<15:30, 1.27it/s, avr_loss=0.107]
2.2.0.dev7 no dropout: 587/2500 [07:32<24:34, 1.30it/s, avr_loss=0.105]
2.3.0.dev6 with dropout: 533/2500 [10:59<40:33, 1.24s/it, avr_loss=0.115]
2.3.0.dev6 no dropout: 721/2500 [10:31<25:57, 1.14it/s, avr_loss=0.0954]
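
Note that the progress bar switches between it/s and s/it depending on speed, which makes these lines easy to misread. A small sketch that normalizes both forms to iterations per second, using the four rates quoted above (the helper name is hypothetical):

```python
import re

# Matches a number immediately followed by a tqdm-style unit, e.g. "1.27it/s" or "1.24s/it".
_RATE = re.compile(r"([0-9]*\.?[0-9]+)\s*(it/s|s/it)")


def iterations_per_second(text):
    """Extract the rate from a progress-bar line and express it in it/s."""
    match = _RATE.search(text)
    if match is None:
        raise ValueError(f"no rate found in {text!r}")
    value = float(match.group(1))
    if value == 0:
        raise ValueError("rate is zero")
    # "s/it" is the reciprocal of "it/s".
    return value if match.group(2) == "it/s" else 1.0 / value


measurements = {
    "2.2.0.dev7 with dropout": "1.27it/s",
    "2.2.0.dev7 no dropout": "1.30it/s",
    "2.3.0.dev6 with dropout": "1.24s/it",
    "2.3.0.dev6 no dropout": "1.14it/s",
}

baseline = iterations_per_second(measurements["2.2.0.dev7 with dropout"])
for label, rate in measurements.items():
    speed = iterations_per_second(rate)
    print(f"{label}: {speed:.2f} it/s ({speed / baseline:.0%} of baseline)")
```

The function also accepts a full progress-bar line, since it searches for the first number that is directly followed by a unit.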

KohakuBlueleaf · 2024-05-15T02:03:48Z

@whythisusername Is 2.3.0.dev10 still as slow?

whythisusername · 2024-05-15T09:38:36Z

@KohakuBlueleaf
Yes, it hasn't changed significantly.

2.3.0.dev10 with dropout: 1021/2500 [20:14<29:18, 1.19s/it, avr_loss=0.102]
2.3.0.dev10 no dropout: 1174/2500 [17:35<19:52, 1.11it/s, avr_loss=0.0961]

KohakuBlueleaf · 2024-05-25T09:31:51Z

@whythisusername @killerciao Can you try 3.0.0.dev4?
I completely restructured the whole library.
I also removed some redundant operations. I wonder if it will be better now.

killerciao · 2024-05-25T10:18:03Z

Tested with latest kohya_ss gui:

4s/it 3.0.0dev4
1.25s/it 2.1.0post2

🤷‍♂️

KohakuBlueleaf · 2024-05-25T10:24:57Z

Umm, OK.

I think it is due to a dtype issue.

KohakuBlueleaf · 2024-05-25T10:25:37Z

Can you try another algorithm?
LoKr in my environment has the same speed across different versions.

killerciao · 2024-05-25T10:34:22Z

LoKr:
1.3s/it 3.0.0dev4
Crash/ 2.1.0post2
Traceback (most recent call last):
  File "H:\kohya_ss\sd-scripts\sdxl_train_network.py", line 185, in <module>
    trainer.train(args)
  File "H:\kohya_ss\sd-scripts\train_network.py", line 864, in train
    noise_pred = self.call_unet(
  File "H:\kohya_ss\sd-scripts\sdxl_train_network.py", line 164, in call_unet
    noise_pred = unet(noisy_latents, timesteps, text_embedding, vector_embedding)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 680, in forward
    return model_forward(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\accelerate\utils\operations.py", line 668, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "H:\kohya_ss\venv\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 1104, in forward
    h = call_module(module, h, emb, context)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 1095, in call_module
    x = layer(x, context)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 750, in forward
    hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 673, in forward
    output = self.forward_body(hidden_states, context, timestep)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 655, in forward_body
    hidden_states = self.ff(self.norm3(hidden_states)) + hidden_states
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 599, in forward
    hidden_states = module(hidden_states)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\sd-scripts\library\sdxl_original_unet.py", line 577, in forward
    hidden_states, gate = self.proj(hidden_states).chunk(2, dim=-1)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "H:\kohya_ss\venv\lib\site-packages\lycoris\modules\lokr.py", line 342, in forward
    self.org_module[0].weight.data.to(x.device, dtype=self.lokr_w1.dtype)
  File "H:\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'LokrModule' object has no attribute 'lokr_w1'. Did you mean: 'lokr_w1_a'?
steps: 0%| | 0/3588 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Program Files\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "H:\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
  File "H:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "H:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "H:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['H:\\kohya_ss\\venv\\Scripts\\python.exe', 'H:/kohya_ss/sd-scripts/sdxl_train_network.py', '--config_file', 'H:/alariko/model/config_lora-20240525-123304.toml', '--max_grad_norm=0']' returned non-zero exit status 1.

KohakuBlueleaf · 2024-05-25T10:41:45Z

Thanks, it looks like LoCon has some bugs.
LoKr and LoHa may be fine.

whythisusername · 2024-05-27T01:40:42Z

@KohakuBlueleaf
It's much better now with my config, but still a little slower than it was with 2.2.0.dev7.

3.0.0.dev4 with dropout: 1224/2500 [17:51<18:36, 1.14it/s, avr_loss=0.0945]
3.0.0.dev4 no dropout: 1246/2500 [17:51<17:58, 1.16it/s, avr_loss=0.0945]
2.2.0.dev7 with dropout: 1964/2500 [25:01<06:49, 1.31it/s, avr_loss=0.0993]

KohakuBlueleaf · 2024-05-27T01:43:45Z

Can you try enabling "bypass_mode"?
--network_args "bypass_mode=True"
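
For scripted launches, the flag above can be assembled like this. Only `bypass_mode=True` comes from this thread; `--network_module lycoris.kohya` and `algo=locon` are assumptions based on how LyCORIS is commonly wired into sd-scripts, and the helper name is hypothetical:

```python
def network_args(options):
    """Format a dict as the key=value strings that follow --network_args."""
    return [f"{key}={value}" for key, value in options.items()]


# Extra arguments to append to an existing sd-scripts training command.
extra_cli = [
    "--network_module", "lycoris.kohya",  # assumed module path, not stated in this thread
    "--network_args",
    *network_args({"algo": "locon", "bypass_mode": True}),
]
print(" ".join(extra_cli))
```

Passing the arguments as a list (for example to `subprocess.run`) avoids shell-quoting problems with the key=value pairs.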

whythisusername · 2024-05-27T14:03:12Z

@KohakuBlueleaf
Almost on par with the old performance now:

1983/2500 [27:16<07:06, 1.21it/s, avr_loss=0.124]

KohakuBlueleaf · 2024-05-31T10:45:43Z

OK, I think the problem is solved.
The reconstruction mode for LoCon is not fast in backpropagation.
Just enable bypass_mode if the speed is slower than you expect (note: LoHa and LoKr will be slower with bypass mode than with the default).
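
To make the distinction concrete, here is a toy sketch of the two forward paths for a single linear layer. It assumes that "reconstruction" means building the full weight delta from the low-rank factors before applying the layer, and that "bypass" means running the input through the factors next to the frozen layer. That reading is an assumption, and this is plain-Python illustration code, not the LyCORIS implementation:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def forward_rebuild(weight, up, down, x, scale):
    """Materialize the full-size delta (up @ down), merge it into the weight, then apply once."""
    delta = matmul(up, down)  # same shape as the base weight
    merged = [[w + scale * d for w, d in zip(w_row, d_row)]
              for w_row, d_row in zip(weight, delta)]
    return matmul(merged, x)


def forward_bypass(weight, up, down, x, scale):
    """Apply the frozen weight and the low-rank branch separately, then add the results."""
    base = matmul(weight, x)
    low_rank = matmul(up, matmul(down, x))  # never forms the full-size delta
    return [[b + scale * l for b, l in zip(b_row, l_row)]
            for b_row, l_row in zip(base, low_rank)]


weight = [[1.0, 2.0], [3.0, 4.0]]  # out x in
down = [[0.5, -0.5]]               # rank x in
up = [[2.0], [1.0]]                # out x rank
x = [[1.0], [2.0]]                 # in x batch

y_rebuild = forward_rebuild(weight, up, down, x, scale=0.5)
y_bypass = forward_bypass(weight, up, down, x, scale=0.5)
print(y_rebuild, y_bypass)
```

Both paths give the same output; they differ only in which intermediate tensors are created, which is where a speed difference during training could come from.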

KohakuBlueleaf closed this as completed May 31, 2024
