
DoRA does not use GPU calculation #160

Closed · SlZeroth opened this issue Mar 11, 2024 · 16 comments

Comments

@SlZeroth
Contributor

--network_module="lycoris.kohya",
--network_args "algo=lora" "dora_wd=True"

Hello. I used LyCORIS to train a DoRA, but training is really slow and I got a warning in this code:

def apply_weight_decompose(self, weight):
    return (
        weight / weight.mean(dim=self.dora_mean_dim, keepdim=True) * self.dora_scale
    )

I think apply_weight_decompose is doing its calculation on the CPU.

How can I fix it?
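For reference, the failure can be reproduced in isolation. A minimal sketch, assuming a CUDA build of PyTorch (the tensor names here are stand-ins, not LyCORIS internals):

import torch

# Minimal repro of the mismatch: a binary op between a CUDA tensor and a
# CPU tensor raises the same RuntimeError seen in the traceback below.
weight = torch.randn(4, 4, device="cuda")   # weight on the GPU
dora_scale = torch.ones(4, 1)               # scale still on the CPU
# RuntimeError: Expected all tensors to be on the same device, ...
out = weight / weight.mean(dim=1, keepdim=True) * dora_scale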

@KohakuBlueleaf
Owner

I can assure you this is running on the GPU.
Give me more context or a way to reproduce the problem.

@SlZeroth
Contributor Author

C:\Users\te\anaconda3\envs\kohya_ss\lib\site-packages\lycoris_lora-2.2.0.dev4-py3.10.egg\lycoris\modules\locon.py:183 in apply_weight_decompose

  180
  181   def apply_weight_decompose(self, weight):
  182       return (
❱ 183           weight / weight.mean(dim=self.dora_mean_dim, keepdim=True) * self.dora_scale
  184       )
  185
  186   def custom_state_dict(self):

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

===

I got this error when I tried this command

accelerate launch --num_cpu_threads_per_process=2 C:/Users/te/ML/kohya_ss/sdxl_train_network.py --pretrained_model_name_or_path="E:\diffusion-models\Stable-diffusion\mixgirl.safetensors" --train_data_dir="C:\Users\te\ML\diffusion-benchmark\temp\pl9ia1o0" --resolution=1024,1024 --output_dir="E:/diffusion-models/Lora\test16" --logging_dir="C:/Users/te/ML/kohya_ss/logs" --network_alpha=1 --save_model_as=safetensors --network_module=lycoris.kohya --network_args "algo=lora" "dora_wd=True" --text_encoder_lr=0.0004 --unet_lr=0.0004 --network_dim=64 --output_name=woman_young_64rank --lr_scheduler_num_cycles=8 --no_half_vae --learning_rate=0.0004 --lr_scheduler=constant --train_batch_size=1 --max_train_steps=2000 --save_every_n_epochs=1 --mixed_precision=fp16 --save_precision=fp16 --optimizer_type=Adafactor --optimizer_args scale_parameter=False relative_step=False warmup_init=False --max_data_loader_n_workers=0 --bucket_reso_steps=64 --seed=1234 --gradient_checkpointing --full_fp16 --xformers --bucket_no_upscale --noise_offset=0.0 --lowram --cache_latents --cache_latents_to_disk

The command works when I remove the "dora_wd=True" option.

@KohakuBlueleaf
Owner

I guess this is caused by --lowvram.

@SlZeroth
Contributor Author

SlZeroth commented Mar 11, 2024

I got the same issue trying with --lowvram.


The print results for debugging are:

DEVICE cpu
DEVICE1.5 cuda:0
DEVICE2 cpu

I think the device of self.make_weight(x.device) is changed to the GPU when .to(self.scalar) is applied.
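Under standard PyTorch semantics, Tensor.to(other) adopts the device and dtype of other, which would explain the jump. A minimal sketch of that behaviour (the names are stand-ins):

import torch

# Tensor.to(other) returns a tensor with other's device and dtype, so a
# CPU-resident scalar pulls a CUDA weight back to the CPU.
weight = torch.randn(2, 2, device="cuda")
scalar = torch.tensor(1.0)           # lives on the CPU
print(weight.device)                 # cuda:0
print(weight.to(scalar).device)      # cpu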

@SlZeroth
Contributor Author

I changed the code:

    if use_scalar:
        self.scalar = nn.Parameter(torch.tensor(0.0))
    else:
        self.scalar = torch.tensor(1.0, device=torch.device('cuda'))

This works for me.
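For what it's worth, a more device-agnostic variant of this workaround (a sketch based on my own assumption, not the project's actual fix) would register the scalar as a buffer so that module.to(device) moves it along with the parameters instead of hardcoding 'cuda':

import torch
import torch.nn as nn

class ScalarExample(nn.Module):
    def __init__(self, use_scalar: bool = False):
        super().__init__()
        if use_scalar:
            self.scalar = nn.Parameter(torch.tensor(0.0))
        else:
            # A buffer follows module.to(device) / .cuda() automatically.
            self.register_buffer("scalar", torch.tensor(1.0))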

@KohakuBlueleaf
Owner

This is not how it works.
I mean you should try disabling --lowvram.

@SlZeroth
Contributor Author

I tried with --lowvram but I got the same issue.

@KohakuBlueleaf
Owner

I tried with --lowvram but I got the same issue.

No.
I mean you should REMOVE --lowvram.

@SlZeroth
Contributor Author

I apologize for the typo. I proceeded without the --lowvram option but still hit the device-type mismatch. If I add "dora_wd=True", as in --network_module="lycoris.kohya" --network_args "algo=lora" "dora_wd=True", the error occurs. If I remove "dora_wd=True", it does not.

@KohakuBlueleaf
Owner

@SlZeroth Should be solved in 2.2.0.dev7

@KohakuBlueleaf
Owner

Will close this issue on the weekend if there is no reply.
LyCORIS 2.2.0 should have the correct implementation.

@SlZeroth
Contributor Author

SlZeroth commented Mar 15, 2024

This issue was resolved in version 2.2.0.dev7, but the same error has reoccurred in versions released after 2.2.0.dev8.

@KohakuBlueleaf
Owner

This issue was resolved in version 2.2.0.dev7, but the same error has reoccurred in versions released after 2.2.0.dev8.

Thx for info

@SlZeroth
Contributor Author

@KohakuBlueleaf thank you for checking!

@SlZeroth
Contributor Author

@KohakuBlueleaf thank you! I checked the latest version and it works fine.

@avan06

avan06 commented May 16, 2024

Hi KohakuBlueleaf,

The recent changes to the apply_weight_decompose function might have caused the "Expected all tensors to be on the same device" issue to resurface. The dora_scale used in this function's return value may still be on the CPU.

def apply_weight_decompose(self, weight):
    weight_norm = (
        weight.transpose(0, 1)
        .reshape(weight.shape[1], -1)
        .norm(dim=1, keepdim=True)
        .reshape(weight.shape[1], *[1] * self.dora_norm_dims)
        .transpose(0, 1)
    )
    return weight * (self.dora_scale / weight_norm)
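As a sanity check on the reshaping chain, it computes one L2 norm per input channel. A small sketch, assuming dora_norm_dims equals weight.dim() - 1 (an assumption on my part):

import torch

# One L2 norm per *input* channel, broadcast back over (1, in_ch, 1, 1).
weight = torch.randn(8, 4, 3, 3)     # (out_ch, in_ch, kh, kw)
dora_norm_dims = weight.dim() - 1
weight_norm = (
    weight.transpose(0, 1)
    .reshape(weight.shape[1], -1)
    .norm(dim=1, keepdim=True)
    .reshape(weight.shape[1], *[1] * dora_norm_dims)
    .transpose(0, 1)
)
print(weight_norm.shape)             # torch.Size([1, 4, 1, 1])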

After making some test adjustments to the "lycoris/modules/locon.py" file, I was able to run it successfully, but I'm not sure if these changes are entirely correct.

def make_weight(self, device=None):
    ...
    if self.wd and self.dora_scale.device != weight.device:
        # print("self.dora_scale.device:", self.dora_scale.device)  # -> cpu
        # print("weight.device:", weight.device)  # -> cuda:0
        self.dora_scale = self.dora_scale.to(weight.device)

    return weight * self.scalar.to(device)

Could you please help verify this? Thank you.
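As an alternative (an assumption on my part, not the project's actual patch), registering dora_scale as a buffer at construction time would let module.to(device) carry it along, making the in-place .to(weight.device) in make_weight unnecessary:

import torch
import torch.nn as nn

class DoraScaleExample(nn.Module):
    def __init__(self, init_scale: torch.Tensor):
        super().__init__()
        # A buffer moves with the module on .to()/.cuda(), so it can never
        # be left behind on the CPU when the weights are on the GPU.
        self.register_buffer("dora_scale", init_scale.detach().clone())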
