Error while trying to create a LoRA based on realismEngineSDXL_v30VAE #331

Open

fuzzballb opened this issue Jan 14, 2024 · 0 comments

fuzzballb commented Jan 14, 2024

I am trying to create an SDXL LoRA based on an existing model (realismEngineSDXL_v30VAE), but this fails at the training step. Below are the config and the training output. Are SDXL models supported?

The model was downloaded successfully.

The error

RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
Missing key(s) in state_dict: "down_blocks.0.attentions.0.norm.weight",
"down_blocks.0.attentions.0.norm.bias", "down_blocks.0.attentions.0.proj_in.weight",

Config

[model_arguments]
v2 = true
v_parameterization = true
pretrained_model_name_or_path = "/content/pretrained_model/realismEngineSDXL_v30VAE.safetensors"

[additional_network_arguments]
no_metadata = false
unet_lr = 0.0001
text_encoder_lr = 5e-5
network_module = "networks.lora"
network_dim = 32
network_alpha = 16
network_train_unet_only = false
network_train_text_encoder_only = false

[optimizer_arguments]
optimizer_type = "AdamW8bit"
learning_rate = 0.0001
max_grad_norm = 1.0
lr_scheduler = "constant"
lr_warmup_steps = 0

[dataset_arguments]
cache_latents = true
debug_dataset = false
vae_batch_size = 1

[training_arguments]
output_dir = "/content/LoRA/output"
output_name = "Chantal"
save_precision = "fp16"
save_every_n_epochs = 2
train_batch_size = 1
max_token_length = 225
mem_eff_attn = false
xformers = true
max_train_epochs = 10
max_data_loader_n_workers = 8
persistent_data_loader_workers = true
gradient_checkpointing = false
gradient_accumulation_steps = 1
mixed_precision = "fp16"
logging_dir = "/content/LoRA/logs"
log_prefix = "Chantal"
lowram = false

[sample_prompt_arguments]
sample_every_n_epochs = 1
sample_sampler = "dpmsolver++"

[dreambooth_arguments]
prior_loss_weight = 1.0

[saving_arguments]
save_model_as = "safetensors"
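
Note: v2 = true and v_parameterization = true are SD2.x flags; with an SDXL checkpoint they make train_network.py construct the wrong UNet. Upstream kohya-ss/sd-scripts trains SDXL LoRAs through a separate sdxl_train_network.py entry point, which does not use these flags. A minimal sanity check on the config, as a sketch using the toml package the trainer itself depends on (the filename heuristic is only illustrative):

import toml

cfg = toml.load("/content/LoRA/config/config_file.toml")
model_args = cfg.get("model_arguments", {})
path = model_args.get("pretrained_model_name_or_path", "")

# v2/v_parameterization select the SD2.x code path in train_network.py;
# they should not be combined with an SDXL checkpoint.
if model_args.get("v2") and "sdxl" in path.lower():
    print("Config requests SD2.x but the checkpoint name suggests SDXL:")
    print("drop v2/v_parameterization and train via sdxl_train_network.py.")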

Training

CUDA backend failed to initialize: Found CUDA version 12010, but JAX was built against version 12020, which is newer. The copy of CUDA that is installed must be at least as new as the version against which JAX was built. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
Loading settings from /content/LoRA/config/config_file.toml...
/content/LoRA/config/config_file
prepare tokenizer
update token length: 225
Load dataset config from /content/LoRA/config/dataset_config.toml
prepare images.
found directory /content/LoRA/train_data contains 26 image files
260 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 1
resolution: (512, 512)
enable_bucket: True
min_bucket_reso: 256
max_bucket_reso: 1024
bucket_reso_steps: 64
bucket_no_upscale: False

[Subset 0 of Dataset 0]
image_dir: "/content/LoRA/train_data"
image_count: 26
num_repeats: 10
shuffle_caption: True
keep_tokens: 0
caption_dropout_rate: 0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: mksks
caption_extension: .txt

[Dataset 0]
loading image sizes.
100% 26/26 [00:00<00:00, 3964.51it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 512), count: 260
mean ar error (without repeats): 0.0
prepare accelerator
Using accelerator 0.15.0 or above.
loading model for process 0/1
load StableDiffusion checkpoint
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /content/kohya-trainer/train_network.py:752 in │
│ │
│ 749 │ args = parser.parse_args() │
│ 750 │ args = train_util.read_config_from_file(args, parser) │
│ 751 │ │
│ ❱ 752 │ train(args) │
│ 753 │
│ │
│ /content/kohya-trainer/train_network.py:152 in train │
│ │
│ 149 │ │ if pi == accelerator.state.local_process_index: │
│ 150 │ │ │ print(f"loading model for process {accelerator.state.local_process_index}/{a │
│ 151 │ │ │ │
│ ❱ 152 │ │ │ text_encoder, vae, unet, _ = train_util.load_target_model( │
│ 153 │ │ │ │ args, weight_dtype, accelerator.device if args.lowram else "cpu" │
│ 154 │ │ │ ) │
│ 155 │
│ │
│ /content/kohya-trainer/library/train_util.py:2739 in load_target_model │
│ │
│ 2736 │ load_stable_diffusion_format = os.path.isfile(name_or_path) # determine SD or Diffu │
│ 2737 │ if load_stable_diffusion_format: │
│ 2738 │ │ print("load StableDiffusion checkpoint") │
│ ❱ 2739 │ │ text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoin │
│ 2740 │ else: │
│ 2741 │ │ # Diffusers model is loaded to CPU │
│ 2742 │ │ print("load Diffusers pretrained models") │
│ │
│ /content/kohya-trainer/library/model_util.py:857 in load_models_from_stable_diffusion_checkpoint │
│ │
│ 854 │ converted_unet_checkpoint = convert_ldm_unet_checkpoint(v2, state_dict, unet_config) │
│ 855 │ │
│ 856 │ unet = UNet2DConditionModel(**unet_config).to(device) │
│ ❱ 857 │ info = unet.load_state_dict(converted_unet_checkpoint) │
│ 858 │ print("loading u-net:", info) │
│ 859 │ │
│ 860 │ # Convert the VAE model. │
│ │
│ /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:2152 in load_state_dict │
│ │
│ 2149 │ │ │ │ │ │ ', '.join(f'"{k}"' for k in missing_keys))) │
│ 2150 │ │ │
│ 2151 │ │ if len(error_msgs) > 0: │
│ ❱ 2152 │ │ │ raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( │
│ 2153 │ │ │ │ │ │ │ self.class.name, "\n\t".join(error_msgs))) │
│ 2154 │ │ return _IncompatibleKeys(missing_keys, unexpected_keys) │
│ 2155 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
Missing key(s) in state_dict: "down_blocks.0.attentions.0.norm.weight",
"down_blocks.0.attentions.0.norm.bias", "down_blocks.0.attentions.0.proj_in.weight",
"down_blocks.0.attentions.0.proj_in.bias",

...

"mid_block.attentions.0.transformer_blocks.9.norm3.weight".
size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying
a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is
torch.Size([640, 1024]).

...

size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a
param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is
torch.Size([1280, 1024]).
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /usr/local/bin/accelerate:8 in │
│ │
│ 5 from accelerate.commands.accelerate_cli import main │
│ 6 if name == 'main': │
│ 7 │ sys.argv[0] = re.sub(r'(-script.pyw|.exe)?$', '', sys.argv[0]) │
│ ❱ 8 │ sys.exit(main()) │
│ 9 │
│ │
│ /usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py:45 in main │
│ │
│ 42 │ │ exit(1) │
│ 43 │ │
│ 44 │ # Run │
│ ❱ 45 │ args.func(args) │
│ 46 │
│ 47 │
│ 48 if name == "main": │
│ │
│ /usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:1104 in launch_command │
│ │
│ 1101 │ elif defaults is not None and defaults.compute_environment == ComputeEnvironment.AMA │
│ 1102 │ │ sagemaker_launcher(defaults, args) │
│ 1103 │ else: │
│ ❱ 1104 │ │ simple_launcher(args) │
│ 1105 │
│ 1106 │
│ 1107 def main(): │
│ │
│ /usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py:567 in simple_launcher │
│ │
│ 564 │ process = subprocess.Popen(cmd, env=current_env) │
│ 565 │ process.wait() │
│ 566 │ if process.returncode != 0: │
│ ❱ 567 │ │ raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) │
│ 568 │
│ 569 │
│ 570 def multi_gpu_launcher(args): │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
CalledProcessError: Command '['/usr/bin/python3', 'train_network.py',
'--sample_prompts=/content/LoRA/config/sample_prompt.txt',
'--dataset_config=/content/LoRA/config/dataset_config.toml',
'--config_file=/content/LoRA/config/config_file.toml']' returned non-zero exit status 1.
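
For reference, the failing command launches train_network.py, which only knows the SD1.x/SD2.x UNet layouts. In upstream kohya-ss/sd-scripts the SDXL equivalent is sdxl_train_network.py with the same arguments; whether this archived notebook ships that script is an assumption, so this is only a sketch of a corrected launch:

import subprocess

# Same arguments as the failing run, but through the SDXL entry point
# (present in upstream kohya-ss/sd-scripts; availability here is assumed).
subprocess.run(
    [
        "accelerate", "launch", "sdxl_train_network.py",
        "--sample_prompts=/content/LoRA/config/sample_prompt.txt",
        "--dataset_config=/content/LoRA/config/dataset_config.toml",
        "--config_file=/content/LoRA/config/config_file.toml",
    ],
    check=True,
)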
