Can't train a lora for SDXL from a fresh install. #2396

Closed
Deejay85 opened this issue Apr 26, 2024 · 22 comments

Comments

@Deejay85

I tried installing the newest version of Kohya, but when I try to train something, I receive the following error message. I've no idea if it's something on my end, or if it's a file that needs to be fixed.

11:53:04-806919 INFO     Kohya_ss GUI version: v24.0.7
fatal: not a git repository (or any of the parent directories): .git
11:53:05-051045 ERROR    Error during Git operation: Command '['git', 'submodule', 'update', '--init', '--recursive',
                         '--quiet']' returned non-zero exit status 128.
11:53:05-053975 INFO     nVidia toolkit detected
11:53:06-264833 INFO     Torch 2.1.2+cu118
11:53:06-282409 INFO     Torch backend: nVidia CUDA 11.8 cuDNN 8905
11:53:06-284362 INFO     Torch detected GPU: NVIDIA GeForce RTX 4090 VRAM 24564 Arch (8, 9) Cores 128
11:53:06-288269 INFO     Python version is 3.10.9 (tags/v3.10.9:1dd9be6, Dec  6 2022, 20:01:21) [MSC v.1934 64 bit
                         (AMD64)]
11:53:06-290221 INFO     Verifying modules installation status from requirements_pytorch_windows.txt...
11:53:06-292175 INFO     Verifying modules installation status from requirements_windows.txt...
11:53:06-294127 INFO     Verifying modules installation status from requirements.txt...
11:53:11-983209 INFO     headless: False
11:53:12-019340 INFO     Using shell=True when running external commands...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Exception in thread Thread-5 (_do_normal_analytics_request):
Traceback (most recent call last):
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_transports\default.py", line 69, in map_httpcore_exceptions
    yield
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_transports\default.py", line 233, in handle_request
    resp = self._pool.handle_request(req)
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\connection_pool.py", line 216, in handle_request
    raise exc from None
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\connection_pool.py", line 196, in handle_request
    response = connection.handle_request(
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\connection.py", line 101, in handle_request
    return self._connection.handle_request(request)
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\http11.py", line 143, in handle_request
    raise exc
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\http11.py", line 95, in handle_request
    self._send_request_body(**kwargs)
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\http11.py", line 166, in _send_request_body
    self._send_event(event, timeout=timeout)
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\http11.py", line 175, in _send_event
    self._network_stream.write(bytes_to_send, timeout=timeout)
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_backends\sync.py", line 133, in write
    with map_exceptions(exc_map):
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "M:\kohya_ss\venv\lib\site-packages\httpcore\_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.WriteTimeout: The write operation timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "M:\kohya_ss\venv\lib\site-packages\gradio\analytics.py", line 63, in _do_normal_analytics_request
    httpx.post(url, data=data, timeout=5)
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_api.py", line 319, in post
    return request(
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_api.py", line 106, in request
    return client.request(
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_client.py", line 827, in request
    return self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_client.py", line 914, in send
    response = self._send_handling_auth(
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_client.py", line 942, in _send_handling_auth
    response = self._send_handling_redirects(
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_client.py", line 979, in _send_handling_redirects
    response = self._send_single_request(request)
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_client.py", line 1015, in _send_single_request
    response = transport.handle_request(request)
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_transports\default.py", line 232, in handle_request
    with map_httpcore_exceptions():
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "M:\kohya_ss\venv\lib\site-packages\httpx\_transports\default.py", line 86, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.WriteTimeout: The write operation timed out
11:53:23-044989 INFO     Loading config...
11:53:25-111259 INFO     Start training Dreambooth...
11:53:25-112237 INFO     Validating lr scheduler arguments...
11:53:25-114189 INFO     Validating optimizer arguments...
11:53:25-115167 INFO     Validating model file or folder path
                         M:/StableDiffusion/models/Stable-diffusion/SDXL/sd_xl_base_1.0.safetensors existence...
11:53:25-117119 INFO     ...valid
11:53:25-120049 INFO     Validating output_dir path M:/kohya_ss/Sampleimages/model existence...
11:53:25-126885 INFO     ...valid
11:53:25-127861 INFO     Validating train_data_dir path M:/kohya_ss/Sampleimages/Images existence...
11:53:25-128837 INFO     ...valid
11:53:25-129815 INFO     reg_data_dir not specified, skipping validation
11:53:25-130790 INFO     Validating logging_dir path M:/kohya_ss/Sampleimages/log existence...
11:53:25-131768 INFO     ...valid
11:53:25-135673 INFO     log_tracker_config not specified, skipping validation
11:53:25-136652 INFO     resume not specified, skipping validation
11:53:25-137627 INFO     vae not specified, skipping validation
11:53:25-138605 INFO     dataset_config not specified, skipping validation
11:53:25-139580 INFO     Folder 4_giganticbreasts: 4 repeats found
11:53:25-141533 INFO     Folder 4_giganticbreasts: 115 images found
11:53:25-142509 INFO     Folder 4_giganticbreasts: 115 * 4 = 460 steps
11:53:25-142509 INFO     Regulatization factor: 1
11:53:25-144460 INFO     Total steps: 460
11:53:25-145438 INFO     Train batch size: 1
11:53:25-146414 INFO     Gradient accumulation steps: 1
11:53:25-147390 INFO     Epoch: 40
11:53:25-149344 INFO     Max train steps: 1600
11:53:25-150320 INFO     lr_warmup_steps = 160
11:53:25-152273 INFO     Saving training config to
                         M:/kohya_ss/Sampleimages/model\giganticbreasts_20240426-115325.json...
11:53:25-153249 INFO     Executing command: "M:\kohya_ss\venv\Scripts\accelerate.EXE" launch --dynamo_backend no
                         --dynamo_mode default --gpu_ids 10de268488e21043 --mixed_precision bf16 --num_processes 1
                         --num_machines 1 --num_cpu_threads_per_process 2 "M:/kohya_ss/sd-scripts/sdxl_train.py"
                         --config_file "./outputs/tmpfiledbooth.toml" with shell=True
11:53:25-159108 INFO     Command executed.
2024-04-26 11:53:32 INFO     Loading settings from ./outputs/tmpfiledbooth.toml...                    train_util.py:3744
                    INFO     ./outputs/tmpfiledbooth                                                  train_util.py:3763
                    WARNING  clip_skip will be unexpected / SDXL学習ではclip_skipは動作しません   sdxl_train_util.py:343
2024-04-26 11:53:32 INFO     prepare tokenizers                                                   sdxl_train_util.py:134
                    INFO     update token length: 75                                              sdxl_train_util.py:159
                    INFO     Using DreamBooth method.                                                  sdxl_train.py:144
2024-04-26 11:53:33 INFO     prepare images.                                                          train_util.py:1572
                    INFO     found directory M:\kohya_ss\Sampleimages\Images\4_giganticbreasts        train_util.py:1519
                             contains 115 image files
                    INFO     460 train images with repeating.                                         train_util.py:1613
                    INFO     0 reg images.                                                            train_util.py:1616
                    WARNING  no regularization images / 正則化画像が見つかりませんでした              train_util.py:1621
                    INFO     [Dataset 0]                                                              config_util.py:565
                               batch_size: 1
                               resolution: (1024, 1024)
                               enable_bucket: True
                               network_multiplier: 1.0
                               min_bucket_reso: 64
                               max_bucket_reso: 2048
                               bucket_reso_steps: 64
                               bucket_no_upscale: True

                               [Subset 0 of Dataset 0]
                                 image_dir: "M:\kohya_ss\Sampleimages\Images\4_giganticbreasts"
                                 image_count: 115
                                 num_repeats: 4
                                 shuffle_caption: True
                                 keep_tokens: 1
                                 keep_tokens_separator:
                                 secondary_separator: None
                                 enable_wildcard: False
                                 caption_dropout_rate: 0.0
                                 caption_dropout_every_n_epoches: 0
                                 caption_tag_dropout_rate: 0.0
                                 caption_prefix: None
                                 caption_suffix: None
                                 color_aug: False
                                 flip_aug: False
                                 face_crop_aug_range: None
                                 random_crop: False
                                 token_warmup_min: 1,
                                 token_warmup_step: 0,
                                 is_reg: False
                                 class_tokens: giganticbreasts
                                 caption_extension: .txt


                    INFO     [Dataset 0]                                                              config_util.py:571
                    INFO     loading image sizes.                                                      train_util.py:853
100%|█████████████████████████████████████████████████████████████████████████████| 115/115 [00:00<00:00, 39253.33it/s]
                    INFO     make buckets                                                              train_util.py:859
                    WARNING  min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is   train_util.py:876
                             set, because bucket reso is defined by image size automatically /
                             bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計
                             算されるため、min_bucket_resoとmax_bucket_resoは無視されます
                    INFO     number of images (including repeats) /                                    train_util.py:905
                             各bucketの画像枚数(繰り返し回数を含む)
                    INFO     bucket 0: resolution (576, 832), count: 4                                 train_util.py:910
                    INFO     bucket 1: resolution (576, 960), count: 4                                 train_util.py:910
                    INFO     bucket 2: resolution (640, 640), count: 4                                 train_util.py:910
                    INFO     bucket 3: resolution (704, 960), count: 4                                 train_util.py:910
                    INFO     bucket 4: resolution (704, 1280), count: 4                                train_util.py:910
                    INFO     bucket 5: resolution (704, 1344), count: 4                                train_util.py:910
                    INFO     bucket 6: resolution (704, 1408), count: 4                                train_util.py:910
                    INFO     bucket 7: resolution (768, 704), count: 4                                 train_util.py:910
                    INFO     bucket 8: resolution (768, 1152), count: 12                               train_util.py:910
                    INFO     bucket 9: resolution (768, 1216), count: 4                                train_util.py:910
                    INFO     bucket 10: resolution (768, 1344), count: 4                               train_util.py:910
                    INFO     bucket 11: resolution (832, 768), count: 4                                train_util.py:910
                    INFO     bucket 12: resolution (832, 896), count: 4                                train_util.py:910
                    INFO     bucket 13: resolution (832, 1024), count: 4                               train_util.py:910
                    INFO     bucket 14: resolution (832, 1088), count: 36                              train_util.py:910
                    INFO     bucket 15: resolution (832, 1152), count: 68                              train_util.py:910
                    INFO     bucket 16: resolution (832, 1216), count: 44                              train_util.py:910
                    INFO     bucket 17: resolution (896, 832), count: 4                                train_util.py:910
                    INFO     bucket 18: resolution (896, 1024), count: 16                              train_util.py:910
                    INFO     bucket 19: resolution (896, 1088), count: 40                              train_util.py:910
                    INFO     bucket 20: resolution (896, 1152), count: 40                              train_util.py:910
                    INFO     bucket 21: resolution (960, 960), count: 16                               train_util.py:910
                    INFO     bucket 22: resolution (960, 1024), count: 36                              train_util.py:910
                    INFO     bucket 23: resolution (1024, 896), count: 4                               train_util.py:910
                    INFO     bucket 24: resolution (1024, 960), count: 4                               train_util.py:910
                    INFO     bucket 25: resolution (1024, 1024), count: 36                             train_util.py:910
                    INFO     bucket 26: resolution (1088, 832), count: 8                               train_util.py:910
                    INFO     bucket 27: resolution (1088, 896), count: 4                               train_util.py:910
                    INFO     bucket 28: resolution (1152, 832), count: 8                               train_util.py:910
                    INFO     bucket 29: resolution (1152, 896), count: 8                               train_util.py:910
                    INFO     bucket 30: resolution (1216, 832), count: 12                              train_util.py:910
                    INFO     bucket 31: resolution (1280, 704), count: 4                               train_util.py:910
                    INFO     bucket 32: resolution (1344, 768), count: 8                               train_util.py:910
                    INFO     mean ar error (without repeats): 0.012568990147454271                     train_util.py:915
                    INFO     prepare accelerator                                                       sdxl_train.py:201
accelerator device: cpu
                    INFO     loading model for process 0/1                                         sdxl_train_util.py:30
                    INFO     load StableDiffusion checkpoint:                                      sdxl_train_util.py:70
                             M:/StableDiffusion/models/Stable-diffusion/SDXL/sd_xl_base_1.0.safete
                             nsors
2024-04-26 11:53:34 INFO     building U-Net                                                       sdxl_model_util.py:192
                    INFO     loading U-Net from checkpoint                                        sdxl_model_util.py:196
2024-04-26 11:53:46 INFO     U-Net: <All keys matched successfully>                               sdxl_model_util.py:202
                    INFO     building text encoders                                               sdxl_model_util.py:205
                    INFO     loading text encoders from checkpoint                                sdxl_model_util.py:258
                    INFO     text encoder 1: <All keys matched successfully>                      sdxl_model_util.py:272
2024-04-26 11:53:49 INFO     text encoder 2: <All keys matched successfully>                      sdxl_model_util.py:276
                    INFO     building VAE                                                         sdxl_model_util.py:279
                    INFO     loading VAE from checkpoint                                          sdxl_model_util.py:284
2024-04-26 11:53:50 INFO     VAE: <All keys matched successfully>                                 sdxl_model_util.py:287
Disable Diffusers' xformers
                    INFO     Enable xformers for U-Net                                                train_util.py:2660
Traceback (most recent call last):
  File "M:\kohya_ss\sd-scripts\sdxl_train.py", line 818, in <module>
    train(args)
  File "M:\kohya_ss\sd-scripts\sdxl_train.py", line 258, in train
    vae.set_use_memory_efficient_attention_xformers(args.xformers)
  File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 262, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 255, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
  File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\attention_processor.py", line 260, in set_use_memory_efficient_attention_xformers
    raise ValueError(
ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU
Traceback (most recent call last):
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "M:\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "M:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "M:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "M:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['M:\\kohya_ss\\venv\\Scripts\\python.exe', 'M:/kohya_ss/sd-scripts/sdxl_train.py', '--config_file', './outputs/tmpfiledbooth.toml']' returned non-zero exit status 1.
11:53:53-325240 INFO     Training has ended.
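
One detail in the log above may be worth checking before reinstalling anything: the launch command passes `--gpu_ids 10de268488e21043`, and the trainer then reports `accelerator device: cpu`. To the best of my knowledge, `accelerate launch` expects `--gpu_ids` to be `all` or a comma-separated list of device indices such as `0`, and it exports the value as `CUDA_VISIBLE_DEVICES`. If the value matches no device, CUDA would see no GPU in the training process even though the GUI process detected the RTX 4090. This is a reading of the log, not a confirmed diagnosis. A minimal sketch of a sanity check follows; `looks_like_valid_gpu_ids` is a hypothetical helper and is not part of Kohya or accelerate.

```python
import re

# Hypothetical helper: checks that a --gpu_ids value has the shape
# "all" or a comma-separated list of non-negative integers ("0", "0,1").
_GPU_IDS_PATTERN = re.compile(r"\d+(,\d+)*")


def looks_like_valid_gpu_ids(value: str) -> bool:
    """Return True if value is 'all' or a comma-separated list of indices."""
    value = value.strip()
    if value == "all":
        return True
    return _GPU_IDS_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    # The last candidate is the value that appears in the log above.
    for candidate in ("0", "0,1", "all", "10de268488e21043"):
        print(candidate, looks_like_valid_gpu_ids(candidate))
```

If the value fails this check, clearing the GPU IDs field in the GUI (or setting it to `0`) and retrying would be a cheap experiment.
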
@eija06

eija06 commented Apr 26, 2024

Looks like something is wrong with your CUDA installation. Did you install requirements.txt?

@bmaltais
Owner

Did you install CUDA 11.8 as per the README instructions?
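
The startup log reports `Torch 2.1.2+cu118` and detects the RTX 4090, while the traceback says `torch.cuda.is_available()` is `False` inside the training process. One way to compare the two is to run a small check with the venv's own Python. A minimal sketch, assuming it is run with the interpreter under `venv\Scripts`; `cuda_report` is a hypothetical helper, and it returns defaults if torch cannot be imported:

```python
import os


def cuda_report() -> dict:
    """Collect what this Python process can see about CUDA."""
    report = {
        # Restricts which GPUs the CUDA runtime exposes to this process.
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
        "torch_version": None,
        "torch_cuda_build": None,
        "cuda_available": False,
        "device_count": 0,
    }
    try:
        import torch
    except ImportError:
        return report  # torch is not installed in this environment
    report["torch_version"] = torch.__version__
    report["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If this prints `cuda_available: True` from a plain shell but training still falls back to CPU, the difference is likely in how the training subprocess is launched rather than in the CUDA installation itself.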

@Deejay85
Author

I installed not only Kohya from setup.bat, but also cuDNN, bitsandbytes, CUDA 11.8 (cuda_11.8.0_522.06_windows to be precise), and the files required in the sd-scripts folder. I also updated pip from the venv and redid all of that... I'm not sure what else I need to do.
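
With this many components installed by hand, it can help to confirm which versions the venv actually resolves. A minimal sketch using only the standard library; `installed_versions` is a hypothetical helper and the package list is an illustrative assumption, not Kohya's full requirements:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Run with the venv's Python so the venv's site-packages is inspected.
    names = ["torch", "xformers", "bitsandbytes", "accelerate", "diffusers"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version}")
```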

@machineminded

I'm having a similar issue. I've been trying to get kohya to work for a few days, and I see a tangentially related error:

NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 15744, 1, 512) (torch.float32)
     key         : shape=(1, 15744, 1, 512) (torch.float32)
     value       : shape=(1, 15744, 1, 512) (torch.float32)
     attn_bias   : <class 'NoneType'>
     p           : 0.0

Same as @Deejay85: CUDA 11.8, cuDNN... ran everything from setup.bat.
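
For a `No operator found` error like the one above, xformers ships a diagnostic module, `python -m xformers.info`, that reports which attention operators are available in the current build. A minimal sketch that runs it with a chosen interpreter and captures the output; `xformers_info` is a hypothetical wrapper, and it assumes the interpreter passed in is the venv's Python:

```python
import subprocess
import sys


def xformers_info(python_exe: str = sys.executable) -> tuple[int, str]:
    """Run `python -m xformers.info` and return (exit code, combined output)."""
    result = subprocess.run(
        [python_exe, "-m", "xformers.info"],
        capture_output=True,
        text=True,
        check=False,  # a missing or broken xformers should not raise here
    )
    return result.returncode, result.stdout + result.stderr


if __name__ == "__main__":
    code, output = xformers_info()
    print(f"exit code: {code}")
    print(output)
```

A non-zero exit code here usually just means xformers is not importable by that interpreter, which is itself useful to know.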

@Deejay85
Author

I was hoping that, since someone else is having the same problem, an answer would be forthcoming, but apparently not. I tried installing the new version that was released today, and even reinstalled all the packages for Kohya just in case, but that didn't fix it either... I seem to be getting the same messages as before. I really don't know if it's something on my end or on Kohya's end, but I wish the dev would take a look at it, so that I at least know where to start fixing the problem.

@Deejay85
Author

Deejay85 commented May 9, 2024

Didn't want to make a new thread, so I decided to bump the old one I made two weeks ago. I tried copying the newest release into Kohya, but that didn't make any difference, so even after two releases, I'm still having the same problems I did before.

@machineminded

machineminded commented May 9, 2024

I ended up uninstalling anything related to Python, CUDA, NVIDIA, and Microsoft development (C++ redistributables), then reinstalled, and it fixed all of my issues. Before, I also had CUDA 11.8 and 12.x installed, and I'm guessing something went stupid there, so I stuck with CUDA 11.8 this time. But I'm not really sure; basically, uninstalling and reinstalling fixed everything.
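
Before uninstalling everything, one way to see whether several CUDA toolkits are competing is to inspect the environment. A minimal sketch, assuming the default Windows installer layout (`...\CUDA\vX.Y`) and its `CUDA_PATH*` variables; `cuda_toolkits_in_env` is a hypothetical helper:

```python
import os
import re
from typing import Mapping

# Matches Windows-style toolkit folders such as "...\CUDA\v11.8\bin".
_VERSION_IN_PATH = re.compile(r"cuda[\\/]+v(\d+\.\d+)", re.IGNORECASE)


def cuda_toolkits_in_env(env: Mapping[str, str], pathsep: str = os.pathsep) -> dict:
    """List CUDA_PATH* variables and the toolkit versions found on PATH."""
    cuda_vars = {k: v for k, v in env.items() if k.upper().startswith("CUDA_PATH")}
    versions = []
    for entry in env.get("PATH", "").split(pathsep):
        match = _VERSION_IN_PATH.search(entry)
        if match and match.group(1) not in versions:
            versions.append(match.group(1))  # keep PATH order, drop repeats
    return {"cuda_path_vars": cuda_vars, "versions_on_path": versions}


if __name__ == "__main__":
    found = cuda_toolkits_in_env(os.environ)
    for name, value in found["cuda_path_vars"].items():
        print(f"{name} = {value}")
    print("toolkit versions on PATH (in order):", found["versions_on_path"])
```

More than one version in the output does not prove a conflict, but it shows which toolkit comes first on `PATH`.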

@bmaltais
Owner

bmaltais commented May 9, 2024

Yeah, so many things can break down within the software stack… This was the best thing to do. Glad it fixed things for you.

@bmaltais bmaltais closed this as completed May 9, 2024
@Agnusse

Agnusse commented May 11, 2024

I am having the exact same issue. Did anyone find a solution that does not involve reinstalling everything?

@Deejay85
Author

I uninstalled everything as listed by machineminded, and mine is still producing the exact same problems as before. Should I paste the entire log just to verify?

@Deejay85
Author

Downloaded the newest version of Kohya, did a fresh install to a new directory, installed everything, and here are the results I got when I tried to train something:

07:34:51-795074 INFO     Kohya_ss GUI version: v24.1.4
fatal: not a git repository (or any of the parent directories): .git
07:34:52-077285 ERROR    Error during Git operation: Command '['git', 'submodule', 'update', '--init', '--recursive',
                         '--quiet']' returned non-zero exit status 128.
07:34:52-081194 INFO     nVidia toolkit detected
07:34:53-412725 INFO     Torch 2.1.2+cu118
07:34:53-437137 INFO     Torch backend: nVidia CUDA 11.8 cuDNN 8905
07:34:53-439117 INFO     Torch detected GPU: NVIDIA GeForce RTX 4090 VRAM 24564 Arch (8, 9) Cores 128
07:34:53-444947 INFO     Python version is 3.10.11 (tags/v3.10.11:7d4cc5a, Apr  5 2023, 00:38:17) [MSC v.1929 64 bit
                         (AMD64)]
07:34:53-447878 INFO     Verifying modules installation status from requirements_pytorch_windows.txt...
07:34:53-450808 INFO     Verifying modules installation status from requirements_windows.txt...
07:34:53-451785 WARNING  Package wrong version: bitsandbytes 0.41.2.post2 required 0.43.0
07:34:53-453735 INFO     Installing package: bitsandbytes==0.43.0
07:34:58-070392 INFO     Verifying modules installation status from requirements.txt...
07:35:06-749071 INFO     headless: False
07:35:06-783250 INFO     Using shell=True when running external commands...
Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.
IMPORTANT: You are using gradio version 4.26.0, however version 4.29.0 is available, please upgrade.
--------
Exception in thread Thread-5 (_do_normal_analytics_request):
Traceback (most recent call last):
  File "M:\z\venv\lib\site-packages\httpx\_transports\default.py", line 69, in map_httpcore_exceptions
    yield
  File "M:\z\venv\lib\site-packages\httpx\_transports\default.py", line 233, in handle_request
    resp = self._pool.handle_request(req)
  File "M:\z\venv\lib\site-packages\httpcore\_sync\connection_pool.py", line 216, in handle_request
    raise exc from None
  File "M:\z\venv\lib\site-packages\httpcore\_sync\connection_pool.py", line 196, in handle_request
    response = connection.handle_request(
  File "M:\z\venv\lib\site-packages\httpcore\_sync\connection.py", line 101, in handle_request
    return self._connection.handle_request(request)
  File "M:\z\venv\lib\site-packages\httpcore\_sync\http11.py", line 143, in handle_request
    raise exc
  File "M:\z\venv\lib\site-packages\httpcore\_sync\http11.py", line 95, in handle_request
    self._send_request_body(**kwargs)
  File "M:\z\venv\lib\site-packages\httpcore\_sync\http11.py", line 166, in _send_request_body
    self._send_event(event, timeout=timeout)
  File "M:\z\venv\lib\site-packages\httpcore\_sync\http11.py", line 175, in _send_event
    self._network_stream.write(bytes_to_send, timeout=timeout)
  File "M:\z\venv\lib\site-packages\httpcore\_backends\sync.py", line 133, in write
    with map_exceptions(exc_map):
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "M:\z\venv\lib\site-packages\httpcore\_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.WriteTimeout: The write operation timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "M:\z\venv\lib\site-packages\gradio\analytics.py", line 63, in _do_normal_analytics_request
    httpx.post(url, data=data, timeout=5)
  File "M:\z\venv\lib\site-packages\httpx\_api.py", line 319, in post
    return request(
  File "M:\z\venv\lib\site-packages\httpx\_api.py", line 106, in request
    return client.request(
  File "M:\z\venv\lib\site-packages\httpx\_client.py", line 827, in request
    return self.send(request, auth=auth, follow_redirects=follow_redirects)
  File "M:\z\venv\lib\site-packages\httpx\_client.py", line 914, in send
    response = self._send_handling_auth(
  File "M:\z\venv\lib\site-packages\httpx\_client.py", line 942, in _send_handling_auth
    response = self._send_handling_redirects(
  File "M:\z\venv\lib\site-packages\httpx\_client.py", line 979, in _send_handling_redirects
    response = self._send_single_request(request)
  File "M:\z\venv\lib\site-packages\httpx\_client.py", line 1015, in _send_single_request
    response = transport.handle_request(request)
  File "M:\z\venv\lib\site-packages\httpx\_transports\default.py", line 232, in handle_request
    with map_httpcore_exceptions():
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "M:\z\venv\lib\site-packages\httpx\_transports\default.py", line 86, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.WriteTimeout: The write operation timed out
07:35:38-075101 INFO     Loading config...
07:35:43-538559 INFO     Start training LoRA Standard ...
07:35:43-540511 INFO     Validating lr scheduler arguments...
07:35:43-541488 INFO     Validating optimizer arguments...
07:35:43-542464 INFO     Validating M:/kohya_ss/Sampleimages/log existence and writability... SUCCESS
07:35:43-543440 INFO     Validating M:/kohya_ss/Sampleimages/model existence and writability... SUCCESS
07:35:43-544417 INFO     Validating M:/StableDiffusion/models/Stable-diffusion/SDXL/sd_xl_base_1.0.safetensors
                         existence... SUCCESS
07:35:43-545393 INFO     Validating M:/kohya_ss/Sampleimages/Images existence... SUCCESS
07:35:43-546370 INFO     Folder 4_giganticbreasts: 4 repeats found
07:35:43-547347 INFO     Folder 4_giganticbreasts: 115 images found
07:35:43-548324 INFO     Folder 4_giganticbreasts: 115 * 4 = 460 steps
07:35:43-551252 INFO     Regulatization factor: 1
07:35:43-553205 INFO     Total steps: 460
07:35:43-553205 INFO     Train batch size: 1
07:35:43-554183 INFO     Gradient accumulation steps: 1
07:35:43-555159 INFO     Epoch: 40
07:35:43-556136 INFO     Max train steps: 1600
07:35:43-556136 INFO     stop_text_encoder_training = 0
07:35:43-557111 INFO     lr_warmup_steps = 160
07:35:43-559066 INFO     Saving training config to
                         M:/kohya_ss/Sampleimages/model\giganticbreasts_20240512-073543.json...
07:35:43-561017 INFO     Executing command: M:\z\venv\Scripts\accelerate.EXE launch --dynamo_backend no --dynamo_mode
                         default --gpu_ids 10de268488e21043 --mixed_precision bf16 --num_processes 1 --num_machines 1
                         --num_cpu_threads_per_process 2 M:/z/sd-scripts/sdxl_train_network.py --config_file
                         M:/kohya_ss/Sampleimages/model/config_lora-20240512-073543.toml
07:35:43-564925 INFO     Command executed.
2024-05-12 07:35:51 INFO     Loading settings from                                                    train_util.py:3744
                             M:/kohya_ss/Sampleimages/model/config_lora-20240512-073543.toml...
                    INFO     M:/kohya_ss/Sampleimages/model/config_lora-20240512-073543               train_util.py:3763
2024-05-12 07:35:51 INFO     prepare tokenizers                                                   sdxl_train_util.py:134
2024-05-12 07:35:53 INFO     update token length: 75                                              sdxl_train_util.py:159
                    INFO     Using DreamBooth method.                                               train_network.py:172
                    INFO     prepare images.                                                          train_util.py:1572
                    INFO     found directory M:\kohya_ss\Sampleimages\Images\4_giganticbreasts        train_util.py:1519
                             contains 115 image files
                    INFO     460 train images with repeating.                                         train_util.py:1613
                    INFO     0 reg images.                                                            train_util.py:1616
                    WARNING  no regularization images / 正則化画像が見つかりませんでした              train_util.py:1621
                    INFO     [Dataset 0]                                                              config_util.py:565
                               batch_size: 1
                               resolution: (1024, 1024)
                               enable_bucket: True
                               network_multiplier: 1.0
                               min_bucket_reso: 64
                               max_bucket_reso: 2048
                               bucket_reso_steps: 64
                               bucket_no_upscale: True

                               [Subset 0 of Dataset 0]
                                 image_dir: "M:\kohya_ss\Sampleimages\Images\4_giganticbreasts"
                                 image_count: 115
                                 num_repeats: 4
                                 shuffle_caption: True
                                 keep_tokens: 1
                                 keep_tokens_separator:
                                 secondary_separator: None
                                 enable_wildcard: False
                                 caption_dropout_rate: 0.0
                                 caption_dropout_every_n_epoches: 0
                                 caption_tag_dropout_rate: 0.0
                                 caption_prefix: None
                                 caption_suffix: None
                                 color_aug: False
                                 flip_aug: False
                                 face_crop_aug_range: None
                                 random_crop: False
                                 token_warmup_min: 1,
                                 token_warmup_step: 0,
                                 is_reg: False
                                 class_tokens: giganticbreasts
                                 caption_extension: .txt


                    INFO     [Dataset 0]                                                              config_util.py:571
                    INFO     loading image sizes.                                                      train_util.py:853
100%|█████████████████████████████████████████████████████████████████████████████| 115/115 [00:00<00:00, 39272.51it/s]
                    INFO     make buckets                                                              train_util.py:859
                    WARNING  min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is   train_util.py:876
                             set, because bucket reso is defined by image size automatically /
                             bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計
                             算されるため、min_bucket_resoとmax_bucket_resoは無視されます
                    INFO     number of images (including repeats) /                                    train_util.py:905
                             各bucketの画像枚数(繰り返し回数を含む)
                    INFO     bucket 0: resolution (576, 832), count: 4                                 train_util.py:910
                    INFO     bucket 1: resolution (576, 960), count: 4                                 train_util.py:910
                    INFO     bucket 2: resolution (640, 640), count: 4                                 train_util.py:910
                    INFO     bucket 3: resolution (704, 960), count: 4                                 train_util.py:910
                    INFO     bucket 4: resolution (704, 1280), count: 4                                train_util.py:910
                    INFO     bucket 5: resolution (704, 1344), count: 4                                train_util.py:910
                    INFO     bucket 6: resolution (704, 1408), count: 4                                train_util.py:910
                    INFO     bucket 7: resolution (768, 704), count: 4                                 train_util.py:910
                    INFO     bucket 8: resolution (768, 1152), count: 12                               train_util.py:910
                    INFO     bucket 9: resolution (768, 1216), count: 4                                train_util.py:910
                    INFO     bucket 10: resolution (768, 1344), count: 4                               train_util.py:910
                    INFO     bucket 11: resolution (832, 768), count: 4                                train_util.py:910
                    INFO     bucket 12: resolution (832, 896), count: 4                                train_util.py:910
                    INFO     bucket 13: resolution (832, 1024), count: 4                               train_util.py:910
                    INFO     bucket 14: resolution (832, 1088), count: 36                              train_util.py:910
                    INFO     bucket 15: resolution (832, 1152), count: 68                              train_util.py:910
                    INFO     bucket 16: resolution (832, 1216), count: 44                              train_util.py:910
                    INFO     bucket 17: resolution (896, 832), count: 4                                train_util.py:910
                    INFO     bucket 18: resolution (896, 1024), count: 16                              train_util.py:910
                    INFO     bucket 19: resolution (896, 1088), count: 40                              train_util.py:910
                    INFO     bucket 20: resolution (896, 1152), count: 40                              train_util.py:910
                    INFO     bucket 21: resolution (960, 960), count: 16                               train_util.py:910
                    INFO     bucket 22: resolution (960, 1024), count: 36                              train_util.py:910
                    INFO     bucket 23: resolution (1024, 896), count: 4                               train_util.py:910
                    INFO     bucket 24: resolution (1024, 960), count: 4                               train_util.py:910
                    INFO     bucket 25: resolution (1024, 1024), count: 36                             train_util.py:910
                    INFO     bucket 26: resolution (1088, 832), count: 8                               train_util.py:910
                    INFO     bucket 27: resolution (1088, 896), count: 4                               train_util.py:910
                    INFO     bucket 28: resolution (1152, 832), count: 8                               train_util.py:910
                    INFO     bucket 29: resolution (1152, 896), count: 8                               train_util.py:910
                    INFO     bucket 30: resolution (1216, 832), count: 12                              train_util.py:910
                    INFO     bucket 31: resolution (1280, 704), count: 4                               train_util.py:910
                    INFO     bucket 32: resolution (1344, 768), count: 8                               train_util.py:910
                    INFO     mean ar error (without repeats): 0.012568990147454271                     train_util.py:915
                    WARNING  clip_skip will be unexpected / SDXL学習ではclip_skipは動作しません   sdxl_train_util.py:343
                    INFO     preparing accelerator                                                  train_network.py:225
accelerator device: cpu
                    INFO     loading model for process 0/1                                         sdxl_train_util.py:30
                    INFO     load StableDiffusion checkpoint:                                      sdxl_train_util.py:70
                             M:/StableDiffusion/models/Stable-diffusion/SDXL/sd_xl_base_1.0.safete
                             nsors
                    INFO     building U-Net                                                       sdxl_model_util.py:192
2024-05-12 07:35:54 INFO     loading U-Net from checkpoint                                        sdxl_model_util.py:196
2024-05-12 07:36:06 INFO     U-Net: <All keys matched successfully>                               sdxl_model_util.py:202
                    INFO     building text encoders                                               sdxl_model_util.py:205
                    INFO     loading text encoders from checkpoint                                sdxl_model_util.py:258
                    INFO     text encoder 1: <All keys matched successfully>                      sdxl_model_util.py:272
2024-05-12 07:36:10 INFO     text encoder 2: <All keys matched successfully>                      sdxl_model_util.py:276
                    INFO     building VAE                                                         sdxl_model_util.py:279
                    INFO     loading VAE from checkpoint                                          sdxl_model_util.py:284
                    INFO     VAE: <All keys matched successfully>                                 sdxl_model_util.py:287
                    INFO     Enable xformers for U-Net                                                train_util.py:2660
Traceback (most recent call last):
  File "M:\z\sd-scripts\sdxl_train_network.py", line 185, in <module>
    trainer.train(args)
  File "M:\z\sd-scripts\train_network.py", line 242, in train
    vae.set_use_memory_efficient_attention_xformers(args.xformers)
  File "M:\z\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 262, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "M:\z\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "M:\z\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "M:\z\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "M:\z\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 255, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
  File "M:\z\venv\lib\site-packages\diffusers\models\attention_processor.py", line 260, in set_use_memory_efficient_attention_xformers
    raise ValueError(
ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU
Traceback (most recent call last):
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "M:\z\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
  File "M:\z\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "M:\z\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "M:\z\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['M:\\z\\venv\\Scripts\\python.exe', 'M:/z/sd-scripts/sdxl_train_network.py', '--config_file', 'M:/kohya_ss/Sampleimages/model/config_lora-20240512-073543.toml']' returned non-zero exit status 1.
07:36:12-768539 INFO     Training has ended.
Keyboard interruption in main thread... closing server.

In short, the same old song and dance. 😩 Any advice?

@machineminded

ValueError: torch.cuda.is_available() should be True but is False.

Not sure what happened, but you should be able to resolve this by installing torch+cu118. Check this link:

https://pytorch.org/get-started/locally/

@b-fission
Contributor

@Deejay85
This part of your log indicates the most likely problem: --gpu_ids 10de268488e21043

The GPU IDs option in your config seems to contain junk text. (It's an option located under the Accelerate launch category.)
Leave it blank so it resembles the screenshot below, and the training should be able to run.

(screenshot: gpuids)

@bmaltais
Owner

I might add an input validator and log a message if the value does not match the expected pattern.
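
For illustration, here is a minimal sketch of what such a check could look like. It is not the actual kohya_ss implementation: the function name `validate_gpu_ids` is hypothetical, and the accepted shapes (empty, `all`, or a comma-separated list of non-negative integers such as `0` or `0,1`) are an assumption about what the field expects.

```python
import logging
import re

log = logging.getLogger(__name__)

# Assumed pattern: non-negative integers separated by commas, e.g. "0" or "0,1".
_GPU_IDS_PATTERN = re.compile(r"\d+(,\d+)*")


def validate_gpu_ids(value: str) -> bool:
    """Return True if `value` has the shape of a usable GPU IDs setting.

    Hypothetical helper: accepts an empty string, "all", or a
    comma-separated list of integers. Anything else is logged and rejected.
    """
    cleaned = value.strip()
    if cleaned == "" or cleaned.lower() == "all":
        return True
    if _GPU_IDS_PATTERN.fullmatch(cleaned):
        return True
    log.error(
        "GPU IDs value %r does not match the expected pattern "
        "(empty, 'all', or comma-separated integers like '0,1')",
        value,
    )
    return False
```

With this sketch, the value from the log above (`10de268488e21043`) would be rejected because it contains letters. Note that it only checks the shape of the text, not whether the listed devices actually exist.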

@Deejay85
Author

I tried leaving it blank, with spaces, with dashes, and as two blocks of text separated only by a hyphen... none of that worked. I am using only one graphics card, BTW, because 4090s don't grow on trees, you know? 😜

@b-fission
Contributor

b-fission commented May 26, 2024

Supposing you kept GPU IDs blank, does the log still show the same error as before (ValueError: torch.cuda.is_available() should be True but is False), or was it a different error?

@Deejay85
Author

Same error. If you want I could copy/paste the new log.

@b-fission
Contributor

Sure, post your log output.

@Deejay85
Author

18:54:24-292864 INFO Kohya_ss GUI version: v24.1.4
fatal: not a git repository (or any of the parent directories): .git
18:54:24-530151 ERROR Error during Git operation: Command '['git', 'submodule', 'update', '--init', '--recursive',
'--quiet']' returned non-zero exit status 128.
18:54:24-535034 INFO nVidia toolkit detected
18:54:25-865018 INFO Torch 2.1.2+cu118
18:54:25-885525 INFO Torch backend: nVidia CUDA 11.8 cuDNN 8905
18:54:25-888454 INFO Torch detected GPU: NVIDIA GeForce RTX 4090 VRAM 24564 Arch (8, 9) Cores 128
18:54:25-892360 INFO Python version is 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit
(AMD64)]
18:54:25-895289 INFO Verifying modules installation status from requirements_pytorch_windows.txt...
18:54:25-898219 INFO Verifying modules installation status from requirements_windows.txt...
18:54:25-900172 INFO Verifying modules installation status from requirements.txt...
18:54:31-930997 INFO headless: False
18:54:31-969079 INFO Using shell=True when running external commands...
IMPORTANT: You are using gradio version 4.26.0, however version 4.29.0 is available, please upgrade.

Running on local URL: http://127.0.0.1:7860

To create a public link, set share=True in launch().
Exception in thread Thread-5 (_do_normal_analytics_request):
Traceback (most recent call last):
File "M:\kohya_ss\venv\lib\site-packages\httpx\_transports\default.py", line 69, in map_httpcore_exceptions
yield
File "M:\kohya_ss\venv\lib\site-packages\httpx\_transports\default.py", line 233, in handle_request
resp = self._pool.handle_request(req)
File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\connection_pool.py", line 216, in handle_request
raise exc from None
File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\connection_pool.py", line 196, in handle_request
response = connection.handle_request(
File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\connection.py", line 101, in handle_request
return self._connection.handle_request(request)
File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\http11.py", line 143, in handle_request
raise exc
File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\http11.py", line 95, in handle_request
self._send_request_body(**kwargs)
File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\http11.py", line 166, in _send_request_body
self._send_event(event, timeout=timeout)
File "M:\kohya_ss\venv\lib\site-packages\httpcore\_sync\http11.py", line 175, in _send_event
self._network_stream.write(bytes_to_send, timeout=timeout)
File "M:\kohya_ss\venv\lib\site-packages\httpcore\_backends\sync.py", line 133, in write
with map_exceptions(exc_map):
File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "M:\kohya_ss\venv\lib\site-packages\httpcore\_exceptions.py", line 14, in map_exceptions
raise to_exc(exc) from exc
httpcore.WriteTimeout: The write operation timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\threading.py", line 1016, in _bootstrap_inner
self.run()
File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "M:\kohya_ss\venv\lib\site-packages\gradio\analytics.py", line 63, in _do_normal_analytics_request
httpx.post(url, data=data, timeout=5)
File "M:\kohya_ss\venv\lib\site-packages\httpx\_api.py", line 319, in post
return request(
File "M:\kohya_ss\venv\lib\site-packages\httpx\_api.py", line 106, in request
return client.request(
File "M:\kohya_ss\venv\lib\site-packages\httpx\_client.py", line 827, in request
return self.send(request, auth=auth, follow_redirects=follow_redirects)
File "M:\kohya_ss\venv\lib\site-packages\httpx\_client.py", line 914, in send
response = self._send_handling_auth(
File "M:\kohya_ss\venv\lib\site-packages\httpx\_client.py", line 942, in _send_handling_auth
response = self._send_handling_redirects(
File "M:\kohya_ss\venv\lib\site-packages\httpx\_client.py", line 979, in _send_handling_redirects
response = self._send_single_request(request)
File "M:\kohya_ss\venv\lib\site-packages\httpx\_client.py", line 1015, in _send_single_request
response = transport.handle_request(request)
File "M:\kohya_ss\venv\lib\site-packages\httpx\_transports\default.py", line 232, in handle_request
with map_httpcore_exceptions():
File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\contextlib.py", line 153, in __exit__
self.gen.throw(typ, value, traceback)
File "M:\kohya_ss\venv\lib\site-packages\httpx\_transports\default.py", line 86, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.WriteTimeout: The write operation timed out
18:54:53-247570 INFO Loading config...
18:55:04-303432 INFO Save...
18:55:07-492658 INFO Start training LoRA Standard ...
18:55:07-493635 INFO Validating lr scheduler arguments...
18:55:07-495588 INFO Validating optimizer arguments...
18:55:07-496565 INFO Validating M:/kohya_ss/Sampleimages/log existence and writability... SUCCESS
18:55:07-497541 INFO Validating M:/kohya_ss/Sampleimages/model existence and writability... SUCCESS
18:55:07-498521 INFO Validating M:/StableDiffusion/models/Stable-diffusion/SDXL/sd_xl_base_1.0.safetensors
existence... SUCCESS
18:55:07-499494 INFO Validating M:/kohya_ss/Sampleimages/Images existence... SUCCESS
18:55:07-500471 INFO Folder 4_giganticbreasts: 4 repeats found
18:55:07-501447 INFO Folder 4_giganticbreasts: 115 images found
18:55:07-502424 INFO Folder 4_giganticbreasts: 115 * 4 = 460 steps
18:55:07-504377 INFO Regulatization factor: 1
18:55:07-505353 INFO Total steps: 460
18:55:07-508283 INFO Train batch size: 1
18:55:07-512189 INFO Gradient accumulation steps: 1
18:55:07-516095 INFO Epoch: 40
18:55:07-517072 INFO Max train steps: 1600
18:55:07-518049 INFO stop_text_encoder_training = 0
18:55:07-519025 INFO lr_warmup_steps = 160
18:55:07-520978 INFO Saving training config to
M:/kohya_ss/Sampleimages/model\giganticbreasts_20240526-185507.json...
18:55:07-521953 INFO Executing command: M:\kohya_ss\venv\Scripts\accelerate.EXE launch --dynamo_backend no
--dynamo_mode default --mixed_precision bf16 --num_processes 1 --num_machines 1
--num_cpu_threads_per_process 2 M:/kohya_ss/sd-scripts/sdxl_train_network.py --config_file
M:/kohya_ss/Sampleimages/model/config_lora-20240526-185507.toml
18:55:07-526836 INFO Command executed.
2024-05-26 18:55:14 INFO Loading settings from train_util.py:3744
M:/kohya_ss/Sampleimages/model/config_lora-20240526-185507.toml...
INFO M:/kohya_ss/Sampleimages/model/config_lora-20240526-185507 train_util.py:3763
2024-05-26 18:55:14 INFO prepare tokenizers sdxl_train_util.py:134
2024-05-26 18:55:15 INFO update token length: 75 sdxl_train_util.py:159
INFO Using DreamBooth method. train_network.py:172
INFO prepare images. train_util.py:1572
INFO found directory M:\kohya_ss\Sampleimages\Images\4_giganticbreasts train_util.py:1519
contains 115 image files
INFO 460 train images with repeating. train_util.py:1613
INFO 0 reg images. train_util.py:1616
WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:1621
INFO [Dataset 0] config_util.py:565
batch_size: 1
resolution: (1024, 1024)
enable_bucket: True
network_multiplier: 1.0
min_bucket_reso: 64
max_bucket_reso: 2048
bucket_reso_steps: 64
bucket_no_upscale: True

                           [Subset 0 of Dataset 0]
                             image_dir: "M:\kohya_ss\Sampleimages\Images\4_giganticbreasts"
                             image_count: 115
                             num_repeats: 4
                             shuffle_caption: True
                             keep_tokens: 1
                             keep_tokens_separator:
                             secondary_separator: None
                             enable_wildcard: False
                             caption_dropout_rate: 0.0
                             caption_dropout_every_n_epoches: 0
                             caption_tag_dropout_rate: 0.0
                             caption_prefix: None
                             caption_suffix: None
                             color_aug: False
                             flip_aug: False
                             face_crop_aug_range: None
                             random_crop: False
                             token_warmup_min: 1,
                             token_warmup_step: 0,
                             is_reg: False
                             class_tokens: giganticbreasts
                             caption_extension: .txt


                INFO     [Dataset 0]                                                              config_util.py:571
                INFO     loading image sizes.                                                      train_util.py:853

100%|█████████████████████████████████████████████████████████████████████████████| 115/115 [00:00<00:00, 39262.92it/s]
INFO make buckets train_util.py:859
WARNING min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is train_util.py:876
set, because bucket reso is defined by image size automatically /
bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計
算されるため、min_bucket_resoとmax_bucket_resoは無視されます
INFO number of images (including repeats) / train_util.py:905
各bucketの画像枚数(繰り返し回数を含む)
INFO bucket 0: resolution (576, 832), count: 4 train_util.py:910
INFO bucket 1: resolution (576, 960), count: 4 train_util.py:910
INFO bucket 2: resolution (640, 640), count: 4 train_util.py:910
INFO bucket 3: resolution (704, 960), count: 4 train_util.py:910
INFO bucket 4: resolution (704, 1280), count: 4 train_util.py:910
INFO bucket 5: resolution (704, 1344), count: 4 train_util.py:910
INFO bucket 6: resolution (704, 1408), count: 4 train_util.py:910
INFO bucket 7: resolution (768, 704), count: 4 train_util.py:910
INFO bucket 8: resolution (768, 1152), count: 12 train_util.py:910
INFO bucket 9: resolution (768, 1216), count: 4 train_util.py:910
INFO bucket 10: resolution (768, 1344), count: 4 train_util.py:910
INFO bucket 11: resolution (832, 768), count: 4 train_util.py:910
INFO bucket 12: resolution (832, 896), count: 4 train_util.py:910
INFO bucket 13: resolution (832, 1024), count: 4 train_util.py:910
INFO bucket 14: resolution (832, 1088), count: 36 train_util.py:910
INFO bucket 15: resolution (832, 1152), count: 68 train_util.py:910
INFO bucket 16: resolution (832, 1216), count: 44 train_util.py:910
INFO bucket 17: resolution (896, 832), count: 4 train_util.py:910
INFO bucket 18: resolution (896, 1024), count: 16 train_util.py:910
INFO bucket 19: resolution (896, 1088), count: 40 train_util.py:910
INFO bucket 20: resolution (896, 1152), count: 40 train_util.py:910
INFO bucket 21: resolution (960, 960), count: 16 train_util.py:910
INFO bucket 22: resolution (960, 1024), count: 36 train_util.py:910
INFO bucket 23: resolution (1024, 896), count: 4 train_util.py:910
INFO bucket 24: resolution (1024, 960), count: 4 train_util.py:910
INFO bucket 25: resolution (1024, 1024), count: 36 train_util.py:910
INFO bucket 26: resolution (1088, 832), count: 8 train_util.py:910
INFO bucket 27: resolution (1088, 896), count: 4 train_util.py:910
INFO bucket 28: resolution (1152, 832), count: 8 train_util.py:910
INFO bucket 29: resolution (1152, 896), count: 8 train_util.py:910
INFO bucket 30: resolution (1216, 832), count: 12 train_util.py:910
INFO bucket 31: resolution (1280, 704), count: 4 train_util.py:910
INFO bucket 32: resolution (1344, 768), count: 8 train_util.py:910
INFO mean ar error (without repeats): 0.012568990147454271 train_util.py:915
WARNING clip_skip will be unexpected / SDXL学習ではclip_skipは動作しません sdxl_train_util.py:343
INFO preparing accelerator train_network.py:225
accelerator device: cpu
INFO loading model for process 0/1 sdxl_train_util.py:30
INFO load StableDiffusion checkpoint: sdxl_train_util.py:70
M:/StableDiffusion/models/Stable-diffusion/SDXL/sd_xl_base_1.0.safete
nsors
2024-05-26 18:55:16 INFO building U-Net sdxl_model_util.py:192
INFO loading U-Net from checkpoint sdxl_model_util.py:196
2024-05-26 18:55:28 INFO U-Net: <All keys matched successfully> sdxl_model_util.py:202
INFO building text encoders sdxl_model_util.py:205
INFO loading text encoders from checkpoint sdxl_model_util.py:258
INFO text encoder 1: <All keys matched successfully> sdxl_model_util.py:272
2024-05-26 18:55:32 INFO text encoder 2: <All keys matched successfully> sdxl_model_util.py:276
INFO building VAE sdxl_model_util.py:279
INFO loading VAE from checkpoint sdxl_model_util.py:284
INFO VAE: <All keys matched successfully> sdxl_model_util.py:287
INFO Enable xformers for U-Net train_util.py:2660
Traceback (most recent call last):
File "M:\kohya_ss\sd-scripts\sdxl_train_network.py", line 185, in <module>
trainer.train(args)
File "M:\kohya_ss\sd-scripts\train_network.py", line 242, in train
vae.set_use_memory_efficient_attention_xformers(args.xformers)
File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 262, in set_use_memory_efficient_attention_xformers
fn_recursive_set_mem_eff(module)
File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 255, in fn_recursive_set_mem_eff
module.set_use_memory_efficient_attention_xformers(valid, attention_op)
File "M:\kohya_ss\venv\lib\site-packages\diffusers\models\attention_processor.py", line 260, in set_use_memory_efficient_attention_xformers
raise ValueError(
ValueError: torch.cuda.is_available() should be True but is False. xformers' memory efficient attention is only available for GPU
Traceback (most recent call last):
File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Ande\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "M:\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
File "M:\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
args.func(args)
File "M:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
simple_launcher(args)
File "M:\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['M:\\kohya_ss\\venv\\Scripts\\python.exe', 'M:/kohya_ss/sd-scripts/sdxl_train_network.py', '--config_file', 'M:/kohya_ss/Sampleimages/model/config_lora-20240526-185507.toml']' returned non-zero exit status 1.
18:55:35-725043 INFO Training has ended.

@b-fission
Contributor

Can you look in the folder at C:\Users\yourname\.cache\huggingface\accelerate?
If you see a file called default_config.yaml, delete it and see if that fixes it.
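
If you would rather not delete the file outright, the same step can be sketched in Python so that the file is renamed to a backup instead. This is only a sketch under assumptions: the helper name `backup_accelerate_config` is made up for illustration, and the default location assumes the Hugging Face cache has not been moved (for example via the `HF_HOME` environment variable).

```python
from pathlib import Path
from typing import Optional


def backup_accelerate_config(config_dir: Optional[Path] = None) -> Optional[Path]:
    """Rename default_config.yaml to default_config.yaml.bak.

    Hypothetical helper. Returns the backup path, or None if there was
    no default_config.yaml in `config_dir`.
    """
    if config_dir is None:
        # Assumed default location: <home>/.cache/huggingface/accelerate
        config_dir = Path.home() / ".cache" / "huggingface" / "accelerate"
    config_file = config_dir / "default_config.yaml"
    if not config_file.is_file():
        return None
    backup = config_file.with_name(config_file.name + ".bak")
    # Path.replace overwrites an older backup if one already exists.
    config_file.replace(backup)
    return backup
```

Calling `backup_accelerate_config()` with no arguments targets the assumed default folder. Keeping the `.bak` copy makes it possible to inspect afterwards which setting was responsible.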

@Deejay85
Author

Surprisingly, it did. 🎉 Now if only I knew which value was messing it up.

@b-fission
Contributor

It's probably the gpu_ids setting in that file. The default value is all, and I assume it was set to something else, which caused the problems here.
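
For anyone hitting this later who wants to check the value before deleting the file, here is a rough sketch that pulls a top-level `gpu_ids:` entry out of the config text. The function name `read_gpu_ids` is hypothetical, and the code deliberately uses a simple line scan rather than a YAML parser, so it assumes the key sits on its own unindented line.

```python
from typing import Optional


def read_gpu_ids(config_text: str) -> Optional[str]:
    """Return the value of a top-level `gpu_ids:` line, or None if absent.

    Hypothetical helper: a plain line scan, not a full YAML parser.
    """
    for line in config_text.splitlines():
        key, sep, value = line.partition(":")
        # Only match an unindented key that is exactly "gpu_ids".
        if sep and key == "gpu_ids":
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("'\"")
    return None
```

To use it, read the contents of default_config.yaml into a string and pass that in. Any result other than `all` or a short list of device indices would be a likely suspect.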
