Torch not compiled with CUDA enabled on M1 mac #883

Closed
Aniket22156 opened this issue Jun 1, 2023 · 10 comments

@Aniket22156

Aniket22156 commented Jun 1, 2023

Folder 100_heer: 26 images found
Folder 100_heer: 2600 steps
Total steps: 2600
Train batch size: 1
Gradient accumulation steps: 1.0
Epoch: 1
Regulatization factor: 1
max_train_steps (2600 / 1 / 1.0 * 1 * 1) = 2600
stop_text_encoder_training = 0
lr_warmup_steps = 260
accelerate launch --num_cpu_threads_per_process=2 "train_network.py" --enable_bucket --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="/Users/aniketsharma/Documents/Sharma/image" --resolution=512,512 --output_dir="/Users/aniketsharma/Documents/Sharma/model" --logging_dir="/Users/aniketsharma/Documents/Sharma/log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=8 --output_name="last" --lr_scheduler_num_cycles="1" --learning_rate="0.0001" --lr_scheduler="cosine" --lr_warmup_steps="260" --train_batch_size="1" --max_train_steps="2600" --save_every_n_epochs="1" --mixed_precision="no" --save_precision="float" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="0" --bucket_reso_steps=64 --mem_eff_attn --xformers --bucket_no_upscale
prepare tokenizer
Use DreamBooth method.
prepare images.
found directory /Users/aniketsharma/Documents/Sharma/image/100_heer contains 26 image files
2600 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 1
resolution: (512, 512)
enable_bucket: True
min_bucket_reso: 256
max_bucket_reso: 1024
bucket_reso_steps: 64
bucket_no_upscale: True

[Subset 0 of Dataset 0]
image_dir: "/Users/aniketsharma/Documents/Sharma/image/100_heer"
image_count: 26
num_repeats: 100
shuffle_caption: False
keep_tokens: 0
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: heer
caption_extension: .caption

[Dataset 0]
loading image sizes.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 26/26 [00:00<00:00, 2470.48it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (512, 512), count: 2600
mean ar error (without repeats): 0.0
prepare accelerator
/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/accelerate/accelerator.py:249: FutureWarning: logging_dir is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use project_dir instead.
warnings.warn(
Using accelerator 0.15.0 or above.
loading model for process 0/1
load Diffusers pretrained models: runwayml/stable-diffusion-v1-5
Fetching 15 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 37673.39it/s]
/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/safetensors/torch.py:98: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
with safe_open(filename, framework="pt", device=device) as f:
/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/storage.py:899: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
storage = cls(wrap_storage=untyped_storage)
/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/transformers/modeling_utils.py:402: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
with safe_open(checkpoint_file, framework="pt") as f:
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at huggingface/diffusers#254 .
CrossAttention.forward has been replaced to FlashAttention (not xformers)
[Dataset 0]
caching latents.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 26/26 [00:09<00:00, 2.69it/s]
import network module: networks.lora
create LoRA network. base dim (rank): 8, alpha: 1.0
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues

bin /Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:33: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
warn("The installed version of bitsandbytes was compiled without GPU support. "
CUDA SETUP: Loading binary /Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
dlopen(/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so, 0x0006): tried: '/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file), '/System/Volumes/Preboot/Cryptexes/OS/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (no such file), '/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so' (not a mach-o file)
use 8-bit AdamW optimizer | {}
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 2600
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 2600
num epochs / epoch数: 1
batch size per device / バッチサイズ: 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 2600
steps: 0%| | 0/2600 [00:00<?, ?it/s]
epoch 1/1
Traceback (most recent call last):
  File "/Users/aniketsharma/Documents/taining/kohya_ss/train_network.py", line 783, in <module>
    train(args)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/train_network.py", line 634, in train
    optimizer.step()
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/accelerate/optimizer.py", line 140, in step
    self.optimizer.step(closure)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
    return wrapped(*args, **kwargs)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 263, in step
    self.update_step(group, p, gindex, pindex)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 504, in update_step
    F.optimizer_update_8bit_blockwise(
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/functional.py", line 972, in optimizer_update_8bit_blockwise
    prev_device = pre_call(g.device)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/functional.py", line 317, in pre_call
    prev_device = torch.cuda.current_device()
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 674, in current_device
    _lazy_init()
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
steps: 0%| | 0/2600 [00:05<?, ?it/s]
Traceback (most recent call last):
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 923, in launch_command
    simple_launcher(args)
  File "/Users/aniketsharma/Documents/taining/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 579, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/Users/aniketsharma/Documents/taining/kohya_ss/venv/bin/python', 'train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=/Users/aniketsharma/Documents/Sharma/image', '--resolution=512,512', '--output_dir=/Users/aniketsharma/Documents/Sharma/model', '--logging_dir=/Users/aniketsharma/Documents/Sharma/log', '--network_alpha=1', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=8', '--output_name=last', '--lr_scheduler_num_cycles=1', '--learning_rate=0.0001', '--lr_scheduler=cosine', '--lr_warmup_steps=260', '--train_batch_size=1', '--max_train_steps=2600', '--save_every_n_epochs=1', '--mixed_precision=no', '--save_precision=float', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_data_loader_n_workers=0', '--bucket_reso_steps=64', '--mem_eff_attn', '--xformers', '--bucket_no_upscale']' returned non-zero exit status 1.
^CKeyboard interruption in main thread... closing server.
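
The tail of the trace shows the mechanism: bitsandbytes' pre_call() unconditionally calls torch.cuda.current_device(), and the warning earlier in the log already notes that this bitsandbytes build has no GPU support, so the 8-bit optimizer (--optimizer_type="AdamW8bit") can never take a step on an M1, where PyTorch ships without CUDA. A quick check of the available backends (a sketch, assuming the venv is active and a recent PyTorch):

python -c "import torch; print('cuda:', torch.cuda.is_available(), 'mps:', torch.backends.mps.is_available())"

On an M1 this prints cuda: False, which is exactly the condition the assertion guards against; the plain AdamW optimizer avoids the bitsandbytes path entirely.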

@andupotorac

andupotorac commented Jun 1, 2023

Follow this (translate it to English):
https://planaria.page/blog/?p=671

You might need to use this at some point:
conda install -n base conda=23.1.0

And also activate the venv before calling accelerate config:
. ./venv/bin/activate

When you get to editing the params, xformers and memory-efficient attention will need to be on if you want the LoRA training to run in less than hundreds of hours.
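
A minimal sketch of that sequence, assuming the kohya_ss checkout directory from the logs:

cd kohya_ss
. ./venv/bin/activate
accelerate config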

@Aniket22156
Author

Can you tell me more in depth? I tried to run accelerate config, but it returns:

./default_config.yaml: line 1: command_file:: command not found
./default_config.yaml: line 2: commands:: command not found
./default_config.yaml: line 3: compute_environment:: command not found
./default_config.yaml: line 4: deepspeed_config:: command not found
./default_config.yaml: line 5: distributed_type:: command not found
./default_config.yaml: line 6: downcast_bf16:: command not found
./default_config.yaml: line 7: dynamo_backend:: command not found
./default_config.yaml: line 8: fsdp_config:: command not found
./default_config.yaml: line 9: gpu_ids:: command not found
./default_config.yaml: line 10: machine_rank:: command not found
./default_config.yaml: line 11: main_process_ip:: command not found
./default_config.yaml: line 12: main_process_port:: command not found
./default_config.yaml: line 13: main_training_function:: command not found
./default_config.yaml: line 14: megatron_lm_config:: command not found
./default_config.yaml: line 15: mixed_precision:: command not found
./default_config.yaml: line 16: num_machines:: command not found
./default_config.yaml: line 17: num_processes:: command not found
./default_config.yaml: line 18: rdzv_backend:: command not found
./default_config.yaml: line 19: same_network:: command not found
./default_config.yaml: line 20: tpu_name:: command not found
./default_config.yaml: line 21: tpu_zone:: command not found
./default_config.yaml: line 22: use_cpu:: command not found
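
That output is what the shell prints when default_config.yaml is executed as a script: each YAML key is treated as a command name. The file is only meant to be read by accelerate, never run directly. With the venv active, the step is just the interactive command (a sketch):

accelerate config

accelerate launch then picks up the saved config on its own.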


@andupotorac


Did you activate your environment first from within Kohya?
. ./venv/bin/activate

@Aniket22156
Author


yes

@andupotorac

Feel free to overwrite yours with this one (make sure it's the one saved in the Hugging Face cache):
https://rentry.org/85cps
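
For reference, a CPU-only config of that shape might look roughly like the following (a sketch built from the keys in the error output above; the file from the link may differ, and accelerate normally saves it under ~/.cache/huggingface/accelerate/default_config.yaml):

command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: 'NO'
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: all
machine_rank: 0
main_process_ip: null
main_process_port: null
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: true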

@justostoll

I am also getting this issue on a Mac M1 when I start training, even though I have not selected options for a GPU in the settings:

00:16:19-704016 INFO accelerate launch --num_cpu_threads_per_process=8 "./train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="/Volumes/EXT04005/Miscellaneous/Videos/TED/Misc/Others/Training" --resolution="512,512" --output_dir="/Users/user/stable-diffusion-webui/embeddings" --logging_dir="/Volumes/EXT04005/Miscellaneous/Videos/TED/Misc/Others/Training/100_Training/logs" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.0001 --network_dim=128 --output_name="Output_v1" --lr_scheduler_num_cycles="1" --no_half_vae --learning_rate="0.0001" --lr_scheduler="constant" --train_batch_size="2" --max_train_steps="500" --save_every_n_epochs="1" --mixed_precision="no" --save_precision="float" --seed="1234" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_data_loader_n_workers="1" --bucket_reso_steps=64 --bucket_no_upscale --noise_offset=0.0

Traceback (most recent call last):
  File "/Users/User/kohya_ss/./train_network.py", line 990, in <module>
    trainer.train(args)
  File "/Users/User/kohya_ss/./train_network.py", line 803, in train
    optimizer.step()
  File "/Users/User/kohya_ss/venv/lib/python3.10/site-packages/accelerate/optimizer.py", line 140, in step
    self.optimizer.step(closure)
  File "/Users/User/kohya_ss/venv/lib/python3.10/site-packages/torch/optim/lr_scheduler.py", line 69, in wrapper
    return wrapped(*args, **kwargs)
  File "/Users/User/kohya_ss/venv/lib/python3.10/site-packages/torch/optim/optimizer.py", line 280, in wrapper
    out = func(*args, **kwargs)
  File "/Users/User/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/User/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 269, in step
    self.update_step(group, p, gindex, pindex)
  File "/Users/User/kohya_ss/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/Users/User/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/optim/optimizer.py", line 517, in update_step
    F.optimizer_update_8bit_blockwise(
  File "/Users/User/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1278, in optimizer_update_8bit_blockwise
    prev_device = pre_call(g.device)
  File "/Users/User/kohya_ss/venv/lib/python3.10/site-packages/bitsandbytes/functional.py", line 415, in pre_call
    prev_device = torch.cuda.current_device()
  File "/Users/User/kohya_ss/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 674, in current_device
    _lazy_init()
  File "/Users/User/kohya_ss/venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
steps: 0%| | 0/500 [00:10<?, ?it/s]
Traceback (most recent call last):
  File "/Users/User/kohya_ss/venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/Users/User/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/Users/User/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 918, in launch_command
    simple_launcher(args)
  File "/Users/User/kohya_ss/venv/lib/python3.10/site-packages/accelerate/commands/launch.py", line 580, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
CalledProcessError: Command '['/Users/User/kohya_ss/venv/bin/python',
'./train_network.py', '--enable_bucket', '--min_bucket_reso=256',
'--max_bucket_reso=2048',
'--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5',
'--train_data_dir=/Volumes/EXT04005/Miscellaneous/Videos/TED/Misc/Others/Training', '--resolution=512,512',
'--output_dir=/Users/User/stable-diffusion-webui/embeddings',
'--logging_dir=/Volumes/EXT04005/Miscellaneous/Videos/TED/Misc/Others/Training/100_Training/logs', '--network_alpha=128',
'--save_model_as=safetensors', '--network_module=networks.lora',
'--text_encoder_lr=5e-05', '--unet_lr=0.0001', '--network_dim=128',
'--output_name=Output_v1', '--lr_scheduler_num_cycles=1',
'--no_half_vae', '--learning_rate=0.0001', '--lr_scheduler=constant',
'--train_batch_size=2', '--max_train_steps=500', '--save_every_n_epochs=1',
'--mixed_precision=no', '--save_precision=float', '--seed=1234',
'--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit',
'--max_data_loader_n_workers=1', '--bucket_reso_steps=64',
'--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.

@bneigher

bneigher commented Nov 5, 2023

same

@ourcolour

+1

@ourcolour

I tried it this way on my MacBook Pro M2:

Mixed Precision: no
Save Precision: float
Optimizer: AdamW
Advanced Configuration: un-check xformers
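
In terms of the command line from the first post, that amounts to keeping --mixed_precision="no" and --save_precision="float", switching --optimizer_type="AdamW8bit" to --optimizer_type="AdamW", and dropping --xformers and --mem_eff_attn, with the other flags left as they were (a sketch of the changed flags only):

--mixed_precision="no" --save_precision="float" --optimizer_type="AdamW"

Plain AdamW never reaches the bitsandbytes CUDA path, so the "Torch not compiled with CUDA enabled" assertion is not hit.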
