Training error on Linux #48

Closed
xujipm opened this issue Mar 27, 2023 · 2 comments

Comments

@xujipm

xujipm commented Mar 27, 2023

running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 78
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 26
  num epochs / epoch数: 20
  batch size per device / バッチサイズ: 3
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 520
steps:   0%|                                                                                    | 0/520 [00:00<?, ?it/s]epoch 1/20
Traceback (most recent call last):
  File "/home/stable/lora-scripts/./sd-scripts/train_network.py", line 699, in <module>
    train(args)
  File "/home/stable/lora-scripts/./sd-scripts/train_network.py", line 538, in train
    noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/accelerate/utils/operations.py", line 490, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
    return func(*args, **kwargs)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/diffusers/models/unet_2d_condition.py", line 381, in forward
    sample, res_samples = downsample_block(
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/diffusers/models/unet_2d_blocks.py", line 612, in forward
    hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden_states).sample
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/diffusers/models/attention.py", line 216, in forward
    hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/diffusers/models/attention.py", line 484, in forward
    hidden_states = self.attn1(norm_hidden_states) + hidden_states
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/stable/lora-scripts/sd-scripts/library/train_util.py", line 1700, in forward_xformers
    out = xformers.ops.memory_efficient_attention(q, k, v, attn_bias=None)  # 最適なのを選んでくれる
  File "/home/stable/anaconda3/lib/python3.10/site-packages/xformers/ops/memory_efficient_attention.py", line 975, in memory_efficient_attention
    return op.apply(query, key, value, attn_bias, p, scale).reshape(output_shape)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/xformers/ops/memory_efficient_attention.py", line 360, in forward
    out, lse = cls.FORWARD_OPERATOR(
  File "/home/stable/anaconda3/lib/python3.10/site-packages/torch/_ops.py", line 442, in __call__
    return self._op(*args, **kwargs or {})
NotImplementedError: Could not run 'xformers::efficient_attention_forward_cutlass' with arguments from the 'CUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'xformers::efficient_attention_forward_cutlass' is only available for these backends: [BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradMPS, AutogradXPU, AutogradHPU, AutogradLazy, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].

BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:140 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:488 [backend fallback]
Functionalize: registered at ../aten/src/ATen/FunctionalizeFallbackKernel.cpp:291 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
Negative: registered at ../aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
ZeroTensor: registered at ../aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:64 [backend fallback]
AutogradOther: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:35 [backend fallback]
AutogradCPU: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:39 [backend fallback]
AutogradCUDA: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:47 [backend fallback]
AutogradXLA: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:51 [backend fallback]
AutogradMPS: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:59 [backend fallback]
AutogradXPU: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:43 [backend fallback]
AutogradHPU: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:68 [backend fallback]
AutogradLazy: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:55 [backend fallback]
Tracer: registered at ../torch/csrc/autograd/TraceTypeManual.cpp:296 [backend fallback]
AutocastCPU: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:482 [backend fallback]
AutocastCUDA: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:324 [backend fallback]
FuncTorchBatched: registered at ../aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:743 [backend fallback]
FuncTorchVmapMode: fallthrough registered at ../aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1064 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at ../aten/src/ATen/functorch/TensorWrapper.cpp:189 [backend fallback]
PythonTLSSnapshot: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:148 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:484 [backend fallback]
PythonDispatcher: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:144 [backend fallback]

steps:   0%|                                                                                    | 0/520 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/stable/anaconda3/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/stable/anaconda3/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "/home/stable/anaconda3/lib/python3.10/site-packages/accelerate/commands/launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/stable/anaconda3/bin/python', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train/yazi', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,704', '--network_module=networks.lora', '--max_train_epochs=20', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=ba_yazi_V10', '--train_batch_size=3', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--use_8bit_adam', '--noise_offset', '0']' returned non-zero exit status 1.

Could anyone help take a look at what's going on here?
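The NotImplementedError above typically means the installed xformers wheel ships no CUDA kernels (or was built against a different torch/CUDA combination), so 'xformers::efficient_attention_forward_cutlass' has no CUDA registration. A minimal diagnostic sketch, assuming torch and xformers are importable in the same environment used for training:

# Diagnostic sketch (an assumption about the cause, not a confirmed fix):
# checks whether the installed xformers wheel actually provides CUDA attention kernels.
import torch
import xformers.ops

print("torch", torch.__version__, "| built for CUDA", torch.version.cuda,
      "| CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    q = torch.randn(1, 1024, 64, device="cuda", dtype=torch.float16)
    try:
        xformers.ops.memory_efficient_attention(q, q, q)
        print("xformers memory_efficient_attention works on CUDA")
    except NotImplementedError as e:
        # Same failure as in the training log: the op has no CUDA build registered.
        print("xformers was installed without CUDA kernels:", e)

If this reproduces the error, reinstalling an xformers build that matches the local torch and CUDA versions, or dropping --xformers from the training command, is the usual way out; recent xformers releases also expose python -m xformers.info to list which operators are available.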

@jzjbyq

jzjbyq commented May 8, 2023

ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/anyio/streams/memory.py", line 94, in receive
    return self.receive_nowait()
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/anyio/streams/memory.py", line 89, in receive_nowait
    raise WouldBlock
anyio.WouldBlock

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/middleware/base.py", line 78, in call_next
    message = await recv_stream.receive()
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/anyio/streams/memory.py", line 114, in receive
    raise EndOfStream
anyio.EndOfStream

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/middleware/base.py", line 108, in __call__
    response = await self.dispatch_func(request, call_next)
  File "/media/zhi/sd/Ai-test/new-lora-scripts/lora-scripts/gui.py", line 123, in add_cache_control_header
    response = await call_next(request)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/middleware/base.py", line 84, in call_next
    raise app_exc
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/middleware/base.py", line 70, in coro
    await self.app(scope, receive_or_disconnect, send_no_error)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/home/zhi/miniconda3/envs/loratrain/lib/python3.10/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "/media/zhi/sd/Ai-test/new-lora-scripts/lora-scripts/gui.py", line 117, in create_toml_file
    f.write(toml.dumps(j))
AttributeError: module 'toml' has no attribute 'dumps'

When I click "train", it immediately throws this error.
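For reference, the PyPI toml package does export toml.dumps, so an AttributeError here usually means import toml is resolving to something other than that library (a local toml.py or toml/ directory shadowing it, or an environment where the package isn't properly installed). A minimal check, run in the same environment as gui.py:

# Minimal sketch: see what "toml" actually resolves to in the failing environment.
import toml

print(toml.__file__)           # expected: .../site-packages/toml/__init__.py
print(hasattr(toml, "dumps"))  # expected: True for the PyPI "toml" package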

@HardySimpson

pip install albumentations toml accelerate einops voluptuous -i https://pypi.tuna.tsinghua.edu.cn/simple
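After installing the packages above, a quick sanity check that toml.dumps behaves the way gui.py expects at its line 117 (the config dict and filename below are placeholders, not the GUI's real values):

# Sanity-check sketch: serialize a dict with toml.dumps, as gui.py does.
import toml

config = {"output_name": "ba_yazi_V10", "train_batch_size": 3}  # placeholder values
with open("check.toml", "w", encoding="utf-8") as f:            # hypothetical filename
    f.write(toml.dumps(config))
print(open("check.toml", encoding="utf-8").read())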
