Closed
Description
INFO:notebook:Inference...
2024-09-23 06:17:47,215 - INFO - 'dst' model updated: 158 of 206 variables.
INFO:maisi.controlnet.infer:Number of GPUs: 2
INFO:maisi.controlnet.infer:World_size: 1
WARNING:py.warnings:unclosed file <_io.TextIOWrapper name='./temp_work_dir_controlnet_train_demo/environment_maisi_controlnet_train.json' mode='r' encoding='UTF-8'>
WARNING:py.warnings:unclosed file <_io.TextIOWrapper name='./temp_work_dir_controlnet_train_demo/config_maisi.json' mode='r' encoding='UTF-8'>
WARNING:py.warnings:unclosed file <_io.TextIOWrapper name='./temp_work_dir_controlnet_train_demo/config_maisi_controlnet_train.json' mode='r' encoding='UTF-8'>
INFO:maisi.controlnet.infer:trained autoencoder model is not loaded.
INFO:maisi.controlnet.infer:trained diffusion model is not loaded.
INFO:maisi.controlnet.infer:set scale_factor -> 1.0.
INFO:maisi.controlnet.infer:trained controlnet is not loaded.
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/workspace/Code/tutorials/generation/maisi/scripts/infer_controlnet.py", line 207, in <module>
main()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/workspace/Code/tutorials/generation/maisi/scripts/infer_controlnet.py", line 159, in main
check_input(None, None, None, output_size, out_spacing, None)
File "/workspace/Code/tutorials/generation/maisi/scripts/sample.py", line 378, in check_input
raise ValueError(
ValueError: The output_size[0] have to be chosen from [256, 384, 512], and output_size[2] have to be chosen from [128, 256, 384, 512, 640, 768], yet got (128, 128, 128).
E0923 06:17:50.348000 140369402987136 torch/distributed/elastic/multiprocessing/api.py:863] failed (exitcode: 1) local_rank: 0 (pid: 209184) of binary: /usr/bin/python
Traceback (most recent call last):
File "/usr/local/bin/torchrun", line 33, in <module>
sys.exit(load_entry_point('torch', 'console_scripts', 'torchrun')())
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 919, in main
run(args)
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 910, in run
elastic_launch(
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 138, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
scripts.infer_controlnet FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2024-09-23_06:17:50
host : yunliu-MS-7D31
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 209184)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
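The root cause in the log above is the `ValueError` raised by `check_input`: the requested `output_size` of (128, 128, 128) is not among the accepted values. A minimal sketch of a pre-flight check that could be run before launching `torchrun` is shown below. The allowed values are copied from the error message; the helper name `output_size_problems` is hypothetical and not part of the MAISI scripts, and the real `check_input` may enforce further constraints (for example on `output_size[1]` or on spacing) that this sketch does not cover.

```python
# Allowed values copied verbatim from the ValueError message in the log.
ALLOWED_DIM0 = (256, 384, 512)
ALLOWED_DIM2 = (128, 256, 384, 512, 640, 768)


def output_size_problems(output_size):
    """Hypothetical helper: list the reasons output_size would be rejected.

    Only the two constraints named in the error message are checked.
    An empty list means both of those constraints are satisfied.
    """
    if len(output_size) != 3:
        return [f"expected 3 values, got {len(output_size)}"]

    problems = []
    if output_size[0] not in ALLOWED_DIM0:
        problems.append(
            f"output_size[0]={output_size[0]} is not in {list(ALLOWED_DIM0)}"
        )
    if output_size[2] not in ALLOWED_DIM2:
        problems.append(
            f"output_size[2]={output_size[2]} is not in {list(ALLOWED_DIM2)}"
        )
    return problems


if __name__ == "__main__":
    # The size from the failing run: dimension 0 is rejected, dimension 2 is fine.
    for problem in output_size_problems((128, 128, 128)):
        print(problem)
```

With the size from the failing run, only the first dimension is flagged, since 128 is an accepted value for `output_size[2]`. Under the same assumptions, a size such as (256, 256, 128) would pass both checks.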