ValueError in maisi_train_controlnet_tutorial.ipynb #1838

@KumoLiu

Description

INFO:notebook:Inference...
2024-09-23 06:17:47,215 - INFO - 'dst' model updated: 158 of 206 variables.

INFO:maisi.controlnet.infer:Number of GPUs: 2
INFO:maisi.controlnet.infer:World_size: 1
WARNING:py.warnings:unclosed file <_io.TextIOWrapper name='./temp_work_dir_controlnet_train_demo/environment_maisi_controlnet_train.json' mode='r' encoding='UTF-8'>

WARNING:py.warnings:unclosed file <_io.TextIOWrapper name='./temp_work_dir_controlnet_train_demo/config_maisi.json' mode='r' encoding='UTF-8'>

WARNING:py.warnings:unclosed file <_io.TextIOWrapper name='./temp_work_dir_controlnet_train_demo/config_maisi_controlnet_train.json' mode='r' encoding='UTF-8'>

INFO:maisi.controlnet.infer:trained autoencoder model is not loaded.
INFO:maisi.controlnet.infer:trained diffusion model is not loaded.
INFO:maisi.controlnet.infer:set scale_factor -> 1.0.
INFO:maisi.controlnet.infer:trained controlnet is not loaded.
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/Code/tutorials/generation/maisi/scripts/infer_controlnet.py", line 207, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/Code/tutorials/generation/maisi/scripts/infer_controlnet.py", line 159, in main
    check_input(None, None, None, output_size, out_spacing, None)
  File "/workspace/Code/tutorials/generation/maisi/scripts/sample.py", line 378, in check_input
    raise ValueError(
ValueError: The output_size[0] have to be chosen from [256, 384, 512], and output_size[2] have to be chosen from [128, 256, 384, 512, 640, 768], yet got (128, 128, 128).
E0923 06:17:50.348000 140369402987136 torch/distributed/elastic/multiprocessing/api.py:863] failed (exitcode: 1) local_rank: 0 (pid: 209184) of binary: /usr/bin/python
Traceback (most recent call last):
  File "/usr/local/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch', 'console_scripts', 'torchrun')())
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 919, in main
    run(args)
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/run.py", line 910, in run
    elastic_launch(
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/usr/local/lib/python3.10/dist-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
scripts.infer_controlnet FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-09-23_06:17:50
  host      : yunliu-MS-7D31
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 209184)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
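
The root cause in the log above is the `ValueError` raised from `check_input`: the run passed `output_size = (128, 128, 128)`, which is outside the values listed in the message. The sketch below reproduces only the two constraints stated in that message, as a quick way to test a candidate size before launching inference. The helper name `validate_output_size` and the constant names are hypothetical and are not taken from `scripts/sample.py`; the real `check_input` may enforce further rules that this log does not show.

```python
# Hypothetical helper: NOT the actual check_input from scripts/sample.py.
# The allowed values are copied from the ValueError message in the log above.
ALLOWED_DIM0 = [256, 384, 512]
ALLOWED_DIM2 = [128, 256, 384, 512, 640, 768]


def validate_output_size(output_size):
    """Return True if output_size meets the two constraints named in the error message."""
    if len(output_size) != 3:
        return False
    # Only output_size[0] and output_size[2] are constrained by the message.
    return output_size[0] in ALLOWED_DIM0 and output_size[2] in ALLOWED_DIM2


print(validate_output_size((128, 128, 128)))  # size from the failing run -> False
print(validate_output_size((256, 256, 128)))  # meets both listed constraints -> True
```

Under these assumptions, the `(128, 128, 128)` size used by the notebook fails because `output_size[0]` is 128, which is not in `[256, 384, 512]`.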
