[nnUNet/PyTorch] PyTorch Library Import Error with most recent release #1113

Open
tjhendrickson opened this issue Apr 18, 2022 · 4 comments
Labels
bug Something isn't working

Comments

@tjhendrickson

Related to nnUNet/PyTorch

Describe the bug

Inside the Docker container, running python main.py --help fails with the following traceback:

Traceback (most recent call last):
  File "main.py", line 19, in <module>
    from pytorch_lightning import Trainer, seed_everything
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/__init__.py", line 20, in <module>
    from pytorch_lightning import metrics  # noqa: E402
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/metrics/__init__.py", line 15, in <module>
    from pytorch_lightning.metrics.classification import (  # noqa: F401
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/__init__.py", line 14, in <module>
    from pytorch_lightning.metrics.classification.accuracy import Accuracy  # noqa: F401
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/metrics/classification/accuracy.py", line 18, in <module>
    from pytorch_lightning.metrics.utils import deprecated_metrics
  File "/opt/conda/lib/python3.8/site-packages/pytorch_lightning/metrics/utils.py", line 22, in <module>
    from torchmetrics.utilities.data import get_num_classes as _get_num_classes
ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/opt/conda/lib/python3.8/site-packages/torchmetrics/utilities/data.py)

To Reproduce
Steps to reproduce the behavior:

  1. Build the Docker image by following the quick start guide for nnUNet for PyTorch.
  2. Open a shell in the container: sudo docker run -it nnunet:latest /bin/bash
  3. Run main.py: python main.py --help

tjhendrickson added the bug label on Apr 18, 2022
@tjhendrickson
Author

Downgrading torchmetrics to v0.6.0 seems to resolve the issue.
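
To confirm whether an installed torchmetrics still exposes the name from the traceback, a small check like the one below may help. It is only a sketch: `module_exports` is a hypothetical helper written for this thread, not part of torchmetrics or PyTorch Lightning. The module and attribute names come from the ImportError above.

```python
import importlib


def module_exports(module_name, attr):
    """Report whether `module_name` exposes `attr`.

    Returns True or False when the module imports, and None when the
    module itself cannot be imported (for example, it is not installed).
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


if __name__ == "__main__":
    # Names taken from the traceback in this issue.
    result = module_exports("torchmetrics.utilities.data", "get_num_classes")
    print("get_num_classes available:", result)
```

If this prints False, the installed torchmetrics no longer provides the name that the installed pytorch_lightning tries to import, which would match the error reported here.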

@tjhendrickson
Author

Unfortunately, after changing the torchmetrics version I am now running into a different error:

  File "main.py", line 34, in <module>
    set_affinity(int(os.getenv("LOCAL_RANK", "0")), args.gpus, mode=args.affinity)
  File "/workspace/nnunet_pyt/utils/gpu_affinity.py", line 376, in set_affinity
    set_socket_unique_affinity(gpu_id, nproc_per_node, cores, "contiguous", balanced)
  File "/workspace/nnunet_pyt/utils/gpu_affinity.py", line 263, in set_socket_unique_affinity
    os.sched_setaffinity(0, ungrouped_affinities[gpu_id])
OSError: [Errno 22] Invalid argument

This error seems to persist no matter what value I pass to the --affinity flag.
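
The thread does not establish the cause of this OSError. One documented reason for sched_setaffinity to fail with EINVAL on Linux is that the requested mask contains no CPU the process is permitted to use, which can happen when a container is restricted to a subset of the host's cores. Assuming that is what happens here, the sketch below intersects the requested cores with the allowed ones before applying them. `clamp_to_allowed` and `set_affinity_safely` are hypothetical helpers and are not part of utils/gpu_affinity.py.

```python
import os


def clamp_to_allowed(requested, allowed):
    """Keep only the requested CPU ids that appear in the allowed set."""
    return set(requested) & set(allowed)


def set_affinity_safely(requested):
    """Pin the current process to the usable subset of `requested` CPUs.

    Returns the CPU set that was applied, or None if nothing was changed.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None  # the call is only available on some platforms (e.g. Linux)
    allowed = os.sched_getaffinity(0)  # CPUs this process may currently run on
    usable = clamp_to_allowed(requested, allowed)
    if not usable:
        return None  # nothing usable was requested; leave affinity unchanged
    os.sched_setaffinity(0, usable)
    return usable
```

Comparing os.sched_getaffinity(0) inside the container with the cores that set_affinity computes would show whether this assumption holds.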

@michal2409
Contributor

Have you tried running with --affinity disabled, or commenting out lines 32-33 in main.py? (https://github.com/NVIDIA/DeepLearningExamples/blob/master/PyTorch/Segmentation/nnUNet/main.py#L32).

Another fix for the torchmetrics error is to upgrade PyTorch Lightning to 1.5.10 (there are issues with 1.6.0 at the moment).
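
Before changing versions, it can be useful to see what the container actually has installed. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8+). `installed_version` is a hypothetical helper, and the distribution names are assumed to be the PyPI names pytorch-lightning and torchmetrics.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string for `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report the two packages discussed in this issue.
    for name in ("pytorch-lightning", "torchmetrics"):
        print(name, installed_version(name))
```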

