Bug running docker #400

Closed
acamargosonosa opened this issue Nov 28, 2022 · 6 comments
Labels: bug (Something isn't working)

@acamargosonosa

Describe the bug
I am following the MedNIST tutorial at https://docs.monai.io/projects/monai-deploy-app-sdk/en/0.2.1/getting_started/tutorials/02_mednist_app.html

I was able to run every step except the Docker one:
monai-deploy run mednist_app:latest input output

I am getting this error:

(monai) sonosa@sonosa-MS-7B17:~/2022/ProjectsAI/monai$ monai-deploy run mednist_app:latest input output_docker_gpu
/home/sonosa/anaconda3/envs/monai/lib/python3.7/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: /home/sonosa/anaconda3/envs/monai/lib/python3.7/site-packages/torchvision/image.so: undefined symbol: _ZNK3c1010TensorImpl36is_contiguous_nondefault_policy_implENS_12MemoryFormatE
warn(f"Failed to load image Python extension: {e}")
Checking dependencies...
--> Verifying if "docker" is installed...

--> Verifying if "mednist_app:latest" is available...

Checking for MAP "mednist_app:latest" locally
"mednist_app:latest" found.

Reading MONAI App Package manifest...
--> Verifying if "nvidia-docker" is installed...

/opt/conda/lib/python3.8/site-packages/scipy/__init__.py:138: UserWarning: A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.23.5)
warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion} is required for this version of "
Going to initiate execution of operator LoadPILOperator
Executing operator LoadPILOperator (Process ID: 1, Operator ID: a37656f0-65c6-46ca-8fdf-e8477bab45d2)
Done performing execution of operator LoadPILOperator

Going to initiate execution of operator MedNISTClassifierOperator
Executing operator MedNISTClassifierOperator (Process ID: 1, Operator ID: c9f285dd-3b8c-4ef9-b700-cd6ac49186a0)
/root/.local/lib/python3.8/site-packages/monai/utils/deprecate_utils.py:107: FutureWarning: <class 'monai.transforms.utility.array.AddChannel'>: Class AddChannel has been deprecated since version 0.8. please use MetaTensor data type and monai.transforms.EnsureChannelFirst instead.
warn_deprecated(obj, msg, warning_category)
/root/.local/lib/python3.8/site-packages/monai/utils/type_conversion.py:134: UserWarning: The given NumPy array is not writeable, and PyTorch does not support non-writeable tensors. This means you can write to the underlying (supposedly non-writeable) NumPy array using the tensor. You may want to copy the array to protect its data or make it writeable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /opt/pytorch/pytorch/torch/csrc/utils/tensor_numpy.cpp:175.)
tensor = torch.as_tensor(tensor, **kwargs)
device found : cuda
terminate called after throwing an instance of 'c10::Error'
what(): isTuple()INTERNAL ASSERT FAILED at "/opt/pytorch/pytorch/aten/src/ATen/core/ivalue_inl.h":1397, please report a bug to PyTorch. Expected Tuple but got String
Exception raised from toTuple at /opt/pytorch/pytorch/aten/src/ATen/core/ivalue_inl.h:1397 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6c (0x7f27560b224c in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xfa (0x7f275607da66 in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x53 (0x7f27560b0233 in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #3: <unknown function> + 0x4224e29 (0x7f27a23b4e29 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x42253e9 (0x7f27a23b53e9 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #5: torch::jit::SourceRange::highlight(std::ostream&) const + 0x48 (0x7f279f3d5c58 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #6: torch::jit::ErrorReport::what() const + 0x2c3 (0x7f279f3baac3 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x9ea44f (0x7f27a873344f in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #8: <unknown function> + 0x9fa12d (0x7f27a874312d in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)

frame #45: __libc_start_main + 0xf3 (0x7f27e722d083 in /usr/lib/x86_64-linux-gnu/libc.so.6)

ERROR: MONAI Application "mednist_app:latest" failed.

=================================================================

Any help would be appreciated.
For what it's worth, I tested nvidia-docker on its own and it works fine.

@wyli transferred this issue from Project-MONAI/MONAI on Nov 28, 2022
@dbericat
Member

@zephyrie @vikashg @ericspod can you run this and confirm whether it is reproducible, or whether it is something specific to @acamargosonosa's environment?

It seems that the MedNIST example has not been updated to use the latest MONAI Core 1.x and MetaTensor.

Also, your NumPy version is higher than the one SciPy expects (detected 1.23.5; required >=1.16.5 and <1.23.0).
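
To make the version mismatch concrete, here is a minimal sketch of the range check the SciPy warning describes. The helper names `parse_version` and `in_range` are made up for illustration and are not part of SciPy or NumPy. It only compares the leading numeric components, so treat it as a rough check rather than a full version parser.

```python
import re


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"not a version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def in_range(version, minimum, upper_bound):
    """True if minimum <= version < upper_bound (hypothetical helper)."""
    return parse_version(minimum) <= parse_version(version) < parse_version(upper_bound)


# Values taken from the warning in the log: detected 1.23.5, required >=1.16.5 and <1.23.0.
print(in_range("1.23.5", "1.16.5", "1.23.0"))
```

With the values from the log the check fails, which is why SciPy emits the warning inside the container.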

@ericspod
Member

ericspod commented Nov 30, 2022

I was able to run the example without encountering this issue on Ubuntu 20.04, CUDA 11.4, PyTorch 1.13, MONAI 1.0.1. I set up my conda environment from scratch to run this test, so it's possible your environment has incompatible library versions in it.

However, I encountered another issue: highdicom is effectively a hard dependency of the library. I solved it by adding @md.env(pip_packages=["monai"]) to the specification for the App class in the example script. The root cause is that SegmentDescription in dicom_seg_writer_operator.py relies on a member of highdicom for a type annotation; if the library isn't present, there is no member to use as a type. highdicom is used elsewhere in that file for critical operations, so I don't think it can be called an optional dependency if classes can't operate without it.
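
The failure mode described above can be reproduced without highdicom. The sketch below is only an illustration: `_MissingModule` is a made-up stand-in for a lazy optional-import placeholder, not the SDK's actual importutil code. Python evaluates a return annotation when the `def` statement runs, so touching an attribute of the placeholder fails at class-definition time. Writing the annotation as a string defers that lookup.

```python
class _MissingModule:
    """Hypothetical placeholder that raises when any attribute is accessed."""

    def __init__(self, name):
        self._name = name

    def __getattr__(self, attr):
        # Only called for attributes that are not found normally.
        raise ImportError(f"import {self._name} (No module named '{self._name}').")


hd = _MissingModule("highdicom")  # pretend the optional import failed


def define_eager():
    # The annotation hd.seg.SegmentDescription is evaluated here, at definition time.
    class SegmentDescription:
        def to_segment_description(self, segment_number: int) -> hd.seg.SegmentDescription:
            ...

    return SegmentDescription


def define_deferred():
    # A string annotation is stored as text and not evaluated at definition time.
    class SegmentDescription:
        def to_segment_description(self, segment_number: int) -> "hd.seg.SegmentDescription":
            ...

    return SegmentDescription
```

Calling `define_eager()` raises `ImportError`, while `define_deferred()` returns the class. This only addresses the import-time error; code that actually uses highdicom would still need the package installed.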

CC @CPBridge @MMelQin

@ericspod added the bug label on Nov 30, 2022
@acamargosonosa
Author

Thanks a lot for the suggestions. I will test them and see if they work for my case.

@george-kuanli-peng

> I was able to run the example without encountering this issue on Ubuntu 20.04, CUDA 11.4, PyTorch 1.13, MONAI 1.0.1. I set up my conda environment from scratch to run this test, so it's possible your environment has incompatible library versions in it.
>
> However, I encountered another issue: highdicom is effectively a hard dependency of the library. I solved it by adding @md.env(pip_packages=["monai"]) to the specification for the App class in the example script. The root cause is that SegmentDescription in dicom_seg_writer_operator.py relies on a member of highdicom for a type annotation; if the library isn't present, there is no member to use as a type. highdicom is used elsewhere in that file for critical operations, so I don't think it can be called an optional dependency if classes can't operate without it.
>
> CC @CPBridge @MMelQin

I have the same issue: the missing highdicom package prevents me from importing monai.deploy. How can I work around it?

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/monai/utils/module.py", line 199, in load_submodules
    mod = import_module(name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/usr/local/lib/python3.8/dist-packages/monai/deploy/operators/__init__.py", line 38, in <module>
    from .dicom_seg_writer_operator import DICOMSegmentationWriterOperator
  File "/usr/local/lib/python3.8/dist-packages/monai/deploy/operators/dicom_seg_writer_operator.py", line 44, in <module>
    class SegmentDescription:
  File "/usr/local/lib/python3.8/dist-packages/monai/deploy/operators/dicom_seg_writer_operator.py", line 130, in SegmentDescription
    def to_segment_description(self, segment_number: int) -> hd.seg.SegmentDescription:
  File "/usr/local/lib/python3.8/dist-packages/monai/deploy/utils/importutil.py", line 262, in __getattr__
    raise self._exception
  File "/usr/local/lib/python3.8/dist-packages/monai/deploy/utils/importutil.py", line 221, in optional_import
    pkg = __import__(module)  # top level module
monai.deploy.utils.importutil.OptionalImportError: import highdicom (No module named 'highdicom').
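
Since the traceback ends in "No module named 'highdicom'", one way to narrow this down is to check which packages are importable in the environment before importing monai.deploy. The sketch below uses only the standard library. The helper name `missing_packages` is made up for illustration, and the idea that installing highdicom (for example with pip) resolves the import is an assumption drawn from the error message, not something confirmed in this thread.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# highdicom is the package named in the traceback above.
print(missing_packages(["highdicom"]))
```

An empty list means the package can be found by the import system; a non-empty list names what would need to be installed first.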

@MMelQin self-assigned this on Jun 23, 2023
@laurencejackson

@acamargosonosa, I just ran into this same error. I think the key part is: what(): isTuple()INTERNAL ASSERT FAILED at "/opt/pytorch/pytorch/aten/src/ATen/core/ivalue_inl.h":1397, please report a bug to PyTorch. Expected Tuple but got String

This suggests a problem with how PyTorch reads the TorchScript file. If you enter your MAP container and check the PyTorch version with pip show torch, you might find, as I did, that the torch version you used to train and serialise the model differs from the one installed in the MAP. In my case, I fixed it by using a newer base image, e.g. monai-deploy package app.py -t myapp:latest -m my-model.pt -b nvcr.io/nvidia/pytorch:23.06-py3. Alternatively, I think setting the torch version in the application's env decorator would also get the right version installed.
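
As a rough illustration of the check described above, the sketch below compares the torch version used for training with the one installed in the current environment, using only the standard library. The helper names `major_minor`, `installed_version` and `same_release` are made up for this example, and matching on major.minor is just a coarse heuristic, not a guarantee that a TorchScript file will load.

```python
import re
from importlib import metadata  # standard library in Python 3.8+


def major_minor(version):
    """Extract (major, minor) from a version such as '1.13.0a0+936e930'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def same_release(trained_with, installed):
    """Coarse check (an assumption): treat equal major.minor as the same release."""
    return major_minor(trained_with) == major_minor(installed)


# Run this inside the MAP container; "1.13.1" is a placeholder for your training version.
found = installed_version("torch")
if found is not None:
    print(found, same_release("1.13.1", found))
```

If the two releases differ, rebuilding the MAP on a base image whose torch matches the training environment, as described above, is the more direct fix.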

@MMelQin
Collaborator

MMelQin commented Jul 4, 2023

Thanks @laurencejackson for providing the resolutions!
PyTorch is indeed pre-installed and verified in the published base images, and the older images (tagged yy.mm) likely ship a torch version that is not compatible with the newer one used to train the model.

@MMelQin closed this as completed on Sep 22, 2023