This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

MxNet triggered Segmentation Fault when using together with Ray or PyTorch #15997

Open
YutingZhang opened this issue Aug 24, 2019 · 8 comments

Comments

@YutingZhang
Contributor

Platform: Ubuntu 16.04
MxNet version: 1.6.0 from pip (mxnet-cu90mkl --pre)

When MxNet is imported together with other commonly used libraries, such as Ray and PyTorch, a segmentation fault occurs.

With Ray, ray version: 0.7.3 (from pip)

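import ray


@ray.remote
def foo():
	# Importing mxnet inside the Ray worker process is enough to trigger the crash.
	import mxnet
	return 1


def main():
	ray.init()
	a = foo.remote()
	# The worker dies while storing the result, so ray.get raises RayWorkerError.
	ray.get(a)


if __name__ == '__main__':
	main()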

Error messages:

(pid=13872) Fatal Python error: Segmentation fault
(pid=13872)
(pid=13872) Stack (most recent call first):
(pid=13872)   File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
(pid=13872)   File "<frozen importlib._bootstrap_external>", line 922 in create_module
(pid=13872)   File "<frozen importlib._bootstrap>", line 571 in module_from_spec
(pid=13872)   File "<frozen importlib._bootstrap>", line 658 in _load_unlocked
(pid=13872)   File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
(pid=13872)   File "<frozen importlib._bootstrap>", line 971 in _find_and_load
(pid=13872)   File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/torch/__init__.py", line 80 in <module>
(pid=13872)   File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
(pid=13872)   File "<frozen importlib._bootstrap_external>", line 678 in exec_module
(pid=13872)   File "<frozen importlib._bootstrap>", line 665 in _load_unlocked
(pid=13872)   File "<frozen importlib._bootstrap>", line 955 in _find_and_load_unlocked
(pid=13872)   File "<frozen importlib._bootstrap>", line 971 in _find_and_load
(pid=13872)   File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/ray/pyarrow_files/pyarrow/serialization.py", line 230 in register_torch_serialization_handlers
(pid=13872)   File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/ray/worker.py", line 1191 in _initialize_serialization
(pid=13872)   File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/ray/worker.py", line 256 in get_serialization_context
(pid=13872)   File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/ray/worker.py", line 327 in store_and_register
(pid=13872)   File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/ray/worker.py", line 418 in _try_store_and_register
(pid=13872)   File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/ray/worker.py", line 398 in put_object
(pid=13872)   File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/ray/worker.py", line 880 in _store_outputs_in_object_store
(pid=13872)   File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/ray/worker.py", line 967 in _process_task
(pid=13872)   File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/ray/worker.py", line 1039 in _wait_for_and_process_task
(pid=13872)   File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/ray/worker.py", line 1088 in main_loop
(pid=13872)   File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/ray/workers/default_worker.py", line 98 in <module>
  File "<stdin>", line 13, in main
  File "/home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/ray/worker.py", line 2247, in get
    raise value
ray.exceptions.RayWorkerError: The worker died unexpectedly while executing this task.

With PyTorch, torch version: 0.4.1.post2

import torch
import mxnet

Error messages:

Segmentation fault: 11

Stack trace:
  [bt] (0) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/mxnet/libmxnet.so(+0x341a7e0) [0x7f31f470a7e0]
  [bt] (1) /lib/x86_64-linux-gnu/libc.so.6(+0x354b0) [0x7f326afa54b0]
  [bt] (2) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/onnx/onnx_cpp2py_export.cpython-36m-x86_64-linux-gnu.so(+0x6601d) [0x7f319843901d]
  [bt] (3) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/onnx/onnx_cpp2py_export.cpython-36m-x86_64-linux-gnu.so(+0x71c4f) [0x7f3198444c4f]
  [bt] (4) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/onnx/onnx_cpp2py_export.cpython-36m-x86_64-linux-gnu.so(+0x56439) [0x7f3198429439]
  [bt] (5) /home/ubuntu/anaconda3/envs/mxnet_p36/lib/python3.6/site-packages/onnx/onnx_cpp2py_export.cpython-36m-x86_64-linux-gnu.so(PyInit_onnx_cpp2py_export+0x10f) [0x7f319842ce3f]
  [bt] (6) python(_PyImport_LoadDynamicModuleWithSpec+0x185) [0x560a0d8a6b85]
  [bt] (7) python(+0x215d85) [0x560a0d8a6d85]
  [bt] (8) python(PyCFunction_Call+0x131) [0x560a0d7a3ae1]
@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Bug

@wkcn
Member

wkcn commented Aug 25, 2019

This is related to third-party memory allocation libraries, such as gperftools, tcmalloc, and jemalloc. The same problem exists in other frameworks.

Similar Issue: #8569
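
One way to check whether such an allocator is actually loaded in a given process is to inspect the process memory map on Linux. The following is a minimal sketch, not code from MXNet or Ray: the helper names `find_allocator_libs` and `loaded_allocator_libs` and the pattern list are assumptions made for illustration, and it relies on `/proc/self/maps`, which exists only on Linux.

```python
# Name fragments that suggest a third-party allocator.
# This list is an illustrative assumption and is not exhaustive.
ALLOCATOR_PATTERNS = ("tcmalloc", "jemalloc", "gperftools")


def find_allocator_libs(maps_text):
    """Return sorted paths from /proc/<pid>/maps text that look like allocator libraries."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5].strip()
        if any(pattern in path.lower() for pattern in ALLOCATOR_PATTERNS):
            found.add(path)
    return sorted(found)


def loaded_allocator_libs():
    """Inspect the current process (Linux only)."""
    with open("/proc/self/maps") as f:
        return find_allocator_libs(f.read())
```

Since the crashing combination cannot be inspected after the fact, a reasonable use is to import each library on its own in a separate interpreter and call `loaded_allocator_libs()` afterwards, then compare which allocator libraries each one brings in.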

@YutingZhang
Contributor Author

@wkcn Thanks! Good to know that. My goal is to use Ray together with MxNet. It seems Ray works fine with PyTorch.

@larroy
Contributor

larroy commented Aug 27, 2019

Can you please provide a stack trace with symbols?
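
As a first step toward a more useful trace, the failing import order can be reproduced in a fresh interpreter with the standard-library `faulthandler` enabled. The sketch below is an assumption about how one might do this, not part of the original report: `probe_imports` and `describe` are hypothetical helper names. Note that `faulthandler` prints only Python-level frames; native frames with symbols would still need a debugger such as gdb and a build with debug symbols.

```python
import signal
import subprocess
import sys


def probe_imports(modules, timeout=120):
    """Import `modules` in order in a fresh interpreter; return (returncode, stderr)."""
    code = "".join("import {}\n".format(name) for name in modules)
    proc = subprocess.run(
        # -X faulthandler dumps the Python traceback on SIGSEGV and similar signals.
        [sys.executable, "-X", "faulthandler", "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        try:
            return "killed by {}".format(signal.Signals(-returncode).name)
        except ValueError:
            return "killed by signal {}".format(-returncode)
    return "exited with status {}".format(returncode)
```

For the case reported here, one could compare `probe_imports(["torch", "mxnet"])` with `probe_imports(["mxnet", "torch"])` and attach the captured stderr of the failing order to the issue.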

@leleamol
Contributor

leleamol commented Aug 27, 2019

@mxnet-label-bot add [Bug, Memory, Pending Requester Info]

@WilliamOnVoyage

Is there any update or guidance on how to solve these issues? Does this mean that mxnet and pytorch cannot be imported at the same time?

@szha
Member

szha commented Jan 29, 2021

This seems to be solved on the master branch / 2.0:

▶ ipython
Python 3.7.9 (default, Nov 20 2020, 23:58:42)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.9.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import torch

In [2]: import mxnet

In [3]: torch.__version__
Out[3]: '1.7.1'

In [4]: mxnet.__version__
Out[4]: '2.0.0'

@szha added the "v1.x" (Targeting v1.x branch) label on Jan 29, 2021