Use a namespaced version of jemalloc #113

Closed
xhochy opened this issue Sep 29, 2021 · 4 comments · Fixed by #147

Comments

@xhochy
Member

xhochy commented Sep 29, 2021

We are currently seeing crashes when quantcore.matrix is used together with onnx and onnxruntime on macOS. Running python -c 'import onnx; import quantcore.matrix.ext.dense; import onnxruntime' fails with a bus error or segfault, whereas DYLD_INSERT_LIBRARIES=$CONDA_PREFIX/lib/libjemalloc.dylib python -c 'import onnx; import quantcore.matrix.ext.dense; import onnxruntime' passes just fine. This suggests that using an unnamespaced jemalloc is the problem here, as the following traceback shows:

collecting ... Process 6259 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x4efffffff7)
    frame #0: 0x000000013c0f3704 libjemalloc.2.dylib`je_free_default + 240
libjemalloc.2.dylib`je_free_default:
->  0x13c0f3704 <+240>: str    x20, [x8, w9, sxtw #3]
    0x13c0f3708 <+244>: ldr    w8, [x19, #0x200]
    0x13c0f370c <+248>: sub    w9, w8, #0x1              ; =0x1
    0x13c0f3710 <+252>: str    w9, [x19, #0x200]
Target 0: (python) stopped.
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x4efffffff7)
  * frame #0: 0x000000013c0f3704 libjemalloc.2.dylib`je_free_default + 240
    frame #1: 0x0000000142745010 onnxruntime_pybind11_state.so`std::__1::__hash_table<std::__1::__hash_value_type<std::__1::type_index, std::__1::vector<bool (*)(_object*, void*&), std::__1::allocator<bool (*)(_object*, void*&)> > >, std::__1::__unordered_map_hasher<std::__1::type_index, std::__1::__hash_value_type<std::__1::type_index, std::__1::vector<bool (*)(_object*, void*&), std::__1::allocator<bool (*)(_object*, void*&)> > >, pybind11::detail::type_hash, pybind11::detail::type_equal_to, true>, std::__1::__unordered_map_equal<std::__1::type_index, std::__1::__hash_value_type<std::__1::type_index, std::__1::vector<bool (*)(_object*, void*&), std::__1::allocator<bool (*)(_object*, void*&)> > >, pybind11::detail::type_equal_to, pybind11::detail::type_hash, true>, std::__1::allocator<std::__1::__hash_value_type<std::__1::type_index, std::__1::vector<bool (*)(_object*, void*&), std::__1::allocator<bool (*)(_object*, void*&)> > > > >::__rehash(unsigned long) + 76
    frame #2: 0x0000000142744dd0 onnxruntime_pybind11_state.so`std::__1::pair<std::__1::__hash_iterator<std::__1::__hash_node<std::__1::__hash_value_type<std::__1::type_index, std::__1::vector<bool (*)(_object*, void*&), std::__1::allocator<bool (*)(_object*, void*&)> > >, void*>*>, bool> std::__1::__hash_table<std::__1::__hash_value_type<std::__1::type_index, std::__1::vector<bool (*)(_object*, void*&), std::__1::allocator<bool (*)(_object*, void*&)> > >, std::__1::__unordered_map_hasher<std::__1::type_index, std::__1::__hash_value_type<std::__1::type_index, std::__1::vector<bool (*)(_object*, void*&), std::__1::allocator<bool (*)(_object*, void*&)> > >, pybind11::detail::type_hash, pybind11::detail::type_equal_to, true>, std::__1::__unordered_map_equal<std::__1::type_index, std::__1::__hash_value_type<std::__1::type_index, std::__1::vector<bool (*)(_object*, void*&), std::__1::allocator<bool (*)(_object*, void*&)> > >, pybind11::detail::type_equal_to, pybind11::detail::type_hash, true>, std::__1::allocator<std::__1::__hash_value_type<std::__1::type_index, std::__1::vector<bool (*)(_object*, void*&), std::__1::allocator<bool (*)(_object*, void*&)> > > > >::__emplace_unique_key_args<std::__1::type_index, std::__1::piecewise_construct_t const&, std::__1::tuple<std::__1::type_index const&>, std::__1::tuple<> >(std::__1::type_index const&, std::__1::piecewise_construct_t const&, std::__1::tuple<std::__1::type_index const&>&&, std::__1::tuple<>&&) + 480
    frame #3: 0x00000001427427dc onnxruntime_pybind11_state.so`pybind11::detail::generic_type::initialize(pybind11::detail::type_record const&) + 396
    frame #4: 0x0000000142751688 onnxruntime_pybind11_state.so`pybind11::class_<onnxruntime::ExecutionOrder>::class_<>(pybind11::handle, char const*) + 140
    frame #5: 0x00000001427513f8 onnxruntime_pybind11_state.so`pybind11::enum_<onnxruntime::ExecutionOrder>::enum_<>(pybind11::handle const&, char const*) + 52
    frame #6: 0x000000014272b5c8 onnxruntime_pybind11_state.so`onnxruntime::python::addObjectMethods(pybind11::module_&, onnxruntime::Environment&) + 296
    frame #7: 0x0000000142734e68 onnxruntime_pybind11_state.so`PyInit_onnxruntime_pybind11_state + 340
    frame #8: 0x000000010019f994 python`_imp_create_dynamic + 2412
    frame #9: 0x00000001000b40f8 python`cfunction_vectorcall_FASTCALL + 208
    frame #10: 0x000000010016bfd8 python`_PyEval_EvalFrameDefault + 30088

My suggestion would be to add an output to the jemalloc-feedstock as described in conda-forge/jemalloc-feedstock#23 that comes with a prefixed version of the library.
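
For anyone who wants to repeat the comparison above, here is a minimal sketch that runs the same import sequence in a fresh interpreter, once plainly and once with libjemalloc preloaded. The helper names build_env and run_imports are made up for this example; the import string and the DYLD_INSERT_LIBRARIES path are taken from the commands above, and the script assumes CONDA_PREFIX points at the environment that contains libjemalloc.dylib.

```python
import os
import subprocess
import sys

# Import sequence from the report above; the order matters for the crash.
IMPORTS = "import onnx; import quantcore.matrix.ext.dense; import onnxruntime"


def build_env(preload_jemalloc, base_env=None):
    """Return a copy of the environment, optionally preloading libjemalloc."""
    env = dict(os.environ if base_env is None else base_env)
    if preload_jemalloc:
        # Assumption: jemalloc lives in the active conda environment.
        prefix = env.get("CONDA_PREFIX", "")
        env["DYLD_INSERT_LIBRARIES"] = os.path.join(prefix, "lib", "libjemalloc.dylib")
    return env


def run_imports(preload_jemalloc):
    """Run the imports in a child interpreter and return its exit code."""
    result = subprocess.run(
        [sys.executable, "-c", IMPORTS],
        env=build_env(preload_jemalloc),
    )
    # On POSIX, a negative code means the child was killed by a signal.
    return result.returncode
```

Calling run_imports(False) and run_imports(True) in turn gives the two exit codes to compare; a negative value on the first call would correspond to the bus error or segfault described above.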

@MarcAntoineSchmidtQC
Member

I'm really out of my depth with this and I'm not sure @tbenthompson has the time to work on it. Is there someone from the engineering team who could take care of this, @xhochy?

@xhochy
Member Author

xhochy commented Sep 29, 2021

Maybe @cbourjau can do this ;)

@jtilly
Member

jtilly commented Sep 29, 2021

I'm a bit confused why this would make a difference. It is my understanding that on macOS, jemalloc is built with --with-jemalloc-prefix="je_" by default, and macOS is where we're running into the error. Why would building the conda package with this prefix change anything? Isn't jemalloc already namespaced on macOS?

@xhochy
Member Author

xhochy commented Oct 14, 2021

The issue here is that loading a default jemalloc build also replaces C++'s new and delete operators. In the above trace, a C++ deallocation that was previously handled by the system allocator ends up in jemalloc's je_free_default (see also jemalloc/jemalloc#1701). We can work around this issue by using a namespaced version of jemalloc that also doesn't override the C++ allocator by default. I'm adding such a jemalloc build as part of conda-forge/jemalloc-feedstock#25 with the package name jemalloc-local.
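
One way to check whether a given jemalloc build replaces the C++ operators is to look at the symbols it defines. The sketch below is only an illustration: the function names are hypothetical, it assumes the nm tool is on the PATH and that its default output ends each line with the symbol type and name, and it only looks for the four basic mangled names of the 64-bit Itanium C++ ABI.

```python
import subprocess

# Mangled names (64-bit Itanium C++ ABI) of the basic global operators.
CXX_ALLOC_SYMBOLS = {
    "_Znwm": "operator new(unsigned long)",
    "_Znam": "operator new[](unsigned long)",
    "_ZdlPv": "operator delete(void*)",
    "_ZdaPv": "operator delete[](void*)",
}


def defined_cxx_allocators(nm_output):
    """Return the C++ allocation operators that the nm listing defines."""
    found = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        sym_type, name = fields[-2], fields[-1]
        if sym_type == "U":
            # Undefined symbols are only referenced, not provided.
            continue
        for mangled, readable in CXX_ALLOC_SYMBOLS.items():
            # Mach-O symbol tables carry one extra leading underscore.
            if name in (mangled, "_" + mangled):
                found.add(readable)
    return sorted(found)


def nm_globals(library_path):
    """Return the external symbol listing of a library via `nm -g`."""
    result = subprocess.run(
        ["nm", "-g", library_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Under those assumptions, defined_cxx_allocators(nm_globals(path)) on the default libjemalloc.dylib would be expected to list the operators, while a build that leaves the C++ allocator alone should return an empty list.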
