Segmentation faults with numpy 1.21 package #627

Open
bkpoon opened this issue Jun 23, 2021 · 17 comments

Comments

@bkpoon
Member

bkpoon commented Jun 23, 2021

The recently released 1.21 version of numpy will cause segmentation faults with the cctbx-base conda package. This affects Python versions 3.7 through 3.9. Please use version 1.20 until this issue is resolved. Python 3.6 is using version 1.19.

Anthchirp added a commit to cctbx/dxtbx that referenced this issue Jun 23, 2021
@bkpoon bkpoon pinned this issue Jun 23, 2021
@Anthchirp
Member

Anthchirp commented Jun 23, 2021

Can you add a numpy<1.21 constraint to the conda-forge package and yank the unconstrained release?

This currently is breaking builds all over the place.

Anthchirp added a commit to dials/dials that referenced this issue Jun 23, 2021
@bkpoon
Member Author

bkpoon commented Jun 23, 2021

I can create a new build that adds the constraint, but we don't have to remove the old one. I'm trying to determine the underlying issue.

conda-forge/cctbx-base-feedstock#26

@ndevenish
Contributor

Can always update it again once the problem is discovered/resolved

@bkpoon
Member Author

bkpoon commented Jun 23, 2021

The new build that does not update to numpy 1.21 should be available. You may need to wait for the CDN to update for the package to be widely available.

dwpaley added a commit that referenced this issue Jun 25, 2021
Also unpin wxpython, no longer needed after 79108f4
@ndevenish
Contributor

Do we know what the origin of this issue is, and is there any prospect for a fix?

@bkpoon
Member Author

bkpoon commented Jul 7, 2021

We have seen segmentation faults with numpy before. It should be related to an initialization. Not everything that uses numpy causes a segmentation fault. I will narrow down which additional parts need an initialization.

russell-taylor pushed a commit to ReliaSolve/cctbx_project that referenced this issue Jul 13, 2021
Also unpin wxpython, no longer needed after 79108f4
@Anthchirp
Member

Did we find out why this happened?

@bkpoon
Member Author

bkpoon commented Sep 13, 2021

Have not had time to look further into this yet.

@dermen
Contributor

dermen commented Oct 14, 2021

Just adding to the story, a simple segfault reproducer with a numpy 1.21 build is:

from cctbx import crystal
crystal.symmetry("79,79,38,90,90,90", "P43212")
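Because a segfault takes the whole interpreter down with it, it can help to run the reproducer in a child process and inspect the exit status. The following is a minimal sketch, not something from the cctbx code base: the helper names `run_isolated` and `died_from_segfault` are hypothetical, and the signal check assumes a POSIX system, where a negative return code means the child was killed by that signal. The standard `-X faulthandler` interpreter option is passed so that a Python-level traceback is written to stderr if the child crashes.

```python
import signal
import subprocess
import sys

# The reproducer from this thread, kept as a string so it runs in a child process.
REPRODUCER = (
    "from cctbx import crystal\n"
    'crystal.symmetry("79,79,38,90,90,90", "P43212")\n'
)


def run_isolated(code: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run `code` in a fresh interpreter so a crash cannot kill the caller."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def died_from_segfault(returncode: int) -> bool:
    """POSIX assumption: a negative return code is the killing signal, negated."""
    return returncode == -signal.SIGSEGV
```

With cctbx installed, `died_from_segfault(run_isolated(REPRODUCER).returncode)` would be expected to be true in an affected environment and false otherwise; any faulthandler output ends up in the `stderr` attribute of the result.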

@dwpaley
Contributor

dwpaley commented Nov 23, 2021

Just flagging this boost discussion: boostorg/python#376 which I assume is about the same issue.

Here's an excerpt from a stack trace when triggering the segfault via Derek's reproducer:

stack trace
Program received signal SIGSEGV, Segmentation fault.
PyDict_GetItemWithError () at /tmp/build/80754af9/python_1627392990942/work/Objects/dictobject.c:1371
1371	/tmp/build/80754af9/python_1627392990942/work/Objects/dictobject.c: No such file or directory.
(gdb) bt
#0  PyDict_GetItemWithError () at /tmp/build/80754af9/python_1627392990942/work/Objects/dictobject.c:1371
#1  0x00007f331e7f2cef in PyArray_GetCastingImpl ()
   from /dev/shm/dwpaley/test/conda_base/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so
#2  0x00007f331e7f31f8 in PyArray_GetCastSafety ()
   from /dev/shm/dwpaley/test/conda_base/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so
#3  0x00007f331e89284b in PyArray_EquivTypes.part.6 ()
   from /dev/shm/dwpaley/test/conda_base/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so
#4  0x00007f331eb9d472 in boost::python::numpy::equivalent (a=..., b=...) at /dev/shm/dwpaley/test/modules/boost/libs/python/src/numpy/dtype.cpp:125
#5  0x00007f331eb9dfed in boost::python::numpy::(anonymous namespace)::array_scalar_converter<int>::convertible (obj=0x7f3321983130)
    at /dev/shm/dwpaley/test/modules/boost/libs/python/src/numpy/dtype.cpp:162
#6  0x00007f3320df97e8 in boost::python::converter::rvalue_from_python_stage1 (source=0x7f3321983130, converters=...)
    at /dev/shm/dwpaley/test/modules/boost/libs/python/src/converter/from_python.cpp:54
#7  0x00007f3320f0d402 in boost::python::converter::arg_rvalue_from_python<int>::arg_rvalue_from_python (this=0x7ffd7f289e30, obj=0x7f3321983130)
    at /dev/shm/dwpaley/test/modules/boost/boost/python/converter/arg_from_python.hpp:297
#8  0x00007f3320f0b47b in boost::python::arg_from_python<int>::arg_from_python (this=0x7ffd7f289e30, source=0x7f3321983130)
    at /dev/shm/dwpaley/test/modules/boost/boost/python/arg_from_python.hpp:70
#9  0x00007f331a0f166c in boost::python::detail::caller_arity<2u>::impl<void (*)(_object*, int), boost::python::default_call_policies, boost::mpl::vector3<void, _object*, int> >::operator() (this=0x557576f5fef8, args_=0x7f33198337d0)
    at /dev/shm/dwpaley/test/modules/boost/boost/preprocessor/iteration/detail/local.hpp:37
#10 0x00007f331a0f087b in boost::python::objects::caller_py_function_impl<boost::python::detail::caller<void (*)(_object*, int), boost::python::default_call_policies, boost::mpl::vector3<void, _object*, int> > >::operator() (this=0x557576f5fef0, args=0x7f33198337d0, kw=0x0)
    at /dev/shm/dwpaley/test/modules/boost/boost/python/object/py_function.hpp:38
#11 0x00007f3320e083cb in boost::python::objects::py_function::operator() (this=0x557576f5ff20, args=0x7f33198337d0, kw=0x0)
    at /dev/shm/dwpaley/test/modules/boost/boost/python/object/py_function.hpp:147
#12 0x00007f3320e06159 in boost::python::objects::function::call (this=0x557576f5ff10, args=0x7f33198337d0, keywords=0x0)
    at /dev/shm/dwpaley/test/modules/boost/libs/python/src/object/function.cpp:221
#13 0x00007f3320e076df in boost::python::objects::(anonymous namespace)::bind_return::operator() (this=0x7ffd7f28a120)
    at /dev/shm/dwpaley/test/modules/boost/libs/python/src/object/function.cpp:581
#14 0x00007f3320e080c4 in boost::detail::function::void_function_ref_invoker0<boost::python::objects::(anonymous namespace)::bind_return, void>::invoke (
    function_obj_ptr=...) at /dev/shm/dwpaley/test/modules/boost/boost/function/function_template.hpp:193
#15 0x00007f3320e1f76e in boost::function0<void>::operator() (this=0x7ffd7f28a0d0)
    at /dev/shm/dwpaley/test/modules/boost/boost/function/function_template.hpp:763
#16 0x00007f3320e1ef4c in boost::python::handle_exception_impl (f=...) at /dev/shm/dwpaley/test/modules/boost/libs/python/src/errors.cpp:25
#17 0x00007f3320e07d50 in boost::python::handle_exception<boost::python::objects::(anonymous namespace)::bind_return> (f=...)
    at /dev/shm/dwpaley/test/modules/boost/boost/python/errors.hpp:29
#18 0x00007f3320e077ba in boost::python::objects::function_call (func=0x557576f5ff10, args=0x7f33198337d0, kw=0x0)
    at /dev/shm/dwpaley/test/modules/boost/libs/python/src/object/function.cpp:622
#19 0x0000557574a1c13f in _PyObject_FastCallDict () at /tmp/build/80754af9/python_1627392990942/work/Objects/call.c:125
#20 0x0000557574a31041 in _PyObject_Call_Prepend (kwargs=0x0, args=0x7f3328d12790, obj=<optimized out>, callable=0x557576f5ff10)
    at /tmp/build/80754af9/python_1627392990942/work/Objects/call.c:906

@bkpoon
Member Author

bkpoon commented Nov 23, 2021

Great! I was not able to reproduce Derek's crash, but I was able to find another simple way of causing the segfault. There is also an earlier discussion here.

epics-base/pvaPy#63

@dwpaley
Contributor

dwpaley commented Nov 23, 2021

For me, using Derek's reproducer, I can avoid crashing with a couple different changes in sgtbx/boost_python/symbols.cpp:

As written now, we have:

struct space_group_symbols_wrappers
{
  typedef space_group_symbols w_t;

  static void
  wrap()
  {
    using namespace boost::python;
    typedef return_value_policy<copy_const_reference> ccr;
    class_<w_t>("space_group_symbols", no_init)
      .def(init<std::string const&, optional<std::string const&> >((
        arg("symbol"),
        arg("table_id")="")))
      .def(init<int, optional<std::string const&, std::string const&> >((
        arg("space_group_number"),
        arg("extension")="",
        arg("table_id")="")))
      .def("number", &w_t::number)
[...];
}};

If I comment out the second constructor, or if I remove optional<...> and the default args from the second constructor so that it looks like this:

class_<w_t>("space_group_symbols", no_init)
  .def(init<std::string const&, optional<std::string const&> >((
    arg("symbol"),
    arg("table_id")="")))
  .def(init<int, std::string const&, std::string const& >((
    arg("space_group_number"),
    arg("extension"),
    arg("table_id"))))

then no crash. So clearly it's something about the overload resolution for the space_group_symbols class as kinda suggested by the boost issue I linked before.

Pretty weird that numpy would have anything to do with it! I'm also curious how widespread this is: both the pattern of mixing overloaded constructors with boost optional arguments, and whether they all cause segfaults now.
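One possible reading of the observation above, sketched as a toy model in Python. This is not Boost.Python's actual implementation: the `Overload` class, the `resolve` function, the labels `by_number` and `by_symbol`, and the order in which overloads are tried are all assumptions made for illustration. The idea is that an overload is only probed when the argument count fits, so an `int` constructor with optional arguments accepts a one-argument call and its `int` convertibility check runs on the string, while the same constructor without optional arguments is skipped on arity alone.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

# A probe is a (name, convertibility check) pair; both probes are stand-ins.
Probe = Tuple[str, Callable[[Any], bool]]

INT: Probe = ("int", lambda obj: isinstance(obj, int))
STR: Probe = ("str", lambda obj: isinstance(obj, str))


class Overload:
    """Hypothetical record of one constructor signature."""

    def __init__(self, label: str, probes: Sequence[Probe], min_args: int) -> None:
        self.label = label
        self.probes = list(probes)
        self.min_args = min_args  # fewer required args when some are optional


def resolve(
    overloads: Sequence[Overload],
    args: Tuple[Any, ...],
    log: List[Tuple[str, str]],
) -> Optional[str]:
    """Return the label of the first overload whose probes all accept `args`."""
    for overload in overloads:
        # Arity is checked first; a mismatch means no probe runs at all.
        if not (overload.min_args <= len(args) <= len(overload.probes)):
            continue
        matched = True
        for (kind, check), arg in zip(overload.probes, args):
            log.append((overload.label, kind))  # record every probe that runs
            if not check(arg):
                matched = False
                break
        if matched:
            return overload.label
    return None


# Second constructor with optional arguments: callable with 1 to 3 arguments.
WITH_OPTIONAL = [
    Overload("by_number", [INT, STR, STR], min_args=1),
    Overload("by_symbol", [STR, STR], min_args=1),
]

# Second constructor without optional arguments: exactly 3 arguments required.
WITHOUT_OPTIONAL = [
    Overload("by_number", [INT, STR, STR], min_args=3),
    Overload("by_symbol", [STR, STR], min_args=1),
]
```

In this toy, resolving the one-argument call `("P43212",)` against `WITH_OPTIONAL` logs an `int` probe on the string before `by_symbol` is selected, whereas against `WITHOUT_OPTIONAL` the `int` probe never runs. That matches the `array_scalar_converter<int>::convertible` frame in the stack trace, but whether it is the real mechanism is an assumption.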

@dwpaley
Contributor

dwpaley commented Nov 30, 2021

I added a comment on the Boost issue I mentioned above (boostorg/python#376) but not sure if it gets us any closer to a fix. The issue started with changes to numpy type casting here: numpy/numpy#17401

@dwpaley
Contributor

dwpaley commented Dec 2, 2021

It appears to be a numpy bug, and I describe a possible fix here: boostorg/python#376. The problem involved dereferencing a null pointer when checking convertibility of types (like the boost_python ones) that haven't implemented the new numpy casting implementation.

I'll open a numpy PR which I assume will take a while to get into a release. It's possible to build a custom numpy from sources and we can discuss if necessary, but it seems like our stuff is stable for now with the pin to 1.20...

@dwpaley
Contributor

dwpaley commented Dec 21, 2021

This appears to be fixed as of numpy 1.21.5, which is now on conda-forge :)
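For environments that cannot rely on the conda pin, a small guard could flag the numpy releases discussed in this thread. This is a sketch under stated assumptions: the function names `parse_release` and `numpy_is_affected` are hypothetical, the affected range is taken to be 1.21.0 up to but not including 1.21.5 based only on the comments here, and pre-release suffixes such as `rc1` are ignored rather than parsed.

```python
import re
from typing import Tuple

# Range assumed from this thread: broken from 1.21.0, fixed as of 1.21.5.
FIRST_AFFECTED = (1, 21, 0)
FIRST_FIXED = (1, 21, 5)


def parse_release(version: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from a version string; patch defaults to 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def numpy_is_affected(version: str) -> bool:
    """True if `version` falls in the range assumed to trigger the segfault."""
    return FIRST_AFFECTED <= parse_release(version) < FIRST_FIXED
```

A caller would typically pass `numpy.__version__` and warn or exit early when the result is true, before importing any Boost.Python extension modules.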

@bkpoon
Member Author

bkpoon commented Dec 21, 2021

Yeah, I've been following the discussion. But it looks like there should still be an update to Boost.Python. Thanks for getting the ball rolling!

I should be able to add Python 3.10 checks to Azure Pipelines now that there is a way forward.

And I can remove the numpy version limit in the conda package for the next release.

dwpaley added a commit that referenced this issue Jan 4, 2022
This reflects the resolution of #627 as discussed in several other issues
and PRs:

- boostorg/python#376
- numpy/numpy#20507
- numpy/numpy#20616

Leaving this "bibliography" here because the fix in numpy PR 20616 is
considered temporary; thus someday we may have to revisit this to fix the
underlying bug in boost::python.

Co-authored-by: Billy Poon <bkpoon@lbl.gov>
@bkpoon
Member Author

bkpoon commented Jan 4, 2022

Unpinning this since there is a fix in numpy 1.21.5 and later. The nightly package builds should find any future incompatibilities (conda-forge pinnings during build and latest packages in the tests).

@bkpoon bkpoon unpinned this issue Jan 4, 2022
dwpaley added a commit that referenced this issue Jan 5, 2022