fatal runtime error: failed to initiate panic, error 5 #766

Closed
cswinter opened this issue Feb 13, 2020 · 11 comments

@cswinter

🐛 Bug Reports

Pip packages built with PyO3 and maturin fail to produce full backtraces on panic, which makes it extremely difficult to debug crashes in published packages.
Only the initial stack frame in the backtrace displays a file and line number; both are missing on all other frames.
Notably, the following error is printed as well, which seems to indicate a serious issue with the panic handler: fatal runtime error: failed to initiate panic, error 5
This seems vaguely related: rust-lang/rust#35599

Under some conditions, the backtrace also displays <unknown> instead of any symbols, but I don't have a minimal reproduction for this, and it might well be caused by a different issue.

See https://github.com/cswinter/pyo3-backtrace-repro for a minimal reproduction.
Example output:

$ maturin build --rustc-extra-args="-C debuginfo=2"
...
$ pip uninstall pyo3-backtrace-repro
...
$ pip install target/wheels/pyo3_backtrace_repro-0.1.0-cp38-cp38-macosx_10_7_x86_64.whl
...
$ RUST_BACKTRACE=1 python
Python 3.8.0 (default, Nov  6 2019, 15:49:01)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from pyo3_backtrace_repro import oh_noes
>>> oh_noes()
thread '<unnamed>' panicked at 'panic', src/lib.rs:10:5
stack backtrace:
   0: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
   1: core::fmt::write
   2: std::io::Write::write_fmt
   3: std::panicking::default_hook::{{closure}}
   4: std::panicking::default_hook
   5: std::panicking::rust_panic_with_hook
   6: std::panicking::begin_panic
   7: pyo3_backtrace_repro::foo
   8: pyo3_backtrace_repro::oh_noes
   9: pyo3_backtrace_repro::__pyo3_get_function_oh_noes::__wrap
  10: cfunction_call_varargs
  11: _PyObject_MakeTpCall
  12: call_function
  13: _PyEval_EvalFrameDefault
  14: _PyEval_EvalCodeWithName
  15: PyRun_InteractiveOneObjectEx
  16: PyRun_InteractiveLoopFlags
  17: PyRun_AnyFileExFlags
  18: Py_RunMain
  19: pymain_main
  20: main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: failed to initiate panic, error 5
[1]    80860 abort      RUST_BACKTRACE=1 python

🌍 Environment

OS: macOS 10.14.6. Also observed under Ubuntu.
PyO3 version: 0.9.0-alpha.1, 0.8.3
maturin version: 0.7.7
Python: Python 3.8.0 via conda. Also observed under virtualenv.

@cswinter
Author

If anyone knows what might cause this, or how to debug it or find a workaround, I would love to know! This bug makes it impossible for me to use PyO3/maturin for any serious project, which is a shame since I've found PyO3/maturin to be incredibly useful in all other respects.

@davidhewitt
Member

davidhewitt commented Feb 16, 2020

@cswinter many thanks for the bug report and the repro. I plan to take a look at this eventually but can't make any promises about when exactly.

One possibility is that the panic propagates through the C / Python FFI, which is technically undefined behavior. See #492.

@cswinter
Author

@davidhewitt Thanks for your response, this makes a lot more sense to me now! Inserting catch_unwind should be a workable solution. I've tested it on my repro and it does eliminate the runtime error.

(Curiously, when I ran my repro under WSL, the files and line numbers were always displayed, even without the catch_unwind and despite the runtime error.)
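
For readers following along, the catch_unwind workaround mentioned above might look roughly like the sketch below. It is not the code from the repro or from PyO3: `foo` is a stand-in for the panicking function, and `guard`, `oh_noes_guarded`, and the `-1` error code are hypothetical names and conventions chosen for illustration. It also assumes the default `panic = "unwind"` setting, since catch_unwind cannot catch a panic when the crate is built with `panic = "abort"`.

```rust
use std::panic::{self, AssertUnwindSafe};

// Hypothetical stand-in for the function in the repro that panics.
fn foo() -> i32 {
    panic!("panic");
}

/// Runs `f` and converts a panic into an error code, so that the panic
/// never unwinds past this point into C / Python frames.
/// The `-1` sentinel is an assumption made for this sketch.
fn guard<F: FnOnce() -> i32>(f: F) -> i32 {
    match panic::catch_unwind(AssertUnwindSafe(f)) {
        Ok(value) => value,
        Err(_) => -1,
    }
}

/// Hypothetical C-ABI entry point: everything that may panic runs inside `guard`.
pub extern "C" fn oh_noes_guarded() -> i32 {
    guard(foo)
}
```

The panic hook still runs, so the panic message and backtrace are printed, but the unwind stops inside Rust code instead of crossing the FFI boundary.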

@programmerjake
Contributor

> (Curiously, when I ran my repro under WSL the files and line numbers always display even without the catch_unwind despite the runtime error.)

That's because Rust currently uses the same unwind mechanism as C/C++ on x86_64 Linux. It's still undefined behavior because Rust doesn't define that it's supposed to work, it just happens to work anyway.

@davidhewitt
Member

Thanks, good to know that addressing #492 will fix this!

@cswinter
Author

cswinter commented Feb 19, 2020

Unfortunately, while catch_unwind prevents the runtime error, it does not actually fix the missing line numbers (at least on macOS). I have been debugging the problem further and managed to narrow it down by reproducing the same failure with a minimal cffi setup that uses neither PyO3 nor maturin. The root of the problem seems to be that (at least on macOS) the library placed directly in target/debug is not accompanied by its debug symbols. Cargo produces the following build artifacts:

target
└── debug
    ├── build
    ├── deps
    │   ├── libpython_cffi.dylib
    │   ├── libpython_cffi.dylib.dSYM
    │   │   └── Contents
    │   │       └── Resources
    │   │           └── DWARF
    │   │               └── libpython_cffi.dylib
    │   └── python_cffi.d
    ├── examples
    ├── incremental
    │   ...
    ├── libpython_cffi.d
    └── libpython_cffi.dylib

When loading target/debug/libpython_cffi.dylib, backtraces will fail to display properly.
However, backtraces work properly when using target/debug/deps/libpython_cffi.dylib. Specifically, the libpython_cffi.dylib.dSYM/Contents/Resources/DWARF/libpython_cffi.dylib directory structure and file need to be present.
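
As a rough illustration of the layout described above, the following sketch computes where the DWARF file is expected to sit relative to a given dylib and checks whether it exists. The helper names `expected_dwarf_path` and `has_debug_symbols` are hypothetical, and the path convention is taken only from the directory listing in this comment, not from any Cargo or macOS documentation.

```rust
use std::path::{Path, PathBuf};

/// For `<dir>/<name>`, returns `<dir>/<name>.dSYM/Contents/Resources/DWARF/<name>`,
/// mirroring the layout shown in the listing above.
fn expected_dwarf_path(dylib: &Path) -> Option<PathBuf> {
    let file_name = dylib.file_name()?;
    let mut bundle = file_name.to_os_string();
    bundle.push(".dSYM");
    Some(
        dylib
            .with_file_name(bundle)
            .join("Contents")
            .join("Resources")
            .join("DWARF")
            .join(file_name),
    )
}

/// Returns true if the DWARF file exists next to the given dylib.
fn has_debug_symbols(dylib: &Path) -> bool {
    expected_dwarf_path(dylib).map_or(false, |p| p.is_file())
}
```

With the artifacts listed above, this check would be expected to succeed for target/debug/deps/libpython_cffi.dylib and fail for target/debug/libpython_cffi.dylib.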

I am still working on proving this out further by constructing a pip wheel with the symbol file included. So far it is still crashing on pip install (I think I managed to adjust the RECORD properly, but I must be missing something else). Provided people agree that including those files in the wheel is the correct solution, I could use some help with making the necessary changes to maturin.

I suspect we are failing to include similar symbol files under Linux but this remains to be tested.

@cswinter
Author

I figured out how to create the pip wheel, and after installation the site-packages directory seems to have the right structure, but backtraces are still broken, so there must be another missing piece.

@cswinter
Author

The example I have where backtraces work loads the dll with cffi:

from cffi import FFI
ffi = FFI()
lib = ffi.dlopen(...)

How does the equivalent code in PyO3/maturin work?

@cswinter
Author

Not sure what I was doing wrong before, but after copying the dylib and DWARF folder into the site-packages directory I have backtraces working correctly for PyO3/maturin on macOS. I will now try to reproduce the issue I saw under Linux, which I'm now thinking might actually be a separate problem.
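
The manual copy step described above could be scripted along these lines. This is only a sketch using the Rust standard library: `copy_recursively` and `install_with_symbols` are hypothetical helpers, the destination directory is whatever site-packages path applies locally, and the exact file names PyO3/maturin expect inside site-packages are not confirmed here.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively copies `src` (a file or a directory) to `dst`.
fn copy_recursively(src: &Path, dst: &Path) -> io::Result<()> {
    if src.is_dir() {
        fs::create_dir_all(dst)?;
        for entry in fs::read_dir(src)? {
            let entry = entry?;
            copy_recursively(&entry.path(), &dst.join(entry.file_name()))?;
        }
    } else {
        fs::copy(src, dst)?;
    }
    Ok(())
}

/// Copies `dylib` and, if present, its sibling `<name>.dSYM` bundle into `dest_dir`.
fn install_with_symbols(dylib: &Path, dest_dir: &Path) -> io::Result<()> {
    let name = dylib
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;
    fs::create_dir_all(dest_dir)?;
    fs::copy(dylib, dest_dir.join(name))?;

    // The bundle name follows the layout from the earlier listing: `<name>.dSYM`.
    let mut bundle_name = name.to_os_string();
    bundle_name.push(".dSYM");
    let bundle = dylib.with_file_name(&bundle_name);
    if bundle.is_dir() {
        copy_recursively(&bundle, &dest_dir.join(&bundle_name))?;
    }
    Ok(())
}
```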

@cswinter
Author

So turns out the original problem I ran into was actually caused by running out of file descriptors and is unrelated to PyO3. There are already separate issues to track the unwinding and missing DWARF files on Mac so this issue can be closed now.

@davidhewitt
Member

Awesome. Thanks so much for all the detailed investigation!
