feat: Improved FPE monitoring (#2157)
The overall goal is not to fail a job when an FPE occurs. Instead, the signal handler masks that FPE type, takes a stack trace, and resumes execution. The sequencer can then unmask the type again before the next algorithm. I implemented the resume mechanism based on a discussion with @stephenswat, and for now only for x86_64. The monitor keeps the stack traces, accumulates them across algorithms, events, and threads, and deduplicates them. It can also print a summary at the end that looks something like this:

```
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/pagessin/dev/acts, configfile: pytest.ini, testpaths: Examples/Python/tests
plugins: pytest_check-1.0.4, rerunfailures-10.2, xdist-3.2.1
collected 227 items / 226 deselected / 1 selected

Examples/Python/tests/test_fpe.py 18:10:42 Sequencer INFO Create Sequencer with -1 threads
18:10:42 Sequencer INFO Add Algorithm 'FpeMaker'
18:10:42 Sequencer INFO Processing events [0, 30)
18:10:42 Sequencer INFO Starting event loop with -1 threads
18:10:42 Sequencer INFO 0 context decorators
18:10:42 Sequencer INFO 1 sequence elements
18:10:42 Sequencer INFO 0 readers
18:10:42 Sequencer INFO 1 algorithms
18:10:42 Sequencer INFO 0 writers
SIGACTION floating point divide by zero
18:10:42 Sequencer INFO finished event 0
SIGACTION floating point overflow
18:10:42 Sequencer INFO finished event 1
SIGACTION floating point invalid operation
18:10:42 Sequencer INFO finished event 2
SIGACTION floating point divide by zero
SIGACTION floating point divide by zero
SIGACTION floating point overflow18:10:42 Sequencer INFO finished event 15
18:10:42 Sequencer INFO finished event 7
SIGACTION floating point invalid operation
18:10:42 Sequencer INFO finished event 5
SIGACTION floating point divide by zero
SIGACTION floating point overflow
18:10:42 Sequencer INFO finished event 18
SIGACTION floating point invalid operation
SIGACTION floating point divide by zero
18:10:43 Sequencer INFO finished event 26
18:10:43 Sequencer INFO finished event 6
SIGACTION floating point overflow
18:10:43 Sequencer INFO finished event 16
18:10:43 Sequencer INFO finished event 22
SIGACTION floating point overflow
SIGACTION floating point overflow18:10:43 Sequencer INFO finished event 28
SIGACTION floating point invalid operation
18:10:43 Sequencer INFO finished event 19
18:10:43 Sequencer INFO finished event 23
SIGACTION floating point invalid operation
18:10:43 Sequencer INFO finished event 8
SIGACTION floating point divide by zero
18:10:43 Sequencer INFO finished event 27
SIGACTION floating point invalid operation
18:10:43 Sequencer INFO finished event 17
SIGACTION floating point invalid operation
SIGACTION floating point invalid operation
18:10:43 Sequencer INFO finished event 20
SIGACTION floating point divide by zero
SIGACTION floating point divide by zero
SIGACTION floating point divide by zero
18:10:43 Sequencer INFO finished event 9
18:10:43 Sequencer INFO finished event 24
SIGACTION floating point overflow
SIGACTION floating point invalid operation
SIGACTION floating point overflow
SIGACTION floating point overflow
SIGACTION floating point overflow
18:10:43 Sequencer INFO finished event 25
SIGACTION floating point invalid operation
18:10:43 Sequencer INFO finished event 3
SIGACTION floating point divide by zero
18:10:43 Sequencer INFO finished event 13
18:10:43 Sequencer INFO finished event 29
18:10:43 Sequencer INFO finished event 21
18:10:44 Sequencer INFO finished event 10
18:10:44 Sequencer INFO finished event 14
18:10:44 Sequencer INFO finished event 4
18:10:44 Sequencer INFO finished event 12
18:10:44 Sequencer INFO finished event 11
FPE result summary:
- INTDIV: 0
- INTOVF: 0
- FLTDIV: 10
- FLTOVF: 10
- FLTUND: 0
- FLTRES: 0
- FLTINV: 10
- FLTSUB: 0
Stack traces:
- FLTDIV: (10 times)
 0# pybind11::cpp_function::initialize<pybind11_init_ActsPythonBindings(pybind11::module_&)::{lambda()#9}, void, , pybind11::name, pybind11::scope, pybind11::sibling>(pybind11_init_ActsPythonBindings(pybind11::module_&)::{lambda()#9}&&, void (*)(), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) at /home/pagessin/dev/acts/build/_deps/pybind11-src/include/pybind11/pybind11.h:224
 1# pybind11::cpp_function::dispatcher(_object*, _object*, _object*) at /home/pagessin/dev/acts/build/_deps/pybind11-src/include/pybind11/pybind11.h:929
 2# PyCFunction_Call in /home/pagessin/dev/acts/bindvenv/bin/python3
 3# _PyObject_MakeTpCall in /home/pagessin/dev/acts/bindvenv/bin/python3
 4# _PyEval_EvalFrameDefault in /home/pagessin/dev/acts/bindvenv/bin/python3
 5# _PyFunction_Vectorcall in /home/pagessin/dev/acts/bindvenv/bin/python3
 6# 0x000000000050B23C in /home/pagessin/dev/acts/bindvenv/bin/python3
 7# PyObject_CallObject in /home/pagessin/dev/acts/bindvenv/bin/python3
 8# pybind11::object pybind11::detail::object_api<pybind11::handle>::operator()<(pybind11::return_value_policy)1, ActsExamples::AlgorithmContext const&>(ActsExamples::AlgorithmContext const&) const at /home/pagessin/dev/acts/build/_deps/pybind11-src/include/pybind11/cast.h:1631
 9# ActsExamples::IAlgorithm::internalExecute(ActsExamples::AlgorithmContext const&) at /home/pagessin/dev/acts/Examples/Framework/include/ActsExamples/Framework/IAlgorithm.hpp:51
10# ActsExamples::Sequencer::run()::{lambda()#1}::operator()() const::{lambda(tbb::blocked_range<unsigned long> const&)#1}::operator()(tbb::blocked_range<unsigned long> const&) const at /home/pagessin/dev/acts/Examples/Framework/src/Framework/Sequencer.cpp:455
11# tbb::interface9::internal::start_for<tbb::blocked_range<unsigned long>, ActsExamples::Sequencer::run()::{lambda()#1}::operator()() const::{lambda(tbb::blocked_range<unsigned long> const&)#1}, tbb::auto_partitioner const>::execute() at /usr/include/tbb/parallel_for.h:144
12# 0x00007F431EE37545 in /usr/lib/x86_64-linux-gnu/libtbb.so.2
13# 0x00007F431EE3780F in /usr/lib/x86_64-linux-gnu/libtbb.so.2
14# 0x00007F431EE34B68 in /usr/lib/x86_64-linux-gnu/libtbb.so.2
15# ActsExamples::Sequencer::run()::{lambda()#1}::operator()() const at /home/pagessin/dev/acts/Examples/Framework/src/Framework/Sequencer.cpp:418
- FLTOVF: (10 times)
 0# pybind11::cpp_function::initialize<pybind11_init_ActsPythonBindings(pybind11::module_&)::{lambda()#10}, void, , pybind11::name, pybind11::scope, pybind11::sibling>(pybind11_init_ActsPythonBindings(pybind11::module_&)::{lambda()#10}&&, void (*)(), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) at /home/pagessin/dev/acts/build/_deps/pybind11-src/include/pybind11/pybind11.h:224
 1# pybind11::cpp_function::dispatcher(_object*, _object*, _object*) at /home/pagessin/dev/acts/build/_deps/pybind11-src/include/pybind11/pybind11.h:929
 2# PyCFunction_Call in /home/pagessin/dev/acts/bindvenv/bin/python3
 3# _PyObject_MakeTpCall in /home/pagessin/dev/acts/bindvenv/bin/python3
 4# _PyEval_EvalFrameDefault in /home/pagessin/dev/acts/bindvenv/bin/python3
 5# _PyFunction_Vectorcall in /home/pagessin/dev/acts/bindvenv/bin/python3
 6# 0x000000000050B23C in /home/pagessin/dev/acts/bindvenv/bin/python3
 7# PyObject_CallObject in /home/pagessin/dev/acts/bindvenv/bin/python3
 8# pybind11::object pybind11::detail::object_api<pybind11::handle>::operator()<(pybind11::return_value_policy)1, ActsExamples::AlgorithmContext const&>(ActsExamples::AlgorithmContext const&) const at /home/pagessin/dev/acts/build/_deps/pybind11-src/include/pybind11/cast.h:1631
 9# ActsExamples::IAlgorithm::internalExecute(ActsExamples::AlgorithmContext const&) at /home/pagessin/dev/acts/Examples/Framework/include/ActsExamples/Framework/IAlgorithm.hpp:51
10# ActsExamples::Sequencer::run()::{lambda()#1}::operator()() const::{lambda(tbb::blocked_range<unsigned long> const&)#1}::operator()(tbb::blocked_range<unsigned long> const&) const at /home/pagessin/dev/acts/Examples/Framework/src/Framework/Sequencer.cpp:455
11# tbb::interface9::internal::start_for<tbb::blocked_range<unsigned long>, ActsExamples::Sequencer::run()::{lambda()#1}::operator()() const::{lambda(tbb::blocked_range<unsigned long> const&)#1}, tbb::auto_partitioner const>::execute() at /usr/include/tbb/parallel_for.h:144
12# 0x00007F431EE37545 in /usr/lib/x86_64-linux-gnu/libtbb.so.2
13# 0x00007F431EE3780F in /usr/lib/x86_64-linux-gnu/libtbb.so.2
14# 0x00007F431EE34B68 in /usr/lib/x86_64-linux-gnu/libtbb.so.2
15# ActsExamples::Sequencer::run()::{lambda()#1}::operator()() const at /home/pagessin/dev/acts/Examples/Framework/src/Framework/Sequencer.cpp:418
- FLTINV: (10 times)
 0# pybind11::cpp_function::initialize<pybind11_init_ActsPythonBindings(pybind11::module_&)::{lambda()#11}, void, , pybind11::name, pybind11::scope, pybind11::sibling>(pybind11_init_ActsPythonBindings(pybind11::module_&)::{lambda()#11}&&, void (*)(), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) at /home/pagessin/dev/acts/build/_deps/pybind11-src/include/pybind11/pybind11.h:224
 1# pybind11::cpp_function::dispatcher(_object*, _object*, _object*) at /home/pagessin/dev/acts/build/_deps/pybind11-src/include/pybind11/pybind11.h:929
 2# PyCFunction_Call in /home/pagessin/dev/acts/bindvenv/bin/python3
 3# _PyObject_MakeTpCall in /home/pagessin/dev/acts/bindvenv/bin/python3
 4# _PyEval_EvalFrameDefault in /home/pagessin/dev/acts/bindvenv/bin/python3
 5# _PyFunction_Vectorcall in /home/pagessin/dev/acts/bindvenv/bin/python3
 6# 0x000000000050B23C in /home/pagessin/dev/acts/bindvenv/bin/python3
 7# PyObject_CallObject in /home/pagessin/dev/acts/bindvenv/bin/python3
 8# pybind11::object pybind11::detail::object_api<pybind11::handle>::operator()<(pybind11::return_value_policy)1, ActsExamples::AlgorithmContext const&>(ActsExamples::AlgorithmContext const&) const at /home/pagessin/dev/acts/build/_deps/pybind11-src/include/pybind11/cast.h:1631
 9# ActsExamples::IAlgorithm::internalExecute(ActsExamples::AlgorithmContext const&) at /home/pagessin/dev/acts/Examples/Framework/include/ActsExamples/Framework/IAlgorithm.hpp:51
10# ActsExamples::Sequencer::run()::{lambda()#1}::operator()() const::{lambda(tbb::blocked_range<unsigned long> const&)#1}::operator()(tbb::blocked_range<unsigned long> const&) const at /home/pagessin/dev/acts/Examples/Framework/src/Framework/Sequencer.cpp:455
11# tbb::interface9::internal::start_for<tbb::blocked_range<unsigned long>, ActsExamples::Sequencer::run()::{lambda()#1}::operator()() const::{lambda(tbb::blocked_range<unsigned long> const&)#1}, tbb::auto_partitioner const>::execute() at /usr/include/tbb/parallel_for.h:144
12# 0x00007F431EE37545 in /usr/lib/x86_64-linux-gnu/libtbb.so.2
13# 0x00007F431EE3780F in /usr/lib/x86_64-linux-gnu/libtbb.so.2
14# 0x00007F431EE34B68 in /usr/lib/x86_64-linux-gnu/libtbb.so.2
15# ActsExamples::Sequencer::run()::{lambda()#1}::operator()() const at /home/pagessin/dev/acts/Examples/Framework/src/Framework/Sequencer.cpp:418
18:10:44 Sequencer INFO Processed 30 events in 1.961989 s (wall clock)
18:10:44 Sequencer INFO Average time per event: 327.442344 ms/event
.
----------------------------- Root file has checks -----------------------------
NOTE: Root file hash checks were skipped, enable with ROOT_HASH_CHECKS=on
See https://acts.readthedocs.io/en/latest/examples/python_bindings.html#root-file-hash-regression-checks for more details

====================== 1 passed, 226 deselected in 3.49s =======================
```

Currently, an FPE does not fail the job. The plan is to implement a masking mechanism based on the source file and line of the top-level stack frame. Results would also be summed per algorithm / reader / writer, rather than in a single global summary.

Co-authored-by: Andreas Stefl <487211+andiwand@users.noreply.github.com>
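The mask-and-resume idea can be illustrated with a minimal sketch. It assumes Linux with glibc on x86_64, matching the "only for x86_64 for now" scope, and it is not the code from this commit. The names `onSigfpe`, `installFpeHandler`, `unmaskFpe`, `triggerDivideByZero` and `g_fpeCounts` are hypothetical.

The handler works in three steps:

1. It counts the FPE by its `si_code`.
2. It sets the matching mask bits in the *saved* SSE (`mxcsr`) and x87 (`cwd`) control words of the interrupted context.
3. When the handler returns, the kernel restores that saved state. The faulting instruction therefore re-executes with the exception masked and yields its default IEEE result instead of trapping again.

Stack-trace capture is left out here.

```cpp
// Hypothetical sketch of "mask in the signal handler and resume".
// Assumes x86_64 Linux with glibc; not the ACTS implementation.
#if !defined(__x86_64__) || !defined(__linux__)
#error "This sketch assumes x86_64 Linux"
#endif

#include <atomic>
#include <cassert>
#include <cstdlib>
#include <fenv.h>  // feenableexcept/fegetexcept are glibc extensions
#include <signal.h>
#include <ucontext.h>

// One counter per si_code; on Linux FPE_INTDIV == 1 ... FPE_FLTSUB == 8.
std::atomic<unsigned> g_fpeCounts[FPE_FLTSUB + 1];
static_assert(std::atomic<unsigned>::is_always_lock_free,
              "counters are updated from inside a signal handler");

// Map the kernel's si_code to the matching <fenv.h> exception bit.
int exceptFromCode(int code) {
  switch (code) {
    case FPE_FLTDIV: return FE_DIVBYZERO;
    case FPE_FLTOVF: return FE_OVERFLOW;
    case FPE_FLTUND: return FE_UNDERFLOW;
    case FPE_FLTRES: return FE_INEXACT;
    case FPE_FLTINV: return FE_INVALID;
    default: return 0;  // integer traps cannot be masked this way
  }
}

void onSigfpe(int /*signal*/, siginfo_t* info, void* context) {
  const int code = info->si_code;
  if (code > 0 && code <= FPE_FLTSUB) {
    g_fpeCounts[code].fetch_add(1, std::memory_order_relaxed);
  }
  const int excepts = exceptFromCode(code);
  auto* uc = static_cast<ucontext_t*>(context);
  if (excepts == 0 || uc->uc_mcontext.fpregs == nullptr) {
    // Returning would re-execute the faulting instruction and trap forever.
    std::_Exit(EXIT_FAILURE);
  }
  // Mask the exception in the saved context, which the kernel restores on
  // return. SSE mask bits sit 7 bits above the flag bits in MXCSR; the x87
  // control-word mask bits line up with the flag bits.
  uc->uc_mcontext.fpregs->mxcsr |= static_cast<unsigned>(excepts) << 7;
  uc->uc_mcontext.fpregs->cwd |= static_cast<unsigned short>(excepts);
}

void installFpeHandler() {
  struct sigaction action {};
  action.sa_sigaction = &onSigfpe;
  action.sa_flags = SA_SIGINFO;
  sigemptyset(&action.sa_mask);
  [[maybe_unused]] const int rc = sigaction(SIGFPE, &action, nullptr);
  assert(rc == 0);
}

// What a sequencer could call before the next algorithm: clear the sticky
// flags, then re-enable trapping for the given exceptions.
void unmaskFpe(int excepts) {
  feclearexcept(FE_ALL_EXCEPT);
  feenableexcept(excepts);
}

// The first division traps and gets masked by the handler; the second
// division then runs with FE_DIVBYZERO already masked and does not trap.
double triggerDivideByZero() {
  volatile double zero = 0.0;
  volatile double one = 1.0;
  double first = one / zero;
  double second = one / zero;
  return first + second;
}
```

In this sketch the handler only counts occurrences. Capturing a stack trace inside the handler would have to use async-signal-safe operations only, for example Boost.Stacktrace's `safe_dump_to`.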
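The accumulation and deduplication can be pictured as a mutex-protected map. Its key is the FPE type plus the exact sequence of frame addresses, so identical traces collapse into one entry with a count, like the "(10 times)" lines in the summary. This is a hypothetical illustration: `FpeType`, `Trace` and `FpeResult` are invented names. Frames are kept as raw addresses rather than symbolized frames, and the capture itself is assumed to happen elsewhere. `merge` shows how results gathered per algorithm, event, or thread could be folded into one global summary.

```cpp
// Hypothetical sketch: a thread-safe, deduplicating store of FPE traces.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <mutex>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// FPE categories in the same order as the summary above (hypothetical enum).
enum class FpeType { INTDIV, INTOVF, FLTDIV, FLTOVF, FLTUND, FLTRES, FLTINV, FLTSUB };
constexpr int kNumFpeTypes = 8;
constexpr const char* kFpeNames[kNumFpeTypes] = {
    "INTDIV", "INTOVF", "FLTDIV", "FLTOVF", "FLTUND", "FLTRES", "FLTINV", "FLTSUB"};

// A captured stack trace reduced to raw frame addresses, innermost first.
using Trace = std::vector<std::uintptr_t>;

class FpeResult {
 public:
  // Record `count` occurrences; identical (type, trace) pairs share one entry.
  void add(FpeType type, const Trace& trace, std::size_t count = 1) {
    assert(count > 0);
    std::lock_guard<std::mutex> lock(m_mutex);
    m_traces[Key{type, trace}] += count;
  }

  // Fold another result (e.g. one algorithm's or one thread's) into this one.
  void merge(const FpeResult& other) {
    std::map<Key, std::size_t> snapshot;
    {
      std::lock_guard<std::mutex> lock(other.m_mutex);
      snapshot = other.m_traces;
    }
    for (const auto& [key, n] : snapshot) {
      add(key.first, key.second, n);
    }
  }

  // Total occurrences of one FPE type, summed over all distinct traces.
  std::size_t count(FpeType type) const {
    std::lock_guard<std::mutex> lock(m_mutex);
    std::size_t total = 0;
    for (const auto& [key, n] : m_traces) {
      if (key.first == type) {
        total += n;
      }
    }
    return total;
  }

  // Number of distinct (type, trace) pairs after deduplication.
  std::size_t uniqueTraces() const {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_traces.size();
  }

  // Per-type totals, then each deduplicated trace printed once with its count.
  std::string summary() const {
    std::ostringstream os;
    os << "FPE result summary:\n";
    for (int i = 0; i < kNumFpeTypes; ++i) {
      os << "- " << kFpeNames[i] << ": " << count(static_cast<FpeType>(i)) << "\n";
    }
    os << "Stack traces:\n";
    std::lock_guard<std::mutex> lock(m_mutex);
    for (const auto& [key, n] : m_traces) {
      os << "- " << kFpeNames[static_cast<int>(key.first)] << ": (" << n
         << " times)\n";
      for (std::size_t f = 0; f < key.second.size(); ++f) {
        os << "  " << f << "# 0x" << std::hex << key.second[f] << std::dec << "\n";
      }
    }
    return os.str();
  }

 private:
  using Key = std::pair<FpeType, Trace>;
  mutable std::mutex m_mutex;
  std::map<Key, std::size_t> m_traces;
};
```

Giving each algorithm (or reader/writer) its own `FpeResult` and merging them into a global one at the end would also make the per-algorithm breakdown mentioned below the log straightforward in this design.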
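The planned masking mechanism could look roughly like this. Each mask names a source file, a line, and an FPE type. An occurrence is ignored when the top-level frame of its deduplicated trace matches a mask, and only the remaining, unmasked occurrences would fail the job. This sketch makes several assumptions beyond what the commit states:

- "Top-level frame" is read as the `0#` frame where the FPE was raised.
- Frames are already symbolized to a file and line.
- Matching by file-name suffix is a choice of this sketch, not something the commit specifies.

All names (`SourceFrame`, `FpeOccurrence`, `FpeMask`, `isMasked`, `countUnmasked`) are hypothetical.

```cpp
// Hypothetical sketch: mask FPEs by the top-level frame's file and line.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class FpeType { INTDIV, INTOVF, FLTDIV, FLTOVF, FLTUND, FLTRES, FLTINV, FLTSUB };

// A symbolized frame; resolving addresses to file/line is assumed done elsewhere.
struct SourceFrame {
  std::string file;
  int line;
};

// One deduplicated trace and how often it was seen.
struct FpeOccurrence {
  FpeType type;
  std::vector<SourceFrame> frames;  // top-level (innermost, "0#") frame first
  std::size_t count;
};

// A user-provided mask: ignore `type` raised at `file`:`line`.
struct FpeMask {
  std::string file;  // compared as a suffix of the frame's path
  int line;
  FpeType type;
};

bool endsWith(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// True if the top-level frame of `occ` matches one of the masks.
bool isMasked(const FpeOccurrence& occ, const std::vector<FpeMask>& masks) {
  if (occ.frames.empty()) {
    return false;
  }
  const SourceFrame& top = occ.frames.front();
  for (const FpeMask& mask : masks) {
    assert(mask.line > 0 && "source lines are 1-based");
    if (mask.type == occ.type && mask.line == top.line &&
        endsWith(top.file, mask.file)) {
      return true;
    }
  }
  return false;
}

// Total number of FPEs not covered by any mask; non-zero would fail the job.
std::size_t countUnmasked(const std::vector<FpeOccurrence>& occs,
                          const std::vector<FpeMask>& masks) {
  std::size_t total = 0;
  for (const FpeOccurrence& occ : occs) {
    if (!isMasked(occ, masks)) {
      total += occ.count;
    }
  }
  return total;
}
```

Suffix matching keeps masks independent of the build directory, at the cost of possibly matching unrelated files that share a name ending. An exact-path or path-component match would be a stricter alternative.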