debug(ci): capture gdb backtrace on OverlapCheck segfault #29

Closed
olantwin wants to merge 4 commits into main from debug/ci-backtrace

@olantwin
Contributor

@olantwin olantwin commented Jun 5, 2026

Diagnostic only — not for merge.

PR #26's OverlapCheck test segfaults on GHA ubuntu-latest after "Copy 54 x=1300 y=3220" from libGeoModelXml's Gmx2Geo, but the crash cannot be reproduced anywhere I can run gdb:

| env | result |
|---|---|
| local NixOS pixi env | passes 25/25 |
| Ubuntu 24.04 container (same host kernel, glibc 2.39 matching CI) | passes 25/25 |
| Same container + valgrind | 0 errors, build_geometry completes |
| Same container + ulimit -s 2048 | passes |
| GHA ubuntu-latest | SIGSEGV |

Three rounds of geomodel rebuilds via ship-conda-recipes (#15 bump, #16 -fvisibility-inlines-hidden, #17 --version-script={local: _ZNSt8__format*;}) have not helped — the rebuild that drops 4 → 0 weak _ZNSt8__format exports per geomodel lib still crashes at the identical line on CI.

This PR adds gdb to the pixi env and wraps the test step so that on failure, build_geometry is re-run under gdb --batch --ex run --ex 'thread apply all bt full'. The full backtrace lands in the action log — first frame should tell us whether the crash lives in Gmx2Geo, libstdc++ format machinery, SHiP's SHiPTimingDetInterface, or somewhere else.
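
The re-run-under-gdb idea described above can be sketched as a small shell function. This is an illustration only, not the workflow step from this PR: the function name `run_with_backtrace` is hypothetical, and it assumes `gdb` is on `PATH` (here, provided by the pixi env).

```shell
# Run a command; if it fails, re-run it under gdb in batch mode so that a
# full backtrace of every thread is printed to stdout (and so to the CI log).
# The original exit status is preserved so the CI step still fails.
run_with_backtrace() {
    "$@" && return 0
    local status=$?
    echo "command failed with status ${status}; re-running under gdb" >&2
    # --args passes the remaining words to the debugged program unchanged.
    gdb --batch --ex run --ex 'thread apply all bt full' --args "$@" || true
    return "${status}"
}
```

A call might look like `run_with_backtrace ./build/build_geometry`, where the path to `build_geometry` is an assumption. Re-running only on failure keeps the normal test path free of debugger overhead.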

Once we have the backtrace, this PR is closed and the diagnostic is reverted.

olantwin added 4 commits June 4, 2026 18:06
All dependencies are now available from the prefix.dev/ship channel
(via ship-conda-recipes), so CI no longer needs CVMFS mounts, the
CERN container, or a self-hosted runner. The new workflow runs on
ubuntu-latest with prefix-dev/setup-pixi and a single `pixi run test`
step that configures, builds, and runs ctest.

Picks up the geomodel rebuild from ShipSoft/ship-conda-recipes#15, which
restores ABI compatibility with the current conda-forge cxx-compiler
(libstdcxx >= 15). Confirmed locally that the full test suite, including
OverlapCheck, now passes without the std::format workaround from #27.

ShipSoft/ship-conda-recipes#17 added a linker version script hiding
std::__format weak template instantiations from libGeoModelXml/Write/
Read/DBManager exports. Local symbol audit now reports 0 weak
_ZNSt8__format symbols in those libraries (was 4 each on build 2),
and the full test suite — including OverlapCheck — passes in this
environment.

PR #26's OverlapCheck test segfaults on GHA ubuntu-latest after
"Copy 54 x=1300 y=3220" from libGeoModelXml's Gmx2Geo, but cannot
be reproduced locally (NixOS pixi env or Ubuntu 24.04 container
with matching glibc 2.39 — both pass 25/25; valgrind reports 0
errors). Three rounds of geomodel rebuilds in ship-conda-recipes
(build_number bump, -fvisibility-inlines-hidden, linker version
script) have not helped.

Add gdb to the pixi env and wrap the failing test step with a
diagnostic step that runs build_geometry under gdb --batch on
failure, printing the full backtrace into the action log. Once
we know the actual crash site, this commit gets reverted (or
gated behind a workflow_dispatch input) before merge.

Diagnostic commit — DO NOT MERGE TO main.
@coderabbitai

coderabbitai Bot commented Jun 5, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

olantwin added a commit to ShipSoft/ship-conda-recipes that referenced this pull request Jun 5, 2026
PR #17 added `-Wl,--version-script={local: _ZNSt8__format*;}` which
hid `_ZN…`-mangled function symbols. Local audit confirmed 0 weak
`_ZN` exports per geomodel lib. But CI debug PR (ShipSoft/Geometry#29)
captured a backtrace showing the crash is still in libGeoModelWrite:

  Program received signal SIGSEGV
  #0 std::__format::_Sink<char>::_M_write(…) from libGeoModelWrite.so.6
  #1 std::__format::_Formatting_scanner<…>::_M_on_chars(…) from libGeoModelWrite.so.6
  #2 std::__format::_Scanner<char>::_M_scan(…) at <format>:3942
  …
  #7 SHiPGeometry::CalorimeterFactory::buildStack(…) at CalorimeterFactory.cpp:184

`nm -D libGeoModelWrite.so.6 | grep __format` still shows 15 entries
(all `V` vague-linkage) for vtables (`_ZTV…`), typeinfo (`_ZTI…`), and
typeinfo names (`_ZTS…`) of _Sink<char>, _Buf_sink, _Fixedbuf_sink,
_Seq_sink<string>, _Scanner, _Formatting_scanner. Those are not matched
by `_ZNSt8__format*` (they start with `_ZTV`/`_ZTI`/`_ZTS`, not `_ZN`),
so build_geometry's `_Seq_sink<string>` constructor still binds its
vptr to libGeoModelWrite's vtable, and the virtual call lands in
libGeoModelWrite's compiled-with-different-headers `_M_write`.
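
An audit along these lines can be sketched as a filter over `nm -D` output. The helper name `count_weak_format_symbols` is hypothetical. It assumes the usual three-column `nm` layout (address, type, name) and counts any weak symbol type (`W`, `w`, `V`, `v`) whose mangled name contains `St8__format`, so vtables and typeinfo are counted along with `_ZN` functions.

```shell
# Count weak dynamic symbols (nm types W, w, V, v) whose mangled name
# mentions std::__format. Reads `nm -D` output on stdin and prints a count.
count_weak_format_symbols() {
    awk 'NF >= 2 && $(NF-1) ~ /^[WwVv]$/ && $NF ~ /St8__format/ { n++ }
         END { print n + 0 }'
}
```

Usage would be `nm -D libGeoModelWrite.so.6 | count_weak_format_symbols`. Matching on the `St8__format` substring rather than the `_ZNSt8__format` prefix is the design choice that keeps `_ZTV`/`_ZTI`/`_ZTS` entries from slipping past the audit.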

Broaden the version script pattern to `*NSt8__format*` so vtables,
typeinfo, typeinfo names, and any future nested std::__format symbols
all get localized. Bump build.number to 4.
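
The difference between the two patterns can be checked with ordinary shell globbing, which is assumed here to be a close enough stand-in for the linker's version-script matching. The symbol names below are hand-written illustrations of Itanium-ABI mangling for `std::__format::_Sink<char>`, not entries copied from the `nm` output, and `symbol_matches` is a hypothetical helper.

```shell
# Return 0 if the mangled symbol ($2) matches the glob pattern ($1).
symbol_matches() {
    case "$2" in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

# One member function, then the vtable, typeinfo, and typeinfo name.
for sym in \
    _ZNSt8__format5_SinkIcE8_M_writeESt17basic_string_viewIcSt11char_traitsIcEE \
    _ZTVNSt8__format5_SinkIcEE \
    _ZTINSt8__format5_SinkIcEE \
    _ZTSNSt8__format5_SinkIcEE
do
    for pat in '_ZNSt8__format*' '*NSt8__format*'; do
        if symbol_matches "$pat" "$sym"; then
            verdict=localized
        else
            verdict=exported
        fi
        printf '%-18s %-10s %s\n' "$pat" "$verdict" "$sym"
    done
done
```

Under these assumptions only the member function matches the narrow `_ZNSt8__format*` pattern, while all four symbols match `*NSt8__format*`.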
@olantwin olantwin closed this Jun 5, 2026
@olantwin olantwin deleted the debug/ci-backtrace branch June 5, 2026 14:47