Stale-data crash family: per-slot token table + DL-range registry (re-land) #179
Merged
Closes the Linux/glibc cross-scene stale-data crash family documented in
docs/bugs/linux_stale_scene_data_family_2026-05-11.md. The previous
session shipped a defensive NULL-file_head guard in efManagerMakeEffect
and documented four more variants for follow-up; this commit lands the
structural fix that eliminates the class at its root, validated by a
clean 15-min autonomous attract-loop run to SSB64_MAX_FRAMES=54000 with
zero crashes (previously crashed at ~90s / ~5s / ~9 min depending on
demo permutation).
The structural fix — per-slot RelocPointerTable (port/resource/
RelocPointerTable.cpp + .h):
Previously the token table maintained a single global generation that
incremented on every lbRelocInitSetup() call (every scene boundary).
All previously-minted tokens failed to decode after that bump, even tokens
for intern-buffer files (mainmotion, submotion, model, special1-4,
shieldpose) whose backing memory persists across scenes. Downstream
PORT_RESOLVE returned NULL; downstream consumers (gcSetupCustomDObjs,
ftMainSetStatus joint-init, gcAddMObjForDObj) didn't always NULL-check
the result and SIGSEGV'd reading parent->child or dobjdesc->id.
New model: each table slot owns its own 12-bit generation. Tokens
carry the slot's gen at registration. Resolve checks per-slot:
slots[idx].gen == token_gen. Invalidation is range-based via the new
portRelocInvalidateRange(base, size) — only slots whose pointer falls
in the recycled range are NULL'd and their generation bumped (with
free-list reuse). Tokens for intern-buffer files don't intersect the
scene-arena range, so they stay valid forever (until the file
unloads). Tokens for arena-allocated data go stale exactly when their
backing memory recycles. lbRelocInitSetup no longer calls the
wholesale portRelocResetPointerTable; the range-based path in
port_taskman_evict_arena_caches handles it surgically.
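The per-slot scheme described above can be sketched in C++ roughly as follows. This is a minimal illustration and not the contents of RelocPointerTable.cpp: the class name SlotTokenTable, the token packing (20-bit index plus 12-bit generation), and the method names are assumptions made for the example. Only the 12-bit per-slot generation, the range-based invalidation, and the free-list reuse are taken from the description.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical token layout for this sketch:
//   low 20 bits  = slot index + 1 (so token 0 is never valid)
//   high 12 bits = the slot's generation at registration time
class SlotTokenTable {
public:
    static constexpr uint32_t kIndexBits = 20;
    static constexpr uint32_t kIndexMask = (1u << kIndexBits) - 1;
    static constexpr uint32_t kGenMask = 0xFFFu;  // 12-bit generation

    // Store a pointer and mint a token carrying the slot's current generation.
    uint32_t Register(void* ptr) {
        assert(ptr != nullptr);
        uint32_t idx;
        if (!free_.empty()) {            // reuse a slot freed by invalidation
            idx = free_.back();
            free_.pop_back();
        } else {
            idx = static_cast<uint32_t>(slots_.size());
            assert(idx < kIndexMask);    // index + 1 must fit in the index field
            slots_.push_back(Slot{});
        }
        slots_[idx].ptr = ptr;
        return (static_cast<uint32_t>(slots_[idx].gen) << kIndexBits) | (idx + 1);
    }

    // Per-slot check: the token is valid only if its generation matches the slot's.
    void* Resolve(uint32_t token) const {
        uint32_t idx1 = token & kIndexMask;
        if (idx1 == 0 || idx1 > slots_.size()) return nullptr;
        const Slot& s = slots_[idx1 - 1];
        uint32_t gen = (token >> kIndexBits) & kGenMask;
        return (s.gen == gen) ? s.ptr : nullptr;
    }

    // NULL and bump only the slots whose pointer lies in [base, base + size).
    void InvalidateRange(const void* base, std::size_t size) {
        auto lo = reinterpret_cast<uintptr_t>(base);
        auto hi = lo + size;
        for (uint32_t i = 0; i < slots_.size(); ++i) {
            Slot& s = slots_[i];
            auto p = reinterpret_cast<uintptr_t>(s.ptr);
            if (s.ptr != nullptr && p >= lo && p < hi) {
                s.ptr = nullptr;
                s.gen = static_cast<uint16_t>((s.gen + 1u) & kGenMask);
                free_.push_back(i);
            }
        }
    }

private:
    struct Slot {
        void* ptr = nullptr;
        uint16_t gen = 0;
    };
    std::vector<Slot> slots_;
    std::vector<uint32_t> free_;
};
```

In this sketch a token for an intern-buffer pointer keeps resolving after InvalidateRange is called on the arena, while a token for an arena pointer resolves to NULL even after its slot is reused. A 12-bit generation wraps after 4096 reuses of the same slot, so a sufficiently old token could in principle alias a new one.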
DL-range registry (port/port_dl_ranges.{cpp,h} — new):
Defensive infrastructure tracking valid display-list memory ranges
(scene arena + reloc files). Hooked into libultraship's GFX walker
via the new game-agnostic callback API in fast/interpreter.h
(RegisterDLBoundsCheck, RegisterAddressClassifier). gfx_step and the
G_DL handler bounds-check `cmd` before deref/push; if a walker has
stepped past the end of a registered range (variant 5 — runaway DL
without a gsSPEndDisplayList terminator), the entire walk is torn
down via g_exec_stack.stop() instead of dereferencing into an
unmapped page. The classifier also feeds the SIGSEGV diag dump
(RelocPointerTable diagnostics + recent DL pushes + segment writes)
so any future stale-pointer crash that escapes the structural fix
prints actionable triage info.
Port wires this in via port_dl_ranges_init() called from PortInit
before any GFX activity. libultraship has zero compile-time symbol
dependencies on port_dl_* — all integration is via the callback API.
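As a rough illustration of the registry-plus-callback shape, here is a self-contained C++ sketch. It does not reproduce the signatures of RegisterDLBoundsCheck or RegisterAddressClassifier: DLRangeRegistry, DLCheck, BoundsCheckFn, WalkDL, and the kEndDL terminator value are all hypothetical names invented for this example, and the toy walker only counts commands instead of executing them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

enum class DLCheck { InRange, WalkedPast, Unknown };

constexpr std::size_t kCmdSize = sizeof(uint64_t);   // one 8-byte command word
constexpr uint64_t kEndDL = 0xDF00000000000000ull;   // stand-in terminator for the sketch

// Game-side registry of memory ranges that may legitimately hold display lists.
class DLRangeRegistry {
public:
    void Add(const void* base, std::size_t size) {
        assert(size > 0);
        auto b = reinterpret_cast<uintptr_t>(base);
        ranges_.push_back({b, b + size});
    }

    // If the DL started inside a registered range, cmd must stay inside that
    // same range; otherwise the walker has run off the end.
    DLCheck Check(const void* dl_start, const void* cmd) const {
        auto s = reinterpret_cast<uintptr_t>(dl_start);
        auto c = reinterpret_cast<uintptr_t>(cmd);
        for (const Range& r : ranges_) {
            if (s >= r.begin && s < r.end) {
                bool inside = c >= r.begin && c + kCmdSize <= r.end;
                return inside ? DLCheck::InRange : DLCheck::WalkedPast;
            }
        }
        return DLCheck::Unknown;  // unregistered memory: this sketch lets it through
    }

private:
    struct Range {
        uintptr_t begin, end;
    };
    std::vector<Range> ranges_;
};

// The walker side knows only this callback type, never the registry itself.
using BoundsCheckFn = std::function<DLCheck(const void*, const void*)>;

struct WalkResult {
    std::size_t executed;
    bool torn_down;
};

// Toy walker: checks each command address before dereferencing it.
inline WalkResult WalkDL(const uint64_t* dl, const BoundsCheckFn& check) {
    WalkResult res{0, false};
    for (const uint64_t* cmd = dl;; ++cmd) {
        if (check && check(dl, cmd) == DLCheck::WalkedPast) {
            res.torn_down = true;  // abandon the walk instead of reading further
            break;
        }
        if (*cmd == kEndDL) break;
        ++res.executed;
    }
    return res;
}
```

Passing the check as a std::function mirrors the stated design goal: the walker has no compile-time dependency on the registry, and a game that registers nothing gets the original unchecked behavior.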
SIGSEGV diag dump (port_watchdog.cpp + libultraship/CrashHandler.cpp):
Both crash handlers now call Fast::DumpDLDiag() with siginfo->si_addr
before showing the dialog, so the recent-DL-pushes + segment-writes
ring buffers land in the spdlog log file even if the dialog hangs.
The diag was already in interpreter.cpp but gated on ASan; this
commit makes it always-on so it survives in shipping builds where
the heap-layout-dependent stale-pointer crashes actually manifest.
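For readers unfamiliar with the pattern, an always-on "recent events" ring buffer of the kind mentioned here can be as small as the sketch below. RecentRing and its methods are hypothetical names for illustration; the actual buffers in interpreter.cpp may be structured differently.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Fixed-capacity ring that keeps only the N most recent entries.
template <typename T, std::size_t N>
class RecentRing {
    static_assert(N > 0, "ring must have capacity");

public:
    // O(1), no allocation: cheap enough to leave enabled in shipping builds.
    void Push(const T& v) {
        buf_[head_] = v;
        head_ = (head_ + 1) % N;
        if (count_ < N) ++count_;
    }

    // Copy out the retained entries, oldest first.
    std::vector<T> Snapshot() const {
        std::vector<T> out;
        out.reserve(count_);
        std::size_t start = (head_ + N - count_) % N;
        for (std::size_t i = 0; i < count_; ++i) {
            out.push_back(buf_[(start + i) % N]);
        }
        return out;
    }

private:
    std::array<T, N> buf_{};
    std::size_t head_ = 0;   // next write position
    std::size_t count_ = 0;  // number of valid entries, at most N
};
```

Snapshot allocates, which is convenient for a sketch but not something to rely on inside a signal handler; a crash-time dump would more likely iterate the buffer in place.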
Submodule bumps:
- decomp: real game bug fix (gcDrawMObjForDObj G_ENDDL terminator),
defensive consumer NULL-checks (gcSetupCustomDObjs, ftMainSetStatus),
file-scope gPortSceneHeap + arena range registration with the
DL-range registry. (See decomp commit message for details.)
- libultraship: callback API (Fast::RegisterDLBoundsCheck /
RegisterAddressClassifier / DumpDLDiag), universal correctness
fixes (gfx_pop_shader empty-check, mShaderStack per-frame reset),
always-on diag ring buffers. (See libultraship commit message.)
Validation:
- Clean SSB64_MAX_FRAMES=54000 exit (15 min game-time / 24 min real-
time autonomous attract loop), zero crashes through full
ControlDeck::ShutdownRaphnet → destruct fast3dwindow → destruct
ResourceManager shutdown.
- Previously documented variants:
* Variant 1 (Kirby Cutter / Pikachu Thunder Jolt / Shield effect):
addressed by the per-slot token table (effect_desc->file_head
tokens now resolve correctly across scene cycles for intern-
buffer files) + the consumer NULL-check at gcSetupCustomDObjs.
* Variant 2 (mnCharacters joint init, ftMainSetStatus): addressed
by the per-slot token table + the consumer NULL-check at
ftMainSetStatus.
* Variant 3 (stale MObjSub token warning at objanim.c:2869):
addressed by the per-slot token table.
* Variant 4 (segment-E DL pointer at gfx_step opcode fetch): the
new gfx_step bounds-check rejects unresolved N64-segment pointers
before deref.
* Variant 5 (DL walker running past arena end): addressed by the
gcDrawMObjForDObj G_ENDDL terminator + the gfx_step bounds-check
tearing down the walk on walked-past detection.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumps libultraship to d82208cf, which drops the over-aggressive
(subAddr <= 0x0FFFFFFFu) variant-4 reject from gfx_dl_handler_common.
That blanket reject was silently dropping every legitimate
`gsSPDisplayList(0x012792c0)`-style segment-relative push in SSB64,
which masked the result of the structural per-slot RelocPointerTable
fix: there were no crashes only because no 3D geometry was rendered.
With the relaxed check (only the WALKED_PAST branch against the
DL-range registry remains), 3D geometry renders normally again.
Also adds hit/miss/zero counters to portRelocResolvePointerDebug with
a periodic spdlog dump every 100K resolves. These were used to confirm
that the per-slot table is actually serving valid resolves (300K hits,
0 misses across the test run). They stay in the build because the cost
is two atomic increments per resolve and the diagnostic value is high
when investigating future render regressions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
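The counter scheme in this commit message (two atomic increments per resolve, periodic dump every 100K) could look something like the following sketch. ResolveStats and Record are hypothetical names, and fprintf stands in for the spdlog call so the example has no external dependencies.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>

// Hit/miss/zero counters for token resolves, dumped every kDumpEvery calls.
struct ResolveStats {
    static constexpr uint64_t kDumpEvery = 100000;

    std::atomic<uint64_t> hits{0};    // non-zero token resolved to a pointer
    std::atomic<uint64_t> misses{0};  // non-zero token resolved to NULL (stale)
    std::atomic<uint64_t> zeros{0};   // token was 0, i.e. never assigned
    std::atomic<uint64_t> total{0};

    // Two relaxed increments per call: one category counter plus the total.
    // Returns true when this call triggered a periodic dump.
    bool Record(uint32_t token, const void* resolved) {
        if (token == 0) {
            zeros.fetch_add(1, std::memory_order_relaxed);
        } else if (resolved != nullptr) {
            hits.fetch_add(1, std::memory_order_relaxed);
        } else {
            misses.fetch_add(1, std::memory_order_relaxed);
        }
        uint64_t n = total.fetch_add(1, std::memory_order_relaxed) + 1;
        if (n % kDumpEvery != 0) return false;
        std::fprintf(stderr, "reloc resolve: %llu hits, %llu misses, %llu zeros\n",
                     static_cast<unsigned long long>(hits.load()),
                     static_cast<unsigned long long>(misses.load()),
                     static_cast<unsigned long long>(zeros.load()));
        return true;
    }
};
```

Relaxed ordering is sufficient here because the counters are purely diagnostic and nothing synchronizes on them.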
JRickey added a commit that referenced this pull request on May 15, 2026:
Resolves PR #182 conflicts after main reverted #179 and merged #181 +
#183 ahead of this branch:
- decomp: cherry-picked PR #181's scene_curr fix (fddd2d3d5) onto
  stability-fixes' tip (4015e25e -> f9d608f11) so the merged tree has
  both the per-slot generational token + DObjDesc PortRefGfx work AND
  the C-Stick / D-Pad input gate fix.
- port/bridge/framebuffer_capture.cpp: kept main's PR #183 MSAA fix
  (the variant 6 / container-smash work doesn't touch this).
- port/port.cpp, port/port_dl_ranges.{cpp,h}, port/port_watchdog.cpp,
  port/resource/RelocPointerTable.{cpp,h},
  port/bridge/lbreloc_bridge.cpp: restored stability-fixes' versions
  (re-applies the #179 infrastructure that variant 6a/6b builds on).
- libultraship: kept stability-fixes' 7f673a5f (already 3 commits
  ahead of main's c16c03f0 with the GFX walker hooks).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Re-lands PR #172 (which was merged then reverted in 03ea608) rebased
additively on top of current main so it does not regress PR #175 /
PR #177 work.

Effective changes vs current main
- decomp: bump → b7d9291d (one commit ahead of main: adds NULL-checks/guards in ftmain.c, objanim.c, objdisplay.c, taskman.c)
- libultraship: bump → 6cd599a5 (rebased onto c16c03f0 from PR #175, "Windows: kill startup console window + grab fullscreen focus on launch"; adds two commits: GFX walker hooks + relaxed bounds-check on top of the Win32 focus-dance commit)
- port_dl_ranges.cpp/h (DL-range registry), port_watchdog.cpp, overhauled RelocPointerTable.cpp/h (per-slot generation token table), extended lbreloc_bridge.cpp, small additions to port.cpp

Rebase notes
- 016fc74 (the commit that got reverted). Replayed 016fc74 + 6cf6959 onto current main via git rebase --onto.
- efcc1ad8, d82208cf) replayed onto c16c03f0 (PR #175 focus-dance commit). Disjoint file sets: clean replay.
- port_dl_ranges.cpp/h, port_watchdog.cpp, or RelocPointerTable.cpp/h). lbreloc_bridge.cpp auto-merged cleanly with main's 21559a8 stale-o2r-detect addition.

Test plan
- SSB64_MAX_FRAMES=54000 clean on Linux/glibc (original verification target from PR #172)

🤖 Generated with Claude Code