Fighter intro climax freeze-frames (closes #9) #61
Merged
…t model

Phase 1+2+3 of the climax-freeze fix. Reproduces the brief whole-screen holds that the original game / LLE emulators (Mupen64+parallel-rdp, RetroArch+parallel-n64) show during fighter-intro climaxes. On real hw these are caused by RDP wall-clock variance overrunning the contexts_num=2 slot machinery in syTaskmanSwitchContext, blocking the game thread for ~1 VI.

Phase 1 (port/stubs/n64_stubs.c): defer PORT_INTR_SP_TASK_DONE and DP_FULL_SYNC posts by N VI periods via a delay queue flushed in port_vi_simulate_vblank. Lets sSYSchedulerCurrentTaskGfx stay live through the deferral so GfxEnd's mq attaches to the in-flight task, matching real-hw release timing.

Phase 2 (port_resume_service_threads): cap game thread (id=5) to N resumes per host frame. Without this, the 8-round resume loop lets a SwitchContext-blocked tic catch up within the same host frame, defeating the visible freeze. With cap=1 default, blocked tics carry over to next host frame.

Phase 3 (port/gameloop.cpp): per-DL cost walker driven by Fast3D's gbi_trace_callback. Sums tris*75 + rect_px + load_bytes (F3DEX2 opcodes 0x05/0x06/0x07 for tris, 0xE4/0xF6 for rects, 0xF3/0xF4 for loads). Compares against per-VI cycle budget (default 400000): only DLs above the budget get N=3 deferral, rest stay at N=1. Calibrated against attract-mode cost histogram (p99=394k, max=415k) so default catches ~0.1% of DLs = climax-only freezes.

Frame-pacing fix: when SwitchContext fails and task_draw is skipped, DrawAndRunGraphicsCommands isn't called → no SwapBuffers → no VSync gate → host frame returns immediately. Without this fix the loop blasts at 2x speed during contention. Now we sleep+busy-wait one VI period on 0-submit frames, plus N-1 additional held frames where INTR_VRETRACE itself is skipped so game logic also pauses (no animation catch-up after the freeze).
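The Phase 3 cost rule above can be sketched in a few lines of C. This is a minimal illustration, not the code in port/gameloop.cpp: the `DlTally` struct and the function names are hypothetical, and how the tallies are gathered from the GBI trace is left out. Only the weights (75 per triangle), the default budget (400000), and the N=3 / N=1 split come from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-display-list tally, assumed to be filled by a GBI trace walker. */
typedef struct {
    uint32_t tris;       /* triangles drawn */
    uint32_t rect_px;    /* pixels covered by rectangle commands */
    uint32_t load_bytes; /* bytes moved by texture loads */
} DlTally;

enum {
    TRI_WEIGHT     = 75,     /* cost units per triangle, from the description */
    DEFAULT_BUDGET = 400000, /* default per-VI cycle budget */
    DEFER_HEAVY    = 3,      /* VI periods of deferral for over-budget DLs */
    DEFER_LIGHT    = 1       /* VI periods of deferral for everything else */
};

/* cost = tris*75 + rect_px + load_bytes, widened to 64 bits to avoid overflow. */
static uint64_t dl_cost(const DlTally *t) {
    assert(t != NULL);
    return (uint64_t)t->tris * TRI_WEIGHT + t->rect_px + t->load_bytes;
}

/* Only display lists strictly above the budget get the long deferral. */
static int dl_defer_vi(const DlTally *t, uint64_t budget) {
    return dl_cost(t) > budget ? DEFER_HEAVY : DEFER_LIGHT;
}
```

With these weights, a display list of 4000 triangles, 80000 rectangle pixels and 35000 load bytes costs 415000, which is over the default budget and would be deferred by 3 VI periods.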
Tracing infra (env-gated, zero overhead when off):
- SSB64_TRACE_INTRO_ANIM=1: per-tic anim_frame/speed/hitlag for the three opening intros (Yoshi/Samus/Donkey).
- SSB64_TRACE_GFX_PER_VI=1: gfx submits per host frame.
- SSB64_TRACE_SWITCH_CTX=1: SwitchContext block events + game-thread cap engagements.
- SSB64_TRACE_DL_COST=1: per-DL cost decomposition.
- SSB64_OPCODE_HISTOGRAM=1: opcode frequency dump for re-calibration.

Tunables:
- SSB64_RCP_CYCLE_BUDGET=N (default 400000): lower = more freezes.
- SSB64_RCP_FORCE_N=N: bypass cost model with fixed N.
- SSB64_GFX_DEFER_VI=N: legacy alias for FORCE_N.
- SSB64_GAME_THREAD_CAP_RESUMES=N (default 1): 0 disables cap.
- SSB64_FREEZE_HOLD_FRAMES=N (default 3): VI periods to extend each contention freeze; longer = more visible.
- SSB64_FREEZE_PACING=0: disable pacing-correction sleep.
- SSB64_FREEZE_TEST=N: diagnostic, skip every Nth Fast3D submit.

User-tested: the default config produces 4 visible whole-screen freezes per attract loop (Link + Samus intro climaxes, two each), ~50ms each. Pacing is correct at 60Hz. Freeze timing is 1 frame late vs emulator; the budget may need tuning if the user wants an exact match.

Full design + verification + tunables in docs/freeze_frame_rcp_clock_design_2026-04-26.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
JRickey added a commit that referenced this pull request on May 2, 2026
…st-model"

Walk main back to dd874e1 to retune freeze-frame sensitivity. The shipped per-DL cost model produces false-positive freezes during the fighter run sequence, the menus (sluggish character-select with progressive slowdown), and on Pikachu / Yoshi poses that should not freeze.

Code preserved on:
- agent/freeze-frame-rcp-cost-model (origin + local): full PR
- keep/main-with-pr61 (local): snapshot of the merged state

Submodule note: libultraship pointer also rolls back to e608ec0. The local submodule worktree stays at 1ec6db3 so further iteration on PresentCurrentFramebuffer() can continue without re-checkout; run `git submodule update libultraship` for a clean state.

This reverts commit 531e0c6f.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 2, 2026
Summary
… contexts_num=2 slot machinery in syTaskmanSwitchContext, blocking the game thread for ~1 VI on real hardware. The PC port executed every DL in well under one VI, so freezes never occurred.

Mechanism
- port/stubs/n64_stubs.c: defer PORT_INTR_SP_TASK_DONE and DP_FULL_SYNC posts by N VI periods through a delay queue flushed in port_vi_simulate_vblank. Keeps sSYSchedulerCurrentTaskGfx alive through the deferral so GfxEnd's mq attaches to the in-flight task, matching real-hw release timing.
- port_resume_service_threads: cap the game thread to N resumes per host frame (default 1). Without the cap the 8-round resume loop lets a SwitchContext-blocked tic catch up within the same host frame and the freeze vanishes.
- port/gameloop.cpp: per-DL cost walker driven by Fast3D's gbi_trace_callback. cost = tris*75 + rect_px + load_bytes; only DLs above a per-VI cycle budget get N=3 deferral, the rest stay at N=1. Calibrated against the attract-mode histogram (p99 = 394k, max = 415k): the default budget of 400000 catches ~0.1% of DLs, the climax frames only.
- … 8be5026 + LUS 1ec6db3): on zero-submit VIs redraw the cached game framebuffer through the normal Fast3D window/GUI path so portrait/banner freezes don't appear one frame early or hold a stale swap-chain image.

libultraship
… 1ec6db3 (Add Fast3D current framebuffer present helper) on top of the current ssb64 tip.

Tunables (env-gated, zero overhead when off)
- SSB64_RCP_CYCLE_BUDGET
- SSB64_RCP_FORCE_N
- SSB64_GAME_THREAD_CAP_RESUMES
- SSB64_FREEZE_HOLD_FRAMES
- SSB64_FREEZE_PACING
- SSB64_TRACE_*

Frame pacing: when SwitchContext fails and task_draw is skipped, DrawAndRunGraphicsCommands no longer runs → no SwapBuffers → no VSync gate. The pacing fix sleeps + busy-waits one VI period on 0-submit frames so the loop doesn't blast at 2x.

Closes
Closes #9.
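The deferral described under Mechanism (interrupt posts held back for N VI periods and released from the vblank handler) can be illustrated with a small fixed-size delay queue. This is a sketch under stated assumptions, not the code in port/stubs/n64_stubs.c: the types, function names, and capacity are hypothetical, and the real queue's layout and overflow handling may differ.

```c
#include <assert.h>
#include <stddef.h>

#define DELAY_QUEUE_CAP 16 /* hypothetical capacity */

typedef struct {
    int intr;    /* hypothetical interrupt id (e.g. SP task done, DP full sync) */
    int vi_left; /* vblanks remaining before the interrupt is posted */
} DelayedIntr;

typedef struct {
    DelayedIntr items[DELAY_QUEUE_CAP];
    size_t count;
} DelayQueue;

/* Queue an interrupt so that it is released on the vi_delay-th vblank from now. */
static void delay_queue_push(DelayQueue *q, int intr, int vi_delay) {
    assert(q->count < DELAY_QUEUE_CAP);
    assert(vi_delay >= 1);
    q->items[q->count].intr = intr;
    q->items[q->count].vi_left = vi_delay;
    q->count++;
}

/* Call once per simulated vblank. Writes the interrupts that are now due to
 * out[] (in submission order) and returns how many were written. Entries
 * that are not yet due, or that do not fit in out[], stay queued. */
static size_t delay_queue_flush(DelayQueue *q, int *out, size_t out_cap) {
    size_t due = 0;
    size_t kept = 0;
    for (size_t i = 0; i < q->count; i++) {
        DelayedIntr e = q->items[i];
        e.vi_left--;
        if (e.vi_left <= 0 && due < out_cap) {
            out[due++] = e.intr;
        } else {
            q->items[kept++] = e;
        }
    }
    q->count = kept;
    return due;
}
```

In this sketch a post pushed with a delay of 3 is released on the third flush, which keeps the submitting task "in flight" for the two vblanks in between.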
Docs
- docs/freeze_frame_rcp_clock_design_2026-04-26.md: full design + verification.
- docs/bugs/attract_freeze_vi_idle_present_2026-05-01.md: idle-present rationale.
- docs/bugs/README.md: index entry added.

Test plan
- cmake --build .claude/worktrees/freeze-frame-rcp/build --target ssb64 -j 4: clean build on rebased branch.
- SSB64_GAME_THREAD_CAP_RESUMES=0 to confirm freezes disappear (cap is load-bearing).
- SSB64_RCP_FORCE_N=3 to confirm every gfx submit freezes (deferral is wired correctly).

🤖 Generated with Claude Code
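For readers running the test plan, the following sketch shows one plausible way an env tunable such as SSB64_GAME_THREAD_CAP_RESUMES could be read and applied. The helper names and the fallback-on-malformed-input behaviour are assumptions made for illustration; only the variable name, the default of 1, and "0 disables the cap" come from this PR.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a non-negative integer tunable. Returns fallback when the text is
 * unset, empty, malformed, out of range, or negative (assumed policy). */
static long parse_tunable(const char *text, long fallback) {
    if (text == NULL || *text == '\0') {
        return fallback;
    }
    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);
    if (errno != 0 || *end != '\0' || value < 0) {
        return fallback;
    }
    return value;
}

/* Read the cap from the environment; the default of 1 is from the PR text. */
static long game_thread_cap_resumes(void) {
    return parse_tunable(getenv("SSB64_GAME_THREAD_CAP_RESUMES"), 1);
}

/* A cap of 0 disables the limit; otherwise allow up to cap resumes per host frame. */
static int resume_allowed(long cap, long resumes_this_frame) {
    assert(resumes_this_frame >= 0);
    return cap == 0 || resumes_this_frame < cap;
}
```

Under this reading, setting the variable to 0 lets the resume loop run unrestricted, which is why the second test-plan step expects the freezes to disappear.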