
Fighter intro climax freeze-frames (closes #9)#61

Merged
JRickey merged 2 commits into main from agent/freeze-frame-rcp-cost-model on May 2, 2026

@JRickey commented May 2, 2026

Summary

  • Reproduces the brief whole-screen holds that real N64 hardware / LLE emulators (Mupen64+parallel-rdp, RetroArch+parallel-n64) show during fighter-intro climaxes (Link / Samus / Yoshi / DK / Kirby).
  • Root cause: RDP wall-clock variance overruns the contexts_num=2 slot machinery in syTaskmanSwitchContext, blocking the game thread for ~1 VI on real hardware. The PC port executed every DL in well under one VI, so freezes never occurred.
  • Three-phase fix (SP/DP-completion deferral + game-thread resume cap + per-DL F3DEX2 cost model) plus a Fast3D idle-present helper so zero-submit VI frames keep scanning the live framebuffer instead of presenting a stale swapchain image.

Mechanism

  1. Phase 1 — port/stubs/n64_stubs.c: defer PORT_INTR_SP_TASK_DONE and DP_FULL_SYNC posts by N VI periods through a delay queue flushed in port_vi_simulate_vblank. Keeps sSYSchedulerCurrentTaskGfx alive through the deferral so GfxEnd's mq attaches to the in-flight task, matching real-hw release timing.
  2. Phase 2 — port_resume_service_threads: cap the game thread to N resumes per host frame (default 1). Without the cap, the 8-round resume loop lets a SwitchContext-blocked tic catch up within the same host frame, and the freeze vanishes.
  3. Phase 3 — port/gameloop.cpp: per-DL cost walker driven by Fast3D's gbi_trace_callback. cost = tris*75 + rect_px + load_bytes; only DLs above a per-VI cycle budget get N=3 deferral, the rest stay at N=1. Calibrated against the attract-mode histogram (p99 = 394k, max = 415k) — default budget 400000 catches ~0.1% of DLs, the climax frames only.
  4. Idle present (8be5026 + LUS 1ec6db3): on zero-submit VIs redraw the cached game framebuffer through the normal Fast3D window/GUI path so portrait/banner freezes don't appear one frame early or hold a stale swap-chain image.
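As a rough illustration of the Phase 3 decision, the cost formula and the budget comparison could be sketched in C as follows. The `DlTally` struct, the function names, and the enum constants are hypothetical and are not the port's actual identifiers; only the formula, the 400000 default budget, and the N=1 / N=3 values come from this description. "Above the budget" is assumed to mean strictly greater.

```c
#include <stdint.h>

/* Hypothetical per-DL tallies; field names are illustrative only. */
typedef struct {
    uint32_t tris;       /* triangles drawn by the display list */
    uint32_t rect_px;    /* pixels covered by texture/fill rectangles */
    uint32_t load_bytes; /* bytes moved by texture loads */
} DlTally;

enum { TRI_COST = 75, DEFAULT_BUDGET = 400000, N_LIGHT = 1, N_HEAVY = 3 };

/* cost = tris*75 + rect_px + load_bytes, as stated in the description. */
static uint64_t dl_cost(const DlTally *t)
{
    return (uint64_t)t->tris * TRI_COST + t->rect_px + t->load_bytes;
}

/* Only DLs strictly above the budget get the longer deferral (assumed). */
static int dl_deferral_vis(const DlTally *t, uint64_t budget)
{
    return dl_cost(t) > budget ? N_HEAVY : N_LIGHT;
}
```

With the default budget, a DL at the reported p99 cost (394k) would stay at N=1, while one at the reported maximum (415k) would be deferred for 3 VI periods.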

libultraship

Tunables (env-gated, zero overhead when off)

| Var | Default | Effect |
| --- | --- | --- |
| SSB64_RCP_CYCLE_BUDGET | 400000 | lower = more freezes |
| SSB64_RCP_FORCE_N | (off) | bypass cost model with fixed N |
| SSB64_GAME_THREAD_CAP_RESUMES | 1 | 0 disables cap |
| SSB64_FREEZE_HOLD_FRAMES | 3 | VI periods to extend each contention freeze |
| SSB64_FREEZE_PACING | on | disable pacing-correction sleep with 0 |
| SSB64_TRACE_* | off | per-tic anim, per-VI submits, SwitchContext blocks, DL cost, opcode histogram |
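A minimal sketch of how env-gated tunables like these might be read once at startup, so that nothing is parsed on the hot path. `parse_long_or`, `env_long`, and the `Tunables` struct are hypothetical names; only the variable names and defaults are taken from the table above. Encoding the "off" state of SSB64_RCP_FORCE_N as 0 is an assumption.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse a base-10 integer, returning `fallback` for NULL, empty,
 * malformed, or out-of-range input. */
static long parse_long_or(const char *s, long fallback)
{
    if (s == NULL || *s == '\0') return fallback;
    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0') return fallback;
    return v;
}

/* Read one tunable from the environment, or use its default. */
static long env_long(const char *name, long fallback)
{
    return parse_long_or(getenv(name), fallback);
}

/* Hypothetical container, filled once at startup. */
typedef struct {
    long cycle_budget;
    long force_n;      /* 0 = off (assumed encoding) */
    long cap_resumes;  /* 0 disables the cap */
    long hold_frames;
    long pacing;       /* 0 disables the pacing sleep */
} Tunables;

static Tunables load_tunables(void)
{
    Tunables t;
    t.cycle_budget = env_long("SSB64_RCP_CYCLE_BUDGET", 400000);
    t.force_n      = env_long("SSB64_RCP_FORCE_N", 0);
    t.cap_resumes  = env_long("SSB64_GAME_THREAD_CAP_RESUMES", 1);
    t.hold_frames  = env_long("SSB64_FREEZE_HOLD_FRAMES", 3);
    t.pacing       = env_long("SSB64_FREEZE_PACING", 1);
    return t;
}
```

Falling back to the default on malformed input keeps a typo in the environment from silently changing freeze behaviour.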

Frame pacing: when SwitchContext fails and task_draw is skipped, DrawAndRunGraphicsCommands no longer runs, so there is no SwapBuffers call and no VSync gate, and the host frame returns immediately. The pacing fix sleeps and then busy-waits for one VI period on 0-submit frames so the loop doesn't run at 2x speed.

Closes

Closes #9.

Docs

  • docs/freeze_frame_rcp_clock_design_2026-04-26.md — full design + verification.
  • docs/bugs/attract_freeze_vi_idle_present_2026-05-01.md — idle-present rationale.
  • docs/bugs/README.md — index entry added.

Test plan

  • cmake --build .claude/worktrees/freeze-frame-rcp/build --target ssb64 -j 4 — clean build on rebased branch.
  • Boot to attract demo: confirm Link + Samus intro climaxes show ~50ms whole-screen freezes; rest of attract plays at 60 Hz.
  • SSB64_GAME_THREAD_CAP_RESUMES=0 to confirm freezes disappear (cap is load-bearing).
  • SSB64_RCP_FORCE_N=3 to confirm every gfx submit freezes (deferral is wired correctly).
  • Spot-check audio/camera don't drift across the freeze.

🤖 Generated with Claude Code

JRickey and others added 2 commits May 1, 2026 21:23
…t model

Phase 1+2+3 of the climax-freeze fix. Reproduces the brief whole-screen
holds that the original game / LLE emulators (Mupen64+parallel-rdp,
RetroArch+parallel-n64) show during fighter-intro climaxes — caused on
real hw by RDP wall-clock variance overrunning the contexts_num=2 slot
machinery in syTaskmanSwitchContext, blocking the game thread for ~1 VI.

Phase 1 (port/stubs/n64_stubs.c): defer PORT_INTR_SP_TASK_DONE and
DP_FULL_SYNC posts by N VI periods via a delay queue flushed in
port_vi_simulate_vblank. Lets sSYSchedulerCurrentTaskGfx stay live
through the deferral so GfxEnd's mq attaches to the in-flight task,
matching real-hw release timing.
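The deferral described in Phase 1 can be pictured as a small fixed-size delay queue that is aged once per simulated vblank. The sketch below is an illustration under assumptions, not the code in n64_stubs.c: `DelayQueue`, `delay_queue_push`, `delay_queue_tick`, and the capacity of 16 are all hypothetical, and interrupt ids are plain integers here.

```c
#include <stddef.h>

#define DELAY_QUEUE_CAP 16  /* hypothetical capacity */

typedef struct {
    int intr_id;   /* which interrupt to post, e.g. an SP-task-done id */
    int vis_left;  /* VI periods remaining before it is posted */
} DelayedIntr;

typedef struct {
    DelayedIntr items[DELAY_QUEUE_CAP];
    size_t count;
} DelayQueue;

/* Queue an interrupt to be posted after `vis` vblanks.
 * Returns 0 on success, -1 if the queue is full. */
static int delay_queue_push(DelayQueue *q, int intr_id, int vis)
{
    if (q->count == DELAY_QUEUE_CAP) return -1;
    q->items[q->count].intr_id = intr_id;
    q->items[q->count].vis_left = vis;
    q->count++;
    return 0;
}

/* Call once per simulated vblank. Ages every entry, writes the ids that
 * are now due into `due` (in insertion order), keeps the rest queued,
 * and returns how many ids were written. */
static size_t delay_queue_tick(DelayQueue *q, int due[DELAY_QUEUE_CAP])
{
    size_t kept = 0, n_due = 0;
    for (size_t i = 0; i < q->count; i++) {
        DelayedIntr e = q->items[i];
        e.vis_left--;
        if (e.vis_left <= 0) {
            due[n_due++] = e.intr_id;
        } else {
            q->items[kept++] = e;
        }
    }
    q->count = kept;
    return n_due;
}
```

In this model the caller would post each id returned in `due` to the scheduler. An entry pushed with N=1 is released on the next vblank, and one pushed with N=3 is released two vblanks later.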

Phase 2 (port_resume_service_threads): cap game thread (id=5) to N
resumes per host frame. Without this, the 8-round resume loop lets a
SwitchContext-blocked tic catch up within the same host frame, defeating
the visible freeze. With cap=1 default, blocked tics carry over to next
host frame.
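To see why the cap matters, the resume loop can be modelled as a fixed number of rounds with a per-frame counter. This is a simplified sketch with hypothetical names (`ResumeCap`, `resume_cap_try`, `game_resumes_in_frame`); it assumes the game thread is runnable in every round, which is the catch-up case the cap is meant to stop.

```c
#include <stdbool.h>

enum { RESUME_ROUNDS = 8 };  /* the 8-round resume loop from the text */

typedef struct {
    int cap;           /* max game-thread resumes per host frame; 0 = no cap */
    int resumes_used;  /* resumes granted so far in this host frame */
} ResumeCap;

/* Reset the counter at the start of each host frame. */
static void resume_cap_begin_frame(ResumeCap *c)
{
    c->resumes_used = 0;
}

/* Returns true if the game thread may be resumed again this host frame. */
static bool resume_cap_try(ResumeCap *c)
{
    if (c->cap > 0 && c->resumes_used >= c->cap) return false;
    c->resumes_used++;
    return true;
}

/* Count how often the game thread runs in one host frame, assuming it is
 * runnable in every round of the loop. */
static int game_resumes_in_frame(ResumeCap *c)
{
    int ran = 0;
    resume_cap_begin_frame(c);
    for (int round = 0; round < RESUME_ROUNDS; round++) {
        if (resume_cap_try(c)) ran++;
    }
    return ran;
}
```

With cap=1 the thread runs once and any blocked tic carries over to the next host frame. With cap=0 it can run in all eight rounds, which is how a blocked tic would catch up and hide the freeze.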

Phase 3 (port/gameloop.cpp): per-DL cost walker driven by Fast3D's
gbi_trace_callback. Sums tris*75 + rect_px + load_bytes (F3DEX2 opcodes
0x05/0x06/0x07 for tris, 0xE4/0xF6 for rects, 0xF3/0xF4 for loads).
Compares against per-VI cycle budget (default 400000) — only DLs above
the budget get N=3 deferral, rest stay at N=1. Calibrated against
attract-mode cost histogram (p99=394k, max=415k) so default catches
~0.1% of DLs = climax-only freezes.
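The opcode grouping above could be expressed as a small classifier. The sketch below covers only classification and triangle counting over a flat command array; it does not decode rectangle areas or load sizes, and it ignores display-list branching. The type and function names are hypothetical, and the triangles-per-command values (one for 0x05, two each for 0x06 and 0x07) follow the usual F3DEX2 meaning of those opcodes rather than anything stated in this commit.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_OTHER, OP_TRI, OP_RECT, OP_LOAD } OpClass;

/* One 64-bit GBI command; the opcode is the top byte of the first word. */
typedef struct { uint32_t w0, w1; } GfxCmd;

/* Group opcodes the way the commit message lists them. */
static OpClass classify_opcode(uint8_t op)
{
    switch (op) {
    case 0x05: case 0x06: case 0x07: return OP_TRI;
    case 0xE4: case 0xF6:            return OP_RECT;
    case 0xF3: case 0xF4:            return OP_LOAD;
    default:                         return OP_OTHER;
    }
}

/* Triangles per command (assumed): 0x05 draws one, 0x06 and 0x07 two. */
static uint32_t tris_for_opcode(uint8_t op)
{
    switch (op) {
    case 0x05:            return 1;
    case 0x06: case 0x07: return 2;
    default:              return 0;
    }
}

/* Sum triangles over a flat array of commands. */
static uint32_t count_tris(const GfxCmd *cmds, size_t n)
{
    uint32_t tris = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t op = (uint8_t)(cmds[i].w0 >> 24);
        if (classify_opcode(op) == OP_TRI) tris += tris_for_opcode(op);
    }
    return tris;
}
```

The resulting triangle count would feed the tris*75 term of the cost. The rect and load terms would need per-command field decoding that is not shown here.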

Frame-pacing fix: when SwitchContext fails and task_draw is skipped,
DrawAndRunGraphicsCommands isn't called → no SwapBuffers → no VSync
gate → host frame returns immediately. Without this fix the loop
blasts at 2x speed during contention. Now we sleep+busy-wait one VI
period on 0-submit frames, plus N-1 additional held frames where
INTR_VRETRACE itself is skipped so game logic also pauses (no
animation catch-up after the freeze).
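The sleep-then-busy-wait pattern can be sketched as a pure planning step: sleep coarsely for most of the VI period, then spin for the last stretch to hit the deadline. Everything named here is hypothetical, including the 2 ms spin margin; the VI period assumes the 60 Hz rate mentioned in the test plan. The actual sleeping and clock reads are left out so the logic stays platform-neutral.

```c
#include <stdint.h>

#define VI_PERIOD_NS   16666667LL  /* one VI period at 60 Hz */
#define SPIN_MARGIN_NS 2000000LL   /* hypothetical: spin for the last 2 ms */

typedef struct {
    int64_t sleep_ns;  /* coarse sleep first */
    int64_t spin_ns;   /* then busy-wait up to the deadline */
} PacingPlan;

/* Split the time left until `deadline_ns` into a sleep and a final spin. */
static PacingPlan plan_wait(int64_t now_ns, int64_t deadline_ns)
{
    PacingPlan p = {0, 0};
    int64_t left = deadline_ns - now_ns;
    if (left <= 0) return p;  /* already late: do not wait at all */
    if (left > SPIN_MARGIN_NS) {
        p.sleep_ns = left - SPIN_MARGIN_NS;
        p.spin_ns = SPIN_MARGIN_NS;
    } else {
        p.spin_ns = left;
    }
    return p;
}

/* Extra held frames after the first 0-submit frame: N-1, never negative. */
static int extra_held_frames(int hold_frames)
{
    return hold_frames > 1 ? hold_frames - 1 : 0;
}
```

With the default of 3 hold frames this gives 2 extra held frames, so the total hold is about 3 VI periods, in line with the ~50ms freezes reported below.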

Tracing infra (env-gated, zero overhead when off):
- SSB64_TRACE_INTRO_ANIM=1 — per-tic anim_frame/speed/hitlag for the
  three opening intros (Yoshi/Samus/Donkey).
- SSB64_TRACE_GFX_PER_VI=1 — gfx submits per host frame.
- SSB64_TRACE_SWITCH_CTX=1 — SwitchContext block events + game-thread
  cap engagements.
- SSB64_TRACE_DL_COST=1 — per-DL cost decomposition.
- SSB64_OPCODE_HISTOGRAM=1 — opcode frequency dump for re-calibration.

Tunables:
- SSB64_RCP_CYCLE_BUDGET=N (default 400000) — lower = more freezes.
- SSB64_RCP_FORCE_N=N — bypass cost model with fixed N.
- SSB64_GFX_DEFER_VI=N — legacy alias for FORCE_N.
- SSB64_GAME_THREAD_CAP_RESUMES=N (default 1) — 0 disables cap.
- SSB64_FREEZE_HOLD_FRAMES=N (default 3) — VI periods to extend each
  contention freeze; longer = more visible.
- SSB64_FREEZE_PACING=0 — disable pacing-correction sleep.
- SSB64_FREEZE_TEST=N — diagnostic: skip every Nth Fast3D submit.

User-tested: the default config produces 4 visible whole-screen
freezes per attract loop (Link + Samus intro climaxes, two each),
~50ms each. Pacing is correct at 60Hz. Freeze timing is 1 frame late
vs the emulator; the budget may need tuning for an exact match.

Full design + verification + tunables in
docs/freeze_frame_rcp_clock_design_2026-04-26.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@JRickey JRickey merged commit 531e0c6 into main May 2, 2026
JRickey added a commit that referenced this pull request May 2, 2026
…st-model"

Walk main back to dd874e1 to retune freeze-frame sensitivity. The
shipped per-DL cost model produces false-positive freezes during the
fighter run sequence, the menus (sluggish character-select with
progressive slowdown), and on Pikachu / Yoshi poses that should not
freeze.

Code preserved on:
  - agent/freeze-frame-rcp-cost-model (origin + local) — full PR
  - keep/main-with-pr61 (local) — snapshot of the merged state

Submodule note: libultraship pointer also rolls back to e608ec0. The
local submodule worktree stays at 1ec6db3 so further iteration on
PresentCurrentFramebuffer() can continue without re-checkout; run
`git submodule update libultraship` for a clean state.

This reverts commit 531e0c6f.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Fighter intro climax freeze-frames not reproducing — seeking guidance on RCP timing model
