This issue has been a real pain for me, and I am at a loss on the best way forward. I have spent a lot of time diagnosing it and attempting fixes that don't butcher the source code, with varying levels of success. I could really use some guidance here.
Details per Claude:
What we're missing
On real N64 hardware (and on LLE emulators like Mupen64+parallel-rdp / RetroArch+parallel-n64), several fighter-intro animations have a brief whole-screen freeze at their climax — the moment lasts visibly longer than a single frame:
- Yoshi — tongue grab climax
- Samus — grapple beam climax
- DK — ground-smash impact
- Link — climax frame of the unsheathe
- (likely others)
These are not authored animation pauses. There's no gcSetAnimSpeed(0.0F) call, no figatree wait command, no scene-script delay. The freeze emerges naturally from real-hardware RDP timing.
On the PC port these freezes don't happen — every fighter intro plays straight through.
What we believe the mechanism is (to the extent we understand it)
syTaskmanSwitchContext (src/sys/taskman.c:905) blocks the game thread when both tscene->contexts_num=2 framebuffer slots are in flight. A slot is released on nSYTaskTypeGfxEnd → sySchedulerDpFullSync → mq fire.
On real hardware, a heavy climax DL takes longer than one VI period for the RDP to finish, so the slot release lags. On the next VI retrace the scheduler can't find a free slot, the gfx task stays paused, the game thread blocks on osRecvMesg, and the user sees the same framebuffer for an extra retrace — the freeze.
HLE plugins like GLideN64 don't reproduce this either (they don't model RDP wall-clock variance). Only LLE rasterizers do. That tells us the mechanism is fundamentally about RDP timing, not about anything in the game-side code path.
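To make the suspected mechanism concrete, here is a minimal toy model of it. This is a sketch under stated assumptions, not the game's scheduler: two framebuffer slots, a serial RDP whose per-frame cost is given in whole VI periods, and a game thread that can only start a new frame when a slot is free. All names (simulate_freeze, Slot, and so on) are hypothetical.

```c
#define NUM_SLOTS 2 /* mirrors contexts_num = 2 from the description above */

typedef enum { SLOT_FREE, SLOT_RENDERING, SLOT_READY, SLOT_ON_SCREEN } SlotState;

typedef struct {
    SlotState state;
    int frame;     /* game frame held by this slot */
    int remaining; /* VI periods of RDP work still to do */
} Slot;

/* Toy model. dl_cost[i] is the assumed RDP time of frame i in VI periods
 * (must be >= 1). shown[v] receives the frame on screen at retrace v
 * (-1 before the first flip). Returns how many retraces re-showed the
 * previous frame while newer frames were still pending. */
int simulate_freeze(const int *dl_cost, int num_frames, int *shown, int num_vi)
{
    Slot slots[NUM_SLOTS];
    int next_frame = 0, on_screen = -1, repeats = 0;

    for (int s = 0; s < NUM_SLOTS; s++) {
        slots[s].state = SLOT_FREE;
        slots[s].frame = -1;
        slots[s].remaining = 0;
    }

    for (int vi = 0; vi < num_vi; vi++) {
        int busy = -1, flipped = 0;

        /* 1. The serial RDP works only on the oldest submitted frame. */
        for (int s = 0; s < NUM_SLOTS; s++) {
            if (slots[s].state == SLOT_RENDERING &&
                (busy < 0 || slots[s].frame < slots[busy].frame))
                busy = s;
        }
        if (busy >= 0 && --slots[busy].remaining == 0)
            slots[busy].state = SLOT_READY;

        /* 2. Retrace: flip to a finished slot if there is one, which
         *    releases the slot that was on screen. */
        for (int s = 0; s < NUM_SLOTS && !flipped; s++) {
            if (slots[s].state != SLOT_READY) continue;
            for (int o = 0; o < NUM_SLOTS; o++)
                if (slots[o].state == SLOT_ON_SCREEN) slots[o].state = SLOT_FREE;
            slots[s].state = SLOT_ON_SCREEN;
            on_screen = slots[s].frame;
            flipped = 1;
        }
        if (!flipped && on_screen >= 0 && on_screen < num_frames - 1)
            repeats++; /* same framebuffer shown again: the "freeze" */
        shown[vi] = on_screen;

        /* 3. Game thread: needs a FREE slot, otherwise it stays blocked. */
        for (int s = 0; s < NUM_SLOTS && next_frame < num_frames; s++) {
            if (slots[s].state != SLOT_FREE) continue;
            slots[s].state = SLOT_RENDERING;
            slots[s].frame = next_frame;
            slots[s].remaining = dl_cost[next_frame];
            next_frame++;
            break;
        }
    }
    return repeats;
}
```

With costs {1, 1, 2, 1, 1}, the frame before the heavy one is held for two retraces and the game thread skips one submission, which is the shape of the freeze described above. With every cost at 1, no retrace repeats.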
What we've tried
Existing on-main work:
- VI framebuffer rotation (commit 66a7726): port/stubs/n64_stubs.c's osViGet*Framebuffer / osViSwapBuffer were no-op NULL returns; they now track current/next slots, with port_vi_simulate_vblank() propagating queued→current once per frame. After this, sySchedulerCheckReadyFramebuffer correctly sees two of the three gSYFramebufferSets[] slots as in-use each frame.
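The current/next tracking described in that item can be sketched roughly as follows. This is not the code from commit 66a7726. The sketch_ names are hypothetical stand-ins for the real stubs, and the only behaviour assumed is the one stated above: a swap queues a buffer, and the simulated vblank promotes it to current.

```c
#include <stddef.h>

/* Hypothetical state: what the VI is scanning out now and what it will
 * scan out after the next retrace. */
static void *s_vi_current = NULL;
static void *s_vi_next = NULL;

/* Stand-in for osViSwapBuffer: queue a framebuffer for the next retrace. */
void sketch_vi_swap_buffer(void *fb) { s_vi_next = fb; }

/* Stand-ins for osViGetCurrentFramebuffer / osViGetNextFramebuffer. */
void *sketch_vi_get_current(void) { return s_vi_current; }
void *sketch_vi_get_next(void) { return s_vi_next; }

/* Stand-in for port_vi_simulate_vblank: queued becomes current. */
void sketch_vi_simulate_vblank(void) { s_vi_current = s_vi_next; }

/* A scheduler-side check: a buffer is busy if it is displayed or queued. */
int sketch_fb_in_use(const void *fb)
{
    return fb != NULL && (fb == s_vi_current || fb == s_vi_next);
}
```

After one vblank and a second swap, two of three buffers report as in use and the third is free, which matches the "two of the three slots" observation.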
Existing on a branch (agent/freeze-frame-rcp-cost-model, not merged):
- Phase 1: SP/DP deferral. Defer PORT_INTR_SP_TASK_DONE / DP_FULL_SYNC posts by N VI periods via a delay queue flushed in port_vi_simulate_vblank.
- Phase 2: Game-thread resume cap. Cap game thread (id=5) to N resumes per host frame so SwitchContext-blocked tics don't catch up within the same host frame.
- Phase 3: Per-DL F3DEX2 cost model. Walk each DL's GBI commands and sum cost = tris*75 + rect_px + load_bytes (opcodes 0x05/0x06/0x07 for tris, 0xE4/0xF6 for rects, 0xF3/0xF4 for tex loads). Divide by a per-VI cycle budget (calibrated to ~400000 from an attract-demo histogram) to select N.
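For readers who want to poke at the Phase 3 arithmetic, here is a minimal sketch of the tally and the division. It is not the branch's code, and several details are assumptions to verify against gbi.h: that 0x05 draws one triangle and 0x06/0x07 two each, that 0xDF ends the list, and that rectangle corners are 10.2 fixed-point values in 12-bit fields. The 0xF3/0xF4 load decode is left out because byte counts depend on tile state, so load_bytes is supplied by the caller. Truncating division for N is also an assumption.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t w0, w1; } GfxCmd; /* one 64-bit GBI command */

typedef struct { uint32_t tris, rect_px, load_bytes; } DlTally;

/* Whole-pixel area of a 0xE4/0xF6 rectangle. Assumed layout: lower-right
 * corner in w0, upper-left in w1, each coordinate a 12-bit 10.2 value. */
static uint32_t rect_pixels(GfxCmd c)
{
    uint32_t lrx = (c.w0 >> 12) & 0xFFF, lry = c.w0 & 0xFFF;
    uint32_t ulx = (c.w1 >> 12) & 0xFFF, uly = c.w1 & 0xFFF;
    if (lrx <= ulx || lry <= uly) return 0;
    return ((lrx - ulx) >> 2) * ((lry - uly) >> 2);
}

/* Walk a flat display list (branches via G_DL are not followed) and
 * accumulate triangle and rectangle-pixel counts into *t. */
void tally_dl(const GfxCmd *dl, size_t max_cmds, DlTally *t)
{
    for (size_t i = 0; i < max_cmds; i++) {
        uint8_t op = (uint8_t)(dl[i].w0 >> 24);
        if (op == 0xDF) break; /* assumed end-of-list opcode */
        switch (op) {
        case 0x05: t->tris += 1; break;
        case 0x06: case 0x07: t->tris += 2; break;
        case 0xE4: case 0xF6: t->rect_px += rect_pixels(dl[i]); break;
        default: break; /* 0xF3/0xF4 loads: decode omitted in this sketch */
        }
    }
}

/* cost = tris*75 + rect_px + load_bytes, as stated in Phase 3. */
uint64_t dl_cost(const DlTally *t)
{
    return (uint64_t)t->tris * 75u + t->rect_px + t->load_bytes;
}

/* N = cost / per-VI budget, truncated; 0 means no deferral. */
uint32_t deferral_vi_periods(uint64_t cost, uint64_t cycles_per_vi)
{
    return cycles_per_vi ? (uint32_t)(cost / cycles_per_vi) : 0;
}
```

With the ~400000 budget, a DL costing 900000 units yields N = 2 and anything under 400000 yields N = 0, so only DLs above the budget defer at all.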
With Phase 3 enabled and the calibrated budget, two freezes appear per attract run (Link and Samus intros), which roughly matches the authored climax timing. But:
Where we're uncertain — and what we're asking for
We honestly don't know if our approach is right. Specific uncertainties:
- Is the cost model the right shape at all? tris × 75 + rect_px + load_bytes is a guess. We don't have an authoritative F3DEX2 cycle-cost reference; those constants came from "what made the histogram look right", not from RDP datasheet timings. If anyone has worked out a more grounded model (or knows of one in another port: SoH, Starship, SpaghettiKart, MK64 port, etc.) we'd love a pointer.
- Is "per-DL cost summed against a per-VI budget" the right granularity? Real RDP is more like a continuous rasterizer pipeline; a DL with many small tris and a DL with a few overdraw-heavy tris can have very different real costs. We may be approximating something we should be modeling cycle-by-cycle.
- Should we be doing this at all in port glue, vs. in libultraship? Fast3D in libultraship already simulates the GBI command stream. Would a cycle estimator belong inside Fast3D's frame-finalization, exposed as a callback the port can subscribe to? That seems more reusable for other libultraship-based ports.
- Is there a fundamentally different approach we're not considering? Thinking specifically of:
  - Trace-based replay: capture the GBI stream from a real-hardware run via everdrive or accurate emulator, store per-frame timing offsets, replay them as deferral durations on the port. Authentic, but only works for content that's been captured (loses generality for character/stage combinations not in the trace).
  - Accepted-loss path: just don't have the freezes; they're a real-N64 quirk that modern hardware doesn't naturally exhibit and trying to fake them produces a worse experience than skipping them.
- How do other LLE / accurate ports handle this? Has anyone solved the climax-freeze problem in a way that's been merged to a shipped port? If you've run into this and decided not to bother, that's also a useful data point.
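To make the "callback the port can subscribe to" question above easier to discuss, here is one possible shape for it. Everything in this sketch is hypothetical: libultraship's Fast3D does not expose these functions as far as we know, and the names (sketch_set_frame_cost_callback, sketch_finalize_frame, SketchPortDeferral) exist only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical renderer-side hook: called once per finished frame with
 * an estimated RDP cost for that frame. */
typedef void (*SketchFrameCostFn)(uint64_t estimated_cycles, void *user);

static SketchFrameCostFn s_cost_cb = NULL;
static void *s_cost_user = NULL;

/* The port registers (or clears, with NULL) its subscriber here. */
void sketch_set_frame_cost_callback(SketchFrameCostFn fn, void *user)
{
    s_cost_cb = fn;
    s_cost_user = user;
}

/* The renderer would call this from its frame-finalization step. */
void sketch_finalize_frame(uint64_t estimated_cycles)
{
    if (s_cost_cb != NULL)
        s_cost_cb(estimated_cycles, s_cost_user);
}

/* Hypothetical port-side state: turns the estimate into a number of VI
 * periods by which to defer the DP-done notification. */
typedef struct {
    uint64_t cycles_per_vi; /* per-VI budget, e.g. the ~400000 figure */
    uint32_t pending_vi;    /* deferral chosen for the latest frame */
} SketchPortDeferral;

void sketch_port_on_frame_cost(uint64_t estimated_cycles, void *user)
{
    SketchPortDeferral *d = (SketchPortDeferral *)user;
    d->pending_vi = d->cycles_per_vi
        ? (uint32_t)(estimated_cycles / d->cycles_per_vi)
        : 0;
}
```

The point of the split is that the estimator would live with the code that already parses the command stream, while the budget and the deferral policy stay in the port. Whether that boundary is acceptable upstream is exactly the open question.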
Reference docs
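In-tree:
- docs/intro_residuals_2026-04-25.md: Issue #3 has the most thorough writeup of the mechanism + foundation work
- docs/fighter_intro_animation_handoff_2026-04-13.md: earlier investigation
- docs/fighter_intro_animation_investigation.md
Not-yet-merged design doc on the branch: docs/freeze_frame_rcp_clock_design_2026-04-26.md.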
What we'd act on
If you have:
- An authoritative F3DEX2 / RDP cycle-cost reference,
- A pointer to another N64 port that solved this,
- A sketch of "you should approach this differently",
- Or just experience saying "we tried this and it made things worse" —
…please leave a comment. Even a "not worth fixing" reply with the reasoning would help us close this out one way or the other.