Fighter intro climax freeze-frames not reproducing — seeking guidance on RCP timing model #9

@JRickey


This issue has been a real pain, and I am at a loss on the best way forward. I have spent a lot of time diagnosing it and attempting fixes that don't butcher the source code, with varying levels of success. I could really use some guidance here.

Details per Claude:

What we're missing

On real N64 hardware (and on LLE emulators like Mupen64+parallel-rdp / RetroArch+parallel-n64), several fighter-intro animations have a brief whole-screen freeze at their climax — the moment lasts visibly longer than a single frame:

  • Yoshi — tongue grab climax
  • Samus — grapple beam climax
  • DK — ground-smash impact
  • Link — climax frame of the unsheathe
  • (likely others)

These are not authored animation pauses. There's no gcSetAnimSpeed(0.0F) call, no figatree wait command, no scene-script delay. The freeze emerges naturally from real-hardware RDP timing.

On the PC port these freezes don't happen — every fighter intro plays straight through.

What we believe the mechanism is (to the extent we understand it)

syTaskmanSwitchContext (src/sys/taskman.c:905) blocks the game thread when both framebuffer slots (tscene->contexts_num = 2) are in flight. A slot is released on nSYTaskTypeGfxEnd → sySchedulerDpFullSync → mq fire.

On real hardware, a heavy climax DL takes longer than one VI period for the RDP to finish, so the slot release lags. On the next VI retrace the scheduler can't find a free slot, the gfx task stays paused, the game thread blocks on osRecvMesg, and the user sees the same framebuffer for an extra retrace — the freeze.
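To make the blocking argument above concrete, here is a toy simulation of the slot accounting. It is a sketch under stated assumptions, not the game's scheduler: every name is hypothetical, the game is assumed to submit one display list per retrace, the RDP is assumed to draw them strictly in order, and a slot is only released when its display list finishes.

```c
#include <stddef.h>

#define SKETCH_NUM_SLOTS 2 /* mirrors tscene->contexts_num = 2 */

/*
 * Toy model (hypothetical, not the game's scheduler).
 * rdp_vi[i] is how many VI periods the RDP needs for frame i's display list;
 * values below 1 are clamped to 1. The game tries to submit one frame per
 * retrace and is blocked whenever every slot is still in flight.
 * Returns the number of retraces on which the game was blocked.
 */
int sketch_count_blocked_retraces(const int *rdp_vi, size_t n_frames)
{
    int in_flight = 0;    /* slots submitted but not yet released */
    int rdp_left = 0;     /* VI periods left on the DL being drawn */
    int waiting_cost = 0; /* cost of the DL queued behind it, if any */
    int blocked = 0;
    size_t next = 0;

    while (next < n_frames) {
        /* Retrace: one VI period of RDP work has elapsed. */
        if (in_flight > 0 && --rdp_left == 0) {
            in_flight--; /* stands in for the DP full sync slot release */
            if (in_flight > 0) {
                rdp_left = waiting_cost; /* RDP starts the queued DL */
            }
        }

        if (in_flight < SKETCH_NUM_SLOTS) {
            int cost = rdp_vi[next] < 1 ? 1 : rdp_vi[next];
            if (in_flight == 0) {
                rdp_left = cost;
            } else {
                waiting_cost = cost;
            }
            in_flight++;
            next++;
        } else {
            blocked++; /* no free slot: the same framebuffer stays on screen */
        }
    }
    return blocked;
}
```

With per-frame costs {1, 1, 3, 1, 1} this returns 1, and with all costs equal to 1 it returns 0. Note that in this toy a 2-period display list is absorbed by the second slot; the real scheduler also has the VI holding a buffer, so the threshold on hardware may well be lower.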

HLE plugins like GLideN64 don't reproduce this either (they don't model RDP wall-clock variance). Only LLE rasterizers do. That tells us the mechanism is fundamentally about RDP timing, not about anything in the game-side code path.

What we've tried

Existing on-main work:

  • VI framebuffer rotation (commit 66a7726) — port/stubs/n64_stubs.c's osViGet*Framebuffer / osViSwapBuffer were no-op NULL returns; they now track current/next slots, with port_vi_simulate_vblank() propagating queued→current once per frame. After this sySchedulerCheckReadyFramebuffer correctly sees two of the three gSYFramebufferSets[] slots as in-use each frame.
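For readers who haven't looked at commit 66a7726, the rotation described above amounts to roughly the following. This is a minimal sketch, not the code in port/stubs/n64_stubs.c: the sketch_ function names and the two static variables are hypothetical stand-ins for the osViGet*Framebuffer / osViSwapBuffer stubs and port_vi_simulate_vblank().

```c
#include <stddef.h>

/* Hypothetical state: what the VI is showing now, and what it shows next. */
static void *s_sketch_current_fb = NULL;
static void *s_sketch_next_fb = NULL;

/* Stand-in for osViGetCurrentFramebuffer(). */
void *sketch_vi_get_current_framebuffer(void)
{
    return s_sketch_current_fb;
}

/* Stand-in for osViGetNextFramebuffer(). */
void *sketch_vi_get_next_framebuffer(void)
{
    return s_sketch_next_fb;
}

/* Stand-in for osViSwapBuffer(): only queues the buffer; it does not
 * become current until the next simulated vblank. */
void sketch_vi_swap_buffer(void *framebuffer)
{
    s_sketch_next_fb = framebuffer;
}

/* Stand-in for port_vi_simulate_vblank(): propagate queued -> current,
 * once per host frame. */
void sketch_vi_simulate_vblank(void)
{
    s_sketch_current_fb = s_sketch_next_fb;
}
```

The point of keeping the queued and current pointers separate is that, between a swap and the following vblank, two distinct buffers are reported as in use, which is what the scheduler's readiness check needs to see.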

Existing on a branch (agent/freeze-frame-rcp-cost-model, not merged):

  • Phase 1 — SP/DP deferral. Defer PORT_INTR_SP_TASK_DONE / DP_FULL_SYNC posts by N VI periods via a delay queue flushed in port_vi_simulate_vblank.
  • Phase 2 — Game-thread resume cap. Cap game thread (id=5) to N resumes per host frame so SwitchContext-blocked tics don't catch up within the same host frame.
  • Phase 3 — Per-DL F3DEX2 cost model. Walk each DL's GBI commands and sum cost = tris*75 + rect_px + load_bytes (opcodes 0x05/0x06/0x07 for tris, 0xE4/0xF6 for rects, 0xF3/0xF4 for tex loads). Divide by a per-VI cycle budget (calibrated to ~400000 from an attract-demo histogram) to select N.
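So reviewers can see the shape of the Phase 3 estimator without pulling the branch, here is a reduced sketch of it. It is not the branch code. Assumptions: the display list is a flat array of 64-bit commands (G_DL branches are not followed), 0x07 counts as two triangles, rectangle area is decoded from the 10.2 fixed-point corner fields, and texture loads are charged a placeholder SK_BYTES_PER_LOAD because real byte counts depend on tile state this sketch doesn't track. All sk_ / Sk names are hypothetical, and flooring the division is also an assumption.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t w0, w1; } SkGfx; /* one 64-bit GBI command */

#define SK_COST_PER_TRI   75u     /* from the issue: tris * 75 */
#define SK_BYTES_PER_LOAD 2048u   /* placeholder, NOT decoded from the DL */
#define SK_VI_BUDGET      400000u /* calibrated per-VI budget from the issue */

/* Area of a fill/texture rectangle from its 10.2 fixed-point corners. */
static uint32_t sk_rect_pixels(SkGfx cmd)
{
    uint32_t lrx = (cmd.w0 >> 12) & 0xFFF, lry = cmd.w0 & 0xFFF;
    uint32_t ulx = (cmd.w1 >> 12) & 0xFFF, uly = cmd.w1 & 0xFFF;

    if (lrx <= ulx || lry <= uly) {
        return 0;
    }
    return ((lrx - ulx) >> 2) * ((lry - uly) >> 2);
}

/* cost = tris * 75 + rect_px + load_bytes, stopping at G_ENDDL (0xDF). */
uint64_t sk_dl_cost(const SkGfx *dl, size_t max_cmds)
{
    uint64_t tris = 0, rect_px = 0, load_bytes = 0;

    for (size_t i = 0; i < max_cmds; i++) {
        uint8_t op = (uint8_t)(dl[i].w0 >> 24);

        if (op == 0xDF) {
            break;
        }
        switch (op) {
        case 0x05: tris += 1; break;                  /* one triangle */
        case 0x06: case 0x07: tris += 2; break;       /* two triangles */
        case 0xE4: case 0xF6:                         /* tex / fill rect */
            rect_px += sk_rect_pixels(dl[i]);
            break;
        case 0xF3: case 0xF4:                         /* texture loads */
            load_bytes += SK_BYTES_PER_LOAD;
            break;
        default: break;                               /* everything else is free */
        }
    }
    return tris * SK_COST_PER_TRI + rect_px + load_bytes;
}

/* N = number of VI periods to defer the SP/DP completion posts. */
unsigned sk_deferral_vi(uint64_t cost)
{
    return (unsigned)(cost / SK_VI_BUDGET);
}
```

Even in this reduced form the weakness is visible: the estimate is a pure function of command counts, so overdraw, texture cache behaviour, and memory contention never enter into it.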

With Phase 3 enabled and the calibrated budget, two freezes appear per attract run (in the Link and Samus intros), and both land roughly on the climax moments. But we are not confident the approach itself is sound:

Where we're uncertain — and what we're asking for

We honestly don't know if our approach is right. Specific uncertainties:

  1. Is the cost model the right shape at all? tris × 75 + rect_px + load_bytes is a guess. We don't have an authoritative F3DEX2 cycle-cost reference — those constants came from "what made the histogram look right", not from RDP datasheet timings. If anyone has worked out a more grounded model (or knows of one in another port: SoH, Starship, SpaghettiKart, MK64 port, etc.) we'd love a pointer.

  2. Is "per-DL cost summed against a per-VI budget" the right granularity? Real RDP is more like a continuous rasterizer pipeline; a DL with many small tris and a DL with a few overdraw-heavy tris can have very different real costs. We may be approximating something we should be modeling cycle-by-cycle.

  3. Should we be doing this at all in port glue, vs. in libultraship? Fast3D in libultraship already simulates the GBI command stream — would a cycle estimator belong inside Fast3D's frame-finalization, exposed as a callback the port can subscribe to? That seems more reusable for other libultraship-based ports.

  4. Is there a fundamentally different approach we're not considering? Thinking specifically of:

    • Trace-based replay: capture the GBI stream from a real-hardware run via an EverDrive or an accurate emulator, store per-frame timing offsets, and replay them as deferral durations on the port. Authentic, but it only works for content that has been captured (it loses generality for character/stage combinations not in the trace).
    • Accepted-loss path: just don't have the freezes; they're a real-N64 quirk that modern hardware doesn't naturally exhibit, and trying to fake them produces a worse experience than skipping them.
  5. How do other LLE / accurate ports handle this? Has anyone solved the climax-freeze problem in a way that's been merged to a shipped port? If you've run into this and decided not to bother — that's also a useful data point.
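To make the trace-based replay option in item 4 more concrete, this is roughly what the port-side lookup could look like. It is only a sketch: the table layout, the scene_id / frame key, and every name here are hypothetical, and nothing like this exists on the branch.

```c
#include <stddef.h>
#include <stdint.h>

/* One captured data point: on this frame of this scene, the reference run
 * held the framebuffer for extra_vi additional retraces. */
typedef struct {
    uint32_t scene_id; /* hypothetical identifier for the intro/scene */
    uint32_t frame;    /* frame index within that scene */
    uint8_t extra_vi;  /* extra VI periods observed in the capture */
} SketchTraceEntry;

/* Return the captured deferral for (scene_id, frame), or 0 when the
 * combination was never captured. */
unsigned sketch_trace_extra_vi(const SketchTraceEntry *trace, size_t count,
                               uint32_t scene_id, uint32_t frame)
{
    for (size_t i = 0; i < count; i++) {
        if (trace[i].scene_id == scene_id && trace[i].frame == frame) {
            return trace[i].extra_vi;
        }
    }
    return 0; /* not in the trace: play straight through */
}
```

The fallback of 0 is exactly the generality problem mentioned above: any scene or frame missing from the capture silently behaves like the current PC port.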

Reference docs

In-tree:

Not-yet-merged design doc on the branch: docs/freeze_frame_rcp_clock_design_2026-04-26.md.

What we'd act on

If you have:

  • An authoritative F3DEX2 / RDP cycle-cost reference,
  • A pointer to another N64 port that solved this,
  • A sketch of "you should approach this differently",
  • Or just experience saying "we tried this and it made things worse" —

…please leave a comment. Even a "not worth fixing" reply with the reasoning would help us close this out one way or the other.
