Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fifo: Run/sync with the GPU on command processor register access #7214

Merged
merged 1 commit into from Apr 18, 2021

Conversation

stenzek
Copy link
Contributor

@stenzek stenzek commented Jul 5, 2018

In Dolphin, we don't emulate the CPU and GPU in lock-step, for performance reasons. The CPU drives events, with the GPU emulation being one of these events which is fired periodically. This is the behavior in single core mode. Dual-core mode is completely non-deterministic here, and runs whenever Fifo::RunGpu() is called. Thus, I am mainly concerned about single-core mode.

Due to the GPU only being run every 1000 cycles or so, to a game, if it polled the command processor registers (for example, the read pointer, or distance), the GPU would appear to be stalled. Testing would suggest this is what causes FIFO resets/unknown opcodes in games such as F-Zero GX, and Rogue Squadron 3.

To work around this, each time these registers are accessed, we run the GPU for its time quantum. This way, to a game, the GPU is making progress (as it would on the console). While this isn't necessarily accurate to the hardware in terms of cycles executed, we don't really emulate GPU timings anyway, so executing a few extra GPU cycles doesn't really have any impact.

In dual core, it syncs with the GPU thread, and ensures that the GPU thread isn't too far behind. Again, this is not deterministic, but dual core isn't to begin with, and has numerous stability issues as a result.

Long-term, I'm planning on refactoring the FIFO in a way which resembles the current state of deterministic dual core - read from memory on the CPU thread, but process the commands on the GPU thread. This will lead to determinism for command processor behavior (e.g. FIFO breakpoints, hi/low watermark interrupts), except EFB copies to RAM, and BP token interrupts.

However, this is a much more significant task, with far larger chances of regressions, so I pulled this change out to fix RS3, in the meantime, at least. And perhaps new motivation to make full MMU faster ;).

@@ -490,6 +490,21 @@ static int RunGpuOnCpu(int ticks)
return -available_ticks + GPU_TIME_SLOT_SIZE;
}

void SyncGPUForRegisterAccess()
{
if (SConfig::GetInstance().bCPUThread && !s_use_deterministic_gpu_thread)

This comment was marked as off-topic.

@JMC47
Copy link
Contributor

JMC47 commented Jul 5, 2018

This gets Rogue Squadron III in-game again on Single Core only.

F-Zero GX no longer throws Unknown Opcodes all the time at boot on Single Core.

Gladius now reaches in-game (SyncGPU Dualcore only) but seems to randomly crash.

Planet 51 and Datel Discs (confirmed fixed in the bigger change mentioned in the text) remain broken with this subset of changes.

@degasus
Copy link
Member

degasus commented Jul 5, 2018

I honestly doubt that those 1000 cycles trigger a GPU reset. IIRC those GPU resets aren't because of executing the GPU only every 1000 cycles, they appeared because we've started to only execute the GPU if we have more ticks available. I think we need much faster feedback (in terms of eg interrupts), but we don't need it to execute the command buffer that fast.

else
{
// Single core - run the GPU on the CPU thread.
RunGpuOnCpu(GPU_TIME_SLOT_SIZE);

This comment was marked as off-topic.

@stenzek
Copy link
Contributor Author

stenzek commented Jul 5, 2018

@degasus I'm thinking it's more because the game is polling the CP registers (e.g. distance) and it appears to be stalled, than the GPU being too slow/fast. But I haven't checked the disassembly of said games to say for sure.

{
// Dual core - kick the GPU, wait for completion.
RunGpu();
s_gpu_mainloop.Wait();

This comment was marked as off-topic.

@stenzek stenzek force-pushed the cp-access-sync branch 2 times, most recently from 3d80308 to 944c53d Compare July 13, 2018 13:43
@stenzek
Copy link
Contributor Author

stenzek commented Jul 13, 2018

As noted on IRC, RunGpuOnCpu(0) still breaks RS3. I've dropped RunGpu, however.

@stenzek
Copy link
Contributor Author

stenzek commented Aug 29, 2018

Closing for now, as I've been working on a larger rewrite to Dolphin's fifo implementation.

@JMC47
Copy link
Contributor

JMC47 commented Jan 4, 2019

This allows Rogue Squadron III to reach in-game still (it did randomly hang on me once while loading a level, but not since.)

F-Zero GX's bootup unknown opcode on single core is gone as well. That appears to be the only noticeable changes.

@legoj15
Copy link

legoj15 commented Nov 8, 2019

What's the status on this PR?

@ghost
Copy link

ghost commented Mar 27, 2021

Gets Rogue Squadron III working on Arm Macs with single core also. Hopefully Stenzek gets a chance to come back to it at some point.

@stenzek stenzek force-pushed the cp-access-sync branch 2 times, most recently from d3d5612 to 6ea77d2 Compare April 11, 2021 07:32
@JMC47
Copy link
Contributor

JMC47 commented Apr 11, 2021

In the games I tested, which are ones affected by potential unknown opcodes/hangs in single core, games are roughly 5 - 7% lower

F-Zero GX Master - 205 fps
F-Zero GX PR7214 - 197 FPS

Star Wars: Rogue Squadron 2 Master - 51 FPS
Star Wars: Rogue Squadron 2 PR7214 - 50 FPS

Star Wars : Rogue Squadron 3 Master - DED
Star Wars: Rogue Squadron 3 PR7214 - 28 FPS in most levels

Metroid Prime 3 Master - 62 FPS
Metroid Prime 3 PR7214 - 61 FPS

These weren't all that different. Dualcore was unaffected by the performance change. How annoyed would people get if we made Single Core a bit slower?

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@MayImilae
Copy link
Contributor

MayImilae commented Apr 15, 2021

Tested Star Fox Adventures in Single Core for 3 hours. No crashes or problems of any sort.

Also, 3 hours is very, very long play session for Star Fox Adventures.

@JMC47
Copy link
Contributor

JMC47 commented Apr 15, 2021

Note that this does fix the unknown opcode at startup in some games (like F-Zero GX) but not others, like Metroid Prime. There's still some more issues to solve in the future, but I do believe this is a step in the right direction. I do think that if someone is using single core, they'd want a stable experience over a performant one.

@JMC47
Copy link
Contributor

JMC47 commented Apr 15, 2021

This does not fix Littlest Petshop Europe, and other European games that crash on boot for unknown reasons. A more complete timing thing I have from 2018 does fix it though.

@ghost
Copy link

ghost commented Apr 15, 2021

Tested Star Fox Adventures in Single Core for 3 hours. No crashes or problems of any sort.

Also, 3 hours is very, very long play session for Star Fox Adventures.

I couldn't play more than 15 minutes of the game before I was bored. I honestly can not imagine playing for 3 hours.

Note that this does fix the unknown opcode at startup in some games (like F-Zero GX) but not others, like Metroid Prime. There's still some more issues to solve in the future, but I do believe this is a step in the right direction. I do think that if someone is using single core, they'd want a stable experience over a performant one.
This does not fix Littlest Petshop Europe, and other European games that crash on boot for unknown reasons. A more complete timing thing I have from 2018 does fix it though.

That really does make sense, since Stenzek did say they pulled this out of their main project to fix RS3 (and likely F-Zero GX) - not necessarily anything else.

@JMC47
Copy link
Contributor

JMC47 commented Apr 15, 2021

I know, but I thought I'd clarify as I confused myself. There are games like Metroid Prime that I thought this fixed, but in the end it was the more complete and slower one that fixed it.

}),
IsOnThread() ? MMIO::ComplexWrite<u16>([](u32, u16 val) {
Fifo::SyncGPUForRegisterAccess();
WriteHigh(fifo.CPReadPointer, val);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for only seeing it now, but it seems that the & WMASK_HI_RESTRICT was accidentally removed here?

@JMC47
Copy link
Contributor

JMC47 commented Apr 17, 2021

Latest changes doesn't change behavior, RS3 still working, F-Zero GX still working.

@dolphin-emu-bot
Copy link
Contributor

FifoCI detected that this change impacts graphical rendering. Here are the behavior differences detected by the system:

  • custom-brawl-char on ogl-lin-mesa: diff
  • fifa-street on ogl-lin-mesa: diff
  • inverted-depth-range on ogl-lin-mesa: diff
  • nfsu-reflections on ogl-lin-mesa: diff
  • rs2-skybox on ogl-lin-mesa: diff
  • custom-brawl-char on sw-lin-mesa: diff
  • fifa-street on sw-lin-mesa: diff
  • inverted-depth-range on sw-lin-mesa: diff
  • nfsu-reflections on sw-lin-mesa: diff
  • custom-brawl-char on ogl-lin-radeon: diff
  • fifa-street on ogl-lin-radeon: diff
  • inverted-depth-range on ogl-lin-radeon: diff
  • nfsu-reflections on ogl-lin-radeon: diff
  • rs2-skybox on ogl-lin-radeon: diff

automated-fifoci-reporter

@JMC47
Copy link
Contributor

JMC47 commented Apr 18, 2021

I went through a gauntlet of the most feared games of Dolphin on this Pull Request in order to see if there were any potential GPU regressions.

Compatibility

  • Star Fox Adventures -> Left on overnight, no fifo error. May also tested this for 3 hours with real gameplay
  • Xenoblade Chronicles -> Went through the cutscene gauntlet that used to throw a unknown opcode. Was unable to reproduce any issues.
  • The Last Story -> Is a pain in the ass in general. Didn't see any problems and performance wasn't affected either.
  • Battalion Wars 2 -> One of the most well known FifoBP sensitive games in Dolphin. Played through 3 missions with no crashes.
  • Star Wars Rogue Squadron 2 -> Factor 5 game. It mainly crashed when using the targeting computer in Master. That is fixed in this pull request as far as I can tell.
  • Star Wars Rogue Squadron 3 -> Is running and stable in single core. Lag is not emulated, resulting in higher "potential" framerates than on console. Requires real GPU timings.
  • Nascar Inside Line -> One of the most demanding Wii games. No performance change with this PR even in single core.
  • Metroid Prime 3 -> Seems identical to master, which is a good thing.
  • Metroid Prime 1 -> No change from master. Unfortunately still has an unknown opcode at boot.
  • F-Zero GX -> Unknown opcode at boot fixed. Is a bit slower than master but not significantly so. That slowdown actually goes away the more that's going on, so it's really only noticeable in practice with no other cars on the road.
  • Super Mario Sunshine -> No real difference that I could tell. Performance looked identical.
  • Datel Stuff -> Still broken. Not really surprising. It needed a much more aggressive version of this change
  • Resident Evil PAL -> Still broken.
  • Batman Begins -> Still runs incorrect speed in spots
  • Fire Emblem: Radiant Dawn -> Still behaves the same as master. No Fifo errors though.
  • Gladius -> Better than on Master, but still crashes every once in a while. Can be avoided with setting the Emulated CPU Clock to 60% or lower.
  • Star Wars: The Clone Wars -> No crashes observed.
  • Super Mario Galaxy 1/2 -> Nothing really expected, nothing really changed.

Performance

While I initially observed some performance differences that were a bit higher, in actual practice, heavier areas were affected less or not at all. So, my initial fear of performance being lower is offset by the fact it doesn't matter in areas where we really need extra performance. Most of the performance losses were in menus or areas where games weren't really doing much.

In Closing

I think we should hit the green button.

@JMC47 JMC47 merged commit 821e51c into dolphin-emu:master Apr 18, 2021
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
7 participants