Fifo: Run/sync with the GPU on command processor register access #7214
Source/Core/VideoCommon/Fifo.cpp
Outdated
| @@ -490,6 +490,21 @@ static int RunGpuOnCpu(int ticks) | |||
| return -available_ticks + GPU_TIME_SLOT_SIZE; | |||
| } | |||
|
|
|||
| void SyncGPUForRegisterAccess() | |||
| { | |||
| if (SConfig::GetInstance().bCPUThread && !s_use_deterministic_gpu_thread) | |||
|
This gets Rogue Squadron III in-game again on Single Core only. F-Zero GX no longer throws Unknown Opcodes all the time at boot on Single Core. Gladius now reaches in-game (SyncGPU Dualcore only) but seems to randomly crash. Planet 51 and Datel Discs (confirmed fixed in the bigger change mentioned in the text) remain broken with this subset of changes. |
|
I honestly doubt that those 1000 cycles trigger a GPU reset. IIRC those GPU resets aren't because of executing the GPU only every 1000 cycles, they appeared because we've started to only execute the GPU if we have more ticks available. I think we need much faster feedback (in terms of eg interrupts), but we don't need it to execute the command buffer that fast. |
Source/Core/VideoCommon/Fifo.cpp
Outdated
| else | ||
| { | ||
| // Single core - run the GPU on the CPU thread. | ||
| RunGpuOnCpu(GPU_TIME_SLOT_SIZE); |
|
@degasus I'm thinking it's more because the game is polling the CP registers (e.g. distance) and it appears to be stalled, than the GPU being too slow/fast. But I haven't checked the disassembly of said games to say for sure. |
Source/Core/VideoCommon/Fifo.cpp
Outdated
| { | ||
| // Dual core - kick the GPU, wait for completion. | ||
| RunGpu(); | ||
| s_gpu_mainloop.Wait(); |
3d80308 to 944c53d
|
As noted on IRC, RunGpuOnCpu(0) still breaks RS3. I've dropped RunGpu, however. |
|
Closing for now, as I've been working on a larger rewrite to Dolphin's fifo implementation. |
944c53d to 980cb85
|
This still allows Rogue Squadron III to reach in-game (it did randomly hang on me once while loading a level, but not since). F-Zero GX's unknown opcode at boot on single core is gone as well. Those appear to be the only noticeable changes. |
|
What's the status on this PR? |
|
Gets Rogue Squadron III working on Arm Macs with single core also. Hopefully Stenzek gets a chance to come back to it at some point. |
d3d5612 to 6ea77d2
|
In the games I tested, which are the ones affected by potential unknown opcodes/hangs in single core, performance is roughly 5 - 7% lower.
F-Zero GX: Master - 205 FPS
Star Wars: Rogue Squadron 2: Master - 51 FPS
Star Wars: Rogue Squadron 3: Master - DED
Metroid Prime 3: Master - 62 FPS
These weren't all that different. Dual core was unaffected by the performance change. How annoyed would people get if we made Single Core a bit slower? |
LGTM
|
Tested Star Fox Adventures in Single Core for 3 hours. No crashes or problems of any sort. Also, 3 hours is a very, very long play session for Star Fox Adventures. |
|
Note that this does fix the unknown opcode at startup in some games (like F-Zero GX) but not others, like Metroid Prime. There's still some more issues to solve in the future, but I do believe this is a step in the right direction. I do think that if someone is using single core, they'd want a stable experience over a performant one. |
|
This does not fix Littlest Petshop Europe, and other European games that crash on boot for unknown reasons. A more complete timing thing I have from 2018 does fix it though. |
I couldn't play more than 15 minutes of the game before I was bored. I honestly cannot imagine playing for 3 hours.
That really does make sense, since Stenzek did say they pulled this out of their main project to fix RS3 (and likely F-Zero GX) - not necessarily anything else. |
|
I know, but I thought I'd clarify as I confused myself. There are games like Metroid Prime that I thought this fixed, but in the end it was the more complete and slower one that fixed it. |
| }), | ||
| IsOnThread() ? MMIO::ComplexWrite<u16>([](u32, u16 val) { | ||
| Fifo::SyncGPUForRegisterAccess(); | ||
| WriteHigh(fifo.CPReadPointer, val); |
Sorry for only seeing it now, but it seems that the & WMASK_HI_RESTRICT was accidentally removed here?
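To illustrate why the missing mask matters, here is a minimal sketch of a high-half register write with and without it. The mask value `kAssumedHiMask` is a placeholder chosen for illustration (the real `WMASK_HI_RESTRICT` value is defined in Dolphin's source and may differ), and the `WriteHigh` helper below only mirrors the name used in the diff; its body is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Placeholder value for illustration only; not taken from Dolphin's source.
constexpr uint16_t kAssumedHiMask = 0x03FF;

// Assumed behavior: replace the upper 16 bits of a 32-bit register,
// leaving the lower 16 bits untouched.
inline void WriteHigh(uint32_t& reg, uint16_t high)
{
  reg = (reg & 0x0000FFFFu) | (static_cast<uint32_t>(high) << 16);
}

// Write with the mask applied: bits outside the mask are dropped.
inline uint32_t WriteHighMasked(uint32_t reg, uint16_t val)
{
  WriteHigh(reg, static_cast<uint16_t>(val & kAssumedHiMask));
  return reg;
}

// Write without the mask: every bit of val reaches the register.
inline uint32_t WriteHighUnmasked(uint32_t reg, uint16_t val)
{
  WriteHigh(reg, val);
  return reg;
}
```

With the mask, out-of-range high bits written by a game never reach the pointer; without it, they would be stored as-is.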
|
The latest changes don't change behavior: RS3 is still working, and F-Zero GX is still working. |
|
FifoCI detected that this change impacts graphical rendering. Here are the behavior differences detected by the system:
automated-fifoci-reporter |
|
I went through a gauntlet of the most feared games of Dolphin on this Pull Request in order to see if there were any potential GPU regressions.
Compatibility
Performance
While I initially observed some performance differences that were a bit higher, in actual practice, heavier areas were affected less or not at all. So, my initial fear of performance being lower is offset by the fact it doesn't matter in areas where we really need extra performance. Most of the performance losses were in menus or areas where games weren't really doing much.
In Closing
I think we should hit the green button. |
In Dolphin, we don't emulate the CPU and GPU in lock-step, for performance reasons. The CPU drives events, and GPU emulation is one of these events, fired periodically. This is the behavior in single core mode. Dual-core mode is completely non-deterministic here: the GPU runs whenever Fifo::RunGpu() is called. Thus, I am mainly concerned about single-core mode.
Because the GPU is only run every 1000 cycles or so, a game that polls the command processor registers (for example, the read pointer or the distance) between runs sees a GPU that appears to be stalled. Testing suggests this is what causes FIFO resets/unknown opcodes in games such as F-Zero GX and Rogue Squadron 3.
To work around this, each time these registers are accessed, we run the GPU for its time quantum. This way, to a game, the GPU is making progress (as it would on the console). While this isn't necessarily accurate to the hardware in terms of cycles executed, we don't really emulate GPU timings anyway, so executing a few extra GPU cycles doesn't really have any impact.
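The effect can be sketched with a toy model. This is not Dolphin's code: `ToyFifo`, `RunGpuFor`, `ReadDistance`, and the one-byte-per-tick cost are all assumptions made for illustration. It polls a distance register several times with no scheduled GPU event in between, once without and once with a sync on access.

```cpp
#include <cassert>
#include <cstdint>

// Toy model only; every name here is hypothetical.
constexpr int kGpuTimeSlot = 1000;  // assumed size of one GPU time quantum

struct ToyFifo
{
  uint32_t read_pointer = 0;   // advanced by the "GPU"
  uint32_t write_pointer = 0;  // advanced by the "CPU"
};

// Consume up to `ticks` bytes of pending data (one byte per tick, for simplicity).
inline void RunGpuFor(ToyFifo& fifo, int ticks)
{
  const uint32_t pending = fifo.write_pointer - fifo.read_pointer;
  const uint32_t budget = static_cast<uint32_t>(ticks);
  fifo.read_pointer += pending < budget ? pending : budget;
}

// Register read as seen by the game. With sync_on_access set, the GPU is run
// for one time quantum before the value is returned.
inline uint32_t ReadDistance(ToyFifo& fifo, bool sync_on_access)
{
  if (sync_on_access)
    RunGpuFor(fifo, kGpuTimeSlot);
  return fifo.write_pointer - fifo.read_pointer;
}

// Poll the register `polls` times between two scheduled GPU events and count
// how many polls observed the distance shrinking.
inline int CountPollsWithProgress(bool sync_on_access, int polls)
{
  ToyFifo fifo;
  fifo.write_pointer = 4096;
  uint32_t last = ReadDistance(fifo, false);
  int progressed = 0;
  for (int i = 0; i < polls; ++i)
  {
    const uint32_t now = ReadDistance(fifo, sync_on_access);
    if (now < last)
      ++progressed;
    last = now;
  }
  return progressed;
}
```

In this model, `CountPollsWithProgress(false, 3)` reports no progress at all, which is the "stalled GPU" a polling game would see, while `CountPollsWithProgress(true, 3)` sees the distance shrink on every poll.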
In dual core, it syncs with the GPU thread, and ensures that the GPU thread isn't too far behind. Again, this is not deterministic, but dual core isn't to begin with, and has numerous stability issues as a result.
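A rough sketch of the dual-core idea, using only the C++ standard library: the CPU thread blocks on a register access until a worker thread has drained its pending work. `ToyGpuThread` and its methods are hypothetical and do not reflect Dolphin's actual GPU loop; waiting for a full drain is a simplification of "not too far behind".

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Toy stand-in for a GPU worker thread; hypothetical, not Dolphin's implementation.
class ToyGpuThread
{
public:
  ToyGpuThread() : m_thread([this] { Loop(); }) {}

  ~ToyGpuThread()
  {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_stop = true;
    }
    m_cv.notify_all();
    m_thread.join();
  }

  // CPU side: queue more commands for the worker.
  void Submit(int commands)
  {
    {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_pending += commands;
    }
    m_cv.notify_all();
  }

  // CPU side: block until the worker has caught up, then return the number of
  // commands processed so far (the value a register read would then observe).
  int SyncForRegisterAccess()
  {
    std::unique_lock<std::mutex> lock(m_mutex);
    m_cv.wait(lock, [this] { return m_pending == 0; });
    return m_processed;
  }

private:
  void Loop()
  {
    std::unique_lock<std::mutex> lock(m_mutex);
    while (true)
    {
      m_cv.wait(lock, [this] { return m_stop || m_pending > 0; });
      if (m_stop)
        return;
      --m_pending;
      ++m_processed;
      if (m_pending == 0)
        m_cv.notify_all();  // wake a CPU thread waiting in SyncForRegisterAccess
    }
  }

  std::mutex m_mutex;
  std::condition_variable m_cv;
  int m_pending = 0;
  int m_processed = 0;
  bool m_stop = false;
  std::thread m_thread;  // declared last so the other members exist before it starts
};
```

When the worker runs relative to the CPU thread is still up to the scheduler, so this does not make dual core deterministic; it only bounds how stale a register read can be.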
Long-term, I'm planning on refactoring the FIFO in a way which resembles the current state of deterministic dual core - read from memory on the CPU thread, but process the commands on the GPU thread. This will lead to determinism for command processor behavior (e.g. FIFO breakpoints, hi/low watermark interrupts), except EFB copies to RAM, and BP token interrupts.
However, this is a much more significant task, with far larger chances of regressions, so I pulled this change out to fix RS3, in the meantime, at least. And perhaps new motivation to make full MMU faster ;).