VideoCommon: Update EFB peek cache on draw done and tokens #11098
Conversation
|
Regarding the cache misses in SMG you mentioned on IRC: At least for the title screen, they are still using tokens, but they don't wait for the draw afterwards. Here's my rough understanding of how it works. These addresses are from the US version of the game.
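As a rough illustration of that pattern, here is a sketch under my own assumptions, not disassembly of the game: GXSetDrawSync and GXPeekARGB stand in for the GX SDK's draw-sync token and EFB peek calls (stubbed here so the snippet is self-contained), while DrawTitleScreen, DoOtherCpuWork, the token value and the peek coordinates are made-up placeholders.

```cpp
#include <cstdint>

// Stand-ins for the GX SDK calls (assumed signatures); stubbed so this sketch
// compiles on its own. On real hardware these come from the SDK / libogc.
static void GXSetDrawSync(std::uint16_t /*token*/) {}
static void GXPeekARGB(std::uint16_t /*x*/, std::uint16_t /*y*/, std::uint32_t* color) { *color = 0; }

static void DrawTitleScreen() { /* placeholder: submit this frame's EFB draws */ }
static void DoOtherCpuWork()  { /* placeholder: the game does NOT wait for the token here */ }

static void TitleScreenFrame()
{
  DrawTitleScreen();
  GXSetDrawSync(0xBEEF);  // draw-sync token written into the command stream after the draws

  DoOtherCpuWork();

  // The CPU then peeks the EFB directly. Because nothing waited for the token,
  // the emulator's peek cache can still be stale at this point, which would
  // explain the cache misses discussed above.
  std::uint32_t color;
  GXPeekARGB(0, 0, &color);  // placeholder coordinates
}

int main() { TitleScreenFrame(); }
```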
Note that this is for the title screen only, not for the file select menu (which has multiple peeks per frame, 2 of which are at (0, 0) and one of which follows the cursor(?)). Using breakpoints, I can confirm that the |
|
Try checking if the same cache misses happen in single core - if single core is fine, try refreshing the peek cache only when the GPU thread processes the token/draw done command (although I think that should already be the case, since |
Exactly, it's already running on the GPU thread. Anything else would be pretty bad because none of this is thread safe. My current plan is to clean this up and actually invalidate unused tiles at some point. After that it should work great for Mario Galaxy when combined with "Defer EFB invalidations". |
4ce37dd
to
147b334
Compare
|
I assume this uses the option in Advanced, and not the "Defer EFB Copies to RAM" option in the Hacks section. Either way, I didn't notice much of a difference on my NVIDIA graphics card. |
I think desktop GPUs are simply too fast for this to make a difference. When I profiled Mario Galaxy on my PC, WaitForCommandBuffer didn't show up in the results either, but it takes up between 10-25% of the Video thread's time on my phone. The PR massively reduces this, to the point where the biggest remaining performance issue is just thread synchronization between the emulator thread and the video thread. |
|
I tried Super Mario Galaxy on my phone and didn't see a big performance difference there either, but my test wasn't very scientific. Note: this was a Pixel 3a, which may be GPU limited elsewhere? |
|
I quickly tried it on a Galaxy S10 (SD855). My non-scientific results are as follows:
5.0-17403: On that initial part of the game where you have to chase the 3 bunnies, I got fluctuation between 50-60 FPS with some major hiccups here and there that brought it down to around 40 FPS (it did not seem to be shader compilation, since these kept happening). Overall, FPS seemed very dependent on the exact scene being rendered.
This PR: On this build, I got a stable 60 FPS throughout on that same bunny-chasing part, which is great! I kept playing the game for about 20 minutes and got 60 FPS about 95% of the time.
Both tests were with OpenGL, "Synchronize GPU Thread" set to Never, and real Wiimotes with a Dolphin Bar. |
147b334
to
2097864
Compare
|
Same here. Tested on my Poco X3 Pro. I haven't had any regressions, only stability and FPS improvements. |
|
Running on an M1 Mac (ARM), Monterey 12.6. It crashes with a JIT error in F-Zero, but not all the time. I suspect it's when EFB CPU access is used. Run Sand Ocean and it'll crash every time. The character select screen isn't drawn properly either. Console output and backtrace: |
|
I think this invalid memory access actually happened in code that doesn't belong to the JIT but happens to be running on the same thread as the JIT. You should get a proper stack trace if you disable fastmem. (If you have the debugger enabled, you'll find this option at JIT > Disable Fastmem.) Enabling dual core may also work depending on where in the code the error is happening. EDIT: Ah, I didn't notice that there actually was a proper stack trace at the end of what you pasted. Ignore my instructions then :) |
|
@JosJuice I just updated my comment with a backtrace. Not sure how to disable fastmem from lldb so not sure if the trace is good. |
|
It looks correct. |
|
Are you sure this only happens with the PR? It crashes in the actual EFB peek code that I hardly changed. |
|
Oh what? I just checked master, and the select screen is broken there too, but Sand Ocean doesn't crash. It does with this PR. SW Rogue Leader doesn't start on master either. I missed the select screen issue because I run a script to test performance, and it only runs the Sand Ocean intro. Sorry about that. |
|
Okay, I think I know why it crashes. |
2097864
to
3027228
Compare
|
The crash should be fixed now. Thanks for testing. |
|
Still crashing, same level. The other issue is related to #11094. |
Unlikely, considering that I can reproduce it on x86. |
|
Doesn't crash for me anymore. Make sure you actually checked out the updated code. |
It still crashes when I check out and build it myself, and the same thing happens with the build downloaded from here. Another backtrace: |
|
Different crash, got it. |
Massively improves performance in Mario Galaxy on Android.
3027228
to
779fe13
Compare
|
Fixed. |
|
Great, that fixed it! |
JMC wants this merged so that it can be unleashed on users for testing/bug finding. The code looks good enough to me for that.
I think later on, we'll need to revisit the existing UI for "skip EFB access from CPU" and "defer EFB cache invalidation", and also how this behaves with "defer EFB cache invalidation" enabled versus disabled. But for now, this is good enough.



I assume DrawDone and/or tokens are used to synchronize with the GPU before reading back data from the EFB.
So instead of just invalidating the EFB peek cache, I take it as a hint and queue an update to all previously used EFB tiles.
If there is work between DrawDone/Token and the actual EFB read, the read hits a prepopulated cache and doesn't stall both threads and the GPU.
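A minimal sketch of that idea, assuming a hypothetical peek-cache class: none of the names below (EfbPeekCacheSketch, OnTokenOrDrawDone, QueueTileReadback) are Dolphin's actual API, they just illustrate the flow.

```cpp
#include <cstdint>
#include <unordered_set>

// Hedged sketch of the approach described above; all names are hypothetical
// and do not correspond to Dolphin's real classes or functions.
class EfbPeekCacheSketch
{
public:
  // Called from the GPU thread whenever a PE token or DrawDone is processed,
  // so no extra locking is needed here.
  void OnTokenOrDrawDone()
  {
    // Treat the token as a hint that the CPU will peek soon: instead of just
    // invalidating cached tiles, refresh every tile that was peeked before.
    for (const std::uint32_t tile_index : m_previously_used_tiles)
      QueueTileReadback(tile_index);  // asynchronous EFB -> CPU readback

    // When the game later performs the EFB peek, the tile is already up to
    // date, so neither the CPU thread nor the GPU has to stall on a
    // synchronous read.
  }

  // Called when the CPU peeks a pixel; the containing tile is remembered so
  // it can be refreshed on the next token/DrawDone.
  void OnCpuPeek(std::uint32_t tile_index) { m_previously_used_tiles.insert(tile_index); }

private:
  void QueueTileReadback(std::uint32_t tile_index)
  {
    // Placeholder: a real implementation would queue an EFB-to-staging copy here.
    (void)tile_index;
  }

  std::unordered_set<std::uint32_t> m_previously_used_tiles;
};
```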
It helps Super Mario Galaxy, getting the game up to 60 FPS on my Snapdragon 865 phone (instead of ~40) in some scenes. Unfortunately, Galaxy also likes to invalidate the peek cache after a DrawDone/Token, and even if I enable DeferEFBInvalidations, the GPU still doesn't finish in time.
This is the sort of change that probably needs a lot more testing, as it could impact other games. Perhaps it should be a config option?
I'm looking forward to some feedback.