Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VideoCommon: Update EFB peek cache on draw done and tokens #11098

Merged
merged 1 commit into from Oct 5, 2022

Conversation

K0bin
Copy link
Contributor

@K0bin K0bin commented Sep 26, 2022

I assume DrawDone and or Tokens are used to synchronize with the GPU before reading back data from the EFB.
So instead of just invalidating the EFB peek cache, I take it as a hint and queue an update to all previously used EFB tiles.
If there is work between DrawDone/Token and the actual EFB read, the read hits a prepopulated cache and doesn't stall both threads and the GPU.

It helps Super Mario Galaxy, getting the game up to 60 FPS on my Snapdragon 865 phone (instead of ~40) in some scenes. Unfortunately Galaxy also likes to invalidate the peek cache after a DrawDone/Token and even if I enable DeferEFBInvalidations, the GPU still doesn't finish in time.
This is the sort of change that probably needs a lot more testing as it could impact other games. Perhabs it should be a config option?

I'm looking forward to some feedback.

@K0bin K0bin marked this pull request as draft September 27, 2022 00:04
@Pokechu22
Copy link
Contributor

Regarding the cache misses in SMG you mentioned on IRC: At least for the title screen, they are still using tokens, but they don't wait to draw after.

Here's my rough understanding of how it works.

These addresses are from the US version of the game.

  • FUN_8033ee8c is the main function that is responsible for this (and a bunch of other stuff?)
  • At 8033efe8, that function calls FUN_8029fbf0, which calls FUN_8029e5c4, which calls GXSetDrawSync (at 804b9ec8 - signature DB doesn't find this), which sends the token.
  • FUN_8033ee8c then calls FUN_80344350 which does something but doesn't seem to do any graphics stuff, and then calls GXSetDither (at 8033eff4), and then a bunch of other graphics functions.
  • FUN_8033ee8c finally calls FUN_803ca76c. This does a bunch more graphics stuff, including calling GXSetZTexture (at 804bd53c, from 803ca9f0) and drawing over the entire screen (at 803caa48). The observed behavior here is clearing the depth buffer (using depth from a 2x2 texture that is all white), while leaving the color buffer unchanged.
  • Separately from all of this, eventually the token command is processed. This passes from GXTokenInterruptHandler at 804ba2d4, to a thread at 80397880 (FUN_803977f4), to FUN_8029e614, to GXPeekZ (at 804ba228). I think FUN_803977f4 uses the cursor's current position, but the title screen doesn't have the pointer enabled, so it always uses (0, 0); I'm not 100% sure of this.

Note that this is for the title screen only, not for the file select menu (which has multiple peeks per frame, 2 of which are at (0, 0) and one of which follows the cursor(?)).

Using breakpoints, I can confirm that the GXPeekZ call happens after the vertex data is written; there isn't any kind of synchronization to force it to happen beforehand, so theoretically a race condition could happen. But I think in practice, it should be fine.

Try checking if the same cache misses happen in single core - if single core is fine, try refreshing the peek cache only when the GPU thread processes the token/draw done command (although I think that should already be the case, since BPWritten is the GPU thread while LoadBPRegPreprocess is the CPU thread, and the invalidation should happen by the g_framebuffer_manager->FlagPeekCacheAsOutOfDate() line in VertexManagerBase::Flush, which should be GPU-thread only?)

@K0bin
Copy link
Contributor Author

K0bin commented Sep 29, 2022

try refreshing the peek cache only when the GPU thread processes the token/draw done command (although I think that should already be the case, since BPWritten is the GPU thread while LoadBPRegPreprocess is the CPU thread, and the invalidation should happen by the g_framebuffer_manager->FlagPeekCacheAsOutOfDate() line in VertexManagerBase::Flush, which should be GPU-thread only?)

Exactly, it's already running on the GPU thread. Anything else would be pretty bad because none of this is thread safe.
I really didn't expect the game itself to be racy like that. I guess this stuff is more common on (old) console games where you have a fixed target and can rely to some degree on timings.

My current plan is to clean this up and actually invalidate unused tiles at some point. After that it should work great for Mario Galaxy when combined with "Defer EFB invalidations".

@K0bin K0bin force-pushed the refresh-efb-cache branch 5 times, most recently from 4ce37dd to 147b334 Compare September 29, 2022 22:02
@K0bin K0bin marked this pull request as ready for review September 29, 2022 22:02
@JMC47
Copy link
Contributor

JMC47 commented Sep 29, 2022

I assume this uses the option in advanced, and not defer EFB copies to RAM in the hacks section. Either way, I didn't notice much of a different on my NVIDIA graphics card.

@K0bin
Copy link
Contributor Author

K0bin commented Sep 29, 2022

Either way, I didn't notice much of a different on my NVIDIA graphics card.

I think desktop GPUs are simply too fast for this to make a difference. When I profiled Mario Galaxy on my PC, WaitForCommandBuffer didn't show up in the results either but it takes up between 10-25% of the time of the Video thread on my phone. The PR massively reduces this to the point where the biggest performance issue is just thread synchronization between the emulator thread and the video thread.

@JMC47
Copy link
Contributor

JMC47 commented Sep 29, 2022

I tried Super Mario Galaxy on my phone, and didn't see a big performance difference there either, but I didn't have too scientific of a test.

Note, Pixel 3a which may be GPU limited elsewhere?

@golivax
Copy link

golivax commented Sep 30, 2022

I quickly tried it on a Galaxy S10 (SD855). My non-scientific results are as follows:

5.0-17403: On that initial part of the game where you have to chase the 3 bunnies, I got some fluctuation between 50-60fps with some major hiccups here and there that brought it down to like 40fps or so (did not seem to be shader compilation, since these kept happening). Overall, FPS seemed very dependent on the exact scene that was rendering.

This PR: On this build, I got a stable 60fps throughout on that same bunny chasing part, which is great! I kept playing the game for like 20 minutes and got 60fps in like 95% of the time.

Both tests with OpenGL, "Synchronize GPU Thread" set to Never and real wiimotes with a Dolphin Bar

@brujo5
Copy link

brujo5 commented Oct 1, 2022

Same here Tested on my poco X3 pro.I have not had any regression, only stability and FPS increase.

@dvessel
Copy link
Contributor

dvessel commented Oct 1, 2022

Running on a M1 Mac (ARM), Monterey 12.6.

Crashes with a jit error in F-Zero but not all the time. I suspect it’s when EFB CPU access is used. Run Sand Ocean and it’ll crash all the time. Character select screen isn’t drawn properly as well.

GFZE01_2022-10-01_09-28-51

macos crash report

Console output and backtrace:

50:44:097 Core/PowerPC/JitArm64/Jit.cpp:106 E[JIT]: Exception handler - Unhandled fault
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:33 E[JIT]: R0: 0x0000000000000000	R1: 0x0000000000000000
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:33 E[JIT]: R2: 0x0000000000000100	R3: 0x0000000000000000
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:33 E[JIT]: R4: 0x0000000000000000	R5: 0x00000000000000a0
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:33 E[JIT]: R6: 0x0000000000000000	R7: 0x0000000000000000
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:33 E[JIT]: R8: 0x0000000000000001	R9: 0x0000000000000000
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:33 E[JIT]: R10: 0x0000000000000002	R11: 0x020000020fc1f981
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:33 E[JIT]: R12: 0x0000000000000002	R13: 0x0000000000000000
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:33 E[JIT]: R14: 0x0000000000000000	R15: 0x0000000000000000
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:33 E[JIT]: R16: 0x00000001b64209b0	R17: 0x0000000210bf45a8
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:33 E[JIT]: R18: 0x0000000000000000	R19: 0x000060000274d320
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:33 E[JIT]: R20: 0x0000600000f064f0	R21: 0x0000000000000280
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:33 E[JIT]: R22: 0x0000000080258c20	R23: 0x00000000803e1b28
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:33 E[JIT]: R24: 0x00000000c8000000	R25: 0x0000000000000000
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:33 E[JIT]: R26: 0x00000000c8400000	R27: 0x00000000c8000000
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:33 E[JIT]: R28: 0x0000000480000000	R29: 0x0000000169153d00
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:36 E[JIT]: R30: 0x0000000100fbcd14	SP: 0x0000000169153c50
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:38 E[JIT]: Access Address: 0x0000000000000000
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:39 E[JIT]: PC: 0x0000000100fbcd18
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:41 E[JIT]: Memory Around PC
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:51 E[JIT]: 0x0000000100fbccf8: 3900e509 f85c03a8 3900e11f f85c03a8
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:51 E[JIT]: 0x0000000100fbcd08: 91008100 b85d03a1 97fffa9c 52800028
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:51 E[JIT]: 0x0000000100fbcd18: 39000008 a94b7bfd a94a4ff4 a94957f6
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:51 E[JIT]: 0x0000000100fbcd28: 910303ff d65f03c0 d10083ff a9017bfd
50:44:098 Core/PowerPC/JitArm64/JitArm64_BackPatch.cpp:54 E[JIT]: Full block: 09e50039a8035cf81fe10039a8035cf800810091a1035db89cfaff972800805208000039fd7b4ba9f44f4aa9f65749a9ff030391c0035fd6ff8300d1fd7b01a9
Process 12818 stopped
* thread #22, name = 'CPU-GPU thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000100fbcd18 Dolphin`FramebufferManager::PopulateEFBCache(bool, unsigned int, bool) + 1012
Dolphin`FramebufferManager::PopulateEFBCache:
->  0x100fbcd18 <+1012>: strb   w8, [x0]
    0x100fbcd1c <+1016>: ldp    x29, x30, [sp, #0xb0]
    0x100fbcd20 <+1020>: ldp    x20, x19, [sp, #0xa0]
    0x100fbcd24 <+1024>: ldp    x22, x21, [sp, #0x90]
Target 0: (Dolphin) stopped.
(lldb) bt
* thread #22, name = 'CPU-GPU thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000100fbcd18 Dolphin`FramebufferManager::PopulateEFBCache(bool, unsigned int, bool) + 1012
    frame #1: 0x0000000100fbcdcc Dolphin`FramebufferManager::PeekEFBDepth(unsigned int, unsigned int) + 116
    frame #2: 0x0000000101061e94 Dolphin`Renderer::AccessEFB(EFBAccessType, unsigned int, unsigned int, unsigned int) + 380
    frame #3: 0x0000000100f234a0 Dolphin`AsyncRequests::HandleEvent(AsyncRequests::Event const&) + 404
    frame #4: 0x0000000100f2484c Dolphin`AsyncRequests::PushEvent(AsyncRequests::Event const&, bool) + 64
    frame #5: 0x0000000101153ac8 Dolphin`VideoBackendBase::Video_AccessEFB(EFBAccessType, unsigned int, unsigned int, unsigned int) + 264
    frame #6: 0x0000000100ac60f0 Dolphin`PowerPC::EFB_Read(unsigned int) + 144
    frame #7: 0x0000000100ac6fe8 Dolphin`unsigned int PowerPC::ReadFromHardware<(PowerPC::XCheckTLBFlag)1, unsigned int, false>(unsigned int) + 268
    frame #8: 0x0000000100ac6eb4 Dolphin`PowerPC::Read_U32(unsigned int) + 24
    frame #9: 0x000000017860f710

@JosJuice
Copy link
Member

JosJuice commented Oct 1, 2022

I think this invalid memory access actually happened in code that doesn't belong to the JIT but happens to be running on the same thread as the JIT. You should get a proper stack trace if you disable fastmem. (If you have the debugger enabled, you'll find this option at JIT > Disable Fastmem.) Enabling dual core may also work depending on where in the code the error is happening.

EDIT: Ah, I didn't notice that there actually was a proper stack trace at the end of what you pasted. Ignore my instructions then :)

@dvessel
Copy link
Contributor

dvessel commented Oct 1, 2022

@JosJuice I just updated my comment with a backtrace. Not sure how to disable fastmem from lldb so not sure if the trace is good.

@JosJuice
Copy link
Member

JosJuice commented Oct 1, 2022

It looks correct.

@K0bin
Copy link
Contributor Author

K0bin commented Oct 1, 2022

Are you sure this only happens with the PR? It crashes in the actual EFB peek code that I hardly changed.

@dvessel
Copy link
Contributor

dvessel commented Oct 1, 2022

It’s definitely related to this PR. I’m checking other games to see if it crashes but it’s only F-Zero ATM. Have a copy of F-Zero to test? Here are my settings in case that has something to do with it.

Screen Shot 2022-10-01 at 10 49 47 AM

@K0bin
Copy link
Contributor Author

K0bin commented Oct 1, 2022

image

The character portraits are black with master too for me. Pretty sure this is a regression thats unrelated to my PR.

@dvessel
Copy link
Contributor

dvessel commented Oct 1, 2022

Oh what? I just checked master and the select screen is broke but Sand Ocean doesn’t crash. It does with this PR. SW Rogue Leader doesn’t start in Master too.

I missed that select screen because I run a script to test performance and it runs the Sand Ocean intro. Sorry about that.

@K0bin
Copy link
Contributor Author

K0bin commented Oct 1, 2022

Okay, I think I know why it crashes.

@K0bin
Copy link
Contributor Author

K0bin commented Oct 1, 2022

The crash should be fixed now. Thanks for testing.

@dvessel
Copy link
Contributor

dvessel commented Oct 1, 2022

Still crashing, same level. The other issue is related to #11094.

@K0bin
Copy link
Contributor Author

K0bin commented Oct 1, 2022

The other issue is related to #11094.

Unlikely considering that I can reproduce it on x86.

@K0bin
Copy link
Contributor Author

K0bin commented Oct 1, 2022

Doesn't crash for me anymore. Make sure you actually checked out the updated code.

@dvessel
Copy link
Contributor

dvessel commented Oct 1, 2022

Doesn't crash for me anymore. Make sure you actually checked out the updated code.

Crashes from checking out and building myself. Same thing after downloading the build from here.

Another backtrace:

* thread #22, name = 'CPU-GPU thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1)
    frame #0: 0x0000000100fbcd64 Dolphin`FramebufferManager::PeekEFBDepth(unsigned int, unsigned int) + 132
Dolphin`FramebufferManager::PeekEFBDepth:
->  0x100fbcd64 <+132>: ldrb   w8, [x0, #0x1]
    0x100fbcd68 <+136>: orr    w8, w8, #0x1
    0x100fbcd6c <+140>: strb   w8, [x0, #0x1]
    0x100fbcd70 <+144>: ldrb   w8, [x19, #0x11a]
Target 0: (Dolphin) stopped.
(lldb) bt
* thread #22, name = 'CPU-GPU thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x1)
  * frame #0: 0x0000000100fbcd64 Dolphin`FramebufferManager::PeekEFBDepth(unsigned int, unsigned int) + 132
    frame #1: 0x0000000101061e94 Dolphin`Renderer::AccessEFB(EFBAccessType, unsigned int, unsigned int, unsigned int) + 380
    frame #2: 0x0000000100f2341c Dolphin`AsyncRequests::HandleEvent(AsyncRequests::Event const&) + 404
    frame #3: 0x0000000100f247c8 Dolphin`AsyncRequests::PushEvent(AsyncRequests::Event const&, bool) + 64
    frame #4: 0x0000000101153ac8 Dolphin`VideoBackendBase::Video_AccessEFB(EFBAccessType, unsigned int, unsigned int, unsigned int) + 264
    frame #5: 0x0000000100ac606c Dolphin`PowerPC::EFB_Read(unsigned int) + 144
    frame #6: 0x0000000100ac6f64 Dolphin`unsigned int PowerPC::ReadFromHardware<(PowerPC::XCheckTLBFlag)1, unsigned int, false>(unsigned int) + 268
    frame #7: 0x0000000100ac6e30 Dolphin`PowerPC::Read_U32(unsigned int) + 24
    frame #8: 0x0000000282ac48d4

@K0bin
Copy link
Contributor Author

K0bin commented Oct 1, 2022

Different crash, got it.

Massively improves performance in Mario Galaxy on Android.
@K0bin
Copy link
Contributor Author

K0bin commented Oct 1, 2022

Fixed.

@dvessel
Copy link
Contributor

dvessel commented Oct 1, 2022

Great, that fixed it!

Copy link
Contributor

@Pokechu22 Pokechu22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JMC wants this merged so that it can be unleashed on users for testing/bug finding. The code looks good enough to me for that.

I think later on, we'll need to revisit the existing UI for "skip EFB access from CPU" and "defer EFB cache invalidation", and also how this behaves with "defer EFB cache invalidation" enabled versus disabled. But for now, this is good enough.

@JMC47 JMC47 merged commit 4b6086b into dolphin-emu:master Oct 5, 2022
11 checks passed
@K0bin K0bin deleted the refresh-efb-cache branch October 8, 2022 14:06
@K0bin K0bin restored the refresh-efb-cache branch October 16, 2022 18:01
@K0bin K0bin deleted the refresh-efb-cache branch October 29, 2022 00:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
8 participants