New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vulkan: Buffer Mirrors for MacOS performance #4899
Conversation
Download the artifacts for this pull request: Experimental GUI (Avalonia)GUI-less (SDL2)Only for Developers
|
On an 13" M1 MBP, this works well for me. Get about ~7FPS more than the current main branch for recent game, still under 30fps but still playable. |
Thank you for posting this! It seems to stutter slightly more severely than macos1, but fixes many issues, and once the stutters level off, 30fps in TotK is very stable (on my M1 Max, at least). Appreciate it and look forward to the rest of the upstreaming. |
Thank you. When can I expect the merge to the main channel release?
Téléchargez Outlook pour iOS<https://aka.ms/o0ukef>
…________________________________
De : jcm ***@***.***>
Envoyé : Friday, May 12, 2023 1:03:38 PM
À : Ryujinx/Ryujinx ***@***.***>
Cc : Lucas Gao ***@***.***>; Manual ***@***.***>
Objet : Re: [Ryujinx/Ryujinx] Vulkan: Buffer Mirrors for MacOS performance (PR #4899)
Thank you for posting this! It seems to stutter slightly more severely than macos1, but fixes many issues, and once the stutters level off, 30fps is very stable. Appreciate it and look forward to the rest of the upstreaming.
—
Reply to this email directly, view it on GitHub<#4899 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/AC5XQ7HW7GJE35HSTEK4YB3XFZUOVANCNFSM6AAAAAAX63JWLI>.
You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>
|
Got one crash using this PR in ToTK, could not isolate what caused it to crash, but here is the log, hope it helps. |
This PR should probably be added to #4062? |
Tried for a few minutes and it seems to run a tiny bit smoother on M1 Pro. The depths are still about 20fps though and the ground glitches a little in that area. Will keep testing and report anything interesting :) |
c82b92e
to
ef5d45a
Compare
ef5d45a
to
aea35f4
Compare
Thank you for your efforts for Mac players, it has quite a significant efficiency boost on my M1 Pro, I wonder if you can fix the white spot issue that has been present on Mac, thanks again |
The oldest Version is the fastest for me in TOTK. 10-15fps more then the newest. |
So I got rid of my shader cache (I had been using the same for like three weeks), and I can now run the game at a relatively stable 30fps pretty much everywhere, which finally makes the depths playable! 🎉 Apart from all the glitches, this is starting to be a pretty good experience! |
Idk if this will help anyone or not, but i wanted to share my experience with this game on Ryujinx so far anyway. Hardware: M1 MacBook Air (the binned 7core gpu version) I plug it into my display and have an old heatsink just sitting in the right spot on the bottom of the laptop to keep it from thermal throttling, which works surprisingly well without a fan even: Been playing Zelda TOTK on original 1.0 version of ryujinx. I am also using "fps++30 mod" and the "VisualFixes 1008p_FXAA_OFF_AO_FIX" mod. Ran into major graphics issues in "the depths" part of the game. The floor was missing, weird angled walls that blocked any sort of view, etc. Found developer build 1.1.805 and was able to get through the depth level with out graphics issues. Updated to 1.1.808 and it seems that overall performance is reduced to around 17-23 fps. I stumbled across this thread, and tried this #4899 or Ryujinx 1.1.0+87139ac version.... and man, its working great. Solid 30fps with little to no drop and no glitches. Idk who is responsible for this build version but thank you x10. Running beautifully and smoothly. |
@coyote0770 just out of curiosity, do you have this issue? #4949 |
The white glowing stuff, yes...sometimes...The hair thing wiggin out in the menus, yes, unless Link is wearing a headband for whatever reason. The shadows of Link on the ground are messed up like that still, yes, I never even noticed it until recently when my gf pointed it out. I did have the chuchus glitch out weird on previous ryujinx versions but it seems okay now. Most of these things don't bother me as it doesn't affect gameplay. Im just happy to have it running and running smoothly on my base-tier mac :) |
46dfadd
to
7c7b055
Compare
Looks like the latest rebase has significant issues with culling and actor updates. |
GPUs don't work in "culling and actor updates", can you be more specific? |
Sure, pardon the discord links, but here are some clips from the latest CI builds (7d970aa here): https://cdn.discordapp.com/attachments/1045857828360949830/1110278475861987368/totk.mp4 https://cdn.discordapp.com/attachments/1045857828360949830/1110279509464338523/totk2.mp4 Neither of these issues were present in prior CI builds from this thread. |
7c7b055
to
5974e86
Compare
test in Fire Emblem: Three Houses. have one strange rendering issue: Screen.Recording.2023-05-23.at.23.44.04.movconfirmed it not exist in release 1.1.819 |
2ee44be
to
52799b9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks! Really impressive the performance improvement this brings to some titles.
Crash just occurred running the latest build - 1.1.0+e3fcd35. In TOTK, I was using a travel medallion for the first time and teleporting to a sky view tower. I believe it crashed during the loading screen during the teleport. Log attached. Switch Firmware: 16.0.3 TOTK version: 1.1.0 |
Generally slower right now, goal is to reduce render passes in games that do inline updates Fix support buffer mirrors Reintroduce vertex buffer mirror Add storage buffer support Optimisation part 1 More optimisation Avoid useless data copies. Remove unused cbIndex stuff Properly set write flag for storage buffers. Fix minor issues Not sure why this was here. Fix BufferRangeList Fix some big issues Align storage buffers rather than getting full buffer as a range Improves mirrorability of read-only storage buffers Increase staging buffer size, as it now contains mirrors Fix some issues with buffers not updating Fix buffer SetDataUnchecked offset for one of the paths when using mirrors Fix buffer mirrors interaction with buffer textures Fix mirror rebinding Move GetBuffer calls on indirect draws before BeginRenderPass to avoid draws without render pass Fix mirrors rebase Fix rebase 2023
Similar to `Get` with a size that's too large, just treat it as a clamp.
Enabled on all platforms for the IbStreamer.
Probably macos related?
cb9312a
to
e4d4ff4
Compare
* Initial implementation of buffer mirrors Generally slower right now, goal is to reduce render passes in games that do inline updates Fix support buffer mirrors Reintroduce vertex buffer mirror Add storage buffer support Optimisation part 1 More optimisation Avoid useless data copies. Remove unused cbIndex stuff Properly set write flag for storage buffers. Fix minor issues Not sure why this was here. Fix BufferRangeList Fix some big issues Align storage buffers rather than getting full buffer as a range Improves mirrorability of read-only storage buffers Increase staging buffer size, as it now contains mirrors Fix some issues with buffers not updating Fix buffer SetDataUnchecked offset for one of the paths when using mirrors Fix buffer mirrors interaction with buffer textures Fix mirror rebinding Move GetBuffer calls on indirect draws before BeginRenderPass to avoid draws without render pass Fix mirrors rebase Fix rebase 2023 * Fix crash when using stale vertex buffer Similar to `Get` with a size that's too large, just treat it as a clamp. * Explicitly set support buffer as mirrorable * Address feedback * Remove unused fragment of MVK workaround * Replace logging for staging buffer OOM * Address format issues * Address more format issues * Mini cleanup * Address more things * Rename BufferRangeList * Support bounding range for ClearMirrors and UploadPendingData * Add maximum size for vertex buffer mirrors * Enable index buffer mirrors Enabled on all platforms for the IbStreamer. * Feedback * Remove mystery BufferCache change Probably macos related? * Fix mirrors not creating when staging buffer is empty. * Change log level to debug
This PR brings MacOS ryu up to the level of performance seen with the
macos1
build, but obviously still missing geometry shaders, etc.Problem
Guest GPU methods such as the constant buffer updater and inline2memory are used to load arbitrary data into buffer memory at any time. This can be done between draws in a render pass with no additional barriers. When a game uses a constant buffer update that changes a buffer used in a shader, the data must be uploaded each time it's used. Occasionally, either the driver or games queue many updates that are used by the next draw in sequence. Even we update a buffer between draws sometimes - the support buffer that contains things like texture scale.
Vulkan doesn't have an inline data update method like this - you need to be outside a render pass and either perform a buffer copy, update buffer command or compute write. Interrupting a render pass isn't too bad on desktop - you just need to pay for barriers to avoid changing the data while a draw is in progress, and a little bit for mostly useless Vulkan calls. However, on mobile GPUs (which are typically tiled) there is huge cost to ending a render pass.
On tiled GPUs, render passes load texture data from shared memory into tile memory, then store that back into shared memory when the render pass ends. This avoids straining shared memory bandwidth, at the cost of a full copy in and out at the start and end of each render pass. Games written with tiled GPUs in mind can sometimes ignore the data in shared memory (clear/ignore) and/or have it live entirely in tile memory to avoid storing back, but we don't get that luxury as all our textures must exist in shared memory and we have no idea how they will be used in the future.
The result is that consecutive inline updates end and begin the render pass repeatedly, which results in expensive loads and stores potentially each draw. The bigger and more numerous the render targets, the more issue this causes as the data must be copied back and forth every time. This obviously depends on the game, as some games don't read cbu results from the shader, but when it happens you will definitely feel it.
Solution
To avoid both barriers and render pass splits, we can instead "mirror" entire bindings by copying their current state into a temporary buffer and binding that instead. This trades inline updates on the GPU for data copies on the CPU and binding swaps. The buffer data we make copies of ends up being way smaller than the framebuffer copies, so a lot of memory bandwidth and time is saved in the majority of cases.
On Apple Silicon GPUs this can greatly increase performance depending on the game's behaviour.
Limitations
Misc changes