Vulkan: Workaround slow vkCmdCopyImageToBuffer on QCom #11084
Greatly improves performance on Vulkan. NES VC games (the EFB2RAM ones) run about 100% faster. A lot of the 3D games are still slower on Vulkan than on OpenGL, though: the Wind Waker island overview during the opening runs at full speed on OpenGL but at about 25 FPS on Vulkan (Pixel 3a). So, while Vulkan performance is better and in some cases greatly improved, it's still not great on most Adrenos.
Some questions from me: is there a reason users would prefer Vulkan over OGL? Does it make sense to add code complexity for a single platform if it doesn't push Vulkan performance above OGL, which users can already use?
Wind Waker runs at ~45 FPS on my phone with both Vulkan and OpenGL (with this PR).
I think it's not that much code, and it's fairly contained.
In my non-scientific tests, I also noticed a massive improvement with Vulkan on an SD855 (stock Galaxy S10). Wind Waker and several other games run much better. This PR brought Vulkan very close to OpenGL, which is wonderful! I am very curious to see whether this PR plus #11090 will improve things even further (e.g., in games like Mario Galaxy and Skyward Sword). Thanks a lot for your work, @K0bin. I am sure the community will really appreciate it.
Looks good to me. I haven't tested it directly though.
I can test the latest version to make sure it works, but without the other optimizations I won't see the full effect. It still runs, so let's start getting the optimizations merged.


After profiling Mario Galaxy on my Snapdragon 865 phone, I realized that the video thread spends a ridiculous amount of time in vkCmdCopyImageToBuffer. It turns out the driver allocates a temporary image every single time. After playing around for a bit, I noticed that it doesn't do that when copying from a linear-tiled image, so we can just do that blit ourselves and reuse the image.
EDIT: I should clarify that this is mostly a problem when "Store EFB copies to texture only" is disabled.
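The workaround described above comes down to two decisions: when to detour through a linear-tiled image, and how to keep that image around so it is not reallocated on every copy. The C++ below is a hypothetical sketch of only that bookkeeping, not the code from this PR. `LinearImage`, `LinearStagingCache`, `Vendor` and `ChooseReadbackPath` are invented names, and no Vulkan calls are made.

```cpp
#include <algorithm>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a VkImage created with VK_IMAGE_TILING_LINEAR.
// Only the dimensions matter for this sketch.
struct LinearImage {
  uint32_t width;
  uint32_t height;
};

// Hypothetical cache that keeps one linear-tiled staging image alive across
// readbacks, so the allocation happens once instead of on every copy.
class LinearStagingCache {
 public:
  // Returns an image at least width x height. A new image is created only
  // when there is none yet or the cached one is too small in either axis.
  LinearImage& Acquire(uint32_t width, uint32_t height) {
    if (!image_ || image_->width < width || image_->height < height) {
      // Grow to cover both the old and the new request so that alternating
      // sizes do not trigger repeated reallocations.
      const uint32_t w = image_ ? std::max(image_->width, width) : width;
      const uint32_t h = image_ ? std::max(image_->height, height) : height;
      image_ = std::make_unique<LinearImage>(LinearImage{w, h});
      ++allocations_;
    }
    return *image_;
  }

  int allocations() const { return allocations_; }

 private:
  std::unique_ptr<LinearImage> image_;
  int allocations_ = 0;
};

// Hypothetical vendor flag; the PR title scopes the workaround to Qualcomm.
enum class Vendor { Qualcomm, Other };

enum class ReadbackPath {
  DirectCopy,          // vkCmdCopyImageToBuffer straight from the source image
  BlitViaLinearImage,  // copy into the cached linear image first, then to a buffer
};

// Only take the detour when the driver would otherwise allocate a temporary
// image, i.e. on Qualcomm when the source is not already linear-tiled.
ReadbackPath ChooseReadbackPath(Vendor vendor, bool source_is_linear) {
  if (vendor == Vendor::Qualcomm && !source_is_linear)
    return ReadbackPath::BlitViaLinearImage;
  return ReadbackPath::DirectCopy;
}
```

In an actual Vulkan backend, the cached image would presumably be a VkImage created with VK_IMAGE_TILING_LINEAR, filled from the source texture with vkCmdBlitImage or vkCmdCopyImage, and then read back with vkCmdCopyImageToBuffer. Growing the image and never shrinking it trades some memory for a steady state of a single allocation, which is the point of the workaround.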