rasterizer_cache: Improve validation skip heuristic #69

Merged
merged 1 commit into from Apr 10, 2024

raphaelthegreat
Collaborator

@raphaelthegreat raphaelthegreat commented Apr 9, 2024

Most 3DS games are relatively well behaved with their VRAM usage, with a few framebuffers that don't overlap with each other. However, since the system has a UMA architecture, nothing stops games from using VRAM in weird ways. In some cases games will try to reuse VRAM by aliasing memory as entirely different textures. This isn't that terrible to handle if the stride stays the same, but that isn't always the case.

Luigi's Mansion: Dark Moon will initially use a memory region as a framebuffer with 256 pixel stride, then reuse the same region as a framebuffer with 128 pixel stride. The contents are immediately cleared afterwards so the result doesn't really matter, but this trips up the texture cache and causes a useless and expensive gpu flush per frame.

Kid Icarus also does this. Through the texture cache it shows up as various reinterpretations between D24S8 and RGBA8 framebuffers with varying strides. Paper Mario: Sticker Star uses this trick, presumably to render the bottom screen. It starts out with 2 color/depth framebuffers with 256 pixel strides and will then reinterpret them with 128 pixel strides, instead of reusing the top framebuffer with a smaller viewport, causing 2 texture flushes per frame.

Arguably the worst case of this aliasing is Spider-Man: Edge of Time. I haven't measured how many flushes it causes, but it's probably more than 3, and together they slow the game down to a crawl, making it unplayable.
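To make the stride problem concrete, here is a minimal sketch of why the same bytes land on different pixels once the stride changes. It assumes a linear (row-major) layout for brevity, whereas real 3DS surfaces are tiled, and `PixelCoord`, `ByteOffset` and `CoordAt` are hypothetical names for illustration, not Citra code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 2D pixel coordinate, for illustration only (not a Citra type).
struct PixelCoord {
    std::uint32_t x;
    std::uint32_t y;
};

// Byte offset of a pixel in a surface with the given stride (in pixels).
// Assumption: linear row-major layout; real 3DS surfaces are tiled.
constexpr std::uint32_t ByteOffset(PixelCoord p, std::uint32_t stride_px,
                                   std::uint32_t bytes_per_pixel) {
    return (p.y * stride_px + p.x) * bytes_per_pixel;
}

// Inverse mapping: which pixel does a byte offset belong to under a given stride?
inline PixelCoord CoordAt(std::uint32_t byte_offset, std::uint32_t stride_px,
                          std::uint32_t bytes_per_pixel) {
    assert(stride_px != 0 && bytes_per_pixel != 0);
    const std::uint32_t pixel_index = byte_offset / bytes_per_pixel;
    return {pixel_index % stride_px, pixel_index / stride_px};
}
```

With 4 bytes per pixel, pixel (200, 0) of a 256 pixel stride surface sits at byte offset 800. Reading that same offset back through `CoordAt` with a 128 pixel stride gives pixel (72, 1), so a cached copy of the old surface cannot simply be reused as a rectangle of the new one.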

Citra isn't entirely helpless on that front, however. The current heuristic will skip validation if part of the interval is owned by a GPU-invalidated surface and there is a fill surface overlapping that region:

const bool has_invalid = IntervalHasInvalidPixelFormat(params, interval);

However this heuristic is kinda busted and just happens to work by luck from what I've seen. So in this PR, I have tweaked the heuristic to consider texture strides as well. If the region is partially owned by a gpu invalidated surface that doesn't have the same stride, validation is skipped. This covers a lot of games, but not all (see the comment). In the near future I want to rewrite the validation portion of the code to fully handle texture flushes in the gpu which will allow arbitrary transforms to occur without round-tripping to the cpu.

(Without going into too much detail, the current validation routine in the texture cache suffers from high overhead and a lack of flexibility. It will try to find specific surfaces and, if it fails, it doesn't consider alternative ways of salvaging GPU data that could otherwise be very usable. A better approach would be to treat validation as a memory operation more like an image copy, with image copies being an optimization for validations that satisfy rectangular bounds.)
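As a rough illustration of the stride-aware check described in this PR, the sketch below skips validation when the requested interval overlaps a GPU-invalidated surface whose stride differs. Everything here is an assumption made for illustration: `AddrInterval`, `CachedSurface` and `ShouldSkipValidation` are hypothetical names, and "GPU invalidated" is reduced to a single boolean flag, which is much simpler than what the rasterizer cache actually tracks.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical half-open address range [start, end); not a Citra type.
struct AddrInterval {
    std::uint32_t start;
    std::uint32_t end;
};

// Hypothetical cache entry; the real surface parameters carry far more state.
struct CachedSurface {
    AddrInterval range;
    std::uint32_t stride_px;
    bool gpu_invalidated; // assumption: one flag stands in for per-region tracking
};

constexpr bool Overlaps(AddrInterval a, AddrInterval b) {
    return a.start < b.end && b.start < a.end;
}

// Returns true when validating `interval` for a surface with `stride_px`
// can be skipped: some overlapping GPU-invalidated surface has another stride.
inline bool ShouldSkipValidation(const std::vector<CachedSurface>& surfaces,
                                 AddrInterval interval, std::uint32_t stride_px) {
    assert(interval.start < interval.end);
    for (const CachedSurface& surface : surfaces) {
        if (surface.gpu_invalidated && Overlaps(surface.range, interval) &&
            surface.stride_px != stride_px) {
            return true;
        }
    }
    return false;
}
```

In this model, a 128 pixel stride request that overlaps an invalidated 256 pixel stride surface is skipped, while a request with the same stride, or one that does not overlap at all, still goes through normal validation.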

All in all, this results in a 2x to 4x performance improvement in the affected games. The tests below were carried out on my AMD iGPU at 4x resolution to simulate a more GPU-bound scenario.

| Citra (master) | Citra (PR) |
| --- | --- |
| dark_slow | dark_fast |
| kid_slow | kid_fast |
| paper_slow | paper_fast |
| spider_slow | spider_fast |

@stepsy

stepsy commented Apr 10, 2024

Kid Icarus.
PR:
kid Icarus full speed.jpg

Master:
Kid Icarus slow.jpg

Can definitely see some performance boost on Android.
It still has slowdowns on some stages (especially on land), but it seems more "playable" compared to master.
Screenshot_20240410_113926
Screenshot_20240410_113311

Maybe it's because I have a low end phone.
Snapdragon 680
Adreno 610

@PabloMK7
Owner

Have you tested plugins with this change? Plugins use framebuffers in weird ways, so this kind of change may break them.

@raphaelthegreat
Collaborator Author

Seems to work okay
Kid Icarus Uprising_10 04 24_12 07 11 993

@stepsy

stepsy commented Apr 10, 2024

Vapecord
Screenshot_20240410_172231
Screenshot_20240410_172226

@raphaelthegreat
Collaborator Author

Merging, as there are no reported regressions so far and this fixes an issue, so it should be a net positive.

@raphaelthegreat raphaelthegreat merged commit 9dfe3eb into PabloMK7:master Apr 10, 2024
12 checks passed
@DonelBueno

DonelBueno commented Apr 11, 2024

Luigi's Mansion: Dark Moon crashes at the beginning, right before Luigi starts being teleported to the first level. Async shader compilation is disabled.

System:

Windows 10 Pro x64
Nvidia GTX 1070, Drivers 552.12

@raphaelthegreat
Collaborator Author

Need a log file with debug renderer 😄

@DonelBueno

How do I enable the debug renderer?

It only happens in Vulkan, OpenGL is fine.

@stepsy

stepsy commented Apr 12, 2024

@DonelBueno
Emulation > Configure > General > Debug
Might have to install the Vulkan SDK.
https://www.lunarg.com/vulkan-sdk/

@DonelBueno

citra_log.txt

There you go, @gpucode

@brujo5

brujo5 commented Apr 15, 2024

Luigi mansion 2 😋
60FPS
Poco F5
Turnip vulkan

VID_20240415_073805_869.mp4

@LuisPerss

Hello, does this update bring performance improvements in Mario 3D Land? And how can I enable debugging so that the emulator works well? I use Vulkan.

@alberto2098kl

Which version of Turnip did you use, and how did you fix the audio issue?

@carlosgamer23

Kid Icarus. PR: kid Icarus full speed.jpg

Master: Kid Icarus slow.jpg

Can definitely see some performance boost. On Android. But it still does have slowdowns on some stages (especially on land) seems more "playable" compared to Master Screenshot_20240410_113926 Screenshot_20240410_113311

Maybe it's because I have a low end phone. Snapdragon 680 Adreno 610

Note 12 or 13? You can use a Turnip driver from K11MCH1: https://github.com/K11MCH1/AdrenoToolsDrivers/releases?page=2

Use Qualcomm 615.77; it will help you gain speed in emulation!
Turnip 24 Rev 18 is the best for you, but the emulator does not support it.

Successfully merging this pull request may close these issues.

Inazuma Eleven GO Shine shows incorrect background during dialogue that shows the speakers's 3D models
8 participants