New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jit: Optimize block link queries by using hash tables #9666
Conversation
Repeated erase() + iteration on a std::multimap is extremely slow. Slow enough that it causes a 7 second long stutter during some transitions in F-Zero X (a N64 VC game that triggers many, many icache invalidations). And slow enough that JitBaseBlockCache::DestroyBlock shows up on a flame graph as taking >50% of total CPU time on the CPU-GPU thread: https://i.imgur.com/vvqiFL6.png This commit optimises those block link queries by replacing the std::multimap (which is typically implemented with red-black trees) with hash tables. Master: https://i.imgur.com/vvqiFL6.png / 7s stutters (starting from 5.0-2021 and with branch following disabled) This commit: https://i.imgur.com/hAO74fy.png / ~0.7s stutters, which is pretty close to 5.0 stable. (5.0-2021 introduced the performance regression and it is especially noticeable when branch following is disabled, which is the case for all N64 VC games since 5.0-8377.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Improves N64 VC. Instant Approve.
I mean if it's not difficult to implement and maintain, I would go for it anyway 🤷♂️ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable. Untested
That would require adding an external library, which is not trivial at the moment since the Windows builder doesn't use CMake :) I'm also not sure it's really worth it. |
That makes sense, hopefully CMake can be implemented to the builders soon. Imho, it is still worth it, because it would get Dolphin closer to being viable for speedrunning F-Zero X |
Repeated erase() + iteration on a std::multimap is extremely slow. Slow enough that it causes a 7 second long stutter during some transitions in F-Zero X (a N64 VC game that triggers many, many icache invalidations).
And slow enough that JitBaseBlockCache::DestroyBlock shows up on a flame graph as taking >50% of total CPU time on the CPU-GPU thread.
Master:
This PR optimises those block link queries by replacing the std::multimap (which is typically implemented with red-black trees) with hash tables, which seem better suited to our repeated insert/erase/iteration workload.
PR: ~0.7s stutters (10x less), which is pretty close to 5.0 stable. (5.0-2021 introduced the performance regression and it is especially noticeable when branch following is disabled, which is the case for all N64 VC games since 5.0-8377.)
I briefly considered using a faster implementation (e.g. Swiss tables) since this is a pretty hot path, but that "only" led to a 4-5% improvement (stutters were just 30ms shorter)...