Optimize GetVertexSize #11066
|
I wonder if it's worth it to have an implementation of popcnt in Common which chooses implementation based on cpuid? |
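For illustration, a runtime-dispatched popcount could look roughly like the sketch below. This is not code from the PR or from Dolphin's Common library: the `sketch` namespace and all function names are hypothetical, and the hardware path assumes GCC or Clang on x86, where `__builtin_cpu_supports` and the `target("popcnt")` attribute are available. Other compilers fall back to the portable bit-twiddling version.

```cpp
#include <cstdint>

namespace sketch {  // hypothetical namespace, not part of Dolphin

// Portable fallback: classic SWAR bit count for a 32-bit value.
inline int PopCountPortable(std::uint32_t v) {
  v = v - ((v >> 1) & 0x55555555u);
  v = (v & 0x33333333u) + ((v >> 2) & 0x33333333u);
  v = (v + (v >> 4)) & 0x0F0F0F0Fu;
  return static_cast<int>((v * 0x01010101u) >> 24);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
// Compiled with POPCNT enabled for this one function only, so it must
// only be called after the CPU feature check below has succeeded.
__attribute__((target("popcnt"))) inline int PopCountHardware(std::uint32_t v) {
  return __builtin_popcount(v);
}

// Queries the CPU feature bits (backed by cpuid on x86).
inline bool HasHardwarePopCount() {
  return __builtin_cpu_supports("popcnt") != 0;
}
#else
// Assumption: on other compilers/architectures we just use the fallback.
inline int PopCountHardware(std::uint32_t v) { return PopCountPortable(v); }
inline bool HasHardwarePopCount() { return false; }
#endif

using PopCountFn = int (*)(std::uint32_t);

// Picks an implementation once, based on what the CPU reports.
inline PopCountFn SelectPopCount() {
  return HasHardwarePopCount() ? &PopCountHardware : &PopCountPortable;
}

// Public entry point: the selection runs on first use and is then reused.
inline int PopCount(std::uint32_t v) {
  static const PopCountFn fn = SelectPopCount();
  return fn(v);
}

}  // namespace sketch
```

One trade-off of this design is that the call goes through a function pointer, so it cannot be inlined into a hot loop; whether that costs more than the portable fallback would need measuring.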
60210c6
to
a31530f
Compare
|
A bit of context for why the code ended up being the way it was: originally, the only code that could get a vertex's size was this code (which was part of the generation logic for the legacy/platform agnostic vertex loader). As part of #9540, I split that off into VertexLoaderBase in 0a906f5, so that I could use it in the fifo analyzer in 73f4e57. But then as part of the same pull request, I wanted to separate components by spaces, so I generalized it into the callback-based version in 77b1cca. In #9718, I decided that just knowing the offsets wasn't enough, and ended up creating some separate code for decoding vertices for the fifo analyzer: f0f12ac, which made the original component sizes function obsolete/unused, and generally quite overcomplicated. |
a31530f
to
00454b1
Compare
|
I fixed the linter warnings and the failed unit test. Interesting backstory. I was pretty surprised to find such low-hanging fruit in a project as old and mature as Dolphin. |
b78c66f
to
aeffdce
Compare
|
Oh, I just realized I've missed VertexLoader_Color, |
aeffdce
to
40ae1d3
Compare
GetComponentSizes was unused, so we simplify this and get rid of the branches.
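As a rough illustration of what "getting rid of the branches" can mean for a size computation, the sketch below replaces per-attribute `if`/`switch` logic with table lookups. It is not the PR's actual code: the enums, the `AttributeDesc` struct, and the function names are hypothetical, and the byte sizes (1 byte for an 8-bit index, 2 bytes for a 16-bit index, element size times component count for direct data) are stated here as assumptions.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

namespace sketch {  // hypothetical names throughout, not Dolphin's types

enum class AttrMode : std::uint8_t { NotPresent = 0, Direct = 1, Index8 = 2, Index16 = 3 };
enum class CompFormat : std::uint8_t { U8 = 0, S8 = 1, U16 = 2, S16 = 3, F32 = 4 };

// Size in bytes of a single element of each component format.
constexpr std::array<std::uint32_t, 5> kElementSize = {1u, 1u, 2u, 2u, 4u};

struct AttributeDesc {
  AttrMode mode;
  CompFormat format;
  std::uint32_t components;  // e.g. 3 for an XYZ position
};

// Looks the size up by mode instead of branching on it:
// absent -> 0, direct -> element size * count, index8 -> 1, index16 -> 2.
inline std::uint32_t AttributeSize(const AttributeDesc& a) {
  const std::array<std::uint32_t, 4> size_by_mode = {
      0u, kElementSize[static_cast<std::size_t>(a.format)] * a.components, 1u, 2u};
  return size_by_mode[static_cast<std::size_t>(a.mode)];
}

// Total vertex size is the sum over all attributes.
template <std::size_t N>
std::uint32_t VertexSize(const std::array<AttributeDesc, N>& attrs) {
  std::uint32_t total = 0;
  for (const AttributeDesc& a : attrs)
    total += AttributeSize(a);
  return total;
}

}  // namespace sketch
```

Whether the compiler actually emits branch-free machine code for this depends on the target and optimization level, so it is a starting point for profiling rather than a guarantee.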
|
Otherwise, this change looks good to me. The performance improvement seemed surprising to me, though, since I thought That's 4145 + 1911120 = 1915265 on a single frame, compared to 83 times ever. This can definitely be cached in some way (I don't know if it'd be possible to reduce it to just 83 times with the current code, but it could be recalculated for each CP load instead, at least...). I'll look into doing that. |
40ae1d3
to
0049a82
Compare
0049a82
to
36eb37c
Compare
4145 + 191120 = 195265, or 200k times per frame. Still a lot, though. |
I mistyped that an embarrassingly large number of times (see edit history), and it seems like I still didn't get it right. But yeah, 200k times per frame is still a lot (even if it's less than 2M times per frame, it's still 12M/second at 60FPS). |
36eb37c
to
fdcd2b7
Compare
|
I addressed the latest review comments. |
|
I've also implemented a simple cache for the vertex sizes similar to how it's handled in the VertexLoaderManager and that gets performance up to ~140fps. Not bad at all. :D I'll open a follow up PR once this one is merged. |
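To make the caching idea concrete, here is a minimal sketch of a per-format vertex size cache with dirty flags. It is not the code from the follow-up PR: the class name, the assumption of 8 vertex formats, and the `compute` callback are all hypothetical. The idea is that the size is recomputed only after the relevant state has been invalidated (for example on a CP register load), not on every lookup.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>

namespace sketch {  // hypothetical, not the code from the follow-up PR

constexpr std::size_t kNumVertexFormats = 8;  // assumption for this sketch

class VertexSizeCache {
public:
  // Every entry starts out dirty, so the first lookup computes it.
  VertexSizeCache() { m_dirty.set(); }

  // Call when the state that determines a format's size changes.
  void Invalidate(std::size_t format) { m_dirty.set(format); }
  void InvalidateAll() { m_dirty.set(); }

  // Returns the cached size, calling compute(format) only if the entry is dirty.
  template <typename ComputeFn>
  std::uint32_t Get(std::size_t format, ComputeFn&& compute) {
    if (m_dirty.test(format)) {
      m_sizes[format] = compute(format);
      m_dirty.reset(format);
    }
    return m_sizes[format];
  }

private:
  std::array<std::uint32_t, kNumVertexFormats> m_sizes{};
  std::bitset<kNumVertexFormats> m_dirty;
};

}  // namespace sketch
```

With a cache like this, the expensive computation runs once per state change, and the hundreds of thousands of lookups per frame become an array read plus a flag test.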
|
Cache PR: #11067 |
Looks good to me.

This increases performance in Mario Galaxy 1 on the hub spaceship from ~85 FPS to ~112 FPS (5900X, downclocked to 2.2 GHz for testing).