Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize GetVertexSize #11066

Merged
merged 2 commits into from Sep 15, 2022
Merged

Optimize GetVertexSize #11066

merged 2 commits into from Sep 15, 2022

Conversation

K0bin
Copy link
Contributor

@K0bin K0bin commented Sep 14, 2022

This increases performance in Mario Galaxy 1 on the hub space ship from ~85 FPS to ~112 FPS. (5900X, downclocked to 2.2Ghz for testing).

@shuffle2
Copy link
Contributor

I wonder if it's worth it to have an implementation of popcnt in Common which chooses implementation based on cpuid?

@Pokechu22
Copy link
Contributor

A bit of context for why the code ended up being the way it was: originally, the only code that could get a vertex's size was this code (which was part of the generation logic for the legacy/platform agnostic vertex loader). As part of #9540, I split that off into VertexLoaderBase in 0a906f5, so that I could use it in the fifo analyzer in 73f4e57. But then as part of the same pull request, I wanted to separate components by spaces, so I generalized it into the callback-based version in 77b1cca. In #9718, I decided that just knowing the offsets wasn't enough, and ended up creating some separate code for decoding vertices for the fifo analyzer: f0f12ac, which made the original component sizes function obsolete/unused, and generally quite overcomplicated.

@K0bin
Copy link
Contributor Author

K0bin commented Sep 14, 2022

I fixed the linter warnings and the failed unit test.

Interesting back story. I was pretty surprised to find such a long hanging fruit in a project as old and mature as Dolphin.

@K0bin K0bin force-pushed the vertex-size-opt branch 2 times, most recently from b78c66f to aeffdce Compare September 14, 2022 23:09
@K0bin
Copy link
Contributor Author

K0bin commented Sep 14, 2022

Oh, I just realized I've missed VertexLoader_Color,

GetComponentSizes was unused, so we simplify this and get rid
of the branches.
@Pokechu22
Copy link
Contributor

Pokechu22 commented Sep 15, 2022

Otherwise, this change look good to me. The performance improvement seemed surprising to me, though, since I thought GetVertexSize was only getting called when the vertex loader was created (so it being slow would have an impact for each new unique vertex format, but once those are all created it'd be fine; this would happen if RefreshLoader creates a new vertex loader). But, that's wrong; in b5fd35f I also added a call to it in OpcodeDecoding.h, here. That means it would get called for every single primitive command the game uses. We can see how many of them that is using graphics → advanced → show statistics:

image

That's 4145 + 1911120 = 1915265 on a single frame, compared to 83 times ever. This can definitely be cached in some way (I don't know if it'd be possible to reduce it to just 83 times with the current code, but it could be recalculated for each CP load instead, at least...). I'll look into doing that.

@kode54
Copy link

kode54 commented Sep 15, 2022

That's 4145 + 1911120 = 1915265 on a single frame

4145 + 191120 = 195265, or 200k times per frame. Still a lot, though.

@Pokechu22
Copy link
Contributor

4145 + 191120 = 195265, or 200k times per frame. Still a lot, though.

I mistyped that an embarrassingly large number of times (see edit history), and it seems like I still didn't get it right. But yeah, 200k times per frame is still a lot (even if it's less than 2M times per frame, it's still 12M/second at 60FPS).

@K0bin
Copy link
Contributor Author

K0bin commented Sep 15, 2022

I addressed the latest review comments.

@K0bin
Copy link
Contributor Author

K0bin commented Sep 15, 2022

I've also implemented a simple cache for the vertex sizes similar to how it's handled in the VertexLoaderManager and that gets performance up to ~140fps. Not bad at all. :D

I'll open a follow up PR once this one is merged.

@K0bin
Copy link
Contributor Author

K0bin commented Sep 15, 2022

Cache PR: #11067

Copy link
Contributor

@Pokechu22 Pokechu22 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@JMC47 JMC47 merged commit 32fba6d into dolphin-emu:master Sep 15, 2022
11 checks passed
@K0bin K0bin deleted the vertex-size-opt branch September 15, 2022 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
5 participants