Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize typed coordinate conversion functions for faster Windows unsafe_get_submap_at performance #75376

Merged
merged 1 commit into from
Aug 2, 2024

Conversation

akrieger
Copy link
Member

@akrieger akrieger commented Aug 1, 2024

Summary

Performance "Optimize typed coordinate conversion functions for widespread savings"

Purpose of change

I accidentally left a performance profiling run open for a couple hours. Interestingly, I noticed that what should be a basic math function, project_coord, was top of the list in self CPU at a shockingly high % of total cpu. This tickled my performance hotspot spidey sense so I went digging.

There's a variety of things going on but the basic point is the compiler was not inlining project_coord which by design is supposed to be inlined to take advantage of dead code stripping because the struct has more members than usually needed. This meant wasted computation in a very, very, hot code path.

The net win is map::ter is about 33% faster, map::furn is 25% faster, many other things called less often probably have similar wins. map::ter in particular is called in the render codepath so that makes everything just a bit snappier.

There were, unfortunately, no measurable wins with clang-cl. The generated code was identical (and fairly optimal) before/after my changes.

Describe the solution

  • Force the compiler to inline the function with compiler specific annotations.
  • Optimize the class hierarchy so that 'in bounds' points are a subclass of 'out of bounds' points (really, not-validated-in-bounds points). This allows for taking a const ob& from an ib point, a free operation, instead of a conversion/copy. In turn this allows directly returning the result of project_coord out of get_submap_unsafe
  • Other tweaks to reduce the generated asm size, like taking points by value as arguments. Tripoints fit in an SSE register, and the compiler would not optimize some of these functions more without passing by value (a const ref can still be changed by the caller or through a called function, after all).

Describe alternatives you've considered

  • Being happy with slow less optimized code.

Testing

Hacked up map_bounds_checking test to run 100 times in a loop. Instrumented it enough times to get 3 results within noise of each other, before & after.

image

I also hand expanded the template code into unsafe_get_submap_at to strip all the dead code manually and the generated assembly was functionally identical.

Additional context

There's nontrivial bulk added by two conditionals because at the bottom of the math stack are calls to divide_xy_round_to_minus_infinity. If we can get the inputs to be _ib points, then the math instead uses divide_xy_round_to_minus_infinity_non_negative which is branch free and reduces the asm from 54 lines to 35 lines and no conditional jumps. That's a longer term and harder effort though, and my immediate attempts to do this did not get any significant wins off the bat.

@github-actions github-actions bot added Code: Tests Measurement, self-control, statistics, balancing. [C++] Changes (can be) made in C++. Previously named `Code` labels Aug 1, 2024
@akrieger akrieger marked this pull request as ready for review August 1, 2024 18:12
@github-actions github-actions bot added the json-styled JSON lint passed, label assigned by github actions label Aug 1, 2024
@github-actions github-actions bot added Code: Performance Performance boosting code (CPU, memory, etc.) astyled astyled PR, label is assigned by github actions labels Aug 1, 2024
@akrieger akrieger force-pushed the pointing branch 3 times, most recently from c71f876 to 57ca0ab Compare August 1, 2024 18:27
@akrieger akrieger changed the title Optimize typed coordinate conversion functions for widespread savings mostly in unsafe_get_submap_at Optimize typed coordinate conversion functions for faster Windows unsafe_get_submap_at performane Aug 1, 2024
@akrieger akrieger changed the title Optimize typed coordinate conversion functions for faster Windows unsafe_get_submap_at performane Optimize typed coordinate conversion functions for faster Windows unsafe_get_submap_at performance Aug 1, 2024
@github-actions github-actions bot added the Code: Build Issues regarding different builds and build environments label Aug 1, 2024
@akrieger akrieger force-pushed the pointing branch 2 times, most recently from fccb21b to 214fa25 Compare August 1, 2024 22:31
@kevingranade kevingranade merged commit 0ca32b8 into CleverRaven:master Aug 2, 2024
25 checks passed
@akrieger akrieger deleted the pointing branch August 2, 2024 04:25
@github-actions github-actions bot added the BasicBuildPassed This PR builds correctly, label assigned by github actions label Aug 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
astyled astyled PR, label is assigned by github actions BasicBuildPassed This PR builds correctly, label assigned by github actions [C++] Changes (can be) made in C++. Previously named `Code` Code: Build Issues regarding different builds and build environments Code: Performance Performance boosting code (CPU, memory, etc.) Code: Tests Measurement, self-control, statistics, balancing. json-styled JSON lint passed, label assigned by github actions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants