Save an instruction in EntityHasher #10648

Merged
merged 2 commits into from Nov 28, 2023


scottmcm
Contributor

Objective

Keep essentially the same structure of EntityHasher from #9903, but rephrase the multiplication slightly to save an instruction.

cc @superdump
Discord thread: https://discord.com/channels/691052431525675048/1172033156845674507/1174969772522356756

Solution

Today, the hash is

        self.hash = i | (i.wrapping_mul(FRAC_U64MAX_PI) << 32);

with i being (generation << 32) | index.
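As a minimal sketch, the current multiply-shift-or hash can be written as a free function. The value of `FRAC_U64MAX_PI` below is an assumption (roughly `u64::MAX / π`). `entity_bits` and `old_hash` are hypothetical names used only for illustration. The algebra that follows does not depend on the exact constant.

```rust
// Assumed value (about u64::MAX / π); the algebra works for any constant.
const FRAC_U64MAX_PI: u64 = 0x517c_c1b7_2722_0a95;

/// Hypothetical helper: packs `generation << 32 | index` into one u64.
fn entity_bits(generation: u32, index: u32) -> u64 {
    ((generation as u64) << 32) | index as u64
}

/// Hypothetical name for the current multiply-shift-or hash.
fn old_hash(i: u64) -> u64 {
    i | (i.wrapping_mul(FRAC_U64MAX_PI) << 32)
}

fn main() {
    let i = entity_bits(3, 42);
    // `<< 32` leaves the low half zero, so the index passes through untouched.
    assert_eq!(old_hash(i) as u32, 42);
    println!("{:#018x}", old_hash(i));
}
```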

Expanding things out (all arithmetic is on u64 and wraps, i.e. it is modulo 2^64), we get

i | ((i * CONST) << 32)
= (generation << 32) | index | ((((generation << 32) | index) * CONST) << 32)
= (generation << 32) | index | ((index * CONST) << 32)  // because the generation overflowed: (generation << 32) * CONST has zero low 32 bits, so the << 32 shifts it out entirely
= (((index * CONST) | generation) << 32) | index
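The expansion above can be spot-checked numerically. This is a sketch with an assumed constant and hypothetical function names. It compares the multiply-shift-or form against the last line of the expansion for a few (generation, index) pairs, including the extremes.

```rust
// Assumed value (about u64::MAX / π); any constant gives the same identity.
const FRAC_U64MAX_PI: u64 = 0x517c_c1b7_2722_0a95;

/// Hypothetical name for today's hash: `i | (i * CONST) << 32`.
fn old_hash(i: u64) -> u64 {
    i | (i.wrapping_mul(FRAC_U64MAX_PI) << 32)
}

/// Last line of the expansion: `((index * CONST) | generation) << 32 | index`.
fn expanded(generation: u32, index: u32) -> u64 {
    let (g, x) = (generation as u64, index as u64);
    ((x.wrapping_mul(FRAC_U64MAX_PI) | g) << 32) | x
}

fn main() {
    for (generation, index) in [(0u32, 0u32), (1, 7), (u32::MAX, 12_345), (99, u32::MAX)] {
        let i = ((generation as u64) << 32) | index as u64;
        // Only the low 32 bits of index * CONST survive the << 32.
        assert_eq!(old_hash(i), expanded(generation, index));
    }
    println!("expansion matches");
}
```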

What if we do the same thing, but with + instead of |? That's almost the same, except that + has carries, which are often better in a hash function anyway, since addition doesn't saturate. (| can be dangerous: once a value becomes -1, i.e. all bits set, it stays that way, and no mixing is possible.)
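A tiny illustration of the saturation point, as a sketch only. Once every bit is set, OR-ing in more data changes nothing. Wrapping addition still produces a result that depends on the new input.

```rust
fn main() {
    // -1 as a u64: every bit is set.
    let saturated = u64::MAX;
    for x in [1u64, 0xdead_beef, 1u64 << 40] {
        // OR can only set bits, never clear them, so the result is stuck.
        assert_eq!(saturated | x, u64::MAX);
        // Wrapping addition carries, so the result still depends on `x`.
        assert_eq!(saturated.wrapping_add(x), x - 1);
    }
    println!("| saturated, + kept mixing");
}
```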

((index * CONST + generation) << 32) + index
= ((CONST << 32) + 1) * index + (generation << 32)
= ((CONST << 32) + 1) * index + (((WHATEVER << 32) + generation) << 32)  // because the extra overflows and thus can be anything
= ((CONST << 32) + 1) * index + ((((CONST * generation) << 32) + generation) << 32)  // pick "whatever" to be something convenient
= ((CONST << 32) + 1) * index + ((((CONST << 32) + 1) * generation) << 32)
= ((CONST << 32) + 1) * index + ((CONST << 32) + 1) * (generation << 32)
= ((CONST << 32) + 1) * (index + (generation << 32))
= ((CONST << 32) + 1) * ((generation << 32) | index)  // index < 2^32, so + and | agree here
= ((CONST << 32) + 1) * i
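The derivation above can be checked the same way. This sketch assumes the same constant as before, and `add_form`, `single_mul`, and `MULTIPLIER` are hypothetical names. It confirms that the "+" variant and a single wrapping multiplication by `(CONST << 32) + 1` agree bit for bit.

```rust
// Assumed value (about u64::MAX / π); any constant gives the same identity.
const FRAC_U64MAX_PI: u64 = 0x517c_c1b7_2722_0a95;

/// `(CONST << 32) + 1`: only the low 32 bits of CONST survive the shift.
const MULTIPLIER: u64 = (FRAC_U64MAX_PI << 32) + 1;

/// First line of the derivation: `((index * CONST + generation) << 32) + index`.
fn add_form(generation: u32, index: u32) -> u64 {
    let (g, x) = (generation as u64, index as u64);
    (x.wrapping_mul(FRAC_U64MAX_PI).wrapping_add(g) << 32).wrapping_add(x)
}

/// Last line of the derivation: one wrapping multiplication of `i`.
fn single_mul(i: u64) -> u64 {
    i.wrapping_mul(MULTIPLIER)
}

fn main() {
    for (generation, index) in [(0u32, 0u32), (1, 7), (u32::MAX, 12_345), (99, u32::MAX)] {
        let i = ((generation as u64) << 32) | index as u64;
        assert_eq!(single_mul(i), add_form(generation, index));
    }
    println!("single multiplication matches");
}
```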

So we can do essentially the same thing using a single multiplication instead of doing multiply-shift-or.
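For context, here is a minimal sketch of how a single-multiply hasher could plug into a standard `HashMap`. It is not Bevy's actual source. The struct, the `EntityHashMap` alias, and the constant value are all assumptions for illustration. It relies on `u64`'s `Hash` impl calling `write_u64` exactly once.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Assumed constant (about u64::MAX / π). Only its low 32 bits survive the
// shift, so MULTIPLIER is `(CONST << 32) + 1` from the derivation.
const FRAC_U64MAX_PI: u64 = 0x517c_c1b7_2722_0a95;
const MULTIPLIER: u64 = (FRAC_U64MAX_PI << 32) + 1;

/// Hypothetical single-multiply hasher; it expects exactly one
/// `write_u64` call with `generation << 32 | index`.
#[derive(Default)]
struct EntityHasher {
    hash: u64,
}

impl Hasher for EntityHasher {
    fn write(&mut self, _bytes: &[u8]) {
        panic!("this sketch only supports write_u64");
    }

    #[inline]
    fn write_u64(&mut self, i: u64) {
        // The `+ 1` passes `i` through in the low 32 bits; the high 32 bits
        // become `index * CONST + generation` (mod 2^32).
        self.hash = i.wrapping_mul(MULTIPLIER);
    }

    fn finish(&self) -> u64 {
        self.hash
    }
}

/// Hypothetical map keyed by raw entity bits.
type EntityHashMap<V> = HashMap<u64, V, BuildHasherDefault<EntityHasher>>;

fn main() {
    let mut map: EntityHashMap<&str> = EntityHashMap::default();
    let bits = (1u64 << 32) | 7; // generation 1, index 7
    map.insert(bits, "player");
    assert_eq!(map.get(&bits), Some(&"player"));

    // The index is still visible, unchanged, in the low 32 bits of the hash.
    let mut hasher = EntityHasher::default();
    hasher.write_u64(bits);
    assert_eq!(hasher.finish() as u32, 7);
}
```

Keeping the index untouched in the low bits preserves the locality of the original design. All the mixing lands in the high half.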

LLVM was already smart enough to merge the shifting into a multiplication, but this saves the extra or:
https://rust.godbolt.org/z/MEvbz4eo4

It's a very small change, and its effect will often disappear into load latency anyway, but lookups are a couple of percent faster.

(There was more of an improvement here before #10558, but now that to_bits is a single qword load, keeping things mostly as they are turned out to be better than the bigger changes I'd tried in #10605.)


Changelog

(Probably skip it)

Migration Guide

(none needed)

ItsDoot added the C-Performance and A-Utils labels on Nov 20, 2023
Member

james7132 left a comment


This looks good, with the appropriate tests and documentation all the way through. One thing that did bother me about #10605 was that the stress tests seemed to show improvements that were potentially within the margin of error. I know this is asking more from an already thorough PR, but could you rerun those stress tests to see if the improvements are still there?

Member

mockersf left a comment


(and also rebase or merge main in to get the latest change in CI)

james7132 added the S-Ready-For-Final-Review label on Nov 25, 2023
@alice-i-cecile
Member

Can you please rebase / merge in main to get the latest CI check running on your PR?

mockersf added this pull request to the merge queue on Nov 28, 2023
Merged via the queue into bevyengine:main with commit a902ea6 Nov 28, 2023
22 checks passed
scottmcm deleted the faster-hasher branch on November 28, 2023 18:42
Labels
A-Utils: Utility functions and types
C-Performance: A change motivated by improving speed, memory usage or compile times
S-Ready-For-Final-Review: This PR has been approved by the community. It's ready for a maintainer to consider merging it
6 participants