
improve hash calculation cache pool #1246

Merged
merged 10 commits into master on Feb 26, 2020

Conversation

mar-kolya
Contributor

No description provided.

Using a potentially very large number for a mod is probably not very effective
Seems like this may have odd side effects
@mar-kolya mar-kolya requested a review from a team as a code owner February 21, 2020 17:29
@mar-kolya
Contributor Author

Looks like this shaves ~100 ms off load time on the test app.

@@ -140,7 +140,7 @@ final long approximateSize() {
this.loaderRef = loaderRef;
this.className = className;

- hashCode = (int) (31 * this.loaderHash) ^ className.hashCode();
+ hashCode = 31 * this.loaderHash + className.hashCode();
Contributor

Nice -- did these two changes reduce the collision rate?
Mostly just curious

Contributor Author

I didn't check, but I did notice that before, equals fell through to string comparison way too often.

Contributor

Yes, that might have more to do with the loader hash. I think the change of the bootstrap hash is good, but I'm not so sure about the other parts.

Here, I was using a variation on FNV: https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function

I suspect this can be improved, but I'm not so sure that switching from ^ to + is a good idea.
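For reference, a minimal 32-bit FNV-1a over a string's bytes might look like the sketch below. The parameters come from the Wikipedia page linked above; this is illustrative only, not the implementation that was in this code:

```java
import java.nio.charset.StandardCharsets;

// Illustrative 32-bit FNV-1a: XOR in each byte, then multiply by the FNV
// prime, starting from the offset basis. Parameters are the standard 32-bit
// FNV constants; this is not the project's actual hash code.
public final class Fnv1a {
  private static final int OFFSET_BASIS = 0x811c9dc5;
  private static final int PRIME = 0x01000193;

  static int hash(byte[] data) {
    int h = OFFSET_BASIS;
    for (byte b : data) {
      h ^= (b & 0xff); // mix in one byte
      h *= PRIME;      // multiply by the FNV prime (overflow is intentional)
    }
    return h;
  }

  public static void main(String[] args) {
    byte[] name = "java.lang.String".getBytes(StandardCharsets.UTF_8);
    System.out.printf("fnv1a = 0x%08x%n", hash(name));
  }
}
```

The byte-at-a-time multiply-then-XOR loop is what gives FNV its mixing within values, in contrast to a single `31 * h + x` or `h ^ x` step over whole ints.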

Contributor Author

I think + is used by java.util.Arrays, and 'promoted' by things like Guava (java.util.Arrays has a slightly different formula, but in our case I think it will produce only a constant difference).

As far as I can see, FNV performs all calculations on bytes and then 'collects' the result into an int, ensuring mixing within values. I'm not sure how different this is from the current implementation in this code. FNV also uses a 'carefully selected' prime and offset.

FWIW, in my quick test on 1000 iterations of hash(r.nextInt() % 10, r.nextInt() % 100), the new implementation routinely produces better results (fewer collisions) than the old one.
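A rough version of that quick experiment could look like the following. This is a hypothetical harness, not code from the PR; the two formulas match the old and new lines in the diff above, and the seed and ranges are arbitrary:

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical harness comparing collision counts of the old (XOR) and new
// (additive) hash formulas over random (loaderHash, nameHash) pairs drawn
// from small ranges, mimicking few distinct loaders and class-name hashes.
public class CollisionCheck {
  static int oldHash(int loaderHash, int nameHash) {
    return (31 * loaderHash) ^ nameHash; // old formula from the diff
  }

  static int newHash(int loaderHash, int nameHash) {
    return 31 * loaderHash + nameHash;   // new formula from the diff
  }

  public static void main(String[] args) {
    Random r = new Random(42);
    Set<Integer> oldSeen = new HashSet<>();
    Set<Integer> newSeen = new HashSet<>();
    int oldCollisions = 0;
    int newCollisions = 0;
    for (int i = 0; i < 1000; i++) {
      int loader = r.nextInt() % 10; // small number of distinct loader hashes
      int name = r.nextInt() % 100;  // limited name-hash range
      if (!oldSeen.add(oldHash(loader, name))) oldCollisions++;
      if (!newSeen.add(newHash(loader, name))) newCollisions++;
    }
    System.out.println("old collisions: " + oldCollisions
        + ", new collisions: " + newCollisions);
  }
}
```

Note that duplicate input pairs count as collisions for both formulas, so only the relative difference between the two counts is meaningful.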


TypeCacheKey that = (TypeCacheKey) obj;
if (this == obj) {
Contributor

Does this case get exercised? I believe the outer layer always creates a new TypeCacheKey, so this probably doesn't get exercised much.

I did try a separate branch where I used ImmutableTypeCacheKeys for put and used a thread local MutableTypeCacheKey for look-up. That did slightly reduce allocation but the allocation from evicting and then rematerializing still dominated and there was no measurable reduction in GCs or impact on start-up.

Contributor Author

I do not think I have direct data on this... would you like me to remove this?

Contributor

Yes, I don't think it is helping -- probably hurting slightly.

Contributor Author

I think it would eventually be optimized away by the JVM, but I've removed it anyway.

@mar-kolya mar-kolya requested a review from a team February 22, 2020 02:17

if (loaderHash != that.loaderHash) return false;
if (hashCode != that.hashCode) {
Contributor

I was mostly assuming that hashCode comparisons had already been done.
The idea behind the loaderHash comparison was that it provides a fast exit for some hash collisions.

return false;
}

if (className.equals(that.className)) {
Contributor

Yes, this part is debatable but still quite deliberate.
The loaderRef reference equivalence check was placed before the className.equals check because it is faster.

The order is reversed for the slow path because I want to avoid calling Reference.get whenever possible.
The reason is that concurrent GCs will "strengthen" the reference on a get call.
https://github.com/real-logic/agrona/blob/master/agrona/src/main/java/org/agrona/References.java has a nice explanation.

Contributor Author

I think performance-wise this check is equivalent.
It was:

  • equivalence check on the references
  • immediately after, equals on the class name in both branches

Now:

  • equals on the class name first
  • equivalence check only if the first passes

So if anything we save an equivalence check in some cases. The strings were compared anyway.
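The ordering being discussed can be sketched as below. Field names follow the diff snippets above, but this is an illustration of the comparison order (including the loaderHash fast exit that was put back), not the merged implementation:

```java
import java.lang.ref.WeakReference;

// Illustrative TypeCacheKey showing the equals() ordering from the thread:
// cheap int comparison first, then the String compare, and WeakReference.get()
// only as a last resort (deferred because concurrent GCs may "strengthen" a
// reference on get()). Not the merged code.
public final class TypeCacheKey {
  final WeakReference<ClassLoader> loaderRef;
  final String className;
  final int loaderHash;
  final int hashCode;

  TypeCacheKey(WeakReference<ClassLoader> loaderRef, String className, int loaderHash) {
    this.loaderRef = loaderRef;
    this.className = className;
    this.loaderHash = loaderHash;
    this.hashCode = 31 * loaderHash + className.hashCode(); // new formula
  }

  @Override
  public int hashCode() {
    return hashCode;
  }

  @Override
  public boolean equals(Object obj) {
    if (!(obj instanceof TypeCacheKey)) return false;
    TypeCacheKey that = (TypeCacheKey) obj;
    if (loaderHash != that.loaderHash) return false;      // fast exit on loader mismatch
    if (!className.equals(that.className)) return false;  // string compare before touching refs
    if (loaderRef == that.loaderRef) return true;         // cheap reference identity check
    return loaderRef.get() == that.loaderRef.get();       // Reference.get() only as last resort
  }

  public static void main(String[] args) {
    WeakReference<ClassLoader> ref = new WeakReference<>(ClassLoader.getSystemClassLoader());
    TypeCacheKey a = new TypeCacheKey(ref, "Foo", 7);
    TypeCacheKey b = new TypeCacheKey(ref, "Foo", 7);
    System.out.println(a.equals(b)); // same loader ref and class name
  }
}
```

Deferring `Reference.get()` to the final step keeps the common mismatch paths free of any interaction with the garbage collector.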

Contributor

@dougqh dougqh Feb 25, 2020

Yes, that's a fair point -- and if we put back the loaderHash check that covers a fast exit for loader mismatch.

Contributor Author

I've put that back

@mar-kolya mar-kolya force-pushed the mar-kolya/improve-hash-calclulation-cache-pool branch from 57ba9df to 82dd2aa on February 25, 2020 20:48
Contributor

@dougqh dougqh left a comment

Okay, the change looks good to me as is. I'm assuming we're still seeing the 0.1 s start-up improvement with Spring Boot.

@mar-kolya mar-kolya merged commit 338f517 into master Feb 26, 2020
@mar-kolya mar-kolya deleted the mar-kolya/improve-hash-calclulation-cache-pool branch February 26, 2020 15:22
@randomanderson randomanderson added this to the 0.44.0 milestone Feb 26, 2020