
improve hash calculation cache pool #1246

Merged
merged 10 commits into master on Feb 26, 2020

Conversation

mar-kolya
Contributor

No description provided.

Using a potentially very large number for a mod is probably not very effective
Seems like this may have odd side effects
@mar-kolya mar-kolya requested a review from a team as a code owner February 21, 2020 17:29
@mar-kolya
Contributor Author

Looks like this shaves ~100 ms off load time on the test app.

@@ -140,7 +140,7 @@ final long approximateSize() {
this.loaderRef = loaderRef;
this.className = className;

- hashCode = (int) (31 * this.loaderHash) ^ className.hashCode();
+ hashCode = 31 * this.loaderHash + className.hashCode();
Contributor

Nice -- did these two changes reduce the collision rate?
Mostly just curious

Contributor Author

I didn't check, but I did notice that before, equals fell through to string comparison way too often.

Contributor

Yes, that might have more to do with the loader hash. I think the change of the bootstrap hash is good, but I'm not so sure about the other parts.

Here, I was using a variation on FNV: https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function

I suspect this can be improved, but I'm not so sure that switching from ^ to + is a good idea.
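For reference, a minimal 32-bit FNV-1a over a string's bytes might look like the sketch below. The parameters come from the Wikipedia page linked above; this is illustrative only, not the implementation that was in this code:

```java
import java.nio.charset.StandardCharsets;

// Illustrative 32-bit FNV-1a: XOR in each byte, then multiply by the FNV
// prime, starting from the offset basis. Parameters are the standard 32-bit
// FNV constants; this is not the project's actual hash code.
public final class Fnv1a {
  private static final int OFFSET_BASIS = 0x811c9dc5;
  private static final int PRIME = 0x01000193;

  static int hash(byte[] data) {
    int h = OFFSET_BASIS;
    for (byte b : data) {
      h ^= (b & 0xff); // mix in one byte
      h *= PRIME;      // multiply by the FNV prime (overflow is intentional)
    }
    return h;
  }

  public static void main(String[] args) {
    byte[] name = "java.lang.String".getBytes(StandardCharsets.UTF_8);
    System.out.printf("fnv1a = 0x%08x%n", hash(name));
  }
}
```

The byte-at-a-time multiply-then-XOR loop is what gives FNV its mixing within values, in contrast to a single `31 * h + x` or `h ^ x` step over whole ints.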

Contributor Author

I think + is used by java.util.Arrays, and 'promoted' by things like Guava (java.util.Arrays has a slightly different formula, but in our case I think it will produce only a constant difference).

As far as I can see, FNV performs all calculations on bytes and then 'collects' the result into an int, ensuring mixing within values. I'm not sure how different this is from the current implementation in this code. FNV also uses a 'carefully selected' prime and offset.

FWIW, in my quick test on 1000 iterations of hash(r.nextInt() % 10, r.nextInt() % 100), the new implementation routinely produces better results (fewer collisions) than the old one.
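A rough version of that quick experiment could look like the following. This is a hypothetical harness, not code from the PR; the two formulas match the old and new lines in the diff above, and the seed and ranges are arbitrary:

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical harness comparing collision counts of the old (XOR) and new
// (additive) hash formulas over random (loaderHash, nameHash) pairs drawn
// from small ranges, mimicking few distinct loaders and class-name hashes.
public class CollisionCheck {
  static int oldHash(int loaderHash, int nameHash) {
    return (31 * loaderHash) ^ nameHash; // old formula from the diff
  }

  static int newHash(int loaderHash, int nameHash) {
    return 31 * loaderHash + nameHash;   // new formula from the diff
  }

  public static void main(String[] args) {
    Random r = new Random(42);
    Set<Integer> oldSeen = new HashSet<>();
    Set<Integer> newSeen = new HashSet<>();
    int oldCollisions = 0;
    int newCollisions = 0;
    for (int i = 0; i < 1000; i++) {
      int loader = r.nextInt() % 10; // small number of distinct loader hashes
      int name = r.nextInt() % 100;  // limited name-hash range
      if (!oldSeen.add(oldHash(loader, name))) oldCollisions++;
      if (!newSeen.add(newHash(loader, name))) newCollisions++;
    }
    System.out.println("old collisions: " + oldCollisions
        + ", new collisions: " + newCollisions);
  }
}
```

Note that duplicate input pairs count as collisions for both formulas, so only the relative difference between the two counts is meaningful.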


TypeCacheKey that = (TypeCacheKey) obj;
if (this == obj) {
Contributor

Does this case get exercised? I believe the outer layer always creates a new TypeCacheKey, so this probably doesn't get exercised much.

I did try a separate branch where I used ImmutableTypeCacheKeys for put and used a thread local MutableTypeCacheKey for look-up. That did slightly reduce allocation but the allocation from evicting and then rematerializing still dominated and there was no measurable reduction in GCs or impact on start-up.

Contributor Author

I do not think I have direct data on this... would you like me to remove this?

Contributor

Yes, I don't think it is helping -- probably hurting slightly.

Contributor Author

I think it would eventually be optimized away by the JVM, but I've removed it anyway.

@mar-kolya mar-kolya requested a review from a team February 22, 2020 02:17

if (loaderHash != that.loaderHash) return false;
if (hashCode != that.hashCode) {
Contributor

I was mostly assuming that hashCode comparisons had already been done.
The idea behind the loaderHash comparison was that it provides a fast exit for some hash collisions.

return false;
}

if (className.equals(that.className)) {
Contributor

Yes, this part is debatable but still quite deliberate.
The loaderRef reference equivalence check was placed before the className.equals check because it is faster.

The order is reversed for the slow path because I want to avoid calling Reference.get whenever possible.
The reason is that concurrent GCs will "strengthen" the reference on a get call.
https://github.com/real-logic/agrona/blob/master/agrona/src/main/java/org/agrona/References.java has a nice explanation.

Contributor Author

I think performance-wise this check is equivalent.
It was:

  • equivalence check on the references
  • immediately after, equals on the class name in both branches

Now:

  • equals on the class name first
  • equivalence check only if the first passes

So if anything we save an equivalence check in some cases. The strings were compared anyway.
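The ordering being discussed can be sketched as below. Field names follow the diff snippets above, but this is an illustration of the comparison order (including the loaderHash fast exit that was put back), not the merged implementation:

```java
import java.lang.ref.WeakReference;

// Illustrative TypeCacheKey showing the equals() ordering from the thread:
// cheap int comparison first, then the String compare, and WeakReference.get()
// only as a last resort (deferred because concurrent GCs may "strengthen" a
// reference on get()). Not the merged code.
public final class TypeCacheKey {
  final WeakReference<ClassLoader> loaderRef;
  final String className;
  final int loaderHash;
  final int hashCode;

  TypeCacheKey(WeakReference<ClassLoader> loaderRef, String className, int loaderHash) {
    this.loaderRef = loaderRef;
    this.className = className;
    this.loaderHash = loaderHash;
    this.hashCode = 31 * loaderHash + className.hashCode(); // new formula
  }

  @Override
  public int hashCode() {
    return hashCode;
  }

  @Override
  public boolean equals(Object obj) {
    if (!(obj instanceof TypeCacheKey)) return false;
    TypeCacheKey that = (TypeCacheKey) obj;
    if (loaderHash != that.loaderHash) return false;      // fast exit on loader mismatch
    if (!className.equals(that.className)) return false;  // string compare before touching refs
    if (loaderRef == that.loaderRef) return true;         // cheap reference identity check
    return loaderRef.get() == that.loaderRef.get();       // Reference.get() only as last resort
  }

  public static void main(String[] args) {
    WeakReference<ClassLoader> ref = new WeakReference<>(ClassLoader.getSystemClassLoader());
    TypeCacheKey a = new TypeCacheKey(ref, "Foo", 7);
    TypeCacheKey b = new TypeCacheKey(ref, "Foo", 7);
    System.out.println(a.equals(b)); // same loader ref and class name
  }
}
```

Deferring `Reference.get()` to the final step keeps the common mismatch paths free of any interaction with the garbage collector.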

Contributor

@dougqh dougqh Feb 25, 2020

Yes, that's a fair point -- and if we put back the loaderHash check that covers a fast exit for loader mismatch.

Contributor Author

I've put that back

@mar-kolya mar-kolya force-pushed the mar-kolya/improve-hash-calclulation-cache-pool branch from 57ba9df to 82dd2aa on February 25, 2020 20:48
Contributor

@dougqh dougqh left a comment

Okay, the change looks good to me as is. I'm assuming we're still seeing the 0.1 s start-up improvement with Spring Boot.

@mar-kolya mar-kolya merged commit 338f517 into master Feb 26, 2020
@mar-kolya mar-kolya deleted the mar-kolya/improve-hash-calclulation-cache-pool branch February 26, 2020 15:22
@randomanderson randomanderson added this to the 0.44.0 milestone Feb 26, 2020