New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize DDCachingPoolStrategy #983
Conversation
...t/agent-tooling/src/test/groovy/datadog/trace/agent/tooling/EvictingCacheProviderTest.groovy
Outdated
Show resolved
Hide resolved
.expireAfterAccess(expireDuration, unit) | ||
.initialCapacity(100) // Per classloader, so we want a small default. | ||
.softValues() // Allow objects to be managed and removed by the VM as needed. | ||
// https://blog.jessitron.com/2011/10/06/keeping-your-cache-down-to-size/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the downside of this solution might be that reported used heap size will not go down unless JVM is under memory pressure - which may be confusing from monitoring perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may well be the current behavior that triggered this PR. We were debugging why a test application I was running has an additional 160MB in old gen when the agent is installed.
Actually, I guess another problem was that the Cleaner didn't appear to be running for the largest cache (based on breakpoints), and so the cache never evicted entries.
@@ -20,4 +22,18 @@ public static void awaitGC(final WeakReference<?> ref) throws InterruptedExcepti | |||
System.runFinalization(); | |||
} | |||
} | |||
|
|||
public static void forceSoftReferenceCollection() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may potentially have an undesired effect on the whole CI container if multiple parallel workers do this at the same time since we usually have vastly overprovisioned heap size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is another reason I was suggesting we remove that test.
I took a deeper look at the heap dump from yesterday. This is the view produced by selecting "Path to GC Objects" with all references in Eclipse MAT. There are really only 2 paths here, one from the So, technically the whole cache can be GC'ed, but because the cached values made it to old gen, the JVM isn't GCing the weak references (my reading suggested that it only does so on a Full GC). Given @mar-kolya's comment, we may run into similar issues using The existing strategy of using the cleaner task does seems sound, but we'd need to track down whether:
|
I can reproduce the large used heap (without this PR) locally by setting a large -Xmx and -Xms. I also observe that forcing a full GC causes the heap size to drop. The caches left over after gc are very small |
@mightyguava Maybe the problem you were seeing is that the cleaner was only being executed once per minute and it was being drowned out by all the other tiny caches. I my testing confirmed that the cache was getting cleaned out with the previous system. My theory is that some of the unreclaimable overhead was caused by the large default cache sizes, which is why I lowered that. It sounds like there are other problems associated with using soft references for caches, specifically around causing more work during stop-the-world collections. For that reason, I think I'm going to amend this PR to go back to using the cleaner with max size and age, but keep the other optimization. |
Ok, it sounds like there aren't any major problems without this PR and I'm not really blocked continuing work on enabling tracing. The heap usage would be fixed by a stop the world GC. That's great!
That's unlikely the case. I had the same thought yesterday, so I added a condition of |
Reduce default cache size in EvictingCacheProvider, and return a NoOp CacheProvider when classloader is going to be skipped.
ca594b3
to
988b03b
Compare
Ok, I've updated this branch with only the optimizations and reverted the change to use soft references. @mightyguava, thanks for digging into this. Let me know if you see problems persisting. |
Reduce default cache size in EvictingCacheProvider, and return a NoOp CacheProvider when classloader is going to be skipped.