
Optimize DDCachingPoolStrategy #983

Merged

tylerbenson merged 1 commit into master from tyler/cache-soft-refs on Sep 11, 2019

Conversation

@tylerbenson (Contributor) commented Sep 10, 2019

Reduce the default cache size in EvictingCacheProvider, and return a NoOp CacheProvider when the classloader is going to be skipped.
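
For context, here is a minimal sketch of the second change, assuming Byte Buddy's TypePool.CacheProvider API; shouldSkipClassLoader is a hypothetical stand-in for the agent's actual classloader-skipping check, and EvictingCacheProvider is represented here by Byte Buddy's Simple provider:

import net.bytebuddy.pool.TypePool;

// Sketch only: skipped classloaders get a provider that caches nothing,
// so no per-classloader cache memory is ever allocated for them.
final class CacheProviderSketch {

  static TypePool.CacheProvider createCacheProvider(final ClassLoader classLoader) {
    if (shouldSkipClassLoader(classLoader)) {
      // Byte Buddy's built-in no-op implementation.
      return TypePool.CacheProvider.NoOp.INSTANCE;
    }
    // Stand-in for the agent's EvictingCacheProvider.
    return new TypePool.CacheProvider.Simple();
  }

  // Hypothetical placeholder for the real skip decision.
  private static boolean shouldSkipClassLoader(final ClassLoader classLoader) {
    return classLoader != null
        && "sun.reflect.DelegatingClassLoader".equals(classLoader.getClass().getName());
  }
}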

@tylerbenson requested a review from a team as a code owner on September 10, 2019 03:17
.expireAfterAccess(expireDuration, unit)
.initialCapacity(100) // Per classloader, so we want a small default.
.softValues() // Allow objects to be managed and removed by the VM as needed.
// https://blog.jessitron.com/2011/10/06/keeping-your-cache-down-to-size/
Contributor:

I think the downside of this solution might be that the reported used heap size will not go down unless the JVM is under memory pressure, which may be confusing from a monitoring perspective.

Reply:

This may well be the current behavior that triggered this PR. We were debugging why a test application I was running had an additional 160MB in old gen when the agent was installed.

Actually, I guess another problem was that the Cleaner didn't appear to be running for the largest cache (based on breakpoints), and so the cache never evicted entries.

@@ -20,4 +22,18 @@ public static void awaitGC(final WeakReference<?> ref) throws InterruptedExcepti
System.runFinalization();
}
}

public static void forceSoftReferenceCollection() {
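
For reference, a helper like this is typically implemented by allocating until the JVM is about to throw OutOfMemoryError, since the JVM must clear all softly reachable objects before doing so. A rough sketch of that pattern, not necessarily the exact body added in this PR:

import java.util.ArrayList;
import java.util.List;

// Sketch of the usual "allocate until OutOfMemoryError" trick; the JVM is
// required to clear all soft references before throwing the error.
final class SoftReferenceTestSketch {
  static void forceSoftReferenceCollection() {
    try {
      final List<byte[]> hog = new ArrayList<>();
      while (true) {
        hog.add(new byte[10 * 1024 * 1024]); // keep allocating 10MB chunks
      }
    } catch (final OutOfMemoryError expected) {
      // All soft references have been cleared by this point.
    }
  }
}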
Contributor:

This may potentially have an undesired effect on the whole CI container if multiple parallel workers do this at the same time, since we usually have a vastly overprovisioned heap size.

Contributor Author (tylerbenson):

Yeah, this is another reason I was suggesting we remove that test.

@mightyguava

I took a deeper look at the heap dump from yesterday. This is the view produced by selecting "Path to GC Objects" with all references in Eclipse MAT. There are really only 2 paths here, one from the typePoolCache and the other from the Cleaner. Both paths are via weak references.

So, technically the whole cache can be GC'ed, but because the cached values made it to old gen, the JVM isn't GCing the weak references (my reading suggested that it only does so on a Full GC).

[Screenshot: Eclipse MAT view showing the two weak-reference paths to the cache]

Given @mar-kolya's comment, we may run into similar issues using softValues() (the heap usage will still be reported as high, even though that space can be reclaimed). Did you find that the JVM released the heap space during local testing?

The existing strategy of using the cleaner task does seem sound (a sketch of the idea follows the list below), but we'd need to track down which of these is happening:

  1. The Cleaner was running (and the breakpoints I set just didn't trigger properly) but the cache entries were not being evicted; if so, why?
  2. The Cleaner was not running for this cache at all; if so, why?
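
For readers unfamiliar with the cleaner-task strategy mentioned above, the sketch below shows the basic idea: Guava caches evict expired entries lazily during reads and writes, so an idle cache never shrinks unless something periodically calls cleanUp(). This is a minimal illustration using a plain ScheduledExecutorService; the agent's shared Cleaner utility may differ in detail:

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: periodically trigger the cache's maintenance pass so that expired
// entries are released even when the cache is never accessed again.
final class CleanerSketch {
  private static final ScheduledExecutorService CLEANER =
      Executors.newSingleThreadScheduledExecutor();

  static Cache<Object, Object> newCleanedCache(final long expireMinutes) {
    final Cache<Object, Object> cache =
        CacheBuilder.newBuilder()
            .expireAfterAccess(expireMinutes, TimeUnit.MINUTES)
            .build();
    // Guava evicts lazily on access; cleanUp() forces pending evictions now.
    CLEANER.scheduleAtFixedRate(cache::cleanUp, 1, 1, TimeUnit.MINUTES);
    return cache;
  }
}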

@mightyguava

(my reading suggested that it only does so on a Full GC).

I can reproduce the large used heap (without this PR) locally by setting a large -Xmx and -Xms. I also observe that forcing a full GC causes the heap size to drop. The caches left over after GC are very small.

@tylerbenson (Contributor Author)

@mightyguava Maybe the problem you were seeing is that the cleaner was only being executed once per minute and was being drowned out by all the other tiny caches. My testing confirmed that the cache was getting cleaned out with the previous system. My theory is that some of the unreclaimable overhead was caused by the large default cache sizes, which is why I lowered them.

It sounds like there are other problems associated with using soft references for caches, specifically around causing more work during stop-the-world collections. For that reason, I think I'm going to amend this PR to go back to using the cleaner with max size and age, but keep the other optimization.
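
Roughly what "the cleaner with max size and age" amounts to in Guava builder terms; the sizes and timeout below are illustrative only, since the agent's actual limits are not stated in this thread:

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;

// Sketch: bound the cache by entry count and idle age instead of soft values,
// so evictions free heap without waiting for memory pressure.
final class BoundedCacheSketch {
  static Cache<String, Object> newBoundedCache() {
    return CacheBuilder.newBuilder()
        .initialCapacity(100)                   // small per-classloader default (from this PR)
        .maximumSize(10_000)                    // illustrative hard cap
        .expireAfterAccess(1, TimeUnit.MINUTES) // illustrative idle timeout
        .build();
  }
}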

@mightyguava

Ok, it sounds like there aren't any major problems without this PR, and I'm not really blocked from continuing work on enabling tracing. The heap usage would be fixed by a stop-the-world GC. That's great!

Maybe the problem you were seeing is that the cleaner was only being executed once per minute and it was being drowned out by all the other tiny caches.

That's unlikely to be the case. I had the same thought yesterday, so I added a condition of target.size() > 1 to my breakpoint. The breakpoint doesn't trigger with that condition.

Commit: Reduce default cache size in EvictingCacheProvider, and return a NoOp CacheProvider when the classloader is going to be skipped.
@tylerbenson changed the title from "Use soft reference for cache instead of max size" to "Optimize DDCachingPoolStrategy" on Sep 10, 2019
@tylerbenson (Contributor Author) commented Sep 10, 2019

Ok, I've updated this branch with only the optimizations and reverted the change to use soft references.

@mightyguava, thanks for digging into this. Let me know if you see problems persisting.

@tylerbenson merged commit e895d7c into master on Sep 11, 2019
@tylerbenson deleted the tyler/cache-soft-refs branch on September 11, 2019 16:42
@tylerbenson added this to the 0.33.0 milestone on Sep 11, 2019