Optimisations for GarbageCleaner #31
Comments
By Jason Koch on Jan 26, 2021 12:22 Created attachment 285390: 1_simplify_objectmarker_marking_code.patch
By Jason Koch on Jan 26, 2021 12:23 Created attachment 285391: 2_getPage_perform_read_outside_sync_block.patch
By Jason Koch on Jan 26, 2021 12:23 Created attachment 285392: 3_remove_unneeded_sync_in_ArrayIntCompressed.patch
By Jason Koch on Jan 26, 2021 12:23 Created […]
By Jason Koch on Jan 26, 2021 12:23 Created attachment 285394: 5_use_fj_for_GarbageCleaner.patch
By Jason Koch on Jan 26, 2021 12:23 Created attachment 285395: 6_patch_feedback.patch
By Jason Koch on Jan 26, 2021 15:21 I've performed some before/after comparisons of this patchset, and on the hprof files I have, the GC phase uses less memory than Phase 1/Phase 2. In practice this means that if there is enough heap space allocated to complete Phase 2, then parsing will complete through to the end; if there is not enough heap space to reach the end of Phase 2, then we never get to the point of running GC at all. In summary, we can be confident that this patchset does not degrade the heap behaviour of MAT parsing, and it also gives some clues about future directions for potential optimization.
By Jason Koch on Mar 25, 2021 11:03 Hi Andrew / Krum - Were you able to check over this and confirm any feedback?
By Jason Koch on Mar 30, 2021 23:42 I just realised none of these patches apply cleanly; I tried to break it up into smaller patches and made a mess of it. I'll clean this all up and raise a separate ticket.
By Andrew Johnson on Mar 31, 2021 05:11 I'm looking at the patches. I think the first one worked for me, but if you want to keep this bug then you can mark old patches as superseded and add new ones.
By Jason Koch on Apr 01, 2021 00:20 Created attachment 286012: 2b_fix_ensure_readdirect_multithreaded.patch (fixes issues with patch 2)
By Jason Koch on Apr 01, 2021 00:23 Created attachment 286013: 4_introduce_threadlocal_cache_for_indexwriter_getpage.patch
By Jason Koch on Apr 01, 2021 00:26 Created attachment 286014: 7_fix_handle_garbagecleaner_overflow_on_large_dumps.patch
By Jason Koch on Apr 01, 2021 00:28 I have amended patch 2 with a 2b, replaced patch 4, and added a patch 7. I believe these should now apply cleanly in order and pass tests properly: 1, 2+2b, 3, 4, 5, 6, 7.
By Andrew Johnson on Apr 26, 2021 02:50 I've applied the patches to my workspace and they run. It's not a large set of patches (<400 added lines), so it isn't a 'large' contribution and no CQ is needed. The fork/join object marker is simpler than the old way, but the old way had some performance tuning […], so one question is how well the new marking works with limited memory (insufficient to hold most of the […]). A minor improvement for fork/join might be to split very large object arrays - for example if there is […]. I think the IndexWriter getPage SoftReference also has the possible problem where the SoftReference is cleared between two gets. To get this in for Eclipse 2021-06 we would need to finish a few days before 2021-06 M3+3 on Wednesday, May 26, 2021.
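The SoftReference hazard Andrew mentions is a classic pattern bug: calling `get()` twice on the same `SoftReference` can observe a value on the first call and `null` on the second, because the GC may clear the referent in between. A minimal sketch of the safe single-`get()` pattern follows; the class and method names here are hypothetical stand-ins, not MAT's actual `IndexWriter` API:

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a softly-referenced page cache.
public class PageCache {
    private final ConcurrentHashMap<Integer, SoftReference<int[]>> pages =
            new ConcurrentHashMap<>();

    // Stand-in for reading a page from the index file on disk.
    private int[] loadPage(int pageId) {
        return new int[] { pageId };
    }

    // Unsafe pattern (racy): calling ref.get() twice lets the GC clear the
    // SoftReference between the null check and the use:
    //     if (ref.get() != null) return ref.get();  // second get() may be null
    //
    // Safe pattern: call get() once and hold the strong reference from then on.
    public int[] getPage(int pageId) {
        SoftReference<int[]> ref = pages.get(pageId);
        int[] page = (ref == null) ? null : ref.get(); // single get(); strong ref held
        if (page == null) {
            page = loadPage(pageId);                   // reload outside any lock
            pages.put(pageId, new SoftReference<>(page));
        }
        return page;
    }
}
```

Once the local `page` variable holds the strong reference, the GC can no longer clear it underneath the caller, so the method always returns a non-null page.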
By Jason Koch on May 06, 2021 22:55 Thanks for the feedback. Quick update: I have some ideas. Certainly we can defer / lazy-load the array for FjObjectMarker into the next compute phase; I have a draft of that working locally, and it definitely reduces heap footprint during marking.

I am still thinking about the locality question, and it is a good point. There are a few architectures I have in mind that might give superb or terrible throughput; I have yet to really test any of them. On the one hand, locality is beneficial for cache purposes. In practice, however, looking at a heap I have here: 2 billion objects, an outbound index 20 GB in size, and current CPUs with O(10s of MB) of L3 cache, so the chance of an object ID already being in cache is quite small; this would favour simpler code paths to help the CPU. In addition, monitoring jmap -histo on a heap of this size indicates that at peak only <300,000 FjObjectMarkers are enqueued (after the lazy load mentioned above), which is only a few MB of data. But there are certainly areas where locality is likely to be more significant, and I'd need to measure in more detail to be certain. I think the only way to be sure is to build a few alternative implementations and generate some more ideas on what would be a better solution. Ideas bubbling around that may or may not work: […]

I want to take some measurements to find out why exactly the code is stalling, too; simple perf profiling suggests the majority of the time is spent around the outbound object get, so perhaps we can optimise this by batching reads to the IO layer. Much to think about.
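The deferred-load idea described above can be sketched as a fork/join marker whose tasks carry only a single object id, fetching the outbound reference array lazily inside `compute()` rather than enqueuing whole `int[]` arrays. This is a simplified illustration under assumed names (`FjMark`, `MarkTask`, an in-memory `outbound` table standing in for the on-disk index); it is not MAT's actual FjObjectMarker code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch: each task holds one int (the object id); the outbound
// reference array is only loaded when the task actually runs.
public class FjMark {
    static int[][] outbound;           // outbound[objectId] -> referenced object ids
    static AtomicIntegerArray marked;  // 0 = unmarked, 1 = marked

    static class MarkTask extends RecursiveAction {
        final int objectId;            // queue entry is a single int, not an int[]
        MarkTask(int objectId) { this.objectId = objectId; }

        @Override
        protected void compute() {
            int[] refs = outbound[objectId];   // lazy load happens here
            List<MarkTask> forked = new ArrayList<>();
            for (int ref : refs) {
                // compareAndSet ensures each object is marked (and enqueued) once
                if (marked.compareAndSet(ref, 0, 1)) {
                    MarkTask t = new MarkTask(ref);
                    t.fork();
                    forked.add(t);
                }
            }
            for (MarkTask t : forked) t.join(); // wait so invoke() sees completion
        }
    }

    // Marks everything reachable from root; returns the number of marked objects.
    static int mark(int[][] graph, int root) {
        outbound = graph;
        marked = new AtomicIntegerArray(graph.length);
        marked.set(root, 1);
        ForkJoinPool.commonPool().invoke(new MarkTask(root));
        int count = 0;
        for (int i = 0; i < graph.length; i++) count += marked.get(i);
        return count;
    }
}
```

Because each pending task costs a single `int` plus the task object itself, the peak queue footprint stays small even when hundreds of thousands of markers are enqueued, which matches the jmap observation above.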
By Andrew Johnson on May 08, 2021 04:13 The locality I was considering was also reader caching for the outbound references. IntIndex1NReader […] I suppose now we have CompressedRandomAccessFile in hprof we could gzip compress the index files, but it would be slower, and most of the space saving is from compressing the hprof itself. (A table of file sizes with columns "file name / uncompressed / compressed / saving / compression / chunked compressed" was truncated here; the only surviving entry is java_pid32284.0001.hprof.gz at 1555306556 bytes.)
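The trade-off Andrew describes for gzip-compressing the index files can be illustrated with a minimal round-trip sketch: compression shrinks the on-disk footprint, but every read pays a decompression cost. This is a generic illustration using `java.util.zip`, not MAT's CompressedRandomAccessFile:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of the gzip trade-off for index-like data.
public class GzipIndex {
    // Compress a raw buffer; smaller on disk, but reads must decompress.
    public static byte[] compress(byte[] raw) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(raw);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Decompress back to the original bytes.
    public static byte[] decompress(byte[] packed) {
        try (GZIPInputStream gz =
                new GZIPInputStream(new ByteArrayInputStream(packed))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

For random-access index reads the cost is worse than this sequential sketch suggests, since gzip streams cannot seek; a chunked scheme (as in MAT's chunked-compressed hprof support) is needed to decompress only the chunk containing the requested page.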
By Jason Koch on May 11, 2021 09:01 Created […]
By Jason Koch on May 11, 2021 09:02 I've added a draft patch that swaps the int[] enqueue to an int enqueue, which reduces the memory footprint during marking.
By Jason Koch on Jun 02, 2021 18:40 Created attachment 286512: file_570670.txt. Addresses comments (via email?) regarding enqueue of entire arrays instead of only pointers.
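The footprint difference behind the int[]-to-int swap comes from per-entry overhead: a queue of `int[]` entries pays an object header and a reference for every pending element, while a growable primitive int buffer stores each pending object id in 4 bytes. A minimal sketch of such a primitive work stack (hypothetical names, not MAT's queue implementation):

```java
import java.util.Arrays;

// Hypothetical growable primitive int stack: 4 bytes per pending id, no
// per-entry object header or reference, unlike a Deque<int[]>.
public class IntStack {
    private int[] data = new int[16];
    private int size;

    public void push(int v) {
        if (size == data.length) data = Arrays.copyOf(data, size * 2); // amortized growth
        data[size++] = v;
    }

    public int pop() {
        return data[--size];
    }

    public boolean isEmpty() {
        return size == 0;
    }
}
```

At the ~300,000 peak pending markers mentioned earlier, this layout keeps the queue itself to roughly a megabyte rather than tens of megabytes of small arrays and references.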
By Yi Yang on Jun 08, 2023 03:17
These two patches reduce the synchronization overhead of IntIndexReader. I think they are correct in principle, but I have not seen any significant performance improvement.
This looks good to me
These two patches reduced the synchronization overhead of IntIndexCollector.set, but did not completely eliminate it. Considering this, do you think that if a completely lock-free ArrayIntUncompressed were provided, we could eliminate the synchronization overhead of IntIndexCollector.set/get and remove the need for these patches altogether?
By Jason Koch on Jan 19, 2024 21:38 The only change still hanging around in here is the GarbageCleaner patch(es). I'll retest these and see whether it's worth another PR.
| Field | Value |
| --- | --- |
| Bugzilla Link | 570670 |
| Status | ASSIGNED |
| Importance | P3 minor |
| Reported | Jan 26, 2021 11:54 EDT |
| Modified | Jan 19, 2024 21:38 EDT |
| Version | 1.11 |
| Reporter | Jason Koch |
Description
I have a series of patches that improve performance in the GarbageCleaner codebase. After a brief discussion on the mailing list, I propose these patches.
All existing test cases pass. The result is a 4x speedup on a 16-core machine and significantly lower overall ('real') CPU usage when parsing a large (70+ GB) hprof file with ~1 billion objects.