fix: reuse Inflater/Deflater in BitCaskDiskMap to avoid JDK8 finalizer contention#18818
fix: reuse Inflater/Deflater in BitCaskDiskMap to avoid JDK8 finalizer contention#18818vamsikarnika wants to merge 4 commits into
Conversation
…finalizer contention
CompressionHandler currently allocates a new Deflater on every compressBytes()
and a new Inflater on every decompressBytes(). On JDK 8 both classes register
a Finalizer on construction. Under sustained, multi-threaded disk-map traffic
(observed during MDT/RLI compaction merging millions of records across several
Spark task threads on the same executor), the rate of zlib allocations exceeds
the rate at which the single Finalizer thread can drain its queue. Native
ZStreamRef handles pile up in old gen, the heap saturates, and G1 enters a
mixed-GC death spiral while application threads make no progress.
CompressionHandler is already held in a ThreadLocal, so a single Deflater/
Inflater pair per worker thread is sufficient. This change:
- Adds transient Deflater/Inflater fields and lazy accessors (transient so
the class remains Serializable; lazy so deserialized instances rebuild
the codecs on first use).
- Calls reset() on the cached codecs at the start of each call.
- Passes the user-supplied codecs to DeflaterOutputStream(out, def) and
InflaterInputStream(in, inf), which sets usesDefaultDeflater/Inflater
to false so close() does not call end() on the codec — the codec
survives the try-with-resources for reuse on the next call.
On-disk format, compression level, and error semantics are unchanged.
Allocation rate drops from O(records) to O(threads). On JDK 9+ this also
removes per-call Cleaner registration overhead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hudi-agent
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This PR caches a Deflater/Inflater pair per worker thread in BitCaskDiskMap.CompressionHandler to avoid per-call zlib allocations and the JDK 8 finalizer pile-up they cause under sustained disk-map traffic. The reuse pattern correctly relies on the documented usesDefaultDeflater/Inflater=false behavior when passing user-supplied codecs to DeflaterOutputStream/InflaterInputStream, calls reset() at the start of each call, and uses transient fields with lazy accessors so deserialized instances rebuild the codecs. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. One minor naming nit; otherwise the change is clean and the comments are appropriately non-obvious.
cc @yihua
|
|
||
| // Lazy accessors so the handler stays usable after Java deserialization | ||
| // (transient fields come back null). | ||
| private Deflater deflater() { |
There was a problem hiding this comment.
🤖 nit: having deflater() / inflater() share names with their backing fields forces the callers to reach for def and inf as local variable names (line 445 / 458) — and def is a keyword in Groovy, Kotlin, and Python, which can cause a mental stumble. Could you rename the accessors to getDeflater() / getInflater()? That would let callers write Deflater deflater = getDeflater() cleanly.
- AI-generated; verify before applying. React 👍/👎 to flag quality.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hudi-agent
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This PR caches the Deflater/Inflater in the thread-local CompressionHandler to avoid per-call construction and finalizer-queue contention on JDK 8. The transient/lazy pattern preserves serialization safety, reset() defends against leftover error state, and passing the explicit codecs into DeflaterOutputStream/InflaterInputStream correctly suppresses end() on close() so the codecs survive for reuse. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.
cc @yihua
hudi-agent
left a comment
There was a problem hiding this comment.
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This change caches per-thread Deflater/Inflater instances in CompressionHandler (held via ThreadLocal) instead of allocating fresh ones per call, which addresses JDK 8 finalizer-thread contention under heavy disk-map traffic. The reuse contract (reset() before use, external codec passed to streams so close() won't call end(), transient fields rebuilt lazily after deserialization) looks correct. No issues flagged from this automated pass — a Hudi committer or PMC member can take it from here for a final review.
cc @yihua
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #18818 +/- ##
=============================================
+ Coverage 57.27% 68.91% +11.63%
- Complexity 18650 29092 +10442
=============================================
Files 1956 2509 +553
Lines 107076 139477 +32401
Branches 13255 17116 +3861
=============================================
+ Hits 61329 96117 +34788
+ Misses 39943 35609 -4334
- Partials 5804 7751 +1947
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
CompressionHandler currently allocates a new Deflater on every compressBytes() and a new Inflater on every decompressBytes(). On JDK 8 both classes register a Finalizer on construction. Under sustained, multi-threaded disk-map traffic (observed during MDT/RLI compaction merging millions of records across several Spark task threads on the same executor), the rate of zlib allocations exceeds the rate at which the single Finalizer thread can drain its queue. Native ZStreamRef handles pile up in old gen, the heap saturates, and G1 enters a mixed-GC death spiral while application threads make no progress.
CompressionHandler is already held in a ThreadLocal, so a single Deflater/ Inflater pair per worker thread is sufficient. This change:
On-disk format, compression level, and error semantics are unchanged. Allocation rate drops from O(records) to O(threads). On JDK 9+ this also removes per-call Cleaner registration overhead.
Describe the issue this Pull Request addresses
Summary and Changelog
Impact
Risk Level
Documentation Update
Contributor's checklist