Replace IntObjectHashMap with dense array for field related map to reduce heap usage #16201
Open
HUSTERGS wants to merge 1 commit into
Signed-off-by: gesong.samuel <gesong.samuel@bytedance.com>
Description
This change adds `ReadOnlyDenseIntObjectMap`, a compact read-only representation for `IntObjectHashMap` instances whose keys are non-negative dense integers. It stores values directly in an `Object[]` indexed by key, avoiding the separate `int[]` keys table used by `IntObjectHashMap`.

This is useful for codec metadata maps keyed by `FieldInfo.number`. These maps are built while reading segment metadata and then only queried afterwards.

The new representation is only selected through `maybeWrap(...)` when it removes meaningful table slack. By default, wrapping requires at least 30% fewer value slots. If keys are sparse or negative, or if any value is null, the original `IntObjectHashMap` is kept.

Motivation
Several codec readers keep per-field metadata maps keyed by `FieldInfo.number`. After previous changes from field-name keyed maps to `IntObjectHashMap`, these maps no longer retain field name strings as keys, but they still keep an open-addressed hash table with both an `int[]` keys table and an `Object[]` values table. Other PRs have also tried to reduce the heap usage of these maps, for example #13961, #13327, and #13368.

This patch is motivated by a huge cluster in production. On one node we can have around 20k open segments, and each segment has 400+ fields. Most of these fields are keyword-like fields, so they are both indexed and have doc values.
For 400+ fields, `IntObjectHashMap` typically allocates a 1024-slot table, plus the extra slot used for key `0`, so both arrays have 1025 entries. If field numbers are dense enough that `maxFieldNumber + 1` is around half of the hash table size, the dense read-only representation replaces:

- `int[1025]`
- `Object[1025]`

with approximately:

- `Object[512]`

Assuming compressed object pointers, this is roughly:

- `int[1025]`: ~4.0 KB
- `Object[1025]`: ~4.0 KB
- `Object[512]`: ~2.0 KB

So the saving is about 6 KB per converted map, excluding the referenced values themselves.
In this workload, the main maps affected per segment are:

- `Lucene103BlockTreeTermsReader.fieldMap` for indexed fields
- `PerFieldDocValuesFormat.FieldsReader.fields` for doc-values fields
- `Lucene90DocValuesProducer` metadata map for the dominant doc-values type (for example sorted/sorted-set keyword fields)
This gives a rough estimate of about:

3 maps * 6 KB * 20,000 segments ~= 360,000 KB, or around 350 MB of heap reduction on such a node.
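The arithmetic above can be reproduced with a small back-of-envelope calculation. This is only a sketch: the class `HeapEstimate` and its methods are hypothetical names, and the layout constants (16-byte array header, 4-byte elements, 8-byte alignment, load factor 0.75 with power-of-two table sizes) are assumptions about a 64-bit JVM with compressed object pointers and an HPPC-style hash map, not values read from this patch.

```java
// Back-of-envelope heap estimate. All constants below are assumptions.
final class HeapEstimate {
  static final int ARRAY_HEADER_BYTES = 16; // assumed array header size with compressed oops
  static final int ELEMENT_BYTES = 4;       // an int, or a compressed object reference
  static final int ALIGNMENT = 8;           // assumed object alignment

  /** Approximate shallow size of an int[] or Object[] of the given length. */
  static long arrayBytes(int length) {
    long raw = ARRAY_HEADER_BYTES + (long) length * ELEMENT_BYTES;
    return (raw + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
  }

  /** Smallest power-of-two table that holds `entries` at the given load factor. */
  static int tableSlots(int entries, double loadFactor) {
    int needed = (int) Math.ceil(entries / loadFactor);
    return Integer.highestOneBit(Math.max(needed, 2) - 1) << 1;
  }

  /** Bytes saved by replacing keys + values tables with one dense Object[]. */
  static long savedBytesPerMap(int entries, int maxFieldNumber) {
    int slots = tableSlots(entries, 0.75) + 1; // +1 for the extra slot used for key 0
    long hashBytes = 2 * arrayBytes(slots);    // int[] keys + Object[] values
    long denseBytes = arrayBytes(maxFieldNumber + 1);
    return hashBytes - denseBytes;
  }

  public static void main(String[] args) {
    long perMap = savedBytesPerMap(400, 511);
    long total = 3L * perMap * 20_000;
    System.out.println("per map: " + perMap + " bytes");
    System.out.println("per node: " + total / (1024 * 1024) + " MiB");
  }
}
```

Under these assumptions the per-map saving comes out slightly above 6 KB, which is consistent with the rounded figures used in the estimate. Object headers of the map instances themselves are ignored.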
The exact saving depends on field-number density and on how fields are distributed across doc-values types. If field numbers are denser, the saving can be slightly higher; if fields are split across multiple smaller maps or field numbers are sparse, `maybeWrap` keeps the original `IntObjectHashMap`.
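To illustrate the selection rule, here is a minimal sketch of a dense read-only lookup and a wrapping check. It is not the code in this PR: `DenseLookupSketch` and `tryBuild` are hypothetical names, a plain `java.util.Map` stands in for `IntObjectHashMap` so the example is self-contained, and the number of value slots in the hash table is passed in explicitly because the stand-in map does not expose it.

```java
import java.util.Map;

// Illustrative only: a dense, read-only int -> value lookup backed by Object[].
final class DenseLookupSketch<V> {
  private final Object[] values;

  private DenseLookupSketch(Object[] values) {
    this.values = values;
  }

  /** Returns the value stored at `key`, or null if the key is out of range or unset. */
  @SuppressWarnings("unchecked")
  V get(int key) {
    return key >= 0 && key < values.length ? (V) values[key] : null;
  }

  /**
   * Builds a dense lookup, or returns null when the caller should keep the
   * original map: a key is negative, a value is null, or the dense array
   * would not have at least `minSlotReduction` fewer slots than the hash table.
   */
  static <V> DenseLookupSketch<V> tryBuild(
      Map<Integer, V> source, int hashValueSlots, double minSlotReduction) {
    int maxKey = -1;
    for (Map.Entry<Integer, V> e : source.entrySet()) {
      if (e.getKey() < 0 || e.getValue() == null) {
        return null; // not representable in the dense form
      }
      maxKey = Math.max(maxKey, e.getKey());
    }
    long denseSlots = (long) maxKey + 1;
    if (denseSlots > hashValueSlots * (1.0 - minSlotReduction)) {
      return null; // too sparse: not enough slack removed
    }
    Object[] values = new Object[(int) denseSlots];
    for (Map.Entry<Integer, V> e : source.entrySet()) {
      values[e.getKey()] = e.getValue();
    }
    return new DenseLookupSketch<>(values);
  }
}
```

With 400 dense keys and a 1025-slot hash table, `tryBuild(map, 1025, 0.30)` accepts the map because 400 slots is well under 70% of 1025. A map containing only keys 0 and 1000 would be rejected for any small hash table.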