Index merging without garbage #4622
Comments
Thank you for summarizing the motivation. Can you quantify the benefit from this series of patches? Could you also please lay out what backwards-incompatible API changes (if any) are expected as part of this series of patches?
I don't know how much space and time it will save, because this series of patches is not finished yet, and not tested. Aggregations in extensions, which are supposed to be used for indexing rollup, must implement
This issue has been marked as stale due to 280 days of inactivity.
This issue has been closed due to lack of activity. If you think that
Current state
Currently, the data in several partial indexes (or just one, for transformations) is transformed during merge in the following way:

(0) source index (`QueryableIndex` or `IncrementalIndex`)
--> (1) sorting dimension value indexes, aka unsortedToSorted
--> (2) optionally, reordering dims // here the array elements are the same objects as at the previous step, but the `Object[]` arrays are new, if reordering of dims and/or metrics is actually required
--> (3) another reindexing, based on the merged dictionary
--> final merge.
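To make the allocation pattern concrete, here is a minimal sketch of the array-based stages. It is not the actual merging code: `ArrayBasedMergeSketch`, `remapIds` and `reorderDims` are hypothetical names, and a row is reduced to an `Object[]` whose elements are `int[]` arrays of dictionary ids. The same id-remapping helper stands in for both the unsortedToSorted stage and the reindexing based on the merged dictionary.

```java
import java.util.Arrays;

// Hypothetical illustration only; these names do not come from the code base.
final class ArrayBasedMergeSketch {
    /**
     * Id-remapping stage: rewrites the dictionary ids of every dimension
     * through a per-dimension conversion table.
     */
    static Object[] remapIds(Object[] dims, int[][] conversions) {
        Object[] result = new Object[dims.length];      // one new Object[] per row
        for (int d = 0; d < dims.length; d++) {
            int[] ids = (int[]) dims[d];
            int[] converted = new int[ids.length];      // one new int[] per dimension per row
            for (int i = 0; i < ids.length; i++) {
                converted[i] = conversions[d][ids[i]];
            }
            result[d] = converted;
        }
        return result;
    }

    /** Reordering stage: the elements are shared, only the outer array is new. */
    static Object[] reorderDims(Object[] dims, int[] newOrder) {
        Object[] result = new Object[newOrder.length];
        for (int d = 0; d < newOrder.length; d++) {
            result[d] = dims[newOrder[d]];
        }
        return result;
    }

    public static void main(String[] args) {
        // One row with two string dimensions; the first one is multi-valued.
        Object[] row = {new int[] {1, 0}, new int[] {2}};
        int[][] unsortedToSorted = {{1, 0}, {0, 2, 1}};
        Object[] sorted = remapIds(row, unsortedToSorted);
        Object[] reordered = reorderDims(sorted, new int[] {1, 0});
        System.out.println(Arrays.deepToString(reordered));
    }
}
```

In this sketch every call to `remapIds` allocates one outer array plus one `int[]` per dimension for each row, so running it twice per row (once per remapping stage) gives the N * 2 pattern for `int[]` arrays.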
Here, `Object[]` elements are either `int[]` (DimensionSelector), or `Long`, `Double` or `Float` (numeric ColumnValueSelectors, correspondingly).

So in the process of merge, each entry generates 2-3 extra `Rowboat` objects, 4-7 new `Object[]` arrays, N * 2 new `int[]` arrays (where N is the number of string dimensions), and new boxed primitive objects, if merging is done with `QueryableIndex` as a source.

Garbage-free approach
`Rowboat` contains an array of ColumnValueSelector objects, representing the stream of dimensions, and another array of ColumnValueSelector objects, representing the stream of metrics, both "under cursor". When `QueryableIndex` is used as a source for merging, the existing `Cursor` and `ColumnValueSelectorFactory` infrastructure is reused with minimal modifications.

The 0->1 and 2->3 conversions, as described above, are implemented as ColumnValueSelector transformations, without creating new arrays, boxed primitives, etc. The 1->2 transformation is essentially a no-op: create a `Rowboat` object with an array of ColumnValueSelectors, ordered differently.
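The idea can be illustrated with a simplified sketch. Everything in it is an assumption made for illustration: `IdSelector`, `Offset`, `column`, `remap` and `reorder` are hypothetical names, not the real `ColumnValueSelector` or `Cursor` interfaces, and dimensions are single-valued to keep it short. The point is that the conversions become wrappers created once per column, while moving to the next row only changes the cursor offset.

```java
// Hypothetical illustration only; the real selector and cursor interfaces are richer.
final class SelectorMergeSketch {
    /** Simplified stand-in for a dimension selector "under cursor". */
    interface IdSelector {
        int getId(); // dictionary id of the row the cursor currently points at
    }

    /** Shared cursor position; advancing it allocates nothing. */
    static final class Offset {
        int row;
    }

    /** Selector reading a column of dictionary ids at the shared offset. */
    static IdSelector column(int[] ids, Offset offset) {
        return () -> ids[offset.row];
    }

    /** 0->1 and 2->3: ids are converted on read; the wrapper is created once per column. */
    static IdSelector remap(IdSelector delegate, int[] conversion) {
        return () -> conversion[delegate.getId()];
    }

    /** 1->2: reordering only rearranges selector references, once per merge. */
    static IdSelector[] reorder(IdSelector[] selectors, int[] newOrder) {
        IdSelector[] result = new IdSelector[newOrder.length];
        for (int i = 0; i < newOrder.length; i++) {
            result[i] = selectors[newOrder[i]];
        }
        return result;
    }

    public static void main(String[] args) {
        Offset offset = new Offset();
        int[] colA = {1, 0, 1};
        int[] colB = {2, 2, 0};
        IdSelector a = remap(column(colA, offset), new int[] {1, 0});
        IdSelector b = remap(column(colB, offset), new int[] {0, 2, 1});
        IdSelector[] row = reorder(new IdSelector[] {a, b}, new int[] {1, 0});

        // Per-row work: move the offset and read primitive ints through the selectors.
        for (offset.row = 0; offset.row < colA.length; offset.row++) {
            System.out.println(row[0].getId() + ", " + row[1].getId());
        }
    }
}
```

All objects in this sketch are allocated before the loop starts; the loop body itself only reads `int` values, apart from the string formatting used for the demonstration output.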