Index merging without garbage #4622

Closed
leventov opened this issue Aug 1, 2017 · 4 comments

Comments

@leventov
Member

leventov commented Aug 1, 2017

Current state

Currently, during merging, the data in several partial indexes (or just one, for transformations) is transformed in the following way:

  0. Iterator < TimeAndDims + Object[] metrics (entry in IncrementalIndex) >
    --> sorting dimension value indexes, aka unsortedToSorted
  1. Iterator < Rowboat (Object[] dims, Object[] metrics) >
    --> optionally, reordering dims
  2. Iterator < Rowboat (Object[] dims, Object[] metrics) >
    // here the array elements are the same objects as at the previous step, but the Object[] arrays are new if reordering of dims and/or metrics is actually required
    --> another reindexing, based on the merged dictionary
  3. Iterator < Rowboat (Object[] dims, Object[] metrics) >
    --> final merge.

Here, the Object[] elements are either int[] (DimensionSelector), Long, Double or Float (numeric ColumnValueSelectors, respectively).
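To make the garbage concrete, here is a heavily simplified Java model of the current pipeline. `Rowboat` below is a hypothetical stand-in rather than Druid's actual class, step 0 is modeled as a Rowboat too, and only the dims side is shown; it is a sketch of where the per-row allocations come from, not Druid's code.

```java
import java.util.Arrays;

/**
 * Hypothetical, heavily simplified model of the current, allocation-heavy pipeline.
 * Rowboat is a stand-in, not Druid's actual class; step 0 is modeled as a Rowboat too.
 */
public class AllocatingMergeSketch {

  /** dims hold int[] (string dimension ids) or boxed numbers; metrics hold boxed values. */
  static final class Rowboat {
    final Object[] dims;
    final Object[] metrics;

    Rowboat(Object[] dims, Object[] metrics) {
      this.dims = dims;
      this.metrics = metrics;
    }
  }

  /** Reindexing step (0->1 or 2->3): remaps string dim ids through per-dimension lookup tables. */
  static Rowboat remapIds(Rowboat row, int[][] mappings) {
    Object[] newDims = new Object[row.dims.length];   // garbage: new Object[] per row
    for (int d = 0; d < row.dims.length; d++) {
      if (row.dims[d] instanceof int[]) {
        int[] ids = (int[]) row.dims[d];
        int[] remapped = new int[ids.length];           // garbage: new int[] per string dim per row
        for (int i = 0; i < ids.length; i++) {
          remapped[i] = mappings[d][ids[i]];
        }
        newDims[d] = remapped;
      } else {
        newDims[d] = row.dims[d];                        // numeric value: same boxed object reused
      }
    }
    return new Rowboat(newDims, row.metrics);           // garbage: new Rowboat per row
  }

  /** Reordering step (1->2): elements are reused, but the array and the Rowboat are new. */
  static Rowboat reorder(Rowboat row, int[] order) {
    Object[] newDims = new Object[order.length];
    for (int i = 0; i < order.length; i++) {
      newDims[i] = row.dims[order[i]];
    }
    return new Rowboat(newDims, row.metrics);
  }

  public static void main(String[] args) {
    // One row: a string dim with unsorted ids {2, 0} and a long dim.
    Rowboat row = new Rowboat(new Object[] {new int[] {2, 0}, 7L}, new Object[] {1.5});
    Rowboat sorted = remapIds(row, new int[][] {{1, 2, 0}, null});        // unsortedToSorted
    Rowboat reordered = reorder(sorted, new int[] {1, 0});                 // swap the two dims
    Rowboat merged = remapIds(reordered, new int[][] {null, {3, 5, 6}});  // to merged dictionary
    System.out.println(merged.dims[0] + " " + Arrays.toString((int[]) merged.dims[1]));
  }
}
```

Running `main` prints `7 [3, 5]`: the boxed `7L` passes through untouched, while every step allocated fresh arrays and a fresh Rowboat for this single row.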

So, during a merge, each entry generates 2-3 extra Rowboat objects, 4-7 new Object[] arrays, N * 2 new int[] arrays (where N is the number of string dimensions), and, if merging is done with a QueryableIndex as the source, new boxed primitive objects.
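Reading the counts above literally (R = 2-3 Rowboats, O = 4-7 Object[] arrays, 2N int[] arrays, boxed primitives left out), the per-row allocation count works out as follows; N = 3 and a million rows are purely illustrative values, not measurements.

```math
A_{\text{row}} = R + O + 2N,\quad R \in \{2, 3\},\ O \in \{4, \dots, 7\}
\;\Longrightarrow\; 6 + 2N \le A_{\text{row}} \le 10 + 2N

N = 3:\quad 12 \le A_{\text{row}} \le 16,\qquad 10^{6}\ \text{rows} \;\Rightarrow\; 1.2 \times 10^{7}\ \text{to}\ 1.6 \times 10^{7}\ \text{short-lived objects}
```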

Garbage-free approach

Rowboat contains an array of ColumnValueSelector objects representing the stream of dimensions and another array of ColumnValueSelector objects representing the stream of metrics, both reading the row currently "under the cursor". When a QueryableIndex is used as the source for merging, the existing Cursor and ColumnValueSelectorFactory infrastructure is reused with minimal modifications.
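A minimal sketch of this shape, assuming a toy columnar cursor: `SimpleCursor`, `ValueSelector` and this `Rowboat` are hypothetical simplified stand-ins, not Druid's `Cursor` or `ColumnValueSelector`, and only long values are handled. The point is that the Rowboat and its selectors are built once and then simply reread as the cursor advances.

```java
/**
 * Hypothetical sketch: a Rowboat holding selectors rather than per-row Object[] copies.
 * SimpleCursor and ValueSelector are simplified stand-ins, not Druid's Cursor / ColumnValueSelector.
 */
public class SelectorRowboatSketch {

  /** Reads one column's value at the cursor's current row (long values only, for brevity). */
  interface ValueSelector {
    long getLong();
  }

  /** Cursor over columnar long data; every selector it makes reads the current row in place. */
  static final class SimpleCursor {
    private final long[][] columns;
    private int row = 0;

    SimpleCursor(long[][] columns) {
      this.columns = columns;
    }

    boolean isDone() {
      return row >= columns[0].length;
    }

    void advance() {
      row++;
    }

    ValueSelector makeSelector(int column) {
      return () -> columns[column][row];  // created once; nothing is allocated per row
    }
  }

  /** Built once per source; the same object represents every row "under the cursor". */
  static final class Rowboat {
    final ValueSelector[] dims;
    final ValueSelector[] metrics;

    Rowboat(ValueSelector[] dims, ValueSelector[] metrics) {
      this.dims = dims;
      this.metrics = metrics;
    }
  }

  /** Sums the first metric over all rows, walking the cursor without creating per-row objects. */
  static long sumFirstMetric(SimpleCursor cursor, Rowboat rowboat) {
    long sum = 0;
    while (!cursor.isDone()) {
      sum += rowboat.metrics[0].getLong();
      cursor.advance();
    }
    return sum;
  }

  public static void main(String[] args) {
    // Column 0 plays the role of a dimension, column 1 of a metric.
    SimpleCursor cursor = new SimpleCursor(new long[][] {{10, 10, 20}, {1, 2, 3}});
    Rowboat rowboat = new Rowboat(
        new ValueSelector[] {cursor.makeSelector(0)},
        new ValueSelector[] {cursor.makeSelector(1)});
    System.out.println(sumFirstMetric(cursor, rowboat));
  }
}
```

`main` prints `6`; walking the three rows created no Rowboat, array or boxed value.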

The 0->1 and 2->3 conversions described above are implemented as ColumnValueSelector transformations, without creating new arrays, boxed primitives, etc. The 1->2 transformation is essentially a no-op: it creates a Rowboat object whose array of ColumnValueSelectors is ordered differently.
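The selector transformations can be sketched the same way. Here `IdSelector` is a hypothetical, single-valued simplification of a dimension selector (multi-value rows are ignored), and the lookup tables stand in for unsortedToSorted and the merged-dictionary mapping; their values are made up for the example.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the 0->1, 1->2 and 2->3 steps as selector transformations.
 * IdSelector is a simplified, single-valued stand-in, not Druid's DimensionSelector.
 */
public class SelectorTransformSketch {

  /** Returns the dictionary id of the current row's value. */
  interface IdSelector {
    int getId();
  }

  /** Reindexing (0->1 or 2->3): ids are remapped on read, so no per-row arrays are needed. */
  static IdSelector remap(IdSelector delegate, int[] mapping) {
    return () -> mapping[delegate.getId()];
  }

  /** Reordering (1->2): permutes the selector array once, not once per row. */
  static IdSelector[] reorder(IdSelector[] selectors, int[] order) {
    IdSelector[] result = new IdSelector[order.length];
    for (int i = 0; i < order.length; i++) {
      result[i] = selectors[order[i]];
    }
    return result;
  }

  /** Runs two string dims over three rows through all three steps; returns the merged ids. */
  static int[][] demo() {
    int[][] rawIds = {{2, 0, 1}, {1, 1, 0}};  // [dim][row], ids in unsorted per-index dictionaries
    int[] cursor = {0};                         // current row of a hypothetical cursor
    IdSelector[] raw = {() -> rawIds[0][cursor[0]], () -> rawIds[1][cursor[0]]};

    IdSelector[] sorted = {remap(raw[0], new int[] {1, 2, 0}), remap(raw[1], new int[] {1, 0})};
    IdSelector[] reordered = reorder(sorted, new int[] {1, 0});
    IdSelector[] merged = {remap(reordered[0], new int[] {4, 7}), remap(reordered[1], new int[] {3, 5, 9})};

    int[][] out = new int[3][merged.length];  // output buffer, only for the demo
    for (cursor[0] = 0; cursor[0] < 3; cursor[0]++) {
      for (int d = 0; d < merged.length; d++) {
        out[cursor[0]][d] = merged[d].getId();
      }
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(Arrays.deepToString(demo()));
  }
}
```

`main` prints `[[4, 3], [4, 5], [7, 9]]`. Apart from the demo's output buffer, the only arrays created are the selector arrays, once per source; reordering just permutes selector references, which is why the 1->2 step costs nothing per row.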

@leventov leventov added this to the 0.11.0 milestone Aug 1, 2017
@leventov leventov self-assigned this Aug 1, 2017
@leventov leventov changed the title from "Index transformation and merging without garbage" to "Index merging without garbage" Aug 1, 2017
@gianm
Contributor

gianm commented Aug 15, 2017

Thank you for summarizing the motivation.

Can you quantify the benefit from this series of patches?

Could you also please lay out what backwards-incompatible API changes (if any) are expected as part of this series of patches?

@leventov
Member Author

I don't know how much space and time it will save, because this series of patches is not finished or tested yet.

As for API changes: aggregations in extensions that are supposed to be used for rollup at indexing time must implement AggregatorFactory.makeAggregateCombiner(): https://github.com/druid-io/druid/pull/4676/files#diff-90e2d51a725b4d59f09e2f8b740b7f37R70
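For extension authors, the role of a combiner in a garbage-free rollup merge can be sketched as below. The interfaces are hypothetical simplifications for illustration only; the actual contract is whatever `AggregatorFactory.makeAggregateCombiner()` returns in the linked PR, and its method names and signatures may differ.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a combiner used while merging rolled-up rows. The interfaces are
 * simplified stand-ins, not the actual contract introduced in the linked PR.
 */
public class CombinerSketch {

  /** Exposes the metric value of the row currently under the cursor. */
  interface LongValueSelector {
    long getLong();
  }

  /** reset() starts a group from the current row; fold() adds another row with the same key. */
  interface LongCombiner {
    void reset(LongValueSelector selector);

    void fold(LongValueSelector selector);

    long getLong();
  }

  /** Sum combiner: its state is a single primitive, so folding rows creates no garbage. */
  static final class LongSumCombiner implements LongCombiner {
    private long sum;

    @Override
    public void reset(LongValueSelector selector) {
      sum = selector.getLong();
    }

    @Override
    public void fold(LongValueSelector selector) {
      sum += selector.getLong();
    }

    @Override
    public long getLong() {
      return sum;
    }
  }

  /** Rolls up runs of equal keys (input assumed sorted by key, as in a merge); returns {key, value} pairs. */
  static List<long[]> rollup(long[] keys, long[] values, LongCombiner combiner) {
    int[] pos = {0};
    LongValueSelector selector = () -> values[pos[0]];  // one selector reused for every row
    List<long[]> out = new ArrayList<>();               // output collection, only for the demo
    while (pos[0] < keys.length) {
      long key = keys[pos[0]];
      combiner.reset(selector);
      pos[0]++;
      while (pos[0] < keys.length && keys[pos[0]] == key) {
        combiner.fold(selector);
        pos[0]++;
      }
      out.add(new long[] {key, combiner.getLong()});
    }
    return out;
  }

  public static void main(String[] args) {
    long[] keys = {1, 1, 2, 3, 3, 3};
    long[] values = {5, 6, 1, 2, 2, 2};
    for (long[] row : rollup(keys, values, new LongSumCombiner())) {
      System.out.println(row[0] + " -> " + row[1]);
    }
  }
}
```

`main` prints `1 -> 11`, `2 -> 1` and `3 -> 6`. The combiner keeps a primitive running value and reads each row through one reused selector, so folding rows does not box values.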

@github-actions

github-actions bot commented Jun 4, 2023

This issue has been marked as stale due to 280 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If this issue is still
relevant, please simply write any comment. Even if closed, you can still revive the
issue at any time or discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Jun 4, 2023
@github-actions

github-actions bot commented Jul 2, 2023

This issue has been closed due to lack of activity. If you think that
is incorrect, or the issue requires additional review, you can revive the issue at
any time.

@github-actions github-actions bot closed this as not planned Jul 2, 2023