[Proposal] Multi-column filtering, let DimensionSpec handle extraction functions exclusively #3378

Closed
jon-wei opened this issue Aug 19, 2016 · 4 comments

Comments

@jon-wei
Contributor

jon-wei commented Aug 19, 2016

Multi-column filtering, let DimensionSpec handle extraction functions exclusively

To support filters that accept multi-column inputs, this proposal suggests the following changes:

DimensionSpec

The following methods are added to DimensionSpec:

Object getColumnValueSelector(ColumnSelectorFactory columnSelectorFactory);
ValueType getOutputType(ColumnSelectorFactory columnSelectorFactory);
BitmapIndex getBitmapIndex(BitmapIndexSelector selector);
  • Change DimensionSpec to accept a list of dimension names instead of a single name
  • Instead of using ColumnSelectorFactory.makeDimensionSelector(), column reading code will call DimensionSpec.getColumnValueSelector(columnSelectorFactory) and use getOutputType() to determine the type of the selector
  • DimensionSpec uses ColumnSelectorFactory to create a view of a dimension ("virtual column"), represented as a DimensionSelector, LongColumnSelector, etc.
  • The DimensionSpec creates a new "Column Value Selector" that composes DimensionSelectors, LongColumnSelectors, etc. for individual columns from the ColumnSelectorFactory, applying the extraction function, into a new "virtual column"
  • If the output type is string, the returned object will be a DimensionSelector. If the output type is long, the returned object will be a LongColumnSelector.
  • decorate() becomes a private method, wrapping a delegate selector when getColumnValueSelector() is called
  • getExtractionFn() can be removed, since extractionFn is no longer needed by or exposed to reader code
  • Use ExtractionDimensionSpec for grouping/filtering on "virtual columns" derived from multiple input columns, as well as any dimensions with extraction functions applied
  • Use DefaultDimensionSpec for reading from the base columns without transformations
  • RegexFilteredDimensionSpec and ListFilteredDimensionSpec can simply pass the new method calls to their delegate spec
  • LookupDimensionSpec: replace this usage with ExtractionDimensionSpec and the right extraction function?
  • Bitmap retrieval is also moved to DimensionSpec, which can return null when a bitmap index is not valid (e.g., with a multi-column extraction fn, or another extraction fn that makes the indexes unusable)
  • For bitmap retrieval instead of calling BitmapIndexSelector.getBitmapIndex(dimension), call dimensionSpec.getBitmapIndex(BitmapIndexSelector)
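
The DimensionSpec changes listed above could look roughly like the following. This is only a minimal sketch under stated assumptions: every type is a simplified, hypothetical stand-in nested in a `DimensionSpecSketch` class rather than Druid's real interface, the extraction function is modeled as a plain `Function<List<String>, String>`, and only the string-typed (DimensionSelector) case is shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified stand-ins for the types named in the proposal.
final class DimensionSpecSketch {
    enum ValueType { STRING, LONG }
    interface DimensionSelector { String getValue(); }
    interface BitmapIndex { }
    interface BitmapIndexSelector { BitmapIndex getBitmapIndex(String dimension); }
    // Returns untransformed selectors over real in-segment columns only.
    interface ColumnSelectorFactory { DimensionSelector makeDimensionSelector(String column); }

    // The three methods the proposal adds to DimensionSpec.
    interface DimensionSpec {
        Object getColumnValueSelector(ColumnSelectorFactory factory);
        ValueType getOutputType(ColumnSelectorFactory factory);
        BitmapIndex getBitmapIndex(BitmapIndexSelector selector);
    }

    // Reads a base column without any transformation.
    static final class DefaultDimensionSpec implements DimensionSpec {
        private final String dimension;
        DefaultDimensionSpec(String dimension) { this.dimension = dimension; }
        @Override public Object getColumnValueSelector(ColumnSelectorFactory factory) {
            return factory.makeDimensionSelector(dimension);
        }
        @Override public ValueType getOutputType(ColumnSelectorFactory factory) {
            return ValueType.STRING;
        }
        @Override public BitmapIndex getBitmapIndex(BitmapIndexSelector selector) {
            return selector.getBitmapIndex(dimension);
        }
    }

    // Composes several base columns into one string-typed "virtual column".
    static final class ExtractionDimensionSpec implements DimensionSpec {
        private final List<String> dimensions;
        private final Function<List<String>, String> extractionFn;
        ExtractionDimensionSpec(List<String> dimensions,
                                Function<List<String>, String> extractionFn) {
            this.dimensions = dimensions;
            this.extractionFn = extractionFn;
        }
        @Override public Object getColumnValueSelector(ColumnSelectorFactory factory) {
            List<DimensionSelector> delegates = new ArrayList<>();
            for (String dimension : dimensions) {
                delegates.add(factory.makeDimensionSelector(dimension));
            }
            // Reads every delegate for the current row, then applies the extraction fn.
            DimensionSelector virtualColumn = () -> {
                List<String> inputs = new ArrayList<>();
                for (DimensionSelector delegate : delegates) {
                    inputs.add(delegate.getValue());
                }
                return extractionFn.apply(inputs);
            };
            return virtualColumn;
        }
        @Override public ValueType getOutputType(ColumnSelectorFactory factory) {
            return ValueType.STRING;
        }
        @Override public BitmapIndex getBitmapIndex(BitmapIndexSelector selector) {
            // Indexes on the base columns do not describe the derived values.
            return null;
        }
    }
}
```

In this sketch, reader code would check getOutputType() before casting the returned object, and would fall back to row-by-row evaluation whenever getBitmapIndex() returns null.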

ValueMatcherFactory

  public ValueMatcher makeValueMatcher(DimensionSpec dimensionSpec, Comparable value);
  public ValueMatcher makeValueMatcher(DimensionSpec dimensionSpec, DruidPredicateFactory predicateFactory);
  • The methods above are changed to accept a DimensionSpec instead of a dimension name
  • use DimensionSpec.getColumnValueSelector() instead of ColumnSelectorFactory.makeDimensionSelector()
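
A rough sketch of what the changed ValueMatcherFactory methods might do is shown below. It rests on several assumptions: the types are hypothetical stand-ins nested in a `ValueMatcherSketch` class, getColumnValueSelector() is narrowed to return a DimensionSelector directly, and `java.util.function.Predicate<String>` stands in for DruidPredicateFactory.

```java
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical, simplified stand-ins; not Druid's real interfaces.
final class ValueMatcherSketch {
    interface DimensionSelector { String getValue(); }
    interface ColumnSelectorFactory { DimensionSelector makeDimensionSelector(String column); }
    interface DimensionSpec { DimensionSelector getColumnValueSelector(ColumnSelectorFactory factory); }
    interface ValueMatcher { boolean matches(); }

    static final class ValueMatcherFactory {
        private final ColumnSelectorFactory columnSelectorFactory;

        ValueMatcherFactory(ColumnSelectorFactory columnSelectorFactory) {
            this.columnSelectorFactory = columnSelectorFactory;
        }

        // Matches rows whose (already transformed) value equals the given value.
        ValueMatcher makeValueMatcher(DimensionSpec dimensionSpec, Comparable<?> value) {
            DimensionSelector selector = dimensionSpec.getColumnValueSelector(columnSelectorFactory);
            return () -> Objects.equals(selector.getValue(), value);
        }

        // Matches rows whose (already transformed) value satisfies the predicate.
        ValueMatcher makeValueMatcher(DimensionSpec dimensionSpec, Predicate<String> predicate) {
            DimensionSelector selector = dimensionSpec.getColumnValueSelector(columnSelectorFactory);
            return () -> predicate.test(selector.getValue());
        }
    }
}
```

Neither method refers to an extraction function here: the selector obtained from the DimensionSpec is assumed to have applied any value transformation already.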

Query engines

  • use DimensionSpec.getColumnValueSelector() instead of ColumnSelectorFactory.makeDimensionSelector()

ColumnSelectorFactory

public boolean isDescending();
  • Add isDescending() so ExtractionDimensionSpec can create SingleScanTimeDimSelector for __time
  • Move extractionFn application out of ColumnSelectorFactory
  • ColumnSelectorFactory is now responsible for returning single column selectors on real in-segment columns, with no value transformations

Filters:

  • Accept a DimensionSpec instead of a dimension name
  • Instead of checking for the presence of an extraction function, filters use the preservesOrdering and extractionType properties of the DimensionSpec (e.g., for BoundFilter optimizations)
  • Filter acts on the materialized "virtual column" expressed by the DimensionSpec
  • Move extractionFn application outside of predicates, rely on DimensionSelector to have already applied value transformations instead
  • An arbitrary filter could be written as a multi-column extractionFn that returns a boolean value, combined with a selector filter on "true"
  • Could support extraction functions that return longs, floats, etc. in the future
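
The "arbitrary filter" idea from the list above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the stand-in types nested in `BooleanExtractionFilterSketch`, the hypothetical `lessThan` helper that plays the role of a two-column boolean extraction function, and the simplified `SelectorFilter`.

```java
// Hypothetical, simplified stand-ins; not Druid's real interfaces.
final class BooleanExtractionFilterSketch {
    interface DimensionSelector { String getValue(); }
    interface ColumnSelectorFactory { DimensionSelector makeDimensionSelector(String column); }
    interface DimensionSpec { DimensionSelector getColumnValueSelector(ColumnSelectorFactory factory); }
    interface ValueMatcher { boolean matches(); }

    // Virtual column whose value is "true" or "false", computed from two numeric columns.
    static DimensionSpec lessThan(String leftColumn, String rightColumn) {
        return factory -> {
            DimensionSelector left = factory.makeDimensionSelector(leftColumn);
            DimensionSelector right = factory.makeDimensionSelector(rightColumn);
            return () -> String.valueOf(
                Long.parseLong(left.getValue()) < Long.parseLong(right.getValue()));
        };
    }

    // Selector filter that takes a DimensionSpec instead of a dimension name.
    static final class SelectorFilter {
        private final DimensionSpec dimensionSpec;
        private final String value;

        SelectorFilter(DimensionSpec dimensionSpec, String value) {
            this.dimensionSpec = dimensionSpec;
            this.value = value;
        }

        // The filter only sees the materialized virtual column, never the extraction fn.
        ValueMatcher makeMatcher(ColumnSelectorFactory factory) {
            DimensionSelector selector = dimensionSpec.getColumnValueSelector(factory);
            return () -> value.equals(selector.getValue());
        }
    }
}
```

With these stand-ins, `new SelectorFilter(lessThan("bytesIn", "bytesOut"), "true")` would keep only the rows where the first column is numerically smaller than the second; the column names are made up for the example.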

Related Topics/Issues/PRs:

@fjy fjy added this to the 0.9.3 milestone Aug 30, 2016
@gianm
Contributor

gianm commented Feb 13, 2017

Un-milestoning this from 0.10.0.

@jon-wei reading through this, it looks like some of it has already been done as part of PRs in 0.10.0 and some of it has not. (And for some of it, it looks like we actually decided to go in a different direction, like filters not taking DimensionSpecs.)

Do you think we should keep it open and scope down to what still makes sense given what has already been done? Or close and reimagine and open new proposals?

@gianm gianm removed this from the 0.10.0 milestone Feb 13, 2017
@vogievetsky
Contributor

@jon-wei is this still needed given the expression stuff that exists now?

@stale

stale bot commented Mar 15, 2020

This issue has been marked as stale due to 280 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale bot added the stale label Mar 15, 2020
@stale

stale bot commented Jul 2, 2020

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

@stale stale bot closed this as completed Jul 2, 2020
seoeun25 added a commit to seoeun25/incubator-druid that referenced this issue Feb 25, 2022
seoeun25 pushed a commit to seoeun25/incubator-druid that referenced this issue Feb 25, 2022
This issue was closed.