Use sorted index based filtering only for dictionary encoded column #6288

Merged
merged 2 commits into from
Nov 30, 2020

@siddharthteotia siddharthteotia commented Nov 25, 2020

Currently, we build a sorted index only if the column is dictionary encoded. However, the isSorted value we write to the on-disk segment metadata comes from the pre-index stats collector. So, for a sorted column without a dictionary, the segment metadata will still mark the column as sorted:

properties.setProperty(getKeyFor(column, IS_SORTED), String.valueOf(columnIndexCreationInfo.isSorted()));

During query processing, when we create the filter operator, we check the data source metadata to see if the column is sorted and, if so, create a sorted index based filter operator. However, using this operator for a sorted raw column leads to the error stack below, since we end up using a raw value based predicate evaluator with a dictionary based filter operator.

The solution is to additionally check the data source to see whether the column is dictionary encoded.

```
java.lang.UnsupportedOperationException
        at org.apache.pinot.core.operator.filter.predicate.BaseRawValueBasedPredicateEvaluator.getMatchingDictIds(BaseRawValueBasedPredicateEvaluator.java:40)
        at org.apache.pinot.core.operator.filter.SortedIndexBasedFilterOperator.getNextBlock(SortedIndexBasedFilterOperator.java:68)
        at org.apache.pinot.core.operator.filter.SortedIndexBasedFilterOperator.getNextBlock(SortedIndexBasedFilterOperator.java:35)
        at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49)
        at org.apache.pinot.core.operator.DocIdSetOperator.getNextBlock(DocIdSetOperator.java:62)
        at org.apache.pinot.core.operator.DocIdSetOperator.getNextBlock(DocIdSetOperator.java:35)
        at org.apache.pinot.core.operator.BaseOperator.nextBlock(BaseOperator.java:49)
        at org.apache.pinot.core.operator.ProjectionOperator.getNextBlock(ProjectionOperator.java:57)
        at org.apache.pinot.core.operator.ProjectionOperator.getNextBlock(ProjectionOperator.java:30)
```
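
The operator selection described in this PR can be sketched roughly as follows. This is a minimal, self-contained illustration and not Pinot's actual code: `ColumnSource`, `OperatorKind`, and `chooseOperator` are hypothetical stand-ins for the data source, the filter operators, and the leaf filter operator selection, and the fallback path is modeled as a plain scan.

```java
// Hypothetical, simplified model of the check added in this PR.
// None of these types are Pinot classes; they only illustrate the logic.
final class SortedFilterSketch {

  /** Stand-in for the filter operator that ends up being created. */
  enum OperatorKind { SORTED_INDEX, SCAN }

  /** Stand-in for a column's data source. */
  static final class ColumnSource {
    final boolean sortedInMetadata; // isSorted flag derived from the stats collector
    final boolean dictionaryEncoded; // true if the column has a dictionary

    ColumnSource(boolean sortedInMetadata, boolean dictionaryEncoded) {
      this.sortedInMetadata = sortedInMetadata;
      this.dictionaryEncoded = dictionaryEncoded;
    }
  }

  /**
   * Uses the sorted index only when the column is both marked sorted and
   * dictionary encoded; otherwise falls back (modeled here as a scan).
   */
  static OperatorKind chooseOperator(ColumnSource source) {
    if (source.sortedInMetadata && source.dictionaryEncoded) {
      return OperatorKind.SORTED_INDEX;
    }
    return OperatorKind.SCAN;
  }

  public static void main(String[] args) {
    // A sorted raw (no-dictionary) column must not get the sorted index operator.
    System.out.println(chooseOperator(new ColumnSource(true, false)));
    // A sorted, dictionary encoded column can use it.
    System.out.println(chooseOperator(new ColumnSource(true, true)));
  }
}
```

Checking the sorted flag alone would send the raw column down the dictionary based path, which is the situation that produces the `UnsupportedOperationException` above.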

only for sorted column with dictionary
codecov-io commented Nov 25, 2020

Codecov Report

Merging #6288 (a6535ca) into master (1beaab5) will increase coverage by 7.56%.
The diff coverage is 68.27%.

@@            Coverage Diff             @@
##           master    #6288      +/-   ##
==========================================
+ Coverage   66.44%   74.01%   +7.56%     
==========================================
  Files        1075     1252     +177     
  Lines       54773    61203    +6430     
  Branches     8168     8864     +696     
==========================================
+ Hits        36396    45300    +8904     
+ Misses      15700    12988    -2712     
- Partials     2677     2915     +238     
| Flag | Coverage Δ |
|---|---|
| integration | 45.82% <51.88%> (?) |
| unittests | 65.21% <45.12%> (?) |

| Impacted Files | Coverage Δ |
|---|---|
| ...ot/broker/broker/AllowAllAccessControlFactory.java | 100.00% <ø> (ø) |
| .../helix/BrokerUserDefinedMessageHandlerFactory.java | 52.83% <0.00%> (-13.84%) ⬇️ |
| ...ker/routing/instanceselector/InstanceSelector.java | 100.00% <ø> (ø) |
| .../main/java/org/apache/pinot/client/Connection.java | 44.44% <0.00%> (-4.40%) ⬇️ |
| ...not/common/assignment/InstancePartitionsUtils.java | 78.57% <ø> (+5.40%) ⬆️ |
| .../apache/pinot/common/exception/QueryException.java | 90.27% <ø> (+5.55%) ⬆️ |
| ...pinot/common/function/AggregationFunctionType.java | 98.27% <ø> (-1.73%) ⬇️ |
| .../pinot/common/function/DateTimePatternHandler.java | 83.33% <ø> (ø) |
| ...ot/common/function/FunctionDefinitionRegistry.java | 88.88% <ø> (+44.44%) ⬆️ |
| ...org/apache/pinot/common/function/FunctionInfo.java | 100.00% <ø> (ø) |
| ... and 1033 more | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4a6e094...a6535ca. Read the comment docs.

@Jackie-Jiang Jackie-Jiang left a comment

(Optional) IMO it is cleaner to check the dictionary when initializing the data source metadata, since we don't support raw sorted index. In ImmutableDataSource.java, you can change the constructor of ImmutableDataSourceMetadata to set `_sorted = columnMetadata.isSorted() && columnMetadata.hasDictionary()`. When we support raw sorted index in the future, we can change it back. Wdyt?
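
A rough sketch of this alternative, assuming simplified stand-in classes: the `ColumnMetadata` and `DataSourceMetadata` types below are hypothetical and only mirror the names used in the comment, not Pinot's real classes.

```java
// Hypothetical sketch: fold the dictionary check into the metadata itself,
// so callers of isSorted() need no extra condition.
final class SortedMetadataSketch {

  /** Stand-in for per-column segment metadata read from disk. */
  static final class ColumnMetadata {
    private final boolean _sortedOnDisk;
    private final boolean _dictionaryEncoded;

    ColumnMetadata(boolean sortedOnDisk, boolean dictionaryEncoded) {
      _sortedOnDisk = sortedOnDisk;
      _dictionaryEncoded = dictionaryEncoded;
    }

    boolean isSorted() {
      return _sortedOnDisk;
    }

    boolean hasDictionary() {
      return _dictionaryEncoded;
    }
  }

  /** Stand-in for the data source metadata consulted at query time. */
  static final class DataSourceMetadata {
    private final boolean _sorted;

    DataSourceMetadata(ColumnMetadata columnMetadata) {
      // Report "sorted" only when a sorted index can actually exist,
      // i.e. when the column is dictionary encoded.
      _sorted = columnMetadata.isSorted() && columnMetadata.hasDictionary();
    }

    boolean isSorted() {
      return _sorted;
    }
  }

  public static void main(String[] args) {
    // Raw column flagged as sorted on disk: not reported as sorted at query time.
    System.out.println(new DataSourceMetadata(new ColumnMetadata(true, false)).isSorted());
  }
}
```

With this shape, the condition lives in one place instead of at each call site that checks `isSorted()`.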

Predicate.Type predicateType = predicateEvaluator.getPredicateType();
if (predicateType == Predicate.Type.RANGE) {
- if (dataSource.getDataSourceMetadata().isSorted()) {
+ if (dataSource.getDataSourceMetadata().isSorted() && (dataSource.getDictionary() != null)) {

(nit)

Suggested change
- if (dataSource.getDataSourceMetadata().isSorted() && (dataSource.getDictionary() != null)) {
+ if (dataSource.getDataSourceMetadata().isSorted() && dataSource.getDictionary() != null) {


done

@@ -59,7 +65,7 @@ public static BaseFilterOperator getLeafFilterOperator(PredicateEvaluator predic
} else if (predicateType == Predicate.Type.REGEXP_LIKE) {
return new ScanBasedFilterOperator(predicateEvaluator, dataSource, numDocs);
} else {
- if (dataSource.getDataSourceMetadata().isSorted()) {
+ if (dataSource.getDataSourceMetadata().isSorted() && (dataSource.getDictionary() != null)) {

(nit)

Suggested change
- if (dataSource.getDataSourceMetadata().isSorted() && (dataSource.getDictionary() != null)) {
+ if (dataSource.getDataSourceMetadata().isSorted() && dataSource.getDictionary() != null) {


done

@siddharthteotia siddharthteotia merged commit 3eb0f9c into apache:master Nov 30, 2020