Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update ARRAY_OVERLAP to plan with ArrayContainsElement for ARRAY columns #15451

Merged

Conversation

clintropolis
Copy link
Member

Description

Updates ARRAY_OVERLAP to use the same ArrayContainsElement filter added in #15366 when filtering ARRAY typed columns so that it can also use indexes like ARRAY_CONTAINS.

For query:

SELECT string1, long1 FROM foo WHERE ARRAY_OVERLAP("multi-string3", ARRAY[100, 200]) GROUP BY 1,2
Benchmark                        (query)  (rowsPerSegment)  (schema)  (vectorize)  Mode  Cnt     Score    Error  Units
SqlExpressionBenchmark.querySql       39           5000000      auto        false  avgt    5  5119.577 ± 13.764  ms/op

After:

Benchmark                        (query)  (rowsPerSegment)  (schema)  (vectorize)  Mode  Cnt    Score   Error  Units
SqlExpressionBenchmark.querySql       39           5000000      auto        false  avgt    5  149.264 ± 1.216  ms/op

This PR has:

  • been self-reviewed.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

Copy link
Contributor

@abhishekagarwal87 abhishekagarwal87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty great numbers.

if (simpleExtractionExpr.isDirectColumnAccess() && simpleExtractionExpr.isArray()) {
// To convert this expression filter into an OR of ArrayContainsElement filters, we need to extract all array
// elements.
if (expr.isLiteral()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what will happen if its an empty array?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, so i guess it will fail a precondition check when it makes the OR filter because fields will be empty...

However, it currently isn't really possible to make an empty array literal because ARRAY[] isn't accepted by calcite (or the native druid array expression), so I'm not really sure how important it is to handle this case either.

I can update this (and ArrayContainsOperatorConversion) to return null in this case since we don't really handle it, or true i guess would also probably be correct?

@@ -149,6 +153,42 @@ public DimFilter toDruidFilter(
);
}
}

// if the input is a direct array column, we can use sweet array filter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
// if the input is a direct array column, we can use sweet array filter
// if the input is a direct array column, we can use sweet array contains filter

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heh, i just copied this section from ArrayContainsOperatorConversion and switched AND to OR

@abhishekagarwal87 abhishekagarwal87 merged commit 5ce4aab into apache:master Nov 30, 2023
82 of 83 checks passed
@clintropolis clintropolis deleted the array-overlap-with-indexes branch November 30, 2023 05:41
yashdeep97 pushed a commit to yashdeep97/druid that referenced this pull request Dec 1, 2023
…mns (apache#15451)

Updates ARRAY_OVERLAP to use the same ArrayContainsElement filter added in apache#15366 when filtering ARRAY typed columns so that it can also use indexes like ARRAY_CONTAINS.
yashdeep97 pushed a commit to yashdeep97/druid that referenced this pull request Dec 1, 2023
…mns (apache#15451)

Updates ARRAY_OVERLAP to use the same ArrayContainsElement filter added in apache#15366 when filtering ARRAY typed columns so that it can also use indexes like ARRAY_CONTAINS.
@LakshSingla LakshSingla added this to the 29.0.0 milestone Jan 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants