Skip to content

Conversation

@gortiz
Copy link
Contributor

@gortiz gortiz commented Jul 16, 2024

Fixes #13595 by making ARRAY_TO_MV a non deterministic function.

As defined in SqlOperator Calcite javadoc, a function is deterministic if and only if

Returns whether a call to this operator is guaranteed to always return the same result given the same operands; true is assumed by default.

Which is not true in the case of ARRAY_TO_MV, whose semantic is actually not implementable in Calcite because it actually depends what is being done with the value returned.

@codecov-commenter
Copy link

codecov-commenter commented Jul 16, 2024

Codecov Report

Attention: Patch coverage is 77.77778% with 2 lines in your changes missing coverage. Please review.

Project coverage is 61.94%. Comparing base (59551e4) to head (5934ed3).
Report is 764 commits behind head on master.

Files Patch % Lines
...e/pinot/calcite/sql/PinotSqlTransformFunction.java 60.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13618      +/-   ##
============================================
+ Coverage     61.75%   61.94%   +0.19%     
+ Complexity      207      198       -9     
============================================
  Files          2436     2555     +119     
  Lines        133233   140716    +7483     
  Branches      20636    21862    +1226     
============================================
+ Hits          82274    87173    +4899     
- Misses        44911    46906    +1995     
- Partials       6048     6637     +589     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 35.15% <77.77%> (-26.56%) ⬇️
java-21 61.84% <77.77%> (+0.21%) ⬆️
skip-bytebuffers-false 61.93% <77.77%> (+0.19%) ⬆️
skip-bytebuffers-true 61.81% <77.77%> (+34.08%) ⬆️
temurin 61.94% <77.77%> (+0.19%) ⬆️
unittests 61.94% <77.77%> (+0.19%) ⬆️
unittests1 46.47% <77.77%> (-0.42%) ⬇️
unittests2 27.68% <0.00%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@yashmayya yashmayya added bugfix multi-stage Related to the multi-stage query engine labels Jul 16, 2024
@yashmayya
Copy link
Contributor

As defined in SqlOperator Calcite javadoc, a function is deterministic if and only if

Returns whether a call to this operator is guaranteed to always return the same result given the same operands; true is assumed by default.

Which is not true in the case of ARRAY_TO_MV

Technically, ARRAY_TO_MV is deterministic right since it is guaranteed to always return the same result given the same operands? Essentially, we're never going to get a different result for something like ARRAY_TO_MV(RandomAirports) = 'MFR' when applied to the same row. Although this does seem to fix the issue described in #13595, so as long as there are no other side effects, I guess we could live with this (especially considering that the ARRAY_TO_MV function itself is a bit of a hack).

@gortiz
Copy link
Contributor Author

gortiz commented Jul 16, 2024

Technically, ARRAY_TO_MV is deterministic right since it is guaranteed to always return the same result given the same operands? Essentially, we're never going to get a different result for something like ARRAY_TO_MV(RandomAirports) = 'MFR' when applied to the same row

No. ARRAY_TO_MV is not deterministic. The definition is that the result is the same for the same inputs. That semantic is what give Calcite permission to simplify ARRAY_TO_MV(RandomAirports) = 'MFR' AND ARRAY_TO_MV(RandomAirports) = 'GTR' as false. Given ARRAY_TO_MV(RandomAirports) is deterministic and MFR is different than GTR, it means that AND must always evaluate to false.

What you are saying is that ARRAY_TO_MV(RandomAirports) = 'MFR' is deterministic, which is true, but there what is deterministic is =, not ARRAY_TO_MV. Anyway, ARRAY_TO_MV semantics are ill defined. That function is actually doesn't have semantic, is used just to trick Calcite validator. In fact ARRAY_TO_MV cannot be executed in V1 and it is removed before sending the query in leaf stage. What V1 sees is RandomAirports = 'MFR' and V1 includes some historical semantic for = applied to multi-values and scalars

@yashmayya
Copy link
Contributor

That makes sense, thanks for the explanation!

@Jackie-Jiang Jackie-Jiang merged commit 65907b7 into apache:master Jul 16, 2024
rajagopr pushed a commit to rajagopr/pinot that referenced this pull request Jul 17, 2024
ege-st pushed a commit to ege-st/pinot that referenced this pull request Aug 1, 2024
(cherry picked from commit 65907b7)

# Conflicts:
#	pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/MultiStageEngineIntegrationTest.java
@npawar npawar added the v1v2 label Aug 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix multi-stage Related to the multi-stage query engine v1v2

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Multi-stage engine might break MV filter semantic

5 participants