[CALCITE-7474] LAST in MATCH_RECOGNIZE might return wrong result #4889
snuyanzin wants to merge 3 commits into apache:main
final BinaryExpression lastIndex =
    Expressions.subtract(
        Expressions.call(rows, BuiltInMethod.COLLECTION_SIZE.method),
        Expressions.constant(1));

  final RexPatternFieldRef ref = (RexPatternFieldRef) node;
  final RexPatternFieldRef newRef =
      new RexPatternFieldRef(ref.getAlpha(),
          ref.getIndex(),
          translator.typeFactory.createTypeWithNullability(ref.getType(),
              true));
  final Expression expression = translator.translate(newRef, NullAs.NULL);
  setInputGetterIndex(translator, null);
  return expression;
} else {
  // Alpha != "*" so we have to search for a specific one to find and use that, if found
  // otherwise pick the last one
  setInputGetterIndex(translator,
      Expressions.call(BuiltInMethod.MATCH_UTILS_LAST_WITH_SYMBOL.method,
          Expressions.constant(alpha), rows, symbols, i));

  // Important, unbox the node / expression to avoid NullAs.NOT_POSSIBLE
  final RexPatternFieldRef ref = (RexPatternFieldRef) node;
  final RexPatternFieldRef newRef =
      new RexPatternFieldRef(ref.getAlpha(),
          ref.getIndex(),
          translator.typeFactory.createTypeWithNullability(ref.getType(),
              true));
  final Expression expression = translator.translate(newRef, NullAs.NULL);
  setInputGetterIndex(translator, null);

Since the code in both branches is almost the same, I extracted it.

 */
public static <E> int lastWithSymbol(String symbol, List<E> rows, List<String> symbols,
    int startIndex) {
public static <E> int lastWithSymbolOrDefault(String symbol, List<E> rows, List<String> symbols,

Good question: why pass rows if they are not needed? It's not as if you need to implement a specific interface.
Since you are renaming a public function (which may not be OK), maybe you can just add a new one.

> maybe you can just use a new one.

Good point. I reverted the changes to the existing method and created a new one without the unused arguments.

final Expression expression = translator.translate(newRef, NullAs.NULL);
setInputGetterIndex(translator, null);
return expression;
Expressions.call(BuiltInMethod.MATCH_UTILS_LAST_WITH_SYMBOL_OR_DEFAULT.method,

Are the last two arguments always the same? Then why are they needed?
Does that function have any other callers?

I reduced the number of arguments.

> Does that function have any other callers?

In Calcite there are no other callers; I'm not sure about downstream projects, if any.
As mentioned above, I reverted the changes to the existing method and created a new one.

}

/**
 * Returns the row with the highest index whose corresponding symbol matches,

There is no parameter called 'row'. Last symbol?

Yes. I also changed it for the original method.
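
For readers following the thread, here is a minimal sketch of the lookup being discussed: find the highest index whose symbol matches, otherwise fall back to the last index ("otherwise pick the last one", as the code comment in the diff puts it). The name `lastWithSymbolOrDefault` comes from the diff, but the reduced parameter list is not shown on this page, so the two-argument signature and the enclosing class `LastWithSymbolSketch` are assumptions made for illustration, not the PR's actual code.

```java
import java.util.List;

// Hypothetical stand-alone sketch; not the actual Calcite MatchUtils code.
class LastWithSymbolSketch {
  /**
   * Returns the highest index whose symbol equals {@code symbol};
   * if no symbol matches, returns the last index (size - 1).
   * For an empty list this yields -1.
   *
   * The two-parameter signature is an assumption: the PR page does not
   * show the final argument list after the unused arguments were removed.
   */
  static int lastWithSymbolOrDefault(String symbol, List<String> symbols) {
    // Scan backwards so the first hit is the highest matching index.
    for (int i = symbols.size() - 1; i >= 0; i--) {
      if (symbol.equals(symbols.get(i))) {
        return i;
      }
    }
    // No match: default to the last row of the match.
    return symbols.size() - 1;
  }
}
```

With symbols `[A, B, A, B]`, looking up `A` gives index 2, while looking up a name that is not a pattern symbol falls back to index 3.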

!ok

# Test Simple LAST with expanded column name

Can you please say in a comment whether this is validated on another engine?

I tested on Oracle, Snowflake, and BigQuery.
BigQuery is able to process it and returns the result.
Oracle and Snowflake fail validation, reporting that only variables from PATTERN are allowed; however, they still allow the use of non-expanded names.
I also put this in Jira.

Yes, but if this is in a comment people will trust this test. We have had incorrect test results previously.

I'm open to doing the same as Oracle and Snowflake as well.
However, it is then not clear why it should work for non-expanded columns.



Jira Link
CALCITE-7474
Changes Proposed
The issue (described in Jira) is that there was an assumption that in `MEASURES`, if the reference starts with `*`, the last row should be used if present; otherwise, rows are filtered by the symbol from `PATTERN`.
In reality, symbols from `PATTERN` might not be used at all, while an expanded name is used instead.
The PR aligns that