Skip to content

[CALCITE-7474] LAST in MATCH_RECOGNIZE might return wrong result#4889

Open
snuyanzin wants to merge 3 commits intoapache:mainfrom
snuyanzin:calcite7474
Open

[CALCITE-7474] LAST in MATCH_RECOGNIZE might return wrong result#4889
snuyanzin wants to merge 3 commits intoapache:mainfrom
snuyanzin:calcite7474

Conversation

@snuyanzin
Copy link
Copy Markdown
Contributor

Jira Link

CALCITE-7474

Changes Proposed

The issue (described in jira) is that there was an assumption that in MEASURES if starts with * then should use the last line if present, otherwise filter by symbol from PATTERN
In reality it might happen that symbols from PATTERN are not used however there is expanded name
The PR aligns that

final BinaryExpression lastIndex =
Expressions.subtract(
Expressions.call(rows, BuiltInMethod.COLLECTION_SIZE.method),
Expressions.constant(1));
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not used

Comment on lines -4187 to -4210
final RexPatternFieldRef ref = (RexPatternFieldRef) node;
final RexPatternFieldRef newRef =
new RexPatternFieldRef(ref.getAlpha(),
ref.getIndex(),
translator.typeFactory.createTypeWithNullability(ref.getType(),
true));
final Expression expression = translator.translate(newRef, NullAs.NULL);
setInputGetterIndex(translator, null);
return expression;
} else {
// Alpha != "*" so we have to search for a specific one to find and use that, if found
// otherwise pick the last one
setInputGetterIndex(translator,
Expressions.call(BuiltInMethod.MATCH_UTILS_LAST_WITH_SYMBOL.method,
Expressions.constant(alpha), rows, symbols, i));

// Important, unbox the node / expression to avoid NullAs.NOT_POSSIBLE
final RexPatternFieldRef ref = (RexPatternFieldRef) node;
final RexPatternFieldRef newRef =
new RexPatternFieldRef(ref.getAlpha(),
ref.getIndex(),
translator.typeFactory.createTypeWithNullability(ref.getType(),
true));
final Expression expression = translator.translate(newRef, NullAs.NULL);
setInputGetterIndex(translator, null);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since in both branches the code is almost same, extracted it

*/
public static <E> int lastWithSymbol(String symbol, List<E> rows, List<String> symbols,
int startIndex) {
public static <E> int lastWithSymbolOrDefault(String symbol, List<E> rows, List<String> symbols,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question, why have rows if they are not needed? It's not like you need to implement a specific interface.

Since you are renaming a public function (which may not be ok), maybe you can just use a new one.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you can just use a new one.

good point, reverted changes for existing and created a new one with removed unused args

final Expression expression = translator.translate(newRef, NullAs.NULL);
setInputGetterIndex(translator, null);
return expression;
Expressions.call(BuiltInMethod.MATCH_UTILS_LAST_WITH_SYMBOL_OR_DEFAULT.method,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the last two arguments always the same? Then why are they needed?
Does that function have any other callers?

Copy link
Copy Markdown
Contributor Author

@snuyanzin snuyanzin Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reduced amount of args

Does that function have any other callers?

in Calcite yes, there is no other callers, not sure about downstream projects if any
as mentioned above, reverted changes to existing method and created a new one

}

/**
* Returns the row with the highest index whose corresponding symbol matches,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no parameter called 'row'. Last symbol?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, also changed for the original method


!ok

# Test Simple LAST with expanded column name
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please say in a comment whether this is validated on another engine?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested for Oracle, Snowflake, BigQuery
BigQuery is able to process it and return the result
Oracle and Snowflake fail on validation that only vars from PATTERN are allowed, however they still allow usage of non expanded names
also put this in jira

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but if this is in a comment people will trust this test. We have had incorrect test results previously.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open to do same as Oracle, Snowflake as well
however then not clear why it should work for non expanded columns

@sonarqubecloud
Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants