[CALCITE-7442] Getting Wrong index of Correlated variable inside Subquery after FilterJoinRule#4840

Open
yashlimbad wants to merge 1 commit into apache:main from yashlimbad:correlate_subquery_fix

Conversation

@yashlimbad

@yashlimbad yashlimbad commented Mar 18, 2026

Jira Link

CALCITE-7442

Changes

Adjust the offset of the correlated variable inside a subquery when FilterJoinRule pushes a filter down.

@yashlimbad yashlimbad force-pushed the correlate_subquery_fix branch 4 times, most recently from 53f3132 to fb20a82 Compare March 18, 2026 16:08
@xiedeyantu
Member

There is an error in the CI that needs to be resolved.

@yashlimbad yashlimbad force-pushed the correlate_subquery_fix branch from fb20a82 to e6ebfcc Compare March 19, 2026 02:58
@yashlimbad
Author

Updated the code.

@xiedeyantu xiedeyantu requested a review from Copilot March 19, 2026 07:10

Copilot AI left a comment

Pull request overview

Fixes a decorrelation/planning correctness issue (CALCITE-7442) where a correlated variable’s field index inside a subquery can become incorrect after FilterJoinRule pushes filters.

Changes:

  • Add a regression test that exercises FILTER_INTO_JOIN + FILTER_SUB_QUERY_TO_CORRELATE and then decorrelates, asserting expected plans.
  • Extend FilterJoinRule to propagate correlation-variable sets when creating pushed-down filters (and when keeping an above-join filter).
  • Enhance RelOptUtil.classifyFilters shifting logic so correlated field accesses inside RexSubQuery.rel can be adjusted during filter pushdown.
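The index shift described above can be sketched without any Calcite classes. This is only a hypothetical model, not the PR's code: `FilterShiftSketch`, `shiftToRightInput`, and `shiftAll` are names invented for illustration, and the field counts are assumed. It shows why a field index that was valid on the join's output row must be reduced by the left input's field count once the filter sits on the right input, and that the same shift has to reach the indexes used by correlated accesses inside the subquery.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, Calcite-free model of field-index shifting during filter pushdown. */
public class FilterShiftSketch {

  /** Maps a field index on the join's output row to the right input's own row. */
  public static int shiftToRightInput(int joinFieldIndex, int leftFieldCount) {
    if (joinFieldIndex < leftFieldCount) {
      throw new IllegalArgumentException(
          "field " + joinFieldIndex + " belongs to the left input");
    }
    return joinFieldIndex - leftFieldCount;
  }

  /**
   * Shifts every field index a predicate refers to. In this model the list holds
   * both direct references and the indexes used by correlated accesses inside a
   * subquery; leaving the latter unshifted is the kind of stale index the PR targets.
   */
  public static List<Integer> shiftAll(List<Integer> joinFieldIndexes, int leftFieldCount) {
    List<Integer> shifted = new ArrayList<>();
    for (int index : joinFieldIndexes) {
      shifted.add(shiftToRightInput(index, leftFieldCount));
    }
    return shifted;
  }

  public static void main(String[] args) {
    int leftFieldCount = 3; // assumed: the join's left input has 3 fields
    // Assumed predicate: a direct reference to join field 4 and a correlated
    // access to join field 5 from inside a subquery.
    System.out.println(shiftAll(List.of(4, 5), leftFieldCount)); // prints [1, 2]
  }
}
```

In the real rule the references are `RexNode` trees rather than plain integers, so the shift is applied by a visitor instead of a loop over a list.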

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| core/src/test/java/org/apache/calcite/sql2rel/RelDecorrelatorTest.java | Adds a regression test covering correlated variable index behavior through filter pushdown + decorrelation. |
| core/src/main/java/org/apache/calcite/rel/rules/FilterJoinRule.java | Tracks correlation variables when constructing new Filters after pushdown. |
| core/src/main/java/org/apache/calcite/plan/RelOptUtil.java | Extends filter-shifting to also adjust correlated field accesses inside subqueries. |

@xiedeyantu
Member

I'm not sure if these comments will be helpful, as I'm not very familiar with this specific area. If someone more knowledgeable doesn't review this PR within the next few days, I'll give it a try myself.

@yashlimbad
Author

Some of the comments looked helpful, so I updated the code. I will now run the build to check that it goes through and push accordingly! 😄 It's fine if someone reviews the code in a few days, but I feel @julianhyde would be the best person to review this, because his is the only commit I see on the part of RelOptUtil that I updated, which is called from FilterJoinRule.

@yashlimbad yashlimbad force-pushed the correlate_subquery_fix branch 3 times, most recently from 2c9af44 to 815c57d Compare March 19, 2026 11:26
@yashlimbad
Author

yashlimbad commented Mar 19, 2026

Sometimes the tests pass and sometimes they fail randomly in CI with the error below:

Execution failed for task ':arrow:test'.
> Could not resolve all dependencies for configuration ':arrow:jacocoAgent'.
   > Could not load module metadata from /home/jenkins/.gradle/caches/modules-2/metadata-2.106/descriptors/org.jacoco/org.jacoco.agent/0.8.11/26c913274550a0b2221f47a0fe2d2358/descriptor.bin

This happens even though gradlew build passes.
It would be good if this CI run were consistent.

@xiedeyantu xiedeyantu added the "request review" label (request a review from committers/contributors) Mar 19, 2026
@yashlimbad yashlimbad force-pushed the correlate_subquery_fix branch 2 times, most recently from e35441e to 55e4b1d Compare March 20, 2026 05:01
@yashlimbad
Author

Hey @xiedeyantu,
I think no one has reviewed the PR yet.
Could you please review it?

@xiedeyantu
Member

Sorry, could I get back to you in a few days? I'm currently on vacation and don't have access to a computer for debugging.

@yashlimbad
Author

sure, no problem!

@caicancai
Member

@yashlimbad Your PR headline, "fix correlated variable's index inside subquery", doesn't seem to match the Jira headline.

@yashlimbad yashlimbad changed the title from "[CALCITE-7442] fix correlated variable's index inside subquery" to "[CALCITE-7442] Getting Wrong index of Correlated variable inside Subquery after FilterJoinRule" Mar 25, 2026
@yashlimbad
Author

My bad. Updated, @caicancai!

Member

@caicancai caicancai left a comment

Just two simple comments

* @param adjustments the amount to adjust each field by
* @param offset the amount to shift field accesses by when
* rewriting correlated subqueries
* @param correlateVariableChild the child relation providing the
Member

The code comment format here is strange.

Author

The variable name is long here, which is why it runs into its own doc text. Any suggestions on formatting?

Member

Indeed, I'll take a look at other Calcite code later to see if there are any good solutions.

Member

Perhaps changing to a shorter, more concise variable name would solve the problem. 🤔

Author

Got it, I will get back to this tomorrow.

Contributor

Making the comment formatting look nice is not a good reason to use a shorter variable name.

Member

Thanks for the reminder, I'll look into whether there's a better way to handle this tomorrow.

@Override public RexNode visitSubQuery(RexSubQuery subQuery) {
boolean[] update = {false};
List<RexNode> clonedOperands = visitList(subQuery.operands, update);
if (update[0]) {
Member

I suspect there might be an issue with the EXISTS subquery, but I'm not entirely sure. Could you add a similar test?

Author

checking

Author

EXISTS fails in the same way as the IN clause. Nice catch, thanks for this! I will work on it.

Author

Fixed, and added a test.

subQuery = subQuery.clone(subQuery.getType(), clonedOperands);
final Set<CorrelationId> variablesSet = RelOptUtil.getVariablesUsed(subQuery.rel);
if (!variablesSet.isEmpty() && correlateVariableChild != null) {
CorrelationId id = Iterables.getOnlyElement(variablesSet);
Member

Are you assuming there's only one correlation variable?

Author

My bad, I am now updating the whole variable set.
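For readers following this thread, the concern can be sketched in plain Java. This is a hypothetical illustration, not the PR's code: `CorrelationSetSketch`, `onlyElement`, and `shiftAll` are invented names, correlation ids are modeled as strings, and `onlyElement` is a simplified stand-in that mimics how Guava's `Iterables.getOnlyElement` rejects a collection that does not hold exactly one element.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

/** Hypothetical sketch: handling one correlation id versus the whole set. */
public class CorrelationSetSketch {

  /** Simplified stand-in for a "get only element" helper; fails unless size is 1. */
  public static <T> T onlyElement(Set<T> set) {
    if (set.isEmpty()) {
      throw new NoSuchElementException("expected one element but the set is empty");
    }
    if (set.size() > 1) {
      throw new IllegalArgumentException(
          "expected one element but got " + set.size());
    }
    return set.iterator().next();
  }

  /** Processes every id in the set, so several correlation variables are all handled. */
  public static List<String> shiftAll(Set<String> correlationIds) {
    List<String> handled = new ArrayList<>();
    for (String id : correlationIds) {
      handled.add("shifted " + id); // placeholder for the real per-variable rewrite
    }
    return handled;
  }

  public static void main(String[] args) {
    // Assumed input: a subquery that uses two correlation variables.
    Set<String> ids = new LinkedHashSet<>(List.of("$cor0", "$cor1"));
    try {
      onlyElement(ids);
    } catch (IllegalArgumentException e) {
      System.out.println("single-element assumption fails: " + e.getMessage());
    }
    System.out.println(shiftAll(ids)); // prints [shifted $cor0, shifted $cor1]
  }
}
```

Iterating over the set keeps the rewrite correct whether the subquery uses one correlation variable or several, which is the direction the author's reply describes.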

@yashlimbad yashlimbad force-pushed the correlate_subquery_fix branch from 55e4b1d to 4f4341c Compare March 26, 2026 09:02
@caicancai
Member

I have some questions that I might need to confirm with debugging. I will try my best to complete the review this week.

@yashlimbad
Author

Great! thank you @caicancai

Labels

request review request a review from committers/contributors

5 participants