Skip to content

Revert "[GLUTEN-8966][VL] Propagate HashAggregate's ignoreNullKeys when possible"#10852

Merged
zml1206 merged 2 commits intoapache:mainfrom
zml1206:revert-8967
Oct 11, 2025
Merged

Revert "[GLUTEN-8966][VL] Propagate HashAggregate's ignoreNullKeys when possible"#10852
zml1206 merged 2 commits intoapache:mainfrom
zml1206:revert-8967

Conversation

@zml1206
Copy link
Contributor

@zml1206 zml1206 commented Oct 9, 2025

What changes are proposed in this pull request?

Reverts #8967.

Spark optimize rule InferFiltersFromConstraints will extract filter IsNotNull, rule PushDownPredicates will push down IsNotNull through aggregate. So the data processed by aggregate is already filtered, adding ignoreNullKeys will incur additional computational cost for velox.

How was this patch tested?

@github-actions github-actions bot added CORE works for Gluten Core VELOX labels Oct 9, 2025
@github-actions
Copy link

github-actions bot commented Oct 9, 2025

Run Gluten Clickhouse CI on x86

@github-actions github-actions bot added the DOCS label Oct 9, 2025
@github-actions
Copy link

github-actions bot commented Oct 9, 2025

Run Gluten Clickhouse CI on x86

@zml1206
Copy link
Contributor Author

zml1206 commented Oct 9, 2025

Copy link
Contributor

@beliefer beliefer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems good to me.

@jackylee-ch
Copy link
Contributor

Thanks for clarifying. I just tested the test case added in #8967 on vanilla Spark — isNotNull is added after the file scan. So we shouldn’t worry about ignoreNullKeys right now.

@jackylee-ch
Copy link
Contributor

@WangGuangxin Are there any other cases that shows we require the ignoreNullKeys? If not, we might revert this PR.

@zhztheplayer
Copy link
Member

@beliefer @zml1206 I noticed this PR and #10811 are for similar purposes. Maybe it's a coincidence?

@zml1206
Copy link
Contributor Author

zml1206 commented Oct 10, 2025

@beliefer @zml1206 I noticed this PR and #10811 are for similar purposes. Maybe it's a coincidence?

Yes, the purpose is the same, but I think it is more appropriate to remove it. Even if there are other cases, it would be more appropriate to optimize it in the spark optimizer.

@beliefer
Copy link
Contributor

@zhztheplayer @zml1206 Whether reverting or modifying, I think it's okay.

@WangGuangxin
Copy link
Contributor

@beliefer @zml1206 Get. I'm ok to revert it

Copy link
Contributor

@jackylee-ch jackylee-ch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your great work!

@zml1206
Copy link
Contributor Author

zml1206 commented Oct 11, 2025

Thanks for review @jackylee-ch @zhztheplayer @WangGuangxin @beliefer , Merge into main.

@zml1206 zml1206 merged commit 359f38c into apache:main Oct 11, 2025
59 checks passed
wForget pushed a commit to wForget/gluten that referenced this pull request Oct 15, 2025
@zml1206 zml1206 deleted the revert-8967 branch December 9, 2025 08:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CORE works for Gluten Core DOCS VELOX

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants