Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans #43808

Closed
wants to merge 3 commits into from

Conversation

ahshahid
Copy link

What changes were proposed in this pull request?

Implementing equals and hashCode in the InMemoryBatchScan and InMemoryV2FilterBatchScan so that the pushed runtime filters are taken into account.

Why are the changes needed?

These test classes need the change so that the issue of re-use of exchange not happening in AQE when runtime filters are pushed to scan levels, can be reproduced. Currently either the tests are not actually validating the reuse of exchange happening when dynamic pruning expressions are pushed to scan levels in partitioned key based joins, or they are false passing because the equality of the test scans are not taking into account runtime filters.

Does this PR introduce any user-facing change?

No

How was this patch tested?

This is test class change., It is possibe that the existing tests may fail till the dependent fixes are made in source code.

Was this patch authored or co-authored using generative AI tooling?

no

@github-actions github-actions bot added the SQL label Nov 14, 2023
@HyukjinKwon HyukjinKwon changed the title SPARK-45926 : Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [SPARK-45926] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans Nov 15, 2023
@HyukjinKwon HyukjinKwon changed the title [SPARK-45926] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans [SPARK-45926][SQL] Implementing equals and hashCode which takes into account pushed runtime filters , in InMemoryTable related scans Nov 15, 2023
@HyukjinKwon
Copy link
Member

cc @ulysses-you and @peter-toth FYI

@ahshahid
Copy link
Author

@HyukjinKwon thanks for correcting the titles of the PRs. will take care next time..

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making a PR, @ahshahid .

Do you happen to know why we didn't catch the following? Do you think you can provide a test coverage for that?

If these classes implement equals and hashCode taking into account the pushed runtime filters, we would see that TPCDS Q14b which should ideally be reusing the exchange containing Union , is not happening due to multiple bugs which surface in AQE.

@ahshahid
Copy link
Author

ahshahid commented Nov 15, 2023

@dongjoon-hyun I think the reason for not catching the issue of reuse of exchange is a mix of multiple things

  1. Spark is not testing with any concrete DataSourceV2 implementation. ( like iceberg)
  2. The simulation of DataSourceV2 impl using InMemoryTableScan is buggy because of equals / hashcode not taking into account pushed runtime filters, as a result any reuse of exchange bug would not be caught ( i.e mismatch of cached exchange plans would not be detected, giving a false assurance of re-use0
  3. If I am not wrong, the tpcds tests are run using Hive as DataSource and not sure if it supports push down of runtime filters.
  4. The bug in AQE only shows in TPCDS if table are partitioned and equi join involves partitioning column. I am not sure if right now various tpcds tests use partitioned table or not.

Yes I have been able to reproduce the issue using InMemoryTableScans as DataSourceV2 impl for tpcds tests. I will checkin a prototype test for reproducing the bug using q14b and if needed all other queries of tpcds can be run.

Though I ought to point out that while running my test I also hit the issue of computeStats being called twice ( which throws error only in testing). I have not debugged that... yet. And not sure if the assertion of computeStats occuring only once is maintainable..

@dongjoon-hyun
Copy link
Member

Thank you so much for the details and a upcoming prototype test case.

For that issue, you can file a new JIRA issue.

Though I ought to point out that while running my test I also hit the issue of computeStats being called twice ( which throws error only in testing). I have not debugged that... yet. And not sure if the assertion of computeStats occuring only once is maintainable..

@beliefer
Copy link
Contributor

It seems this PR is unrelated to runtime filter. I guess you mean is DS V2 filter pushdown

@ahshahid
Copy link
Author

ahshahid commented Nov 21, 2023

Hi @dongjoon-hyun added bug test in PR SPARK-45866

Copy link

github-actions bot commented Mar 1, 2024

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Mar 1, 2024
@github-actions github-actions bot closed this Mar 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants