New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-11661] [SQL] Still pushdown filters returned by unhandledFilters. #9634
Conversation
// When a filter is pushed to Parquet, Parquet can apply it to eveyr row. | ||
// So, we can check the number of rows returned from the Parquet | ||
// to make sure our filter pushdown work. | ||
assert(childRDD.count == 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liancheng Probably we need to have more tests. Also, is it the right place for the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I open an separate issue for this and try to work on this if it is right place (after resolving this issue)?
I would like to add some codes for ORC and Parquet I wrote already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. What issue is that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry lack of language skills..
I mean the test code (written above) checks if filters are really pushed down or not, which I thought is enough for this issue.
So, after this issue is resolved, I would like to change ParquetFilterSuite
and add some tests for ORC, maybe at a separate issue, I would like to open, that has the purpose for properly testing Parquet and ORC (maybe titled as refactoring.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, I see. No problem!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to make a single issue for this but just made separate since for ORC separate test might be needed since there is no such OrcFilterPredicateSuite
(I think) due to the difficulty to check objects that are pushed down.
Filed here for Parquet and ORC.
https://issues.apache.org/jira/browse/SPARK-11676
https://issues.apache.org/jira/browse/SPARK-11677
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@HyukjinKwon Thanks for handling the tests!
test this please |
Test build #45667 has finished for PR 9634 at commit
|
Test build #45671 has finished for PR 9634 at commit
|
test this please |
Test build #45679 has finished for PR 9634 at commit
|
test this please |
// translated contains all filters that have been converted to the public Filter interface. | ||
// We should always push them to the data source no matter whether the data source can apply | ||
// a filter to every row or not. | ||
(unrecognizedPredicates ++ unhandledPredicates, translated.map(_._2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to make translated.map(_._2)
more explicit and readable as follows, since this part is particularly tricky to understand:
// Translated data source filters, no matter `relation` can handle them or not
val (_, translatedFilters) = translated.unzip
// ...
(unrecognizedPredicates ++ unhandledPredicates, translatedFilters)
|
@@ -336,4 +336,29 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex | |||
} | |||
} | |||
} | |||
|
|||
test("actual filter push down") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is just a nit (maybe only for me) though.. Should this be such as "SPARK-11661 Still pushdown filters returned by unhandledFilters"?
Test build #45687 has finished for PR 9634 at commit
|
.queryExecution | ||
.executedPlan.asInstanceOf[org.apache.spark.sql.execution.Filter] | ||
.child. | ||
execute() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: .child.
looks weird
Test build #45696 has finished for PR 9634 at commit
|
Test build #45694 has finished for PR 9634 at commit
|
Test build #45701 has finished for PR 9634 at commit
|
Thanks, merging to master and branch-1.6. |
https://issues.apache.org/jira/browse/SPARK-11661 Author: Yin Huai <yhuai@databricks.com> Closes #9634 from yhuai/unhandledFilters. (cherry picked from commit 14cf753) Signed-off-by: Cheng Lian <lian@databricks.com>
https://issues.apache.org/jira/browse/SPARK-11661 Author: Yin Huai <yhuai@databricks.com> Closes apache#9634 from yhuai/unhandledFilters.
https://issues.apache.org/jira/browse/SPARK-11661