Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-11661] [SQL] Still pushdown filters returned by unhandledFilters. #9634

Closed
wants to merge 6 commits into from
Closed

[SPARK-11661] [SQL] Still pushdown filters returned by unhandledFilters. #9634

wants to merge 6 commits into from

Conversation

yhuai
Copy link
Contributor

@yhuai yhuai commented Nov 11, 2015

// When a filter is pushed to Parquet, Parquet can apply it to eveyr row.
// So, we can check the number of rows returned from the Parquet
// to make sure our filter pushdown work.
assert(childRDD.count == 1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liancheng Probably we need to have more tests. Also, is it the right place for the test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I open an separate issue for this and try to work on this if it is right place (after resolving this issue)?

I would like to add some codes for ORC and Parquet I wrote already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. What issue is that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry lack of language skills..

I mean the test code (written above) checks if filters are really pushed down or not, which I thought is enough for this issue.

So, after this issue is resolved, I would like to change ParquetFilterSuite and add some tests for ORC, maybe at a separate issue, I would like to open, that has the purpose for properly testing Parquet and ORC (maybe titled as refactoring.).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I see. No problem!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to make a single issue for this but just made separate since for ORC separate test might be needed since there is no such OrcFilterPredicateSuite (I think) due to the difficulty to check objects that are pushed down.

Filed here for Parquet and ORC.

https://issues.apache.org/jira/browse/SPARK-11676
https://issues.apache.org/jira/browse/SPARK-11677

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HyukjinKwon Thanks for handling the tests!

@yhuai
Copy link
Contributor Author

yhuai commented Nov 11, 2015

test this please

@SparkQA
Copy link

SparkQA commented Nov 11, 2015

Test build #45667 has finished for PR 9634 at commit 17832e0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 11, 2015

Test build #45671 has finished for PR 9634 at commit 2ac188e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor Author

yhuai commented Nov 11, 2015

test this please

@SparkQA
Copy link

SparkQA commented Nov 11, 2015

Test build #45679 has finished for PR 9634 at commit 2ac188e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@yhuai
Copy link
Contributor Author

yhuai commented Nov 11, 2015

test this please

// translated contains all filters that have been converted to the public Filter interface.
// We should always push them to the data source no matter whether the data source can apply
// a filter to every row or not.
(unrecognizedPredicates ++ unhandledPredicates, translated.map(_._2))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to make translated.map(_._2) more explicit and readable as follows, since this part is particularly tricky to understand:

// Translated data source filters, no matter `relation` can handle them or not
val (_, translatedFilters) = translated.unzip

// ...
(unrecognizedPredicates ++ unhandledPredicates, translatedFilters)

@liancheng
Copy link
Contributor

testPruningAndFiltering test cases in SimpleHadoopFsRelationSuite also need to be updated (all the pushedFilters field).

@@ -336,4 +336,29 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
}
}
}

test("actual filter push down") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is just a nit (maybe only for me) though.. Should this be such as "SPARK-11661 Still pushdown filters returned by unhandledFilters"?

@SparkQA
Copy link

SparkQA commented Nov 12, 2015

Test build #45687 has finished for PR 9634 at commit 2ac188e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

.queryExecution
.executedPlan.asInstanceOf[org.apache.spark.sql.execution.Filter]
.child.
execute()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: .child. looks weird

@SparkQA
Copy link

SparkQA commented Nov 12, 2015

Test build #45696 has finished for PR 9634 at commit 85a7cf6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 12, 2015

Test build #45694 has finished for PR 9634 at commit f2c01a5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 12, 2015

Test build #45701 has finished for PR 9634 at commit 01509bf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng
Copy link
Contributor

Thanks, merging to master and branch-1.6.

asfgit pushed a commit that referenced this pull request Nov 12, 2015
https://issues.apache.org/jira/browse/SPARK-11661

Author: Yin Huai <yhuai@databricks.com>

Closes #9634 from yhuai/unhandledFilters.

(cherry picked from commit 14cf753)
Signed-off-by: Cheng Lian <lian@databricks.com>
@asfgit asfgit closed this in 14cf753 Nov 12, 2015
dskrvk pushed a commit to dskrvk/spark that referenced this pull request Nov 13, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants