[SPARK-4798][SQL] A new set of Parquet testing API and test suites #3644

Closed
wants to merge 5 commits into from

liancheng
Contributor

This PR provides a set of Parquet testing APIs (see trait ParquetTest) that enable developers to write more concise test cases. A new set of Parquet test suites built upon this API is added, aiming to replace the old ParquetQuerySuite. To avoid potential merge conflicts, the old testing code has not been removed yet. The following classes can be safely removed once most Parquet-related PRs have been handled:

  • ParquetQuerySuite
  • ParquetTestData

private[spark] def unsetConf(key: String): Unit = {
  settings -= key
}

Contributor Author

Used in ParquetTest.withSQLConf.
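
The implementation of `ParquetTest.withSQLConf` is not shown in this thread. As a rough illustration of why a helper like this needs `unsetConf`, here is a minimal sketch of the set/run/restore pattern. `MiniConf` and `ConfHelpers.withConf` are hypothetical stand-ins invented for this example, not Spark's actual `SQLConf` or `ParquetTest` code.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a SQL configuration object (not Spark's SQLConf).
class MiniConf {
  private val settings = mutable.Map.empty[String, String]

  def setConf(key: String, value: String): Unit = settings.update(key, value)
  def getConf(key: String): Option[String] = settings.get(key)
  // Mirrors the method under review: drops the key entirely.
  def unsetConf(key: String): Unit = { settings -= key }
}

object ConfHelpers {
  // Sets the given pairs, runs `body`, then restores the previous state.
  // Keys that had no value before the call are removed via unsetConf.
  def withConf[T](conf: MiniConf)(pairs: (String, String)*)(body: => T): T = {
    // Remember what each key held before we touched it.
    val previous = pairs.map { case (k, _) => k -> conf.getConf(k) }
    pairs.foreach { case (k, v) => conf.setConf(k, v) }
    try body
    finally previous.foreach {
      case (k, Some(old)) => conf.setConf(k, old)
      case (k, None)      => conf.unsetConf(k)
    }
  }
}
```

Without an unset operation, a key that was absent before the test could only be overwritten, not removed, so the temporary setting would leak into later tests.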

SparkQA commented Dec 9, 2014

Test build #24249 has started for PR 3644 at commit 83edb00.

  • This patch merges cleanly.

SparkQA commented Dec 9, 2014

Test build #24250 has started for PR 3644 at commit ee17d7b.

  • This patch merges cleanly.

SparkQA commented Dec 9, 2014

Test build #24250 has finished for PR 3644 at commit ee17d7b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait ParquetTest

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24250/
Test FAILed.

SparkQA commented Dec 9, 2014

Test build #24249 has finished for PR 3644 at commit 83edb00.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait ParquetTest

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24249/
Test PASSed.

@liancheng
Contributor Author

Although it passed Jenkins, the first failure is rather weird. It seems that partitions collected via SchemaRDD.collect() can sometimes be out of order.

@liancheng
Contributor Author

retest this please

SparkQA commented Dec 9, 2014

Test build #24251 has started for PR 3644 at commit ee17d7b.

  • This patch merges cleanly.

SparkQA commented Dec 9, 2014

Test build #24251 has finished for PR 3644 at commit ee17d7b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait ParquetTest

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24251/
Test PASSed.

SparkQA commented Dec 10, 2014

Test build #24304 has started for PR 3644 at commit 3bb8731.

  • This patch merges cleanly.

SparkQA commented Dec 10, 2014

Test build #24304 has finished for PR 3644 at commit 3bb8731.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait ParquetTest

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24304/
Test FAILed.

@liancheng
Contributor Author

While collecting data from a Parquet-based SchemaRDD, the underlying Parquet splits may be returned out of order, which caused the occasional test failures.
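
The thread does not say how the tests were changed to cope with this. One common way to make such assertions robust is to compare collected rows without regard to order, for example as multisets. The sketch below uses plain Scala collections; `OrderInsensitive` and its methods are hypothetical names for illustration and are not part of Spark or of this PR.

```scala
object OrderInsensitive {
  // Counts occurrences of each row so that duplicates still matter.
  def asMultiset[A](rows: Seq[A]): Map[A, Int] =
    rows.groupBy(identity).map { case (row, group) => row -> group.size }

  // True when both sequences hold the same rows, in any order.
  def sameRows[A](actual: Seq[A], expected: Seq[A]): Boolean =
    asMultiset(actual) == asMultiset(expected)
}
```

A test could then assert `OrderInsensitive.sameRows(collected, expected)` instead of `collected == expected`. Sorting both sides before comparing works too, provided the row type has an ordering.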

SparkQA commented Dec 10, 2014

Test build #24314 has started for PR 3644 at commit 800e745.

  • This patch merges cleanly.

SparkQA commented Dec 10, 2014

Test build #24314 has finished for PR 3644 at commit 800e745.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait ParquetTest

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24314/
Test PASSed.

@marmbrus
Contributor

Thanks for doing this! We should use some of these helper functions in the other tests :)

Merged to master.

asfgit closed this in 3b395e1 on Dec 17, 2014
liancheng deleted the parquet-tests branch on December 18, 2014 04:19
asfgit pushed a commit that referenced this pull request Dec 30, 2014
This is a follow-up of #3367 and #3644.

At the time #3644 was written, #3367 hadn't been merged yet, so the `IsNull` and `IsNotNull` filters were not covered in the first version of `ParquetFilterSuite`. This PR adds the corresponding test cases.

Author: Cheng Lian <lian@databricks.com>

Closes #3748 from liancheng/test-null-filters and squashes the following commits:

1ab943f [Cheng Lian] IsNull and IsNotNull Parquet filter test case for boolean type
bcd616b [Cheng Lian] Adds Parquet filter pushedown tests for IsNull and IsNotNull
asfgit pushed a commit that referenced this pull request Jan 21, 2015
This PR removes the deprecated `ParquetQuerySuite`, renames `ParquetQuerySuite2` to `ParquetQuerySuite`, and refactors the changes introduced in #4115 to `ParquetFilterSuite`. It is a follow-up of #3644.

Note that the test cases in the old `ParquetQuerySuite` are already well covered by the other test suites introduced in #3644.

Author: Cheng Lian <lian@databricks.com>

Closes #4116 from liancheng/remove-deprecated-parquet-tests and squashes the following commits:

f73b8f9 [Cheng Lian] Removes deprecated Parquet test suite
bomeng pushed a commit to Huawei-Spark/spark that referenced this pull request Jan 22, 2015