Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-33169][SQL][TESTS] Check propagation of datasource options to underlying file system for built-in file-based datasources #30067

Closed
wants to merge 11 commits into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Oct 16, 2020

What changes were proposed in this pull request?

  1. Add the common trait CommonFileDataSourceSuite with tests that can be executed for all built-in file-based datasources.
  2. Add a test CommonFileDataSourceSuite to check that datasource options are propagated to underlying file systems as Hadoop configs.
  3. Mix CommonFileDataSourceSuite to AvroSuite, OrcSourceSuite, TextSuite, JsonSuite, CSVSuiteand toParquetFileFormatSuite`.
  4. Remove duplicated tests from AvroSuite and from OrcSourceSuite.

Why are the changes needed?

To improve test coverage and test all built-in file-based datasources.

Does this PR introduce any user-facing change?

No

How was this patch tested?

By running the affected test suites.

@MaxGekk
Copy link
Member Author

MaxGekk commented Oct 16, 2020

@cloud-fan @HyukjinKwon Please, take a look at it when you have time.

@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34501/

@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34501/

@SparkQA
Copy link

SparkQA commented Oct 16, 2020

Test build #129896 has finished for PR 30067 at commit dc7142a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

import org.apache.spark.sql.catalyst.plans.SQLHelper
import org.apache.spark.sql.test.SQLTestData

// The trait contains tests for all file-based data sources.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some comments here about FileBasedDataSourceSuite vs CommonFileDataSourceSuite by linking properly?

For example, we can say like: CommonFileDataSourceSuite requires to run the same codes in all file based data sources. FileBasedDataSourceSuite is a more loose place where we put the tests with some variants/differences.

We should also better mention that libsvm is not included.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I included libsvm as well.


// The trait contains tests for all file-based data sources.
trait CommonFileDataSourceSuite {
self: QueryTest with AnyFunSuite with SQLTestData with SQLHelper =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we just simply self-type QueryTest alone without other traits?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check that.

Copy link
Member Author

@MaxGekk MaxGekk Oct 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requiring QueryTest is too much. I made it more simple.

@SparkQA
Copy link

SparkQA commented Oct 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34564/

@SparkQA
Copy link

SparkQA commented Oct 18, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34564/

@SparkQA
Copy link

SparkQA commented Oct 18, 2020

Test build #129958 has finished for PR 30067 at commit 1a22f1d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 18, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34566/

@SparkQA
Copy link

SparkQA commented Oct 18, 2020

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34566/

@SparkQA
Copy link

SparkQA commented Oct 18, 2020

Test build #129960 has finished for PR 30067 at commit e579d48.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • trait CommonFileDataSourceSuite extends SQLHelper

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good otherwise.

@SparkQA
Copy link

SparkQA commented Oct 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34594/

@SparkQA
Copy link

SparkQA commented Oct 19, 2020

Test build #129987 has finished for PR 30067 at commit a397ead.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 19, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34594/

@HyukjinKwon
Copy link
Member

retest this please

@HyukjinKwon
Copy link
Member

I am going to merge this. The tests all passed at https://github.com/apache/spark/runs/1271560150, and the last change is only comments.

@HyukjinKwon
Copy link
Member

Merged to master.

@SparkQA
Copy link

SparkQA commented Oct 19, 2020

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34602/

@SparkQA
Copy link

SparkQA commented Oct 19, 2020

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34602/

@SparkQA
Copy link

SparkQA commented Oct 19, 2020

Test build #129996 has finished for PR 30067 at commit a397ead.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk MaxGekk deleted the ds-options-common-test branch December 11, 2020 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants