[SPARK-23341][SQL] define some standard options for data source v2 #20535

Closed
wants to merge 4 commits into
base: master
from

6 participants
Contributor

cloud-fan commented Feb 7, 2018

What changes were proposed in this pull request?

Each data source implementation can define its own options and teach its users how to set them. Spark places no restrictions on which options a data source should or should not have. Some options are very common and used by many data sources; however, different data sources may define these common options (key and meaning) differently, which is quite confusing to end users.

This PR defines some standard options that data sources can optionally adopt: path, table and database.
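As a rough illustration of what adopting these standard options could look like from a data source's side, here is a minimal, self-contained Scala sketch. It does not use Spark at all: the class `StandardOptions`, its accessors, and the demo object are hypothetical names, and the sketch assumes option keys are matched case-insensitively, following the usual convention for Spark data source options.

```scala
import java.util.Locale

// Hypothetical sketch, not Spark's DataSourceOptions API: a read-only view over
// user-supplied options that exposes a few agreed-upon standard keys.
final class StandardOptions(raw: Map[String, String]) {
  // Lower-case every key so "PATH", "Path" and "path" are treated as the same option.
  // If two keys differ only in case, one of the values silently wins.
  private val options: Map[String, String] =
    raw.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }

  // Case-insensitive lookup for any option, standard or source-specific.
  def get(key: String): Option[String] =
    options.get(key.toLowerCase(Locale.ROOT))

  // Accessors for the standard keys; each is None when the user did not set it.
  def path: Option[String] = get(StandardOptions.PathKey)
  def table: Option[String] = get(StandardOptions.TableKey)
  def database: Option[String] = get(StandardOptions.DatabaseKey)
}

object StandardOptions {
  // Standard key names taken from the PR description.
  val PathKey = "path"
  val TableKey = "table"
  val DatabaseKey = "database"
}

object StandardOptionsDemo {
  def main(args: Array[String]): Unit = {
    val opts = new StandardOptions(Map("PATH" -> "/data/events", "Table" -> "events"))
    println(opts.path)     // prints Some(/data/events)
    println(opts.table)    // prints Some(events)
    println(opts.database) // prints None
  }
}
```

Because every source that adopts the shared key names reads them the same way, a user who learns "path", "table" and "database" once can reuse them across sources, while each source remains free to define its own additional keys through `get`.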

How was this patch tested?

A new test case.

cloud-fan commented Feb 7, 2018

SparkQA Feb 7, 2018

Test build #87168 has finished for PR 20535 at commit c9009d8.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

rdblue Feb 7, 2018

Contributor

This should move the standard options to DataSourceV2Relation to avoid needing to instantiate DataSourceOptions wherever the relation is created.

cloud-fan Feb 8, 2018

Contributor

> This should move the standard options to DataSourceV2Relation to avoid needing to instantiate DataSourceOptions wherever the relation is created.

@rdblue We don't have this problem now, so I'd rather not touch DataSourceV2Relation here and revisit it when the problem actually arises.

SparkQA Feb 8, 2018

Test build #87186 has finished for PR 20535 at commit 86bcda9.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA Feb 8, 2018

Test build #87187 has finished for PR 20535 at commit 3e8f71b.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA Feb 8, 2018

Test build #87191 has finished for PR 20535 at commit 6644e49.

  • This patch fails to generate documentation.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA Feb 8, 2018

Test build #87194 has finished for PR 20535 at commit e92b6b2.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan Feb 8, 2018

Contributor

retest this please

SparkQA Feb 8, 2018

Test build #87207 has finished for PR 20535 at commit e92b6b2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA Apr 9, 2018

Test build #89069 has finished for PR 20535 at commit c811d72.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA Apr 10, 2018

Test build #89098 has finished for PR 20535 at commit c5e403c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan Apr 10, 2018

Contributor

retest this please

SparkQA Apr 10, 2018

Test build #89116 has finished for PR 20535 at commit c5e403c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gengliangwang

LGTM

cloud-fan Apr 16, 2018

Contributor

retest this please

SparkQA Apr 16, 2018

Test build #89382 has finished for PR 20535 at commit c5e403c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

cloud-fan Apr 18, 2018

Contributor

Thanks, merging to master!

asfgit closed this in 310a8cd on Apr 18, 2018

Dataset.ofRows(sparkSession, DataSourceV2Relation.create(
-  ds, extraOptions.toMap ++ sessionOptions,
+  ds, extraOptions.toMap ++ sessionOptions + pathsOption,

gatorsmile Apr 23, 2018

Member

Should we throw an exception when extraOptions("path") is not empty?

cloud-fan Apr 24, 2018

Contributor

In general, session configs and DataFrameReader/Writer options may contain duplicate entries, and not only for path. The rule is that DataFrameReader/Writer options should override session configs.

cc @jiangxb1987, can you submit a PR to document this explicitly in SessionConfigSupport?
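The precedence rule above can be expressed with Scala's immutable `Map.++`, which keeps the right-hand operand's value whenever both maps contain the same key. The sketch below is a minimal illustration, assuming both sides are plain `Map[String, String]` values whose keys are already spelled consistently; `OptionPrecedence` and `mergeOptions` are hypothetical names, not Spark APIs.

```scala
object OptionPrecedence {
  // Hypothetical helper: combine session-level option defaults with the options
  // passed explicitly to a reader/writer. For immutable Maps, `a ++ b` keeps b's
  // value whenever a key exists in both, so placing the reader/writer options on
  // the right-hand side lets them override session configs.
  def mergeOptions(
      sessionOptions: Map[String, String],
      readerOptions: Map[String, String]): Map[String, String] =
    sessionOptions ++ readerOptions

  def main(args: Array[String]): Unit = {
    val session = Map("path" -> "/from/session", "compression" -> "snappy")
    val reader = Map("path" -> "/from/reader")
    val merged = mergeOptions(session, reader)
    println(merged("path"))        // prints /from/reader
    println(merged("compression")) // prints snappy
  }
}
```

Swapping the operands, as in the `extraOptions.toMap ++ sessionOptions` expression in the diff above, would instead let a session config win for any key spelled identically in both maps.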


pepinoflo added a commit to pepinoflo/spark that referenced this pull request May 15, 2018

[SPARK-23341][SQL] define some standard options for data source v2
## What changes were proposed in this pull request?

Each data source implementation can define its own options and teach its users how to set them. Spark places no restrictions on which options a data source should or should not have. Some options are very common and used by many data sources; however, different data sources may define these common options (key and meaning) differently, which is quite confusing to end users.

This PR defines some standard options that data sources can optionally adopt: path, table and database.

## How was this patch tested?

A new test case.

Author: Wenchen Fan <wenchen@databricks.com>

Closes apache#20535 from cloud-fan/options.