Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-39543] The option of DataFrameWriterV2 should be passed to storage properties if fallback to v1 #36941

Closed
wants to merge 1 commit into from

Conversation

yikf
Copy link
Contributor

@yikf yikf commented Jun 21, 2022

What changes were proposed in this pull request?

The option of DataFrameWriterV2 should be passed to storage properties if fallback to v1, to support something such as compressed formats

Why are the changes needed?

example:

spark.range(0, 100).writeTo("t1").option("compression", "zstd").using("parquet").create

before

gen: part-00000-644a65ed-0e7a-43d5-8d30-b610a0fb19dc-c000.snappy.parquet ...

after

gen: part-00000-6eb9d1ae-8fdb-4428-aea3-bd6553954cdd-c000.zstd.parquet ...

Does this PR introduce any user-facing change?

No

How was this patch tested?

new test

@github-actions github-actions bot added the SQL label Jun 21, 2022
@yikf
Copy link
Contributor Author

yikf commented Jun 21, 2022

Could you please take a look when you have a time, thanks @cloud-fan

@@ -531,6 +532,15 @@ class DataFrameWriterV2Suite extends QueryTest with SharedSparkSession with Befo
assert(table.properties === (Map("provider" -> "foo") ++ defaultOwnership).asJava)
}

test("SPARK-39543 writeOption should be passed to storage properties when fallback to v1") {
spark.range(10).writeTo("table_name").option("compression", "zstd").using("parquet").create()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we test with InMemoryV1Provider? We may migrate the file source to v2 completely in the future and then this test won't test the v1 fallback anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

test("SPARK-39543 writeOption should be passed to storage properties when fallback to v1") {
val provider = classOf[InMemoryV1Provider].getName
val oldConf = spark.sessionState.conf.getConf(SQLConf.USE_V1_SOURCE_LIST)
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can use withSQLConf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

test("SPARK-39543 writeOption should be passed to storage properties when fallback to v1") {
val provider = classOf[InMemoryV1Provider].getName

withSQLConf((SQLConf.USE_V1_SOURCE_LIST.key, provider)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at other tests that use InMemoryV1Provider, it seems we don't need to set USE_V1_SOURCE_LIST to trigger v1 fallback?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, Other tests trigger v1 fallback without set USE_V1_SOURCE_LIST , AFAIK,

  • Other tests aim to test the read/write process, and the InMemoryV1Provider is actually a v2 format, and we trigger v1 fallback at the newScanBuilder & newWriteBuilder layer.
  • This test in PR needs to be fallback to V1 when the table is created, so we need to set USE_V1_SOURCE_LIST(see: isV2Provider)

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.3/3.2!

@cloud-fan cloud-fan closed this in e5b7fb8 Jun 23, 2022
cloud-fan pushed a commit that referenced this pull request Jun 23, 2022
…rage properties if fallback to v1

### What changes were proposed in this pull request?

The option of DataFrameWriterV2 should be passed to storage properties if fallback to v1, to support something such as compressed formats

### Why are the changes needed?

example:

`spark.range(0, 100).writeTo("t1").option("compression", "zstd").using("parquet").create`

**before**

gen: part-00000-644a65ed-0e7a-43d5-8d30-b610a0fb19dc-c000.**snappy**.parquet ...

**after**

gen: part-00000-6eb9d1ae-8fdb-4428-aea3-bd6553954cdd-c000.**zstd**.parquet ...

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
new test

Closes #36941 from Yikf/writeV2option.

Authored-by: Yikf <yikaifei1@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e5b7fb8)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Jun 23, 2022
…rage properties if fallback to v1

The option of DataFrameWriterV2 should be passed to storage properties if fallback to v1, to support something such as compressed formats

example:

`spark.range(0, 100).writeTo("t1").option("compression", "zstd").using("parquet").create`

**before**

gen: part-00000-644a65ed-0e7a-43d5-8d30-b610a0fb19dc-c000.**snappy**.parquet ...

**after**

gen: part-00000-6eb9d1ae-8fdb-4428-aea3-bd6553954cdd-c000.**zstd**.parquet ...

No

new test

Closes #36941 from Yikf/writeV2option.

Authored-by: Yikf <yikaifei1@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e5b7fb8)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
sunchao pushed a commit to sunchao/spark that referenced this pull request Jun 2, 2023
…rage properties if fallback to v1

The option of DataFrameWriterV2 should be passed to storage properties if fallback to v1, to support something such as compressed formats

example:

`spark.range(0, 100).writeTo("t1").option("compression", "zstd").using("parquet").create`

**before**

gen: part-00000-644a65ed-0e7a-43d5-8d30-b610a0fb19dc-c000.**snappy**.parquet ...

**after**

gen: part-00000-6eb9d1ae-8fdb-4428-aea3-bd6553954cdd-c000.**zstd**.parquet ...

No

new test

Closes apache#36941 from Yikf/writeV2option.

Authored-by: Yikf <yikaifei1@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e5b7fb8)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants