[SPARK-39543] The option of DataFrameWriterV2 should be passed to storage properties if fallback to v1 #36941

yikf · 2022-06-21T11:48:46Z

What changes were proposed in this pull request?

Options set on DataFrameWriterV2 should be passed through to the storage properties when the write falls back to v1, so that settings such as the compression codec take effect.

Why are the changes needed?

example:

spark.range(0, 100).writeTo("t1").option("compression", "zstd").using("parquet").create

before

gen: part-00000-644a65ed-0e7a-43d5-8d30-b610a0fb19dc-c000.snappy.parquet ...

after

gen: part-00000-6eb9d1ae-8fdb-4428-aea3-bd6553954cdd-c000.zstd.parquet ...
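To compare the before/after output above programmatically, a minimal sketch along these lines could read the codec segment out of a part-file name. The object and method names are hypothetical, and the `<name>.<codec>.parquet` suffix is only a naming convention observed in the two file names above, not an authoritative check of how the file was actually compressed.

```scala
object ParquetCodecName {
  // Matches "<anything>.<codec>.parquet"; the codec is the segment just
  // before the ".parquet" extension (assumption based on the names above).
  private val CodecPattern = """.*\.([A-Za-z0-9_]+)\.parquet""".r

  // Returns the lower-cased codec segment, or None when the file name
  // carries no such segment.
  def codecOf(fileName: String): Option[String] = fileName match {
    case CodecPattern(codec) => Some(codec.toLowerCase)
    case _                   => None
  }
}
```

Applied to the two file names above, this would yield `snappy` for the "before" file and `zstd` for the "after" file.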

Does this PR introduce any user-facing change?

No

How was this patch tested?

new test

yikf · 2022-06-21T11:54:05Z

Could you please take a look when you have time? Thanks, @cloud-fan

cloud-fan · 2022-06-21T14:56:17Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala

@@ -531,6 +532,15 @@ class DataFrameWriterV2Suite extends QueryTest with SharedSparkSession with Befo
    assert(table.properties === (Map("provider" -> "foo") ++ defaultOwnership).asJava)
  }

+  test("SPARK-39543 writeOption should be passed to storage properties when fallback to v1") {
+    spark.range(10).writeTo("table_name").option("compression", "zstd").using("parquet").create()


can we test with InMemoryV1Provider? We may migrate the file source to v2 completely in the future and then this test won't test the v1 fallback anymore.

cloud-fan · 2022-06-22T05:12:59Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala

+  test("SPARK-39543 writeOption should be passed to storage properties when fallback to v1") {
+    val provider = classOf[InMemoryV1Provider].getName
+    val oldConf = spark.sessionState.conf.getConf(SQLConf.USE_V1_SOURCE_LIST)
+      try {


nit: we can use withSQLConf
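For context on this nit, the set-then-restore pattern that a helper like withSQLConf wraps can be sketched in plain Scala. This is a simplified stand-in and not Spark's implementation: `ConfScope`, `conf`, and `withConf` are hypothetical names, and a mutable map plays the role of the session configuration.

```scala
import scala.collection.mutable

object ConfScope {
  // Hypothetical stand-in for a session configuration; not Spark's SQLConf.
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Sets the given key/value pairs, runs the body, then restores the
  // previous values (or removes keys that were unset) even if body throws.
  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    val previous = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally previous.foreach {
      case (k, Some(old)) => conf(k) = old
      case (k, None)      => conf.remove(k)
    }
  }
}
```

Compared with a hand-written `try`/`finally` in each test, the loan-style helper keeps the save and restore logic in one place, so a test cannot forget to reset the setting.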

AmplabJenkins · 2022-06-22T11:07:18Z

Can one of the admins verify this patch?

cloud-fan · 2022-06-22T14:57:56Z

sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala

+  test("SPARK-39543 writeOption should be passed to storage properties when fallback to v1") {
+    val provider = classOf[InMemoryV1Provider].getName
+
+    withSQLConf((SQLConf.USE_V1_SOURCE_LIST.key, provider)) {


looking at other tests that use InMemoryV1Provider, it seems we don't need to set USE_V1_SOURCE_LIST to trigger v1 fallback?

Yes, other tests trigger the v1 fallback without setting USE_V1_SOURCE_LIST. AFAIK:

Those tests aim to exercise the read/write path. InMemoryV1Provider is actually a v2 format, and the v1 fallback is triggered at the newScanBuilder and newWriteBuilder layer.

The test in this PR needs to fall back to v1 when the table is created, so we need to set USE_V1_SOURCE_LIST (see isV2Provider).
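The distinction can be illustrated with a simplified model of a list-based check. This is a sketch under assumptions and not Spark's isV2Provider: the object and method names are hypothetical, the list is assumed to be a comma-separated string of provider names, and the provider is assumed to implement the v2 API so that membership in the list is the only deciding factor.

```scala
import java.util.Locale

object V1FallbackSketch {
  // Splits a comma-separated list of provider names into a normalized set.
  def parseV1SourceList(value: String): Set[String] =
    value.toLowerCase(Locale.ROOT).split(",").map(_.trim).filter(_.nonEmpty).toSet

  // In this simplified model, a provider is treated as v2 at table-creation
  // time only if its name is absent from the v1 source list.
  def treatedAsV2(providerName: String, useV1SourceList: String): Boolean =
    !parseV1SourceList(useV1SourceList).contains(providerName.toLowerCase(Locale.ROOT))
}
```

Under this model, adding the provider's class name to the list is what forces the v1 path when the table is created, which matches the reason given above for setting USE_V1_SOURCE_LIST in the test.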

cloud-fan · 2022-06-23T05:04:01Z

thanks, merging to master/3.3/3.2!

…rage properties if fallback to v1

### What changes were proposed in this pull request?

The option of DataFrameWriterV2 should be passed to storage properties if fallback to v1, to support something such as compressed formats

### Why are the changes needed?

example:

`spark.range(0, 100).writeTo("t1").option("compression", "zstd").using("parquet").create`

**before**

gen: part-00000-644a65ed-0e7a-43d5-8d30-b610a0fb19dc-c000.**snappy**.parquet ...

**after**

gen: part-00000-6eb9d1ae-8fdb-4428-aea3-bd6553954cdd-c000.**zstd**.parquet ...

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

new test

Closes #36941 from Yikf/writeV2option.

Authored-by: Yikf <yikaifei1@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e5b7fb8)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

github-actions bot added the SQL label Jun 21, 2022

cloud-fan reviewed Jun 21, 2022

cloud-fan reviewed Jun 22, 2022

cloud-fan approved these changes Jun 22, 2022

SPARK-39543 writeOption should be passed to storage properties when fallback to v1 (fa482a0)

cloud-fan reviewed Jun 22, 2022

cloud-fan closed this in e5b7fb8 Jun 23, 2022
