[SPARK-39543] The option of DataFrameWriterV2 should be passed to storage properties if fallback to v1 #36941
Conversation
Could you please take a look when you have time, thanks @cloud-fan
```scala
@@ -531,6 +532,15 @@ class DataFrameWriterV2Suite extends QueryTest with SharedSparkSession with Befo
    assert(table.properties === (Map("provider" -> "foo") ++ defaultOwnership).asJava)
  }

  test("SPARK-39543 writeOption should be passed to storage properties when fallback to v1") {
    spark.range(10).writeTo("table_name").option("compression", "zstd").using("parquet").create()
```
can we test with `InMemoryV1Provider`? We may migrate the file source to v2 completely in the future and then this test won't test the v1 fallback anymore.
thanks
```scala
test("SPARK-39543 writeOption should be passed to storage properties when fallback to v1") {
  val provider = classOf[InMemoryV1Provider].getName
  val oldConf = spark.sessionState.conf.getConf(SQLConf.USE_V1_SOURCE_LIST)
  try {
```
nit: we can use `withSQLConf`
Updated
Can one of the admins verify this patch?
```scala
test("SPARK-39543 writeOption should be passed to storage properties when fallback to v1") {
  val provider = classOf[InMemoryV1Provider].getName

  withSQLConf((SQLConf.USE_V1_SOURCE_LIST.key, provider)) {
```
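Putting the thread's suggestions together, the final shape of the test might plausibly look like the sketch below. This is reconstructed from the review discussion, not the merged code: the assertion against the catalog table's storage properties is an assumption about how the fix is verified.

```scala
test("SPARK-39543 writeOption should be passed to storage properties when fallback to v1") {
  // Force InMemoryV1Provider onto the v1 path via USE_V1_SOURCE_LIST,
  // so that create() goes through the v1 fallback being tested.
  val provider = classOf[InMemoryV1Provider].getName
  withSQLConf((SQLConf.USE_V1_SOURCE_LIST.key, provider)) {
    spark.range(10)
      .writeTo("table_name")
      .option("compression", "zstd")
      .using(provider)
      .create()
    // Assumed check: after the v1 fallback, the writer option should have
    // landed in the catalog table's storage properties.
    val table = spark.sessionState.catalog
      .getTableMetadata(TableIdentifier("table_name"))
    assert(table.storage.properties.get("compression") === Some("zstd"))
  }
}
```

`withSQLConf` restores the previous value of `USE_V1_SOURCE_LIST` on exit, which is why the reviewer preferred it over the manual `getConf`/`try` bookkeeping in the earlier revision.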
looking at other tests that use `InMemoryV1Provider`, it seems we don't need to set `USE_V1_SOURCE_LIST` to trigger v1 fallback?
Yea, other tests trigger v1 fallback without setting `USE_V1_SOURCE_LIST`. AFAIK:

- Other tests aim to test the read/write process; `InMemoryV1Provider` is actually a v2 format, and v1 fallback is triggered at the `newScanBuilder` & `newWriteBuilder` layer.
- The test in this PR needs to fall back to v1 when the table is created, so we need to set `USE_V1_SOURCE_LIST` (see: `isV2Provider`)
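To make the fallback condition in that last point concrete, the gating idea can be paraphrased as a tiny predicate. This is a sketch of the concept only, not Spark's actual `isV2Provider` implementation:

```scala
// Sketch: a provider configured in spark.sql.sources.useV1SourceList is
// treated as a v1 source at table-creation time, which is what forces
// DataFrameWriterV2.create() down the v1 fallback path in this PR's test.
def isV2Provider(provider: String, useV1SourceList: Seq[String]): Boolean =
  !useV1SourceList.exists(_.equalsIgnoreCase(provider))
```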
thanks, merging to master/3.3/3.2!
…rage properties if fallback to v1

### What changes were proposed in this pull request?

The option of DataFrameWriterV2 should be passed to storage properties if fallback to v1, to support something such as compressed formats

### Why are the changes needed?

example: `spark.range(0, 100).writeTo("t1").option("compression", "zstd").using("parquet").create`

**before** gen: part-00000-644a65ed-0e7a-43d5-8d30-b610a0fb19dc-c000.**snappy**.parquet ...

**after** gen: part-00000-6eb9d1ae-8fdb-4428-aea3-bd6553954cdd-c000.**zstd**.parquet ...

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

new test

Closes #36941 from Yikf/writeV2option.

Authored-by: Yikf <yikaifei1@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e5b7fb8)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>