[SPARK-37575][SQL][FOLLOWUP] Add legacy flag for the breaking change of write null value in csv to unquoted empty string#36110
Conversation
|
Hi @cloud-fan @MaxGekk @wayneguow would you take a look? Thank you! |
|
I'm not sure this is a breaking change. It changes what to write to CSV for null values, which changes the query result (a correctness bug fix) but doesn't break queries. |
|
Can one of the admins verify this patch? |
|
After discussion with @cloud-fan , we still would like to treat it as a breaking change that needs a legacy flag. Hi @cloud-fan, could you review this PR? Thank you! The test job running: https://github.com/anchovYu/spark/actions/runs/2168649483 |
|
thanks, merging to master/3.3! |
…of write null value in csv to unquoted empty string ### What changes were proposed in this pull request? Add a legacy flag `spark.sql.legacy.nullValueWrittenAsQuotedEmptyStringCsv` for the breaking change introduced in #34853 and #34905 (followup). The flag is disabled by default, so the null values written as csv will output an unquoted empty string. When the legacy flag is enabled, the null will output quoted empty string. ### Why are the changes needed? The original commit is a breaking change, and breaking changes should be encouraged to add a flag to turn it off for smooth migration between versions. ### Does this PR introduce _any_ user-facing change? With the default value of the conf, there is no user-facing difference. If users turn this conf off, they can restore the pre-change behavior. ### How was this patch tested? Through unit tests. Closes #36110 from anchovYu/flags-null-to-csv. Authored-by: Xinyi Yu <xinyi.yu@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 965f872) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
|
Ah sorry I forgot about migration guide. @anchovYu can you open a followup PR to update https://github.com/apache/spark/pull/34905/files#diff-b6c16c7a1a4e60bacda4674b52f9eecde3c8a86f281fb3ef21052267f94c1f01R55 ? |
|
Oh sorry will do! |
|
The followup migration guide update: #36268 |
…acy flag for the breaking change of write null value in csv to unquoted empty string ### What changes were proposed in this pull request? This is a follow-up of updating the migration guide for #36110 which adds a legacy flag to restore the pre-change behavior. It also fixes a typo in the previous flag description. ### Why are the changes needed? The flag needs to be documented. ### Does this PR introduce _any_ user-facing change? It changes the migration doc for users. ### How was this patch tested? No tests Closes #36268 from anchovYu/flags-null-to-csv-migration-guide. Authored-by: Xinyi Yu <xinyi.yu@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…acy flag for the breaking change of write null value in csv to unquoted empty string ### What changes were proposed in this pull request? This is a follow-up of updating the migration guide for #36110 which adds a legacy flag to restore the pre-change behavior. It also fixes a typo in the previous flag description. ### Why are the changes needed? The flag needs to be documented. ### Does this PR introduce _any_ user-facing change? It changes the migration doc for users. ### How was this patch tested? No tests Closes #36268 from anchovYu/flags-null-to-csv-migration-guide. Authored-by: Xinyi Yu <xinyi.yu@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit a67acba) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Add a legacy flag
spark.sql.legacy.nullValueWrittenAsQuotedEmptyStringCsvfor the breaking change introduced in #34853 and #34905 (followup).The flag is disabled by default, so the null values written as csv will output an unquoted empty string. When the legacy flag is enabled, the null will output quoted empty string.
Why are the changes needed?
The original commit is a breaking change, and breaking changes should be encouraged to add a flag to turn it off for smooth migration between versions.
Does this PR introduce any user-facing change?
With the default value of the conf, there is no user-facing difference.
If users turn this conf off, they can restore the pre-change behavior.
How was this patch tested?
Through unit tests.