Skip to content

[SPARK-37575][SQL][FOLLOWUP] Add legacy flag for the breaking change of write null value in csv to unquoted empty string#36110

Closed
anchovYu wants to merge 3 commits intoapache:masterfrom
anchovYu:flags-null-to-csv
Closed

[SPARK-37575][SQL][FOLLOWUP] Add legacy flag for the breaking change of write null value in csv to unquoted empty string#36110
anchovYu wants to merge 3 commits intoapache:masterfrom
anchovYu:flags-null-to-csv

Conversation

@anchovYu
Copy link
Contributor

@anchovYu anchovYu commented Apr 7, 2022

What changes were proposed in this pull request?

Add a legacy flag spark.sql.legacy.nullValueWrittenAsQuotedEmptyStringCsv for the breaking change introduced in #34853 and #34905 (followup).

The flag is disabled by default, so the null values written as csv will output an unquoted empty string. When the legacy flag is enabled, the null will output quoted empty string.

Why are the changes needed?

The original commit is a breaking change, and breaking changes should be encouraged to add a flag to turn it off for smooth migration between versions.

Does this PR introduce any user-facing change?

With the default value of the conf, there is no user-facing difference.
If users turn this conf off, they can restore the pre-change behavior.

How was this patch tested?

Through unit tests.

@github-actions github-actions bot added the SQL label Apr 7, 2022
@anchovYu anchovYu changed the title [WIP][SPARK-37575][SQL][FOLLOWUP] Add legacy flag for the breaking change of write null value in csv to unquoted empty string [SPARK-37575][SQL][FOLLOWUP] Add legacy flag for the breaking change of write null value in csv to unquoted empty string Apr 8, 2022
@anchovYu anchovYu marked this pull request as ready for review April 8, 2022 00:07
@anchovYu
Copy link
Contributor Author

anchovYu commented Apr 8, 2022

Hi @cloud-fan @MaxGekk @wayneguow would you take a look? Thank you!

@cloud-fan
Copy link
Contributor

I'm not sure this is a breaking change. It changes what to write to CSV for null values, which changes the query result (a correctness bug fix) but doesn't break queries.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@anchovYu
Copy link
Contributor Author

After discussion with @cloud-fan , we still would like to treat it as a breaking change that needs a legacy flag. Hi @cloud-fan, could you review this PR? Thank you!

The test job running: https://github.com/anchovYu/spark/actions/runs/2168649483

@cloud-fan
Copy link
Contributor

thanks, merging to master/3.3!

@cloud-fan cloud-fan closed this in 965f872 Apr 15, 2022
cloud-fan pushed a commit that referenced this pull request Apr 15, 2022
…of write null value in csv to unquoted empty string

### What changes were proposed in this pull request?

Add a legacy flag `spark.sql.legacy.nullValueWrittenAsQuotedEmptyStringCsv` for the breaking change introduced in #34853 and #34905 (followup).

The flag is disabled by default, so the null values written as csv will output an unquoted empty string. When the legacy flag is enabled, the null will output quoted empty string.

### Why are the changes needed?
The original commit is a breaking change, and breaking changes should be encouraged to add a flag to turn it off for smooth migration between versions.

### Does this PR introduce _any_ user-facing change?
With the default value of the conf, there is no user-facing difference.
If users turn this conf off, they can restore the pre-change behavior.

### How was this patch tested?
Through unit tests.

Closes #36110 from anchovYu/flags-null-to-csv.

Authored-by: Xinyi Yu <xinyi.yu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 965f872)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan
Copy link
Contributor

Ah sorry I forgot about migration guide. @anchovYu can you open a followup PR to update https://github.com/apache/spark/pull/34905/files#diff-b6c16c7a1a4e60bacda4674b52f9eecde3c8a86f281fb3ef21052267f94c1f01R55 ?

@anchovYu
Copy link
Contributor Author

Oh sorry will do!

@anchovYu
Copy link
Contributor Author

The followup migration guide update: #36268

cloud-fan pushed a commit that referenced this pull request Apr 20, 2022
…acy flag for the breaking change of write null value in csv to unquoted empty string

### What changes were proposed in this pull request?
This is a follow-up of updating the migration guide for #36110 which adds a legacy flag to restore the pre-change behavior.
It also fixes a typo in the previous flag description.

### Why are the changes needed?
The flag needs to be documented.

### Does this PR introduce _any_ user-facing change?
It changes the migration doc for users.

### How was this patch tested?
No tests

Closes #36268 from anchovYu/flags-null-to-csv-migration-guide.

Authored-by: Xinyi Yu <xinyi.yu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Apr 20, 2022
…acy flag for the breaking change of write null value in csv to unquoted empty string

### What changes were proposed in this pull request?
This is a follow-up of updating the migration guide for #36110 which adds a legacy flag to restore the pre-change behavior.
It also fixes a typo in the previous flag description.

### Why are the changes needed?
The flag needs to be documented.

### Does this PR introduce _any_ user-facing change?
It changes the migration doc for users.

### How was this patch tested?
No tests

Closes #36268 from anchovYu/flags-null-to-csv-migration-guide.

Authored-by: Xinyi Yu <xinyi.yu@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit a67acba)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants