
[SPARK-56429][DOCS] Clarify differences between nullValue and emptyValue CSV options #55405

Open

yadavay-amzn wants to merge 1 commit into apache:master from yadavay-amzn:fix/SPARK-56429-csv-docs


yadavay-amzn commented Apr 18, 2026

What changes were proposed in this pull request?

Update the CSV data source documentation to clarify how nullValue, emptyValue, and nanValue differ from each other:

  • nullValue: when this exact string is encountered in the CSV input, Spark treats the field as SQL NULL.
  • emptyValue: when a quoted empty string ("") is encountered, Spark substitutes this value for it. This only applies to string-type columns.
  • nanValue: when this string is encountered, Spark treats it as NaN for float/double columns.
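To make the distinction concrete, here is a minimal pure-Python sketch of how a single field would be resolved under the three options as described above. It is a simplified model, not Spark's implementation: resolve_field and its parameters are hypothetical, the caller is assumed to pass the field text with its quotes already stripped plus a flag saying whether it was quoted, and edge cases such as nullValue being set to the empty string are not modeled.

```python
import math
from typing import Optional, Union

# Hypothetical model, not Spark code: the value a field resolves to.
Value = Optional[Union[str, float]]


def resolve_field(raw: str, quoted: bool, col_type: str,
                  null_value: str, empty_value: str, nan_value: str) -> Value:
    """Resolve one CSV field following the option semantics described above.

    raw is the field text with surrounding quotes already removed; quoted
    says whether the field was enclosed in quotes in the input.
    """
    # emptyValue: a quoted empty string ("") is replaced by emptyValue,
    # but only for string columns.
    if col_type == "string" and quoted and raw == "":
        return empty_value
    # nullValue: an exact match on the input text becomes SQL NULL (None here).
    if raw == null_value:
        return None
    if col_type in ("float", "double"):
        # nanValue: an exact match becomes NaN for float/double columns.
        if raw == nan_value:
            return math.nan
        return float(raw)
    return raw


opts = dict(null_value="NA", empty_value="EMPTY", nan_value="NaN")
print(resolve_field("NA", False, "string", **opts))   # None
print(resolve_field("", True, "string", **opts))      # EMPTY
print(resolve_field("NaN", False, "double", **opts))  # nan
```

The order of the checks is a choice made for this sketch; it does not claim to reproduce Spark's exact precedence when the options overlap.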

Why are the changes needed?

The previous descriptions used the same pattern ("Sets the string representation of ...") for all three options. This is misleading because nullValue matches an input string to produce null, while emptyValue specifies the output value that is substituted. See SPARK-56429.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Documentation-only change.

Was this patch authored or co-authored using generative AI tooling?

Yes.

yadavay-amzn force-pushed the fix/SPARK-56429-csv-docs branch from fcd8c53 to 1c8fcd5 on April 18, 2026 05:31
…lue CSV options

Update the CSV data source documentation to better explain how
nullValue, emptyValue, and nanValue differ from each other and
when each option applies.
yadavay-amzn force-pushed the fix/SPARK-56429-csv-docs branch from 1c8fcd5 to 11627ee on April 21, 2026 01:38

waterlx (Contributor) commented Apr 23, 2026

@yadavay-amzn Thanks for the patch! Could you take a look at my comment and see whether it makes sense to you?

yadavay-amzn (Author) replied:

@waterlx Replied to the JIRA comment asking for feedback from the previous developers who worked on this.

It looks to me like PR #22234 is unrelated to SPARK-56429, so we still need this PR to fix the issue. Maybe I'm missing something?

waterlx (Contributor) commented Apr 30, 2026

@yadavay-amzn Sorry if I confused you; I did not quite follow your reply.
Yes, PR #22234 is unrelated to the PR you are working on, and I do think your PR does a great job of removing the vagueness in the current docs, especially for newcomers (like me).

I am not sure whether my second comment confused you; sorry if it did. Let me explain.
There are 2 cases when specifying .options("nullValue", "NA"):
(1) An unquoted NA in the CSV will be read as null.
(2) NA enclosed in the character set by the "quote" option, for example "NA", will also be read as null. "quote" is another option that can be specified when reading CSV; see https://spark.apache.org/docs/latest/sql-data-sources-csv.html

My point is: could you consider adding (2) to your PR? Does that make sense to you?
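Case (2) is what you would expect if the CSV tokenizer strips the quote characters before the field is compared with nullValue. The sketch below illustrates that ordering, with Python's standard csv module standing in for Spark's CSV parser (an assumption made for illustration; read_row is a hypothetical helper, not a Spark API).

```python
import csv
from typing import List, Optional


def read_row(line: str, null_value: str) -> List[Optional[str]]:
    """Tokenize one CSV line, then map fields equal to null_value to None.

    Hypothetical helper: Python's csv module stands in for Spark's CSV parser.
    """
    # The reader removes the enclosing quote characters, so "NA" arrives as NA
    # and the nullValue comparison only ever sees the unquoted text.
    fields = next(csv.reader([line], quotechar='"'))
    return [None if field == null_value else field for field in fields]


print(read_row('NA,"NA",x', null_value="NA"))  # [None, None, 'x']
```

Because the comparison happens after quote removal in this model, a quoted and an unquoted NA are indistinguishable by the time nullValue is checked.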

waterlx (Contributor) commented Apr 30, 2026

@mmolimar This is regarding #22234 and nullValue when reading CSV. I happened to find that a nullValue string enclosed in the quote character is also read as null. For example, .options("nullValue", "NA") causes "NA" (quote defaults to a double quote) to be read as null, in addition to NA. Could you comment on whether this matches your original design?

@MaxGekk @HyukjinKwon Could you review this PR by @yadavay-amzn, since you also worked on or reviewed #22234? Thank you!
