feat(encoding/csv): handle CSV byte-order marks #3143
Merged
- Adds a `bom` option to `CSVStringifyOptions`, which prepends `U+FEFF` (BOM/byte-order mark) to the output
- Strips a leading BOM when parsing, even if the `trimLeadingSpace` option is not set

This is largely for compatibility with MS Excel's handling of CSV files containing Unicode text: without a BOM, `文,字\r\n` is displayed as `æ–‡ å—`, rather than the expected `文 字`.

For comparison, Google Sheets:
The current behavior of `std/encoding/csv` is to treat the BOM as a literal character when reading and to always omit it when writing. This PR makes the new default behavior the same as what Google Sheets does (at least for UTF-8 files): read input either with or without a BOM, but still always omit the BOM when writing. Alternatively, when the new `bom: true` option is supplied to `stringify`, a BOM is also prepended when writing.

For completeness, some other possible behaviors could be:
- Strip the BOM when parsing only if some option is set (e.g. `trimLeadingSpace` or a dedicated `trimBom` option). This seems like a bad idea, as certain valid UTF-8 CSVs generated by Excel would cause a parser error (e.g. `${BYTE_ORDER_MARK}"a""b"\r\n`), while others would contain invalid data in the first cell (e.g. ``/\D/.test(parse(`${BYTE_ORDER_MARK}123\r\n`)[0][0]) == true``).
- Default `bom` in `StringifyOptions` to `true`. The advantage would be ensuring round-trip compatibility with Excel given default options. However, that would be a breaking change, and it seems uncalled for to choose a default solely because of Excel's bad behavior.
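
To make the chosen behavior concrete, here is a minimal standalone sketch of the idea: always tolerate a leading BOM when reading, and only emit one when asked. It does not use the actual `std/encoding/csv` implementation; `stripBom`, `parseRows`, and `stringifyRows` are hypothetical helpers, and the parsing is deliberately naive (no quoting or escaping).

```typescript
const BYTE_ORDER_MARK = "\uFEFF";

// Hypothetical helper: drop a single leading BOM, if present.
function stripBom(text: string): string {
  return text.startsWith(BYTE_ORDER_MARK) ? text.slice(1) : text;
}

// Hypothetical, deliberately naive parser: splits on CRLF and commas only.
// It accepts input with or without a leading BOM.
function parseRows(text: string): string[][] {
  return stripBom(text)
    .split("\r\n")
    .filter((line) => line.length > 0)
    .map((line) => line.split(","));
}

// Hypothetical, deliberately naive stringifier: no quoting or escaping.
// A BOM is prepended only when `bom: true` is passed.
function stringifyRows(
  rows: string[][],
  options: { bom?: boolean } = {},
): string {
  const body = rows.map((row) => row.join(",") + "\r\n").join("");
  return (options.bom ? BYTE_ORDER_MARK : "") + body;
}

const withBom = stringifyRows([["123"]], { bom: true });
const withoutBom = stringifyRows([["123"]]);

// Both variants parse to the same data, so the first cell stays purely numeric.
console.log(/\D/.test(parseRows(withBom)[0][0]));
console.log(parseRows(withBom)[0][0] === parseRows(withoutBom)[0][0]);
```

Stripping unconditionally on read keeps the first cell clean regardless of which tool wrote the file, while leaving `bom` off by default on write avoids changing existing output.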