feature: auto-quote zero padded numeric values in CSV #3597

@kulnor

I regularly run into CSV files containing zero-padded numbers that are not properly quoted. These can be identifiers, multi-digit codes, US ZIP codes, etc. For example:

person_id,us_zip,education_level
0001,00123,03
0002,01234,08
0003,12345,99

When software reads such a file, these values are inferred as a numeric data type (rightfully so) and the leading zeros are dropped, causing various issues (invalid values, data loss, mismatched codes, incorrect data types, etc.).
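To make the failure mode concrete, here is a minimal Python sketch of what a naive type inferrer does to the sample above. It is only an illustration: `naive_parse` is a hypothetical helper, not how qsv or any particular library works.

```python
import csv
import io

# Sample data from the example above.
RAW = (
    "person_id,us_zip,education_level\n"
    "0001,00123,03\n"
    "0002,01234,08\n"
    "0003,12345,99\n"
)


def naive_parse(text):
    """Convert every all-digit field to int, as a naive inferrer might."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for record in reader:
        rows.append(
            [int(v) if v.isascii() and v.isdigit() else v for v in record]
        )
    return header, rows


header, rows = naive_parse(RAW)
print(rows[0])  # [1, 123, 3]
```

The zip code `00123` comes back as `123`, and nothing downstream can tell how many zeros were lost.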

The proper behaviour is to surround these with quotes, like:

person_id,us_zip,education_level
"0001","00123","03"
"0002","01234","08"
"0003","12345","99"

So... could such a quality-of-life utility be added to our favorite QSV toolkit? It would essentially parse a CSV file and either update it or create a new properly encoded version.
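As a rough sketch of what I have in mind (in Python rather than in qsv itself, and with a hypothetical function name), the utility could make two passes: first find the columns that contain at least one zero-padded value, then quote every data value in those columns. The sketch assumes the first row is a header, treats a value as zero-padded only if it is all ASCII digits with a leading `0` and at least two characters, and does not preserve quoting that was already present on other fields.

```python
import csv
import io
import re

# All digits, leading zero, at least two characters (a bare "0" is left alone).
ZERO_PADDED = re.compile(r"0[0-9]+")


def quote_zero_padded_columns(text, delimiter=","):
    """Return CSV text with every value quoted in zero-padded columns."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ""
    header, data = rows[0], rows[1:]

    # Pass 1: find columns where at least one value is zero-padded.
    padded = set()
    for record in data:
        for i, value in enumerate(record):
            if ZERO_PADDED.fullmatch(value):
                padded.add(i)

    def render(i, value, is_header):
        # Quote when the column is flagged, or when CSV syntax requires it.
        force = (not is_header) and i in padded
        special = any(ch in value for ch in (delimiter, '"', "\n", "\r"))
        if force or special:
            return '"' + value.replace('"', '""') + '"'
        return value

    # Pass 2: write the header as-is and quote the flagged columns.
    lines = [delimiter.join(render(i, v, True) for i, v in enumerate(header))]
    for record in data:
        lines.append(
            delimiter.join(render(i, v, False) for i, v in enumerate(record))
        )
    return "\n".join(lines) + "\n"
```

Run on the first sample above, this produces the quoted version shown just before. Quoting the whole column, rather than only the padded values, keeps values such as `12345` and `99` consistent with their neighbours.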

Unfortunately, I do not think there is a fix for this in tab-delimited or other delimited files (or fixed-width ASCII); the option there would be to convert to CSV (which such a utility could also properly take care of?).

Beyond this, parsers and other inference routines could be instructed to take this into account (which typically requires a double pass or a scan of the top rows).
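For the inference side, a top-rows scan could look something like the sketch below. This is again plain Python with hypothetical names (`infer_column_types`, `sample_rows`), not an existing qsv option. It assumes a header row is present and only distinguishes "integer" from "string".

```python
import csv
import io
import re
from itertools import islice

ZERO_PADDED = re.compile(r"0[0-9]+")  # e.g. "007", "00123"
INTEGER = re.compile(r"-?[0-9]+")


def infer_column_types(text, sample_rows=1000):
    """Scan the top rows; any zero-padded value makes its column a string."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    is_integer = [True] * len(header)   # assume integer until disproved
    seen_value = [False] * len(header)  # columns with no values stay strings

    for record in islice(reader, sample_rows):
        for i, value in enumerate(record[: len(header)]):
            if value == "":
                continue  # empty cells say nothing about the type
            seen_value[i] = True
            if ZERO_PADDED.fullmatch(value) or not INTEGER.fullmatch(value):
                is_integer[i] = False

    return {
        name: "integer" if is_integer[i] and seen_value[i] else "string"
        for i, name in enumerate(header)
    }
```

The trade-off is the usual one for sampling: a zero-padded value that first appears after `sample_rows` rows is missed, which is why a full double pass is the safer (but slower) option.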

Thoughts?
