Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[encoding/csv] Make the columns option more permissive while parsing #3225

Closed
devingfx opened this issue Mar 1, 2023 · 6 comments
Closed
Labels
enhancement New feature or request

Comments

@devingfx
Copy link

devingfx commented Mar 1, 2023

Describe the bug

If columns option is given to the parser (or the same with skipFirstRow: true) and a line doesn't contains the good number of columns an error is thrown and the whole process abord.
Sometimes files doesn't contains all the columns, especilly without quotes and with TSV separator, and it's ok and good practice to reduce the file size...

Steps to Reproduce

const content = `A B C
ok ok ok
still ok
 not first

first-only
`
// even with a line with empty columns should be parsed

try {
    let dataBeWithoutError = CSV.parse( content, { separator: ' ', skipFirstRow: true })
}
catch( happyness )
{
    console.error( 'cry you fool! and do it by hand' )
}

Expected behavior

Put undefined on undefined columns and don't throw !!

[
    { A: 'ok', B: 'ok', C: 'ok' },
    { A: 'still', B: 'ok', C: undefined },
    { A: '', B: 'not', C: 'first' },
    { A: undefined, B: undefined, C: undefined },
    { A: 'first-only', B: undefined, C: undefined },
]

Notice I distinguish empty strings columns and undefined columns like the 1st one of 3rd line

@devingfx devingfx added bug Something isn't working needs triage labels Mar 1, 2023
@kt3k kt3k added enhancement New feature or request and removed bug Something isn't working needs triage labels Mar 7, 2023
@sigmaSd
Copy link
Contributor

sigmaSd commented Mar 13, 2023

According to the spec https://csv-spec.org/ (4th point)

Each record MUST contain the same number of fields

@sigmaSd
Copy link
Contributor

sigmaSd commented Mar 13, 2023

I don't expect JSON.parse to parse invalid json and the same is with csv parser, my opinion is that the user should preprocess the file to make it a valid csv before trying to parse it

@timreichen
Copy link
Contributor

I agree with @sigmaSd. Throwing an error when trying to parse an invalid string is the correct and expected behavior.

@luk3skyw4lker
Copy link
Contributor

Yeah, I'm with @sigmaSd and @timreichen

This isn't a good and proper behavior when parsing files, if any data is invalid the package should throw an error to warn the user that the data type is invalid. This treatment should be done before any parsing.

@jgusta
Copy link

jgusta commented Apr 26, 2023

Shouldn't a valid empty field parse to an empty string, rather than undefined, anyway?

All fields are always strings. CSV itself does not support type casting to integers, floats, booleans, or anything else. It is not a CSV library’s responsibility to type cast input CSV data.

@iuioiua
Copy link
Collaborator

iuioiua commented Dec 8, 2023

I agree with the others here. The implementation shouldn't be retrofitted to work with incorrect data. Instead, the data needs to be correct. Thank you @sigmaSd, @timreichen and @luk3skyw4lker.

@iuioiua iuioiua closed this as not planned Won't fix, can't repro, duplicate, stale Dec 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

7 participants