Escaping unescaped quotes in quoted strings modifies the data #326

Closed
bcalco opened this issue Sep 10, 2023 · 2 comments

Comments

@bcalco

bcalco commented Sep 10, 2023

Running the 'input' command on a CSV file with malformed quoted strings repairs them enough to be processed, but modifies the data inappropriately.

For example, the following problematic column value in one of our test files:

"Choices  "contact us" email address"

Note: the two spaces between "Choices" and "contact us" are in the original data.

Gets changed to:

"Choices  contact us"" email address"""

But it should be:

"Choices  ""contact us"" email address"

The command being run is:

xsv input <malformed-file> -o <target-file>

This is a very consistent error: although allowing processing of the data (i.e. conformant parsers now accept the files), it subtly (and unacceptably) changes the data in the process.
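
For readers who want to reproduce the effect without xsv, the sketch below uses Python's standard csv module as a stand-in. It is not xsv's parser, and the assumption that xsv's parser takes similar steps internally is only that, an assumption. The helper names (`lenient_roundtrip`, `is_strictly_valid`) are made up for this illustration. The sample value uses two spaces after "Choices", following the note in the report.

```python
import csv
import io

# Sample value from the report (two spaces after "Choices", per the note above).
MALFORMED = '"Choices  "contact us" email address"'


def lenient_roundtrip(line: str) -> str:
    """Parse one CSV line leniently, then write it back with standard quoting."""
    # strict=False is the csv module's default: a quote followed by ordinary
    # text ends the quoted part, and the rest is read as unquoted data in
    # which later quotes are literal characters.
    row = next(csv.reader([line]))
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerow(row)
    return out.getvalue().rstrip("\n")


def is_strictly_valid(line: str) -> bool:
    """Return True if a strict parser accepts the line."""
    try:
        list(csv.reader([line], strict=True))
        return True
    except csv.Error:
        return False


if __name__ == "__main__":
    print(is_strictly_valid(MALFORMED))   # False: strict parsing rejects the input
    repaired = lenient_roundtrip(MALFORMED)
    print(repaired)                       # "Choices  contact us"" email address"""
    print(is_strictly_valid(repaired))    # True: now accepted, but the value changed
```

In this stand-in, the first interior quote is consumed as the end of the quoted part, so it disappears from the value, while the remaining quotes become literal characters and are doubled on output. That matches the shape of the result described in the report.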

@BurntSushi
Owner

I understand the request, but it's not reasonable to support. If your data is malformed, then that's the problem you should fix. Because it is malformed, xsv cannot choose a correct interpretation in every case, and exposing options to control how different classes of malformed data are interpreted is not something I'm interested in doing.

> although allowing processing of the data

This is the goal that xsv has.

BurntSushi closed this as not planned on Sep 10, 2023
@bcalco
Author

bcalco commented Sep 10, 2023

The CSV parser author told me the same thing. lol.

The issue is, I don't own the data - I'm consuming third-party data. So I have to find a way to scrub it. But I understand your position.

If I run into a case where the changes xsv makes render the data more broken, or introduce a new error, then I'll file a new ticket.

Thanks for the prompt reply, anyway!
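
One possible way to scrub such data before handing it to a parser is a line-based pre-pass that doubles stray quotes inside quoted fields. The sketch below is a heuristic under stated assumptions, not something xsv or any CSV library provides: the function name `escape_inner_quotes` is hypothetical, fields are assumed not to contain embedded newlines, and a quote inside a quoted field is treated as closing only when it is followed by the delimiter or the end of the line.

```python
import csv


def escape_inner_quotes(line: str, delimiter: str = ",") -> str:
    """Heuristically double stray quotes inside quoted fields of one CSV line."""
    out = []
    in_quotes = False
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch != '"':
            out.append(ch)
            i += 1
            continue
        if not in_quotes:
            # A quote opens a quoted field only at the very start of a field.
            if i == 0 or line[i - 1] == delimiter:
                in_quotes = True
            out.append(ch)
            i += 1
            continue
        # Inside a quoted field: measure the run of consecutive quotes.
        j = i
        while j < n and line[j] == '"':
            j += 1
        run = j - i
        at_eol = j == n
        before_delim = not at_eol and line[j] == delimiter
        if at_eol or (before_delim and run % 2 == 1):
            # Field ends here: the run must be odd (escaped pairs plus the
            # closing quote), so pad an even run by one quote.
            out.append('"' * (run if run % 2 == 1 else run + 1))
            in_quotes = False
        elif before_delim:
            # Even run before a delimiter: read as escaped pairs, field continues.
            out.append('"' * run)
        else:
            # Mid-field: the run must be even (escaped pairs), so pad an odd run.
            out.append('"' * (run if run % 2 == 0 else run + 1))
        i = j
    return "".join(out)


if __name__ == "__main__":
    sample = '"Choices  "contact us" email address"'
    fixed = escape_inner_quotes(sample)
    print(fixed)  # "Choices  ""contact us"" email address"
    # A strict parser now accepts the line and recovers the intended value.
    print(next(csv.reader([fixed], strict=True)))
```

The heuristic cannot resolve every case, which is the point made earlier in the thread. A line such as `"x "," y"` is left unchanged and read as two fields, even if the source meant a single field containing `","`, and a stray quote immediately before a closing quote and a delimiter is not recoverable this way. Already valid lines pass through unchanged.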
