Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Passing through comments with --pass-comments may modify comments in CSV data #1346

Open
kusalananda opened this issue Aug 6, 2023 · 5 comments

Comments

@kusalananda
Copy link

When parsing a CSV data set that contains comments (lines starting with #), the --pass-comments option should ("I expect it to") cause Miller to pass the comment through unchanged. However, if such a comment line contains the input field delimiter the comment is changed to include quotes.

Example command:

mlr --csv --pass-comments cat <<'EXAMPLE_DATA'
field1,field2
a,b
# that was the first record
c,d
# that was the second record, and there is no more data
EXAMPLE_DATA

This outputs the following (note the changed comment line at the end):

field1,field2
a,b
# that was the first record
c,d
# that was the second record," and there is no more data"

Version tested is mlr version 6.8.0-dev for openbsd/amd64/go1.20.1.

@johnkerl johnkerl changed the title Passing through comments with --pass-comments may modify comments Passing through comments with --pass-comments may modify comments Aug 9, 2023
@johnkerl
Copy link
Owner

@kusalananda I wish I had a better answer here.

The way CSV records are detected are by handing them off to the CSV-parser library. And what it gives back is its own output, with no access to the underlying/original data.

In Miller 5 and below, when Miller was in C, I wrote my own CSV parser and could do whatever with it. In Miller 6 and above, which is in Go, I chose to use the https://pkg.go.dev/encoding/csv package, because hey, standard libraries are better, right? But in fact I've had to do all manner of workarounds for weird issues with it. It tempts me to go back and hand-roll my own CSV parser in Go --- basically, porting the Miller <= 5 CSV parser from C to Go.

That would be a fun project, and I'd enjoy it a lot. However, it would take a bit of time ....

@johnkerl johnkerl self-assigned this Aug 19, 2023
@johnkerl johnkerl removed their assignment Aug 19, 2023
@kusalananda
Copy link
Author

Hi there John,

My initial reason for submitting this issue here was me noticing it in passing while helping someone with an issue on the Unix&Linux StackExchange site (where I am a moderator and a frequent proponent of Miller). I noticed that the user's comment was modified in the way I mentioned here when I applied my suggested Miller solution to their data.

So, since it's an issue that I noticed in the result of a one-off short throwaway command when applied to what I think is example data, I'm not personally invested in or concerned about this. It's more of a "by the way, did you know" sort of bug report.

If you want to re-implement the CSV parser in our own Go code (after making sure no other CSV parsing library provides you with any hooks you can use for solving this), go ahead and do that. That might be an interesting project, but it may also give you more future headaches. I don't know.

Thanks, and be well!

@johnkerl
Copy link
Owner

@kusalananda thanks! I do want to make this change -- at some point in time -- when time allows. And I really appreciate you surfacing it. The software should not mutate comments. :)

@janxkoci
Copy link

Why not report the issue upstream so the library can be fixed instead? And when it's updated, miller can get the fix by updating its dependency.

@johnkerl
Copy link
Owner

@janxkoci 🤔 generally I would agree, but here, I'm using the Go CSV-parsing library in a way it really wasn't intended ... I think this codemod should stay in Miller.

@johnkerl johnkerl changed the title Passing through comments with --pass-comments may modify comments Passing through comments with --pass-comments may modify comments in CSV data Jan 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants