Is there an option to write without CSV escapes? #107

Open
jondegenhardt opened this issue Nov 9, 2017 · 8 comments

Comments

@jondegenhardt

jondegenhardt commented Nov 9, 2017

A question: Is there an option to perform output without the CSV escape syntax? This would be to generate a more strict TSV format, without escapes.

I don't see this, and the documentation is pretty good. I'd just like to make sure I haven't missed something. There are a number of options to the fmt command that provide control over the escaping used, but I didn't see one turning it off.

Some examples:

$ # fmt -t will change the delimiter and drop surrounding quotes (without --quote-always)
$ echo '"abc","def"' | xsv fmt -t $'\t'
abc	def

$ # Escapes are generated if a field contains a quote
$ echo '"abc","d""ef"' | xsv fmt -t $'\t'
abc	"d""ef"

$ # In tsv the result would be:
$ #    abc	d"ef

$ # Similarly with embedded field and record separators (tab/newline).
$ # In TSV they are disallowed, and might be replaced by a space when encountered.
$ echo $'"abc","d\tef"' | xsv fmt -t $'\t'
abc	"d	ef"

$ # In the above, the embedded tab character was retained.

Again, I'm only asking if there is an option I haven't found. In the examples above the fmt command is doing exactly what it says, which is to change the CSV delimiter character.
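
For illustration, the strict TSV output described in this question could be sketched outside xsv roughly as follows. This is a minimal sketch that uses Python's standard csv module as a stand-in parser; it is not xsv behavior. The function name csv_to_strict_tsv is hypothetical, and replacing tabs and line breaks with a space is an assumption taken from the "might be replaced by a space" remark above.

```python
import csv
import io


def csv_to_strict_tsv(csv_text: str, replacement: str = " ") -> str:
    """Parse CSV text and emit TSV with no quoting and no escaping.

    Hypothetical helper: tabs and line breaks inside a field are replaced,
    because strict TSV has no way to represent them.
    """
    out_lines = []
    # newline="" keeps line breaks inside quoted fields intact for the parser.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    for row in reader:
        cleaned = []
        for field in row:
            # Replace the characters that strict TSV disallows inside a field.
            for ch in ("\r\n", "\t", "\n", "\r"):
                field = field.replace(ch, replacement)
            cleaned.append(field)
        # Quotes are written as-is: no surrounding quotes, no doubling.
        out_lines.append("\t".join(cleaned))
    return "".join(line + "\n" for line in out_lines)


if __name__ == "__main__":
    print(csv_to_strict_tsv('"abc","d""ef"\n'), end="")
```

Under these assumptions, the input "abc","d""ef" comes out as abc, a tab, then d"ef, which is the TSV result the question asks for.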

@BurntSushi
Owner

@jondegenhardt Thanks for the detailed question! I do not believe there is any such option. In fact, the underlying CSV writer doesn't support it, which is how I know xsv has no such option. The CSV writer options are here: https://docs.rs/csv/1.0.0-beta.5/csv/struct.WriterBuilder.html. We might consider changing escape to accept an Option<u8>, so that when both it and double_quote are disabled, no escaping is performed. We would also need to add a --quote-never option, I suppose.

The last bit is silently changing \t and \n into something else, which gets more complicated.

My estimation is that this is a bit of an awkward fit for xsv at the moment.

@jondegenhardt
Author

Very good, thanks for the detailed response. The CSV doc reference is helpful.

@unhammer

unhammer commented Apr 5, 2019

I was about to open a new issue about this, cf. comments from #67 (comment) and down, but I see this has been closed already. @jondegenhardt if you still have a need for --quote-never, xsv 0.13.0 seems to do this if you pass in the ASCII character 1, though as it's not documented anywhere I guess it comes with no guarantees :-)

$ printf 'user\tutterance\njoe\tSay "hi"\n'|xsv select -d $'\t' utterance \
    | xsv fmt -t $'\t'
utterance
"Say ""hi"""

$ printf 'user\tutterance\njoe\tSay "hi"\n'|xsv select -d $'\t' utterance \
    | xsv fmt --quote $'\1' -t $'\t'
utterance
Say "hi"

$ printf 'user\tutterance\njoe\tSay "hi"\n'|xsv select -d $'\t' utterance \
    | xsv fmt --quote $'\1' -t $'\t' \
    | grep -c $'\1'
0

@BurntSushi
Owner

@unhammer Just to clarify, using the ASCII byte 1 only works because it presumably does not appear in your input anywhere. If it did, then it would need to be quoted. Moreover, if your input contained a field that spanned multiple lines, then it would also need to be quoted.
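
The conditions in this clarification can be checked before relying on the unused-byte trick. The following is a minimal sketch, assuming Python's standard csv module as a stand-in parser; the function name unsafe_fields and its parameters are hypothetical, and the set of characters treated as risky (the chosen quote byte, the output delimiter, and line breaks) is an assumption based on the cases named above, not a statement of xsv's exact quoting rules.

```python
import csv
import io


def unsafe_fields(csv_text, delimiter=",", sentinel="\x01", out_delimiter="\t"):
    """Yield (record_number, field) for fields that would still need quoting.

    Hypothetical check: a field is flagged if it contains the byte chosen as
    the quote character, the output delimiter, or a line break.
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter)
    risky = (sentinel, out_delimiter, "\n", "\r")
    for record_number, row in enumerate(reader, start=1):
        for field in row:
            if any(ch in field for ch in risky):
                yield record_number, field


if __name__ == "__main__":
    sample = 'user,utterance\njoe,"Say ""hi"""\n'
    problems = list(unsafe_fields(sample))
    print("safe to use the trick" if not problems else problems)
```

If the generator yields nothing, the input contains none of the flagged characters; any yielded field is one where quoting would reappear in the output.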

I'm not sure why this was closed. The underlying CSV writer does support it, so I think this is as easy as adding a new --quote-never flag and hooking it up.

@BurntSushi BurntSushi reopened this Apr 5, 2019
@unhammer

unhammer commented Apr 5, 2019

Aha, thanks for the clarification, good to know the exact dangers involved.

@jondegenhardt
Author

The reason I closed it was that my question had been answered. I didn't mean to suggest the feature would not be useful.

@bosr

bosr commented Oct 14, 2019

Hi, I think --quote-never would help in the aforementioned cases. Do you have a plan to implement this?

Thanks!

(and congrats on the tool, it's great)

@LemmingAvalanche

> A question: Is there an option to perform output without the CSV escape syntax? This would be to generate a more strict TSV format, without escapes.

What’s the behavior when a tab or newline is encountered in the data? Are they just converted to spaces (the example says “and might be replaced by a space when encountered”)? Or should the program error out?
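
The thread does not settle this question. As a sketch of how the two behaviors could sit behind one switch, the following hypothetical helper either replaces disallowed characters with a space or raises an error. The names sanitize_field, StrictTsvError, and the policy values are assumptions made for illustration and do not correspond to any xsv option.

```python
class StrictTsvError(ValueError):
    """Raised when a field contains a tab or line break (hypothetical name)."""


def sanitize_field(field: str, policy: str = "replace") -> str:
    """Prepare one field for strict TSV output under the chosen policy.

    policy="replace": tabs and line breaks become single spaces.
    policy="error":   the first offending field raises StrictTsvError.
    """
    specials = ("\t", "\n", "\r")
    if not any(ch in field for ch in specials):
        return field  # nothing to do; quotes and other characters pass through
    if policy == "error":
        raise StrictTsvError(f"field contains a tab or line break: {field!r}")
    if policy == "replace":
        for ch in specials:
            field = field.replace(ch, " ")
        return field
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    print(sanitize_field("d\tef"))
```

Replacing is lossy but never interrupts a pipeline; raising preserves data integrity at the cost of failing on the first offending record.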
