option to ignore whitespace stripping within quotes #109
Comments
Sorry for the slow response here. @nickrobinson251, as I consider this, it seems to me that it's actually incorrect in our original I think I perhaps just got too over-eager in that first implementation and didn't realize that we're over-stripping. Does that make sense? Do we think there would still be a use-case for stripping the whitespace inside quotes? I guess there are sometimes cases where a CSV writer just quotes everything and you might also want to be able to strip inside quotes, but that feels like an additional option we'd want to be opt-in, since you're electing to ignore stuff inside your quotes.
Yeah, unfortunately I have this use-case, so it does exist 😞 I agree stripping whitespace within quotes should be opt-in and not the default for what
Fixes #109. As noted in that issue, stripping whitespace *within* quoted strings, IMO, should be considered a bug, since one of the primary reasons for quoting strings in various applications is to delineate the exact characters that make up the string. This PR fixes `stripwhitespace` to preserve whitespace encountered within quoted strings, and to strip only leading/trailing whitespace of non-quoted strings and leading/trailing whitespace around quoted fields. On the other hand, there are legitimate use-cases for also stripping whitespace within quoted strings, so we add a new opt-in `stripquoted` keyword argument that allows the additional precision of also stripping whitespace inside quotes. Note that passing `stripquoted=true` implies `stripwhitespace=true`, so it can be considered a "stronger" version of `stripwhitespace`.
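To make the intended semantics concrete, here is a minimal sketch in plain Julia. It does not call Parsers.jl: `strip_field` is a hypothetical helper written only for illustration, and it assumes a field counts as quoted when, after outer stripping, it begins and ends with `"`. Only the keyword names `stripwhitespace` and `stripquoted` are taken from the description above.

```julia
# Hypothetical helper, NOT part of Parsers.jl: models the stripping rules
# described above on a single raw field (no delimiter or escape handling).
function strip_field(raw::AbstractString;
                     stripwhitespace::Bool=false,
                     stripquoted::Bool=false)
    # stripquoted=true implies stripwhitespace=true
    strip_outer = stripwhitespace || stripquoted
    field = strip_outer ? strip(raw) : raw

    # Assumption: a field is "quoted" if it starts and ends with a double quote.
    if length(field) >= 2 && first(field) == '"' && last(field) == '"'
        inner = chop(field; head=1, tail=1)   # drop the surrounding quotes
        # Whitespace inside the quotes is preserved unless stripquoted is set.
        return String(stripquoted ? strip(inner) : inner)
    end
    return String(field)
end

strip_field("  A  "; stripwhitespace=true)        # unquoted: outer whitespace removed
strip_field("  \" B \"  "; stripwhitespace=true)  # quoted: inner whitespace kept
strip_field("  \" B \"  "; stripquoted=true)      # quoted: inner whitespace removed too
```

In this model `stripwhitespace=true` trims around a quoted field but leaves its contents untouched, while `stripquoted=true` additionally trims the contents, which is why it reads as the stronger of the two options.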
Alright, PR is up (#112) to fix
* Add new stripquoted keyword arg and fix stripwhitespace

  Fixes #109. As noted in that issue, stripping whitespace *within* quoted strings, IMO, should be considered a bug, since one of the primary reasons for quoting strings in various applications is to delineate the exact characters that make up the string. This PR fixes `stripwhitespace` to preserve whitespace encountered within quoted strings, and to strip only leading/trailing whitespace of non-quoted strings and leading/trailing whitespace around quoted fields. On the other hand, there are legitimate use-cases for also stripping whitespace within quoted strings, so we add a new opt-in `stripquoted` keyword argument that allows the additional precision of also stripping whitespace inside quotes. Note that passing `stripquoted=true` implies `stripwhitespace=true`, so it can be considered a "stronger" version of `stripwhitespace`.

* Update src/Parsers.jl

  Co-authored-by: Nick Robinson <npr251@gmail.com>

* Update test/runtests.jl

  Co-authored-by: Nick Robinson <npr251@gmail.com>

Co-authored-by: Nick Robinson <npr251@gmail.com>
Hey @quinnj,
I'm rewriting our data loading at the moment, migrating to Parsers.jl. My request is more or less the opposite of #106: for CSV parsing, it would be great to provide an option that allows us to strip whitespace around unquoted fields, but leave it intact within quotes.
For example, a CSV
should ideally parse into
["A", "B", "C", "D"]
for the first line and for the second.
Would it be straightforward to add that as an option?