Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure escaping scalar #13

Merged
merged 5 commits into from Mar 26, 2020
Merged

Configure escaping scalar #13

merged 5 commits into from Mar 26, 2020

Conversation

josh
Copy link
Contributor

@josh josh commented Mar 24, 2020

Hi @dehesa, thanks for publishing this library and keeping it super well documented. The code looks great.

I'm currently trying to parse a foreign TSV file that includes unescaped " characters in it's fields. I did see escapingScalar was exposed as an internal setting but wasn't publicly configurable. For my use case, disabling field escaping worked perfect. And it would be great to expose this ability on the reader configuration.

I added a new enum for "escaping strategy". My use case fits into the new .none case. And the default case is .doubleQuote. Though, I was wondering if we should definite the options narrowly and just provide the two, or allow for any character to be used as a escaping pair. Maybe there's a single quoted escaped CSV out there? Would love to hear your thoughts.

Thanks!
@josh

@dehesa
Copy link
Owner

dehesa commented Mar 25, 2020

Hi @josh, thank you for your kind words and taking the time to send a PR.

Exposing the escaping scalar in the API is the right thing to do and your code looks great. I would agree with you about generalizing the case to allow other escaping scalars (like single quotes). In that light, I would like to ask you the following:

  • Strategy.Escaping: Remove the ExpressibleByBooleanLiteral adoption and add ExpressibleByUnicodeScalarLiteral instead.
  • CSVReader.Setting: Check that self.trimCharacters doesn't contain the escaping scalar and if so throw a new error. That is something I forgot to check last time and may produce unintended behavior.
  • Please send the PR to develop instead of master.

Also, if the escaping scalar configuration is exposed in the CSVReader, we should do the same in CSVWriter 😄 so it has parity. I am happy to do that if you don't want to do it. In any case thank you very much on your thoughtful PR and code.

Regards,
Marcos

@dehesa dehesa assigned dehesa and josh and unassigned dehesa Mar 25, 2020
@dehesa dehesa added this to In Progress in Features Mar 25, 2020
@dehesa dehesa added the enhancement New feature or request label Mar 25, 2020
@josh
Copy link
Contributor Author

josh commented Mar 25, 2020

Exposing the escaping scalar in the API is the right thing to do and your code looks great. I would agree with you about generalizing the case to allow other escaping scalars (like single quotes). In that light, I would like to ask you the following:

👍 All sound good. Let me get on that today. Thanks for the feedback!!

Also, if the escaping scalar configuration is exposed in the CSVReader, we should do the same in CSVWriter 😄 so it has parity. I am happy to do that if you don't want to do it. In any case thank you very much on your thoughtful PR and code.

I was thinking about that as well and I'm happy to attempt. How do you think we should handle the .none case on Writer when a delimiter is present in the field? Should we output an invalid field as is? Or maybe throw some kind of error like unescapedDelimiter.

let input = [
    ["title"],
    ["This will be \"ok\" since there's no comma"],
    ["This field would be , invalid"]
]
try CSVWriter.serialize(rows: input) { $0.escapingStrategy = .none }

@dehesa
Copy link
Owner

dehesa commented Mar 25, 2020

How do you think we should handle the .none case on Writer when a delimiter is present in the field? Should we output an invalid field as is? Or maybe throw some kind of error like unescapedDelimiter.

Just throw a new CSVError<CSVWriter> similar to:

.init(.invalidInput, reason: "...", help: "...", userInfo: ["Invalid character": delimiter])

@josh josh changed the base branch from master to develop March 25, 2020 20:29
@josh
Copy link
Contributor Author

josh commented Mar 25, 2020

@dehesa pushed some changes. Please have a look. Thanks for all the suggestions so far.

@dehesa dehesa merged commit 511096c into dehesa:develop Mar 26, 2020
Features automation moved this from In Progress to Done Mar 26, 2020
@dehesa
Copy link
Owner

dehesa commented Mar 26, 2020

Hey @josh

I have reviewed the PR and it is great, thank you! I merged your changes and add some details that I forgot to mention before:

  • CSVReader.Settings initializer also checks the delimiter scalars are not included in the trim set.
  • CSVWriter.lowlevelWrite(field:) has been modified a bit to:
    • separate "escaping-allowed" from "escaping-not-allowed" cases.
    • escape the escaping scalar 😅. When escaping is allowed, if an escaping scalar is in the middle of the field, it needs to be doubled.

I have created a new release with your code and some structural changes I was working on to support Linux.

Thank you again! 🎉 If you are secluded at home (due to current circumstances) and want to keep helping out in CodableCSV check out the repo projects (specifically the Features project) 😂

Regards,
Marcos

@josh josh deleted the escaping-scalar branch March 26, 2020 16:39
@josh
Copy link
Contributor Author

josh commented Mar 26, 2020

Thank you @dehesa! I'll checkout the projects page for more ideas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Features
  
Done
Development

Successfully merging this pull request may close these issues.

None yet

2 participants