Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

experimental RFC4180 compliant (ish) quoting support #25

Closed
wants to merge 2 commits into from
Closed

experimental RFC4180 compliant (ish) quoting support #25

wants to merge 2 commits into from

Conversation

Dridus
Copy link

@Dridus Dridus commented May 11, 2015

tested a little and seems to work, haven't benchmarked or tried a wide variety of data

see issue #24

tested a little and seems to work, haven't benchmarked or tried a wide variety of data
finish :: ([T.Text], Maybe T.Text) -> [T.Text]
finish (rest, Just dangling) = dangling : rest -- FIXME? just assumes the close quote if it's missing
finish (rest, Nothing ) = rest

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preface to comments: this is a fantastic PR. I really appreciate how you've taken care to make useful, documented definitions.

Is there a problem here with not stripping the first pass at tokenization? Eg, if a line of input is "hey", "you, guys", the second comma-separated piece starts with a space. Of course, naively stripping those pieces is no good either as we need to preserve internal spaces, but the prefix/suffix predicates should probably be applied to stripped Text values. This makes using a space as a quote character impossible, but I think that's a pretty pathological corner case. We may also want to preserve the stripping behavior of the previous version to ease the life of other Readable instances, too. I think all we would need to do is have a where part' = T.strip part in f, and then use part' in the right places.

Another question is whether it makes sense to do this as a foldr rather than a a left-associated thing like a mapAccumL. It seems like finish is going to force the whole thing, so there's not an opportunity for incrementally computing the result.

This is also complicated enough that it deserves to have some test cases written for it. Once we're happy with this PR, we can hash out some basic tests.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re Preface: Thanks!

yes, I missed the strip step. RFC4180 specifies no spaces between the comma and the quote, but I could imagine that people might want different. I'd think that stripping non-quoted tokens and preserving spaces inside quotes is noncontroversial though. We could also have some kind of ParserOption to turn on stripping inside quotes?

I used foldr just to avoid reversing the list at the end in finish, no particular laziness intent.

and yeah, definitely has several corner cases and should have test cases to go with them

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright I added the uncontroversial strip pending further discussion

@acowley
Copy link
Owner

acowley commented May 15, 2015

Merged, thank you!

@acowley acowley closed this May 15, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants