Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for Parsers changes #54

Merged
merged 2 commits into from
Oct 26, 2022
Merged

Fixes for Parsers changes #54

merged 2 commits into from
Oct 26, 2022

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented Oct 25, 2022

No description provided.

Copy link
Collaborator

@nickrobinson251 nickrobinson251 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still think the EOF changes need some more investigation --- i'm not sure Parsers.jl is behaving consistently

The other 2 changes LGTM

src/hybrid.jl Outdated
@@ -0,0 +1,29 @@
struct HybridString{T <: InlineString} <: AbstractString
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean to add this in this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not! Thanks for noticing

test/runtests.jl Show resolved Hide resolved
test/runtests.jl Show resolved Hide resolved
("a,", InlineString7("a"), (; ignorerepeated=true), OK | EOF | DELIMITED),
("a__", InlineString7("a"), (; delim="__", ignorerepeated=true), OK | EOF | DELIMITED),
("\"a\",", InlineString7("a"), NamedTuple(), OK | QUOTED | DELIMITED), # quoted
("a,", InlineString7("a"), NamedTuple(), OK | DELIMITED),
Copy link
Collaborator

@nickrobinson251 nickrobinson251 Oct 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I don't quite follow what's happening with EOF in Parsers.jl now (or before for that matter)

with Parsers.jl v2.4.2

julia> Parsers.codes(Parsers.xparse(String, "a,").code)
"SUCCESS: OK | DELIMITED | EOF "

julia> Parsers.codes(Parsers.xparse(String, "a,\n").code)
"SUCCESS: OK | DELIMITED "

julia> Parsers.codes(Parsers.xparse(String, "a,,"; ignorerepeated=true).code)
"SUCCESS: OK | DELIMITED | EOF "

julia> Parsers.codes(Parsers.xparse(String, "a,,\n"; ignorerepeated=true).code)
"SUCCESS: OK | DELIMITED | NEWLINE | EOF "

and now on Parsers#main:

julia> Parsers.codes(Parsers.xparse(String, "a,").code)
"SUCCESS: OK | DELIMITED "

julia> Parsers.codes(Parsers.xparse(String, "a,\n").code)
"SUCCESS: OK | DELIMITED "

julia> Parsers.codes(Parsers.xparse(String, "a,,"; ignorerepeated=true).code)
"SUCCESS: OK | DELIMITED "

julia> Parsers.codes(Parsers.xparse(String, "a,,\n"; ignorerepeated=true).code)
"SUCCESS: OK | DELIMITED | NEWLINE | EOF "

^ this last case "a,,\n" still looks odd to me

Copy link
Collaborator

@nickrobinson251 nickrobinson251 Oct 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like finddelimiter is inconsistent on whether or not it looks for newlines after finding delimiter(s):
if ignorerepeated === false, we break as soon as we hit a delim; if ignorerepeated === true, we keep going even if we don't match (another) delim and look for newlines before finally we break if we find neither
https://github.com/JuliaData/Parsers.jl/blob/388613c73ab62341702a6af051577384235b21fc/src/components.jl#L272-L293

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not exactly sure how, but the 2.4.2 version for Strings was just wrong; so I'm going to make the tests here compatible, but at least moving forward they will be more consistent.

v2.4.2

julia> Parsers.codes(Parsers.xparse(String, "a,").code)
"SUCCESS: OK | DELIMITED | EOF "

julia> Parsers.codes(Parsers.xparse(Int, "1,").code)
"SUCCESS: OK | DELIMITED "

#main

julia> Parsers.codes(Parsers.xparse(String, "a,").code)
"SUCCESS: OK | DELIMITED "

julia> Parsers.codes(Parsers.xparse(Int, "1,").code)
"SUCCESS: OK | DELIMITED "

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i think that sounds good

it's this apparent inconsistency on main i think is still odd:

julia> Parsers.codes(Parsers.xparse(String, "a,\n").code)  # seems correct
"SUCCESS: OK | DELIMITED "

julia> Parsers.codes(Parsers.xparse(String, "a,,"; ignorerepeated=true).code)  # seems correct
"SUCCESS: OK | DELIMITED "

julia> Parsers.codes(Parsers.xparse(String, "a,,\n"; ignorerepeated=true).code)  # why `NEWLINE | EOF`  here?
"SUCCESS: OK | DELIMITED | NEWLINE | EOF "

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm yes, that does seem off. Let me look into it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm guessing this is something to do with assuming ignorerepeated=true is only ever used with delim=' '? e.g. we don't want accidental trailing whitespace to cause an extra empty column or ever row in CSV, or something like that?

But i think this case where delim=',' and ignorerepeated=true shows it's pretty odd that ignorerepeated changes what happens after we stop hitting delim chars

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, thinking about the delim=' ' case helps make the most sense of this, and I think drives the requirement here. The general idea of ignorerepeated is "match any number of delimiters until we hit another field or the next row".

Copy link
Collaborator

@nickrobinson251 nickrobinson251 Oct 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i can kinda understand the NEWLINE code being added for "a,,\n", but why does it get EOF (when "a,," doesn't get EOF)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, now that does seem like a good inconsistency (er....bad inconsistency?). Yeah, it seems like that's wrong.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aight, thanks for catching that Nick; got a fix here: JuliaData/Parsers.jl@883cfdc. CSV.jl tests seemed happy locally.

quinnj added a commit to JuliaData/Parsers.jl that referenced this pull request Oct 26, 2022
setting the EOF code after matching repeated delimiters, as noticed
by @nickrobinson251 [here](JuliaStrings/InlineStrings.jl#54 (comment)).
quinnj added a commit to JuliaData/Parsers.jl that referenced this pull request Oct 26, 2022
setting the EOF code after matching repeated delimiters, as noticed
by @nickrobinson251 [here](JuliaStrings/InlineStrings.jl#54 (comment)).
@quinnj quinnj merged commit 793e71f into main Oct 26, 2022
@quinnj quinnj deleted the jq/parsers_updates branch October 26, 2022 06:08
quinnj added a commit to JuliaData/Parsers.jl that referenced this pull request Oct 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants