-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes for Parsers changes #54
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still think the EOF changes need some more investigation --- i'm not sure Parsers.jl is behaving consistently
The other 2 changes LGTM
src/hybrid.jl
Outdated
@@ -0,0 +1,29 @@ | |||
struct HybridString{T <: InlineString} <: AbstractString |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you mean to add this in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not! Thanks for noticing
("a,", InlineString7("a"), (; ignorerepeated=true), OK | EOF | DELIMITED), | ||
("a__", InlineString7("a"), (; delim="__", ignorerepeated=true), OK | EOF | DELIMITED), | ||
("\"a\",", InlineString7("a"), NamedTuple(), OK | QUOTED | DELIMITED), # quoted | ||
("a,", InlineString7("a"), NamedTuple(), OK | DELIMITED), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 I don't quite follow what's happening with EOF
in Parsers.jl now (or before for that matter)
with Parsers.jl v2.4.2
julia> Parsers.codes(Parsers.xparse(String, "a,").code)
"SUCCESS: OK | DELIMITED | EOF "
julia> Parsers.codes(Parsers.xparse(String, "a,\n").code)
"SUCCESS: OK | DELIMITED "
julia> Parsers.codes(Parsers.xparse(String, "a,,"; ignorerepeated=true).code)
"SUCCESS: OK | DELIMITED | EOF "
julia> Parsers.codes(Parsers.xparse(String, "a,,\n"; ignorerepeated=true).code)
"SUCCESS: OK | DELIMITED | NEWLINE | EOF "
and now on Parsers#main
:
julia> Parsers.codes(Parsers.xparse(String, "a,").code)
"SUCCESS: OK | DELIMITED "
julia> Parsers.codes(Parsers.xparse(String, "a,\n").code)
"SUCCESS: OK | DELIMITED "
julia> Parsers.codes(Parsers.xparse(String, "a,,"; ignorerepeated=true).code)
"SUCCESS: OK | DELIMITED "
julia> Parsers.codes(Parsers.xparse(String, "a,,\n"; ignorerepeated=true).code)
"SUCCESS: OK | DELIMITED | NEWLINE | EOF "
^ this last case "a,,\n"
still looks odd to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like finddelimiter
is inconsistent on whether or not it looks for newlines after finding delimiter(s):
if ignorerepeated === false
, we break
as soon as we hit a delim; if ignorerepeated === true
, we keep going even if we don't match (another) delim and look for newlines before finally we break
if we find neither
https://github.com/JuliaData/Parsers.jl/blob/388613c73ab62341702a6af051577384235b21fc/src/components.jl#L272-L293
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm not exactly sure how, but the 2.4.2 version for Strings was just wrong; so I'm going to make the tests here compatible, but at least moving forward they will be more consistent.
v2.4.2
julia> Parsers.codes(Parsers.xparse(String, "a,").code)
"SUCCESS: OK | DELIMITED | EOF "
julia> Parsers.codes(Parsers.xparse(Int, "1,").code)
"SUCCESS: OK | DELIMITED "
#main
julia> Parsers.codes(Parsers.xparse(String, "a,").code)
"SUCCESS: OK | DELIMITED "
julia> Parsers.codes(Parsers.xparse(Int, "1,").code)
"SUCCESS: OK | DELIMITED "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i think that sounds good
it's this apparent inconsistency on main
i think is still odd:
julia> Parsers.codes(Parsers.xparse(String, "a,\n").code) # seems correct
"SUCCESS: OK | DELIMITED "
julia> Parsers.codes(Parsers.xparse(String, "a,,"; ignorerepeated=true).code) # seems correct
"SUCCESS: OK | DELIMITED "
julia> Parsers.codes(Parsers.xparse(String, "a,,\n"; ignorerepeated=true).code) # why `NEWLINE | EOF` here?
"SUCCESS: OK | DELIMITED | NEWLINE | EOF "
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm yes, that does seem off. Let me look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm guessing this is something to do with assuming ignorerepeated=true
is only ever used with delim=' '
? e.g. we don't want accidental trailing whitespace to cause an extra empty column or ever row in CSV, or something like that?
But i think this case where delim=','
and ignorerepeated=true
shows it's pretty odd that ignorerepeated
changes what happens after we stop hitting delim
chars
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, thinking about the delim=' '
case helps make the most sense of this, and I think drives the requirement here. The general idea of ignorerepeated
is "match any number of delimiters until we hit another field or the next row".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can kinda understand the NEWLINE
code being added for "a,,\n"
, but why does it get EOF
(when "a,,"
doesn't get EOF
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, now that does seem like a good inconsistency (er....bad inconsistency?). Yeah, it seems like that's wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aight, thanks for catching that Nick; got a fix here: JuliaData/Parsers.jl@883cfdc. CSV.jl tests seemed happy locally.
4010153
to
d654c4e
Compare
setting the EOF code after matching repeated delimiters, as noticed by @nickrobinson251 [here](JuliaStrings/InlineStrings.jl#54 (comment)).
setting the EOF code after matching repeated delimiters, as noticed by @nickrobinson251 [here](JuliaStrings/InlineStrings.jl#54 (comment)).
setting the EOF code after matching repeated delimiters, as noticed by @nickrobinson251 [here](JuliaStrings/InlineStrings.jl#54 (comment)).
No description provided.