-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend Options' inputs validation #152
Conversation
Codecov ReportBase: 89.41% // Head: 89.41% // No change to project coverage 👍
Additional details and impacted files@@ Coverage Diff @@
## main #152 +/- ##
=======================================
Coverage 89.41% 89.41%
=======================================
Files 9 9
Lines 1549 1549
=======================================
Hits 1385 1385
Misses 164 164 Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
237ebc9
to
21c8b19
Compare
@@ -197,6 +222,8 @@ function Options( | |||
quoted && (isempty(oq) || isempty(cq) || isempty(e)) && throw(ArgumentError("quoted=true requires openquotechar, closequotechar, and escapechar to be specified")) | |||
sent = (sentinel === nothing || sentinel === missing) ? Token[] : map(x -> token(x, "sentinel"), prepare!(sentinel)) | |||
checksentinel = sentinel !== nothing | |||
quoted && ((_match(openquotechar, delim) || _match(closequotechar, delim)) || _match(escapechar, delim)) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, maybe the check should be about having a common prefix rather than equality?
21c8b19
to
d76ad07
Compare
Upon review of Parsers.Options validations ([ref](JuliaData/Parsers.jl#152)), @nickrobinson251, @Drvi, and I agree that this shouldn't be valid. The problem is that there's an ambiguity in the case of a _missing_ field vs. an empty quoted field vs. a quoted field with newlines. For example: ```julia a"b"c 1"2"3 # normal row "2"3 # 1st field is missing, but ambiguous with 1st field as quoted `"2"` "1""2"3 # Also ambiguous because the 2nd quote escapes the 3rd, so value is `1"2` ```
Upon review of Parsers.Options validations ([ref](JuliaData/Parsers.jl#152)), @nickrobinson251, @Drvi, and I agree that this shouldn't be valid. The problem is that there's an ambiguity in the case of a _missing_ field vs. an empty quoted field vs. a quoted field with newlines. For example: ```julia a"b"c 1"2"3 # normal row "2"3 # 1st field is missing, but ambiguous with 1st field as quoted `"2"` "1""2"3 # Also ambiguous because the 2nd quote escapes the 3rd, so value is `1"2` ```
This generalizes the compatibility checks between
escapechar
,openquotechar
,closequotechar
,delim
and the sentinel values. We still fail to correctly validate pairs of regular expressions as those would have to be somehow reduced to their minimal canonical form (I'm sure there is a fancy CS name for this).