Skip to content
This repository has been archived by the owner on Mar 26, 2020. It is now read-only.

Detect comments #42

Merged
merged 16 commits into from
Dec 3, 2015
Merged

Detect comments #42

merged 16 commits into from
Dec 3, 2015

Conversation

capitalaslash
Copy link
Contributor

I know that comments are not defined in json standard, but many parsers do support them, and I find them useful.
the patch is activated by a preprocessor macro and the original behavior can be restored by commenting its definition.

new routine that consumes whitespaces and comments.
activated by JSON11_COMMENTS pre-processor flag.
@smarx
Copy link

smarx commented Nov 27, 2015

Automated message from Dropbox CLA bot

@capitalaslash, thanks for the pull request! It looks like you haven't yet signed the Dropbox CLA. Please sign it here and update the thread so we can consider merging your code.

@capitalaslash
Copy link
Contributor Author

CLA signed

@artwyman
Copy link
Contributor

A few design-level points for discussion:

  1. I like that this behavior is optional (so nobody is surprised) but I wonder if it could be a run-time rather than a compile-time option. It only affects Json::parse, so it would be pretty easy to add an extra argument for configuration, flags, etc.
  2. I don't have experience with other parsers to know if there's a general agreement on comment style here. The style you chose matches C/C++, Java, and Javascript. I'd likely have picked '#' as a comment character to match Python, Perl, YAML, etc. but that's just me. What other parsers have you compared to to determine this is the most likely expected behavior.

I'll make a few inline comments on individual lines too. I wonder what @j4cbo might think of this too.

*
* Advance until the current character is non-whitespace and non-comment.
*/
void consume_garbage() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like a loop would be a safer approach than recursion for consuming more comments/whitespace, since that would avoid the risk that your stack frame grows (and could overflow) when parsing many comments. I feel like this function is the right place to put a loop. If consume_whitespace() and consume_comment() returned a bool to indicate whether they consumed anything (or you just looked at the value of i) it would be easy to decide when to end the loop.

@artwyman
Copy link
Contributor

Oh, also, welcome, and thanks for your contribution! Forgot to say that in my diving straight into business. :)

@capitalaslash
Copy link
Contributor Author

cumulative reply to your points:

  • I went for compile-time activation so that it would not affect performances at all when not needed. no problem on my side to have it run-time activated.
  • as json comes from javascript it feels natural to have javascript-style comments. even the python json module parses c-style comments.
  • I wll go for the loop instead of recursion, patch coming.
  • more tests coming.

test also for nested and mixed comments.
whitespaces/newlines are already intermixed between comments.
add 3 testes for:
- unended multi-line comment,
- malformed single-line comment,
- trailing slash
@capitalaslash
Copy link
Contributor Author

I went for a bool argument with the default set to false, that is quite opaque.
We can go for a properly named bit-flag if you think it would be better.

bool comment_found = false;
if (str[i] == '/') {
i++;
if (str[i] == '/') { // inline comment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need a size check here, and indeed after every increment of i. Otherwise if the string ends with '/' you'd crash. That would be a good negative case to add to your utests.

(Update, this also might not crash due to the null terminator, but I think that's undefined behavior, so should be avoided.)

@artwyman
Copy link
Contributor

artwyman commented Dec 1, 2015

I added a few more inline comments, primarily about some issues with size checks which I didn't see on my first look.

I agree with your concern about the opacity of a boolean argument. One approach I've used to avoid that in the past is an enum. You can declare an enum with values like JsonParse::STANDARD and JsonParse::COMMENTS, and pass one or the other instead of the boolean, to make the intent explicit. That approach also can eventually grow to allow enum values to be combined like bitfields, but I wouldn't worry too much about supporting that preemptively.

@capitalaslash
Copy link
Contributor Author

should have addressed all your concerns.
the enum works great for this.

@@ -338,6 +338,7 @@ struct JsonParser {
size_t i;
string &err;
bool failed;
JsonParse strategy;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Could be const.

if (i == str.size())
return fail("unexpected end of input inside multi-line comment", 0);
// advance until closing tokens
while (!(str[i] == '*' && str[i+1] == '/')) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str[i+1] here is still an unchecked value. Should check against size-1 above, and below, since it takes at least 2 characters to terminate a multi-line comment.

@artwyman
Copy link
Contributor

artwyman commented Dec 1, 2015

Looks good. Just one off-by-one check to point out.

@capitalaslash
Copy link
Contributor Author

ok, fixed.

@artwyman
Copy link
Contributor

artwyman commented Dec 3, 2015

Thanks! Merging.

artwyman added a commit that referenced this pull request Dec 3, 2015
@artwyman artwyman merged commit a6a661e into dropbox:master Dec 3, 2015
@capitalaslash capitalaslash deleted the detect_comments branch December 4, 2015 02:25
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants