Detect comments #42

capitalaslash · 2015-11-27T15:49:23Z

I know that comments are not defined in the JSON standard, but many parsers support them, and I find them useful.
The patch is activated by a preprocessor macro, and the original behavior can be restored by commenting out its definition.

New routine that consumes whitespace and comments, activated by the JSON11_COMMENTS preprocessor flag.

smarx · 2015-11-27T15:49:24Z

Automated message from Dropbox CLA bot

@capitalaslash, thanks for the pull request! It looks like you haven't yet signed the Dropbox CLA. Please sign it here and update the thread so we can consider merging your code.

capitalaslash · 2015-11-27T18:43:16Z

CLA signed

artwyman · 2015-11-28T05:27:01Z

A few design-level points for discussion:

I like that this behavior is optional (so nobody is surprised) but I wonder if it could be a run-time rather than a compile-time option. It only affects Json::parse, so it would be pretty easy to add an extra argument for configuration, flags, etc.
I don't have enough experience with other parsers to know whether there's general agreement on comment style here. The style you chose matches C/C++, Java, and JavaScript. I'd likely have picked '#' as the comment character to match Python, Perl, YAML, etc., but that's just me. Which other parsers did you compare against to determine that this is the most likely expected behavior?

I'll also make a few inline comments on individual lines. I wonder what @j4cbo thinks of this as well.

artwyman · 2015-11-28T05:29:06Z

json11.cpp

+     *
+     * Advance until the current character is non-whitespace and non-comment.
+     */
+    void consume_garbage() {


I feel like a loop would be a safer approach than recursion for consuming more comments/whitespace, since it avoids the risk that the stack grows (and could overflow) when parsing many comments. This function seems like the right place for that loop. If consume_whitespace() and consume_comment() returned a bool indicating whether they consumed anything (or you just compared the value of i before and after), it would be easy to decide when to end the loop.
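A minimal sketch of the loop-based approach suggested above, assuming consume_whitespace() and consume_comment() report whether they advanced the cursor. The Scanner struct is hypothetical (it is not json11's JsonParser), and for brevity it only recognizes // comments; /* */ handling would slot into consume_comment() in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical scanner illustrating the iterative approach; not json11's
// JsonParser. For brevity it recognizes only // comments.
struct Scanner {
    std::string str;
    std::size_t i = 0;
    bool failed = false;

    explicit Scanner(std::string s) : str(std::move(s)) {}

    // Skips spaces, tabs, CR and LF; returns true if it advanced i.
    bool consume_whitespace() {
        const std::size_t start = i;
        while (i < str.size() &&
               (str[i] == ' ' || str[i] == '\t' || str[i] == '\r' || str[i] == '\n'))
            i++;
        return i != start;
    }

    // Skips one // comment up to (not including) the newline; returns true
    // if it advanced i. A '/' not followed by a second '/' (including a
    // trailing '/' at end of input) sets failed.
    bool consume_comment() {
        if (i >= str.size() || str[i] != '/')
            return false;
        if (i + 1 >= str.size() || str[i + 1] != '/') {
            failed = true;
            return false;
        }
        i += 2;  // step over "//"
        while (i < str.size() && str[i] != '\n')
            i++;
        return true;
    }

    // Loops until neither helper makes progress, so the call stack stays
    // flat no matter how many comments appear in a row.
    void consume_garbage() {
        while (!failed && (consume_whitespace() || consume_comment())) {
        }
    }
};
```

Because the loop stops as soon as one full pass consumes nothing, thousands of consecutive comments cost iterations rather than stack frames.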

artwyman · 2015-11-28T05:32:17Z

Oh, also, welcome, and thanks for your contribution! I forgot to say that while diving straight into business. :)

capitalaslash · 2015-11-30T10:37:26Z

Cumulative reply to your points:

I went for compile-time activation so that it would not affect performance at all when not needed. No problem on my side with making it a run-time option.
As JSON comes from JavaScript, it feels natural to have JavaScript-style comments. Even the Python json module parses C-style comments.
I will go for a loop instead of recursion; patch coming.
More tests coming.

Also test nested and mixed comments. Whitespace and newlines are already intermixed between comments.

Add 3 tests for: unterminated multi-line comment, malformed single-line comment, trailing slash.

capitalaslash · 2015-11-30T11:46:14Z

I went for a bool argument defaulting to false, but that is quite opaque.
We could use a properly named bit flag instead if you think that would be better.

artwyman · 2015-12-01T03:10:11Z

json11.cpp

+      bool comment_found = false;
+      if (str[i] == '/') {
+        i++;
+        if (str[i] == '/') { // inline comment


I think you need a size check here, and indeed after every increment of i. Otherwise, if the string ends with '/', you'd crash. That would be a good negative case to add to your unit tests.

(Update: this might not actually crash, thanks to the null terminator, but I think that's undefined behavior, so it should be avoided.)

artwyman · 2015-12-01T03:19:33Z

I added a few more inline comments, primarily about some issues with size checks which I didn't see on my first look.

I agree with your concern about the opacity of a boolean argument. One approach I've used to avoid that in the past is an enum. You can declare an enum with values like JsonParse::STANDARD and JsonParse::COMMENTS and pass one or the other instead of the boolean, which makes the intent explicit. That approach can also eventually grow to allow enum values to be combined like bitfields, but I wouldn't worry too much about supporting that preemptively.
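To illustrate the readability argument, here is a hypothetical sketch: the enum values mirror the JsonParse::STANDARD / JsonParse::COMMENTS names proposed above, while parse_int() is an invented toy parser (a single non-negative integer), not json11's API. It skips // comments only when the caller opts in.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical names following the suggestion above; a sketch, not json11's
// actual declarations. A plain (unscoped) enum keeps JsonParse::X syntax
// while leaving room to add flag-style values later.
enum JsonParse { STANDARD, COMMENTS };

// Toy parser for one non-negative integer surrounded by whitespace.
// Line comments ("//") are skipped only when strategy == JsonParse::COMMENTS.
bool parse_int(const std::string &in, long &out, std::string &err,
               JsonParse strategy = JsonParse::STANDARD) {
    std::size_t i = 0;
    auto skip = [&]() {
        for (;;) {
            while (i < in.size() && std::isspace(static_cast<unsigned char>(in[i])))
                i++;
            if (strategy == JsonParse::COMMENTS && i + 1 < in.size() &&
                in[i] == '/' && in[i + 1] == '/') {
                while (i < in.size() && in[i] != '\n')
                    i++;
                continue;  // more whitespace or comments may follow
            }
            return;
        }
    };
    skip();
    if (i >= in.size() || !std::isdigit(static_cast<unsigned char>(in[i]))) {
        err = "expected a digit";
        return false;
    }
    out = 0;
    while (i < in.size() && std::isdigit(static_cast<unsigned char>(in[i])))
        out = out * 10 + (in[i++] - '0');
    skip();
    if (i != in.size()) {
        err = "unexpected trailing content";
        return false;
    }
    return true;
}
```

At the call site, parse_int(text, v, err, JsonParse::COMMENTS) states its intent, whereas parse_int(text, v, err, true) would leave the reader guessing what the bool means.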

capitalaslash · 2015-12-01T10:10:12Z

This should address all your concerns.
The enum works great for this.

artwyman · 2015-12-01T23:41:00Z

json11.cpp

@@ -338,6 +338,7 @@ struct JsonParser {
    size_t i;
    string &err;
    bool failed;
+    JsonParse strategy;


nit: Could be const.

artwyman · 2015-12-01T23:44:06Z

json11.cpp

+          if (i == str.size())
+            return fail("unexpected end of input inside multi-line comment", 0);
+           // advance until closing tokens
+          while (!(str[i] == '*' && str[i+1] == '/')) {


str[i+1] here is still an unchecked read. You should check against size-1, both above and below, since it takes at least 2 characters to terminate a multi-line comment.
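A sketch of the guarded loop for multi-line comments, assuming the cursor sits on the opening "/*". skip_block_comment() is a hypothetical helper, not json11's code; the point is that str[i + 1] is read only after checking i + 1 < str.size(), since closing the comment takes two characters. The error string reuses the message from the diff above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: given that str[i] and str[i + 1] are "/*", advance i
// past the matching "*/". Returns false and sets err if input ends first.
bool skip_block_comment(const std::string &str, std::size_t &i, std::string &err) {
    assert(i + 1 < str.size() && str[i] == '/' && str[i + 1] == '*');
    i += 2;  // step over the opening "/*"
    // Guard on i + 1, not i: the terminator "*/" needs two characters.
    while (i + 1 < str.size()) {
        if (str[i] == '*' && str[i + 1] == '/') {
            i += 2;  // step over the closing "*/"
            return true;
        }
        i++;
    }
    err = "unexpected end of input inside multi-line comment";
    return false;
}
```

With a check against size() alone, an input ending in "*" (for example "/* a *") would read one past the last character; the size-1 bound rejects it cleanly instead.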

artwyman · 2015-12-01T23:45:06Z

Looks good. Just one off-by-one check to point out.

capitalaslash · 2015-12-02T09:10:23Z

ok, fixed.

artwyman · 2015-12-03T22:25:59Z

Thanks! Merging.

Detect and ignore comments

capitalaslash added 3 commits November 27, 2015 16:31

add routine to detect c-style comments

2d1d176

introduce consume_garbage()

08c391f

add testing for comment functionality

de098c4

artwyman reviewed Nov 28, 2015

artwyman mentioned this pull request Nov 28, 2015

It would be nice if the parser could discard // and /* */ comments #43

Closed

capitalaslash added 5 commits November 30, 2015 12:27

add bool to detect comments as run-time option.

882feb5

detect multiple comments with a loop instead of using recursion

b05e655

detect malformed comments

2f5c642

improve comment test.

d292fce

add malformed comment tests.

f21b8c3

artwyman reviewed Dec 1, 2015

capitalaslash added 6 commits December 1, 2015 10:59

check for end of input on every increment of the cursor

4b0f5cf

improve testing for bad inline comments

982b2d8

fix test where the trailing / was not reached due to a previous error

aa270ad

add test for inline comment without trailing newline

c6c6fcf

add test for unfinished multi-line comment

f9833b1

use an enum to select strategy on comment parsing

49a6197

artwyman reviewed Dec 1, 2015

capitalaslash added 2 commits December 2, 2015 09:57

make JsonParser::strategy const

988a8fc

watch out for i+1 to overflow the buffer

ebc3a6b

artwyman added a commit that referenced this pull request Dec 3, 2015

Merge pull request #42 from capitalaslash/detect_comments

a6a661e

artwyman merged commit a6a661e into dropbox:master Dec 3, 2015

capitalaslash deleted the detect_comments branch December 4, 2015 02:25
