Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON fixes #5511

Merged
merged 9 commits into from Jul 3, 2017
Merged

JSON fixes #5511

merged 9 commits into from Jul 3, 2017

Conversation

CyberShadow
Copy link
Member

Please see the individual commits for details.

@dlang-bot
Copy link
Contributor

dlang-bot commented Jun 26, 2017

Thanks for your pull request, @CyberShadow!

Bugzilla references

Fix Bugzilla Description
5904 std.json parseString doesn't handle chars outside the BMP
17553 std.json should not do UTF decoding when encoding JSON
17555 [REG2.070.0] Control characters in JSON data are invalid and should cause an exception
17556 std.json encodes non-BMP characters incorrectly with JSONOptions.escapeNonAsciiChars
17557 std.json should not do UTF decoding when parsing

@CyberShadow
Copy link
Member Author

@wilzbach Is CircleCI broken on stable?

@wilzbach
Copy link
Member

Is CircleCI broken on stable?

Looks like it:

make: *** No rule to make target `../generated/linux/release/64/VERSION', needed by `../generated/linux/release/64/dmd'.  Stop.
make: *** Waiting for unfinished jobs....

Copy link
Contributor

@aG0aep6G aG0aep6G left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last commit (fix for issue 17557) looks bad. Rest looks good.

std/json.d Outdated
{
import std.exception : assertThrown;

assertThrown(parseJSON("\"a\nb\""));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could specify the type of the expected exception.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

std/json.d Outdated
@@ -707,22 +707,36 @@ if (isInputRange!T && !isInfinite!T && isSomeChar!(ElementEncodingType!T))
JSONValue root;
root.type_tag = JSON_TYPE.NULL;

// UTF decoding is unnecessary when parsing JSON.
static if (is(T : const(char)[]))
alias Char = char;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks stuff:

parseJSON("\"\U0001D11E\"");
    /* std.json.JSONException@std/json.d(1374): Illegal control character. (Line 1:3) */

The thing is that the rest of the code assumes dchar. While char converts to dchar, it doesn't always keep the same meaning. So stuff gets misinterpreted.

In this specific case, isControl is given a char that becomes a control character when converted to dchar. So it rings the alarm even though the encoded code point in the string is not a control character. Could also go the other way: When a control character is encoded with chars that don't look like control characters, it slips through (haven't checked if this can actually happen).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed.

@CyberShadow
Copy link
Member Author

Could also go the other way: When a control character is encoded with chars that don't look like control characters, it slips through (haven't checked if this can actually happen).

Hmm, it's one thing to force decoding a D string just to avoid putting Unicode control characters in the JSON text (as that would result in non-conforming JSON)... but to force decoding JSON just so we detect and throw on Unicode control characters?

@CyberShadow
Copy link
Member Author

Done.

@CyberShadow
Copy link
Member Author

Done.

Said done thing (posted it on the wrong PR):


OK, well, I checked the latest JSON RFC (7159) and it explicitly states that we don't care about the Unicode control characters:

and the control characters (U+0000 through U+001F).

so, I'm going to amend this and look into disabling auto-decoding for encoding JSON as well.

Copy link
Contributor

@aG0aep6G aG0aep6G left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The isControl thing is fixed. As far as I see, there are no other such issues. So, LGTM.

But it's rather fragile. Future code might wrongly assume that it's always dealing with dchars.

std/json.d Outdated
@@ -1160,7 +1161,7 @@ string toJSON(const ref JSONValue root, in bool pretty = false, in JSONOptions o
case '\t': json.put("\\t"); break;
default:
{
import std.uni : isControl;
import std.ascii : isControl;
import std.utf : encode;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to assert here that escapeNonAsciiChars is not set together with Char = char. If that could happen, we'd output a \u sequence for each char in a multibyte sequence (which would be completely wrong).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Also discovered and fixed another issue (see last commit).

@wilzbach
Copy link
Member

Hmm, then why is #5511 failing?

Bad karma? No it's due to dlang/dmd#6935

Fix: dlang/dmd#6941

@MartinNowak MartinNowak merged commit 8f85aaa into dlang:stable Jul 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants