JSON fixes #5511
Thanks for your pull request, @CyberShadow! Bugzilla references
@wilzbach Is CircleCI broken on stable?
Looks like it:
Last commit (fix for issue 17557) looks bad. Rest looks good.
std/json.d
Outdated
{
    import std.exception : assertThrown;

    assertThrown(parseJSON("\"a\nb\""));
Could specify the type of the expected exception.
Done.
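A minimal sketch of what the suggestion above could look like, assuming the exception raised here is `std.json.JSONException` (the type shown in the error message later in this thread). `assertThrown` accepts the expected exception type as a template argument, so the test fails if something else is thrown.

```d
import std.exception : assertThrown;
import std.json : JSONException, parseJSON;

unittest
{
    // A raw (unescaped) newline inside a JSON string is invalid,
    // so parsing is expected to throw a JSONException specifically.
    assertThrown!JSONException(parseJSON("\"a\nb\""));

    // The escaped form is valid JSON and parses to a string with a newline.
    assert(parseJSON("\"a\\nb\"").str == "a\nb");
}
```

This can be run with `dmd -unittest -main -run example.d`, where `example.d` is a hypothetical file name.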
std/json.d
Outdated
@@ -707,22 +707,36 @@ if (isInputRange!T && !isInfinite!T && isSomeChar!(ElementEncodingType!T))
    JSONValue root;
    root.type_tag = JSON_TYPE.NULL;

    // UTF decoding is unnecessary when parsing JSON.
    static if (is(T : const(char)[]))
        alias Char = char;
This breaks stuff:
parseJSON("\"\U0001D11E\"");
/* std.json.JSONException@std/json.d(1374): Illegal control character. (Line 1:3) */
The thing is that the rest of the code assumes `dchar`. While `char` converts to `dchar`, it doesn't always keep the same meaning. So stuff gets misinterpreted.
In this specific case, `isControl` is given a `char` that becomes a control character when converted to `dchar`. So it rings the alarm even though the encoded code point in the string is not a control character. Could also go the other way: When a control character is encoded with `char`s that don't look like control characters, it slips through (haven't checked if this can actually happen).
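The misinterpretation can be reproduced outside of `std.json`. The sketch below is not the parser's code; `falseControlHits` is a hypothetical helper that widens each UTF-8 code unit to `dchar` without decoding and asks `std.uni.isControl` about it. U+1D11E encodes to the bytes F0 9D 84 9E, and three of those land in the C1 control range U+0080..U+009F once widened.

```d
import std.ascii : isAsciiControl = isControl;
import std.uni : isUniControl = isControl;

/// Hypothetical helper: counts the UTF-8 code units of `s` that
/// std.uni.isControl flags when widened to dchar without decoding.
size_t falseControlHits(string s)
{
    size_t hits;
    foreach (char c; s)        // iterate code units, no decoding
    {
        dchar widened = c;     // implicit char -> dchar conversion
        if (isUniControl(widened))
            ++hits;
    }
    return hits;
}

unittest
{
    string clef = "\U0001D11E";
    assert(clef.length == 4);   // four UTF-8 code units

    // 0x9D, 0x84 and 0x9E are misread as C1 control characters.
    assert(falseControlHits(clef) == 3);

    // The decoded code point itself is not a control character.
    foreach (dchar d; clef)
        assert(!isUniControl(d));

    // std.ascii.isControl only covers 0x00..0x1F and 0x7F,
    // so it does not flag any of the code units.
    foreach (char c; clef)
        assert(!isAsciiControl(c));
}
```

The last check suggests why an ASCII-only test is a safer fit when the loop operates on `char` code units.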
Good catch, fixed.
…valid and should cause an exception
Since the previous commit, it was only called from one place.
…h JSONOptions.escapeNonAsciiChars
2e900a9 to ca788f1
Hmm, it's one thing to force decoding a D string just to avoid putting Unicode control characters in the JSON text (as that would result in non-conforming JSON)... but to force decoding JSON just so we detect and throw on Unicode control characters?
ca788f1 to 226f8e0
Done.
Said done thing (posted it on the wrong PR): OK, well, I checked the latest JSON RFC (7159) and it explicitly states that we don't care about the Unicode control characters:
so, I'm going to amend this and look into disabling auto-decoding for encoding JSON as well.
The `isControl` thing is fixed. As far as I see, there are no other such issues. So, LGTM.
But it's rather fragile. Future code might wrongly assume that it's always dealing with `dchar`s.
std/json.d
Outdated
@@ -1160,7 +1161,7 @@ string toJSON(const ref JSONValue root, in bool pretty = false, in JSONOptions o
     case '\t': json.put("\\t"); break;
     default:
     {
-        import std.uni : isControl;
+        import std.ascii : isControl;
         import std.utf : encode;
I'd like to assert here that `escapeNonAsciiChars` is not set together with `Char = char`. If that could happen, we'd output a \u sequence for each `char` in a multibyte sequence (which would be completely wrong).
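To illustrate the concern, here is a rough sketch, not the `toJSON` implementation. `escapeNonAscii` is a hypothetical helper that handles only code points in the Basic Multilingual Plane (no surrogate pairs) and escapes every non-ASCII element it iterates over. Instantiated with `dchar` it sees decoded code points; instantiated with `char` it sees individual UTF-8 code units and emits one escape per byte.

```d
import std.array : appender;
import std.format : formattedWrite;

/// Hypothetical helper: escapes every non-ASCII element of `s` as \uXXXX.
/// `Char` selects whether iteration decodes (dchar) or not (char).
/// BMP only; surrogate pairs are deliberately left out of this sketch.
string escapeNonAscii(Char)(string s)
{
    auto output = appender!string();
    foreach (Char c; s)
    {
        if (c < 0x80)
            output.put(cast(char) c);
        else
            output.formattedWrite("\\u%04X", cast(uint) c);
    }
    return output.data;
}

unittest
{
    // U+00E9 is encoded in UTF-8 as the two bytes C3 A9.
    string s = "\u00E9";

    // Decoding first yields a single, correct escape.
    assert(escapeNonAscii!dchar(s) == "\\u00E9");

    // Iterating code units yields one escape per byte, which a JSON
    // reader would interpret as U+00C3 followed by U+00A9.
    assert(escapeNonAscii!char(s) == "\\u00C3\\u00A9");
}
```

An assertion that the two settings are never combined would catch the second case before any output is written.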
Done. Also discovered and fixed another issue (see last commit).
68149a3 to 71875c0
Bad karma? No, it's due to dlang/dmd#6935. Fix: dlang/dmd#6941
Please see the individual commits for details.