This repository has been archived by the owner on Jan 29, 2023. It is now read-only.

Invalid parsing of escaped unicode values #2

Open
bmcminn opened this issue Aug 20, 2014 · 4 comments
Comments

bmcminn commented Aug 20, 2014

Currently using this library in a Grunt task, and ran into the following issue:

// JSON file data being linted
{
    "copyright": "\u2117 & \u00a9 2014 {{sitename}}"
}

// BASH error...
Invalid Reverse Solidus '\' declaration.

I just tested the above snippet against jsonlint.com, jsonlint pro, and jsoneditoronline; all of them decode the unicode escapes and parse the snippet as valid JSON.

This snippet sits in a much deeper part of my JSON data, which is compiled via PHP's json_encode function. The raw escaped unicode values it emits cause this linter to throw the above error.

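Escaping the reverse solidus ["\\u2117 & \\u00a9 2014 {{sitename}}"] "fixes" the issue, but it is inconvenient because most systems emit single-backslash \uXXXX escapes by default. It also changes the data: the string then contains the literal text \u2117 rather than the character it stands for.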
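As a quick sanity check of the claim that the snippet is valid JSON, here is a minimal sketch using the built-in JSON.parse rather than this library. The variable names are illustrative and not taken from the linter.

```javascript
// The JSON text from the report. String.raw keeps the backslashes literal,
// so the parser (not the JavaScript source) decodes the \uXXXX escapes.
const jsonText = String.raw`{ "copyright": "\u2117 & \u00a9 2014 {{sitename}}" }`;

// JSON.parse throws a SyntaxError on invalid JSON; here it succeeds.
const parsed = JSON.parse(jsonText);

// The escapes decode to U+2117 (sound recording copyright) and U+00A9 (copyright sign).
console.log(parsed.copyright);
```

If the built-in parser accepts the text, the rejection comes from the linter's own escape check rather than from the data.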

bmcminn commented Aug 20, 2014

Further testing shows that rvalidsolidus in jsonlint.js:7 does not match the u[0-9] combination correctly. Changing it as shown below remedies the problem, though it doesn't make sense to me that the explicit length of [0-9]{4} would break like this:

    // ...
    rvalidsolidus = /\\("|\\|\/|b|f|n|r|t|u[0-9]{4})/, // original version
    rvalidsolidus = /\\("|\\|\/|b|f|n|r|t|u[0-9]+)/,   // my change

Edit: regex demo showing that the {4} should work...
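A possible explanation for why the + version appears to work, sketched under the assumption that the pattern is applied unanchored, as it is written above (the thread does not show how the linter calls it). The variable names are illustrative. The escape \u00a9 contains the hex digit a, so {4} decimal digits cannot match, while + is satisfied by the prefix \u00 alone.

```javascript
// Patterns copied from the comment above; the names are hypothetical.
const original = /\\("|\\|\/|b|f|n|r|t|u[0-9]{4})/;
const loosened = /\\("|\\|\/|b|f|n|r|t|u[0-9]+)/;

// Doubled backslashes in these literals produce a single backslash in the string.
const copyrightEscape = "\\u00a9"; // valid JSON escape for U+00A9
const truncatedEscape = "\\u1";    // NOT a valid JSON escape (too few digits)

// {4} needs four decimal digits after "u", but the third character is "a".
const originalAccepts = original.test(copyrightEscape);

// "+" stops at the first non-digit, so it only matches the prefix.
const loosenedMatch = loosened.exec(copyrightEscape)[0];

// The loosened pattern therefore also accepts escapes a JSON parser rejects.
const loosenedAcceptsTruncated = loosened.test(truncatedEscape);
let parserAcceptsTruncated = true;
try {
  JSON.parse('"' + truncatedEscape + '"');
} catch (err) {
  parserAcceptsTruncated = false;
}

console.log({ originalAccepts, loosenedMatch, loosenedAcceptsTruncated, parserAcceptsTruncated });
```

So the + change hides the error rather than fixing the character class, and it lets through malformed escapes such as \u1.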

bmcminn commented Dec 2, 2014

Just figured out why it invalidates: the regex ONLY matches decimal digits (0-9) after the u, so any escape containing the hex digits A-F is rejected...

Updating jsonlint.js:7 as follows corrects the problem.

    // ...
    rvalidsolidus = /\\("|\\|\/|b|f|n|r|t|u[0-9]{4})/,     // original version
    rvalidsolidus = /\\("|\\|\/|b|f|n|r|t|u[0-9A-F]{4})/i,  // my change
    // > catches \u1234 AND \u12aE
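To illustrate how the corrected pattern behaves on whole strings, here is a small sketch. findInvalidEscapes is a hypothetical helper written for this example and is not part of jsonlint; it assumes the pattern should match exactly at each backslash.

```javascript
// Corrected pattern from the comment above.
const rvalidsolidus = /\\("|\\|\/|b|f|n|r|t|u[0-9A-F]{4})/i;

// Hypothetical helper: returns the offsets of backslashes that do not
// start a valid JSON escape sequence in the raw string contents.
function findInvalidEscapes(raw) {
  const bad = [];
  for (let i = 0; i < raw.length; i++) {
    if (raw[i] !== "\\") continue;
    const m = rvalidsolidus.exec(raw.slice(i));
    if (m && m.index === 0) {
      i += m[0].length - 1; // skip the rest of the valid escape
    } else {
      bad.push(i);          // backslash with no valid escape at this offset
    }
  }
  return bad;
}

// Doubled backslashes in these literals produce a single backslash in the string.
console.log(findInvalidEscapes("\\u2117 & \\u00a9 2014")); // no invalid escapes
console.log(findInvalidEscapes("\\u12aE"));                // mixed-case hex is accepted
console.log(findInvalidEscapes("\\u00g9"));                // "g" is not a hex digit
console.log(findInvalidEscapes("a\\qb"));                  // "\q" is not a JSON escape
```

Checking that the match starts at offset 0 matters here: without it, an unanchored exec could skip a bad backslash and match a later valid escape.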

👉 panuhorsmalahti/gulp-json-lint#1

bmcminn commented Dec 2, 2014

@codenothing I just finished updating the test on my fork and it passes. I had to modify the json-lint dependency for nlint because it uses jsonlint and had the same regex problem I'm trying to fix :P

In reading up on the Unicode spec (under Architecture and terminology), I found that the Basic Multilingual Plane occupies the range 0000 - FFFF, so my changes reflect that range. Outside of plane 0 you get code points that need more than four hex digits, which the regex does not handle as a single escape.
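For context on characters outside the BMP: the JSON RFC represents them as a UTF-16 surrogate pair, i.e. two consecutive \uXXXX escapes (its example is the G clef, U+1D11E, written as \uD834\uDD1E). A minimal sketch, with illustrative variable names, suggesting the four-hex-digit pattern already covers that case one escape at a time:

```javascript
// Corrected pattern from this thread, with the g flag added to collect every escape.
const escapePattern = /\\("|\\|\/|b|f|n|r|t|u[0-9A-F]{4})/gi;

// JSON text for U+1D11E (musical symbol G clef) as a surrogate pair.
const gClefJson = String.raw`"\uD834\uDD1E"`;

// Each half of the pair is an ordinary four-hex-digit escape.
const escapes = gClefJson.match(escapePattern);

// The built-in parser combines the pair into one code point.
const decoded = JSON.parse(gClefJson);
const codePoint = decoded.codePointAt(0).toString(16);

console.log(escapes.length, codePoint);
```

This sketch does not check that surrogates are correctly paired; whether a linter should enforce that is a separate design question.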

In any case, this is a pretty involved issue, because I don't know which unicode spec you intended to support or how robust the validation should be. You can review my changes here (https://github.com/bmcminn/jsonlint) and see what you think; I plan to issue a pull request to resolve this issue.

EDIT: Pull request issued #3

bmcminn commented Mar 23, 2015

Finally have a spec to reference for implementation validation: http://rfc7159.net/rfc7159#unichars
