json_spirit: use utf8 internally when parsing \uHHHH #4527
@tserong looks good, but can we have a test case which reproduces the funny test in http://pastebin.com/KzWab33X?
@tchaikov, sure, will write a test case and see about the cleanup you suggested above.
@tchaikov done. Assuming the above is OK, are you happy with it as three commits, or would you prefer I squash it back to one?
thanks @tserong =) yeah, i'd prefer we have a single commit for this change, could you do that?
Force-pushed from 9aa7ecb to 8add15b
No problem, squashed.
json_spirit: use utf8 internally when parsing \uHHHH
Reviewed-by: Kefu Chai <kchai@redhat.com>
Looks like this busted things up a bit? http://tracker.ceph.com/issues/11574
@gregsfortytwo ack.
This might fix it:
Although I can't say for sure, as I never had that error ("using 'typename' outside of template") in my test builds -- for me it builds fine with or without
@tserong yes. I just pushed the same patch to wip-11574-fix-FTBFS, and the build on CentOS 6.5 seems happy, see http://gitbuilder.sepia.ceph.com/gitbuilder-ceph-rpm-rhel6-5-amd64-basic/#origin/wip-11574-fix-FTBFS .
#4614 is posted to address this FTBFS. |
When the Python CLI is given non-ASCII characters, it converts them to
\uHHHH escapes in JSON. json_spirit parses these internally into 16-bit
characters, which could only work if json_spirit were built to use
std::wstring, which it isn't; it's using std::string, so each 16-bit
value is truncated to a single char, discarding the high byte and
leaving the low byte, which is effectively garbage.
This hack^H^H^H^H change makes json_spirit convert to UTF-8 internally
instead, which can be stored just fine inside a std::string.
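A minimal sketch of the idea described above, assuming a hand-rolled decoder rather than json_spirit's actual code; the helper names `hex_value`, `parse_u4`, `truncate_to_char` and `append_utf8` are hypothetical. It contrasts storing a \uHHHH code unit in a single `char` (only the low byte survives, so `\u4e2d` turns into `-`) with appending it to a `std::string` as one to three UTF-8 bytes.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: numeric value of one hex digit.
unsigned hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::invalid_argument("not a hex digit");
}

// Hypothetical helper: the four hex digits after "\u" -> one 16-bit code unit.
std::uint16_t parse_u4(const std::string& hex4) {
    if (hex4.size() != 4) throw std::invalid_argument("expected 4 hex digits");
    std::uint16_t v = 0;
    for (char c : hex4) v = static_cast<std::uint16_t>((v << 4) | hex_value(c));
    return v;
}

// Old behaviour (illustrative): forcing the code unit into a plain char
// keeps only the low byte, so the character is lost.
char truncate_to_char(std::uint16_t cu) {
    return static_cast<char>(cu & 0xFF);
}

// New behaviour (illustrative): append a BMP code point to a std::string
// as UTF-8. Lone surrogates (0xD800..0xDFFF) are not special-cased here
// and would come out as ill-formed 3-byte sequences.
void append_utf8(std::string& out, std::uint16_t cp) {
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
```

With this approach `\u00e9` ends up as the two bytes `C3 A9` and `\u4e2d` as `E4 B8 AD`, both of which a `std::string` holds without loss.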
Note that this implementation still assumes \uHHHH escapes are four hex
digits, so it'll only cope with characters in the Basic Multilingual
Plane. Still, that's rather a lot more characters than it could cope
with before ;)
(For characters outside the BMP, Python's json module generates a UTF-16
surrogate pair of two \uHHHH escapes, e.g. \ud83d\ude00 for U+1F600,
which the current implementation doesn't handle)
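For reference, JSON itself has no 8-digit escape: RFC 8259 writes a character outside the BMP as a UTF-16 surrogate pair of two \uHHHH escapes, and Python's `json.dumps` emits `\ud83d\ude00` for U+1F600. The sketch below is not part of this change and uses hypothetical helper names (`is_high_surrogate`, `is_low_surrogate`, `combine_surrogates`, `append_utf8_supplementary`, `append_surrogate_pair`). It shows how a decoder could recognise such a pair, combine it into one code point and emit the four-byte UTF-8 form.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helpers, not json_spirit's API.

// A high (leading) surrogate lies in 0xD800..0xDBFF.
bool is_high_surrogate(std::uint16_t cu) { return cu >= 0xD800 && cu <= 0xDBFF; }

// A low (trailing) surrogate lies in 0xDC00..0xDFFF.
bool is_low_surrogate(std::uint16_t cu) { return cu >= 0xDC00 && cu <= 0xDFFF; }

// Combine a valid surrogate pair into a code point in U+10000..U+10FFFF.
std::uint32_t combine_surrogates(std::uint16_t hi, std::uint16_t lo) {
    return 0x10000u + ((static_cast<std::uint32_t>(hi) - 0xD800u) << 10)
                    + (static_cast<std::uint32_t>(lo) - 0xDC00u);
}

// Append a code point in U+10000..U+10FFFF as four UTF-8 bytes:
// 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
void append_utf8_supplementary(std::string& out, std::uint32_t cp) {
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
}

// Decode "\uHIGH\uLOW" given as two code units. Returns false and leaves
// `out` untouched if they do not form a valid high/low pair.
bool append_surrogate_pair(std::string& out, std::uint16_t hi, std::uint16_t lo) {
    if (!is_high_surrogate(hi) || !is_low_surrogate(lo)) return false;
    append_utf8_supplementary(out, combine_surrogates(hi, lo));
    return true;
}
```

Combining the pair before encoding matters because UTF-8 has no valid encoding for a lone surrogate.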
Fixes: #7387
Signed-off-by: Tim Serong <tserong@suse.com>