JOHNZON-209 Fix JsonObject#toString() to escape key names. #40
Thanks a lot for the PR!
I wonder if the escaping can be less costly, wdyt?
Btw I love the unit test you added. We can add a unicode-related one too, but this is very good to show the issue and prevent any regression. Thanks for doing that in this PR!
@@ -147,7 +147,7 @@ public String toString() {
             while (hasNext) {
                 final Map.Entry<String, JsonValue> entry = it.next();
 
-                builder.append('"').append(entry.getKey()).append("\":");
+                builder.append('"').append(Strings.escape(entry.getKey())).append("\":");
This relies on a global buffer cache. We likely want to use the buffer factory of the builder (object builder or parser factory) and not the global one, which is there for compatibility IIRC.
It also slows down the default case (nothing to escape), which is a concern because toString is a serialization solution for JsonObject in JSON-B and some other cases.
From memory, caching was not efficient unless it was lock free and hash free.
Alternatively, we can add a method to Strings that takes a builder and has no buffer logic, and call it only if the key contains characters that are escapable or must be converted to unicode escapes. The check should be fast, and the conversion can surely be done without a buffer since it is very unlikely to be needed. It needs a small benchmark, but it sounds the simplest and fastest to me, assuming escaping is rare (from my experience it is for keys). Also, iterating over the char[] instead of the String with charAt should be faster.
Wdyt? Can you try to make this escape call a bit less impacting?
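The alternative described in this comment could look roughly like the sketch below: a cheap scan decides whether the key needs escaping at all, and the escaped form is appended straight into the caller's builder with no buffer cache. The class and method names (`KeyEscapeSketch`, `needsEscape`, `appendKey`, `quoteKey`) are hypothetical and are not Johnzon's `Strings` API. The set of escaped characters here (quotation mark, backslash, control characters below U+0020) is an assumption, and the scan uses `charAt`; switching it to `toCharArray` is a small change whose effect would have to be benchmarked.

```java
final class KeyEscapeSketch {

    private KeyEscapeSketch() {
    }

    /** Cheap scan: true only if at least one char of the key must be escaped. */
    static boolean needsEscape(final String key) {
        for (int i = 0; i < key.length(); i++) {
            final char c = key.charAt(i);
            if (c < 0x20 || c == '"' || c == '\\') {
                return true;
            }
        }
        return false;
    }

    /** Appends the key to the caller's builder, escaping only when required. */
    static void appendKey(final StringBuilder builder, final String key) {
        if (!needsEscape(key)) {
            builder.append(key); // fast path: the common case for keys
            return;
        }
        for (int i = 0; i < key.length(); i++) {
            final char c = key.charAt(i);
            switch (c) {
                case '"':
                    builder.append("\\\"");
                    break;
                case '\\':
                    builder.append("\\\\");
                    break;
                case '\b':
                    builder.append("\\b");
                    break;
                case '\f':
                    builder.append("\\f");
                    break;
                case '\n':
                    builder.append("\\n");
                    break;
                case '\r':
                    builder.append("\\r");
                    break;
                case '\t':
                    builder.append("\\t");
                    break;
                default:
                    if (c < 0x20) {
                        // remaining control characters become a 4-digit hex escape
                        builder.append(String.format("\\u%04x", (int) c));
                    } else {
                        builder.append(c);
                    }
            }
        }
    }

    /** Convenience wrapper: returns the key wrapped in double quotes. */
    static String quoteKey(final String key) {
        final StringBuilder builder = new StringBuilder(key.length() + 2);
        builder.append('"');
        appendKey(builder, key);
        builder.append('"');
        return builder.toString();
    }
}
```

In a `toString()` loop like the one in the diff, the call would be something like `builder.append('"'); KeyEscapeSketch.appendKey(builder, entry.getKey()); builder.append("\":");`, so an ordinary key costs one scan and one append.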
Hello @rmannibucau
The results are:
Johnzon 1.1.11 (current stable)
Johnzon 1.1.12-SNAPSHOT (including the commit 542765d)
It is slower but not so bad now. Please note that I deleted the code escaping some ranges of code points: '\u0080'-'\u00a0' and '\u2000'-'\u2100'. Is this handling really required?
Hi @leadpony, it is required AFAIK. Did you try to move from String+charAt to toCharArray+[] for the iteration? Also, preallocating the StringBuilder with the incoming string length can help. Then we would need to be able to bypass the escaping when relevant, but this requires a cache, and interning strings or caching based on a hashcode will likely be costly as well and will require eviction.
Hello @rmannibucau,
It is slower than the previous measurement. This is because
JSON Processing API is based on RFC 7159 and the document states:
Therefore, we do not need to escape the characters between U+0080-U+00A0 and U+2000-U+2100. The Reference Implementation of JSON-P also does not escape these characters.
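To make the argument concrete, here is a small sketch of the rule as I read RFC 7159 section 7: inside a string, only the quotation mark, the reverse solidus, and the control characters U+0000 through U+001F must be escaped. The class and method names are hypothetical, and this only checks the grammar-level requirement; it says nothing about whether Johnzon has other reasons to escape more.

```java
final class Rfc7159EscapeCheck {

    private Rfc7159EscapeCheck() {
    }

    /** True if the char must be escaped inside a JSON string (assumed reading of RFC 7159, section 7). */
    static boolean mustEscape(final char c) {
        return c == '"' || c == '\\' || c <= 0x1F;
    }

    /** Counts the chars in [from, to] (inclusive) that must be escaped. */
    static int countMandatoryEscapes(final char from, final char to) {
        int count = 0;
        for (int c = from; c <= to; c++) {
            if (mustEscape((char) c)) {
                count++;
            }
        }
        return count;
    }

    public static void main(final String[] args) {
        // The two ranges discussed in this thread
        System.out.println(countMandatoryEscapes((char) 0x0080, (char) 0x00A0)); // prints 0
        System.out.println(countMandatoryEscapes((char) 0x2000, (char) 0x2100)); // prints 0
        // The control characters, for comparison
        System.out.println(countMandatoryEscapes((char) 0x0000, (char) 0x001F)); // prints 32
    }
}
```

Under that reading, no character in either disputed range falls into the mandatory set.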
Oki, so maybe split the escape method to keep the escaping where it is today and drop it from toString, and we should be in the 5%, right? Btw thanks for testing the toCharArray path. Strings have the particularity of being optimized a lot by the JVM, so classical optimizations sometimes fail. I appreciate that you tested it.
Hi @rmannibucau, I do not feel like optimizing the current code further because
RI is not highly optimized but fast and correct.
Hmm, did you check where the diff comes from, a cached escaped value maybe? Recall that toString was not originally intended to be fast but more of a debug thing. It is now a runtime API, so we must enhance it IMHO. I will try to have a look, or at least help, next week if you can't. I'd like it fixed for the next release.
Thank you @rmannibucau
@leadpony applied this one. I was particularly interested in the tests ;). Will check toString now. Thanks a lot for the work!
This PR will fix JOHNZON-209 with additional test cases.