-
-
Notifications
You must be signed in to change notification settings - Fork 145
Fix long ASCII text read #617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix long ASCII text read #617
Conversation
There are two bugs in the `_finishLongTextAscii` method introduced in FasterXML#519 that produces the text to be truncated. 1. The `outPtr` is always set to zero (see [here](https://github.com/FasterXML/jackson-dataformats-binary/blob/b20075ff0c029d659cb24adc6c65d2be748a8753/cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORParser.java#L2636)) before the read loop. If the out buffer still has room its contents will be overwritten instead of keep adding to it (overwrite or missing chunks.) 2. If the method exits by fully reading the expected text length, outside the [outer loop]:(https://github.com/FasterXML/jackson-dataformats-binary/blob/b20075ff0c029d659cb24adc6c65d2be748a8753/cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORParser.java#L2657) the length of the last segment is not set, which makes the calling code to drop it when the string is finished (text truncated.) Fixes FasterXML#616.
Whoa! Thank you very much for reporting #616 and providing this fix. I'll need to read it with thought. But due to nature of the bug, I think we'd want fix all the way to One possibly gnarly change there is JUnit 4 -> 5 conversion (see #550), done for 2.19. So alternatively could consider merging full PR in |
I introduced it in the first place, somehow I didn't fully test it. I added Jacoco locally and verified that all the code introduced in the previous PR is covered. Apologies for my sloppiness. Unrelated to this change, with Jacoco, I see some code paths related to reading numbers that are not covered. I wonder if there's any reason not add Jacoco? If there's none, I can send a PR for that. |
As far as I understand this change is only present on 2.19 onwards. I double checked the code present on 2.18 and this code path is not there. So, it's not affected by this particular issue. |
@sugmanue I should have checked before I wrote above: yes, this was changed in 2.19.0 so fix need not (and cannot) go in 2.18 anyway. But I think it'd be good to merge it in |
@sugmanue np, these things happen. I did not review code well enough either. Glad it got caught now at least. |
int inPtr = _inputPtr; | ||
int i = 0; | ||
// Tight loop to copy into the output buffer, bail if a non-ascii char is found | ||
while (outPtr < outEnd && i >= 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(just noting for posterity, not suggesting change within this fix)
Check for i >= 0
seems sub-optimally placed, before actually access and output of byte itself, leading to need to "undo" copy -- instead of changing control flow where problem encountered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to remove as many branches from the loop as possible. The cost is the need to undo, but at least that branch is outside the hot code-path. I didn't do performance testing to validate the idea, but I will do some and post back the results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah measuring is good -- actual performance is not always obvious. So in this case there's just one extra comparison (i
checked before first copy) but that's only once per segment/run, probably insignificant over non-trivial data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmarks with the code as is (using the this benchmarks).
Benchmark (flavor) (size) Mode Cnt Score Error Units
MyBenchmark.cbor ASCII_PRINTABLE XX_LARGE avgt 5 24318.121 ± 329.952 ns/op
And with this patch applied.
Benchmark (flavor) (size) Mode Cnt Score Error Units
MyBenchmark.cbor ASCII_PRINTABLE XX_LARGE avgt 5 26899.778 ± 1239.516 ns/op
Looks like the current version is slightly faster, but not by much. This input has 4 fields of about 16Kb.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I'll take that. :)
Thank you for humoring me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, will merge, backport
There are two bugs in the `_finishLongTextAscii` method introduced in #519 that produces the text to be truncated. 1. The `outPtr` is always set to zero (see [here](https://github.com/FasterXML/jackson-dataformats-binary/blob/b20075ff0c029d659cb24adc6c65d2be748a8753/cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORParser.java#L2636)) before the read loop. If the out buffer still has room its contents will be overwritten instead of keep adding to it (overwrite or missing chunks.) 2. If the method exits by fully reading the expected text length, outside the [outer loop]:(https://github.com/FasterXML/jackson-dataformats-binary/blob/b20075ff0c029d659cb24adc6c65d2be748a8753/cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORParser.java#L2657) the length of the last segment is not set, which makes the calling code to drop it when the string is finished (text truncated.) Fixes #616.
Merged, backported in 2.19(.3), 2.20(.1). |
There are two bugs in the
_finishLongTextAscii
method introduced in #519 (via #568) that produces the text to be truncated.The
outPtr
is always set to zero (see here) before the read loop. If the out buffer still has room its contents will be overwritten instead of keep adding to it (overwrite or missing chunks.)If the method exits by fully reading the expected text length, outside the outer loop the length of the last segment is not set, which makes the calling code to drop it when the string is finished (text truncated.)
Notes
Fixes #616.