Fix long ASCII text read #617

sugmanue · 2025-09-29T22:22:09Z

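There are two bugs in the `_finishLongTextAscii` method, introduced in #519 (via #568), that cause the text to be truncated.

1. The `outPtr` is always set to zero (see [here](https://github.com/FasterXML/jackson-dataformats-binary/blob/b20075ff0c029d659cb24adc6c65d2be748a8753/cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORParser.java#L2636)) before the read loop. If the out buffer still has room, its existing contents are overwritten instead of being appended to (overwritten or missing chunks).
2. If the method exits by fully reading the expected text length, outside the [outer loop](https://github.com/FasterXML/jackson-dataformats-binary/blob/b20075ff0c029d659cb24adc6c65d2be748a8753/cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORParser.java#L2657), the length of the last segment is not set, which makes the calling code drop it when the string is finished (text truncated).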
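The two failure modes above can be illustrated with a small stand-alone sketch. This is not the Jackson code: `SegmentedAsciiReader`, its fields, and the fixed segment size are hypothetical names chosen for illustration, and the input is assumed to be pure ASCII. It only shows the two invariants the fix restores: resume writing at the current fill position rather than at zero, and record the length of the last segment on every exit.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a segmented text buffer; not Jackson's implementation. */
final class SegmentedAsciiReader {
    private final List<char[]> finished = new ArrayList<>();
    private final int segmentSize;
    private char[] current;
    private int currentLength; // number of valid chars in `current`

    SegmentedAsciiReader(int segmentSize) {
        this.segmentSize = segmentSize;
        this.current = new char[segmentSize];
    }

    /** Appends the ASCII bytes made available by one input-buffer refill. */
    void append(byte[] input, int offset, int len) {
        // Invariant 1: continue after what is already buffered instead of
        // restarting at 0, which would overwrite earlier chunks.
        int outPtr = currentLength;
        int end = offset + len;
        while (offset < end) {
            if (outPtr >= current.length) {
                // Current segment is full: keep it and start a new one.
                finished.add(current);
                current = new char[segmentSize];
                outPtr = 0;
            }
            int n = Math.min(end - offset, current.length - outPtr);
            for (int k = 0; k < n; k++) {
                current[outPtr++] = (char) input[offset++]; // assumes ASCII input
            }
        }
        // Invariant 2: always record the last segment's length, including when
        // the expected text length has been fully read.
        currentLength = outPtr;
    }

    String contents() {
        StringBuilder sb = new StringBuilder();
        for (char[] seg : finished) {
            sb.append(seg);
        }
        sb.append(current, 0, currentLength);
        return sb.toString();
    }

    public static void main(String[] args) {
        SegmentedAsciiReader reader = new SegmentedAsciiReader(4);
        byte[] data = "0123456789".getBytes(StandardCharsets.US_ASCII);
        // Feed the text in refills of 3 bytes, smaller than a segment.
        for (int off = 0; off < data.length; off += 3) {
            reader.append(data, off, Math.min(3, data.length - off));
        }
        System.out.println(reader.contents());
    }
}
```

Dropping either assignment marked as an invariant reproduces the symptoms described in the list: resetting `outPtr` to 0 loses earlier chunks, and skipping the final `currentLength` update drops the tail of the text.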

Notes

This code path is only triggered by long texts, where "long" means that the text does not fit entirely in the input buffer.
This change includes two tests: one for non-chunked text (the case being fixed here) and another for chunked text, to validate that this issue is not present in that path as well.
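As a rough illustration of the two kinds of input the tests distinguish, the sketch below builds a long ASCII text in both CBOR encodings: definite-length (non-chunked) and indefinite-length (chunked). It uses only the standard CBOR text-string headers (major type 3) and no Jackson APIs; the class name `CborTextSamples`, the helper names, and the chosen sizes are assumptions for illustration, and the actual tests in this PR may build their inputs differently.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper producing CBOR-encoded long ASCII text samples. */
final class CborTextSamples {

    /** Writes a definite-length CBOR text string header (major type 3). */
    static void writeTextHeader(ByteArrayOutputStream out, int len) {
        if (len < 24) {
            out.write(0x60 + len);      // length stored in the initial byte
        } else if (len <= 0xFF) {
            out.write(0x78);            // 1-byte length follows
            out.write(len);
        } else if (len <= 0xFFFF) {
            out.write(0x79);            // 2-byte big-endian length follows
            out.write(len >> 8);
            out.write(len);
        } else {
            out.write(0x7A);            // 4-byte big-endian length follows
            out.write(len >> 24);
            out.write(len >> 16);
            out.write(len >> 8);
            out.write(len);
        }
    }

    /** Non-chunked encoding: one header followed by all the bytes. */
    static byte[] definite(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeTextHeader(out, bytes.length);
        out.write(bytes, 0, bytes.length);
        return out.toByteArray();
    }

    /** Chunked encoding: 0x7F, a series of definite-length chunks, then 0xFF. */
    static byte[] chunked(String text, int chunkSize) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x7F);
        for (int off = 0; off < bytes.length; off += chunkSize) {
            int n = Math.min(chunkSize, bytes.length - off);
            writeTextHeader(out, n);
            out.write(bytes, off, n);
        }
        out.write(0xFF);
        return out.toByteArray();
    }

    /** Builds printable ASCII text of the requested length. */
    static String longAscii(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + (i % 26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = longAscii(70_000); // assumed to exceed the parser's input buffer
        System.out.println("definite: " + definite(text).length + " bytes");
        System.out.println("chunked:  " + chunked(text, 1_000).length + " bytes");
    }
}
```

Decoding both byte arrays with the parser and comparing the result to the original string would exercise the non-chunked and chunked paths respectively, provided the text is longer than the parser's input buffer.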

Fixes #616.


cowtowncoder · 2025-09-29T22:34:28Z

Whoa! Thank you very much for reporting #616 and providing this fix. I'll need to read it with thought.
I think we have CLA for you (as per #568) so that's good.

But due to the nature of the bug, I think we'd want the fix all the way back to the 2.18 branch (that's the intended LTS release). I could try cherry-picking, or, if it's easy enough for you, re-creating the PR with 2.18 as the target would be great.

One possibly gnarly change there is JUnit 4 -> 5 conversion (see #550), done for 2.19.

So alternatively could consider merging full PR in 2.19, and only backporting fix, not tests (not ideal but... would do).

sugmanue · 2025-09-29T22:37:21Z

> Whoa! Thank you very much for reporting #616 and providing this fix. I'll need to read it with thought. I think we have CLA for you (as per #568) so that's good.

I introduced it in the first place; somehow I didn't fully test it. I added Jacoco locally and verified that all the code introduced in the previous PR is covered. Apologies for my sloppiness.

Unrelated to this change: with Jacoco, I see some code paths related to reading numbers that are not covered. I wonder if there's any reason not to add Jacoco? If there's none, I can send a PR for that.

sugmanue · 2025-09-29T22:42:38Z

> But due to the nature of the bug, I think we'd want the fix all the way back to the 2.18 branch (that's the intended LTS release). I could try cherry-picking, or, if it's easy enough for you, re-creating the PR with 2.18 as the target would be great.

As far as I understand this change is only present on 2.19 onwards. I double checked the code present on 2.18 and this code path is not there. So, it's not affected by this particular issue.

cowtowncoder · 2025-09-30T00:22:26Z

@sugmanue I should have checked before I wrote the above: yes, this was changed in 2.19.0, so the fix need not (and cannot) go in 2.18 anyway. But I think it'd be good to merge it in 2.19, just in case we release 2.19.3.

cowtowncoder · 2025-09-30T00:23:27Z

@sugmanue np, these things happen. I did not review code well enough either. Glad it got caught now at least.

cowtowncoder · 2025-09-30T00:34:00Z

cbor/src/main/java/com/fasterxml/jackson/dataformat/cbor/CBORParser.java

            int inPtr = _inputPtr;
            int i = 0;
            // Tight loop to copy into the output buffer, bail if a non-ascii char is found
            while (outPtr < outEnd && i >= 0) {


(just noting for posterity, not suggesting change within this fix)

The check for `i >= 0` seems sub-optimally placed: it runs before the actual access and output of the byte itself, which leads to the need to "undo" the copy, instead of changing control flow where the problem is encountered.

sugmanue · Sep 30, 2025

The idea was to remove as many branches from the loop as possible. The cost is the need to undo, but at least that branch is outside the hot code path. I didn't do performance testing to validate the idea, but I will do some and post back the results.

cowtowncoder · Sep 30, 2025

Yeah, measuring is good: actual performance is not always obvious. So in this case there's just one extra comparison (`i` checked before the first copy), but that's only once per segment/run, probably insignificant over non-trivial data.
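To make the two loop shapes under discussion concrete, here is a stand-alone sketch. It is not the parser code: `AsciiCopyLoops` and both method names are hypothetical, and the first variant is only modeled on the quoted loop condition. `copyWithUndo` keeps the non-ASCII check in the loop condition and corrects the output pointer afterwards; `copyWithEarlyExit` checks each byte before copying and breaks where the problem is encountered. Both return the number of ASCII chars copied.

```java
/** Hypothetical comparison of two ways to copy a run of ASCII bytes. */
final class AsciiCopyLoops {

    /** Copies until the output is full or a non-ASCII byte was copied, then undoes that copy. */
    static int copyWithUndo(byte[] in, int inPtr, int inEnd, char[] out, int outPtr) {
        final int start = outPtr;
        final int outEnd = outPtr + Math.min(inEnd - inPtr, out.length - outPtr);
        int i = 0;
        // The sign check applies to the previously copied byte, so a
        // non-ASCII byte (negative as a Java byte) is written once before exit.
        while (outPtr < outEnd && i >= 0) {
            i = in[inPtr++];
            out[outPtr++] = (char) i;
        }
        if (i < 0) {
            --outPtr; // undo the copy of the non-ASCII byte
        }
        return outPtr - start;
    }

    /** Checks each byte before copying and stops at the first non-ASCII byte. */
    static int copyWithEarlyExit(byte[] in, int inPtr, int inEnd, char[] out, int outPtr) {
        final int start = outPtr;
        final int outEnd = outPtr + Math.min(inEnd - inPtr, out.length - outPtr);
        while (outPtr < outEnd) {
            int i = in[inPtr];
            if (i < 0) {
                break; // leave the non-ASCII byte unconsumed
            }
            ++inPtr;
            out[outPtr++] = (char) i;
        }
        return outPtr - start;
    }

    public static void main(String[] args) {
        byte[] in = {'a', 'b', 'c', (byte) 0xC3, 'd'};
        char[] out = new char[8];
        int n = copyWithUndo(in, 0, in.length, out, 0);
        System.out.println(new String(out, 0, n));
    }
}
```

Both variants yield the same copied prefix; which one is faster is an empirical question that depends on the JIT and the data, which is why measuring is the sensible next step.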

sugmanue · Oct 1, 2025

Benchmarks with the code as is (using these benchmarks).

| Benchmark | (flavor) | (size) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|
| MyBenchmark.cbor | ASCII_PRINTABLE | XX_LARGE | avgt | 5 | 24318.121 | ± 329.952 | ns/op |

And with this patch applied.

| Benchmark | (flavor) | (size) | Mode | Cnt | Score | Error | Units |
|---|---|---|---|---|---|---|---|
| MyBenchmark.cbor | ASCII_PRINTABLE | XX_LARGE | avgt | 5 | 26899.778 | ± 1239.516 | ns/op |

Looks like the current version is slightly faster, but not by much. This input has 4 fields of about 16Kb.

cowtowncoder · Oct 1, 2025

Ok. I'll take that. :)

Thank you for humoring me.

cowtowncoder

LGTM, will merge, backport


cowtowncoder · 2025-09-30T00:57:31Z

Merged, backported in 2.19(.3), 2.20(.1).

sugmanue mentioned this pull request Sep 29, 2025

CBOR text gets truncated on decoding #616

Closed

Merge branch '2.x' into fix-finish-long-ascii-text

98c9e4c

cowtowncoder reviewed Sep 30, 2025


cowtowncoder approved these changes Sep 30, 2025


cowtowncoder merged commit d57fa46 into FasterXML:2.x Sep 30, 2025
4 checks passed
