XALANJ-2725: Fix for when UTF16 surrogate pair crosses buffer boundaries #184

Merged
jkesselm merged 4 commits into master from XALANJ-2725
Feb 23, 2024

Conversation

jkesselm
Contributor

Fixes the specific buffer-crossing issue tested in the associated xalan-test branch.

As discussed in XALANJ-2725, there are still some edge conditions possible here. But it fixes one known bad case, and at least partially guards against another.
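For readers unfamiliar with the failure mode, here is a minimal sketch of the general technique for handling a surrogate pair that straddles two buffer writes. It is not the code from this PR: the class `SurrogateCarryDemo`, its `pendingHigh` field, and the idea of collecting code points into a list are all hypothetical, chosen only to show a high surrogate at the end of one chunk being held until the next chunk arrives.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Xalan serializer code.
final class SurrogateCarryDemo {
    // High surrogate seen at the very end of the previous chunk (0 = none pending).
    private char pendingHigh = 0;
    private final List<Integer> codePoints = new ArrayList<>();

    /** Consume one chunk of UTF-16 code units, as a buffered writer might. */
    void write(char[] buf, int off, int len) {
        int i = off;
        final int end = off + len;

        // First, resolve a high surrogate left over from the previous chunk.
        if (pendingHigh != 0 && i < end) {
            if (Character.isLowSurrogate(buf[i])) {
                codePoints.add(Character.toCodePoint(pendingHigh, buf[i]));
                i++;
            } else {
                codePoints.add((int) pendingHigh); // isolated surrogate
            }
            pendingHigh = 0;
        }

        while (i < end) {
            char c = buf[i++];
            if (!Character.isHighSurrogate(c)) {
                codePoints.add((int) c); // ordinary unit (or a stray low surrogate)
            } else if (i == end) {
                pendingHigh = c; // pair may continue in the next chunk
            } else if (Character.isLowSurrogate(buf[i])) {
                codePoints.add(Character.toCodePoint(c, buf[i++]));
            } else {
                codePoints.add((int) c); // isolated high surrogate
            }
        }
    }

    /** At end of input, a still-pending high surrogate is necessarily isolated. */
    void flush() {
        if (pendingHigh != 0) {
            codePoints.add((int) pendingHigh);
            pendingHigh = 0;
        }
    }

    List<Integer> codePoints() {
        return codePoints;
    }
}
```

Splitting `"a\uD83C\uDF0Db"` between the two surrogate halves and writing the halves as separate chunks should still yield the single code point U+1F30D in this sketch. What to do with the isolated cases (exception versus placeholder) is the open question discussed below.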

My preferred fix would be to have malformed UTF-16 input throw exceptions, rather than dancing around the problem by outputting (unusable) Numeric Character References for isolated surrogates. But the code is currently inconsistent about that and seems to suggest that we moved away from exceptions for some reason, and I don't recall why we thought the fake NCRs were a good idea.

If we stay with fake NCRs for isolated surrogates, I'm seriously considering changing them to fake entity references, which will at least not be syntactically incorrect. This could be done by replacing the current output, e.g. `&#55308;`, with something more like `&ERR_INVALID_UTF16_SURROGATE_55308;`, using the MsgKey string so that we are at least in sync with the internationalization layer for clarity.
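To make the two placeholder shapes concrete, here is a small sketch that builds both strings for an isolated surrogate. The class and method names are hypothetical, and it assumes the numeric form is `&#` followed by the decimal value of the code unit; the message key is passed in as a plain string rather than looked up through any real Xalan API.

```java
// Hypothetical helpers for comparing the two placeholder formats.
final class SurrogatePlaceholderDemo {
    /** Numeric-reference style placeholder (assumed format: decimal code unit). */
    static String fakeNcr(char isolated) {
        return "&#" + (int) isolated + ";";
    }

    /** Proposed style: an entity-reference-shaped name built from a message key. */
    static String fakeEntityRef(String msgKey, char isolated) {
        return "&" + msgKey + "_" + (int) isolated + ";";
    }
}
```

For the code unit 55308 (0xD80C), the first helper gives `&#55308;` and the second, called with the key `ERR_INVALID_UTF16_SURROGATE`, gives `&ERR_INVALID_UTF16_SURROGATE_55308;`. The latter is syntactically a reference to an undeclared entity rather than a reference to a character that XML does not allow.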

…lution, and I'm not sure whether any of the other surrogate handling needs similar fixes -- I don't know whether they ever run into the buffer break problem.
@jkesselm jkesselm self-assigned this Feb 22, 2024
@jkesselm jkesselm merged commit 77aa724 into master Feb 23, 2024
2 checks passed
@jkesselm jkesselm deleted the XALANJ-2725 branch February 23, 2024 18:13
2 participants