StringRenderer xml-encode leads to invalid XML #260

Closed
tkalmar opened this issue Aug 11, 2020 · 3 comments · Fixed by #261

Comments


tkalmar commented Aug 11, 2020

StringRenderer's xml-encode leads to invalid XML.

For example, the string 🩳 is encoded as the character references &#55358;&#56947;, which, according to the XML spec (https://www.w3.org/TR/REC-xml/#NT-Char), are not in the range of characters allowed in XML. (The allowed ranges differ between XML 1.0 and XML 1.1, but both exclude the surrogate block #xD800-#xDFFF.)
This should either be:

  • fixed (I don't know what the right way would be): perhaps by raising an error or rejecting the input; filtering out the offending characters would also be an option
  • documented
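The "filter offending chars" option can be sketched directly from the XML 1.0 `Char` production linked above. The class and method names (`XmlCharFilter`, `isXml10Char`, `filterXml10`) are hypothetical and not part of StringTemplate; the same predicate could instead be used to reject the input by throwing an exception.

```java
public class XmlCharFilter {
    // XML 1.0 Char production (https://www.w3.org/TR/REC-xml/#NT-Char):
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Drops every code point that is not an XML 1.0 Char. String.codePoints()
    // joins a well-formed surrogate pair into one supplementary code point and
    // yields an unpaired surrogate as its own value, so unpaired ones are removed.
    static String filterXml10(String s) {
        StringBuilder buf = new StringBuilder(s.length());
        s.codePoints()
         .filter(XmlCharFilter::isXml10Char)
         .forEach(buf::appendCodePoint);
        return buf.toString();
    }

    public static void main(String[] args) {
        String shorts = new String(Character.toChars(0x1FA73)); // the emoji from this issue
        System.out.println(isXml10Char(55358));   // false: a lone surrogate, as in &#55358;
        System.out.println(isXml10Char(0x1FA73)); // true: the emoji itself is a valid Char
        System.out.println(filterXml10("a\u0001b" + shorts).length()); // 4: 'a', 'b', two chars of the pair
    }
}
```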

I'm not sure whether, for a UTF-8 encoded XML document, encoding <, >, " and ' would be sufficient, with all other characters passed through unchanged.
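A minimal sketch of the pass-through approach, assuming the output really is written as UTF-8 and the input contains only valid XML `Char`s. Note that `&` must be escaped as well as the four characters listed. The class and method names are hypothetical; this is not StringRenderer's code.

```java
public class MarkupOnlyEscape {
    // Escapes only the five XML markup characters; everything else, including
    // emoji, is copied unchanged and relies on the document's UTF-8 encoding.
    static String escapeMarkup(String s) {
        StringBuilder buf = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': buf.append("&amp;"); break;
                case '<': buf.append("&lt;"); break;
                case '>': buf.append("&gt;"); break;
                case '"': buf.append("&quot;"); break;
                case '\'': buf.append("&apos;"); break;
                // Both halves of a surrogate pair are copied in order, so the pair stays intact.
                default: buf.append(c);
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeMarkup("a<b & 'c'")); // a&lt;b &amp; &apos;c&apos;
    }
}
```

Iterating over `char`s is harmless here because no `char` outside the markup set is transformed; the surrogate problem only appears when each `char` is encoded on its own.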

parrt (Member) commented Aug 11, 2020

Looks like it might not be hex-encoding the values.

parrt added the type:bug label Aug 11, 2020
tkalmar (Author) commented Aug 11, 2020

No, it is encoding the value (as decimal character references), but 🩳 in its encoded form (&#55358;&#56947;) is not valid XML (see the linked spec). I'm not sure whether it is legal in unencoded form in UTF-8 encoded XML files.
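A minimal sketch of how the two references quoted in this comment arise, assuming an escaper that turns each UTF-16 `char` above ASCII into its own decimal reference. `escapePerChar` is a hypothetical simplification, not StringRenderer's actual source. The emoji is U+1FA73, which Java stores as the two `char`s 0xD83E (55358) and 0xDE73 (56947).

```java
public class CharByCharEscape {
    // Hypothetical simplification, NOT StringRenderer's source: every UTF-16
    // char above ASCII becomes a decimal character reference on its own.
    static String escapePerChar(String s) {
        StringBuilder buf = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 126) {
                buf.append("&#").append((int) c).append(';');
            } else {
                buf.append(c);
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        String shorts = new String(Character.toChars(0x1FA73)); // U+1FA73
        System.out.println(shorts.length());                           // 2: one surrogate pair
        System.out.println(shorts.codePointCount(0, shorts.length())); // 1
        System.out.println(escapePerChar(shorts));                     // &#55358;&#56947;
    }
}
```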

Clashsoft (Contributor) commented

The emoji is stored in the Java String as two chars because its code point (U+1FA73) is above U+FFFF, the largest value a single 16-bit char can hold. Such a pair of chars is called a surrogate pair, which explains the strange encoded form. The fix would be to loop over the code points of the string instead of its chars. I'll make a PR in a bit.
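A minimal sketch of the suggested code-point loop, written as a standalone hypothetical `escapeXml` method. It is not the change merged in #261, whose contents are not shown here. `String.codePoints()` yields U+1FA73 as a single value, so it becomes one reference, &#129651;, which lies in the allowed #x10000-#x10FFFF range.

```java
public class CodePointEscape {
    // Escapes markup characters and every non-ASCII code point as a decimal
    // character reference, iterating over code points instead of chars.
    // Control characters below 0x20 are passed through and would still need filtering.
    static String escapeXml(String s) {
        StringBuilder buf = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            switch (cp) {
                case '&': buf.append("&amp;"); break;
                case '<': buf.append("&lt;"); break;
                case '>': buf.append("&gt;"); break;
                case '"': buf.append("&quot;"); break;
                case '\'': buf.append("&apos;"); break;
                default:
                    if (cp > 126) {
                        buf.append("&#").append(cp).append(';'); // one reference per code point
                    } else {
                        buf.appendCodePoint(cp);
                    }
            }
        });
        return buf.toString();
    }

    public static void main(String[] args) {
        String shorts = new String(Character.toChars(0x1FA73));
        System.out.println(escapeXml("<" + shorts + ">")); // &lt;&#129651;&gt;
    }
}
```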
