Fixes the check for the valid UTF8 symbols #2135
This looks harmless, but I don't immediately understand why it fixes the bug. Is it because char is signed? But a signed-unsigned comparison would have coerced the negative value to a large unsigned value. More seriously, I should ask why the logic in XMLWriter::XMLEsc was not fully reproduced in XMLUtf8BufferWriter::WriteEscaped. I should have asked that question with the earlier PR.
This is a signed-signed comparison, unfortunately.
The thing is that the logic was mostly reproduced, with the sole exception of surrogate-pair handling.
DUH! I see 0x... and assume unsigned, but there wasn't ...u after it. Of course.
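For readers following along, here is a minimal sketch of the pitfall discussed above. It is not the code from this PR: `IsAsciiBuggy`, `IsAsciiFixed` and `kLead` are hypothetical names, and `signed char` is used explicitly so the behavior does not depend on whether plain `char` is signed on a given platform.

```cpp
// Hypothetical illustration, not the actual Audacity code.

// Buggy check: the literal 0x80 has type int, and a signed char is promoted
// to int as well, so a byte such as 0xC3 is compared as -61 < 128.
constexpr bool IsAsciiBuggy(signed char c) { return c < 0x80; }

// Fixed check: reinterpret the byte as unsigned before comparing.
constexpr bool IsAsciiFixed(signed char c)
{
    return static_cast<unsigned char>(c) < 0x80;
}

// 0xC3 is the lead byte of the two-byte UTF-8 encoding of U+00E9.
constexpr signed char kLead = static_cast<signed char>(0xC3);

static_assert(IsAsciiBuggy(kLead), "buggy check misclassifies a lead byte as ASCII");
static_assert(!IsAsciiFixed(kLead), "fixed check recognizes the high bit");
static_assert(IsAsciiBuggy('A') && IsAsciiFixed('A'), "both agree on real ASCII");
```

Writing the literal as `0x80u` would also change the comparison to unsigned, but casting the byte itself probably states the intent more directly.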
And we know the string has been converted to UTF-8 here? And that makes the surrogate handling unnecessary? Is it OK to leave all the multi-byte UTF-8 encodings unescaped?
Surrogates are easier in UTF-8. All the symbols we wanted to keep out of the XML (although I do not really understand why) or to escape are in the lower 7 bits. If the most significant bit is set, we definitely have a byte of a multi-byte sequence, which we considered to be "safe" for XML.
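A rough sketch of that idea, assuming the input is already valid UTF-8. `EscapeUtf8ForXml` is a hypothetical function, not the real `XMLUtf8BufferWriter::WriteEscaped`, and the exact set of escaped and dropped characters here is an assumption based on what XML 1.0 requires.

```cpp
#include <string>
#include <string_view>

// Hypothetical sketch: escape a UTF-8 string for use in XML text or attributes.
std::string EscapeUtf8ForXml(std::string_view in)
{
    std::string out;
    out.reserve(in.size());
    for (char ch : in) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c >= 0x80) {
            // High bit set: lead or continuation byte of a multi-byte
            // sequence. Nothing in this range needs escaping, so copy it.
            out += ch;
            continue;
        }
        switch (c) {
        case '&':  out += "&amp;";  break;
        case '<':  out += "&lt;";   break;
        case '>':  out += "&gt;";   break;
        case '"':  out += "&quot;"; break;
        case '\'': out += "&apos;"; break;
        default:
            // Assumption: silently drop control characters that XML 1.0
            // does not allow (everything below 0x20 except tab, LF, CR).
            if (c >= 0x20 || c == '\t' || c == '\n' || c == '\r')
                out += ch;
        }
    }
    return out;
}
```

With this sketch, `EscapeUtf8ForXml("a<b")` yields `a&lt;b`, while the bytes of a multi-byte character such as `"\xC3\xA9"` pass through unchanged. No surrogate logic is needed because UTF-8 encodes code points above U+FFFF directly as four-byte sequences.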
This is an easy point for further improvement of the XMLWriter class, though. The cases where we really need to convert the value from a wxString (or even to construct one!) are rare. At least the
Resolves: #2132
Resolves: #2134