Correctly handle surrogate pairs in StringRenderer.encodeHTML #261

Merged: 3 commits, Aug 14, 2020

Conversation

Clashsoft
Contributor

@Clashsoft Clashsoft commented Aug 14, 2020

Changed the StringRenderer.encodeHTML method, i.e. the implementation for format="xml-encode", to support Unicode characters that are encoded as two chars (surrogate pairs). One example where this problem occurred was emojis, as outlined in #260. The old implementation produced two invalid HTML entities, &#55358;&#56947;, for the two chars encoding the emoji "🩳"; after this change it generates a single entity, namely &#129651; (ref.: https://unicode-table.com/de/1FA73/).
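
For readers who want to see the idea in isolation, here is a minimal sketch of surrogate-pair-aware encoding. It is not the code merged in this PR: the class name SurrogateAwareEncoder, the method name encode, and the exact set of escaped characters (&, <, > and anything above 126) are assumptions made for illustration. The point it shows is iterating by code point rather than by char, so that a surrogate pair yields one numeric entity.

```java
// Hypothetical illustration, not the StringRenderer implementation.
final class SurrogateAwareEncoder {
    private SurrogateAwareEncoder() {}

    static String encode(String s) {
        StringBuilder buf = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            // codePointAt combines a high/low surrogate pair into one code point.
            int cp = s.codePointAt(i);
            switch (cp) {
                case '&': buf.append("&amp;"); break;
                case '<': buf.append("&lt;"); break;
                case '>': buf.append("&gt;"); break;
                default:
                    if (cp > 126) {
                        // One decimal entity per code point, e.g. &#129651; for U+1FA73.
                        buf.append("&#").append(cp).append(';');
                    } else {
                        buf.appendCodePoint(cp);
                    }
            }
            // Advance by 2 chars for supplementary code points, 1 otherwise.
            i += Character.charCount(cp);
        }
        return buf.toString();
    }
}
```

Note that an unpaired surrogate in the input would still be emitted as its own (invalid) entity by this sketch; handling that case is left out here.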

Closes #260

@parrt parrt merged commit 82642ab into antlr:master Aug 14, 2020
Member

parrt commented Aug 14, 2020

Thanks, @Clashsoft !!

@Clashsoft Clashsoft deleted the stringrenderer-xmlencode-emojis branch August 14, 2020 17:57
@parrt parrt added this to the 4.3.2 milestone Apr 2, 2022
Development

Successfully merging this pull request may close these issues.

StringRenderer xml-encode leads to invalid XML