RTC: Fix syncing of emoji / surrogate pairs #76049
alecgeatches
left a comment
Confirmed that the trunk implementation caused syncing issues with emojis (�), and that this fix addresses them.
page,
} );
await utils.setCollaboration( true );
// Clean up any leftover users from previous runs before creating.
I've run into this bug once before, thanks for the fix!
Size Change: +32 B (0%) Total Size: 6.87 MB
* Fix syncing of emoji / surrogate pairs
* Fix type errors
I just cherry-picked this PR to the wp/7.0 branch to get it included in the next release: f23578a
What?
When two users are collaboratively editing a post via RTC, correctly sync emoji characters and other multi-code-unit Unicode characters.
Closes #76044
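For context, a "multi-code-unit" character is one that a JavaScript string stores as more than one UTF-16 code unit, such as an emoji encoded as a surrogate pair. The sketch below is only an illustration of that idea; `hasLoneSurrogate` is a hypothetical helper and is not part of this PR.

```typescript
const emoji = '😀';

// JavaScript's .length counts UTF-16 code units: the emoji takes two.
const codeUnits = emoji.length;

// Array.from iterates by code point, so the emoji is a single element.
const codePoints = Array.from( emoji ).length;

// Hypothetical helper: reports whether a string contains an unpaired surrogate.
function hasLoneSurrogate( text: string ): boolean {
	for ( let i = 0; i < text.length; i++ ) {
		const unit = text.charCodeAt( i );
		const isHigh = unit >= 0xd800 && unit <= 0xdbff;
		const isLow = unit >= 0xdc00 && unit <= 0xdfff;
		if ( isHigh ) {
			const next = text.charCodeAt( i + 1 );
			if ( next >= 0xdc00 && next <= 0xdfff ) {
				i++; // Skip the low half of a well-formed pair.
				continue;
			}
			return true; // High surrogate with no low surrogate after it.
		}
		if ( isLow ) {
			return true; // Low surrogate with no high surrogate before it.
		}
	}
	return false;
}

// Cutting the string between the two code units leaves half of the pair.
const firstHalf = emoji.slice( 0, 1 );
```

A string such as `firstHalf` is what typically renders as �.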
Why?
The Quill Delta diff engine in `@wordpress/sync` uses `diffChars` from the `diff` library (v8.0.3). In `diff` v7+, `diffChars` uses `Intl.Segmenter` (when available) to tokenize by grapheme cluster, so an emoji like 😀 counts as `1` token. However, the Delta `Op.length()` and `OpIterator` use JavaScript's `.length`, which counts UTF-16 code units, where the same emoji counts as `2`.
This mismatch occurs in `convertChangesToDelta()` (`packages/sync/src/quill-delta/Delta.ts`), where `component.count` from `diffChars` is used to advance the `OpIterator`. Since the iterator measures in UTF-16 code units but `count` is in grapheme clusters, `.substr()` calls in `OpIterator.next()` split surrogate pairs, producing lone surrogates that render as � or result in data corruption.
The same mismatch affects cursor position arithmetic in `diffWithCursor()`, where `lastDiffPosition` is tracked in grapheme clusters but `cursorAfterChange` is in UTF-16 code units.
How?
Add a single normalization step after each call to `diffChars` that replaces `count` (grapheme clusters) with `value.length` (UTF-16 code units).
Testing Instructions
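As a quick sanity check of the reasoning above, the sketch below shows why advancing by a grapheme-based `count` splits a surrogate pair, and how replacing `count` with `value.length` avoids it. It is a minimal illustration under stated assumptions, not the code from this PR: the `DiffChange` interface and `normalizeCounts` helper are hypothetical names, and the sample changes are written by hand to mimic the counts the description attributes to `diffChars` (an emoji counted as 1) rather than produced by the `diff` library.

```typescript
// Hypothetical shape with only the fields relevant to this illustration.
interface DiffChange {
	value: string;
	count: number;
	added?: boolean;
	removed?: boolean;
}

// Replace each count with the number of UTF-16 code units in the value.
function normalizeCounts( changes: DiffChange[] ): DiffChange[] {
	return changes.map( ( change ) => ( {
		...change,
		count: change.value.length,
	} ) );
}

// Hand-written sample for "a😀b" -> "a😀c", with the emoji counted as one token.
const original = 'a😀b';
const changes: DiffChange[] = [
	{ value: 'a😀', count: 2 },
	{ value: 'b', count: 1, removed: true },
	{ value: 'c', count: 1, added: true },
];

// Advancing two code units for the unchanged part stops inside the emoji.
const brokenRetain = original.slice( 0, changes[ 0 ].count );

// After normalization the unchanged part spans three code units.
const normalized = normalizeCounts( changes );
const fixedRetain = original.slice( 0, normalized[ 0 ].count );
```

Here `brokenRetain` ends in a lone high surrogate, while `fixedRetain` keeps the emoji intact.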
Screenshots or screencast
Before
before.mov
After
after.mov