
Fix Segment._split_cells() to use grapheme-aware splitting for VS16-widened characters #1

Open
0x7c13 wants to merge 1 commit into master from claude/fix-cjk-toast-rendering-bCaCP

@0x7c13 commented Feb 20, 2026

The previous implementation of `_split_cells()` used `cell_len(text[:pos])` to
measure prefix width when searching for the correct split point. This ignored
VS16 (Variation Selector 16, U+FE0F) context: a character like ♻ (U+267B)
measures as 1 cell in isolation, but as part of the grapheme ♻️ (♻ + VS16)
it occupies 2 cells. When splitting at cell position 1 through such a grapheme,
the prefix `♻` measured exactly 1 cell, so the search accepted a split between
the base character and its VS16. The result looked like a "clean" boundary
split, but the left half was `♻` (1 cell) and the right half a lone VS16
(0 cells), silently losing 1 cell of width instead of applying the correct
"replace both halves with spaces" behaviour.
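The failure mode can be reproduced with a toy model. This is a sketch, not Rich's code: `toy_cell_len` and `prefix_width_split` are hypothetical stand-ins, the width rules are reduced to "base + VS16 = 2 cells, lone VS16 = 0 cells, anything else = 1 cell", and the search is simplified to accept the first codepoint position whose prefix width equals the cut.

```python
from __future__ import annotations

VS16 = "\ufe0f"


def toy_cell_len(text: str) -> int:
    # Hypothetical grapheme-aware width: a base followed by VS16 counts as 2 cells,
    # a lone VS16 as 0, and any other codepoint as 1 (enough for this example).
    width = 0
    for i, ch in enumerate(text):
        if ch == VS16:
            continue
        widened = i + 1 < len(text) and text[i + 1] == VS16
        width += 2 if widened else 1
    return width


def prefix_width_split(text: str, cut: int) -> tuple[str, str]:
    # Simplified model of a prefix-width search: scan codepoint positions and
    # accept the first prefix whose measured width equals `cut`. Nothing checks
    # whether `pos` lands inside a grapheme cluster.
    for pos in range(len(text) + 1):
        if toy_cell_len(text[:pos]) == cut:
            return text[:pos], text[pos:]
    raise ValueError("no codepoint boundary matches this cut")


text = "\u267b" + VS16  # ♻️: one grapheme, two codepoints
left, right = prefix_width_split(text, 1)
print(toy_cell_len(text))                       # 2
print(toy_cell_len(left), toy_cell_len(right))  # 1 0
```

The prefix `♻` on its own measures exactly 1 cell, so the cut is accepted between the base and its VS16, and the two halves add up to 1 cell instead of 2.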

Fix: replace the manual while-loop by delegating to `_split_text()` from
`cells.py`, which already uses `split_graphemes()` and handles VS16-widened
grapheme clusters correctly. Standard CJK characters (single codepoint, 2 cells)
continue to work as before; the fix closes the correctness gap for
multi-codepoint graphemes widened by VS16.
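A minimal sketch of the grapheme-aware idea, assuming a much-simplified clustering rule (a codepoint plus an optional trailing VS16) in place of real Unicode grapheme segmentation. `toy_graphemes`, `toy_grapheme_width`, `toy_width` and `split_cells` are hypothetical names; this is not the implementation of `_split_text()` or `split_graphemes()` in `cells.py`. It only illustrates walking whole clusters and padding with spaces when the cut lands inside a 2-cell cluster. The final loop checks every cut position over a repeated VS16 grapheme, in the spirit of the exhaustive regression test this PR adds.

```python
from __future__ import annotations

import unicodedata

VS16 = "\ufe0f"


def toy_graphemes(text: str) -> list[str]:
    # Hypothetical clustering: each codepoint starts a new cluster, except VS16,
    # which is attached to the preceding cluster. Real segmentation is richer.
    clusters: list[str] = []
    for ch in text:
        if ch == VS16 and clusters:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def toy_grapheme_width(cluster: str) -> int:
    # A cluster starting with a stray VS16 is 0 cells; base + VS16 is 2 cells;
    # otherwise use East Asian Width (Wide/Fullwidth = 2 cells, else 1 cell).
    if cluster.startswith(VS16):
        return 0
    if cluster.endswith(VS16):
        return 2
    return 2 if unicodedata.east_asian_width(cluster[0]) in ("W", "F") else 1


def toy_width(text: str) -> int:
    return sum(toy_grapheme_width(g) for g in toy_graphemes(text))


def split_cells(text: str, cut: int) -> tuple[str, str]:
    # Accumulate whole clusters on the left until the next one would overshoot.
    clusters = toy_graphemes(text)
    left: list[str] = []
    width = 0
    for index, cluster in enumerate(clusters):
        cluster_width = toy_grapheme_width(cluster)
        if width + cluster_width <= cut:
            left.append(cluster)
            width += cluster_width
            continue
        rest = "".join(clusters[index + 1 :])
        if width < cut:
            # The cut lands inside a 2-cell cluster: replace both halves with spaces.
            return "".join(left) + " ", " " + rest
        # The cut lands exactly on a cluster boundary.
        return "".join(left), cluster + rest
    return "".join(left), ""


# Exhaustive check: every cut over a repeated VS16-widened grapheme keeps the
# left half at exactly `cut` cells and preserves the total width.
sample = ("\u267b" + VS16) * 3  # ♻️♻️♻️, 6 cells in this toy model
total = toy_width(sample)
for cut in range(total + 1):
    left, right = split_cells(sample, cut)
    assert toy_width(left) == cut
    assert toy_width(left) + toy_width(right) == total
```

Because cut positions are only compared at cluster boundaries, a base character can never be separated from its VS16, and the space padding keeps the two halves' widths summing to the original width.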

Adds regression tests in `test_split_cells_emoji` (parametrized cases) and a
new `test_split_cells_vs16` function that exhaustively checks every split
position for a repeated VS16-widened grapheme.

https://claude.ai/code/session_01JX9WKJ8RbqvGhmd298HxiD
