Skip to content

New graphemes iterator & ASCII fast paths#10

Merged
clipperhouse merged 5 commits intomainfrom
clipperhouse/graphemes-iterator
Nov 16, 2025
Merged

New graphemes iterator & ASCII fast paths#10
clipperhouse merged 5 commits intomainfrom
clipperhouse/graphemes-iterator

Conversation

@clipperhouse
Copy link
Owner

@clipperhouse clipperhouse commented Nov 16, 2025

Add a new StringGraphemes method for iterating over widths.

New fast paths for ASCII, via lookup arrays. The goal is to:

  1. Avoid parsing graphemes where it is unnecessary, ie single-byte strings
  2. Avoid looking up properties where it is unnecessary, i.e. ASCII where we only need to know whether it’s a control character.

@clipperhouse
Copy link
Owner Author

PR Review Summary

Reviewer: Cursor (Auto - Claude Sonnet 4.5)

Overview

This PR adds a new grapheme iterator API and implements ASCII fast paths for performance optimization. The changes are well-structured, all tests pass, and there are no linter errors. The race detector also passes cleanly.

Strengths

  1. Performance optimizations: The fast paths for single-byte strings using lookup tables (asciiWidths and asciiProperties) should provide significant performance improvements by avoiding grapheme parsing overhead for ASCII text.
  2. New API: The StringGraphemes and BytesGraphemes methods provide a clean iterator interface for users who need to iterate over graphemes and get their widths.
  3. Code organization: Moving lookup tables to tables.go and extracting graphemeWidth helper function improves code organization.
  4. Test coverage: New tests verify that the iterator produces the same results as the existing String/Bytes methods.

Potential Issues & Observations

  1. Bounds check in width method (lines 206-208): The bounds check if p > boundsCheck is correct. Since propertyWidths is [5]int (valid indices 0-4) and boundsCheck = 4, this safely handles any out-of-bounds property values. The check correctly returns 1 (default width) for invalid property values.

  2. Single-byte fast path (lines 36-37, 62-63, 115-116): The fast path for len(s) == 1 using asciiWidths[s[0]] is correct. For single-byte strings, there's no possibility of a variation selector (VS16 is 3 bytes), so the fast path is safe.

  3. lookupProperties optimization (line 154): Using asciiProperties[s[0]] instead of the previous isASCIIControl check is equivalent and more efficient.

  4. Regional indicator check (lines 162-172): The nested condition structure is correct, though it could potentially be simplified. The current structure is clear and correct.

Minor Suggestions

  1. Documentation: Consider adding a brief example in the graphemes.go file showing how to use the iterator (e.g., in a godoc example).
  2. Consistency: The comment on line 195 mentions "may help the compiler eliminate its bounds check" - this is accurate, though the primary benefit is safety.

Conclusion

Ready to merge - The PR looks solid. The fast paths should provide good performance improvements for ASCII-heavy workloads, and the new iterator API is a useful addition. All tests pass, and the code is well-structured.

@clipperhouse clipperhouse marked this pull request as ready for review November 16, 2025 16:11
Copilot AI review requested due to automatic review settings November 16, 2025 16:11
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

A concise description of the purpose of the PR, followed by summarized bullets of changes:

  • Adds a new Graphemes iterator type for iterating over grapheme clusters with widths
  • Introduces ASCII fast-path optimizations using lookup arrays to avoid unnecessary grapheme parsing and property lookups for single-byte strings
  • Refactors code organization by moving lookup tables to a new tables.go file

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
graphemes.go New file introducing Graphemes iterator type and StringGraphemes/BytesGraphemes functions
tables.go New file containing ASCII lookup tables (asciiWidths and asciiProperties) and property width mappings
width.go Refactored to use ASCII fast paths for single-byte strings, improved code organization and documentation
width_test.go Added comprehensive tests for new iterator functionality

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@clipperhouse clipperhouse merged commit 9df5955 into main Nov 16, 2025
10 checks passed
@clipperhouse clipperhouse deleted the clipperhouse/graphemes-iterator branch November 16, 2025 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant