Description
Is your feature request related to a problem? Please describe.
For interoperability and similar reasons, it's often useful to measure a string's length, compare string lengths, collate strings, or index into a string in terms of its Unicode code points.
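The built-in JS functionality for this is instead based on UTF-16 code units, which gives the same results for many but not all strings (the results differ as soon as a string contains a surrogate pair, i.e. a character outside the Basic Multilingual Plane).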
Meanwhile, it's trivial to split a string into its constituent code points using `Array.from` or similar, but doing so generally isn't very efficient compared to more optimized alternatives.
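To make the mismatch concrete, here is a small illustration using only standard built-ins. The sample string is arbitrary; any string containing a character outside the Basic Multilingual Plane shows the same effect.

```javascript
// U+1F600 lies outside the BMP, so it occupies two UTF-16 code units.
const text = "a\u{1F600}b";

// The built-in length counts UTF-16 code units.
const codeUnitLength = text.length; // 4

// Array.from iterates by code point, but allocates an intermediate array.
const codePoints = Array.from(text);
const codePointLength = codePoints.length; // 3

// Indexing shows the same mismatch.
const byCodeUnit = text[1]; // lone high surrogate "\uD83D"
const byCodePoint = codePoints[1]; // the full character U+1F600
```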
Describe the solution you'd like
Low-level, well-optimized, fully Unicode code-point-aware functions for operations such as:
- String length
- String length comparison [1]
- Indexing by code point [2]
- Two-way conversion between code-point index and code-unit index [2]
- String comparison by code point (stage-1 proposal)
- Maybe others?
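As a rough sketch of what a few of these could look like in plain JS, the following avoids allocating an intermediate array. All four function names are hypothetical and chosen only for illustration; they are not an existing or proposed API, and a real implementation would likely be optimized further.

```javascript
// Hypothetical helper: number of code points in `str`.
function codePointLength(str) {
  let count = 0;
  for (let i = 0; i < str.length; count++) {
    // Code points above U+FFFF occupy two code units.
    i += str.codePointAt(i) > 0xffff ? 2 : 1;
  }
  return count;
}

// Hypothetical helper: code-unit index at which the code point with the
// given code-point index starts, or -1 if out of range.
function toCodeUnitIndex(str, codePointIndex) {
  let i = 0;
  for (let cp = 0; i < str.length; cp++) {
    if (cp === codePointIndex) return i;
    i += str.codePointAt(i) > 0xffff ? 2 : 1;
  }
  return -1;
}

// Hypothetical helper: index of the code point that contains the given
// code unit, or -1 if out of range.
function toCodePointIndex(str, codeUnitIndex) {
  if (codeUnitIndex < 0 || codeUnitIndex >= str.length) return -1;
  let cp = -1;
  for (let i = 0; i <= codeUnitIndex; cp++) {
    i += str.codePointAt(i) > 0xffff ? 2 : 1;
  }
  return cp;
}

// Hypothetical helper: compare two strings by code point value rather
// than by code unit. Returns -1, 0, or 1.
function compareByCodePoint(a, b) {
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    const x = a.codePointAt(i);
    const y = b.codePointAt(j);
    if (x !== y) return x < y ? -1 : 1;
    i += x > 0xffff ? 2 : 1;
    j += y > 0xffff ? 2 : 1;
  }
  // If one string is a prefix of the other, the shorter one sorts first.
  return Math.sign((a.length - i) - (b.length - j));
}
```

Code-point order and code-unit order disagree for, e.g., U+FF5E versus U+1F600: the built-in `<` operator sorts U+1F600 first because its leading surrogate (0xD83D) is smaller than 0xFF5E, whereas `compareByCodePoint` sorts U+FF5E first.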
Describe alternatives you've considered
N/A
Footnotes
1. String length comparison vs a limit can often be calculated more efficiently than fully calculating the actual string length and comparing it to the limit. For example, for `limit=1000`, `text.length == 2001` definitely exceeds it, while `text.length == 999` is definitely lower (whereas `text.length` in the range `1000..2000` needs to be checked more carefully).
2. Not sure how useful these are actually. Indexing is usually best done in code units for compatibility with other built-in JS functions. However, there could still be interoperability use cases for them?
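The shortcut described in the first footnote could be sketched as follows. The function name `exceedsCodePointLimit` is hypothetical, and the sketch assumes "exceeds" means strictly greater than the limit. It relies on the fact that every code point takes one or two UTF-16 code units, so the code point count always lies between `ceil(text.length / 2)` and `text.length`.

```javascript
// Hypothetical helper: does `text` contain more than `limit` code points?
function exceedsCodePointLimit(text, limit) {
  const units = text.length;

  // There are at most `units` code points, so this cannot exceed the limit.
  if (units <= limit) return false;

  // There are at least ceil(units / 2) code points, which is above the limit.
  if (units > 2 * limit) return true;

  // Ambiguous range: count code points, stopping as soon as the limit
  // is exceeded.
  let count = 0;
  for (let i = 0; i < units; ) {
    i += text.codePointAt(i) > 0xffff ? 2 : 1;
    if (++count > limit) return true;
  }
  return false;
}
```

Only strings whose code-unit length falls in the ambiguous range need to be scanned at all, and even then the scan can stop early.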