
Low-level Unicode code-point-based functions #6627

Open
@lionel-rowe

Description


Is your feature request related to a problem? Please describe.

For interoperability and similar reasons, it's often useful to measure a string's length, compare string lengths, collate strings, or index into a string based on its code points.

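The built-in JS functionality for this is instead based on UTF-16 code units. Code units and code points give the same results for strings made up entirely of Basic Multilingual Plane characters, but not for strings containing supplementary characters (such as most emoji), each of which is encoded as a surrogate pair of two code units.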
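A quick illustration of where the two models diverge, using only built-in string APIs (the example string is arbitrary):

```ts
// U+1F600 GRINNING FACE lies outside the BMP, so UTF-16 stores it as a
// surrogate pair (two code units).
const s = "a😀";

console.log(s.length);             // 3 (code units)
console.log(Array.from(s).length); // 2 (code points)
console.log(s[1] === "\uD83D");    // true: code-unit indexing yields a lone high surrogate
console.log(s.codePointAt(1));     // 128512 (0x1F600)
```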

Meanwhile, it's trivial to split a string into its constituent code points using Array.from or similar, but doing so allocates an array holding one string per code point, which is generally less efficient than a dedicated scan over the string's code units.
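As a rough sketch of the kind of optimized alternative meant here, a code-point length can be computed by scanning code units and counting each valid surrogate pair once, with no intermediate array. The name `codePointLength` is hypothetical, not an existing API:

```ts
/**
 * Hypothetical helper: counts the Unicode code points in `str` without
 * allocating an array. A lone (unpaired) surrogate counts as one code
 * point, matching `Array.from(str).length`.
 */
function codePointLength(str: string): number {
  let count = 0;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    // A high surrogate immediately followed by a low surrogate is one code point,
    // so skip the low half.
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) i++;
    }
    count++;
  }
  return count;
}
```

For any input, the result should equal `Array.from(str).length`; the difference is only that no per-code-point strings are created.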

Describe the solution you'd like

Low-level, well-optimized, but fully Unicode-code-point-aware functions, such as:

  • String length
  • String length comparison [1]
  • Indexing by code point [2]
  • Two-way conversion between code-point index and code-unit index [2]
  • String comparison by code point (stage-1 proposal)
  • Maybe others?
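For the string-comparison item, a minimal sketch of a code-point-order comparator is below. The name `compareByCodePoint` is hypothetical, and the stage-1 proposal may specify a different API and edge-case behavior:

```ts
/**
 * Hypothetical helper: compares two strings by code point rather than by
 * UTF-16 code unit. Returns a negative number, zero, or a positive number,
 * so it can be passed directly to Array.prototype.sort.
 */
function compareByCodePoint(a: string, b: string): number {
  let i = 0;
  let j = 0;
  while (i < a.length && j < b.length) {
    // codePointAt combines a valid surrogate pair; a lone surrogate is returned as-is.
    const cpA = a.codePointAt(i)!;
    const cpB = b.codePointAt(j)!;
    if (cpA !== cpB) return cpA - cpB;
    // Supplementary code points occupy two code units, others one.
    i += cpA > 0xffff ? 2 : 1;
    j += cpB > 0xffff ? 2 : 1;
  }
  // At least one string is exhausted; the one with code units left sorts after.
  return (a.length - i) - (b.length - j);
}
```

The built-in `<` operator orders `"\uFF61"` after `"😀"` (0xFF61 > 0xD83D as code units), whereas `compareByCodePoint` orders it first (U+FF61 < U+1F600).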

Describe alternatives you've considered

N/A

Footnotes

  1. String length comparison against a limit can often be calculated more efficiently than computing the full code-point length and comparing it to the limit. Because each code point occupies one or two UTF-16 code units, a string contains between ceil(text.length / 2) and text.length code points. For example, for limit=1000, text.length == 2001 definitely exceeds the limit, while text.length == 999 is definitely below it; only text.length in the range 1000..2000 needs to be checked more carefully.

  2. Not sure how useful these actually are. Indexing is usually best done in code units for compatibility with other built-in JS functions. Still, there could be interoperability use cases for them.
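The bounds check described in footnote 1 might look roughly like this. `compareCodePointLength` is a hypothetical name, and the sketch assumes `limit` is a non-negative integer:

```ts
/**
 * Hypothetical helper: compares the code-point length of `str` with `limit`,
 * returning -1, 0, or 1. Avoids a full count when the code-unit length alone
 * decides the answer. Assumes `limit` is a non-negative integer.
 */
function compareCodePointLength(str: string, limit: number): -1 | 0 | 1 {
  const units = str.length;
  // The code-point count lies in [ceil(units / 2), units].
  if (units < limit) return -1; // at most `units` code points
  if (Math.ceil(units / 2) > limit) return 1; // at least ceil(units / 2) code points
  // Ambiguous range: count code points, stopping as soon as the limit is exceeded.
  let count = 0;
  for (let i = 0; i < units; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < units) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) i++; // skip the low half of a pair
    }
    if (++count > limit) return 1;
  }
  return count < limit ? -1 : 0;
}
```

With limit=1000, a 999-unit string returns -1 and a 2001-unit string returns 1 without scanning; only lengths 1000..2000 fall through to the (early-exiting) count.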
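For the index conversions in footnote 2, a straightforward linear-scan sketch follows. Both function names are hypothetical, and the rounding behavior for a code-unit index that falls inside a surrogate pair is an arbitrary choice made here (it rounds up to the next code point):

```ts
/**
 * Hypothetical helper: maps a code-point index to a UTF-16 code-unit index.
 * An index equal to the code-point length maps to str.length (the end).
 */
function codePointToCodeUnitIndex(str: string, cpIndex: number): number {
  if (cpIndex < 0) throw new RangeError("code-point index out of range");
  let unitIndex = 0;
  for (let cp = 0; cp < cpIndex; cp++) {
    if (unitIndex >= str.length) throw new RangeError("code-point index out of range");
    // Step over two code units for a supplementary code point, else one.
    unitIndex += str.codePointAt(unitIndex)! > 0xffff ? 2 : 1;
  }
  return unitIndex;
}

/**
 * Hypothetical helper: maps a UTF-16 code-unit index to a code-point index.
 * An index pointing at the low half of a surrogate pair rounds up.
 */
function codeUnitToCodePointIndex(str: string, unitIndex: number): number {
  if (unitIndex < 0 || unitIndex > str.length) {
    throw new RangeError("code-unit index out of range");
  }
  let cpIndex = 0;
  let i = 0;
  while (i < unitIndex) {
    i += str.codePointAt(i)! > 0xffff ? 2 : 1;
    cpIndex++;
  }
  return cpIndex;
}
```

Usage: for `const s = "a😀b"`, `s.slice(codePointToCodeUnitIndex(s, 1), codePointToCodeUnitIndex(s, 2))` yields `"😀"`. Each call is O(n) in the index; an optimized version might cache positions for repeated lookups.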
