Character count component counts code units, not characters #1104

@36degrees

The character count currently uses string.length to establish the length of the user input. string.length counts UTF-16 code units, not characters, and this can lead to confusing results for certain strings.

A single emoji (👩🏻‍🚀) counted as 7 characters within the character count component

You can see this by entering the following strings into the character count component:

| String | Result |
| --- | --- |
| cat😹 | The emoji is counted as 2 code units, and the length is reported as 5 characters. |
| cafȩ́ | Each combining mark is counted separately, and the reported length is 6 characters. |
| 👩🏻‍🚀 | Because this emoji is a sequence of a woman emoji, a skin tone modifier, a zero-width joiner and a rocket emoji, this single character is counted as 7 characters. |
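
The counts above can be reproduced outside the component with a short sketch. The strings are written with Unicode escapes so that the code units are explicit; the variable names `samples` and `codeUnitLengths` are illustrative only and do not come from the component's source.

```javascript
// The three strings from the table, spelled out with escapes.
const samples = [
  "cat\u{1F639}",                      // "cat" followed by one emoji outside the BMP
  "cafe\u0327\u0301",                  // "cafe" plus combining cedilla and combining acute accent
  "\u{1F469}\u{1F3FB}\u200D\u{1F680}", // woman + skin tone modifier + zero-width joiner + rocket
];

// String.prototype.length reports the number of UTF-16 code units.
const codeUnitLengths = samples.map((text) => text.length);

console.log(codeUnitLengths); // [ 5, 6, 7 ]
```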

We should probably find a less naive way to count characters in strings, but we also need to work out how this will work with any backend validation or data storage on a service, which may already be using a different definition of a 'character' (for example, where the backend or storage treats one character as one byte).
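
As a rough illustration of how far the possible definitions of a 'character' can diverge, the sketch below counts the same string four ways. The helper names are hypothetical, and this is not a proposal for the component's implementation. It assumes a runtime that provides `Intl.Segmenter` and `TextEncoder`, which older browsers may lack.

```javascript
// Hypothetical helpers for comparison only; none of these exist in the component.

// UTF-16 code units: what string.length reports today.
function countCodeUnits(text) {
  return text.length;
}

// Unicode code points: string iteration steps over surrogate pairs as one item.
function countCodePoints(text) {
  return Array.from(text).length;
}

// Grapheme clusters: closest to what a user perceives as one character.
// Assumes Intl.Segmenter is available in the runtime.
function countGraphemes(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  let count = 0;
  for (const _segment of segmenter.segment(text)) {
    count += 1;
  }
  return count;
}

// UTF-8 bytes: relevant where a backend or database limits length in bytes.
function countUtf8Bytes(text) {
  return new TextEncoder().encode(text).length;
}

const astronaut = "\u{1F469}\u{1F3FB}\u200D\u{1F680}"; // 👩🏻‍🚀

console.log(countCodeUnits(astronaut));  // 7
console.log(countCodePoints(astronaut)); // 4
console.log(countGraphemes(astronaut));  // 1
console.log(countUtf8Bytes(astronaut));  // 15
```

Whichever definition the component adopts, the count shown to the user only helps if the service validates input using the same definition.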
