

Unicode::GCB

Unicode grapheme cluster boundary detection


    use Unicode::GCB;

    say GCB.always(0x600, 0x30);
    say GCB.maybe(

    say GCB.clusters("äöü".NFD);


Implements the Unicode 9.0 grapheme cluster boundary rules.

In contrast to earlier versions of the standard, it is no longer possible to unambiguously decide if there's a cluster break between two Unicode characters by looking at just these two characters.

In particular, there's a break between a pair of regional indicator symbols only if the first symbol has already been paired up with another indicator, and there's no break between extension characters and emoji modifiers if the current cluster forms an emoji sequence.
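
To illustrate why the two adjacent codepoints alone are not enough, here is a minimal sketch of the regional indicator pairing rule. It is not taken from the module's source: `is-ri` and `ri-break-before` are hypothetical helpers, and the sketch only covers the case where both codepoints around the candidate boundary are regional indicators (U+1F1E6 to U+1F1FF).

```raku
# Regional indicator symbols occupy U+1F1E6 .. U+1F1FF.
sub is-ri(Int $cp --> Bool) { 0x1F1E6 <= $cp <= 0x1F1FF }

# Hypothetical helper, not part of Unicode::GCB.
# Precondition: @cps[$i - 1] and @cps[$i] are both regional indicators.
# Returns True if there is a cluster break before position $i.
sub ri-break-before(@cps, Int $i --> Bool) {
    # Count the run of consecutive indicators that ends just before $i.
    my $run = 0;
    $run++ while $run < $i && is-ri(@cps[$i - 1 - $run]);
    # An even run means the preceding indicator is already paired up,
    # so a new cluster starts here; an odd run means it is still unpaired.
    $run %% 2;
}

# Four indicators in a row: D, E, F, R (two flag sequences).
my @flags = 0x1F1E9, 0x1F1EA, 0x1F1EB, 0x1F1F7;
say ri-break-before(@flags, 1);  # False: E pairs with D
say ri-break-before(@flags, 2);  # True:  D and E are already paired
say ri-break-before(@flags, 3);  # False: R pairs with F
```

The same pair of neighbours (two indicators) yields a different answer at positions 1 and 2, which is why a decision based on two codepoints alone can only be tentative.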

Therefore, the module provides two different methods, GCB.always() and GCB.maybe(), both of which expect two Unicode codepoints as arguments.

The method GCB.clusters() expects a Uni object as argument and returns a sequence of such objects split along cluster boundaries.

Bugs and Development

Development happens on GitHub. If you find a bug or have a feature request, use the issue tracker there.

Copyright and License

Copyright (C) 2016 by cygx@cpan.org

Distributed under the Boost Software License, Version 1.0