Unicode::GCB

Unicode grapheme cluster boundary detection

Synopsis

    use Unicode::GCB;

    say GCB.always(0x600, 0x30);
    say GCB.maybe(
        "\c[REGIONAL INDICATOR SYMBOL LETTER G]".ord,
        "\c[REGIONAL INDICATOR SYMBOL LETTER B]".ord);

    say GCB.clusters("äöü".NFD);

Description

Implements the Unicode 9.0 or Unicode 11.0 grapheme cluster boundary rules, depending on the Rakudo version in use.

In contrast to earlier versions of the standard, it is no longer possible to unambiguously decide if there's a cluster break between two Unicode characters by looking at just these two characters.

In particular, there's a break between a pair of regional indicator symbols only if the first symbol has already been paired up with another indicator. Likewise, there's no break between extension characters and emoji modifiers if the current cluster forms an emoji sequence. [FIXME: Unicode 11.0 rules]
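The regional indicator rule can be illustrated with a minimal sketch that does not use the module at all. The helpers `is-ri` and `ri-break-positions` are hypothetical names introduced here for illustration and are not part of Unicode::GCB; the sketch only reports breaks that fall between two regional indicators and ignores every other boundary rule.

```raku
# Hypothetical helpers for illustration only; not part of Unicode::GCB.

# Regional indicator symbols occupy U+1F1E6 .. U+1F1FF.
sub is-ri(Int $cp --> Bool) { 0x1F1E6 <= $cp <= 0x1F1FF }

# Return the indices before which a break falls between two regional
# indicators. Whether a break occurs depends on how many indicators
# precede the position, not just on the two neighbouring codepoints.
sub ri-break-positions(@cps) {
    my @breaks;
    my $run = 0;    # length of the indicator run ending just before $i
    for @cps.kv -> $i, $cp {
        if is-ri($cp) {
            # An even, non-empty run means the previous indicator is
            # already paired up, so a new cluster starts here.
            @breaks.push($i) if $run > 0 && $run %% 2;
            $run++;
        }
        else {
            $run = 0;
        }
    }
    @breaks
}

# Indicators G, B, D, E: the only break is before D (index 2).
say ri-break-positions([0x1F1EC, 0x1F1E7, 0x1F1E9, 0x1F1EA]);
```

The same pair of neighbours, B followed by D, would not be separated if the preceding G were absent, which is why two codepoints alone cannot settle the question.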

Therefore, the module provides two different methods, GCB.always() and GCB.maybe(), both of which expect two Unicode codepoints as arguments.

The method GCB.clusters() expects a Uni object as argument and returns a sequence of such objects split along cluster boundaries.
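To make the kind of argument GCB.clusters() receives more concrete, the sketch below inspects the Uni object from the synopsis using only built-in Raku. The call to the module itself is left as a comment, and the remark about its result is an expectation derived from the boundary rules, not verified output.

```raku
# "äöü".NFD decomposes each letter into a base letter followed by
# U+0308 COMBINING DIAERESIS, giving a Uni of six codepoints.
my $uni = "äöü".NFD;

say $uni.list.elems;                             # 6
say $uni.list.map(*.fmt('%04X')).join(' ');      # 0061 0308 006F 0308 0075 0308

# With the module loaded, the same object is what the synopsis passes on:
#   use Unicode::GCB;
#   say GCB.clusters($uni);
# Assumption: since a combining mark does not start a new cluster, one
# would expect three pieces of two codepoints each.
```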

Bugs and Development

Development happens at GitHub. If you find a bug or have a feature request, use the issue tracker over there.

Copyright and License

Copyright (C) 2016 by cygx@cpan.org

Distributed under the Boost Software License, Version 1.0
