Improvements to segmentation of Emoji w/ GCB=Other

Not all Emoji modifier bases have Grapheme_Cluster_Break = E_Base or
E_Base_GAZ; some have GCB = Other. In these cases we need to check the
Emoji_Modifier_Base property.

Correctly segments 25 more Emoji than before (Unicode Emoji v4.0).
samcv committed Jul 7, 2017
1 parent a3e9869 commit 4ff2f1f9185b1b7601ad2d67b1a05c4d2d0e3ab0
Showing with 6 additions and 0 deletions.
  1. +6 −0 src/strings/normalize.c
@@ -555,6 +555,12 @@ static MVMint32 should_break(MVMThreadContext *tc, MVMCodepoint a, MVMCodepoint
             case MVM_UNICODE_PVALUE_GCB_ZWJ:
             case MVM_UNICODE_PVALUE_GCB_GLUE_AFTER_ZWJ:
                 return 0;
+                if (MVM_unicode_codepoint_get_property_int(tc, a, MVM_UNICODE_PROPERTY_EMOJI)) {
+                    /* Not all emoji modifiers have E_BASE or E_BASE_GAZ, some cases we need to check the
+                     * Emoji_Modifier_Base property */
+                    return 0;
+                }
+
             }
             if ( b == UNI_CP_FEMALE_SIGN || b == UNI_CP_MALE_SIGN )
                 return 0;
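
The fallback described in the commit message can be sketched in isolation. This is a minimal, standalone illustration and not MoarVM's code: the `gcb_t` enum, the `cp_props` table, and the functions `lookup` and `modifier_rule_allows_break` are all hypothetical names, and the table values are assumptions chosen to exercise each branch rather than real Unicode data. It covers only the "base followed by modifier" rule and ignores intervening Extend characters and every other grapheme cluster rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down Grapheme_Cluster_Break values
 * (not MoarVM's MVM_UNICODE_PVALUE_GCB_* constants). */
typedef enum { GCB_OTHER, GCB_E_BASE, GCB_E_BASE_GAZ, GCB_E_MODIFIER } gcb_t;

typedef struct {
    uint32_t cp;                  /* code point */
    gcb_t    gcb;                 /* Grapheme_Cluster_Break value */
    bool     emoji_modifier_base; /* Emoji_Modifier_Base property */
} cp_props;

/* Toy property table. The values are ASSUMED for illustration only. */
static const cp_props TABLE[] = {
    { 0x1F476, GCB_E_BASE,     true  }, /* a base classified as E_Base */
    { 0x1F574, GCB_OTHER,      true  }, /* assumed: a base left at GCB=Other */
    { 0x1F3FB, GCB_E_MODIFIER, false }, /* a skin tone modifier */
    { 0x0041,  GCB_OTHER,      false }, /* plain letter, not a base */
};

static const cp_props *lookup(uint32_t cp) {
    for (size_t i = 0; i < sizeof TABLE / sizeof TABLE[0]; i++)
        if (TABLE[i].cp == cp)
            return &TABLE[i];
    return NULL; /* unknown to this toy table */
}

/* Returns false when the modifier rule forbids a break between a and b,
 * true when this rule has nothing to say about the pair. */
bool modifier_rule_allows_break(uint32_t a, uint32_t b) {
    const cp_props *pa = lookup(a), *pb = lookup(b);
    if (!pa || !pb || pb->gcb != GCB_E_MODIFIER)
        return true;
    /* Usual case: the base carries E_Base or E_Base_GAZ. */
    if (pa->gcb == GCB_E_BASE || pa->gcb == GCB_E_BASE_GAZ)
        return false;
    /* Fallback: GCB=Other, so consult Emoji_Modifier_Base directly. */
    if (pa->gcb == GCB_OTHER && pa->emoji_modifier_base)
        return false;
    return true;
}

/* Usage example: one call per branch of the rule. */
void example_usage(void) {
    assert(!modifier_rule_allows_break(0x1F476, 0x1F3FB)); /* E_Base + modifier */
    assert(!modifier_rule_allows_break(0x1F574, 0x1F3FB)); /* fallback path */
    assert(modifier_rule_allows_break(0x0041, 0x1F3FB));   /* not a base */
}
```

Checking the GCB value first and consulting the second property only for GCB=Other keeps the common path to a single lookup, which is the ordering the commit message suggests.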
