utf8.h: Document some #defines
The reorganization in the previous commit revealed some undocumented
public macros.
khwilliamson committed Jul 31, 2021
1 parent 9c02e48 commit 193479a
Showing 1 changed file with 37 additions and 0 deletions.
37 changes: 37 additions & 0 deletions utf8.h
@@ -855,6 +855,11 @@ case any call to string overloading updates the internal UTF-8 encoding flag.
#define UNICODE_SURROGATE_LAST 0xDFFF

/*
=for apidoc Am|bool|UNICODE_IS_SURROGATE|const UV uv
Returns a boolean as to whether or not C<uv> is one of the Unicode surrogate
code points.
=for apidoc Am|bool|UTF8_IS_SURROGATE|const U8 *s|const U8 *e
Evaluates to non-zero if the first few bytes of the string starting at C<s> and
@@ -877,6 +882,19 @@ point's representation.
Evaluates to 0xFFFD, the code point of the Unicode REPLACEMENT CHARACTER
=for apidoc Am|bool|UNICODE_IS_REPLACEMENT|const UV uv
Returns a boolean as to whether or not C<uv> is the Unicode REPLACEMENT
CHARACTER.
=for apidoc Am|bool|UTF8_IS_REPLACEMENT|const U8 *s|const U8 *e
Evaluates to non-zero if the first few bytes of the string starting at C<s> and
looking no further than S<C<e - 1>> are well-formed UTF-8 that represents the
Unicode REPLACEMENT CHARACTER; otherwise it evaluates to 0. If non-zero, the
value gives how many bytes starting at C<s> comprise the code point's
representation.
=cut
*/
#define UNICODE_REPLACEMENT 0xFFFD
@@ -887,6 +905,16 @@ Evaluates to 0xFFFD, the code point of the Unicode REPLACEMENT CHARACTER
* let's be conservative and do as Unicode says. */
#define PERL_UNICODE_MAX 0x10FFFF

/*
=for apidoc Am|bool|UNICODE_IS_SUPER|const UV uv
Returns a boolean as to whether or not C<uv> is above the maximum legal Unicode
code point of U+10FFFF.
=cut
*/

#define UNICODE_IS_SUPER(uv) UNLIKELY((UV) (uv) > PERL_UNICODE_MAX)

/*
@@ -933,6 +961,15 @@ fit in an IV on the current machine.
? is_utf8_char_helper(s, s + UTF8SKIP(s), 0) : 0)
#endif

/*
=for apidoc Am|bool|UNICODE_IS_NONCHAR|const UV uv
Returns a boolean as to whether or not C<uv> is one of the Unicode
non-character code points.
=cut
*/

/* Is 'uv' one of the 32 contiguous-range noncharacters? */
#define UNICODE_IS_32_CONTIGUOUS_NONCHARS(uv) \
UNLIKELY(inRANGE(uv, 0xFDD0, 0xFDEF))
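
The semantics documented in this commit can be approximated outside the Perl core with a small standalone sketch. Every name below (the cp_* helpers and utf8_is_replacement_sketch) is hypothetical and is not the Perl API. The sketch assumes a 32-bit code point type rather than Perl's UV, takes the surrogate range start (0xD800) and the noncharacter set from the Unicode Standard, and only mirrors the return convention described for UTF8_IS_REPLACEMENT (byte count on a match, 0 otherwise). Real XS code would include perl.h and use the macros themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in constants. 0xDFFF, 0xFFFD and 0x10FFFF appear in the diff;
 * 0xD800 is the start of the surrogate range per the Unicode Standard. */
#define CP_SURROGATE_FIRST 0xD800u
#define CP_SURROGATE_LAST  0xDFFFu
#define CP_REPLACEMENT     0xFFFDu
#define CP_UNICODE_MAX     0x10FFFFu

/* Hypothetical analogue of UNICODE_IS_SURROGATE. */
static inline bool cp_is_surrogate(uint32_t cp) {
    return cp >= CP_SURROGATE_FIRST && cp <= CP_SURROGATE_LAST;
}

/* Hypothetical analogue of UNICODE_IS_REPLACEMENT. */
static inline bool cp_is_replacement(uint32_t cp) {
    return cp == CP_REPLACEMENT;
}

/* Hypothetical analogue of UNICODE_IS_SUPER: above U+10FFFF. */
static inline bool cp_is_super(uint32_t cp) {
    return cp > CP_UNICODE_MAX;
}

/* Hypothetical analogue of UNICODE_IS_NONCHAR: the 32 contiguous
 * noncharacters U+FDD0..U+FDEF, plus the last two code points of each
 * plane (low 16 bits equal to 0xFFFE or 0xFFFF), up to U+10FFFF. */
static inline bool cp_is_nonchar(uint32_t cp) {
    if (cp_is_super(cp)) return false;
    if (cp >= 0xFDD0u && cp <= 0xFDEFu) return true;
    return (cp & 0xFFFEu) == 0xFFFEu;
}

/* Hypothetical analogue of UTF8_IS_REPLACEMENT: returns 3 if the bytes
 * at s, looking no further than e - 1, are EF BF BD (the UTF-8 encoding
 * of U+FFFD); otherwise returns 0. */
static inline size_t utf8_is_replacement_sketch(const uint8_t *s,
                                                const uint8_t *e) {
    if (e - s < 3) return 0;
    return (s[0] == 0xEF && s[1] == 0xBF && s[2] == 0xBD) ? 3 : 0;
}

/* A few sanity checks on the helpers above. */
static inline void cp_self_check(void) {
    assert(cp_is_surrogate(0xDFFFu));
    assert(cp_is_super(0x110000u));
    assert(cp_is_nonchar(0xFDD0u));
}
```

Returning a byte count instead of a plain boolean lets a caller advance past the matched character without a separate length lookup, which is the convention the apidoc text describes for the UTF8_IS_* forms.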