Merged
13 changes: 8 additions & 5 deletions pod/perlguts.pod
@@ -3568,15 +3568,18 @@ whether the byte is encoded as a single byte even in UTF-8):
     char */
     STRLEN len; /* Returned length of character in bytes */

-    if (!UTF8_IS_INVARIANT(*utf))
+    if (!UTF8_IS_INVARIANT(*utf)) {
         /* Must treat this as UTF-8 */
-        uv = utf8_to_uvchr_buf(utf, utf_end, &len);
+        if (! utf8_to_uv(utf, utf_end, &uv, &len)) {
+            /* handle error */
+        }
+    }
     else
         /* OK to treat this character as a byte */
         uv = *utf;

-You can also see in that example that we use C<utf8_to_uvchr_buf> to get the
-value of the character; the inverse function C<uvchr_to_utf8> is available
+You can also see in that example that we use C<utf8_to_uv> to get the
+value of the character; the inverse function C<uv_to_utf8> is available
 for putting a UV into UTF-8:

     if (!UVCHR_IS_INVARIANT(uv))
@@ -3794,7 +3797,7 @@ the PV to somewhere, pass on the flag too.

 =item *

-If a string is UTF-8, B<always> use C<utf8_to_uvchr_buf> to get at the value,
+If a string is UTF-8, B<always> use C<utf8_to_uv> to get at the value,
 unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>.

 =item *
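The calling pattern in the pod/perlguts.pod hunks above (use the byte directly when it is invariant, otherwise decode and check a boolean result) can be illustrated outside of Perl. What follows is a minimal standalone C sketch, not Perl's implementation. The names `sketch_is_invariant`, `sketch_utf8_to_uv`, and `sketch_next_char` are hypothetical. The sketch assumes an ASCII platform, uses `uint32_t` in place of `UV` and `size_t` in place of `STRLEN`, and accepts only standard UTF-8 up to U+10FFFF; that limit is an assumption of the sketch, not a statement about what `utf8_to_uv` accepts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for UTF8_IS_INVARIANT, assuming an ASCII platform:
 * bytes below 0x80 are represented the same way with or without UTF-8. */
static bool sketch_is_invariant(uint8_t b)
{
    return b < 0x80;
}

/* Hypothetical decoder that mirrors the calling shape used in the diff:
 * returns true and fills *cp and *len on success, false on malformed input.
 * On failure the outputs are left untouched (a choice made for this sketch). */
static bool sketch_utf8_to_uv(const uint8_t *s, const uint8_t *e,
                              uint32_t *cp, size_t *len)
{
    size_t need;
    uint32_t v, min;

    if (s >= e) return false;

    if (s[0] < 0x80)                { need = 1; v = s[0];        min = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; v = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; v = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; v = s[0] & 0x07; min = 0x10000; }
    else return false;              /* stray continuation or invalid start byte */

    if ((size_t)(e - s) < need) return false;      /* truncated sequence */

    for (size_t i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80) return false;   /* not a continuation byte */
        v = (v << 6) | (uint32_t)(s[i] & 0x3F);
    }

    if (v < min || v > 0x10FFFF) return false;     /* overlong or out of range */

    *cp = v;
    *len = need;
    return true;
}

/* Read one character at s: byte fast path first, full decode otherwise. */
static bool sketch_next_char(const uint8_t *s, const uint8_t *e,
                             uint32_t *cp, size_t *len)
{
    if (s >= e) return false;
    if (sketch_is_invariant(*s)) {
        *cp = *s;       /* OK to treat this character as a byte */
        *len = 1;
        return true;
    }
    return sketch_utf8_to_uv(s, e, cp, len);   /* must treat this as UTF-8 */
}
```

The point the sketch shares with the documentation change is that a boolean return value forces the caller to decide what to do on malformed input, instead of inferring failure from the returned code point.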
18 changes: 9 additions & 9 deletions utf8.c
@@ -123,19 +123,19 @@ const char super_cp_format[] = "Code point 0x%" UVXf " is not Unicode,"
 =for apidoc_item uvchr_to_utf8_flags_msgs

 These functions are identical. THEY SHOULD BE USED IN ONLY VERY SPECIALIZED
-CIRCUMSTANCES.
+CIRCUMSTANCES. The C<uv_to_utf8_msgs> spelling is preferred in new code.

-Most code should use C<L</uv_to_utf8_flags>()> rather than call this directly.
+Most code should use C<L</uv_to_utf8_flags>()> rather than call these directly.

-This function is for code that wants any warning and/or error messages to be
+These functions are for code that wants any warning and/or error messages to be
 returned to the caller rather than be displayed. Any message that would have
 been displayed if all lexical warnings are enabled will instead be returned.

-It is just like C<L</uvchr_to_utf8_flags>> but it takes an extra parameter
-placed after all the others, C<msgs>. If this parameter is 0, this function
-behaves identically to C<L</uvchr_to_utf8_flags>>. Otherwise, C<msgs> should
-be a pointer to an C<HV *> variable, in which this function creates a new HV to
-contain any appropriate message. The hash has three key-value pairs, as
+They are just like C<L</uv_to_utf8_flags>> but take an extra parameter
+placed after all the others, C<msgs>. If this parameter is 0, the functions
+behave identically to C<L</uv_to_utf8_flags>>. Otherwise, C<msgs> should
+be a pointer to an C<HV *> variable, in which these functions create a new HV
+to contain any appropriate message. The hash has three key-value pairs, as
 follows:

 =over 4
@@ -169,7 +169,7 @@ The possibilities are:
 =back

 It's important to note that specifying this parameter as non-null will cause
-any warning this function would otherwise generate to be suppressed, and
+any warning the functions would otherwise generate to be suppressed, and
 instead be placed in C<*msgs>. The caller can check the lexical warnings state
 (or not) when choosing what to do with the returned message.

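The `msgs` behaviour described in the utf8.c hunks above (a non-null parameter suppresses the warning and hands it back to the caller) can be sketched in plain C. This is a rough illustration only, not the Perl API. `sketch_encode_msgs` is a hypothetical name, and a `const char **` stands in for the `HV *` that the real functions fill. The sketch also has no flags parameter, so it assumes surrogates are always worth a warning, and it refuses code points above U+10FFFF. Both simplifications belong to the sketch and do not describe `uv_to_utf8_msgs`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical encoder: writes cp as UTF-8 at d and returns a pointer to the
 * byte after the last one written, or NULL if cp is outside this sketch's
 * range. If msg is NULL, any warning is printed to stderr. If msg is non-NULL,
 * the warning is suppressed and returned in *msg instead (*msg is set to NULL
 * when there is nothing to report). */
static uint8_t *sketch_encode_msgs(uint8_t *d, uint32_t cp, const char **msg)
{
    const char *warning = NULL;

    if (msg) *msg = NULL;
    if (cp > 0x10FFFF) return NULL;          /* not handled by this sketch */

    if (cp >= 0xD800 && cp <= 0xDFFF)
        warning = "code point is a UTF-16 surrogate";

    if (warning) {
        if (msg) *msg = warning;             /* returned to the caller */
        else fprintf(stderr, "warning: %s\n", warning);   /* displayed */
    }

    if (cp < 0x80) {
        *d++ = (uint8_t)cp;
    } else if (cp < 0x800) {
        *d++ = (uint8_t)(0xC0 | (cp >> 6));
        *d++ = (uint8_t)(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        *d++ = (uint8_t)(0xE0 | (cp >> 12));
        *d++ = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        *d++ = (uint8_t)(0x80 | (cp & 0x3F));
    } else {
        *d++ = (uint8_t)(0xF0 | (cp >> 18));
        *d++ = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
        *d++ = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        *d++ = (uint8_t)(0x80 | (cp & 0x3F));
    }
    return d;
}
```

A caller that passes a non-NULL `msg` can then decide for itself whether to report, log, or drop the message, which is the division of labour the revised documentation describes.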