Avoid some conditionals in is...UTF8_CHAR()
These three functions, which determine whether the next character in a
string is valid UTF-8 (constrained in three different ways), have
basically the same short loop.

One of the initial conditions in the while(), state != 1, is always true
the first time around.  By moving that test into the body of the loop,
after the state has been updated, we avoid it for the common case where
the loop is executed just once.  This is when the input is a UTF-8
invariant character (ASCII on ASCII platforms).
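
The reshaping described above can be sketched in a few lines of C.  This
is only an illustration under stated assumptions: toy_next() is a
hypothetical stand-in for the PL_*_utf8_dfa_tab lookups, it accepts a
much looser notion of UTF-8 than the real tables, and both functions
simply return 0 on failure instead of falling through to the code that
follows the loop in inline.h.

```c
#include <stddef.h>

/* Hypothetical toy transition function (NOT Perl's DFA tables).
 * 0 = accept, 1 = reject, 2..4 = one, two, or three continuation
 * bytes still expected. */
static unsigned toy_next(unsigned state, unsigned char b)
{
    if (state == 0) {                          /* first byte of a character */
        if (b < 0x80)              return 0;   /* invariant: accept at once */
        if (b >= 0xC2 && b <= 0xDF) return 2;
        if (b >= 0xE0 && b <= 0xEF) return 3;
        if (b >= 0xF0 && b <= 0xF4) return 4;
        return 1;
    }
    if ((b & 0xC0) != 0x80) return 1;          /* not a continuation byte */
    return (state == 2) ? 0 : state - 1;
}

/* Old shape: 'state != 1' is tested on entry, where it is always true. */
static size_t char_len_old(const unsigned char *s0, const unsigned char *e)
{
    const unsigned char *s = s0;
    unsigned state = 0;

    while (s < e && state != 1) {
        state = toy_next(state, *s);
        if (state != 0) {
            s++;
            continue;
        }
        return (size_t)(s - s0) + 1;
    }
    return 0;
}

/* New shape: a one-byte character returns before the reject test runs. */
static size_t char_len_new(const unsigned char *s0, const unsigned char *e)
{
    const unsigned char *s = s0;
    unsigned state = 0;

    while (s < e) {
        state = toy_next(state, *s);
        s++;
        if (state == 0) return (size_t)(s - s0);
        if (state == 1) break;
    }
    return 0;
}
```

For an ASCII byte, char_len_new() evaluates s < e and state == 0 and
returns; char_len_old() additionally evaluates state != 1 before doing
any work.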

If the functions were constrained to require that the first byte pointed
to by the input exists, the while() could be a do {} while().  Calling
one of them would then cost no more conditionals than first checking
whether the next character is invariant and calling the function only if
it is not.  And there would be fewer conditionals for the case of 2 or
more bytes in the character.
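
A minimal sketch of that do {} while() variant, which this commit does
not implement, might look like the following.  The names toy_step() and
char_len_nonempty() are hypothetical, the transition function is a toy
stand-in for the real DFA tables, and the caller is assumed to guarantee
s0 < e.

```c
#include <stddef.h>

/* Hypothetical toy transition function (NOT Perl's DFA tables).
 * 0 = accept, 1 = reject, 2..4 = one, two, or three continuation
 * bytes still expected. */
static unsigned toy_step(unsigned state, unsigned char b)
{
    if (state == 0) {                          /* first byte of a character */
        if (b < 0x80)              return 0;   /* invariant: accept at once */
        if (b >= 0xC2 && b <= 0xDF) return 2;
        if (b >= 0xE0 && b <= 0xEF) return 3;
        if (b >= 0xF0 && b <= 0xF4) return 4;
        return 1;
    }
    if ((b & 0xC0) != 0x80) return 1;          /* not a continuation byte */
    return (state == 2) ? 0 : state - 1;
}

/* Precondition (an assumption of this sketch): s0 < e, so *s0 exists. */
static size_t char_len_nonempty(const unsigned char *s0,
                                const unsigned char *e)
{
    const unsigned char *s = s0;
    unsigned state = 0;

    do {
        state = toy_step(state, *s);
        s++;
        if (state == 0) return (size_t)(s - s0);  /* only test for 1 byte */
        if (state == 1) break;
    } while (s < e);                              /* bounds test moved last */

    return 0;
}
```

With the bounds test at the bottom, a one-byte character costs a single
conditional (state == 0), and each longer character saves the initial
s < e test.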
khwilliamson committed May 30, 2021
1 parent fef07e7 commit f2ef00a
Showing 1 changed file with 26 additions and 17 deletions.
43 changes: 26 additions & 17 deletions inline.h
@@ -1127,16 +1127,19 @@ Perl_isUTF8_CHAR(const U8 * const s0, const U8 * const e)
      * on 32-bit ASCII platforms where it trivially is an error). Call a
      * helper function for the other platforms. */

-    while (s < e && LIKELY(state != 1)) {
-        state = PL_extended_utf8_dfa_tab[256
+    while (s < e) {
+        state = PL_extended_utf8_dfa_tab[ 256
                                          + state
                                          + PL_extended_utf8_dfa_tab[*s]];
-        if (state != 0) {
-            s++;
-            continue;
+        s++;
+
+        if (state == 0) {
+            return s - s0;
         }

-        return s - s0 + 1;
+        if (UNLIKELY(state == 1)) {
+            break;
+        }
     }

 #if defined(UV_IS_QUAD) || defined(EBCDIC)
@@ -1195,15 +1198,19 @@ Perl_isSTRICT_UTF8_CHAR(const U8 * const s0, const U8 * const e)

     PERL_ARGS_ASSERT_ISSTRICT_UTF8_CHAR;

-    while (s < e && LIKELY(state != 1)) {
-        state = PL_strict_utf8_dfa_tab[256 + state + PL_strict_utf8_dfa_tab[*s]];
+    while (s < e) {
+        state = PL_strict_utf8_dfa_tab[ 256
+                                       + state
+                                       + PL_strict_utf8_dfa_tab[*s]];
+        s++;

-        if (state != 0) {
-            s++;
-            continue;
+        if (state == 0) {
+            return s - s0;
         }

-        return s - s0 + 1;
+        if (UNLIKELY(state == 1)) {
+            break;
+        }
     }

 #ifndef EBCDIC
@@ -1261,15 +1268,17 @@ Perl_isC9_STRICT_UTF8_CHAR(const U8 * const s0, const U8 * const e)

     PERL_ARGS_ASSERT_ISC9_STRICT_UTF8_CHAR;

-    while (s < e && LIKELY(state != 1)) {
+    while (s < e) {
         state = PL_c9_utf8_dfa_tab[256 + state + PL_c9_utf8_dfa_tab[*s]];
+        s++;

-        if (state != 0) {
-            s++;
-            continue;
+        if (state == 0) {
+            return s - s0;
         }

-        return s - s0 + 1;
+        if (UNLIKELY(state == 1)) {
+            break;
+        }
     }

     return 0;