utf8_length: Fix undefined C behavior
In C, comparing two pointers with a relational operator is only defined
if both point into the same object, or to the virtual element one past
the end of that object.

The previous code was doing an addition potentially outside that range,
and so the results would be undefined.
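
The rule can be illustrated with a minimal sketch. The helper name `has_room` is hypothetical and not part of Perl. It assumes both pointers refer to the same array, and it contrasts forming `p + n` (undefined once the result lies more than one element past the end) with checking the remaining distance `e - p`, which never leaves the object.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical helper, not part of Perl: reports whether n bytes starting
 * at p fit inside [p, e). Assumes p and e point into the same array (or
 * one past its end) and p <= e, so that e - p is well defined. */
static bool has_room(const unsigned char *p, const unsigned char *e,
                     ptrdiff_t n)
{
    /* Unsafe alternative: (p + n <= e). Merely forming p + n is undefined
     * behavior when it would point more than one element past the end. */
    return e - p >= n;
}
```

For a 4-byte buffer `buf`, `has_room(buf, buf + 4, 4)` holds while `has_room(buf, buf + 4, 5)` does not, and neither call computes a pointer outside the buffer.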
khwilliamson committed Jun 11, 2021
1 parent f6111fa commit bdc84a9
Showing 1 changed file with 19 additions and 11 deletions.
30 changes: 19 additions & 11 deletions utf8.c
@@ -2372,23 +2372,31 @@ Perl_utf8_length(pTHX_ const U8 *s, const U8 *e)
      * the bitops (especially ~) can create illegal UTF-8.
      * In other words: in Perl UTF-8 is not just for Unicode. */
 
-    if (UNLIKELY(e < s))
-        goto warn_and_return;
     while (s < e) {
-        s += UTF8SKIP(s);
+        Ptrdiff_t expected_byte_count = UTF8SKIP(s);
+
+        if (UNLIKELY(e - s < expected_byte_count)) {
+            goto warn_and_return;
+        }
+
         len++;
+        s += expected_byte_count;
+        expected_byte_count = UTF8SKIP(s);
     }
 
-    if (UNLIKELY(e != s)) {
-        len--;
-      warn_and_return:
-        if (PL_op)
-            Perl_ck_warner_d(aTHX_ packWARN(WARN_UTF8),
-                             "%s in %s", unees, OP_DESC(PL_op));
-        else
-            Perl_ck_warner_d(aTHX_ packWARN(WARN_UTF8), "%s", unees);
+    if (LIKELY(e == s)) {
+        return len;
     }
 
+    /* Here, s > e on entry */
+
+  warn_and_return:
+    if (PL_op)
+        Perl_ck_warner_d(aTHX_ packWARN(WARN_UTF8),
+                         "%s in %s", unees, OP_DESC(PL_op));
+    else
+        Perl_ck_warner_d(aTHX_ packWARN(WARN_UTF8), "%s", unees);
+
     return len;
 }
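
The shape of the new loop can be tried outside of Perl with a standalone sketch. The names `skip_len` and `count_chars` are hypothetical. `skip_len` is a simplified stand-in for `UTF8SKIP` that only handles standard 1 to 4 byte UTF-8, and a `malformed` flag stands in for the warning that the real function issues.

```c
#include <stddef.h>

/* Hypothetical stand-in for UTF8SKIP: sequence length implied by a lead
 * byte under standard UTF-8. Continuation or invalid lead bytes are
 * treated as length 1 here; Perl's real table differs. */
static ptrdiff_t skip_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Counts characters in [s, e). Sets *malformed to 1 if the last character
 * is truncated or if e < s on entry, otherwise to 0. Assumes s and e
 * point into the same array (or one past its end). */
static size_t count_chars(const unsigned char *s, const unsigned char *e,
                          int *malformed)
{
    size_t len = 0;

    *malformed = 0;
    while (s < e) {
        ptrdiff_t need = skip_len(*s);

        /* Compare distances instead of forming s + need, which could
         * point beyond one past the end of the buffer. */
        if (e - s < need) {
            *malformed = 1;
            return len;
        }

        len++;
        s += need;
    }

    if (s != e) {
        *malformed = 1;     /* only reachable when s > e on entry */
    }
    return len;
}
```

With the 6-byte input `'a', 0xC3, 0xA9, 0xE2, 0x82, 0xAC`, the full range yields 3 characters and no flag. Cutting the range to 5 bytes yields 2 characters with the flag set, because the last sequence is truncated and is not counted.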
