Fix Bugzilla #2642: no match bug in 8-bit mode for caseless invalid utf
matching.
PhilipHazel committed Sep 15, 2020
1 parent 0cf247f commit f8cbb1f
Showing 4 changed files with 22 additions and 2 deletions.
7 changes: 7 additions & 0 deletions ChangeLog
@@ -66,6 +66,13 @@ this case have been moved from test 1 to test 2.
12. Further to 10 above, pcre2test has been updated to detect and grumble if a
delimiter other than / is used after #perltest.

+13. Fixed a bug with PCRE2_MATCH_INVALID_UTF in 8-bit mode when PCRE2_CASELESS
+was set and PCRE2_NO_START_OPTIMIZE was not set. The optimization for finding
+the start of a match was not resetting correctly after a failed match on the
+first valid fragment of the subject, possibly causing incorrect "no match"
+returns on subsequent fragments. For example, the pattern /A/ failed to match
+the subject \xe5A. Fixes Bugzilla #2642.


Version 10.35 09-May-2020
---------------------------
10 changes: 8 additions & 2 deletions src/pcre2_match.c
@@ -6115,8 +6115,8 @@ BOOL has_req_cu = FALSE;
BOOL startline;

#if PCRE2_CODE_UNIT_WIDTH == 8
-BOOL memchr_not_found_first_cu = FALSE;
-BOOL memchr_not_found_first_cu2 = FALSE;
+BOOL memchr_not_found_first_cu;
+BOOL memchr_not_found_first_cu2;
#endif

PCRE2_UCHAR first_cu = 0;
@@ -6709,6 +6709,11 @@ the loop runs just once. */
start_partial = match_partial = NULL;
mb->hitend = FALSE;

+#if PCRE2_CODE_UNIT_WIDTH == 8
+memchr_not_found_first_cu = FALSE;
+memchr_not_found_first_cu2 = FALSE;
+#endif

for(;;)
{
PCRE2_SPTR new_start_match;
@@ -7187,6 +7192,7 @@ if (utf && end_subject != true_end_subject &&
starting code units in 8-bit and 16-bit modes. */

start_match = end_subject + 1;

#if PCRE2_CODE_UNIT_WIDTH != 32
while (start_match < true_end_subject && NOT_FIRSTCU(*start_match))
start_match++;
3 changes: 3 additions & 0 deletions testdata/testinput10
@@ -610,4 +610,7 @@
/X(\x{e1})Y/replace=>\U$1<,substitute_extended
X\x{e1}Y

+/A/utf,match_invalid_utf,caseless
+\xe5A

# End of testinput10
4 changes: 4 additions & 0 deletions testdata/testoutput10
@@ -1871,4 +1871,8 @@ Subject length lower bound = 1
X\x{e1}Y
1: >\xe1<

+/A/utf,match_invalid_utf,caseless
+\xe5A
+0: A

# End of testinput10
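
The ChangeLog entry describes start-of-match flags that were not reset between valid fragments of the subject. The sketch below is a simplified, self-contained model of that behaviour, not the code in src/pcre2_match.c. The `fragment` type, the `find_first_cu` function and its `reset_per_fragment` parameter are hypothetical names introduced for illustration. It also assumes that the subject \xe5A is split around the invalid byte, so that the first valid fragment is empty and the second is "A".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one valid fragment of the subject. */
typedef struct {
  const char *start;
  size_t len;
} fragment;

/* Scan each fragment for the first code unit of a caseless pattern (cu or
   its other case, cu2). Two flags cache "memchr found nothing" so that the
   search is not repeated. When reset_per_fragment is false the flags are
   initialised only once, which models the behaviour before the fix.
   Returns a pointer to the first hit, or NULL if there is none. */
static const char *find_first_cu(const fragment *frags, size_t nfrags,
                                 char cu, char cu2, bool reset_per_fragment)
{
  bool not_found_cu = false;
  bool not_found_cu2 = false;

  for (size_t i = 0; i < nfrags; i++) {
    const char *pp1 = NULL;
    const char *pp2 = NULL;

    /* Models the fix: clear the cached results for every new fragment. */
    if (reset_per_fragment) {
      not_found_cu = false;
      not_found_cu2 = false;
    }

    if (!not_found_cu) {
      pp1 = memchr(frags[i].start, cu, frags[i].len);
      if (pp1 == NULL) not_found_cu = true;
    }
    if (!not_found_cu2) {
      pp2 = memchr(frags[i].start, cu2, frags[i].len);
      if (pp2 == NULL) not_found_cu2 = true;
    }

    /* Return the earlier of the two hits, if any. */
    if (pp1 != NULL && (pp2 == NULL || pp1 < pp2)) return pp1;
    if (pp2 != NULL) return pp2;
  }
  return NULL;
}
```

Under these assumptions, calling `find_first_cu` on the two fragments of "\xe5" "A" with 'A' and 'a' returns NULL when `reset_per_fragment` is false, because the empty first fragment sets both flags and the second fragment is never searched. With `reset_per_fragment` set to true it returns a pointer to the 'A'.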
