Skip to content

Commit

Permalink
Perl_re_op_compile(): handle utf8 concating better
Browse files Browse the repository at this point in the history
When concatting the list of arguments together to form a final pattern
string, the code formerly did a quick scan of all the args first, and
if any of them were SvUTF8, it set the (empty) destination string to UTF8
before concatting all the individual args. This avoided the pattern
getting upgraded to utf8 halfway through, and thus the indices for code
blocks becoming invalid.

However this was not 100% reliable because, as an "XXX" code comment of
mine pointed out, when overloading is involved it is possible for an arg
to appear initially not to be utf8, but to be utf8 when its value is
finally accessed. This results an obscure bug (as shown in the test added
for this commit), where literal /(?{code})/ still required 'use re
"eval"'.

The fix for this is to instead adjust the code block indices on the fly
if the pattern string happens to get upgraded to utf8. This is easy(er)
now that we have the new S_pat_upgrade_to_utf8() function.

As well as fixing the bug, this also simplifies the main concat loop in
the code, which will make it easier to handle interpolating arrays (e.g.
/@foo/) when we move the interpolation from the join op into the regex
engine itself shortly.
  • Loading branch information
iabyn committed Apr 20, 2013
1 parent 0bc87de commit 3573854
Show file tree
Hide file tree
Showing 2 changed files with 38 additions and 14 deletions.
28 changes: 14 additions & 14 deletions regcomp.c
Expand Up @@ -5444,20 +5444,8 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count,
}

if (pat_count > 1) {

pat = newSVpvn("", 0);
SAVEFREESV(pat);

/* determine if the pattern is going to be utf8 (needed
* in advance to align code block indices correctly).
* XXX This could fail to be detected for an arg with
* overloading but not concat overloading; but the main effect
* in this obscure case is to need a 'use re eval' for a
* literal code block */
for (svp = new_patternp; svp < new_patternp + pat_count; svp++) {
if (SvUTF8(*svp))
SvUTF8_on(pat);
}
}

/* process args, concat them if there are multiple ones,
Expand Down Expand Up @@ -5518,8 +5506,20 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count,
if (SvROK(msv) && SvTYPE(SvRV(msv)) == SVt_REGEXP)
msv = SvRV(msv);
if (pat) {
orig_patlen = SvCUR(pat);
sv_catsv_nomg(pat, msv);
/* this is a partially unrolled
* sv_catsv_nomg(pat, msv);
* that allows us to adjust code block indices if
* needed */
STRLEN slen, dlen;
char *dst = SvPV_force_nomg(pat, dlen);
const char *src = SvPV_flags_const(msv, slen, 0);
orig_patlen = dlen;
if (SvUTF8(msv) && !SvUTF8(pat)) {
S_pat_upgrade_to_utf8(aTHX_ pRExC_state, &dst, &dlen);
sv_setpvn(pat, dst, dlen);
SvUTF8_on(pat);
}
sv_catpvn_nomg(pat, src, slen);
rx = msv;
}
else
Expand Down
24 changes: 24 additions & 0 deletions t/re/overload.t
Expand Up @@ -97,6 +97,18 @@ no warnings 'syntax';

}

{
# returns chr(str)

package OL_CHR;
use overload q{""} => sub {
my $chr = shift;
return chr($$chr);
},
fallback => 1;

}


my $qr;

Expand Down Expand Up @@ -173,6 +185,18 @@ no warnings 'syntax';
}
}

# if the pattern gets (undetectably in advance) upgraded to utf8
# while being concatenated, it could mess up the alignment of the code
# blocks, giving rise to 'Eval-group not allowed at runtime' errs.

$::CONST_QR_CLASS = 'OL_CHR';

{
my $count = 0;
is(eval q{ "\x80\x{100}" =~ /128(?{ $count++ })256/ }, 1,
"OL_CHR eval + match");
is($count, 1, "OL_CHR count");
}

undef $::CONST_QR_CLASS;
}
Expand Down

0 comments on commit 3573854

Please sign in to comment.