Perl_re_op_compile(): handle utf8 concatenation better
When concatenating the list of arguments together to form a final pattern string, the code formerly did a quick scan of all the args first, and if any of them were SvUTF8, it set the (empty) destination string to UTF8 before concatenating the individual args. This avoided the pattern getting upgraded to utf8 halfway through, which would have invalidated the indices for code blocks.

However, this was not 100% reliable because, as an "XXX" code comment of mine pointed out, when overloading is involved it is possible for an arg to appear initially not to be utf8, but to be utf8 when its value is finally accessed. This results in an obscure bug (as shown in the test added for this commit), where literal /(?{code})/ still required 'use re "eval"'.

The fix is to instead adjust the code block indices on the fly if the pattern string happens to get upgraded to utf8. This is easy(er) now that we have the new S_pat_upgrade_to_utf8() function.

As well as fixing the bug, this also simplifies the main concat loop in the code, which will make it easier to handle interpolating arrays (e.g. /@foo/) when we move the interpolation from the join op into the regex engine itself shortly.
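To illustrate why the indices need adjusting, here is a minimal standalone sketch of remapping code block offsets when a byte string is upgraded to utf8. It is not the actual S_pat_upgrade_to_utf8() code: the code_block struct, the upgrade_and_remap() name, and the convention that end is an exclusive byte offset are all assumptions made for the example, and the input is assumed to be Latin-1 bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the recorded position of one (?{...}) block:
 * byte offsets into the pattern string, with end exclusive (an assumption). */
typedef struct {
    size_t start;
    size_t end;
} code_block;

/* Upgrade a Latin-1 byte string to UTF-8 and rewrite the block offsets so
 * they still point at the same characters in the upgraded string.
 * Returns a malloc'd, NUL-terminated buffer (caller frees), or NULL if
 * allocation fails. *out_len receives the new length in bytes. */
static char *upgrade_and_remap(const unsigned char *src, size_t len,
                               code_block *blocks, size_t nblocks,
                               size_t *out_len)
{
    /* Worst case: every byte is >= 0x80 and expands to two bytes. */
    unsigned char *dst = malloc(len * 2 + 1);
    /* map[i] is the new offset of old offset i; map[len] is the new length. */
    size_t *map = malloc((len + 1) * sizeof *map);
    size_t s, d = 0, i;

    if (!dst || !map) {
        free(dst);
        free(map);
        return NULL;
    }

    for (s = 0; s < len; s++) {
        map[s] = d;
        if (src[s] < 0x80) {
            dst[d++] = src[s];                    /* ASCII is unchanged */
        } else {
            dst[d++] = 0xC0 | (src[s] >> 6);      /* two-byte UTF-8 lead */
            dst[d++] = 0x80 | (src[s] & 0x3F);    /* continuation byte */
        }
    }
    map[len] = d;
    dst[d] = '\0';

    /* Every high byte before an offset shifts that offset by one. */
    for (i = 0; i < nblocks; i++) {
        assert(blocks[i].start <= len && blocks[i].end <= len);
        blocks[i].start = map[blocks[i].start];
        blocks[i].end   = map[blocks[i].end];
    }

    free(map);
    *out_len = d;
    return (char *)dst;
}
```

For a seven-byte pattern consisting of the Latin-1 byte 0xE9 followed by (?{1}), a block recorded at offsets 1..7 moves to 2..8 after the upgrade, because the leading byte now occupies two bytes. Without the remap, the stale offsets would no longer line up with the literal code block.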