Refactor utf8 to code point conversion
Most such conversions occur in the inlined function Perl_utf8n_to_uvchr_msgs(), which several macros such as utf8n_to_uvchr() expand to. This commit effectively removes a conditional from inside the loop, and avoids some conditionals in the common case where the input is UTF-8 invariant (ASCII on ASCII platforms).

Prior to this commit, the code did something different the first time through the loop than on later iterations. Hoisting that work into pre-loop initialization removes the conditional. Doing so meant rearranging the loop into a while (1) with its exit conditions in the middle.

All calls to this function from the Perl core pass in a non-empty string. But calls from outside the core could conceivably pass an empty one, which could lead to reading outside the buffer. An extra check is therefore added for non-core calls, as is already done elsewhere.

This change means that calls from core execute no more conditionals than the typical:

    if (UTF8_IS_INVARIANT(*s)) {
        code_point = *s;
    }
    else {
        code_point = utf8n_to_uvchr(s, ...);
    }

I'm therefore thinking these can now just be replaced by the simpler

    code_point = utf8n_to_uvchr(s, ...);

without a noticeable hit in performance. The essential difference is that the former gets its code point from the string already being examined, while the latter looks up data in a 450 byte static array that is referred to constantly, so is likely to be cached.
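The loop restructuring described above can be illustrated with a small standalone decoder. This is only a sketch under stated assumptions, not the code in Perl_utf8n_to_uvchr_msgs(): the names decode_utf8_sketch and SKETCH_ERROR are hypothetical, the decoder uses plain bit masks rather than the static lookup array the commit mentions, and it does not reject overlong or surrogate sequences. It shows the start byte being handled once before the loop, a while (1) whose exit tests sit inside the body, and an up-front guard against an empty buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error marker for this sketch; not a Perl API value. */
#define SKETCH_ERROR UINT32_MAX

/* Simplified illustration only: decodes one UTF-8 sequence from s[0..len).
 * Stores the number of bytes consumed in *retlen (0 on error). */
static uint32_t decode_utf8_sketch(const unsigned char *s, size_t len,
                                   size_t *retlen)
{
    uint32_t cp;       /* code point accumulated so far */
    size_t expected;   /* total bytes the start byte promises */
    size_t i = 1;      /* index of the next continuation byte */

    assert(s != NULL && retlen != NULL);
    *retlen = 0;

    /* Guard for callers that might pass an empty buffer. */
    if (len == 0) return SKETCH_ERROR;

    /* Pre-loop initialization: the start byte is special, so it is
     * handled here once instead of behind a test inside the loop. */
    if (s[0] < 0x80) {                  /* invariant (ASCII) fast path */
        *retlen = 1;
        return s[0];
    }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; expected = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; expected = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; expected = 4; }
    else return SKETCH_ERROR;           /* stray continuation or bad start */

    /* Every iteration now does the same work; exits are in the body. */
    while (1) {
        if (i == len) return SKETCH_ERROR;               /* truncated */
        if ((s[i] & 0xC0) != 0x80) return SKETCH_ERROR;  /* not a continuation */
        cp = (cp << 6) | (uint32_t)(s[i] & 0x3F);
        if (++i == expected) break;                      /* sequence complete */
    }

    *retlen = expected;
    return cp;
}
```

With this shape, a caller that already knows its buffer is non-empty pays one comparison for an invariant byte, which is the same cost as testing for invariance at the call site before deciding whether to call the decoder.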