Speed up UTF-8 validation checking on modern perls
Perl 5.26 introduced infrastructure in the core that Encode can use to check the validity of a UTF-8 stream much faster than before. It is not clear when, or if, this functionality will be backported into Devel::PPPort, in part because no one is currently available who knows how to do it, and in part because everyone else may rely on Encode, so a general backport may not be needed.

This commit replaces the current scheme for checking UTF-8 validity, when the infrastructure is available, with one in which normal processing does not have to decode the UTF-8 into code points. Instead of copying characters one at a time from the input to the output, each entire span of valid input is now copied in a single operation. Thus in the normal case, what happens is a tight loop to check validity, then a memmove of the entire input to the output, then a return.

If an error is found, the code copies all the valid input before the error, handles the character in error, positions to the next input position, and repeats the whole process from there. Thus Encode does not need to know about the intricacies of UTF-8 malformations; it relies on the core to handle them.

There are currently some problems with Encode on EBCDIC platforms. The infrastructure is known to work correctly there, so I'm hopeful this will solve these portability issues.
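The control flow described above can be sketched in standalone C. This is only an illustration under stated assumptions, not the code in this commit: valid_prefix_len() is a hypothetical stand-in for the validity check that the Perl core provides, copy_valid_spans() is a made-up name, and replacing each bad byte with '?' is a simplification of however Encode actually handles the character in error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the core's validity check (an assumption of
 * this sketch): returns the length of the longest prefix of s[0..len)
 * that is well-formed UTF-8. No code points are computed. */
static size_t valid_prefix_len(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        unsigned char lo = 0x80, hi = 0xBF; /* range of 1st continuation */
        size_t need, k;                     /* continuation bytes needed */

        if (c < 0x80) { i++; continue; }    /* ASCII fast path */
        else if (c >= 0xC2 && c <= 0xDF) need = 1;
        else if (c >= 0xE0 && c <= 0xEF) {
            need = 2;
            if (c == 0xE0) lo = 0xA0;       /* reject overlong forms */
            if (c == 0xED) hi = 0x9F;       /* reject surrogates */
        }
        else if (c >= 0xF0 && c <= 0xF4) {
            need = 3;
            if (c == 0xF0) lo = 0x90;       /* reject overlong forms */
            if (c == 0xF4) hi = 0x8F;       /* reject above U+10FFFF */
        }
        else break;                         /* invalid start byte */

        if (len - i <= need) break;         /* truncated sequence */
        if (s[i + 1] < lo || s[i + 1] > hi) break;
        for (k = 2; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80) break;
        if (k <= need) break;               /* bad continuation byte */
        i += need + 1;
    }
    return i;
}

/* Copies src to dst one valid span at a time. Each byte that starts a
 * malformed sequence is replaced by '?' (a simplification chosen for this
 * sketch). dst must have room for len bytes. Returns bytes written. */
static size_t copy_valid_spans(unsigned char *dst,
                               const unsigned char *src, size_t len)
{
    size_t in = 0, out = 0;

    assert(dst != NULL && src != NULL);
    while (in < len) {
        /* Tight validity loop, then one bulk copy of the whole span. */
        size_t ok = valid_prefix_len(src + in, len - in);
        memmove(dst + out, src + in, ok);
        in += ok;
        out += ok;
        if (in == len) break;               /* normal case: all valid */

        dst[out++] = '?';                   /* handle the bad character */
        in++;                               /* resume after the bad byte */
    }
    return out;
}
```

For fully valid input the loop body runs once: one validity scan, one memmove, then return. Each malformation costs one extra pass through the loop, and the copying code itself never inspects the structure of the malformed sequence.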