While it is very cool indeed, it is slower than the new C code under all circumstances, sometimes by a factor of two or more.
Once I noticed that I'd screwed up the range checking and fixed it, the clever version became slow enough that it's not worth keeping. With this extra complexity removed, all test cases are about 10% faster, with the exception of pure Russian, which is about 50% slower.
This makes it rather expensive, alas.
This helps performance a lot in most cases: up to 2x faster, in fact. The exception seems to be Japanese, which is slowed down by about 10%.
Since encodeUtf8_2 wins under all circumstances, there's no reason to keep the intermediate version around.
This has the odd side effect of improving tiny-string performance from 20% slower than encodeUtf8_1 to about 5% faster. Never stop being weird, GHC optimizer!
Not surprisingly, this is a lot faster than encodeUtf8_1 and the Builder-based rewrite under almost all circumstances. It's slower on tiny inputs (20%), but roughly twice as fast as encodeUtf8_1 on longer inputs.
On a 5-byte string, converting strict text to a strict bytestring is still 2x slower than the custom 'encodeUtf8_1' routine. However, this is much better than the 4.5x factor we started with. I attribute the slowdown to the higher startup cost of the bytestring-builder-based solution. Note that this startup cost is shared when a small string is encoded as part of a larger document, e.g., a JSON document; I am therefore not sure how relevant small-string performance is for conversion to individual strict 'ByteString's. Note that the ASCII performance of the Builder-based UTF-8 encoder is 1.6x faster than 'encodeUtf8_1', while the Japanese and Russian performance is about the same. The Builder-based strict text UTF-8 encoder also has the benefit that it won't waste any memory. In contrast, 'encodeUtf8_1' can allocate as much as 4 times more memory than needed, as it does not trim the resulting bytestring.
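For reference, a minimal sketch of the Builder-based strict encoding path described above, written against today's text and bytestring APIs (encodeUtf8Builder is the modern entry point; the exact name in use at the time of this work may have differed):

    import           Data.ByteString         (ByteString)
    import qualified Data.ByteString.Builder as B
    import qualified Data.ByteString.Lazy    as BL
    import           Data.Text               (Text)
    import qualified Data.Text.Encoding      as TE

    -- Run the UTF-8 Builder and flatten its lazy chunks into one strict
    -- ByteString. The Builder fills fixed-size chunks, so no memory is
    -- wasted, but setting up the Builder machinery is the startup cost
    -- that hurts on very small strings.
    encodeUtf8ViaBuilder :: Text -> ByteString
    encodeUtf8ViaBuilder = BL.toStrict . B.toLazyByteString . TE.encodeUtf8Builder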
The value whose inferred type was too general is now a pointer, so inference can no longer accidentally overgeneralize.
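A hypothetical illustration of the kind of binding involved (all names invented; the actual code differs):

    import Data.Word   (Word8)
    import Foreign.Ptr (Ptr, minusPtr, plusPtr)

    -- A bare numeric bound such as  end = off + size  can be inferred at
    -- the overly general type  Num a => a  when the monomorphism
    -- restriction doesn't apply, and may then be re-evaluated at each
    -- use. Phrasing the same bound as a pointer pins a concrete type.
    bytesLeft :: Ptr Word8 -> Int -> Ptr Word8 -> Int
    bytesLeft buf size cursor = end `minusPtr` cursor
      where
        end = buf `plusPtr` size :: Ptr Word8  -- monomorphic by construction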
This helps performance quite a bit! Now encoding Japanese text is 2x faster than encodeUtf8, as opposed to 30% faster before. Not bad!
This requires a bit more contortion to maintain performance. For some unknown reason, doing the same refactoring on go4 decreases performance on russian-small.txt by half!
The goal here is to avoid a buffer size check on every iteration, instead doing one only the first time we encounter some input that's larger than the buffer we preallocated. This helps performance rather a lot: we don't regress on the smallest inputs, and we are up to 35% faster than the previous version of encodeUtf8 on larger inputs.
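A minimal sketch of the underlying idea (invented names, not the library's actual loop): establish a sufficient buffer size before entering the inner loop, so the loop itself never has to check for space. The version above establishes the bound lazily, resizing once on the first oversized input rather than pre-counting as done here, and the sketch is restricted to code points below 0x800 for brevity:

    {-# LANGUAGE BangPatterns #-}

    import qualified Data.ByteString          as B
    import qualified Data.ByteString.Internal as BI
    import           Data.Bits                (shiftR, (.&.))
    import           Data.Char                (ord)
    import           Data.Word                (Word8)
    import           Foreign.Ptr               (Ptr, plusPtr)
    import           Foreign.Storable          (poke)

    -- Compute a sufficient byte count up front, then write with no
    -- per-iteration buffer check in the hot loop.
    encodeSmall :: String -> B.ByteString
    encodeSmall cs = BI.unsafeCreate bytes (\p -> go p cs)
      where
        bytes = sum [ if ord c < 0x80 then 1 else 2 | c <- cs ]
        go :: Ptr Word8 -> String -> IO ()
        go !_ []       = return ()
        go !p (c:rest)
          | x >= 0x800 = error "encodeSmall: sketch covers U+0000..U+07FF only"
          | x < 0x80   = do poke p (fromIntegral x)
                            go (p `plusPtr` 1) rest
          | otherwise  = do poke p (fromIntegral (0xC0 + x `shiftR` 6))
                            poke (p `plusPtr` 1) (fromIntegral (0x80 + (x .&. 0x3F)))
                            go (p `plusPtr` 2) rest
          where x = ord c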
Polish UTF-8 bytestring builder support
The counterexample for the existing code is a string of length '2*n' that starts with 'n' characters whose code points lie in the range (0x7F, 0x7FF), i.e. two-byte UTF-8 sequences, and ends with 'n' ASCII characters. All 'n' ASCII characters will be written past the end of the output buffer.
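A minimal regression sketch of that input (invented names), using the public text API; a correct encoder must produce exactly 3*n bytes:

    import qualified Data.ByteString    as B
    import qualified Data.Text          as T
    import qualified Data.Text.Encoding as TE

    -- n two-byte characters (code point 0x100 lies in (0x7F, 0x7FF))
    -- followed by n one-byte ASCII characters: 3*n UTF-8 bytes in total.
    counterExample :: Int -> T.Text
    counterExample n = T.replicate n (T.singleton '\x100')
                    <> T.replicate n (T.singleton 'a')

    -- For non-negative n: the broken sizing left no room for the
    -- trailing n ASCII bytes, so this equality is what a fix restores.
    encodesToThreeN :: Int -> Bool
    encodesToThreeN n = B.length (TE.encodeUtf8 (counterExample n)) == 3 * n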