My assertion that it was safe to skip the "do I have 1 byte available?" check was incorrect.
This code is derived from Björn Höhrmann's UTF-8 decoder. Compared to the original Haskell decoder from cac7dbcbc392, it's between 2.17 and 3.68 times faster. It's even between 1.18 and 3.58 times faster than the improved Haskell decoder from 71ead801296a. The x86-specific decoding path gives a substantial win for text that is entirely or partly ASCII, e.g. HTML and XML, at the cost of being about 17% slower than the portable C decoder for entirely non-ASCII text.
The previous code was more concise, but alas GHC boxed each Word8 it read from the ByteString, which resulted in poor performance. This mankier code adds (seemingly required) strictness annotations, along with a little bit of manual CSE. Timing of the DecodeUtf8/Strict benchmark went from 41.8ms to 19.6ms, a pleasing improvement.
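For illustration only, here is a minimal sketch of the kind of strict loop described above. It is not the decoder itself: countCodePoints is a hypothetical helper that counts UTF-8 lead bytes. The bang patterns and the hoisted length stand in for the strictness annotations and manual CSE. Whether GHC would box the Word8 without them depends on the compiler version and optimization flags.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Bits ((.&.))
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU

-- Hypothetical helper: counts the bytes that start a UTF-8 sequence,
-- i.e. every byte that is not a 10xxxxxx continuation byte.
countCodePoints :: B.ByteString -> Int
countCodePoints bs = go 0 0
  where
    -- Length is computed once and shared by every iteration (manual CSE).
    !len = B.length bs

    go :: Int -> Int -> Int
    go !i !acc
      | i >= len  = acc
      | otherwise =
          -- Strict bindings keep the byte and the flag evaluated eagerly.
          -- unsafeIndex skips the bounds check; i < len holds here.
          let !w      = BU.unsafeIndex bs i
              !isLead = w .&. 0xC0 /= 0x80
          in go (i + 1) (if isLead then acc + 1 else acc)
```

For example, countCodePoints (B.pack [0x61, 0xC3, 0xA9]) counts two code points: "a" and a two-byte sequence.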
When performance testing encodeUtf8, I noticed that "ensure" was still showing up in the profile when I expected it not to. It turns out I was using "min" where I should have been using "max", so the initial bytestring was almost always too small, forcing reallocations and copying. Boo!
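A minimal sketch of this class of bug, under assumptions: the actual operands are not shown in this message, so smallestBuffer and both function names are hypothetical. The idea is that the initial size should be at least the input length, with a small floor.

```haskell
-- Hypothetical floor on the initial buffer size, chosen for illustration.
smallestBuffer :: Int
smallestBuffer = 4

-- Buggy: "min" caps the buffer at the floor, so any input longer than
-- the floor starts with too little room and must be reallocated.
buggyInitialSize :: Int -> Int
buggyInitialSize inputLen = min smallestBuffer inputLen

-- Fixed: "max" guarantees at least the floor and at least the input length.
fixedInitialSize :: Int -> Int
fixedInitialSize inputLen = max smallestBuffer inputLen
```

With a 1000-character input, the buggy version asks for 4 bytes and the fixed version asks for 1000.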
We had been performing a resize any time that (a) we had data to write and (b) we got to within 4 bytes of filling the target bytestring. This was safe, but suboptimal, as it meant that in the common case of encoding ASCII text, we would *always* perform a resize. Now, we check the exact number of bytes we need to fit, and resize only if they won't fit. This eliminates resizes for ASCII data, and makes them a little less likely for other data.
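The two resize policies can be sketched roughly as follows. The names utf8Length, needsResizeConservative and needsResizeExact are hypothetical, and the exact boundary of the old "within 4 bytes" test is an assumption. The real encoder works on a mutable buffer rather than on plain Ints.

```haskell
import Data.Char (ord)

-- Number of bytes the UTF-8 encoding of a character occupies.
utf8Length :: Char -> Int
utf8Length c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where
    n = ord c

-- Old policy (assumed form): resize whenever fewer than 4 bytes remain,
-- whatever the next character actually needs.
needsResizeConservative :: Int -> Int -> Bool
needsResizeConservative capacity used = capacity - used < 4

-- New policy: resize only if the next character's bytes will not fit.
needsResizeExact :: Int -> Int -> Char -> Bool
needsResizeExact capacity used c = used + utf8Length c > capacity
```

With a 10-byte buffer holding 9 bytes, the old policy resizes before writing 'a', while the exact check writes it in place and resizes only for a multi-byte character.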
This was inspired by a patch from Simon Meier, who wrote a direct implementation of encodeUtf8 using his 'blaze-builder' package. His code showed a very impressive speedup. My code is similar in both structure and performance, its chief difference being that it doesn't require 'blaze-builder'.