Even though the value behind a Size is an Int, we actually intend that those values should always be non-negative. (We don't use the notionally more appropriate Word because GHC doesn't do a very good job with it.) But non-negative means that subtraction has to clamp at zero: 0-1 should be 0, not -1! Um, oops.
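A minimal sketch of the clamped arithmetic, using a simplified stand-in for Size (the real type in the library carries more structure; the names here are hypothetical):

```haskell
-- Simplified, hypothetical stand-in for the library's Size type.
newtype Size = Size Int
  deriving (Eq, Show)

-- Subtraction clamps at zero so a Size can never go negative.
subtractSize :: Size -> Size -> Size
subtractSize (Size m) (Size n) = Size (max (m - n) 0)

main :: IO ()
main = print (subtractSize (Size 0) (Size 1))  -- Size 0, not Size (-1)
```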
This code is derived from Björn Höhrmann's UTF-8 decoder. Compared to the original Haskell decoder from cac7dbcbc392, it's between 2.17 and 3.68 times faster. It's even between 1.18 and 3.58 times faster than the improved Haskell decoder from 71ead801296a. The x86-specific decoding path gives a substantial win for entirely and partly ASCII text, e.g. HTML and XML, at the cost of being about 17% slower than the portable C decoder for entirely non-ASCII text.
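The decoder itself is table-driven C (with intrinsics on the x86 path), so it doesn't transliterate neatly; but the trick behind the ASCII win can be sketched in a few lines of Haskell. In this hypothetical sketch, eight packed bytes are pure ASCII exactly when none of them has its high bit set, which a single mask-and-compare detects:

```haskell
import Data.Bits ((.&.))
import Data.Word (Word64)

-- Eight bytes packed into a Word64 are all ASCII exactly when no byte
-- has its high bit set; one mask-and-compare tests all eight at once.
allAscii8 :: Word64 -> Bool
allAscii8 w = w .&. 0x8080808080808080 == 0

main :: IO ()
main = do
  print (allAscii8 0x48656c6c6f212121)  -- the bytes of "Hello!!!": True
  print (allAscii8 0x48656c6cc3a92121)  -- contains 0xC3 0xA9 ('é'): False
```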
The previous code was more concise, but alas GHC boxed each Word8 it read from the ByteString, which resulted in poor performance. This mankier code adds (seemingly required) strictness annotations, along with a little bit of manual CSE. Timing of the DecodeUtf8/Strict benchmark went from 41.8ms to 19.6ms, a pleasing improvement.
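A hypothetical loop in the same spirit: bang patterns keep the index, the accumulator, and each byte strict, and the byte is read once into a local binding and reused across the guards (the manual CSE) rather than re-read from the ByteString.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as B

-- Count the ASCII bytes in a ByteString, keeping everything strict.
countAscii :: B.ByteString -> Int
countAscii bs = go 0 0
  where
    !len = B.length bs
    go !i !n
      | i >= len  = n
      | b < 0x80  = go (i + 1) (n + 1)
      | otherwise = go (i + 1) n
      where !b = B.unsafeIndex bs i  -- read once, shared by the guards

main :: IO ()
main = print (countAscii (B.pack [0x48, 0x69, 0xc3, 0xa9]))  -- 2
```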
When performance testing encodeUtf8, I noticed that for some reason "ensure" was still showing up in the profile, when I expected that it shouldn't be. Turns out I was using a "min" where I should have been using a "max", and was thus allocating an initial bytestring that would almost always be too small, forcing reallocations and copying. Boo!
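A toy illustration of the bug, with hypothetical names: the initial allocation should be floored at a minimum size, not capped at it.

```haskell
-- With `min`, the buffer is capped at 4 bytes and is almost always too
-- small; with `max`, it is floored at 4 bytes and has room for the input.
buggyLen, fixedLen :: Int -> Int
buggyLen n = min n 4  -- the bug: caps the allocation
fixedLen n = max n 4  -- the fix: floors the allocation

main :: IO ()
main = mapM_ (\n -> print (n, buggyLen n, fixedLen n)) [1, 100, 10000]
```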
We had been performing a resize any time that (a) we had data to write and (b) we got to within 4 bytes of filling the target bytestring. This was safe, but suboptimal, as it meant that in the common case of encoding ASCII text, we would *always* perform a resize. Now, we check the exact number of bytes we need to fit, and resize only if they won't fit. This eliminates resizes for ASCII data, and makes them a little less likely for other data.
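A sketch of the tighter check, with hypothetical names: compute the exact UTF-8 width of the next code point and grow the buffer only when it won't fit.

```haskell
-- Exact number of bytes a code point occupies in UTF-8.
utf8Width :: Char -> Int
utf8Width c
  | c < '\x80'    = 1  -- ASCII
  | c < '\x800'   = 2
  | c < '\x10000' = 3
  | otherwise     = 4

-- True only when the buffer must grow before writing this code point.
needsResize :: Int -> Int -> Char -> Bool
needsResize used capacity c = used + utf8Width c > capacity

main :: IO ()
main = do
  print (needsResize 9 10 'a')       -- False: one byte fits exactly
  print (needsResize 9 10 '\x20ac')  -- True: '€' needs three bytes
```

Since an ASCII character is exactly one byte wide, a buffer sized to the character count never trips this check, which is how the resizes disappear for ASCII data.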