Commits on Oct 25, 2011
  1. Bump version

    committed Oct 25, 2011
Commits on Oct 4, 2011
  1. Fix a corner case in lazy text search.

    On a chunk boundary, we were not passing the correct mask and skip values
    along to the function that would process the next chunk.
    committed Oct 4, 2011
  2. Silence a compiler warning.

    committed Oct 4, 2011
  3. Eliminate a useless fromIntegral.

    committed Oct 4, 2011
Commits on Aug 22, 2011
  1. Widen dependency on directory

    committed Aug 22, 2011
  2. Add top-level QuickCheck test support.

    The "real" tests remain in tests/tests. The new top-level test suite is built
    without optimization, and simply lets us do a quick pass/fail during automated builds.
    committed Aug 22, 2011
  3. Merge

    committed Aug 22, 2011
  4. Merge 1b33e08 into 3845ffd

    GitHub Merge Button committed Aug 22, 2011
Commits on Aug 13, 2011
  1. Add Streaming benchmarks

    jaspervdj committed Aug 13, 2011
Commits on Jul 20, 2011
  1. Bump version

    committed Jul 20, 2011
  2. Fix an overly cautious bit of arithmetic checking.

    Even though the value behind a Size is an Int, we actually intend that those
    values should always be non-negative. (We don't use the notionally more
    appropriate Word because GHC doesn't do a very good job with it.)
    
    But non-negative means that 0+0 should be 0! Um, oops.
    committed Jul 20, 2011
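
The arithmetic check described in the "overly cautious" commit above can be sketched as follows. This is a minimal illustration, not the library's code: the `Size` newtype and `addSize` here are hypothetical stand-ins, and the assumption is that the old check rejected any result that was not strictly positive, so a sum of zero was treated as an overflow.

```haskell
-- Hypothetical stand-in for the library's internal Size type.
-- The wrapped Int is intended to be non-negative at all times.
newtype Size = Size Int deriving (Eq, Show)

-- Add two non-negative sizes, failing loudly if the Int addition wraps.
-- A wrapped sum of two non-negative Ints is negative, so ">= 0" is enough.
-- Using "> 0" instead (the assumed old behaviour) would wrongly reject 0 + 0.
addSize :: Size -> Size -> Size
addSize (Size a) (Size b)
  | r >= 0    = Size r
  | otherwise = error "addSize: overflow"
  where r = a + b

main :: IO ()
main = do
  print (addSize (Size 0) (Size 0))  -- Size 0
  print (addSize (Size 3) (Size 4))  -- Size 7
```

Keeping `Int` rather than `Word` follows the reasoning in the commit message; the sign bit then doubles as a cheap overflow indicator.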
Commits on Jul 15, 2011
  1. Merge e1bc8a8 into 9e9d83e

    GitHub Merge Button committed Jul 15, 2011
  2. Bump dependency on integer-gmp

    tibbe committed Jul 15, 2011
Commits on Jul 11, 2011
  1. Change where we look for test data

    committed Jul 11, 2011
  2. Update

    committed Jul 11, 2011
Commits on Jul 10, 2011
  1. Portable native UTF-8 decoder gives 3.7x faster decoding

    This code is derived from Björn Höhrmann's UTF-8 decoder.  Compared
    to the original Haskell decoder from cac7dbcbc392, it's between
    2.17 and 3.68 times faster.  It's even between 1.18 and 3.58 times
    faster than the improved Haskell decoder from 71ead801296a.
    
    The x86-specific decoding path gives a substantial win for entirely
    and partly ASCII text, e.g. HTML and XML, at the cost of being about
    17% slower than the portable C decoder for entirely non-ASCII text.
    committed Jul 10, 2011
  2. Merge

    committed Jul 10, 2011
  3. Add Chinese HTML to decode benchmark

    committed Jul 10, 2011
Commits on Jul 8, 2011
  1. Benchmark the performance of iconv.

    On my Mac, it takes 33ms, vs about 20ms for the Haskell code.
    committed Jul 8, 2011
  2. Bump version

    committed Jul 8, 2011
  3. Merge

    committed Jul 8, 2011
  4. Speed up UTF-8 decoding by a little over 2x

    The previous code was more concise, but alas GHC boxed each Word8
    it read from the ByteString, which resulted in poor performance.
    
    This mankier code adds (seemingly required) strictness annotations,
    along with a little bit of manual CSE.
    
    Timing of the DecodeUtf8/Strict benchmark went from 41.8ms to 19.6ms,
    a pleasing improvement.
    committed Jul 8, 2011
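
The kind of strictness annotation mentioned in the UTF-8 decoding commit above can be sketched with a small byte-scanning loop. This is not the decoder itself: `countAscii` is a hypothetical example, and it only shows the general pattern of bang patterns on the loop index, the accumulator, and each byte read, which gives GHC the chance to keep those values unboxed.

```haskell
{-# LANGUAGE BangPatterns #-}
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Data.Word (Word8)

-- Count the bytes below 0x80 in a ByteString.
-- The bangs force the index, the accumulator, and each byte as they are
-- produced, so no chain of thunks builds up inside the loop.
countAscii :: B.ByteString -> Int
countAscii bs = go 0 0
  where
    len = B.length bs
    go :: Int -> Int -> Int
    go !i !acc
      | i >= len  = acc
      | otherwise =
          -- unsafeIndex skips the bounds check; i < len holds here.
          let !w = BU.unsafeIndex bs i :: Word8
          in go (i + 1) (if w < 0x80 then acc + 1 else acc)

main :: IO ()
main = print (countAscii (B.pack [0x41, 0xC3, 0xA9, 0x42]))  -- 2
```

Whether annotations like these are actually required depends on the GHC version and optimization level, which is presumably why the commit calls them "seemingly required".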
Commits on Jun 29, 2011
  1. Bump version

    committed Jun 29, 2011
  2. Merge

    committed Jun 29, 2011
Commits on Jun 28, 2011
  1. Oh noes! I was miscalculating the initial buffer size!

    When performance testing encodeUtf8, I noticed that for some reason I
    was still seeing "ensure" show up in the profile, when I expected it
    shouldn't have been.
    
    Turns out I was using a "min" where I should have been using a "max",
    and thus allocating an initial bytestring that would almost always be
    too small, thus forcing reallocations and copying. Boo!
    committed Jun 28, 2011
  2. Eliminate unnecessary resizes from encodeUtf8.

    We had been performing a resize any time that (a) we had data to write
    and (b) we got to within 4 bytes of filling the target bytestring.
    This was safe, but suboptimal, as it meant that in the common case of
    encoding ASCII text, we would *always* perform a resize.
    
    Now, we check the exact number of bytes we need to fit, and resize
    only if they won't fit.  This eliminates resizes for ASCII data, and
    makes them a little less likely for other data.
    committed Jun 28, 2011
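
The two encodeUtf8 buffer fixes from Jun 28 can be illustrated with a small model. Everything named here is hypothetical (`initialCapacity`, `needsResizeOld`, `needsResizeExact`, `wouldResize`), and the operands of `max` are an assumption, since the commit message does not say what was being compared. The sketch only contrasts "resize when within 4 bytes of full" with "resize when the next character's exact encoding will not fit".

```haskell
import Data.Char (ord)

-- Number of bytes UTF-8 needs to encode one character.
utf8Width :: Char -> Int
utf8Width c
  | n < 0x80    = 1
  | n < 0x800   = 2
  | n < 0x10000 = 3
  | otherwise   = 4
  where n = ord c

-- Assumed shape of the initial-size fix: take the larger of a small floor
-- and the input length. With "min" the buffer would be capped at the floor.
initialCapacity :: Int -> Int
initialCapacity len = max 4 len

-- Old rule: resize whenever fewer than 4 free bytes remain.
needsResizeOld :: Int -> Int -> Char -> Bool
needsResizeOld capacity used _ = used + 4 > capacity

-- New rule: resize only if this character's exact encoding will not fit.
needsResizeExact :: Int -> Int -> Char -> Bool
needsResizeExact capacity used c = used + utf8Width c > capacity

-- Would encoding the string into a buffer of this capacity ever trigger
-- the given rule? Offsets are the bytes written before each character.
wouldResize :: (Int -> Int -> Char -> Bool) -> Int -> String -> Bool
wouldResize rule capacity str = or (zipWith (rule capacity) offsets str)
  where offsets = scanl (+) 0 (map utf8Width str)

main :: IO ()
main = do
  let s   = "hello"
      cap = initialCapacity (length s)
  print (wouldResize needsResizeOld cap s)    -- True
  print (wouldResize needsResizeExact cap s)  -- False
```

For ASCII input in a buffer sized to the input length, the old rule always fires near the end of the text, while the exact rule never does, which matches the behaviour the commit messages describe.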