Feat encode via bytestring builder #172

Merged: 2 commits, merged Jan 8, 2014

2 participants

@meiersi

This pull request depends on bos/text#63. The key benefit is a 35% to 40% speed improvement for 'encode'.
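To show what moving `encode` onto the bytestring builder changes, here is a minimal sketch (not aeson's code) that renders a JSON array of integers along two pipelines. The first builds lazy Text and then transcodes it to UTF-8 in a second pass. The second writes bytes directly into a `Data.ByteString.Builder`. The names `viaText` and `viaByteStringBuilder` are hypothetical.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import Data.List (intersperse)
import qualified Data.Text.Lazy.Builder as TB
import qualified Data.Text.Lazy.Builder.Int as TBI
import qualified Data.Text.Lazy.Encoding as TLE

-- | Text-first pipeline (hypothetical): build lazy Text, then run a
-- second pass that transcodes the whole Text to UTF-8 bytes.
viaText :: [Integer] -> BL.ByteString
viaText xs =
  TLE.encodeUtf8 . TB.toLazyText $
    TB.singleton '['
      <> mconcat (intersperse (TB.singleton ',') (map TBI.decimal xs))
      <> TB.singleton ']'

-- | Direct pipeline (hypothetical): write the ASCII bytes straight into
-- a ByteString Builder, with no intermediate Text value.
viaByteStringBuilder :: [Integer] -> BL.ByteString
viaByteStringBuilder xs =
  B.toLazyByteString $
    B.char7 '['
      <> mconcat (intersperse (B.char7 ',') (map B.integerDec xs))
      <> B.char7 ']'
```

Both functions return the same bytes. The direct pipeline simply skips the intermediate Text and the extra transcoding pass. This sketch does not measure speed, so the figures above come only from the pull request.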

Note that I had to remove 'deriving Data' from 'Value' to get the master branch of aeson to compile. This should be reverted.

Note also that text-1.1.0.0, when compiled against bytestring < 0.10.4.0, seems to have a bug in its UTF-8 encoding routine, which the aeson test suite exposes. My guess is that the input pointer is computed incorrectly: from time to time the output was garbled, possibly because an off-by-one error triggers massive escaping.

meiersi added some commits Jan 3, 2014
@meiersi Remove deriving 'Data', as 'Scientific' is not an instance of 'Data'. 99882c9
@meiersi Use fast 'encodeUtf8BuilderEscaped' if possible
- only use it when compiling against bytestring >= 0.10.4.0
- speed improvement over bytestring-via-text encoding:
    factor 1.5 - 1.6 for Japanese and English JSON messages
    factor 2 for integer encoding
- equal speed for encoding floats
7f92045
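The following is a minimal sketch of how 'encodeUtf8BuilderEscaped' (from 'Data.Text.Encoding', text >= 1.1) can emit a JSON string literal. It assumes the escaping rules of plain JSON and is not the code from the commit. The function takes a 'BoundedPrim Word8' that handles each ASCII byte and UTF-8 encodes everything else itself. The names 'jsonString', 'escapeAscii', and 'render' are hypothetical.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Builder.Prim as BP
import Data.ByteString.Builder.Prim ((>$<), (>*<))
import qualified Data.ByteString.Lazy.Char8 as BLC
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Word (Word8)

-- | Hypothetical helper: encode a Text as a quoted JSON string literal.
-- Non-ASCII characters are UTF-8 encoded by 'encodeUtf8BuilderEscaped'
-- itself; only ASCII bytes are handed to 'escapeAscii'.
jsonString :: T.Text -> B.Builder
jsonString t = B.char7 '"' <> TE.encodeUtf8BuilderEscaped escapeAscii t <> B.char7 '"'

-- | Escaping primitive for one ASCII byte, written as a chain of 'condB' tests.
escapeAscii :: BP.BoundedPrim Word8
escapeAscii =
    BP.condB (== c2w '\\') (ascii2 ('\\', '\\')) $
    BP.condB (== c2w '"')  (ascii2 ('\\', '"'))  $
    BP.condB (>= c2w ' ')  (BP.liftFixedToBounded BP.word8) $  -- printable: copy as-is
    BP.condB (== c2w '\n') (ascii2 ('\\', 'n'))  $
    BP.condB (== c2w '\r') (ascii2 ('\\', 'r'))  $
    BP.condB (== c2w '\t') (ascii2 ('\\', 't'))  $
    BP.liftFixedToBounded hexEscape                             -- other control chars
  where
    c2w :: Char -> Word8
    c2w = fromIntegral . fromEnum

    -- Emit a fixed two-character escape sequence, ignoring the input byte.
    ascii2 :: (Char, Char) -> BP.BoundedPrim Word8
    ascii2 cs = BP.liftFixedToBounded (const cs >$< BP.char7 >*< BP.char7)

    -- Emit backslash, 'u', then four lowercase hex digits of the byte.
    hexEscape :: BP.FixedPrim Word8
    hexEscape = (\w -> ('\\', ('u', fromIntegral w))) >$< BP.char7 >*< BP.char7 >*< BP.word16HexFixed

-- | Render the encoded bytes as a String (one Char per byte) for inspection.
render :: T.Text -> String
render = BLC.unpack . B.toLazyByteString . jsonString
```

Because the primitive is a fixed-size, branch-based encoder, the escaping runs directly inside the builder's buffer, with no intermediate Text or second pass.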
@meiersi

The easiest way to reproduce the UTF-8 encoding problem I mentioned above is to install aeson with -f-new-bytestring-builder and run its test suite. Here is an example of the output:

  benchmarks/json-data/twitter1.json: [Failed]
expected: "{\"since_id_str\":\"0\",\"results\":[{\"profile_image_url\":\"http://a2.twimg.com/profile_images/536455139/icon32_normal.png\",\"id_str\":\"30159761706061824\",\"from_user_id\":80430860,\"to_user_id_str\":null,\"text\":\"Haskell Server Pages \227\129\163\227\129\166\227\128\129\227\129\190\227\129\160\231\182\154\227\129\132\227\129\166\227\129\132\227\129\159\227\129\174\227\129\139\239\188\129\",\"from_user\":\"kazu_yamamoto\",\"to_user_id\":null,\"iso_language_code\":\"no\",\"created_at\":\"Wed, 26 Jan 2011 07:07:02 +0000\",\"metadata\":{\"result_type\":\"recent\"},\"source\":\"&lt;a href=&quot;http://twitter.com/&quot;&gt;web&lt;/a&gt;\",\"geo\":null,\"id\":30159761706061824,\"from_user_id_str\":\"80430860\"}],\"completed_in\":1.2606e-2,\"next_page\":\"?page=2&max_id=30159761706061824&rpp=1&q=haskell\",\"page\":1,\"refresh_url\":\"?since_id=30159761706061824&q=haskell\",\"max_id_str\":\"30159761706061824\",\"query\":\"haskell\",\"since_id\":0,\"max_id\":30159761706061824,\"results_per_page\":1}"
 but got: "{\"since_id_str\":\"0\",\"results\":[{\"profile_image_url\":\"http://a2.twimg.com/profile_images/536455139/icon32_normal.png\",\"id_str\":\"30159761706061824\",\"from_user_id\":80430860,\"to_user_id_str\":null,\"text\":\"Haskell Server Pages \227\129\163\227\129\166\227\128\129\227\129\190\227\129\160\231\182\154\227\129\132\227\129\166\227\129\132\227\129\159\227\129\174\227\129\139\239\188\129\",\"from_user\"\144\221\139\DLE\SOH\NUL\NUL\NULh\NUL\NUL\NUL\NUL\NUL\NUL\NUL80to_user_id\":null,\"iso_language_code\":\"no\",\"created_at\":\"Wed, 26 Jan 2011 07:07:02 +0000\",\"metadata\":{\"result_type\":\"recent\"},\"source\":\"&lt;a href=&quot;http://twitter.com/&quot;&gt;web&lt;/a&gt;\",\"geo\":null,\"id\":30159761706061824,\"from_user_id_str\":\"80430860\"}],\"completed_in\":1.2606e-2,\"next_page\":\"?page=2&max_id=30159761706061824&rpp=1&q=haskell\",\"page\":1,\"refresh_url\":\"?since_id=30159761706061824&q=haskell\",\"max_id_str\":\"30159761706061824\",\"query\":\"haskell\",\"since_id\":0,\"max_id\":30159761706061824,\"results_per_page\":1}"
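Corruption like the one above can also be checked outside the aeson test suite. One option, sketched here under the assumption that a differential test is useful for this class of bug, is to compare 'encodeUtf8BuilderEscaped' with a no-op escaping primitive against the plain strict 'encodeUtf8'. The inputs mix ASCII and multi-byte characters at many lengths, so some of them span builder buffer boundaries. The helper names are hypothetical, and this is not the test that uncovered the bug.

```haskell
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Builder.Prim as BP
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Fast path: UTF-8 encode through the Builder, with a primitive that
-- copies every ASCII byte unchanged (i.e. no escaping at all).
viaBuilder :: T.Text -> BL.ByteString
viaBuilder = B.toLazyByteString . TE.encodeUtf8BuilderEscaped (BP.liftFixedToBounded BP.word8)

-- | Reference path: the plain strict UTF-8 encoder.
viaStrict :: T.Text -> BL.ByteString
viaStrict = BL.fromStrict . TE.encodeUtf8

-- | Inputs of varying length mixing ASCII and Japanese characters, so
-- some of them straddle the Builder's internal buffer boundaries.
samples :: [T.Text]
samples =
  [ T.replicate n (T.pack chunk)
  | n <- [0, 1, 7, 100, 1000, 5000]
  , chunk <- ["a", "\"from_user\":", "\12387\12390\65281", "x\12387y"]
  ]

-- | Inputs on which the two encoders disagree; expected to be empty.
mismatches :: [T.Text]
mismatches = [t | t <- samples, viaBuilder t /= viaStrict t]
```

With a correct text build, 'mismatches' is empty. A non-empty result would indicate the kind of garbled output shown in the failure above.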
@meiersi

I found the problem. Here's a pull request that fixes it: bos/text#64

@bos merged commit faa9936 into bos:master on Jan 8, 2014
@bos (owner) commented on Jan 9, 2014

Nice work, @meiersi — I can reproduce the big performance boost that you report.
