Performance bottleneck in arrayValues #134

Closed
libscott opened this Issue Aug 3, 2013 · 1 comment

libscott commented Aug 3, 2013

The performance of the Data.Aeson.Parser.Internal.arrayValues function is surprisingly poor. Profiler output for the example input linked below:

                                                                           individual      inherited
   COST CENTRE              MODULE                       no.     entries  %time %alloc   %time %alloc
   object_'                 Data.Aeson.Parser.Internal   272           0    0.0    0.0    98.4   99.4
    jstring_                Data.Aeson.Parser.Internal   273       29000    0.7    0.5    98.4   99.4
     array_'                Data.Aeson.Parser.Internal   275           0    2.6    1.8    97.7   98.9
      array_values          Data.Aeson.Parser.Internal   278           0   78.6   81.4    95.1   97.1
       array_fromlist       Data.Aeson.Parser.Internal   279     1245912   16.5   15.8    16.5   15.8

In this example, array_values accounts for about 95% of the program's total run time (78.6% on its own, the remaining 16.5% in array_fromlist). That's not so bad in itself, since this program does little else, but total throughput is about 7 MB/s, as opposed to 20 MB/s for smaller, mostly-dictionary inputs. I also suspect thunks are building up, since output seems to arrive in small batches.

The array_values and array_fromlist cost centres are my own additions; I annotated arrayValues as follows:

arrayValues val = do
  {-# SCC "array_skipspace" #-} skipSpace
  vals <- {-# SCC "array_values" #-} ((val <* skipSpace) `sepBy` (char ',' *> skipSpace)) <* char ']'
  {-# SCC "array_fromlist" #-} return (Vector.fromList vals)
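
For readers unfamiliar with manual cost centres, here is a minimal standalone sketch of the same annotation technique. It is unrelated to aeson's code: the function `sumTo` and the cost-centre names `build_list` and `sum_list` are made up for illustration. Compiled with `ghc -prof -rtsopts` and run with `+RTS -p`, a program like this should list each named cost centre as its own row in the generated `.prof` file; without `-prof` the pragmas are simply ignored.

```haskell
module Main (main) where

import Data.List (foldl')

-- Hypothetical example function: sums the integers 1..n.
-- Each SCC pragma attaches a named cost centre to the expression that
-- follows it, so the profiler can attribute time and allocation to it.
sumTo :: Int -> Int
sumTo n =
  let xs    = {-# SCC "build_list" #-} [1 .. n]
      total = {-# SCC "sum_list" #-} foldl' (+) 0 xs
  in total

main :: IO ()
main = print (sumTo 1000000)
```

Because a cost centre covers only the annotated expression, splitting one line into several annotated pieces (as done with array_values and array_fromlist above) is a cheap way to see which part of a function dominates.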

I verified that it's not the floating point values that are slowing it down.

Environment: OS X (64-bit), GHC 7.6.3, aeson 0.6.1.0

Example input here: https://gist.github.com/anonymous/6144803/raw/301740340c96290306bb66d9347f07c9220643b0/a.json

bos added a commit that referenced this issue Nov 23, 2013

bos added a commit that referenced this issue Nov 26, 2013

Rewrite arrayValues to be less combinator-driven
This gives us about a 20% performance increase when parsing an
array-heavy input, such as json-data/geometry.json (gh-134).

bos added a commit that referenced this issue Nov 26, 2013

Rework objectValues to use commaSeparated
This gives us a further 10% performance improvement when parsing
object-heavy inputs, e.g. json-data/twitter100.json (gh-134).
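
The two commit messages above describe replacing `sepBy` with something less combinator-driven and sharing a `commaSeparated` helper between arrays and objects. The sketch below shows one way such a helper could look. It is an assumption written for illustration, not the code from those commits: only the name `commaSeparated` comes from the commit message, while its signature and body, `arrayValuesLoop`, `objectValuesLoop`, and the escape-free `simpleKey` parser are hypothetical. It uses the attoparsec, bytestring and vector packages.

```haskell
module Main (main) where

import qualified Data.Attoparsec.ByteString.Char8 as A
import Data.Attoparsec.ByteString.Char8 (Parser)
import qualified Data.ByteString.Char8 as B
import qualified Data.Vector as V

-- Combinator-driven array body, as quoted in the issue (SCC pragmas removed).
-- The opening '[' is assumed to have been consumed by the caller.
arrayValuesSepBy :: Parser a -> Parser (V.Vector a)
arrayValuesSepBy val = do
  A.skipSpace
  vals <- ((val <* A.skipSpace) `A.sepBy` (A.char ',' *> A.skipSpace)) <* A.char ']'
  return (V.fromList vals)

-- Hand-written loop (assumed shape): parse items separated by commas until
-- the closing delimiter `end`, inspecting one character after each item
-- instead of going through sepBy's alternatives.
commaSeparated :: Parser a -> Char -> Parser [a]
commaSeparated item end = do
  next <- A.peekChar
  if next == Just end
    then A.anyChar >> return []          -- empty collection
    else loop
  where
    loop = do
      v <- item <* A.skipSpace
      c <- A.satisfy (\x -> x == ',' || x == end)
      if c == ','
        then A.skipSpace >> (v :) <$> loop
        else return [v]

arrayValuesLoop :: Parser a -> Parser (V.Vector a)
arrayValuesLoop val = do
  A.skipSpace
  V.fromList <$> commaSeparated val ']'

-- Simplified key parser: a double-quoted string with no escape handling.
simpleKey :: Parser B.ByteString
simpleKey = A.char '"' *> A.takeWhile (/= '"') <* A.char '"'

-- Object body built on the same helper; the opening '{' is assumed consumed.
objectValuesLoop :: Parser k -> Parser v -> Parser [(k, v)]
objectValuesLoop key val = do
  A.skipSpace
  commaSeparated pair '}'
  where
    pair = do
      k <- key <* A.skipSpace
      _ <- A.char ':'
      A.skipSpace
      v <- val
      return (k, v)

main :: IO ()
main = do
  let arr = B.pack "1.5 , 2 ,3.25 ]"
      obj = B.pack "\"x\" : 1 , \"y\":2 }"
  print (A.parseOnly (arrayValuesSepBy A.double) arr)
  print (A.parseOnly (arrayValuesLoop A.double) arr)
  print (A.parseOnly (objectValuesLoop simpleKey A.double) obj)
```

Both array parsers accept the same input and produce the same vector; any speed difference would have to be measured, for example against json-data/geometry.json as mentioned in the commit message.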

bos commented Nov 27, 2013

So far, I've reduced the memory footprint for your test case by about 25%, and improved performance by about 75%. That's not bad, but I hope to find a few other ways to improve it.

If you want to follow along and try this code yourself, you'll need the latest commits from both the attoparsec repo and this one.

tolysz pushed a commit to tolysz/aeson that referenced this issue May 18, 2015

tolysz pushed a commit to tolysz/aeson that referenced this issue May 18, 2015

Rewrite arrayValues to be less combinator-driven
This gives us about a 20% performance increase when parsing an
array-heavy input, such as json-data/geometry.json (gh-134).

tolysz pushed a commit to tolysz/aeson that referenced this issue May 18, 2015

Rework objectValues to use commaSeparated
This gives us a further 10% performance improvement when parsing
object-heavy inputs, e.g. json-data/twitter100.json (gh-134).

@bos bos closed this Jul 21, 2015
