Performance bottleneck in arrayValues #134

Open
ssadler opened this Issue · 1 comment

@ssadler

Performance of the Parser.Internal.arrayValues function is somewhat unintuitive:

                                                                           individual     inherited
COST CENTRE                 MODULE                       no.     entries  %time %alloc   %time %alloc
   object_'                 Data.Aeson.Parser.Internal   272           0    0.0    0.0    98.4   99.4
    jstring_                Data.Aeson.Parser.Internal   273       29000    0.7    0.5    98.4   99.4
     array_'                Data.Aeson.Parser.Internal   275           0    2.6    1.8    97.7   98.9
      array_values          Data.Aeson.Parser.Internal   278           0   78.6   81.4    95.1   97.1
       array_fromlist       Data.Aeson.Parser.Internal   279     1245912   16.5   15.8    16.5   15.8

In this example it's taking over 99.5% of the total time of the program. That's not so bad since this program is not doing much; however, the total throughput is about 7MB/s, as opposed to 20MB/s for smaller, mostly-dictionary inputs. I also suspect thunks are building up, since output seems to arrive in small batches.

The array_values and array_fromlist cost centres are my own, added with SCC pragmas:

arrayValues val = do
  {-# SCC "array_skipspace" #-} skipSpace
  vals <- {-# SCC "array_values" #-} ((val <* skipSpace) `sepBy` (char ',' *> skipSpace)) <* char ']'
  {-# SCC "array_fromlist" #-} return (Vector.fromList vals)

I verified that it's not the floating point values that are slowing it down.

OSX 64bit, GHC 7.6.3, Aeson 0.6.1.0

Example input here: https://gist.github.com/anonymous/6144803/raw/301740340c96290306bb66d9347f07c9220643b0/a.json

@bos referenced this issue from a commit
Rewrite arrayValues to be less combinator-driven
This gives us about a 20% performance increase when parsing an
array-heavy input, such as json-data/geometry.json (gh-134).
da17f1f
@bos referenced this issue from a commit
Rework objectValues to use commaSeparated
This gives us a further 10% performance improvement when parsing
object-heavy inputs, e.g. json-data/twitter100.json (gh-134).
ed8c095
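
For readers wondering what "less combinator-driven" could mean for arrayValues, here is a minimal sketch. It is not the code from the commits referenced above; it is an illustration under a few assumptions: the opening '[' has already been consumed (as in the snippet in the report), the name arrayValuesLoop is hypothetical, and the parser runs over the Char8 interface of attoparsec. Instead of `sepBy`, it uses an explicit loop that accumulates elements in reverse and tracks the count, so the vector can be built with a known length.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main (main) where

-- Redundant on recent GHCs (both are in the Prelude), needed on older ones.
import Control.Applicative ((<*), (*>))
import Data.Attoparsec.ByteString.Char8
  (Parser, anyChar, char, decimal, parseOnly, peekChar', satisfy, skipSpace)
import Data.ByteString.Char8 (pack)
import Data.Vector (Vector)
import qualified Data.Vector as Vector

-- Hypothetical explicit-loop variant of arrayValues.
-- Assumes the opening '[' has already been consumed by the caller.
arrayValuesLoop :: Parser a -> Parser (Vector a)
arrayValuesLoop val = do
  skipSpace
  c <- peekChar'
  if c == ']'
    then anyChar >> return Vector.empty      -- empty array: consume ']'
    else loop [] 1
  where
    -- acc holds the elements parsed so far in reverse order;
    -- n is the length the list will have once the next element is added.
    loop acc !n = do
      v  <- val <* skipSpace
      ch <- satisfy (\x -> x == ',' || x == ']')
      if ch == ','
        then skipSpace >> loop (v : acc) (n + 1)
        else return (Vector.reverse (Vector.fromListN n (v : acc)))

main :: IO ()
main = do
  let parseInts = parseOnly (char '[' *> arrayValuesLoop (decimal :: Parser Int))
  print (parseInts (pack "[1, 2 ,3]"))
  print (parseInts (pack "[ ]"))
  print (parseInts (pack "[1,]"))   -- trailing comma is rejected
```

Whether a loop like this is actually faster than the `sepBy` version depends on the attoparsec version and the input, so it is worth profiling both against the same data before drawing conclusions.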
@bos
Owner

So far, I've reduced the memory footprint for your test case by about 25%, and improved performance by about 75%. That's not bad, but I hope to find a few other ways to improve it.

If you want to keep up and try this code yourself, you'll need the latest commits to both the attoparsec repo and this one.
