
Haskell parsing benchmarks

A simple benchmark comparing the performance of different parser implementations. The objective is to parse the following grammar:

SUM   = SUM '+' PROD
      | SUM '-' PROD
      | PROD

PROD  = PROD '*' ATOM
      | PROD '/' ATOM
      | ATOM

ATOM  = integer
      | '(' SUM ')'
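For reference, here is a minimal sketch of an AST and evaluator matching this grammar. The names are illustrative, not the ones used in this repository; the benchmarked parsers may build a tree like this or evaluate on the fly. The left recursion in SUM and PROD makes both operator levels left-associative:

-- Hypothetical AST for the grammar above.
data Expr
  = Num Int
  | Add Expr Expr   -- SUM '+' PROD
  | Sub Expr Expr   -- SUM '-' PROD
  | Mul Expr Expr   -- PROD '*' ATOM
  | Div Expr Expr   -- PROD '/' ATOM
  deriving Show

-- Left associativity: "1 - 2 - 3" parses as ((1 - 2) - 3),
-- i.e. Sub (Sub (Num 1) (Num 2)) (Num 3).
eval :: Expr -> Int
eval (Num n)   = n
eval (Add a b) = eval a + eval b
eval (Sub a b) = eval a - eval b
eval (Mul a b) = eval a * eval b
eval (Div a b) = eval a `div` eval b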

The parsers compared are the ones listed in the results tables below: Flatparse, two handwritten parsers (CPS and non-CPS), Attoparsec, Parsley, Megaparsec (alone and with Happy), Alex/Happy, Parsec, and the UU Parsing Lib.

Benchmark

Parse the file generated by gen-example.py. It is roughly 5 MB in size and contains around 1 million numbers, each 1 to 4 digits long, separated by randomly chosen operators (+, -, *, /). Parentheses are inserted at random.

Reading the file is part of the benchmark, since I consider it part of the parser's work.
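As an illustration only, a benchmark case along these lines keeps the file read inside the timed action. Criterion is shown as an example harness, and both the file name and the parser entry point below are placeholders; the repository's actual setup may differ:

import Criterion.Main
import qualified Data.ByteString.Char8 as BS

-- Placeholder standing in for a real parser entry point.
parseAndEval :: BS.ByteString -> Int
parseAndEval = BS.length

main :: IO ()
main = defaultMain
  [ bench "parser" $ nfIO $ do
      input <- BS.readFile "example.txt"  -- hypothetical file name; the read is timed too
      pure (parseAndEval input)
  ]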

Results

The benchmark was compiled with GHC 9.2.4, without a threaded runtime or the LLVM code generator, but with -O2. The Factor column gives each parser's time relative to the fastest implementation (Flatparse on ByteString).

ByteString

Parser               Time     Factor   Memory allocated   Peak memory
Flatparse            210 ms   1.00x    70 MB              96 MB
Handwritten.CPS      224 ms   1.07x    289 MB             61 MB
Handwritten.Normal   226 ms   1.08x    301 MB             59 MB
Attoparsec           364 ms   1.73x    1.3 GB             97 MB
Parsley              419 ms   2.00x    1.0 GB             102 MB
Megaparsec/Happy     450 ms   2.14x    1.9 GB             95 MB
Megaparsec           456 ms   2.18x    2.2 GB             94 MB
Alex/Happy           653 ms   3.11x    2.8 GB             92 MB
Parsec               2.09 s   9.96x    7.6 GB             192 MB

Text

Parser           Time     Factor   Memory allocated   Peak memory
Attoparsec       392 ms   1.87x    1.3 GB             85 MB
Parsley          445 ms   2.00x    1.0 GB             87 MB
Megaparsec       601 ms   2.87x    3.0 GB             81 MB
Parsec           2.06 s   9.82x    7.6 GB             172 MB
UU Parsing Lib   3.81 s   18.19x   5.5 GB             672 MB

Notes

Flatparse, Attoparsec, Megaparsec, and Parsley benefit greatly from the Strict GHC extension, running roughly twice as fast with it enabled. Generating the Happy parser with --strict improves its performance by 25%, and using an explicit export list gave another 25% speedup. The handwritten parser performs best with StrictData. All implementations suffer at least a 2x slowdown when compiled with -threaded and run with +RTS -N.
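For concreteness, the extensions mentioned above are per-module GHC pragmas; a sketch, not the repository's exact files:

{-# LANGUAGE Strict #-}      -- all bindings in the module become strict
{-# LANGUAGE StrictData #-}  -- weaker: only data constructor fields become strict

Strict implies StrictData, so a module would typically enable one or the other.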

I did try benchmarking Earley, but on a file of this size it consumed multiple gigabytes of memory, and on smaller files it was around 200x slower than Flatparse. However, I did not use a tokenizer, which might have improved its performance.

Running it yourself

$ python ./gen-example.py
$ cabal bench

If you want to make changes, you should run

$ cabal bench --benchmark-options="--csv baseline.csv"

once. Then make your changes and run

$ cabal bench --benchmark-options="--baseline baseline.csv"

to see how much the performance has changed.

IMPORTANT: When simply running cabal bench, the reported peak memory consumption is not accurate, because the RTS does not free unused memory between benchmarks. The numbers can therefore only go up, never down.

To measure peak memory consumption (but not time), run

$ ./peak-memory.sh
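What the script does internally is not shown here, but GHC's RTS can report a process's own peak memory via the statistics flag, e.g. (the executable name below is a placeholder):

$ ./some-benchmark-exe +RTS -s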

Credits

Name                 Contribution
Jaro Reinders        UU Parsing Lib benchmark
Vladislav Zavialov   Megaparsec/Happy benchmark
