A simple benchmark comparing the performance of different parser implementations. The objective is to parse the following grammar:
```
SUM  = SUM '+' PROD
     | SUM '-' PROD
     | PROD

PROD = PROD '*' ATOM
     | PROD '/' ATOM
     | ATOM

ATOM = integer
     | '(' SUM ')'
```
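Read as a sketch, this grammar describes left-associative sums and products over integers. An abstract syntax tree for it could look like the following (the constructor names are illustrative, not taken from this repository):

```haskell
-- Hypothetical AST for the grammar above; names are illustrative only.
data Expr
  = Add Expr Expr  -- SUM  '+' PROD
  | Sub Expr Expr  -- SUM  '-' PROD
  | Mul Expr Expr  -- PROD '*' ATOM
  | Div Expr Expr  -- PROD '/' ATOM
  | Lit Int        -- integer
  deriving Show
```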
The following parsers are currently implemented (a sketch of what such an implementation looks like follows the list):
- Handwritten: A handwritten lexer and recursive ascent parser.
- Attoparsec
- Megaparsec
- Parsec
- Flatparse
- UU Parsing Lib
- Megaparsec/Happy: An LALR(1) parser, generated by Happy, and a lexer, written using Megaparsec.
- Alex/Happy: An LALR(1) parser, generated by Happy, and a lexer, generated by Alex.
- Parsley
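For illustration, here is a minimal sketch of a Megaparsec-style parser for this grammar, evaluating the expression to an `Int` on the fly instead of building a syntax tree. It uses `makeExprParser` from the parser-combinators package; the implementations benchmarked in this repository may be structured quite differently. The file name `example.txt` is a placeholder, and `/` is assumed to mean integer division.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

-- Hypothetical sketch, not the code benchmarked in this repository.
import Control.Monad.Combinators.Expr (Operator (..), makeExprParser)
import Data.Text (Text)
import qualified Data.Text.IO as T
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (space)
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void Text

symbol :: Text -> Parser Text
symbol = L.symbol space

-- ATOM = integer | '(' SUM ')'
atom :: Parser Int
atom = choice
  [ L.lexeme space L.decimal
  , between (symbol "(") (symbol ")") expr
  ]

-- SUM and PROD as two levels of left-associative operators.
expr :: Parser Int
expr = makeExprParser atom
  [ [ InfixL ((*) <$ symbol "*"), InfixL (div <$ symbol "/") ]
  , [ InfixL ((+) <$ symbol "+"), InfixL ((-) <$ symbol "-") ]
  ]

main :: IO ()
main = do
  input <- T.readFile "example.txt"
  print (parse (space *> expr <* eof) "example.txt" input)
```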
The benchmark parses the file generated by `gen-example.py`. It is roughly 5 MB in size and contains around one million numbers, each having 1 to 4 digits, separated by randomly chosen operators (`+`, `-`, `*`, `/`). Parentheses are randomly inserted, so the input looks something like `12+(345*6)-78/9`.
Reading the file is part of the benchmark, since I consider it part of the parser's job.
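As a sketch of what that means in a criterion/tasty-bench-style harness (the `--csv` and `--baseline` options further down suggest tasty-bench, but this is an assumption; `parseAndEval` and `example.txt` are hypothetical placeholders):

```haskell
import qualified Data.ByteString.Char8 as BS
import Test.Tasty.Bench

-- Hypothetical stand-in for one of the parser implementations;
-- a real entry would parse and evaluate the expression.
parseAndEval :: BS.ByteString -> Maybe Int
parseAndEval bs = Just (BS.length bs)  -- placeholder only

main :: IO ()
main = defaultMain
  [ -- BS.readFile runs inside the measured action, so reading the
    -- file counts towards the benchmark, as described above.
    bench "example" $ nfIO (parseAndEval <$> BS.readFile "example.txt")
  ]
```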
The benchmark was compiled with GHC 9.2.4, without a threaded runtime or the LLVM code generator, but with `-O2`.

Parser | Time | Factor | Memory allocated | Peak memory |
---|---|---|---|---|
Flatparse | 210 ms | 1.00x | 70 MB | 96 MB |
Handwritten.CPS | 224 ms | 1.07x | 289 MB | 61 MB |
Handwritten.Normal | 226 ms | 1.08x | 301 MB | 59 MB |
Attoparsec | 364 ms | 1.73x | 1.3 GB | 97 MB |
Parsley | 419 ms | 2.00x | 1.0 GB | 102 MB |
Megaparsec/Happy | 450 ms | 2.14x | 1.9 GB | 95 MB |
Megaparsec | 456 ms | 2.18x | 2.2 GB | 94 MB |
Alex/Happy | 653 ms | 3.11x | 2.8 GB | 92 MB |
Parsec | 2.09 s | 9.96x | 7.6 GB | 192 MB |

Parser | Time | Factor | Memory allocated | Peak memory |
---|---|---|---|---|
Attoparsec | 392 ms | 1.87x | 1.3 GB | 85 MB |
Parsley | 445 ms | 2.00x | 1.0 GB | 87 MB |
Megaparsec | 601 ms | 2.87x | 3.0 GB | 81 MB |
Parsec | 2.06 s | 9.82x | 7.6 GB | 172 MB |
UU Parsing Lib | 3.81 s | 18.19x | 5.5 GB | 672 MB |
Flatparse, Attoparsec, Megaparsec, and Parsley benefit greatly from the `Strict` GHC extension, running twice as fast with it. Generating the Happy parser with `--strict` improves its performance by 25%, and using an explicit export list gave another 25% speedup. The handwritten parser performs best with `StrictData`. All implementations suffer at least a 2x slowdown when compiled with `-threaded` and run with `+RTS -N`.
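For reference, `Strict` and `StrictData` are enabled per module with a language pragma, e.g.:

```haskell
{-# LANGUAGE Strict #-}      -- all bindings in this module become strict
```

or, more selectively:

```haskell
{-# LANGUAGE StrictData #-}  -- only constructor fields become strict
```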
I did try benchmarking Earley, but on a file of this size it consumed multiple gigabytes of memory. On smaller files it was around 200x slower than Flatparse. However, I did not use a tokenizer, which could have improved its performance.
To generate the input file and run the benchmark:

```
$ python ./gen-example.py
$ cabal bench
```
If you want to make changes, you should run

```
$ cabal bench --benchmark-options="--csv baseline.csv"
```

once. Make the changes you want to make, and then run

```
$ cabal bench --benchmark-options="--baseline baseline.csv"
```

to see how much the performance has changed.
IMPORTANT: When simply running `cabal bench`, the reported peak memory consumption is not accurate, as the RTS will not free unused memory between benchmarks. So the numbers will only go up, never down.
To measure peak memory consumption, but not time, run

```
$ ./peak-memory.sh
```
Thanks to the following people for their contributions:

Name | Contribution |
---|---|
Jaro Reinders | UU Parsing Lib benchmark |
Vladislav Zavialov | Megaparsec/Happy benchmark |