Parse POMs with SAX parser #1029

alexarchambault merged 9 commits into master from topic/pom-sax-parser on Jan 23, 2019


alexarchambault commented Jan 23, 2019

No description provided.

@alexarchambault alexarchambault force-pushed the topic/pom-sax-parser branch from 310200a to 24955d3 Jan 23, 2019



alexarchambault commented Jan 23, 2019

Benchmark results for the new SAX parser (as of 24955d3)

$ sbt 'benchmark/jmh:run -i 20 -wi 20 -f1 -t1'
[info] Benchmark                              Mode  Cnt    Score    Error  Units
[info] ParseTests.parseApacheParent           avgt   20    0.783 ±  0.011  ms/op
[info] ParseTests.parseSparkParent            avgt   20    2.936 ±  0.037  ms/op
[info] ParseTests.parseSparkParentMavenModel  avgt   20    0.692 ±  0.001  ms/op
[info] ParseTests.parseSparkParentXmlDom      avgt   20    2.288 ±  0.002  ms/op
[info] ParseTests.parseSparkParentXmlSax      avgt   20    0.997 ±  0.001  ms/op
[info] ProcessingTests.sparkSql               avgt   20    0.647 ±  0.001  ms/op
[info] ResolutionDomTests.coursierCli         avgt   20    4.235 ±  0.035  ms/op
[info] ResolutionDomTests.sparkSql            avgt   20  187.636 ±  0.389  ms/op
[info] ResolutionTests.coursierCli            avgt   20    3.856 ±  0.021  ms/op
[info] ResolutionTests.sparkSql               avgt   20  168.218 ±  0.384  ms/op
[success] Total time: 4048 s, completed Jan 23, 2019 2:28:56 PM

parseSparkParentXmlDom and parseSparkParentXmlSax are essentially String => Project functions: each takes the content of the org.apache.spark:spark-parent_2.12:2.4.0 POM and produces a coursier.core.Project, either via an intermediate AST of the XML or directly via a SAX-based parser. The SAX parser cuts that time by more than 50% (2.288 → 0.997 ms/op).
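
For readers unfamiliar with the approach, here is a minimal sketch of SAX-style POM parsing using only the JDK's javax.xml.parsers and org.xml.sax APIs. It is not the parser from this PR: the class name PomSaxSketch, the method parseCoordinates, and the choice to extract only the top-level groupId, artifactId, and version are assumptions made for illustration. The point is that values are collected from parser events as they stream by, without first building a tree of the whole document.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration, not coursier's actual implementation.
final class PomSaxSketch {

    private static final Set<String> WANTED = Set.of("groupId", "artifactId", "version");

    /** Collects the direct children of <project> named in WANTED, using SAX events only. */
    static Map<String, String> parseCoordinates(String pom) throws Exception {
        Map<String, String> result = new LinkedHashMap<>();

        DefaultHandler handler = new DefaultHandler() {
            // Stack of currently open element names, root first.
            private final ArrayDeque<String> path = new ArrayDeque<>();
            // Text accumulated for the innermost open element.
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                path.addLast(qName);
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one text node, so append.
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Depth 2 means a direct child of the root; nested <dependency> fields are ignored.
                boolean topLevel = path.size() == 2 && "project".equals(path.peekFirst());
                if (topLevel && WANTED.contains(qName)) {
                    result.put(qName, text.toString().trim());
                }
                path.removeLast();
                text.setLength(0);
            }
        };

        // The default factory is not namespace-aware, so qName carries the element name.
        SAXParserFactory.newInstance()
            .newSAXParser()
            .parse(new InputSource(new StringReader(pom)), handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        String pom = "<project><groupId>org.example</groupId>"
            + "<artifactId>demo</artifactId><version>1.0</version></project>";
        System.out.println(parseCoordinates(pom));
    }
}
```

A DOM-based equivalent would parse the whole document into a node tree and then walk it; the handler above keeps only a small stack and one text buffer, which is the kind of allocation saving that plausibly explains the gap between the two parsing benchmarks.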

ResolutionDomTests.sparkSql and ResolutionTests.sparkSql run whole resolutions of org.apache.spark:spark-sql_2.12:2.4.0, with all metadata served from memory (no network or file I/O). The gain is only about 10% here (187.636 → 168.218 ms/op), which is somewhat disappointing compared to the parsing comparison above, which measures a more atomic operation.

@alexarchambault alexarchambault merged commit 6d0e242 into master Jan 23, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

@alexarchambault alexarchambault deleted the topic/pom-sax-parser branch Jan 23, 2019
