Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add docs and benchmark for JSON flattening parser #1921

Merged
merged 1 commit into from
Dec 15, 2015

Conversation

jon-wei
Copy link
Contributor

@jon-wei jon-wei commented Nov 6, 2015

Part of a set of 3 related pull requests, addressing Druid issue:
#1839

metamx/java-util#34 -- new JSON parser
druid-io/druid-api#65 -- ingestion spec modifications
#1921 -- docs and benchmark

Adds documentation and a JMH benchmark for comparing the new JsonPath-based parser with the existing parser.

Example benchmark result:

java -jar target/benchmarks.jar FlattenJSONBenchmark -wi 1 -i 250 -f 1 -v EXTRA

Result "baseline":
19.366 ±(99.9%) 0.268 us/op [Average](min, avg, max) = (17.979, 19.366, 28.522), stdev = 1.274
CI (99.9%): [19.098, 19.634](assumes normal distribution)

Result "flatten":
25.113 ±(99.9%) 0.434 us/op [Average](min, avg, max) = (22.311, 25.113, 37.351), stdev = 2.061
CI (99.9%): [24.679, 25.547](assumes normal distribution)

Run complete. Total time: 00:10:46

Benchmark Mode Cnt Score Error Units
FlattenJSONBenchmark.baseline avgt 250 19.366 ± 0.268 us/op
FlattenJSONBenchmark.flatten avgt 250 25.113 ± 0.434 us/op

<version>0.1.0</version>
</dependency>
<dependency>
<groupId>com.jayway.jsonpath</groupId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you shouldn't have to add this, it should get pulled in by java-util now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gianm removed JsonPath dependency from benchmarks

@jon-wei
Copy link
Contributor Author

jon-wei commented Nov 10, 2015

@drcrallen @gianm @himanshug

The benchmarks were unnecessarily creating a StringInputRowParser, I've revised them to use only the JSONParser/JSONPathParser.

New benchmark numbers:

Benchmark Mode Cnt Score Error Units
FlattenJSONBenchmark.baseline avgt 50 20.971 ± 2.840 us/op
FlattenJSONBenchmark.flatten avgt 50 36.168 ± 7.699 us/op
FlattenJSONBenchmark.preflattenNestedParser avgt 50 19.007 ± 3.677 us/op

baseline is old JSONParser
flatten is new JSONPathParser on nested input
preflatten is new JSONPathParser on preflattened input, same input as baseline

After discussing some profiling results with @gianm today, I added a 4th benchmark that extracts root-level fields with JsonPath (i.e., specify a "nested" field but with a root expression). This was done to get an idea of the overhead added by JsonPath, normally the new JSONPathParser would read directly from the Jackson-provided Map for root fields.

With the same flattened input as baseline and preflatten:
FlattenJSONBenchmark.forcedRootPaths avgt 50 30.024 ± 4.324 us/op

It looks like:

  • the new parser and old parser perform similarly for preflattened input.
  • parsing my test nested input takes ~50% longer compared to a preflattened input
  • JsonPath reads add significant overhead for both root fields and more deeply nested fields.

@himanshug
Copy link
Contributor

Given the results, I am +1 for deprecating old JsonParser and just using the new one.

@himanshug
Copy link
Contributor

👍

@jon-wei jon-wei force-pushed the flat_json branch 2 times, most recently from 1e27b04 to 380b830 Compare November 11, 2015 22:08
@jon-wei
Copy link
Contributor Author

jon-wei commented Nov 24, 2015

Before merging, needs a new druid-api version that includes druid-io/druid-api@e8c3533

@jon-wei
Copy link
Contributor Author

jon-wei commented Dec 10, 2015

Updated pom.xml to use druid-api 3.14, is this good to merge?

@jon-wei
Copy link
Contributor Author

jon-wei commented Dec 15, 2015

@gianm @himanshug Do you have any more feedback for this?

@himanshug
Copy link
Contributor

👍

1 similar comment
@gianm
Copy link
Contributor

gianm commented Dec 15, 2015

👍

gianm added a commit that referenced this pull request Dec 15, 2015
Add docs and benchmark for JSON flattening parser
@gianm gianm merged commit e6c2db8 into apache:master Dec 15, 2015
@fjy fjy modified the milestone: 0.9.0 Feb 4, 2016
@fjy fjy mentioned this pull request Mar 31, 2016
@jon-wei jon-wei deleted the flat_json branch October 6, 2017 22:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants