groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge. #3740

Merged
merged 2 commits into from Dec 7, 2016


gianm
Contributor

@gianm gianm commented Dec 5, 2016

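~30% improvement on the granularity = all query benchmark that includes serde time (queryMultiQueryableIndexWithSerde: 676,988 → 486,083 us/op, about 28%). The granularity = day results are unchanged within error.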

Specifically:

  • Remove timestamp from RowBasedKey when not needed.
  • Set timestamp to null in MapBasedRows that are not part of the final merge.
  • Add two new benchmarks, queryMultiQueryableIndexWithSerde (simulates serde between historical and broker) and queryMultiQueryableIndexWithSpilling (includes spilling to disk), that show the improvement here, which comes mostly from having less serde work to do.
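
A minimal sketch of the idea behind the first bullet, not the actual Druid code. It assumes that with granularity = all every row lands in the same time bucket, so the timestamp adds nothing to the grouping key and only adds bytes to serialize. `GroupKeySketch`, `Key`, `makeKey`, and `countRows` are hypothetical names for illustration; the real `RowBasedKey` is structured differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupKeySketch {
    /** Hypothetical stand-in for a grouping key; not Druid's RowBasedKey. */
    static final class Key {
        final Object[] parts;

        Key(Object[] parts) {
            this.parts = parts;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && Arrays.equals(parts, ((Key) o).parts);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(parts);
        }
    }

    /** Builds a key from the dimension values, prepending the timestamp only when asked to. */
    static Key makeKey(boolean includeTimestamp, long bucketedTimestamp, String... dims) {
        List<Object> parts = new ArrayList<>();
        if (includeTimestamp) {
            parts.add(bucketedTimestamp);
        }
        parts.addAll(Arrays.asList(dims));
        return new Key(parts.toArray());
    }

    /** Counts rows per key; row i has timestamp timestamps[i] and one dimension value dims[i]. */
    static Map<Key, Integer> countRows(boolean includeTimestamp, long[] timestamps, String[] dims) {
        Map<Key, Integer> counts = new HashMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            counts.merge(makeKey(includeTimestamp, timestamps[i], dims[i]), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumption: under granularity = all, every row is bucketed to one shared timestamp.
        long[] ts = {0L, 0L, 0L};
        String[] dims = {"a", "a", "b"};

        // Same number of groups either way; only the key width differs.
        System.out.println("groups with timestamp:    " + countRows(true, ts, dims).size());
        System.out.println("groups without timestamp: " + countRows(false, ts, dims).size());
        System.out.println("key fields with timestamp:    " + makeKey(true, 0L, "a").parts.length);
        System.out.println("key fields without timestamp: " + makeKey(false, 0L, "a").parts.length);
    }
}
```

Both variants produce the same groups in this sketch, but the key without the timestamp has one field fewer to write and read back on every row, which fits the "less serde work" explanation above.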

groupby-improvements

Benchmark                                              (defaultStrategy)  (initialBuckets)  (numProcessingThreads)  (numSegments)  (queryGranularity)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt        Score       Error  Units
GroupByBenchmark.queryMultiQueryableIndex                             v2                -1                       2              4                 all            100000           basic.A  avgt   30   373040.126 ±  6687.778  us/op
GroupByBenchmark.queryMultiQueryableIndex                             v2                -1                       2              4                 day            100000           basic.A  avgt   30   704732.206 ±  9292.572  us/op
GroupByBenchmark.queryMultiQueryableIndexWithSerde                    v2                -1                       2              4                 all            100000           basic.A  avgt   30   486083.016 ±  4252.756  us/op
GroupByBenchmark.queryMultiQueryableIndexWithSerde                    v2                -1                       2              4                 day            100000           basic.A  avgt   30  1028039.357 ± 11060.639  us/op
GroupByBenchmark.queryMultiQueryableIndexWithSpilling                 v2                -1                       2              4                 all            100000           basic.A  avgt   30   444659.485 ±  5380.572  us/op
GroupByBenchmark.queryMultiQueryableIndexWithSpilling                 v2                -1                       2              4                 day            100000           basic.A  avgt   30   532730.064 ±  6565.590  us/op
GroupByBenchmark.querySingleIncrementalIndex                          v2                -1                       2              4                 all            100000           basic.A  avgt   30    75440.164 ±  1382.679  us/op
GroupByBenchmark.querySingleIncrementalIndex                          v2                -1                       2              4                 day            100000           basic.A  avgt   30    76651.784 ±  1288.932  us/op
GroupByBenchmark.querySingleQueryableIndex                            v2                -1                       2              4                 all            100000           basic.A  avgt   30    37673.145 ±   689.344  us/op
GroupByBenchmark.querySingleQueryableIndex                            v2                -1                       2              4                 day            100000           basic.A  avgt   30    40981.706 ±  1276.147  us/op

master

Benchmark                                              (defaultStrategy)  (initialBuckets)  (numProcessingThreads)  (numSegments)  (queryGranularity)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt        Score       Error  Units
GroupByBenchmark.queryMultiQueryableIndex                             v2                -1                       2              4                 all            100000           basic.A  avgt   30   366566.054 ±  5324.826  us/op
GroupByBenchmark.queryMultiQueryableIndex                             v2                -1                       2              4                 day            100000           basic.A  avgt   30   704091.741 ±  7608.095  us/op
GroupByBenchmark.queryMultiQueryableIndexWithSerde                    v2                -1                       2              4                 all            100000           basic.A  avgt   30   676987.839 ± 10675.159  us/op
GroupByBenchmark.queryMultiQueryableIndexWithSerde                    v2                -1                       2              4                 day            100000           basic.A  avgt   30  1015616.319 ± 12242.756  us/op
GroupByBenchmark.queryMultiQueryableIndexWithSpilling                 v2                -1                       2              4                 all            100000           basic.A  avgt   30   487088.000 ±  6308.555  us/op
GroupByBenchmark.queryMultiQueryableIndexWithSpilling                 v2                -1                       2              4                 day            100000           basic.A  avgt   30   536788.568 ± 10157.553  us/op
GroupByBenchmark.querySingleIncrementalIndex                          v2                -1                       2              4                 all            100000           basic.A  avgt   30    75694.447 ±  1499.463  us/op
GroupByBenchmark.querySingleIncrementalIndex                          v2                -1                       2              4                 day            100000           basic.A  avgt   30    76950.858 ±  1223.638  us/op
GroupByBenchmark.querySingleQueryableIndex                            v2                -1                       2              4                 all            100000           basic.A  avgt   30    37716.549 ±   938.930  us/op
GroupByBenchmark.querySingleQueryableIndex                            v2                -1                       2              4                 day            100000           basic.A  avgt   30    38152.708 ±   548.103  us/op
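
For reference, a small sketch of how the relative change can be computed from the two tables. The scores are copied from the rows above; the class and method names (`BenchmarkDelta`, `improvement`) are made up for this example.

```java
public class BenchmarkDelta {
    /** Fractional reduction in avgt score (us/op); positive means the branch is faster than master. */
    static double improvement(double masterUsPerOp, double branchUsPerOp) {
        return (masterUsPerOp - branchUsPerOp) / masterUsPerOp;
    }

    public static void main(String[] args) {
        // Scores taken from the master and groupby-improvements tables above.
        System.out.printf("WithSerde, all:    %.1f%%%n", 100 * improvement(676987.839, 486083.016));
        System.out.printf("WithSpilling, all: %.1f%%%n", 100 * improvement(487088.000, 444659.485));
        System.out.printf("WithSerde, day:    %.1f%%%n", 100 * improvement(1015616.319, 1028039.357));
    }
}
```

By this measure the serde benchmark at granularity = all improves by roughly 28% and the spilling benchmark by roughly 9%, while the day-granularity serde run moves by about 1% in the other direction, which is close to the reported error bounds.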

@gianm gianm added this to the 0.9.3 milestone Dec 5, 2016
@gianm gianm assigned fjy and jon-wei Dec 5, 2016
@fjy
Contributor

fjy commented Dec 5, 2016

👍

@fjy
Contributor

fjy commented Dec 6, 2016

@gianm there are some conflicts

…t for the final merge.

Specifically:

- Remove timestamp from RowBasedKey when not needed
- Set timestamp to null in MapBasedRows that are not part of the final merge.
@gianm
Contributor Author

gianm commented Dec 6, 2016

@fjy @jon-wei updated

@jon-wei
Contributor

jon-wei commented Dec 6, 2016

👍 after travis

@fjy fjy merged commit b1bac9f into apache:master Dec 7, 2016
dgolitsyn pushed a commit to metamx/druid that referenced this pull request Feb 14, 2017
…t for the final merge. (apache#3740)

* GroupByBenchmark: Add serde, spilling, all-gran benchmarks.

Also use more iterations.

* groupBy v2: Ignore timestamp completely when granularity = all, except for the final merge.

Specifically:

- Remove timestamp from RowBasedKey when not needed
- Set timestamp to null in MapBasedRows that are not part of the final merge.
@gianm gianm deleted the groupby-improvements branch March 1, 2017 03:48