-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ability to not rollup at index time, make pre aggregation an option #3020
Conversation
Could you please add the ability to find out if there is rollup in a data source from the segmentMetadata query. This would be very helpful for tools that interface with Druid. Like https://github.com/implydata/pivot and https://github.com/airbnb/caravel |
Could you please add docs for how to use this feature? |
Kickass! Thank you for incorporating my comments! |
@@ -71,6 +72,7 @@ An example dataSchema is shown below: | |||
| dataSource | String | The name of the ingested datasource. Datasources can be thought of as tables. | yes | | |||
| parser | JSON Object | Specifies how ingested data can be parsed. | yes | | |||
| metricsSpec | JSON Object array | A list of [aggregators](../querying/aggregations.html). | yes | | |||
| rollup | Boolean | rollup or not | no (default == true) | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From that I know of the Druid commiter people they will not like the "default to true" settings. Things should always default to false or so they say :-p. Now the 'simple fix' is to change this setting to (say) disableRollup
and have it default to false
. But I think we should have a conversation about having no rollup be the default in Druid.
'No rollup' is how most DBs out there operate. It would make Druid much easier to evaluate because you would not need to have to think about rollup from day 0.
I would love to get the Druid people @xvrl , @gianm , @cheddar , @fjy , @himanshug , @drcrallen and anyone I am forgetting to comment on possibly making Druid have no rollup by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be a fairly large change to have no rollup as default and require updating the version number. FWIW, I think we should keep rollup on by default and re-evaluate for 0.10.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree with @fjy, I am in favor of keeping rollup on by default at least for 0.9
@kaijianding: Not a blocker for this PR, just out of curiousity, did you ran any benchmarks with rollup disabled ? |
Will post results here when I done the benchmarks @nishantmonu51 |
0231651
to
aff1f1b
Compare
@kaijianding did you get a chance to get the benchmarks? |
Will get benchmark by end of this week, @fjy |
@kaijianding I looked at the core changes in IncrementalIndex today, seems good so far, will review the rest as well |
here is the benchmark, @fjy @nishantmonu51 |
How many iterations did you run? There seems to be a 5-6% slowdown for ingestion, which is significant. There is also a wide error range. Does running more iterations help reduce this error range? If there is actually a 5-6% slowdown in performance, I think it is worth investigating why the slowdown is occurring as ideally there should be no performance impact when rollup is turned on. |
here is the latest benchmark after running with more iterations on a more powerful machine and with slightly code modification, @fjy |
5adf9ef
to
b941b38
Compare
@kaijianding cool! looks much better |
@@ -173,6 +175,11 @@ handle all formatting decisions on their own, without using the ParseSpec. | |||
| dimensionExclusions | JSON String array | The names of dimensions to exclude from ingestion. | no (default == [] | | |||
| spatialDimensions | JSON Object array | An array of [spatial dimensions](../development/geo.html) | no (default == [] | | |||
|
|||
## Rollup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move this section and instead have this info in the table that first describes the rollup setting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaijianding can we remove this section and have this info in the description of rollup in the table above?
b5ec414
to
7e75988
Compare
I modified the code according to your comments, and rebased it, do you have more comments? @fjy |
@kaijianding can you fix the merge conflicts? |
fixed. @fjy |
} | ||
|
||
@Override | ||
public Integer getRowIndexForRollup(TimeAndDims key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we call this getPriorIndex()?
@kaijianding mostly looks good, I have a few more minor comments @jon-wei @nishantmonu51 can u guys take a look ? |
renamed. @fjy |
@fjy sure I'll take a look at this today |
@@ -159,14 +159,18 @@ Append tasks append a list of segments together into a single segment (one after | |||
|
|||
### Merge Task | |||
|
|||
Merge tasks merge a list of segments together. Any common timestamps are merged. The grammar is: | |||
Merge tasks merge a list of segments together. Any common timestamps are merged. | |||
If rollup is disabled as part of ingestion, common timestamps are not merged and rows are reorderded by their timestamp. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"reorderded" should be "reordered"
looks good to me aside from my last two comments |
👍 |
@kaijianding can you run benchmarks again just to make sure that nothing has been impacted with all the changes during code review? |
👍 if new benchmarks look good |
here is the benchmark. @fjy @jon-wei BTW, I noticed that the jmh framework would call each benchmark method 1~2 times each iteration. In this way, the second call, it is not called in clean environment. Fg, IndexIngestionBenchmark.addRows test, the second call will cause incIndex to add more rows than expected. |
here is the Proposal https://groups.google.com/forum/#!msg/druid-development/GE8QtSMdoFo/BhEkLIOjBAAJ
In our case, we need to keep all details instead of rollup duplicate rows, so here is the patch.
We believe there should be no impact to query part if not rollup
In this patch