interval chunk query runner now processes individual chunk in a threadpool#1150

Merged
fjy merged 1 commit into apache:master from himanshug:broker-parallel-chunk-process
Mar 2, 2015

Conversation

@himanshug
Contributor

This patch:

  • lets the interval chunking query runner process individual chunks in parallel inside the "Processor" executor service.
  • adds "chunkPeriod" to the query context.
  • removes the "druid.query.chunkPeriod" and "druid.query.<query-type>.chunkPeriod" configuration, since chunking should really be tuned per query (for example, based on the size of its interval).
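
As a rough illustration of the first bullet, the sketch below splits a query interval into chunks and runs each chunk on a shared executor. It is a minimal sketch, not the code from this patch: `ChunkedQuerySketch`, `Chunk`, `split`, and `runInParallel` are hypothetical names, and it uses a fixed `java.time.Duration` where the real `chunkPeriod` is an ISO-8601 period.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch; none of these names come from the Druid code base.
final class ChunkedQuerySketch {

    /** Half-open time range [start, end). */
    static final class Chunk {
        final Instant start;
        final Instant end;

        Chunk(Instant start, Instant end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Splits [start, end) into consecutive chunks no longer than chunkPeriod. */
    static List<Chunk> split(Instant start, Instant end, Duration chunkPeriod) {
        List<Chunk> chunks = new ArrayList<>();
        if (chunkPeriod.isZero() || chunkPeriod.isNegative()) {
            // Assumption: a zero period (P0D) means "do not chunk".
            chunks.add(new Chunk(start, end));
            return chunks;
        }
        Instant cursor = start;
        while (cursor.isBefore(end)) {
            Instant next = cursor.plus(chunkPeriod);
            if (next.isAfter(end)) {
                next = end; // the last chunk may be shorter than the period
            }
            chunks.add(new Chunk(cursor, next));
            cursor = next;
        }
        return chunks;
    }

    /** Submits one task per chunk to the pool and collects results in chunk order. */
    static <T> List<T> runInParallel(List<Chunk> chunks, ExecutorService pool, Function<Chunk, T> runChunk) {
        List<Future<T>> futures = new ArrayList<>();
        for (Chunk chunk : chunks) {
            Callable<T> task = () -> runChunk.apply(chunk);
            futures.add(pool.submit(task));
        }
        List<T> results = new ArrayList<>();
        try {
            for (Future<T> future : futures) {
                results.add(future.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for chunks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a chunk failed", e.getCause());
        }
        return results;
    }
}
```

With this shape, the speed-up depends on the pool having enough free threads to run the chunks concurrently; with a single thread the chunks would simply run one after another.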

Contributor

can you double check the broker docs for druid.processing.numThreads? I think they will need to be updated as well

Contributor Author

done

Member

Can we document how the chunkPeriod context parameter interacts with the existing druid.query.chunkPeriod and druid.query.<queryType>.chunkPeriod configuration parameter?

Contributor Author

Actually, it replaces druid.query.chunkPeriod and druid.query.<queryType>.chunkPeriod.

They are no longer valid after this pull request. In our experience, the chunking behavior really needs to be tuned per query (sometimes based on the size of its interval).

Member

In that case, can we remove the old configs from docs and code as well, instead of keeping unused config around.

@himanshug
Contributor Author

@fjy updated as per the review comments.

@fjy
Contributor

fjy commented Feb 26, 2015

It also looks like we are now ignoring the query config chunkPeriod, which changes behavior. We should document that in the PR description and in the docs.

Contributor

Is this supposed to be a repository for "context":{.....} reserved words?

Contributor Author

Yes, I created it because those strings were slowly getting hard-coded in different places.
This pull request does not refactor the old code to use it, though.
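
A class like the one discussed here could look roughly as follows. This is only a sketch under assumptions: the class name `QueryContextKeysSketch`, the helper `chunkPeriodOf`, and the `P0D` default (meaning "no chunking") are illustrative and not taken from the patch.

```java
import java.util.Map;

// Hypothetical sketch of a holder for reserved query-context keys.
final class QueryContextKeysSketch {

    /** Context key introduced by this pull request. */
    static final String CHUNK_PERIOD = "chunkPeriod";

    /** Assumed default: a zero-length period, i.e. chunking disabled. */
    static final String DEFAULT_CHUNK_PERIOD = "P0D";

    private QueryContextKeysSketch() {
        // constants only, no instances
    }

    /** Reads chunkPeriod from a query context, falling back to the assumed default. */
    static String chunkPeriodOf(Map<String, Object> context) {
        if (context == null) {
            return DEFAULT_CHUNK_PERIOD;
        }
        Object value = context.get(CHUNK_PERIOD);
        return value == null ? DEFAULT_CHUNK_PERIOD : value.toString();
    }
}
```

Keeping the key in one constant means call sites refer to `CHUNK_PERIOD` instead of repeating the string literal.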

@drcrallen
Contributor

Why do we need another means of adding a decorator to the tool chest as part of the constructor? Did the needed manipulators not fit in the pre/post merge decorators for some reason?

@drcrallen
Contributor

Or a better thing to say is that most of the other runner decorators are expressed as a titled method to make them a little more descriptive and a little more generic-use. Is there a particular reason this decorator cannot follow that workflow?

Contributor

Can you please add a comment something along the lines of "Static strings in this class should be considered 'reserved words' for the context of a query."

We'll slowly move over the other reserved words to this class.

Contributor Author

Yeah, but the class name already conveys that intent here, so the comment really seems redundant.

@himanshug
Contributor Author

> Why do we need another means of adding a decorator to the tool chest as part of the constructor? Did the needed manipulators not fit in the pre/post merge decorators for some reason?

@drcrallen The query toolchest needs to instantiate IntervalChunkingQueryRunner, which needs an ExecutorService, a QueryWatcher, and a ServiceEmitter. One option was to add all three to the constructor of each query toolchest and have the toolchests instantiate IntervalChunkingQueryRunner directly. IntervalChunkingQueryRunnerDecorator lets me do that more cleanly by adding just one parameter to the query toolchest constructors. Also, if one more dependency is later added to the IntervalChunkingQueryRunner constructor, only the decorator needs to be updated instead of all the toolchest constructors.
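
The design argument above can be sketched as follows. All types here are simplified, hypothetical stand-ins (`ChunkingRunner` for IntervalChunkingQueryRunner, `ChunkingDecorator` for IntervalChunkingQueryRunnerDecorator, a single `ToolChest` for the per-query-type toolchests); the real classes have different signatures.

```java
import java.util.concurrent.ExecutorService;

// Hypothetical, simplified stand-ins; not the actual Druid classes.
final class DecoratorSketch {

    interface QueryRunner<T> {
        T run(String query);
    }

    interface QueryWatcher {}

    interface ServiceEmitter {}

    /** Stand-in for the chunking runner: needs three collaborators plus the base runner. */
    static final class ChunkingRunner<T> implements QueryRunner<T> {
        private final QueryRunner<T> base;
        private final ExecutorService executor;
        private final QueryWatcher watcher;
        private final ServiceEmitter emitter;

        ChunkingRunner(QueryRunner<T> base, ExecutorService executor, QueryWatcher watcher, ServiceEmitter emitter) {
            this.base = base;
            this.executor = executor;
            this.watcher = watcher;
            this.emitter = emitter;
        }

        @Override
        public T run(String query) {
            // Chunking, scheduling on the executor, and metric emission are elided.
            return base.run(query);
        }
    }

    /** Bundles the three collaborators so that callers only depend on one object. */
    static final class ChunkingDecorator {
        private final ExecutorService executor;
        private final QueryWatcher watcher;
        private final ServiceEmitter emitter;

        ChunkingDecorator(ExecutorService executor, QueryWatcher watcher, ServiceEmitter emitter) {
            this.executor = executor;
            this.watcher = watcher;
            this.emitter = emitter;
        }

        <T> QueryRunner<T> decorate(QueryRunner<T> base) {
            return new ChunkingRunner<>(base, executor, watcher, emitter);
        }
    }

    /** A toolchest takes one constructor argument instead of three. */
    static final class ToolChest {
        private final ChunkingDecorator chunkingDecorator;

        ToolChest(ChunkingDecorator chunkingDecorator) {
            this.chunkingDecorator = chunkingDecorator;
        }

        <T> QueryRunner<T> wrapWithChunking(QueryRunner<T> base) {
            return chunkingDecorator.decorate(base);
        }
    }
}
```

If `ChunkingRunner` later gains a fourth dependency, only `ChunkingDecorator` changes; every `ToolChest` constructor stays the same.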

> Or a better thing to say is that most of the other runner decorators are expressed as a titled method to make them a little more descriptive and a little more generic-use. Is there a particular reason this decorator cannot follow that workflow?

Can you further explain the alternative you're suggesting?

@nishantmonu51
Member

IIRC, the purpose of interval chunking was to handle memory pressure on bards when running queries over very long intervals. I wonder how processing the chunks in parallel affects memory usage on bards.

Member

should this be toolchest.makeMetricBuilder(input) instead ?

Member

also can you check that the query/time emitted contains the correct chunked interval instead of full query interval

Contributor Author

That's a good catch. I haven't checked it yet, but it looks like it should indeed be toolChest.makeMetricBuilder(input).

@drcrallen
Contributor

@himanshug: The alternative would be to have another decorator method that handles whatever part of the query execution chain the IntervalChunkingQueryRunner needs to be inserted into. It just feels weird to me to have a very specific query runner instance that gets passed instead of a method for properly handling that logical step of the query. For example, there could be a "queryPlanningRunnerDecorator" method (similar to the pre- or post-merge decorators) or something that properly describes the step the IntervalChunkingQueryRunner performs. I'm not convinced that's the way to go by any means.

In general, if possible, I'd rather have IntervalChunkingQueryRunner be one option for a particular logical query pipeline step, and have the hooks into the toolchest help ensure that the query runner decorator gets applied at the proper step in the query pipeline.

@drcrallen
Contributor

@nishantmonu51 also has a very good point. Can you please get some JVM / performance stats on this patch for inclusion in this PR?

@fjy fjy added the Feature label Feb 27, 2015
@fjy fjy added this to the 0.7.1 milestone Feb 27, 2015
@himanshug
Contributor Author

@drcrallen @nishantmonu51
We tested this patch on a group-by query over about 8 months of our production data and were able to reduce the response time from ~29 secs to ~11 secs in the best case. Please see below how the response time varied with the chunk period. (We had enough threads in the processor executor, of course, to process all chunks in parallel.)
We mostly have nested group-by queries, and this does not change their memory usage because the result set of the inner query is kept in memory to process the outer query whether you use chunking or not. Also, in general I did not see a noticeable difference (though I admit that, at the time, I did not really collect many numbers around memory usage, but mostly around response times).

| Chunk Period | Response Time |
|---|---|
| P0D | 0m29.134s |
| P1D | 0m16.367s |
| P2D | 0m11.060s |
| P3D | 0m12.788s |
| P4D | 0m13.292s |
| P5D | 0m11.064s |
| P5D | 0m12.218s |
| P10D | 0m11.832s |
| P15D | 0m12.459s |
| P20D | 0m11.478s |
| P25D | 0m12.295s |
| P30D | 0m12.156s |
| P35D | 0m11.939s |
| P40D | 0m12.194s |
| P45D | 0m12.802s |
| P50D | 0m12.573s |
| P55D | 0m13.395s |
| P60D | 0m14.474s |
| P65D | 0m13.374s |
| P70D | 0m15.168s |
| P75D | 0m14.285s |
| P80D | 0m15.309s |
| P85D | 0m14.950s |
| P90D | 0m15.475s |
| P95D | 0m15.440s |
| P100D | 0m17.679s |
| P105D | 0m15.611s |
| P110D | 0m16.409s |
| P115D | 0m19.854s |
| P120D | 0m16.849s |
| P125D | 0m16.818s |
| P130D | 0m17.176s |
| P135D | 0m17.979s |
| P140D | 0m17.399s |
| P145D | 0m17.024s |
| P150D | 0m19.705s |
| P155D | 0m17.996s |
| P160D | 0m17.007s |
| P165D | 0m17.982s |
| P170D | 0m17.555s |
| P175D | 0m19.196s |
| P180D | 0m20.846s |

Also, to tell you the truth, it seems that the interval chunking query runner never really did any chunking :P
If you look closely at the code at
https://github.com/druid-io/druid/blob/6e315ddcd2eaff70ccda3786ede9a6d36394a15f/processing/src/main/java/io/druid/query/IntervalChunkingQueryRunner.java#L51

    if (period.getMillis() == 0) {
      return baseRunner.run(query, responseContext);
    }

"period.getMillis()" will pretty much always be 0, e.g. for P1D, P1M, etc. It should have been "period.toStandardDuration().getMillis()".

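To illustrate why a check on the millis field alone never enables chunking for day-based periods, here is a minimal sketch. `FieldPeriod` is a hypothetical stand-in for a field-based period type such as Joda-Time's `Period`; it is not the Joda API, and the two `chunkingDisabled...` helpers are invented for this example.

```java
// Hypothetical model of a field-based period (days, hours, millis kept as separate fields).
final class PeriodFieldSketch {

    static final class FieldPeriod {
        final int days;
        final int hours;
        final int millis;

        FieldPeriod(int days, int hours, int millis) {
            this.days = days;
            this.hours = hours;
            this.millis = millis;
        }

        /** Returns only the millis field, not the total length of the period. */
        int getMillis() {
            return millis;
        }

        /** Total length in milliseconds, assuming 24-hour days. */
        long toStandardMillis() {
            return days * 86_400_000L + hours * 3_600_000L + millis;
        }
    }

    /** Mirrors the original check: treats every period without a millis field as "no chunking". */
    static boolean chunkingDisabledBuggy(FieldPeriod period) {
        return period.getMillis() == 0;
    }

    /** Checks the total length instead, so only a truly empty period disables chunking. */
    static boolean chunkingDisabledFixed(FieldPeriod period) {
        return period.toStandardMillis() == 0;
    }
}
```

One caveat for the real library: Joda-Time documents `Period.toStandardDuration()` as throwing `UnsupportedOperationException` when the period contains years or months, so a period such as P1M would presumably need different handling than the one-line fix suggested above.
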
@fjy
Contributor

fjy commented Mar 2, 2015

@himanshug Can you squash the commits and I will merge

…d pool and prints metrics query/time per chunk
@himanshug himanshug force-pushed the broker-parallel-chunk-process branch from 4e16e80 to 29039fd Compare March 2, 2015 21:45
@himanshug
Contributor Author

@fjy squashing done.

fjy added a commit that referenced this pull request Mar 2, 2015
interval chunk query runner now processes individual chunk in a threadpool
@fjy fjy merged commit e8605c6 into apache:master Mar 2, 2015
@himanshug himanshug deleted the broker-parallel-chunk-process branch March 2, 2015 21:58
Member

@himanshug any reason this got removed? It looks like we lost a bunch of metrics when upgrading to 0.7.1

Contributor Author

@xvrl Sorry to keep you waiting; I have been away.
Anyway, it used to report pretty much the same metric as request/time from QueryResource. Instead, we moved it to IntervalChunkingQueryRunner, which now reports query/time for each chunk (if chunking is used).
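
Per-chunk timing of this kind could be sketched as below. This is an assumption-laden illustration, not the patch's code: `MetricSink` and `timed` are hypothetical, and the real emitter and metric builder APIs differ.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of emitting one query/time measurement per chunk.
final class ChunkMetricsSketch {

    /** Minimal stand-in for a metrics emitter. */
    interface MetricSink {
        void emit(String metric, String chunkInterval, long millis);
    }

    /** Runs the work for one chunk and reports its wall-clock time, even if the work throws. */
    static <T> T timed(String chunkInterval, MetricSink sink, Supplier<T> work) {
        long startNanos = System.nanoTime();
        try {
            return work.get();
        } finally {
            long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
            // The chunk interval, not the full query interval, is attached to the metric.
            sink.emit("query/time", chunkInterval, elapsedMillis);
        }
    }
}
```

In this shape nothing is emitted unless the chunking wrapper is actually in the execution path.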

Member

I see, the problem is that IntervalChunkingQueryRunner is not always used by default, so we lost those metrics when upgrading.

Contributor Author

Losing that metric should be OK (when there is no chunking), as the numbers reported in query/time and request/time were pretty much the same.

Member

I agree that we are not losing too much information, but for systems that rely on existing metrics to be present that can be an issue. Either we should notify users of backwards incompatible changes, or try to maintain backwards compatibility whenever possible.

Contributor Author

I think notifying the users makes sense, release-notes for 0.7.1 would be the best place. Is there a way for me to update those?

I updated the unofficial metrics doc though https://docs.google.com/spreadsheets/d/15XxGrGv2ggdt4SnCoIsckqUNJBZ9ludyhC9hNQa3l-M/edit#gid=0
