
add query metrics for broker parallel merges, off by default #8981

Merged: 4 commits merged into apache:master from the parallel-merge-metrics branch on Dec 6, 2019

Conversation

clintropolis (Member) commented Dec 3, 2019

Description

This PR is a follow-up to #8578, adding a handful of query metrics that I believe are interesting, but taking the conservative approach in that all of them are off by default, meaning a custom extension implementing QueryMetrics is necessary to actually emit them. ParallelMergeCombiningSequence is in druid-core whereas QueryMetrics and friends are in druid-processing, so this is done mechanically via a Consumer<ParallelMergeCombiningSequence.MergeCombineMetrics> that is supplied to the sequence, where ParallelMergeCombiningSequence.MergeCombineMetrics is the type in which all of the metrics from the fork join tasks are accumulated. This allows the consumer, CachingClusteredClient in our case, to define how to report the metrics.

New QueryMetrics metrics methods:

  • reportParallelMergeParallelism - Reports the number of parallel tasks the broker used to process the query during parallel merge
  • reportParallelMergeInputSequences - Reports the total number of input sequences processed by the broker during parallel merge
  • reportParallelMergeInputRows - Reports the total number of input rows processed by the broker during parallel merge
  • reportParallelMergeOutputRows - Reports the total number of output rows produced by the broker after merging and combining the input sequences
  • reportParallelMergeTaskCount - Reports the total number of fork join pool tasks the broker required to complete the query
  • reportParallelMergeTotalCpuTime - Reports the total CPU time in nanoseconds that the broker's fork join merge combine tasks spent doing work

Additionally, since parallelism will always fall within a fixed range of values between 1 and the number of cores, it can also be added as a dimension instead, through QueryMetrics.parallelMergeParallelism.
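
As a rough sketch of the wiring described above: the report* method names come from the list above, but the MergeCombineMetrics getter names and the exact signatures are assumptions for illustration, not necessarily the code in this PR. A Consumer could forward the accumulated values into QueryMetrics along these lines:

import java.util.function.Consumer;

import org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence;
import org.apache.druid.query.QueryMetrics;

class ParallelMergeMetricsReporter
{
  // hypothetical helper: builds the consumer that CachingClusteredClient (or any
  // other caller) would hand to the sequence; getter names on MergeCombineMetrics
  // are assumed for illustration
  static Consumer<ParallelMergeCombiningSequence.MergeCombineMetrics> reportTo(final QueryMetrics<?> queryMetrics)
  {
    return metrics -> {
      // parallelism can also be set as a dimension, per the note above
      queryMetrics.parallelMergeParallelism(metrics.getParallelism());
      queryMetrics.reportParallelMergeParallelism(metrics.getParallelism());
      queryMetrics.reportParallelMergeInputSequences(metrics.getInputSequences());
      queryMetrics.reportParallelMergeInputRows(metrics.getInputRows());
      queryMetrics.reportParallelMergeOutputRows(metrics.getOutputRows());
      queryMetrics.reportParallelMergeTaskCount(metrics.getTaskCount());
      queryMetrics.reportParallelMergeTotalCpuTime(metrics.getTotalCpuTime());
    };
  }
}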

I did not document these metrics because they are off by default, and I don't want to give operators false hope before they get in deep and realize they need to write a custom extension. This omission is in favor of someday refining this process, so that maybe we bundle several different 'profiles' to enable emitting a set of metrics based on the profile, or some other system that allows operators to customize without a custom extension.

Also absent are any sort of aggregate metrics, such as pool utilization over some periodic collection interval. I would like to look into this as a follow-up, since there are potentially many metrics that would make sense to collect like this, and I'd like to think a bit harder about how to do it so it's not a one-off solution.

This PR also modifies the parallel merge config to disable using the fork join pool if the computed level of pool parallelism isn't more than 2, since you need at least 3 tasks to do the 2 layer merge in parallel.
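
To make that threshold concrete, here is a minimal hypothetical guard; the names below are invented for illustration and do not mirror the actual DruidProcessingConfig code.

// hypothetical guard illustrating the config change described above
static boolean shouldUseParallelMerge(int computedPoolParallelism)
{
  // at least 3 tasks are needed to do the 2 layer merge in parallel, so a computed
  // parallelism of 2 or less should fall back to a sequential merge
  return computedPoolParallelism > 2;
}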


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths.
Key changed/added classes in this PR
  • ParallelMergeCombiningSequence
  • QueryMetrics
  • DruidProcessingConfig

jnaous (Contributor) commented Dec 3, 2019

I'm surprised that off by default means the user has to write code to enable them vs using a config option of some sort. Is that latter option something that's not possible?

clintropolis (Member, Author)

I'm surprised that off by default means the user has to write code to enable them vs using a config option of some sort. Is that latter option something that's not possible?

Unfortunately, at the moment code is the only way to control the query metrics that are emitted. Related discussion: #6559

@@ -247,6 +263,7 @@ public void cleanup(Iterator<T> iterFromMake)
private final long targetTimeNanos;
private final boolean hasTimeout;
private final long timeoutAt;
private final MergeCombineMetricsAccumlator metricsAccumlator;
Contributor:

metricsAccumlator -> metricsAccumulator

{
long numInputRows = 0;
long cpuTimeNanos = 0;
// 1 partition task, 1 layer 2 prepare merge inputs task, 1 layer 1 prepare merge inputs task for each partition
Contributor:

nit: suggest "one layer 2 prepare merge inputs task" or "1 layer two prepare merge inputs task" and similar, I was confused initially with all the numbers

clintropolis (Member, Author):

heh, will fix

outputQueue.offer(ResultBatch.TERMINAL);
} else {
// if priority queue is empty, push the final accumulated value into the output batch and push it out
outputBatch.add(currentCombinedValue);
metricsAccumulator.incrementOutputRows(batchCounter + 1L);
Contributor:

Is the 1L here for the terminal value? Does that need to be counted towards the output rows, since it's not really a row?

clintropolis (Member, Author):

No, it is for the straggling currentCombinedValue that is being added to the batch the line before, which comes from here if the inputs were exhausted: https://github.com/apache/incubator-druid/blob/master/core/src/main/java/org/apache/druid/java/util/common/guava/ParallelMergeCombiningSequence.java#L537

jon-wei (Contributor) left a comment

LGTM

@@ -62,6 +62,9 @@ public OutType get()
catch (Exception e) {
t.addSuppressed(e);
}
if (t instanceof RuntimeException) {
Contributor:

what prompted these changes?

clintropolis (Member, Author):

Ah, the 'withBaggage' (https://github.com/apache/incubator-druid/pull/8981/files#diff-84792f9d3cefe47cbb471669dce2a276R145) caused all of the RuntimeExceptions being thrown to be wrapped in an additional RuntimeException, which didn't seem very useful. Otherwise, I would have needed to change the expected cause of the expected exception in the tests to expect RuntimeException instead of TimeoutException.

himanshug (Contributor) commented Dec 4, 2019

I see, in the surrounding code, Throwables.propagateIfPossible(t); is used to get that behavior instead.

clintropolis (Member, Author):

Oops, that's a good point, I just threw these in here to fix my tests. Looking closer, I'll take a pass to clean this up a bit.
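
For context, a minimal sketch of the pattern being suggested, assuming Guava's Throwables utility (propagateIfPossible rethrows RuntimeExceptions and Errors unchanged rather than wrapping them again):

import com.google.common.base.Throwables;

class RethrowExample
{
  // rethrows t unchanged if it is already unchecked, otherwise wraps it exactly once
  static RuntimeException wrapOnce(Throwable t)
  {
    // no-op unless t is a RuntimeException or Error, in which case it is rethrown as-is
    Throwables.propagateIfPossible(t);
    // only checked throwables reach this point
    return new RuntimeException(t);
  }
}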

himanshug (Contributor)

LGTM overall

non-blocking commentary...

Also absent are any sort of aggregate metrics, such as pool utilization over some periodic collection interval. I would like to look into this as a follow-up, since there are potentially many metrics that would make sense to collect like this, and I'd like to think a bit harder about how to do it so it's not a one-off solution.

actually, a periodic snapshot of FJP state would be nice to see to identify potential bugs, e.g. number of running threads, total number of threads, queued work, etc.

it sucks that to enable these metrics, I would have to write an extension. I haven't read through the thread in #6559 yet but will hopefully check that out sometime this week.

clintropolis (Member, Author)

actually, a periodic snapshot of FJP state would be nice to see to identify potential bugs, e.g. number of running threads, total number of threads, queued work, etc.

I definitely agree here. I've been looking at the recently added Dropwizard emitter for inspiration, to see if it would maybe make sense to have something based on Dropwizard (or something like that) available in core to allow re-use for any sorts of periodic/rate driven metrics we might want to collect. Though rather than the Dropwizard emitter itself, a common core piece would collect these metrics and periodically emit a snapshot of their current values to whatever the actual emitters are. I plan to look into this more deeply sometime after this PR.

it sucks that to enable these metrics, I would have to write an extension. I haven't read through the thread in #6559 yet but will hopefully check that out sometime this week.

I agree here as well; I think having some common profile implementations of QueryMetrics out of the box, like 'default', 'none', 'all', 'query-tuning', etc., might be the easiest approach for now, though it's probably worth opening a discussion again to see if we can come up with any better ideas.

himanshug (Contributor)

to see if it would maybe make sense to have something based on Dropwizard (or something like that) available in core to allow re-use for any sorts of periodic/rate driven metrics we might want to collect.

for periodic collection we already have the Monitor infra in core, e.g. things like JvmThreadsMonitor. You could pull in the Dropwizard dependency in core to have histogram metric aggregation if there is any use case for that... gauge, counter, etc. are simple anyway.
or, maybe I misunderstood what you wanted to say :)

clintropolis (Member, Author)

for periodic collection we already have the Monitor infra in core, e.g. things like JvmThreadsMonitor. You could pull in the Dropwizard dependency in core to have histogram metric aggregation if there is any use case for that... gauge, counter, etc. are simple anyway.
or, maybe I misunderstood what you wanted to say :)

That's pretty much what I had in mind 👍
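
As an aside, a rough sketch of what such a Monitor-based periodic FJP snapshot might look like; the class name, metric names, and emitter usage below are assumptions modeled on existing monitors like JvmThreadsMonitor, not code from this PR.

import java.util.concurrent.ForkJoinPool;

import org.apache.druid.java.util.emitter.service.ServiceEmitter;
import org.apache.druid.java.util.emitter.service.ServiceMetricEvent;
import org.apache.druid.java.util.metrics.AbstractMonitor;

public class ForkJoinPoolMonitor extends AbstractMonitor
{
  private final ForkJoinPool pool;

  public ForkJoinPoolMonitor(ForkJoinPool pool)
  {
    this.pool = pool;
  }

  @Override
  public boolean doMonitor(ServiceEmitter emitter)
  {
    // emit a point-in-time snapshot of pool state on each monitoring period
    final ServiceMetricEvent.Builder builder = ServiceMetricEvent.builder();
    emitter.emit(builder.build("mergePool/poolSize", pool.getPoolSize()));
    emitter.emit(builder.build("mergePool/runningThreads", pool.getRunningThreadCount()));
    emitter.emit(builder.build("mergePool/queuedTasks", pool.getQueuedTaskCount()));
    emitter.emit(builder.build("mergePool/queuedSubmissions", pool.getQueuedSubmissionCount()));
    return true;
  }
}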

himanshug (Contributor)

@clintropolis not sure if you plan to add the periodic metrics in this PR or later; current changes LGTM, so feel free to proceed whichever way you need.

clintropolis (Member, Author)

@clintropolis not sure if you plan to add the periodic metrics in this PR or later; current changes LGTM, so feel free to proceed whichever way you need.

Thanks, I'm going to merge this and do it in a follow-up 👍

clintropolis merged commit 06cd304 into apache:master on Dec 6, 2019
clintropolis deleted the parallel-merge-metrics branch on December 6, 2019, 21:42
jon-wei added this to the 0.17.0 milestone on Dec 17, 2019