
add query metrics for broker parallel merges, off by default #8981

Merged: 4 commits merged into apache:master from the parallel-merge-metrics branch on Dec 6, 2019

Conversation

clintropolis (Member) commented Dec 3, 2019

Description

This PR is a follow-up to #8578, adding a handful of query metrics that I believe are interesting, but taking the conservative approach in that all of them are off by default, meaning a custom extension implementing QueryMetrics is necessary to actually emit them. ParallelMergeCombiningSequence is in druid-core whereas QueryMetrics and friends are in druid-processing, so this is done mechanically via a Consumer<ParallelMergeCombiningSequence.MergeCombineMetrics> that is supplied to the sequence, where ParallelMergeCombiningSequence.MergeCombineMetrics is the type in which all of the metrics from the fork join tasks are accumulated. This allows the consumer, CachingClusteredClient in our case, to define how to report the metrics.

New QueryMetrics metrics methods:

  • reportParallelMergeParallelism - Reports the number of parallel tasks the broker used to process the query during parallel merge
  • reportParallelMergeInputSequences - Reports the total number of input sequences processed by the broker during parallel merge
  • reportParallelMergeInputRows - Reports the total number of input rows processed by the broker during parallel merge
  • reportParallelMergeOutputRows - Reports the total number of output rows produced by the broker after merging and combining the input sequences
  • reportParallelMergeTaskCount - Reports the total number of fork join pool tasks the broker required to complete the query
  • reportParallelMergeTotalCpuTime - Reports the total CPU time in nanoseconds that the broker's fork join merge combine tasks spent doing work

Additionally, since parallelism will always fall within a fixed range of values between 1 and the number of cores, it can also be added as a dimension instead, through QueryMetrics.parallelMergeParallelism.
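
As a rough sketch of the wiring described above: the report* method names come from the list above, but the MergeCombineMetrics getter names and the exact signatures are assumptions for illustration, not necessarily the code in this PR. A Consumer could forward the accumulated values into QueryMetrics along these lines:

import java.util.function.Consumer;

import org.apache.druid.java.util.common.guava.ParallelMergeCombiningSequence;
import org.apache.druid.query.QueryMetrics;

class ParallelMergeMetricsReporter
{
  // hypothetical helper: builds the consumer that CachingClusteredClient (or any
  // other caller) would hand to the sequence; getter names on MergeCombineMetrics
  // are assumed for illustration
  static Consumer<ParallelMergeCombiningSequence.MergeCombineMetrics> reportTo(final QueryMetrics<?> queryMetrics)
  {
    return metrics -> {
      // parallelism can also be set as a dimension, per the note above
      queryMetrics.parallelMergeParallelism(metrics.getParallelism());
      queryMetrics.reportParallelMergeParallelism(metrics.getParallelism());
      queryMetrics.reportParallelMergeInputSequences(metrics.getInputSequences());
      queryMetrics.reportParallelMergeInputRows(metrics.getInputRows());
      queryMetrics.reportParallelMergeOutputRows(metrics.getOutputRows());
      queryMetrics.reportParallelMergeTaskCount(metrics.getTaskCount());
      queryMetrics.reportParallelMergeTotalCpuTime(metrics.getTotalCpuTime());
    };
  }
}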

I did not document these metrics because they are off by default, and I don't want to give operators false hope before they get in deep and realize they need to write a custom extension. This omission is in favor of someday refining this process, so that maybe we bundle several different 'profiles' to enable emitting a set of metrics based on the profile, or some other system that allows operators to customize without a custom extension.

Also absent are any sort of aggregate metrics, such as pool utilization over some periodic collection interval. I would like to look into this as a follow-up, since there are potentially many metrics that would make sense to collect like this, and I'd like to think a bit harder about how to do it so it's not a one-off solution.

This PR also modifies the parallel merge config to disable using the fork join pool if the computed level of pool parallelism isn't more than 2, since you need at least 3 tasks to do the 2 layer merge in parallel.
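
To make that threshold concrete, here is a minimal hypothetical guard; the names below are invented for illustration and do not mirror the actual DruidProcessingConfig code.

// hypothetical guard illustrating the config change described above
static boolean shouldUseParallelMerge(int computedPoolParallelism)
{
  // at least 3 tasks are needed to do the 2 layer merge in parallel, so a computed
  // parallelism of 2 or less should fall back to a sequential merge
  return computedPoolParallelism > 2;
}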


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths.
Key changed/added classes in this PR
  • ParallelMergeCombiningSequence
  • QueryMetrics
  • DruidProcessingConfig

jnaous (Contributor) commented Dec 3, 2019

I'm surprised that off by default means the user has to write code to enable them vs using a config option of some sort. Is that latter option something that's not possible?

clintropolis (Member, Author)

I'm surprised that off by default means the user has to write code to enable them vs using a config option of some sort. Is that latter option something that's not possible?

Unfortunately, at the moment code is the only way to control the query metrics that are emitted. Related discussion: #6559

@@ -247,6 +263,7 @@ public void cleanup(Iterator<T> iterFromMake)
private final long targetTimeNanos;
private final boolean hasTimeout;
private final long timeoutAt;
private final MergeCombineMetricsAccumlator metricsAccumlator;
Contributor:

metricsAccumlator -> metricsAccumulator

{
long numInputRows = 0;
long cpuTimeNanos = 0;
// 1 partition task, 1 layer 2 prepare merge inputs task, 1 layer 1 prepare merge inputs task for each partition
Contributor:

nit: suggest "one layer 2 prepare merge inputs task" or "1 layer two prepare merge inputs task" and similar, I was confused initially with all the numbers

clintropolis (Member, Author):

heh, will fix

outputQueue.offer(ResultBatch.TERMINAL);
} else {
// if priority queue is empty, push the final accumulated value into the output batch and push it out
outputBatch.add(currentCombinedValue);
metricsAccumulator.incrementOutputRows(batchCounter + 1L);
Contributor:

Is the 1L here for the terminal value? Does that need to be counted towards the output rows, since it's not really a row?

clintropolis (Member, Author):

No, it is for the straggling currentCombinedValue that is being added to the batch the line before, which comes from here if the inputs were exhausted: https://github.com/apache/incubator-druid/blob/master/core/src/main/java/org/apache/druid/java/util/common/guava/ParallelMergeCombiningSequence.java#L537

jon-wei (Contributor) left a comment

LGTM

@@ -62,6 +62,9 @@ public OutType get()
catch (Exception e) {
t.addSuppressed(e);
}
if (t instanceof RuntimeException) {
Contributor:

what prompted these changes?

clintropolis (Member, Author):

Ah, the 'withBaggage' (https://github.com/apache/incubator-druid/pull/8981/files#diff-84792f9d3cefe47cbb471669dce2a276R145) caused all of the RuntimeExceptions being thrown to be wrapped in an additional RuntimeException, which didn't seem very useful. Otherwise, I would have needed to change the expected cause of the expected exception in the tests to expect RuntimeException instead of TimeoutException.

himanshug (Contributor) commented Dec 4, 2019

I see, in the surrounding code, Throwables.propagateIfPossible(t); is used to get that behavior instead.

clintropolis (Member, Author):

Oops, that's a good point, I just threw these in here to fix my tests. Looking closer, I'll take a pass to clean this up a bit.
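
For context, a minimal sketch of the pattern being suggested, assuming Guava's Throwables utility (propagateIfPossible rethrows RuntimeExceptions and Errors unchanged rather than wrapping them again):

import com.google.common.base.Throwables;

class RethrowExample
{
  // rethrows t unchanged if it is already unchecked, otherwise wraps it exactly once
  static RuntimeException wrapOnce(Throwable t)
  {
    // no-op unless t is a RuntimeException or Error, in which case it is rethrown as-is
    Throwables.propagateIfPossible(t);
    // only checked throwables reach this point
    return new RuntimeException(t);
  }
}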

himanshug (Contributor)

LGTM overall

non-blocking commentary...

Also absent are any sort of aggregate metrics, such as pool utilization over some periodic collection interval. I would like to look into this as a follow-up, since there are potentially many metrics that would make sense to collect like this, and I'd like to think a bit harder about how to do it so it's not a one-off solution.

actually, a periodic snapshot of FJP state would be nice to see to identify potential bugs, e.g. number of running threads, total number of threads, queued work, etc.

it sucks that to enable these metrics, I would have to write an extension. I haven't read through the thread in #6559 yet but will hopefully check that out sometime this week.

clintropolis (Member, Author)

actually, a periodic snapshot of FJP state would be nice to see to identify potential bugs, e.g. number of running threads, total number of threads, queued work, etc.

I definitely agree here. I've been looking at the recently added Dropwizard emitter for inspiration, to see if it would maybe make sense to have something based on Dropwizard (or something like that) available in core to allow re-use for any sorts of periodic/rate driven metrics we might want to collect. Though rather than the Dropwizard emitter itself, a common core piece would collect these metrics and periodically emit a snapshot of their current values to whatever the actual emitters are. I plan to look into this more deeply sometime after this PR.

it sucks that to enable these metrics, I would have to write an extension. I haven't read through the thread in #6559 yet but will hopefully check that out sometime this week.

I agree here as well; I think having some common profile implementations of QueryMetrics out of the box, like 'default', 'none', 'all', 'query-tuning', etc., might be the easiest approach for now, though it's probably worth opening a discussion again to see if we can come up with any better ideas.

himanshug (Contributor)

to see if it would maybe make sense to have something based on Dropwizard (or something like that) available in core to allow re-use for any sorts of periodic/rate driven metrics we might want to collect.

for periodic collection we already have the Monitor infra in core, e.g. things like JvmThreadsMonitor. You could pull in the Dropwizard dependency in core to have histogram metric aggregation if there is any use case for that... gauge, counter, etc. are simple anyway.
or, maybe I misunderstood what you wanted to say :)

clintropolis (Member, Author)

for periodic collection we already have the Monitor infra in core, e.g. things like JvmThreadsMonitor. You could pull in the Dropwizard dependency in core to have histogram metric aggregation if there is any use case for that... gauge, counter, etc. are simple anyway.
or, maybe I misunderstood what you wanted to say :)

That's pretty much what I had in mind 👍
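
As an aside, a rough sketch of what such a Monitor-based periodic FJP snapshot might look like; the class name, metric names, and emitter usage below are assumptions modeled on existing monitors like JvmThreadsMonitor, not code from this PR.

import java.util.concurrent.ForkJoinPool;

import org.apache.druid.java.util.emitter.service.ServiceEmitter;
import org.apache.druid.java.util.emitter.service.ServiceMetricEvent;
import org.apache.druid.java.util.metrics.AbstractMonitor;

public class ForkJoinPoolMonitor extends AbstractMonitor
{
  private final ForkJoinPool pool;

  public ForkJoinPoolMonitor(ForkJoinPool pool)
  {
    this.pool = pool;
  }

  @Override
  public boolean doMonitor(ServiceEmitter emitter)
  {
    // emit a point-in-time snapshot of pool state on each monitoring period
    final ServiceMetricEvent.Builder builder = ServiceMetricEvent.builder();
    emitter.emit(builder.build("mergePool/poolSize", pool.getPoolSize()));
    emitter.emit(builder.build("mergePool/runningThreads", pool.getRunningThreadCount()));
    emitter.emit(builder.build("mergePool/queuedTasks", pool.getQueuedTaskCount()));
    emitter.emit(builder.build("mergePool/queuedSubmissions", pool.getQueuedSubmissionCount()));
    return true;
  }
}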

himanshug (Contributor)

@clintropolis not sure if you plan to add the periodic metrics in this PR or later; current changes LGTM, so feel free to proceed whichever way you need.

clintropolis (Member, Author)

@clintropolis not sure if you plan to add the periodic metrics in this PR or later; current changes LGTM, so feel free to proceed whichever way you need.

Thanks, I'm going to merge this and do it in a follow-up 👍

clintropolis merged commit 06cd304 into apache:master on Dec 6, 2019
clintropolis deleted the parallel-merge-metrics branch on December 6, 2019, 21:42
jon-wei added this to the 0.17.0 milestone on Dec 17, 2019