
prometheus metric exporter #10412

Merged: 45 commits merged into apache:master on Mar 9, 2021

Conversation

@Tiaaa (Contributor) commented Sep 21, 2020

Fixes #8621

Adds a new extension prometheus-emitter to expose Druid metrics for collection directly by a Prometheus server.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

Key changed/added classes in this PR
  • org.apache.druid.emitter.prometheus.*

This reopens the original PR, #8621.

@Tiaaa mentioned this pull request on Sep 21, 2020
@@ -0,0 +1,128 @@
{
"query/time" : { "dimensions" : ["dataSource", "type"], "type" : "timer", "conversionFactor": 1000.0, "help": "Seconds taken to complete a query."},


Are these conversions a good idea?

It would mean that these metrics are reported in slightly different units than described in the documentation: https://druid.apache.org/docs/latest/operations/metrics.html

Contributor Author:
This follows Prometheus common practice of using base units: https://prometheus.io/docs/practices/naming/#base-units

cc @michaelschiff


Okay, I guess there are tradeoffs with either choice. Maybe a good approach is to put the unit in the Prometheus names for the converted metrics; otherwise, someone referring to the Druid metrics doc would find the metric documented in a different unit.
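To make the unit conversion concrete, here is a minimal, hypothetical sketch (not code from this PR; the class and method names are invented) of how a mapping entry's conversionFactor could be applied when recording query/time with the Prometheus Java client, so the millisecond value Druid reports is exposed in base units (seconds):

```java
import io.prometheus.client.Histogram;

public class ConversionSketch
{
  // Mirrors the "query/time" entry above: labelled by dataSource and type,
  // documented in seconds even though Druid emits milliseconds.
  private static final Histogram QUERY_TIME = Histogram.build()
      .name("druid_query_time")
      .help("Seconds taken to complete a query.")
      .labelNames("dataSource", "type")
      .register();

  static void record(double druidValueMillis, String dataSource, String type)
  {
    double conversionFactor = 1000.0; // from the metrics mapping file
    // Divide by the conversion factor so the exposed value is in seconds.
    QUERY_TIME.labels(dataSource, type).observe(druidValueMillis / conversionFactor);
  }
}
```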

"query/failed/count" : { "dimensions" : [], "type" : "count", "help": "Number of failed queries"},
"query/interrupted/count" : { "dimensions" : [], "type" : "count", "help": "Number of queries interrupted due to cancellation or timeout"},

"query/cache/delta/numEntries" : { "dimensions" : [], "type" : "count", "help": "Number of entries in cache"},
Contributor:
Potential bug: deltas can be negative, but a Prometheus counter accepts only non-negative increments.

Contributor Author:
Will change to gauge. This only happens when more entries have been evicted than added since the last emission, right?
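As a standalone illustration (not the PR's code) of the issue above: io.prometheus.client.Counter rejects negative increments, while a Gauge can move in both directions, which is why a gauge is the safer fit for a delta that can go negative.

```java
import io.prometheus.client.Counter;
import io.prometheus.client.Gauge;

public class DeltaSketch
{
  public static void main(String[] args)
  {
    Counter asCounter = Counter.build()
        .name("cache_delta_counter").help("delta modeled as a counter").register();
    Gauge asGauge = Gauge.build()
        .name("cache_delta_gauge").help("delta modeled as a gauge").register();

    double delta = -3.0; // more evictions than insertions since the last emission

    asGauge.inc(delta); // fine: gauges may decrease

    try {
      asCounter.inc(delta); // throws: counters only accept non-negative increments
    }
    catch (IllegalArgumentException e) {
      System.out.println("Counter rejected negative increment: " + e.getMessage());
    }
  }
}
```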

private final Metrics metrics;
private final PrometheusEmitterConfig config;
private final PrometheusEmitterConfig.Strategy strategy;
private final Pattern pattern = Pattern.compile("[^a-zA-Z0-9_][^a-zA-Z0-9_]*");
Contributor:
Reuse the pattern in PrometheusEmitterConfig

Contributor Author:
These two are not the same regex. The one in PrometheusEmitterConfig is for the namespace, which needs to start with an alphabetic character.

Contributor:
Oh, sorry, my bad.

private static final Logger log = new Logger(Metrics.class);
private final Map<String, DimensionsAndCollector> map = new HashMap<>();
private final ObjectMapper mapper = new ObjectMapper();
private final Pattern pattern = Pattern.compile("[^a-zA-Z_:][^a-zA-Z0-9_:]*");
Contributor:
Reuse the pattern in PrometheusEmitterConfig

Contributor Author:
Used the one in PrometheusEmitter.java.
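For reference, a small self-contained sketch of the two patterns being discussed, assuming the first is applied to dimension (label) names and the second to metric names; the class itself is invented for illustration:

```java
import java.util.regex.Pattern;

public class SanitizeSketch
{
  // Label names: letters, digits, and underscores only.
  private static final Pattern LABEL_PATTERN = Pattern.compile("[^a-zA-Z0-9_][^a-zA-Z0-9_]*");
  // Metric names: may also contain ':' and must not start with a digit.
  private static final Pattern METRIC_PATTERN = Pattern.compile("[^a-zA-Z_:][^a-zA-Z0-9_:]*");

  static String sanitizeLabel(String name)
  {
    return LABEL_PATTERN.matcher(name).replaceAll("_");
  }

  static String sanitizeMetric(String name)
  {
    return METRIC_PATTERN.matcher(name).replaceAll("_");
  }

  public static void main(String[] args)
  {
    System.out.println(sanitizeMetric("query/time"));  // query_time
    System.out.println(sanitizeLabel("dataSource"));   // dataSource (already valid)
  }
}
```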

}
}

public Map<String, DimensionsAndCollector> getMap()
Contributor:
Maybe we can rename the map to registeredMetrics and this method to getRegisteredMetrics(); I feel that would be easier to read.

Contributor Author:
sure

}).readValue(is);
}
catch (IOException e) {
throw new ISE(e, "Failed to parse metric dimensions and types");
Contributor:
same as above

}
}

void emitMetric(ServiceMetricEvent metricEvent)
Contributor:
private?

Map<String, DimensionsAndCollector> map = metrics.getMap();
try {
for (DimensionsAndCollector collector : map.values()) {
pushGateway.push(collector.getCollector(), config.getNamespace(), ImmutableMap.of(config.getNamespace(), identifier));
Contributor:
Potential NPE? If the configured strategy is not pushgateway, this pushGateway won't have been instantiated.

Contributor:
Also, should we use a more meaningful label name for identifier instead of config.getNamespace()?

Contributor Author:
Will add the null check. However, flush() for this emitter should only be called by close(), where the strategy check has already been done.

Contributor Author (@Tiaaa, Nov 1, 2020):
For the identifier label name, any suggestions? config.namespace will be set in the config files for each service, so for example a peon task could use peon=taskXXX as the grouping key.
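To make the null-check concrete, here is a self-contained sketch (invented class, field, and metric names; not the merged implementation, which pushes every registered collector) of a flush that is a no-op unless a PushGateway client was created for the pushgateway strategy:

```java
import com.google.common.collect.ImmutableMap;
import io.prometheus.client.Gauge;
import io.prometheus.client.exporter.PushGateway;

import java.io.IOException;

public class PushSketch
{
  private final PushGateway pushGateway; // null unless the pushgateway strategy is configured
  private final Gauge lastPush = Gauge.build()
      .name("druid_emitter_last_push").help("Timestamp of the last push.").create();

  PushSketch(PushGateway pushGateway)
  {
    this.pushGateway = pushGateway;
  }

  void flush(String namespace, String identifier)
  {
    if (pushGateway == null) {
      return; // exporter (scrape) strategy: nothing to push
    }
    lastPush.setToCurrentTime();
    try {
      // Grouping key mirrors the snippet above: the namespace is used as the label name.
      pushGateway.push(lastPush, namespace, ImmutableMap.of(namespace, identifier));
    }
    catch (IOException e) {
      System.err.println("Unable to push metrics to the pushgateway: " + e.getMessage());
    }
  }

  public static void main(String[] args)
  {
    // With no pushgateway configured the call is a safe no-op:
    new PushSketch(null).flush("druid", "some-task-id");
    // With one configured it would push, e.g.:
    // new PushSketch(new PushGateway("localhost:9091")).flush("druid", "some-task-id");
  }
}
```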



@Override
public void start()
Contributor:
we should schedule a task to push updates periodically when the strategy is set to pushgateway

Contributor Author:
Added. Does every 5 minutes sound reasonable?

Contributor (@michaelschiff, Jan 15, 2021):
Sorry I missed this - I think the scheduled executor may not be necessary. The main reason we added the pushgateway strategy is for things that are potentially too short-lived to be scraped by Prometheus (in Druid that's really just peon tasks). Things that live long enough to push every 5 minutes are likely not "task" based and may be a better fit for normal scraping. I lean toward keeping things simple, and pushing once at close seems sufficient.

Contributor Author:
Given that the only metric pushed by the peon is the "last pushed timestamp", I think it's valid to remove the scheduled task. Removed.


public DimensionsAndCollector getByName(String name, String service)
{
if (map.containsKey(name)) {
Contributor:
return Optional.ofNullable(map.get(name)).orElse(map.get(service + "_" + name));

Contributor Author:
Changed the second part to getOrDefault() for simplification. I don't see the need to change this function's return type from DimensionsAndCollector to Optional<DimensionsAndCollector>.

Contributor:
You don't need to change the return type; anyway, this is a minor comment, so feel free to pick whichever you prefer.
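For completeness, a tiny self-contained comparison of the two lookup styles discussed in this thread, with String values standing in for DimensionsAndCollector:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class LookupSketch
{
  public static void main(String[] args)
  {
    Map<String, String> map = new HashMap<>();
    map.put("historical_query_time", "collector-A");

    String name = "query_time";
    String service = "historical";

    // Reviewer's suggestion: an Optional chain.
    String viaOptional = Optional.ofNullable(map.get(name))
        .orElse(map.get(service + "_" + name));
    // Author's choice: getOrDefault keeps the original return type.
    String viaGetOrDefault = map.getOrDefault(name, map.get(service + "_" + name));

    System.out.println(viaOptional);      // collector-A
    System.out.println(viaGetOrDefault);  // collector-A
  }
}
```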

@suhassumukh:
Is there a timeline for this feature?

@michaelschiff (Contributor):
For what it's worth, I've been running production clusters with this extension for monitoring for over a year. Things are stable, but there are a couple of open issues around pushgateway collection of peon task metrics that are the main reasons to delay merging.

@Tiaaa (Contributor Author) commented Jan 15, 2021

Verification of metrics on a local mini Druid cluster is shown below.

Coordinator, historical, broker, router, middle-manager, and overlord are set up as Prometheus scrape targets.
Peon metrics are sent to the pushgateway.

Coordinator:
# TYPE druid_segment_size gauge
druid_segment_size{dataSource="wiki_test",} 21679.0
# TYPE druid_segment_loadqueue_count gauge
druid_segment_loadqueue_count{server="stats_druid_historical_2_stats_druid_historical_stats_dev_svc_cluster_local_8083",} 0.0
druid_segment_loadqueue_count{server="stats_druid_historical_1_stats_druid_historical_stats_dev_svc_cluster_local_8083",} 0.0
druid_segment_loadqueue_count{server="stats_druid_historical_0_stats_druid_historical_stats_dev_svc_cluster_local_8083",} 1.0

Historical:
# TYPE druid_query_time histogram
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="0.1",} 1.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="0.25",} 1.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="0.5",} 1.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="0.75",} 1.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="1.0",} 1.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="2.5",} 1.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="5.0",} 1.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="7.5",} 1.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="10.0",} 1.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="30.0",} 1.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="60.0",} 1.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="120.0",} 1.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="300.0",} 1.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="+Inf",} 1.0
druid_query_time_count{dataSource="wiki_test",type="scan",} 1.0
druid_query_time_sum{dataSource="wiki_test",type="scan",} 0.093

Broker:
# TYPE druid_query_time histogram
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="0.1",} 2.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="0.25",} 3.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="0.5",} 3.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="0.75",} 3.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="1.0",} 3.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="2.5",} 3.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="5.0",} 3.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="7.5",} 3.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="10.0",} 3.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="30.0",} 3.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="60.0",} 3.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="120.0",} 3.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="300.0",} 3.0
druid_query_time_bucket{dataSource="wiki_test",type="scan",le="+Inf",} 3.0
druid_query_time_count{dataSource="wiki_test",type="scan",} 3.0
druid_query_time_sum{dataSource="wiki_test",type="scan",} 0.132
# TYPE druid_query_node_time histogram
druid_query_node_time_bucket{server="stats_druid_historical_1_stats_druid_historical_stats_dev_svc_cluster_local_8083",le="0.1",} 0.0

Router:
# TYPE druid_jvm_mem_used gauge
druid_jvm_mem_used{memKind="heap",} 4.8370752E7
druid_jvm_mem_used{memKind="nonheap",} 6.4390328E7

Middle-manager:
# TYPE druid_jvm_pool_used gauge
druid_jvm_pool_used{poolKind="heap",poolName="PS_Old_Gen",} 2.8118984E7
druid_jvm_pool_used{poolKind="nonheap",poolName="Code_Cache",} 1.2116608E7
druid_jvm_pool_used{poolKind="heap",poolName="PS_Survivor_Space",} 0.0
druid_jvm_pool_used{poolKind="nonheap",poolName="Metaspace",} 4.5766376E7
druid_jvm_pool_used{poolKind="heap",poolName="PS_Eden_Space",} 2.5587941728E10

Overlord:
# TYPE druid_task_run_time histogram
druid_task_run_time_bucket{dataSource="wiki_test",taskType="index_parallel",le="0.1",} 0.0
druid_task_run_time_bucket{dataSource="wiki_test",taskType="index_parallel",le="0.25",} 0.0
druid_task_run_time_bucket{dataSource="wiki_test",taskType="index_parallel",le="0.5",} 0.0
druid_task_run_time_bucket{dataSource="wiki_test",taskType="index_parallel",le="0.75",} 0.0
druid_task_run_time_bucket{dataSource="wiki_test",taskType="index_parallel",le="1.0",} 0.0
druid_task_run_time_bucket{dataSource="wiki_test",taskType="index_parallel",le="2.5",} 0.0
druid_task_run_time_bucket{dataSource="wiki_test",taskType="index_parallel",le="5.0",} 0.0
druid_task_run_time_bucket{dataSource="wiki_test",taskType="index_parallel",le="7.5",} 0.0
druid_task_run_time_bucket{dataSource="wiki_test",taskType="index_parallel",le="10.0",} 1.0
druid_task_run_time_bucket{dataSource="wiki_test",taskType="index_parallel",le="30.0",} 1.0
druid_task_run_time_bucket{dataSource="wiki_test",taskType="index_parallel",le="60.0",} 1.0
druid_task_run_time_bucket{dataSource="wiki_test",taskType="index_parallel",le="120.0",} 1.0
druid_task_run_time_bucket{dataSource="wiki_test",taskType="index_parallel",le="300.0",} 1.0
druid_task_run_time_bucket{dataSource="wiki_test",taskType="index_parallel",le="+Inf",} 1.0
druid_task_run_time_count{dataSource="wiki_test",taskType="index_parallel",} 1.0
druid_task_run_time_sum{dataSource="wiki_test",taskType="index_parallel",} 9.553

Peon:
push_time_seconds{druid="index_parallel_wiki_test_2021-01-10T23:42:35.061Z",instance="",job="druid"} 1.6103221647556024e+09

@michaelschiff (Contributor) commented Jan 15, 2021

@Tiaaa awesome! I see peon metrics in pushgateway as well.
[Screenshot: Screen Shot 2021-01-14 at 8.42.38 PM]

One thing: the label we're using right now is the task ID. I think this is going to be too high cardinality for Prometheus.

@clintropolis (Member):

Hmm, it looks like the commits have gotten messed up for this PR one way or another, and GitHub is showing a lot of unrelated commits. @Tiaaa, any chance you can clean this up to show only the changes of this PR, to make it easier to review?

@Tiaaa force-pushed the feature/prometheus-metric-exporter branch from 4b0b414 to 3a7a2b6 on January 16, 2021 02:28
@stroeovidiu:
Any update on this? Looking forward to having it.

@michaelschiff (Contributor):
@clintropolis it looks like this is the last failing build step: https://travis-ci.com/github/apache/druid/jobs/483326128 - seems unrelated to the new emitter. Are we good to merge?

@clintropolis (Member):
> @clintropolis it looks like this is the last failing build step: https://travis-ci.com/github/apache/druid/jobs/483326128 - seems unrelated to the new emitter. Are we good to merge?

Sorry for the delay, I will have a look as soon as I'm able and see if we can get this merged 👍

@clintropolis (Member) left a comment:
A few minor comments, but overall LGTM 👍 I don't know Prometheus very well, but I think the metric mappings look reasonable.

Thanks for your patience and persistence!

docs/development/extensions-contrib/prometheus.md (outdated, resolved)
docs/operations/metrics.md (outdated, resolved)
docs/development/extensions-contrib/prometheus.md (outdated, resolved)
@@ -1223,7 +1228,7 @@ SysMonitor
TaskCountStatsMonitor
TaskSlotCountStatsMonitor
bufferCapacity
bufferpoolName
bufferPoolName
Member:
Oops, missed one; it's causing CI to fail.

@clintropolis (Member) left a comment:
👍

@michaelschiff (Contributor):
@clintropolis anything left we need to do before merge?

@stroeovidiu:
Will this be available in the next 0.21.0 release?

Thank you

@clintropolis (Member):
> @clintropolis anything left we need to do before merge?

Oops, no; sorry, I got distracted and hadn't gotten back to this yet.

> Will this be available in the next 0.21.0 release?

Unfortunately we have already cut the branch for 0.21.0, after which we only merge bug fixes, so this will go out in the release after that. 0.21.0 has been a bit delayed, so it shouldn't be too much longer before we begin the next release as well.

@clintropolis clintropolis merged commit a57c28e into apache:master Mar 9, 2021
@clintropolis clintropolis added this to the 0.22.0 milestone Aug 12, 2021