Metrics support by lulf · Pull Request #159 · apache/qpid-dispatch

lulf · 2017-04-06T09:40:26Z

@ganeshmurthy @ted-ross This is a proposal of metrics in the router and expose them to a prometheus agent.

This work would allow anyone running the router on openshift to easily integrate it with hawkular or any other store that supports reading prometheus endpoints.

The PR consists of:

An API for a metric manager, that keeps track of all metrics in the router, and that is able to export them (see metric_manager.h and metric_manager.c)
An API for a metric, used to increment/set values for a particular metric, with labels (see metric.h, metric_private.h, metric.c)
Integration with the http-libwebsockets for exposing the metrics in prometheus format on http://host:httpport/metrics (http-libwebsockets.c)
An example of a metric in router_core/connections.c when opening a connection. (router_core/router_core.c, router_core/connections.c)

The libraries are written with the idea that incrementing a metric should be as cheap as possible (though there are several optimizations to be made for efficiently looking up labels). A metric is registered in the metric manager, which will store it in its internal database. A handle to the metric (num_connections for instance) is stored for efficient metric reporting. The handle is used whenever the metric should be incremented, decremented or set.

The API also contains a set of macros that makes passing labels less verbose.

* Add num_connections metric as initial metric for testing

ChugR · 2017-04-06T19:49:15Z

This patch adds a competing statistic for one that already exists. Going forward it would require a completely redundant set and every statistic would be counted twice. That's not desirable for the long run.

Another approach might be to hook up the http-libwebsockets to the existing statistics and then to export them in the desired format. The existing management database has configuration data and runtime statistics together. See https://github.com/apache/qpid-dispatch/blob/master/python/qpid_dispatch/management/qdrouter.json.readme.txt

This would be more work to hook up the the metric_manager as the code would have to hunt for all the values where they exist today. But in the end the resulting reports would have identical descriptions and values reported by qdstat and the other management tools.

lulf · 2017-04-07T06:49:40Z

Thank you for your comments. My example metric was perhaps too simplicic, apologies for that. Here are more examples of the metrics I'd like to be able to provide:

Counting the number of connections made (tagged with container id) in total, even if they are not active now
The number of failed connection attempts, link attaches etc.
Internal router errors of different types (could be non-amqp specific)
Latency metrics (time spent in core thread, in I/O thread etc), possibly as a histogram (not supported in this example API, but part of the prometheus metric spec: https://prometheus.io/docs/practices/histograms/)
Ability to report quantiles and histograms

Is this information that I can retrieve in the same way as qdstat or specify in the management.json? My impression was that qdstat only allowed for 'current state' type of metrics. If we can represent the above using management, I agree it could be better to use that.

However, there is also the question of the impact of collecting metrics (tradeoff between quering for the metrics vs. the cost of reporting the metrcs). Reporting a metric explicitly may scale better, since a counter can be incremented and queried without locking in the common case. Correct me if I'm wrong, but my impression was that management operations either has to run in the core thread or do some locking?

Would an alternative here be to have management/qdstat retrieve the metrics from the metrics manager? This would resolve the concerns of reporting metrics twice, and could also potentially enrich the qdstat output with values from histograms etc.

For reference, here is the document describing the format: https://prometheus.io/docs/instrumenting/exposition_formats/

ted-ross · 2017-04-07T12:25:03Z

I share Chuck's concern that this PR is introducing an alternative and redundant management access mechanism. I would certainly prefer to add the desired metrics to the AMQP-based access mechanism that is already in place.
One of the advantages of the AMQP management mechanism is that it is routable. With a single connection to a router in a network, a process can query data from all of the routers in the network (and eventually brokers as well). Using the HTTP access point only allows access to a single router.
Is there a way to bridge the gap between the AMQP management protocol and Prometheus?

ted-ross · 2017-04-07T12:39:06Z

At the risk of contradicting my previous post, I would add that having an HTTP front end on management data might be a worthwhile enhancement for some use cases as long as it used the same data sources as the AMQP front end. Mind would have to be paid to the security model for such HTTP accesses to management data.

Ulf Lilleengen added 12 commits March 31, 2017 13:30

Add data structures for tracking metrics

03e0ebb

* Add num_connections metric as initial metric for testing

Add support for labels in the interface

2e3b688

Support reporting with labels

e7659af

Pass some labels

0bbe036

Add metric manager for keeping track of metrics and exporting

37a5e0b

Expose metrics in http

0cede49

Free resources at shutdown

39cdaed

Initialize all memory

b259e24

Add dec with label

c847e9b

One metric per line

93c4d70

Print help and type

c68658f

Advance iterator

963a6d3

Ulf Lilleengen added 5 commits April 7, 2017 16:27

Remove metric APIs

2d8a217

Remove more traces of metric manager

3ad1ebb

Add support for per-query response handler

77f49b3

Initial attempt and exporting metrics using management

ba5a520

Initial pass responding with data after querying management

5606d70

lulf closed this Apr 10, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Metrics support#159

Metrics support#159
lulf wants to merge 17 commits into
apache:masterfrom
lulf:metrics

lulf commented Apr 6, 2017

Uh oh!

ChugR commented Apr 6, 2017

Uh oh!

lulf commented Apr 7, 2017

Uh oh!

ted-ross commented Apr 7, 2017

Uh oh!

ted-ross commented Apr 7, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

lulf commented Apr 6, 2017

Uh oh!

ChugR commented Apr 6, 2017

Uh oh!

lulf commented Apr 7, 2017

Uh oh!

ted-ross commented Apr 7, 2017

Uh oh!

ted-ross commented Apr 7, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants