Skip to content
This repository was archived by the owner on Apr 15, 2026. It is now read-only.

Metrics support#159

Closed
lulf wants to merge 17 commits into
apache:masterfrom
lulf:metrics
Closed

Metrics support#159
lulf wants to merge 17 commits into
apache:masterfrom
lulf:metrics

Conversation

@lulf

@lulf lulf commented Apr 6, 2017

Copy link
Copy Markdown

@ganeshmurthy @ted-ross This is a proposal of metrics in the router and expose them to a prometheus agent.

This work would allow anyone running the router on openshift to easily integrate it with hawkular or any other store that supports reading prometheus endpoints.

The PR consists of:

  • An API for a metric manager, that keeps track of all metrics in the router, and that is able to export them (see metric_manager.h and metric_manager.c)
  • An API for a metric, used to increment/set values for a particular metric, with labels (see metric.h, metric_private.h, metric.c)
  • Integration with the http-libwebsockets for exposing the metrics in prometheus format on http://host:httpport/metrics (http-libwebsockets.c)
  • An example of a metric in router_core/connections.c when opening a connection. (router_core/router_core.c, router_core/connections.c)

The libraries are written with the idea that incrementing a metric should be as cheap as possible (though there are several optimizations to be made for efficiently looking up labels). A metric is registered in the metric manager, which will store it in its internal database. A handle to the metric (num_connections for instance) is stored for efficient metric reporting. The handle is used whenever the metric should be incremented, decremented or set.

The API also contains a set of macros that makes passing labels less verbose.

@ChugR

ChugR commented Apr 6, 2017

Copy link
Copy Markdown
Contributor

This patch adds a competing statistic for one that already exists. Going forward it would require a completely redundant set and every statistic would be counted twice. That's not desirable for the long run.

Another approach might be to hook up the http-libwebsockets to the existing statistics and then to export them in the desired format. The existing management database has configuration data and runtime statistics together. See https://github.com/apache/qpid-dispatch/blob/master/python/qpid_dispatch/management/qdrouter.json.readme.txt

This would be more work to hook up the the metric_manager as the code would have to hunt for all the values where they exist today. But in the end the resulting reports would have identical descriptions and values reported by qdstat and the other management tools.

@lulf

lulf commented Apr 7, 2017

Copy link
Copy Markdown
Author

Thank you for your comments. My example metric was perhaps too simplicic, apologies for that. Here are more examples of the metrics I'd like to be able to provide:

  • Counting the number of connections made (tagged with container id) in total, even if they are not active now
  • The number of failed connection attempts, link attaches etc.
  • Internal router errors of different types (could be non-amqp specific)
  • Latency metrics (time spent in core thread, in I/O thread etc), possibly as a histogram (not supported in this example API, but part of the prometheus metric spec: https://prometheus.io/docs/practices/histograms/)
  • Ability to report quantiles and histograms

Is this information that I can retrieve in the same way as qdstat or specify in the management.json? My impression was that qdstat only allowed for 'current state' type of metrics. If we can represent the above using management, I agree it could be better to use that.

However, there is also the question of the impact of collecting metrics (tradeoff between quering for the metrics vs. the cost of reporting the metrcs). Reporting a metric explicitly may scale better, since a counter can be incremented and queried without locking in the common case. Correct me if I'm wrong, but my impression was that management operations either has to run in the core thread or do some locking?

Would an alternative here be to have management/qdstat retrieve the metrics from the metrics manager? This would resolve the concerns of reporting metrics twice, and could also potentially enrich the qdstat output with values from histograms etc.

For reference, here is the document describing the format: https://prometheus.io/docs/instrumenting/exposition_formats/

@ted-ross

ted-ross commented Apr 7, 2017

Copy link
Copy Markdown
Member

I share Chuck's concern that this PR is introducing an alternative and redundant management access mechanism. I would certainly prefer to add the desired metrics to the AMQP-based access mechanism that is already in place.
One of the advantages of the AMQP management mechanism is that it is routable. With a single connection to a router in a network, a process can query data from all of the routers in the network (and eventually brokers as well). Using the HTTP access point only allows access to a single router.
Is there a way to bridge the gap between the AMQP management protocol and Prometheus?

@ted-ross

ted-ross commented Apr 7, 2017

Copy link
Copy Markdown
Member

At the risk of contradicting my previous post, I would add that having an HTTP front end on management data might be a worthwhile enhancement for some use cases as long as it used the same data sources as the AMQP front end. Mind would have to be paid to the security model for such HTTP accesses to management data.

@lulf lulf closed this Apr 10, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants