
status: allow exporting ConnectivityCheck results over HTTP #2411

Merged: @gnarula merged 6 commits into master from status-prometheus on Dec 2, 2020

Conversation

@gnarula (Contributor) commented Nov 25, 2020

Adds a command that initiates a ConnectivityCheck every `interval`
and serves the results in the form of Prometheus metrics.

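As an illustration of the approach described above (and not the actual code of this PR), a minimal exporter following this pattern could look like the sketch below. The metric name, labels, port and the `checkConnectivity` helper are hypothetical stand-ins for the real ConnectivityCheck call.

```go
// Sketch only: illustrates "check on a timer, expose the latest results for
// Prometheus to scrape". Names and the checkConnectivity helper are made up.
package main

import (
    "log"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// connectivityOK holds the latest result per node (1 = reachable, 0 = not).
var connectivityOK = prometheus.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "conode_connectivity_ok", // hypothetical metric name
        Help: "1 if the node answered the last ConnectivityCheck, 0 otherwise.",
    },
    []string{"node"},
)

// checkConnectivity is a placeholder for the actual ConnectivityCheck request.
func checkConnectivity(nodes []string) map[string]bool {
    results := make(map[string]bool, len(nodes))
    for _, n := range nodes {
        results[n] = true // pretend every node answered
    }
    return results
}

func main() {
    prometheus.MustRegister(connectivityOK)

    interval := 5 * time.Minute
    nodes := []string{"node-a:7770", "node-b:7770"} // hypothetical roster

    // Run the check every `interval` in the background and update the gauges.
    go func() {
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        for {
            for node, ok := range checkConnectivity(nodes) {
                v := 0.0
                if ok {
                    v = 1.0
                }
                connectivityOK.WithLabelValues(node).Set(v)
            }
            <-ticker.C // wait for the next tick before checking again
        }
    }()

    // Expose whatever the last check wrote; Prometheus scrapes this endpoint.
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":9100", nil))
}
```

Prometheus then pulls /metrics on its own schedule; serving the page is cheap because the values are pre-computed by the background loop.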
@ineiti (Member) left a comment


I think I like that approach very much: do I understand correctly that you put the status binary in charge of fetching the status and sending it to Prometheus?

In our current systems, this would work best as a one-shot thing: call status serve once every 5 minutes. But YMMV...

@pierluca (Contributor):

> I think I like that approach very much: do I understand correctly that you put the status binary in charge of fetching the status and sending it to Prometheus?
>
> In our current systems, this would work best as a one-shot thing: call status serve once every 5 minutes. But YMMV...

This is not how Prometheus works. Prometheus works by scraping various systems and letting them expose their status.
I.e. Prometheus just needs a few addresses to scrape, but every service exposes the metrics that are relevant for itself.
As a matter of fact, "sending to Prometheus" is considered an anti-pattern and is only recommended for ephemeral / batch jobs (https://prometheus.io/docs/practices/pushing/).

As such, this starts a webserver that exposes the metrics on a webpage.
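(For illustration, and not part of this PR: on the Prometheus side, the scrape model only needs a list of addresses to pull from. A minimal scrape_configs entry might look like the following; the job name and target address are made up.)

```yaml
scrape_configs:
  - job_name: "conode-connectivity"           # hypothetical job name
    scrape_interval: 1m                       # how often Prometheus pulls the metrics
    static_configs:
      - targets: ["conode.example.com:9100"]  # wherever the status binary serves /metrics
```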

@pierluca (Contributor) left a comment


LGTM, minor remarks to improve our debugging capabilities

(3 review comment threads on status/status.go, all resolved)
@ineiti (Member) commented Nov 26, 2020

> Prometheus works by scraping various systems and letting them expose their status.

Hmm - true. I was still thinking about Grafana. You'll have to explain to me sometime why a logger that calls the service to be logged is a good idea.

Getting back to my previous idea - shouldn't this be implemented using https://github.com/dedis/onet/blob/992a708c6c664b744ec67d8ea6f9b0181c16d166/processor.go#L192 ? Then you wouldn't even have to start yet another process, but could let the nodes serve Prometheus themselves. You would have to add rate limiting, but other than that, I would prefer that...

@pierluca (Contributor):

> > Prometheus works by scraping various systems and letting them expose their status.
>
> Hmm - true. I was still thinking about Grafana. You'll have to explain to me sometime why a logger that calls the service to be logged is a good idea.
>
> Getting back to my previous idea - shouldn't this be implemented using https://github.com/dedis/onet/blob/992a708c6c664b744ec67d8ea6f9b0181c16d166/processor.go#L192 ? Then you wouldn't even have to start yet another process, but could let the nodes serve Prometheus themselves. You would have to add rate limiting, but other than that, I would prefer that...

Grafana just does visualisation. I assume you're thinking about Graphite.
Prometheus is explicitly NOT about logging, and as for "why scraping"... I could talk about that for hours :)
Happy to have that conversation around coffee :)

@gnarula (Contributor, Author) commented Nov 26, 2020

> Getting back to my previous idea - shouldn't this be implemented using https://github.com/dedis/onet/blob/992a708c6c664b744ec67d8ea6f9b0181c16d166/processor.go#L192 ? Then you wouldn't even have to start yet another process, but could let the nodes serve Prometheus themselves. You would have to add rate limiting, but other than that, I would prefer that...

I think the limitation there is that the REST handler only supports JSON responses, but Prometheus expects its plain-text exposition format in the response (see the sample below).

Re: rate limiting, I think the metrics would only be consumed by the conode internally, and I'd assume the HTTP port would be firewalled to allow connections only from trusted sources (i.e. the Prometheus server).
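(For reference, what a Prometheus scrape of the /metrics endpoint returns is the plain-text exposition format rather than JSON; the metric name and labels below are hypothetical.)

```
# HELP conode_connectivity_ok 1 if the node answered the last ConnectivityCheck, 0 otherwise.
# TYPE conode_connectivity_ok gauge
conode_connectivity_ok{node="node-a:7770"} 1
conode_connectivity_ok{node="node-b:7770"} 0
```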

@pierluca (Contributor):
There's no need for rate limiting: Prometheus metrics are never meant to be "computed" on HTTP request, merely displayed.
We're following this pattern: we check the connectivity regularly and display it on request.
The cost of displaying the metric, i.e. rendering a template with some data, is minimal.
Worst case we can cache the metrics page for optimisation purposes, but that's an unlikely scenario.

@pierluca (Contributor) left a comment


LGTM. Awesome :)

@gnarula added this to WIP in Cothority via automation on Dec 2, 2020
@gnarula moved this from WIP to Ready4Merge in Cothority on Dec 2, 2020
@gnarula merged commit 0303996 into master on Dec 2, 2020
Cothority automation moved this from Ready4Merge to Closed on Dec 2, 2020
@gnarula deleted the status-prometheus branch on Dec 2, 2020 at 15:21