
status: allow exporting ConnectivityCheck results over HTTP #2411

Merged: @gnarula merged 6 commits into master from status-prometheus on Dec 2, 2020

Conversation

@gnarula (Contributor) commented Nov 25, 2020

Adds a command that initiates a ConnectivityCheck every `interval`
and serves the results in the form of Prometheus metrics.

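As an illustration of the approach described above (and not the actual code of this PR), a minimal exporter following this pattern could look like the sketch below. The metric name, labels, port and the `checkConnectivity` helper are hypothetical stand-ins for the real ConnectivityCheck call.

```go
// Sketch only: illustrates "check on a timer, expose the latest results for
// Prometheus to scrape". Names and the checkConnectivity helper are made up.
package main

import (
    "log"
    "net/http"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// connectivityOK holds the latest result per node (1 = reachable, 0 = not).
var connectivityOK = prometheus.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "conode_connectivity_ok", // hypothetical metric name
        Help: "1 if the node answered the last ConnectivityCheck, 0 otherwise.",
    },
    []string{"node"},
)

// checkConnectivity is a placeholder for the actual ConnectivityCheck request.
func checkConnectivity(nodes []string) map[string]bool {
    results := make(map[string]bool, len(nodes))
    for _, n := range nodes {
        results[n] = true // pretend every node answered
    }
    return results
}

func main() {
    prometheus.MustRegister(connectivityOK)

    interval := 5 * time.Minute
    nodes := []string{"node-a:7770", "node-b:7770"} // hypothetical roster

    // Run the check every `interval` in the background and update the gauges.
    go func() {
        ticker := time.NewTicker(interval)
        defer ticker.Stop()
        for {
            for node, ok := range checkConnectivity(nodes) {
                v := 0.0
                if ok {
                    v = 1.0
                }
                connectivityOK.WithLabelValues(node).Set(v)
            }
            <-ticker.C // wait for the next tick before checking again
        }
    }()

    // Expose whatever the last check wrote; Prometheus scrapes this endpoint.
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":9100", nil))
}
```

Prometheus then pulls /metrics on its own schedule; serving the page is cheap because the values are pre-computed by the background loop.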
@ineiti (Member) left a comment


I think I like that approach very much: do I understand correctly that you put the status binary in charge of fetching the status and sending it to Prometheus?

In our current systems, this would work best as a one-shot thing: call status serve once every 5 minutes. But YMMV...

@pierluca (Contributor):

> I think I like that approach very much: do I understand correctly that you put the status binary in charge of fetching the status and sending it to Prometheus?
>
> In our current systems, this would work best as a one-shot thing: call status serve once every 5 minutes. But YMMV...

This is not how Prometheus works. Prometheus works by scraping various systems and letting them expose their status.
I.e. Prometheus just needs a few addresses to scrape, but every service exposes the metrics that are relevant for itself.
As a matter of fact, "sending to Prometheus" is considered an anti-pattern and is only recommended for ephemeral / batch jobs (https://prometheus.io/docs/practices/pushing/).

As such, this starts a webserver that exposes the metrics on a webpage.
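(For illustration, and not part of this PR: on the Prometheus side, the scrape model only needs a list of addresses to pull from. A minimal scrape_configs entry might look like the following; the job name and target address are made up.)

```yaml
scrape_configs:
  - job_name: "conode-connectivity"           # hypothetical job name
    scrape_interval: 1m                       # how often Prometheus pulls the metrics
    static_configs:
      - targets: ["conode.example.com:9100"]  # wherever the status binary serves /metrics
```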

@pierluca (Contributor) left a comment


LGTM, minor remarks to improve our debugging capabilities

(3 review comment threads on status/status.go, all resolved)
@ineiti (Member) commented Nov 26, 2020

> Prometheus works by scraping various systems and letting them expose their status.

Hmm - true. I was still thinking about Grafana. You'll have to explain to me sometime why a logger that calls the service to be logged is a good idea.

Getting back to my previous idea - shouldn't this be implemented using https://github.com/dedis/onet/blob/992a708c6c664b744ec67d8ea6f9b0181c16d166/processor.go#L192 ? Then you wouldn't even have to start yet another process, but could let the nodes serve Prometheus themselves. You would have to add rate limiting, but other than that, I would prefer that...

@pierluca (Contributor):

> > Prometheus works by scraping various systems and letting them expose their status.
>
> Hmm - true. I was still thinking about Grafana. You'll have to explain to me sometime why a logger that calls the service to be logged is a good idea.
>
> Getting back to my previous idea - shouldn't this be implemented using https://github.com/dedis/onet/blob/992a708c6c664b744ec67d8ea6f9b0181c16d166/processor.go#L192 ? Then you wouldn't even have to start yet another process, but could let the nodes serve Prometheus themselves. You would have to add rate limiting, but other than that, I would prefer that...

Grafana just does visualisation. I assume you're thinking about Graphite.
Prometheus is explicitly NOT about logging, and as for "why scraping"... I could talk about that for hours :)
Happy to have that conversation around coffee :)

@gnarula (Contributor, Author) commented Nov 26, 2020

> Getting back to my previous idea - shouldn't this be implemented using https://github.com/dedis/onet/blob/992a708c6c664b744ec67d8ea6f9b0181c16d166/processor.go#L192 ? Then you wouldn't even have to start yet another process, but could let the nodes serve Prometheus themselves. You would have to add rate limiting, but other than that, I would prefer that...

I think the limitation there is that the REST handler only supports JSON responses, but Prometheus expects its plain-text exposition format in the response (see the sample below).

Re: rate limiting, I think the metrics would only be consumed by the conode internally, and I'd assume the HTTP port would be firewalled to allow connections only from trusted sources (i.e. the Prometheus server).
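(For reference, what a Prometheus scrape of the /metrics endpoint returns is the plain-text exposition format rather than JSON; the metric name and labels below are hypothetical.)

```
# HELP conode_connectivity_ok 1 if the node answered the last ConnectivityCheck, 0 otherwise.
# TYPE conode_connectivity_ok gauge
conode_connectivity_ok{node="node-a:7770"} 1
conode_connectivity_ok{node="node-b:7770"} 0
```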

@pierluca (Contributor):
There's no need for rate limiting: Prometheus metrics are never meant to be "computed" on HTTP request, merely displayed.
We're following this pattern: we check the connectivity regularly and display it on request.
The cost of displaying the metric, i.e. rendering a template with some data, is minimal.
Worst case we can cache the metrics page for optimisation purposes, but that's an unlikely scenario.

@pierluca (Contributor) left a comment


LGTM. Awesome :)

@gnarula added this to WIP in Cothority via automation on Dec 2, 2020
@gnarula moved this from WIP to Ready4Merge in Cothority on Dec 2, 2020
@gnarula merged commit 0303996 into master on Dec 2, 2020
Cothority automation moved this from Ready4Merge to Closed on Dec 2, 2020
@gnarula deleted the status-prometheus branch on Dec 2, 2020 at 15:21