Monitoring #20

Closed
HonzaMac opened this issue May 25, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@HonzaMac

HonzaMac commented May 25, 2020

We would like to start monitoring circuit breakers, retries, and timeouts in production.

I like this library more than the alternatives, but I am missing a way to monitor its current state.

I've tried to get at least breaker.stats, onRetry, and a few others, but it would be better to have a more holistic way to get these events and state out of cockatiel.

Is there a way to get these statistics?

For example, there is some inspiration in the Levee project.

An example of how such output could one day look.
Article from Hystrix with video.

image
This image actually aggregates all instances of a circuit breaker per concrete service.

I can imagine collecting this information with every request (or batching could be done by another tool).
I am currently thinking about these metrics:

  • time to process the request
  • circuit state: open, closed, half-open
  • end state of the request: timed out/success/failed
  • retry number of the request (1, 2, 3)
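
A minimal sketch of what one such per-request record could look like, and how a batch of them might be summarized before being shipped elsewhere. The names `RequestMetric`, `Summary`, and `summarize` are hypothetical and are not part of cockatiel:

```typescript
// Hypothetical shape of one monitoring record per request (not a cockatiel type).
type BreakerState = 'open' | 'closed' | 'half-open';
type Outcome = 'success' | 'failed' | 'timedout';

interface RequestMetric {
  durationMs: number;         // time to process the request
  breakerState: BreakerState; // circuit state when the request ran
  outcome: Outcome;           // end state of the request
  attempt: number;            // 1 for the first try, 2 and up for retries
}

interface Summary {
  count: number;
  meanDurationMs: number;
  byOutcome: Record<Outcome, number>;
  retried: number;
}

// Aggregate a batch of records into counts and a mean duration.
function summarize(metrics: RequestMetric[]): Summary {
  const byOutcome: Record<Outcome, number> = { success: 0, failed: 0, timedout: 0 };
  let totalMs = 0;
  let retried = 0;
  for (const m of metrics) {
    byOutcome[m.outcome] += 1;
    totalMs += m.durationMs;
    if (m.attempt > 1) retried += 1;
  }
  return {
    count: metrics.length,
    meanDurationMs: metrics.length === 0 ? 0 : totalMs / metrics.length,
    byOutcome,
    retried,
  };
}
```

Batching like this could equally be left to an external tool; the sketch only shows that the four metrics fit in one flat record.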

As a side note: I want to integrate it into Grafana using InfluxDB (but this is not important).
Example of a Grafana dashboard

@HonzaMac
Author

HonzaMac commented Jun 4, 2020

What do you think, @connor4312?

@connor4312
Owner

Apologies, been busy. I will look into this more deeply this weekend.

@connor4312
Owner

connor4312 commented Jun 20, 2020

The circuit breaker now includes a general onStateChange callback, and all policies include onSuccess and onFailure events, which are called with information about the function run duration. I think this should be sufficient to get the monitoring you're after.
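
As a rough illustration of feeding those events into plain counters: the listener signatures below are assumptions modeled on the description above, not cockatiel's actual typings, and `MonitoredBreaker`, `BreakerStats`, and `FakeBreaker` are hypothetical names used only for this sketch.

```typescript
// Assumed listener shapes; cockatiel's real typings may differ.
type Listener<T> = (event: T) => void;

interface DurationEvent {
  duration: number; // function run duration (milliseconds assumed)
}

interface MonitoredBreaker {
  onSuccess(listener: Listener<DurationEvent>): void;
  onFailure(listener: Listener<DurationEvent>): void;
  onStateChange(listener: Listener<string>): void;
}

// Plain counters that a telemetry library could later export.
class BreakerStats {
  successes = 0;
  failures = 0;
  totalDurationMs = 0;
  state = 'closed';
  stateChanges = 0;

  attach(breaker: MonitoredBreaker): this {
    breaker.onSuccess(({ duration }) => {
      this.successes += 1;
      this.totalDurationMs += duration;
    });
    breaker.onFailure(({ duration }) => {
      this.failures += 1;
      this.totalDurationMs += duration;
    });
    breaker.onStateChange((state) => {
      this.state = state;
      this.stateChanges += 1;
    });
    return this;
  }
}

// In-memory stand-in for a real policy, used only to exercise the collector.
class FakeBreaker implements MonitoredBreaker {
  private success: Listener<DurationEvent>[] = [];
  private failure: Listener<DurationEvent>[] = [];
  private stateChange: Listener<string>[] = [];

  onSuccess(l: Listener<DurationEvent>): void { this.success.push(l); }
  onFailure(l: Listener<DurationEvent>): void { this.failure.push(l); }
  onStateChange(l: Listener<string>): void { this.stateChange.push(l); }

  emitSuccess(duration: number): void { this.success.forEach((l) => l({ duration })); }
  emitFailure(duration: number): void { this.failure.forEach((l) => l({ duration })); }
  emitState(state: string): void { this.stateChange.forEach((l) => l(state)); }
}
```

With a real policy, the same `attach` call would receive the breaker instead of `FakeBreaker`, provided its listener methods match the assumed shapes.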

Levee actually includes a basic statistics store, but I think this is out of scope for the library--there are numerous, differently configurable ways to collect call duration, for instance, which can be handled better by a specific telemetry library.

connor4312 added the enhancement (New feature or request) label Jun 20, 2020
@HonzaMac
Author

Thank you, @connor4312, for this improvement 👏. This will enable the monitoring I am looking for; I will try it in the upcoming week.

Great work 💪🏼
