Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extend and refactor prometheus metrics #1850

Closed
wants to merge 3 commits into from

Conversation

m3co-code
Copy link
Contributor

For our internal Traefik setup I was working on improving the Prometheus metrics and wanted to upstream my work, as I guess this is potentially useful for a lot of Traefik users. It is planned to integrate more metrics in the future, but this PR already restructures the metrics implementation and provides significant benefits when monitoring Traefik with Prometheus. For more metrics follow-up PRs will be smaller and more focused. It is important to note that in this rework, the metric naming was adapted in order to fix #1759 and to be more consistent. As the metrics implementation is in a rather young state, I had the feeling this is the right time to go for removing inconsistencies.

In general the following metrics are now available. Please check the code for details including the metric labels.

// server meta information
traefik_config_reloads_total
traefik_config_reload_failures_total
traefik_config_last_reload_success

// entrypoint
traefik_entrypoint_requests_total
traefik_entrypoint_request_duration_seconds

// backend level
traefik_backend_requests_total
traefik_backend_request_duration_seconds
traefik_backend_retries_total

In the course of extending the metrics I did some design reworks:

  • I moved the implementation to an own package. This is necessary because some metrics get also collected in the server package and it makes the design more clear.
  • I switched the prometheus test to a real integration test. On the one hand this made things easier on the other its a more "real world test".
  • I changed the Metrics interface to one big interface. Even though this might not be the most idiomatic way in go, it makes it sufficiently easier for other potential future implementations to provide a "full" metrics implementation. Also the VoidMetrics and the general handling in the server.go is much easier through this.
  • I integrated a VoidMetrics implementation to remove the necessity to check for nil on the consumer side.

@ldez
Copy link
Contributor

ldez commented Jul 10, 2017

The best practices when you do a (large) refactor are to split each step in commits to facilitate the review and don't make a mix of refactor and improvements (or split into separate commit).

In the current state, this PR will be very hard to review.

@ldez
Copy link
Contributor

ldez commented Jul 10, 2017

Moreover this modification breaks the work done in the #1701

@m3co-code
Copy link
Contributor Author

In regards to splitting up the commits, when it helps you significantly I can split the commits and rebase accordingly. Let me know please when you want me to do it, because it will be also effort on my time.

In regards to breaking #1701. This is unfortunate, I wasn't aware of the PR. Though to be honest it is also "dangerous" to rely or consider it when moving forward, given it is open for longer than a month and no activity since 13 days. Anyway I think it will be not too much effort to adapt one of the two PRs after the merge of the other one. To be honest this PR will make integration of other providers in general easier and clearer and its design already accomplishes what @emilevauge requested in one of his open PR comments.

What in general should be done and improved in this PR still are documentation. But I think it only makes sense for me to start with them when there is a clear chance that it gets merged.

The order in which the Servers are setup is not guaranteed. Therefore it
is not guaranteed that two retries MUST happen. It is only guaranteed
that at least one retry MUST happen. This is because in the integration
test we have 2 Servers and 2 Requests. In a uniform rotation through the
load balancers it MUST hit the invalid Server at least once.

Added the weight configuration setting to make it more explicit.
@m3co-code
Copy link
Contributor Author

I am closing this PR for now, as I don't have the time right now to update my PR and the other provider implementations. I plan to do this though in middle term so stay tuned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Prometheus request metrics tracked twice
2 participants