extend and refactor prometheus metrics #1850

m3co-code · 2017-07-10T18:23:54Z

For our internal Traefik setup I was working on improving the Prometheus metrics and wanted to upstream my work, as I guess this is potentially useful for a lot of Traefik users. It is planned to integrate more metrics in the future, but this PR already restructures the metrics implementation and provides significant benefits when monitoring Traefik with Prometheus. For more metrics follow-up PRs will be smaller and more focused. It is important to note that in this rework, the metric naming was adapted in order to fix #1759 and to be more consistent. As the metrics implementation is in a rather young state, I had the feeling this is the right time to go for removing inconsistencies.

In general the following metrics are now available. Please check the code for details including the metric labels.

// server meta information
traefik_config_reloads_total
traefik_config_reload_failures_total
traefik_config_last_reload_success

// entrypoint
traefik_entrypoint_requests_total
traefik_entrypoint_request_duration_seconds

// backend level
traefik_backend_requests_total
traefik_backend_request_duration_seconds
traefik_backend_retries_total

In the course of extending the metrics I did some design reworks:

I moved the implementation to an own package. This is necessary because some metrics get also collected in the server package and it makes the design more clear.
I switched the prometheus test to a real integration test. On the one hand this made things easier on the other its a more "real world test".
I changed the Metrics interface to one big interface. Even though this might not be the most idiomatic way in go, it makes it sufficiently easier for other potential future implementations to provide a "full" metrics implementation. Also the VoidMetrics and the general handling in the server.go is much easier through this.
I integrated a VoidMetrics implementation to remove the necessity to check for nil on the consumer side.

ldez · 2017-07-10T20:13:58Z

The best practices when you do a (large) refactor are to split each step in commits to facilitate the review and don't make a mix of refactor and improvements (or split into separate commit).

In the current state, this PR will be very hard to review.

ldez · 2017-07-10T20:17:49Z

Moreover this modification breaks the work done in the #1701

m3co-code · 2017-07-11T14:54:33Z

In regards to splitting up the commits, when it helps you significantly I can split the commits and rebase accordingly. Let me know please when you want me to do it, because it will be also effort on my time.

In regards to breaking #1701. This is unfortunate, I wasn't aware of the PR. Though to be honest it is also "dangerous" to rely or consider it when moving forward, given it is open for longer than a month and no activity since 13 days. Anyway I think it will be not too much effort to adapt one of the two PRs after the merge of the other one. To be honest this PR will make integration of other providers in general easier and clearer and its design already accomplishes what @emilevauge requested in one of his open PR comments.

What in general should be done and improved in this PR still are documentation. But I think it only makes sense for me to start with them when there is a clear chance that it gets merged.

The order in which the Servers are setup is not guaranteed. Therefore it is not guaranteed that two retries MUST happen. It is only guaranteed that at least one retry MUST happen. This is because in the integration test we have 2 Servers and 2 Requests. In a uniform rotation through the load balancers it MUST hit the invalid Server at least once. Added the weight configuration setting to make it more explicit.

m3co-code · 2017-07-21T10:01:22Z

I am closing this PR for now, as I don't have the time right now to update my PR and the other provider implementations. I plan to do this though in middle term so stay tuned.

extend and refactor prometheus metrics

59842ea

ldez added area/middleware/metrics kind/enhancement a new or improved feature. status/1-needs-design-review size/L labels Jul 10, 2017

m3co-code added 2 commits July 18, 2017 06:30

switch to new cmdTraefik util method

106b3de

m3co-code closed this Jul 21, 2017

m3co-code mentioned this pull request Aug 17, 2017

Extract metrics to own package and refactor implementations #1968

Merged

ldez removed the status/1-needs-design-review label Nov 18, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

extend and refactor prometheus metrics #1850

extend and refactor prometheus metrics #1850

m3co-code commented Jul 10, 2017

ldez commented Jul 10, 2017 •

edited

Loading

ldez commented Jul 10, 2017

m3co-code commented Jul 11, 2017

m3co-code commented Jul 21, 2017

extend and refactor prometheus metrics #1850

extend and refactor prometheus metrics #1850

Conversation

m3co-code commented Jul 10, 2017

ldez commented Jul 10, 2017 • edited Loading

ldez commented Jul 10, 2017

m3co-code commented Jul 11, 2017

m3co-code commented Jul 21, 2017

ldez commented Jul 10, 2017 •

edited

Loading