Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics: fix potential conflict on metrics registration #27007

Merged
merged 1 commit into from
Aug 25, 2023

Conversation

ysksuzuki
Copy link
Member

@ysksuzuki ysksuzuki commented Jul 23, 2023

In Cilium v1.13, K8s rest client metrics, such as cilium_k8s_client_api_latency_time_seconds and cilium_k8s_client_api_calls_total, are missing because of a conflict with controller-runtime on the metrics registration.

controller-runtime(sigs.k8s.io/controller-runtime/pkg/metrics) also registers the client-go rest client metrics on its registry using metrics.Register method in its init function. The metrics.Register can be called only once, and subsequent calls will be ignored.

cilium-agent doesn't use controller-runtime, but there's an indirect dependency from github.com/cilium/cilium/daemon to github.com/cilium/cilium/operator/metrics package. (operator uses controller-runtime in its Gateway API implementation) For example,
github.com/cilium/cilium/daemon/cmd
-> github.com/cilium/cilium/pkg/ipam
-> github.com/cilium/cilium/pkg/ipam/metrics
-> github.com/cilium/cilium/operator/metrics

The init function of sigs.k8s.io/controller-runtime/pkg/metrics will be called because of this indirect dependency.

The issue(missing client-go metrics) doesn't happen with v1.14 and later, but it can happen if the order of imports changes again. This commit is to prevent the issue from happning.

This commit is a forward port of e4ff641

fixes: #25610

Please ensure your pull request adheres to the following guidelines:

  • For first time contributors, read Submitting a pull request
  • All code is covered by unit and/or runtime tests where feasible.
  • All commits contain a well written commit description including a title,
    description and a Fixes: #XXX line if the commit addresses a particular
    GitHub issue.
  • If your commit description contains a Fixes: <commit-id> tag, then
    please add the commit author[s] as reviewer[s] to this issue.
  • All commits are signed off. See the section Developer’s Certificate of Origin
  • Provide a title or release-note blurb suitable for the release notes.
  • Are you a user of Cilium? Please add yourself to the Users doc
  • Thanks for contributing!

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jul 23, 2023
@ysksuzuki
Copy link
Member Author

context: #26412 (review)

@ysksuzuki ysksuzuki force-pushed the fix-client-go-metrics-conflict branch from 453cdf7 to 891e79b Compare July 23, 2023 10:13
@ysksuzuki
Copy link
Member Author

/test

@ysksuzuki ysksuzuki marked this pull request as ready for review July 23, 2023 13:46
@ysksuzuki ysksuzuki requested a review from a team as a code owner July 23, 2023 13:46
Copy link
Contributor

@tommyp1ckles tommyp1ckles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice fix, I think this makes sense but I had a few questions

@@ -93,13 +93,27 @@ func init() {
k8s.K8sErrorHandler,
}

k8s_metrics.Register(k8s_metrics.RegisterOpts{
// The client-go Register function can be called only once,
// but controller-runtime also calls it in its init function
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can maybe generalize that any in case of transitive imports can cause this condition to happen (while also mentioning the specific controller-runtime scenario that was the motivation here).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rephrased it. Please take a look.

pkg/k8s/watchers/watcher.go Show resolved Hide resolved
@ysksuzuki ysksuzuki force-pushed the fix-client-go-metrics-conflict branch from 891e79b to a0d56f3 Compare July 30, 2023 11:19
@ysksuzuki
Copy link
Member Author

/test

@ysksuzuki
Copy link
Member Author

ginkgo build cache seems broken
https://github.com/cilium/cilium/actions/runs/5706051069/job/15462054023

@ysksuzuki
Copy link
Member Author

/test

@ysksuzuki ysksuzuki force-pushed the fix-client-go-metrics-conflict branch from a0d56f3 to f4c4ebc Compare July 30, 2023 13:21
In Cilium v1.13, K8s rest client metrics, such as
cilium_k8s_client_api_latency_time_seconds and
cilium_k8s_client_api_calls_total, are missing because of a conflict
with controller-runtime on the metrics registration.

controller-runtime(sigs.k8s.io/controller-runtime/pkg/metrics) also
registers the client-go rest client metrics on its registry using
metrics.Register method in its init function. The metrics.Register can be
called only once, and subsequent calls will be ignored.

cilium-agent doesn't use controller-runtime, but there's an indirect
dependency from github.com/cilium/cilium/daemon to
github.com/cilium/cilium/operator/metrics package.
(operator uses controller-runtime in its Gateway API implementation)
For example,
github.com/cilium/cilium/daemon/cmd
-> github.com/cilium/cilium/pkg/ipam
-> github.com/cilium/cilium/pkg/ipam/metrics
-> github.com/cilium/cilium/operator/metrics

The init function of sigs.k8s.io/controller-runtime/pkg/metrics will be
called because of this indirect dependency.

The issue(missing client-go metrics) doesn't happen with v1.14 and
later, but it can happen if the order of imports changes again.
This commit is to prevent the issue from happning.

This commit is a forward port of
cilium@e4ff641

Signed-off-by: Yusuke Suzuki <yusuke-suzuki@cybozu.co.jp>
@ysksuzuki
Copy link
Member Author

/test

@ysksuzuki
Copy link
Member Author

@tommyp1ckles Do you have any other feedback or question?

@tommyp1ckles tommyp1ckles added the release-note/bug This PR fixes an issue in a previous release of Cilium. label Aug 24, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Aug 24, 2023
pkg/k8s/watchers/watcher.go Show resolved Hide resolved
@maintainer-s-little-helper maintainer-s-little-helper bot added ready-to-merge This PR has passed all tests and received consensus from code owners to merge. labels Aug 25, 2023
@tommyp1ckles tommyp1ckles added the area/metrics Impacts statistics / metrics gathering, eg via Prometheus. label Aug 25, 2023
@borkmann borkmann merged commit 0eba8bc into cilium:main Aug 25, 2023
58 checks passed
@borkmann
Copy link
Member

@tommyp1ckles do we need backport labels?

@tommyp1ckles tommyp1ckles added backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. labels Aug 29, 2023
@tommyp1ckles
Copy link
Contributor

@borkmann 👍 This fixues a bug in 1.13, but this sounds like it may prevent future issues in 1.14 as well.

@tommyp1ckles tommyp1ckles removed backport/1.13 This PR represents a backport for Cilium 1.13.x of a PR that was merged to main. backport/1.14 This PR represents a backport for Cilium 1.14.x of a PR that was merged to main. labels Nov 16, 2023
@tommyp1ckles tommyp1ckles added needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch affects/v1.14 This issue affects v1.14 branch labels Nov 16, 2023
@sayboras sayboras mentioned this pull request Nov 20, 2023
3 tasks
@github-actions github-actions bot added the backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. label Nov 24, 2023
@joamaki joamaki mentioned this pull request Nov 29, 2023
14 tasks
@joamaki joamaki added backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. and removed backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. needs-backport/1.14 This PR / issue needs backporting to the v1.14 branch labels Nov 29, 2023
@christarazi christarazi added backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. and removed backport-pending/1.14 The backport for Cilium 1.14.x for this PR is in progress. labels Nov 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects/v1.14 This issue affects v1.14 branch area/metrics Impacts statistics / metrics gathering, eg via Prometheus. backport-done/1.14 The backport for Cilium 1.14.x for this PR is done. ready-to-merge This PR has passed all tests and received consensus from code owners to merge. release-note/bug This PR fixes an issue in a previous release of Cilium.
Projects
No open projects
Status: Released
Development

Successfully merging this pull request may close these issues.

Missing client-go Metrics and Potential Conflict with controller-runtime on Metrics Registration
5 participants