New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ipam: add metrics to track per node capacity #24776
ipam: add metrics to track per node capacity #24776
Conversation
Commit b98e92d31f6aca94e40acbdf2d089a3d5e1759bc does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
b98e92d
to
0f6ff5f
Compare
Commit b98e92d31f6aca94e40acbdf2d089a3d5e1759bc does not contain "Signed-off-by". Please follow instructions provided in https://docs.cilium.io/en/stable/contributing/development/contributing_guide/#developer-s-certificate-of-origin |
/test |
0f6ff5f
to
56ab89f
Compare
64d7c15
to
60c3c86
Compare
add59a5
to
7339b89
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome, thanks! I focused on the IPAM aspect, that change looks good to me. A few minor things that stood out to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, clean code and easy to read commits!
I don't have any other comment besides a nit to follow and what Sebastian already pointed out. When reading some of the commits, I was initially confused by what "CNI pods" meant and then realized it meant "managed (by Cilium) pods". The latter is typically how we convey the relationship as you probably already know, but thought I'd call it out.
7339b89
to
e7ed5e5
Compare
0ad83a1
to
97ac4c1
Compare
/test |
Updated metrics docs with new ipam metrics. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Mention the addition of ipam_{available,used,needed} metrics. As well, notify in upgrade guide about intended deprecation of ipam_ips metric. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Previously IsPrefixDelegationEnabled implementation on *ipam.Node would check if the receiver node pointer was nil. If so it defaulted to false. This is a bit dangerous as function on nil concrete types will still be invoked, whereas interface types will panic (i.e. because a nil interface doesn't actually have a function to lookup). This removes that code, and defers nil checking to the caller. Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
97ac4c1
to
0013798
Compare
/test |
@zacharysarah no problem, thanks for the review! |
@gandro @christarazi Made some changes to fix failing tests and lint issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tommyp1ckles Thanks for the updates! ✨ LGTM
Affects: * operator_ipam_available_ips * operator_ipam_used_ips * operator_ipam_needed_ips Which have the label "target_name", previously when a Node was deleted the metric continued to be emitted by the Prometheus exporter, leading to confusing sum() values across a cluster. Fixes changes in cilium#24776 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
Affects: * operator_ipam_available_ips * operator_ipam_used_ips * operator_ipam_needed_ips Which have the label "target_name", previously when a Node was deleted the metric continued to be emitted by the Prometheus exporter, leading to confusing sum() values across a cluster. Fixes changes in #24776 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com>
[ upstream commit 5b7b3bb ] Affects: * operator_ipam_available_ips * operator_ipam_used_ips * operator_ipam_needed_ips Which have the label "target_name", previously when a Node was deleted the metric continued to be emitted by the Prometheus exporter, leading to confusing sum() values across a cluster. Fixes changes in #24776 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 5b7b3bb ] Affects: * operator_ipam_available_ips * operator_ipam_used_ips * operator_ipam_needed_ips Which have the label "target_name", previously when a Node was deleted the metric continued to be emitted by the Prometheus exporter, leading to confusing sum() values across a cluster. Fixes changes in #24776 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
[ upstream commit 5b7b3bb ] Affects: * operator_ipam_available_ips * operator_ipam_used_ips * operator_ipam_needed_ips Which have the label "target_name", previously when a Node was deleted the metric continued to be emitted by the Prometheus exporter, leading to confusing sum() values across a cluster. Fixes changes in cilium#24776 Signed-off-by: Tom Hadlaw <tom.hadlaw@isovalent.com> Signed-off-by: Gilberto Bertin <jibi@cilium.io>
This PR adds metrics to help track IPAM Node capacity with more granularity.
Specifically, the goal is to help add alerting/dashboards for such questions such as:
This adds three new Operator metrics:
As well, this seeks to deprecate the :
cilium_operator_ipam_ips
metric, aside from it being redundant it also is very confusing - thetype=available
does not track what you expect it to track.Which are all labelled by "target_node" (i.e. the name of the CiliumNode).
How does this differ from the existing IPAM metrics?
There's a bit of a blind spot in the current set of metrics: