v1.12 Backports 2023-12-05 #29639
@viktor-kurchenko @christarazi Some minor conflicts were hit, please check commit notes for details.
…lement error handling.

[ upstream commit 6f227fb ]

[ backporter's notes: conflicts due to `cilium/cmd` having been renamed to `cilium-dbg/cmd`. ]

Signed-off-by: viktor-kurchenko <viktor.kurchenko@isovalent.com>
Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 4787f8e ]

[ backporter's notes: had to resolve rename conflicts. ]

Similar to how useful log messages are when endpoints are created and deleted, this log is useful for understanding when nodes are added and deleted in production clusters.

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
[ upstream commit 7c7b723 ]

Given the order of operations in prober.OnIdle, it is possible for the health probe to hold stale references to deleted nodes. When that occurs, node connectivity metrics which were previously deleted [1] would be brought back, causing confusion.

If users defined alerts on the node connectivity health check metrics (see example below), these alerts would erroneously trigger because the old nodes would appear in the metric labels as a failing health check.

Example given deletion of "kind-worker2" node:

```
cilium_node_connectivity_status source_cluster="kind-kind" source_node_name="kind-worker" target_cluster="kind-kind" target_node_name="kind-control-plane" target_node_type="remote_intra_cluster" type="endpoint" 1.000000
cilium_node_connectivity_status source_cluster="kind-kind" source_node_name="kind-worker" target_cluster="kind-kind" target_node_name="kind-control-plane" target_node_type="remote_intra_cluster" type="node" 1.000000
cilium_node_connectivity_status source_cluster="kind-kind" source_node_name="kind-worker" target_cluster="kind-kind" target_node_name="kind-worker" target_node_type="local_node" type="endpoint" 1.000000
cilium_node_connectivity_status source_cluster="kind-kind" source_node_name="kind-worker" target_cluster="kind-kind" target_node_name="kind-worker" target_node_type="local_node" type="node" 1.000000
cilium_node_connectivity_status source_cluster="kind-kind" source_node_name="kind-worker" target_cluster="kind-kind" target_node_name="kind-worker2" target_node_type="remote_intra_cluster" type="endpoint" 0.000000
```

Fixes: d9e1ff8 ("cilium-health: Remove unnecessary goroutine")

[1]: e9f97cd ("Ensures prometheus metrics associated with a deleted node are no longer reported.")

Signed-off-by: Chris Tarazi <chris@isovalent.com>
Signed-off-by: Nicolas Busseneau <nicolas@isovalent.com>
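For readers unfamiliar with this class of bug, here is a minimal Go sketch of how a stale node reference can bring a deleted metric back, and how checking node membership before writing avoids it. This is not the code from the upstream commit: the `prober` type, its fields, and the `recordUnchecked` / `recordChecked` helpers are hypothetical, and a plain map stands in for the Prometheus metric.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// prober is a hypothetical, heavily simplified stand-in for a health prober.
// It is NOT the Cilium implementation; all names here are illustrative.
type prober struct {
	mu      sync.Mutex
	nodes   map[string]struct{} // nodes the prober currently knows about
	metrics map[string]float64  // stand-in for a per-node connectivity gauge
}

func newProber(names ...string) *prober {
	p := &prober{
		nodes:   make(map[string]struct{}),
		metrics: make(map[string]float64),
	}
	for _, n := range names {
		p.nodes[n] = struct{}{}
	}
	return p
}

// removeNode forgets a node and deletes its metric series.
func (p *prober) removeNode(name string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	delete(p.nodes, name)
	delete(p.metrics, name)
}

// recordUnchecked writes a probe result unconditionally. If the caller holds
// a stale reference to a removed node, the deleted series is re-created.
func (p *prober) recordUnchecked(name string, status float64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.metrics[name] = status
}

// recordChecked writes a probe result only if the node is still known.
// It reports whether the result was recorded.
func (p *prober) recordChecked(name string, status float64) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, ok := p.nodes[name]; !ok {
		return false // node was deleted after the probe started; drop the result
	}
	p.metrics[name] = status
	return true
}

// reported returns the sorted node names that currently have a metric series.
func (p *prober) reported() []string {
	p.mu.Lock()
	defer p.mu.Unlock()
	out := make([]string, 0, len(p.metrics))
	for n := range p.metrics {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	stale := "kind-worker2" // node deleted while a probe result was in flight

	buggy := newProber("kind-control-plane", stale)
	buggy.removeNode(stale)
	buggy.recordUnchecked(stale, 0) // series for the deleted node comes back
	fmt.Println("unchecked:", buggy.reported())

	fixed := newProber("kind-control-plane", stale)
	fixed.removeNode(stale)
	fixed.recordChecked(stale, 0) // result is dropped, series stays deleted
	fmt.Println("checked:", fixed.reported())
}
```

In this sketch the unchecked path ends up reporting `kind-worker2` again with a failing status, which is the symptom described above, while the checked path leaves the deleted node out of the reported series.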
c982c11 to 39cbf19
/test-backport-1.12
Metrics changes LGTM.
My commit looks good. Thanks!
/test-1.17-4.9
/test-1.23-4.19
Once this PR is merged, a GitHub action will update the labels of these PRs: