Ensures prometheus metrics associated with a deleted node are no longer reported.

[ upstream commit e9f97cd ]

When a node is deleted from a cluster, metrics associated with that node
are still exported to Prometheus. Rather than requiring an agent restart,
we want to delete these metrics dynamically when a node is removed from
the cluster.

This PR ensures node_connectivity_status and node_connectivity_latency
stop reporting metrics for nodes that are no longer present in the
cluster.
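
The partial-match deletion described above can be sketched without the Prometheus client. This is a minimal illustration, not the Cilium implementation: `gaugeVec`, `labels`, `deleteNode`, and `newDemoVec` are hypothetical stand-ins, and the label names (`source_cluster`, `source_node_name`, `target_cluster`, `target_node_name`) are assumed for the example. It shows why two deletions per metric are needed: the removed node can appear on either the source or the target side of a series.

```go
package main

import "fmt"

// labels is a hypothetical stand-in for a Prometheus label set.
type labels map[string]string

// series pairs one label set with its current value.
type series struct {
	labels labels
	value  float64
}

// gaugeVec is a toy, non-thread-safe stand-in for a labelled gauge vector.
type gaugeVec struct {
	series []series
}

// set appends a series; the sketch does not deduplicate label sets.
func (g *gaugeVec) set(l labels, v float64) {
	g.series = append(g.series, series{labels: l, value: v})
}

// contains reports whether have includes every name/value pair in want.
func contains(have, want labels) bool {
	for k, v := range want {
		if got, ok := have[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// deletePartialMatch removes every series whose labels include all pairs in
// match and returns the number of series removed.
func (g *gaugeVec) deletePartialMatch(match labels) int {
	kept := g.series[:0] // filter in place, reusing the backing array
	deleted := 0
	for _, s := range g.series {
		if contains(s.labels, match) {
			deleted++
			continue
		}
		kept = append(kept, s)
	}
	g.series = kept
	return deleted
}

// deleteNode drops series in which the node is either the source or the target.
func deleteNode(g *gaugeVec, cluster, node string) int {
	n := g.deletePartialMatch(labels{"source_cluster": cluster, "source_node_name": node})
	n += g.deletePartialMatch(labels{"target_cluster": cluster, "target_node_name": node})
	return n
}

// newDemoVec builds three series between nodes a, b, and c in one cluster.
func newDemoVec() *gaugeVec {
	pair := func(src, dst string) labels {
		return labels{
			"source_cluster": "default", "source_node_name": src,
			"target_cluster": "default", "target_node_name": dst,
		}
	}
	g := &gaugeVec{}
	g.set(pair("node-a", "node-b"), 1)
	g.set(pair("node-b", "node-a"), 1)
	g.set(pair("node-a", "node-c"), 1)
	return g
}

func main() {
	g := newDemoVec()
	// node-b appears once as a source and once as a target.
	fmt.Println(deleteNode(g, "default", "node-b"), len(g.series)) // prints "2 1"
}
```

Matching on only the source pair or only the target pair would leave half of the stale series behind, which is why each metric is cleaned up twice.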

[ Backporter's notes: Original PR was adapted! ]

The original PR depends (mainly!) on 2 other PRs that haven't been
backported and are fairly substantial.
Given this, I've opted to adapt the original implementation to surface the
fix while minimizing impact, with 2 updates:
1. pkg/metrics/interfaces had not yet introduced the pkg/metrics/metric
   wrappers as of this release. Hence deletableVec was adapted to use the
   current implementation. (Referring to commit: 84ea383)
2. pkg/node/manager/manager was adapted to provide for metrics deletion when a
   node is deleted. A subsequent PR refactored the manager metrics structure,
   which the original PR used. (Referring to commit: c49ef45)
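
The first adaptation can be sketched as interface embedding plus a no-op implementation. This is a simplified illustration under assumed names, not the code in pkg/metrics: `deleter`, `gauge`, `noopGauge`, `liveGauge`, and `cleanup` are hypothetical, and the `target_node_name` label is assumed. The point is that callers can invoke the deletion methods unconditionally, whether the metric is enabled or replaced by a no-op.

```go
package main

import "fmt"

// deleter groups the deletion methods, similar in spirit to deletableVec.
type deleter interface {
	DeletePartialMatch(match map[string]string) int
	Reset()
}

// gauge embeds deleter, so every gauge implementation must support deletion.
type gauge interface {
	deleter
	Set(node string, v float64)
}

// noopGauge stands in for a disabled metric: every call does nothing.
type noopGauge struct{}

func (noopGauge) DeletePartialMatch(map[string]string) int { return 0 }
func (noopGauge) Reset()                                   {}
func (noopGauge) Set(string, float64)                      {}

// liveGauge is a toy enabled metric keyed by target node name.
type liveGauge struct {
	values map[string]float64
}

func newLiveGauge() *liveGauge { return &liveGauge{values: map[string]float64{}} }

func (g *liveGauge) Set(node string, v float64) { g.values[node] = v }
func (g *liveGauge) Reset()                     { g.values = map[string]float64{} }

// DeletePartialMatch only understands the assumed target_node_name label.
func (g *liveGauge) DeletePartialMatch(match map[string]string) int {
	node, ok := match["target_node_name"]
	if !ok {
		return 0
	}
	if _, found := g.values[node]; !found {
		return 0
	}
	delete(g.values, node)
	return 1
}

// Compile-time checks that both types satisfy the widened interface.
var (
	_ gauge = noopGauge{}
	_ gauge = (*liveGauge)(nil)
)

// cleanup works the same way for enabled and disabled metrics.
func cleanup(g gauge, node string) int {
	return g.DeletePartialMatch(map[string]string{"target_node_name": node})
}

func main() {
	live := newLiveGauge()
	live.Set("node-b", 1)
	fmt.Println(cleanup(live, "node-b"), cleanup(noopGauge{}, "node-b")) // prints "1 0"
}
```

Widening the interface forces every implementation, including the no-op one, to grow the new methods, which is why the diff below adds four stub methods to gaugeVec.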

Signed-off-by: Fernand Galiana <fernand.galiana@isovalent.com>
derailed committed Nov 2, 2023
1 parent ecb2250 commit 472ca7f
Showing 2 changed files with 42 additions and 0 deletions.
19 changes: 19 additions & 0 deletions pkg/metrics/interfaces.go
@@ -27,10 +27,19 @@ type CounterVec interface {
}

type GaugeVec interface {
	deletableVec

	WithLabelValues(lvls ...string) prometheus.Gauge
	prometheus.Collector
}

type deletableVec interface {
	Delete(ll prometheus.Labels) bool
	DeleteLabelValues(lvs ...string) bool
	DeletePartialMatch(labels prometheus.Labels) int
	Reset()
}

var (
	NoOpMetric prometheus.Metric = &metric{}
	NoOpCollector prometheus.Collector = &collector{}
@@ -130,6 +139,16 @@ type gaugeVec struct {
	prometheus.Collector
}

func (*gaugeVec) Delete(ll prometheus.Labels) bool {
	return false
}
func (*gaugeVec) DeleteLabelValues(lvs ...string) bool {
	return false
}
func (*gaugeVec) DeletePartialMatch(labels prometheus.Labels) int {
	return 0
}
func (*gaugeVec) Reset() {}
func (gv *gaugeVec) WithLabelValues(lvls ...string) prometheus.Gauge {
	return NoOpGauge
}
23 changes: 23 additions & 0 deletions pkg/node/manager/manager.go
@@ -663,6 +663,7 @@ func (m *Manager) NodeDeleted(n nodeTypes.Node) {
	}

	m.metricNumNodes.Dec()
	processNodeDeletion(n.Cluster, n.Name)

	entry.mutex.Lock()
	delete(m.nodes, nodeIdentity)
@@ -673,6 +674,28 @@ func (m *Manager) NodeDeleted(n nodeTypes.Node) {
	entry.mutex.Unlock()
}

func processNodeDeletion(clusterName, nodeName string) {
	// Removes all connectivity status associated with the deleted node.
	_ = metrics.NodeConnectivityStatus.DeletePartialMatch(prometheus.Labels{
		metrics.LabelSourceCluster:  clusterName,
		metrics.LabelSourceNodeName: nodeName,
	})
	_ = metrics.NodeConnectivityStatus.DeletePartialMatch(prometheus.Labels{
		metrics.LabelTargetCluster:  clusterName,
		metrics.LabelTargetNodeName: nodeName,
	})

	// Removes all connectivity latency associated with the deleted node.
	_ = metrics.NodeConnectivityLatency.DeletePartialMatch(prometheus.Labels{
		metrics.LabelSourceCluster:  clusterName,
		metrics.LabelSourceNodeName: nodeName,
	})
	_ = metrics.NodeConnectivityLatency.DeletePartialMatch(prometheus.Labels{
		metrics.LabelTargetCluster:  clusterName,
		metrics.LabelTargetNodeName: nodeName,
	})
}

// GetNodeIdentities returns a list of all node identities stored in the node
// manager.
func (m *Manager) GetNodeIdentities() []nodeTypes.Identity {