Commit 8750b34

CNS Prometheus and Grafana examples (#1366)
* cns prometheus examples
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
* grafana samples
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
1 parent 2c77774 commit 8750b34

4 files changed, +1248 -0 lines changed


cns/doc/examples/metrics/README.md

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
# Azure CNS metrics
azure-cns exposes metrics via Prometheus on `:10092/metrics`.

## Scraping
Prometheus can be configured using these examples:
- a [podMonitor](podMonitor.yaml), if using prometheus-operator or kube-prometheus
- manually via this equivalent [scrape_config](scrape_config.yaml)

## Monitoring
To view all available CNS metrics once Prometheus is correctly configured to scrape:
```promql
count ({job="kube-system/azure-cns"}) by (__name__)
```

CNS exposes standard Go and Prometheus client metrics such as `go_goroutines`, `go_gc*`, and more. Prometheus itself also records `up` for each scrape target.
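
Because Prometheus records `up` for every scrape target, it can serve as a quick per-node liveness check for CNS. A minimal sketch, assuming the same `job` label used in the query above; run each expression separately:

```promql
# 1 if the last scrape of the CNS target succeeded, 0 if it failed
up{job="kube-system/azure-cns"}

# number of CNS targets currently failing to scrape
# (returns no data, rather than 0, when every target is up)
count (up{job="kube-system/azure-cns"} == 0)
```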

Metrics designed to be customer-facing are generally prefixed with `cx_` and can be listed similarly:
```promql
count ({__name__=~"cx.*",job="kube-system/azure-cns"}) by (__name__)
```
At time of writing, the following cx metrics are exposed (key metrics in **bold**):
- **cx_ipam_available_ips** (IPs reserved by the Node but not assigned to Pods yet)
- cx_ipam_batch_size
- cx_ipam_current_available_ips
- cx_ipam_expect_available_ips
- **cx_ipam_max_ips** (maximum IPs the Node can reserve from the Subnet)
- cx_ipam_pending_programming_ips
- cx_ipam_pending_release_ips
- **cx_ipam_pod_allocated_ips** (IPs assigned to Pods on the Node)
- cx_ipam_requested_ips
- **cx_ipam_total_ips** (IPs reserved by the Node from the Subnet)

These metrics may be used to gain insight into the current state of the cluster's IPAM.

For example, to view the current IP count requested by each node:
```promql
sum (cx_ipam_requested_ips{job="kube-system/azure-cns"}) by (instance)
```
To view the current IP count allocated to each node:
```promql
sum (cx_ipam_total_ips{job="kube-system/azure-cns"}) by (instance)
```
> Note: if these two values do not converge after some time, that indicates an IP provisioning error.
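
The gap between the requested and allocated counts can also be queried directly. This is a minimal sketch that reuses the two queries above; the subtraction and the `!= 0` filter are illustrative choices and are not taken from the sample dashboard:

```promql
# requested minus allocated IPs, per node
# nodes where the two counts already match are filtered out,
# so a node that keeps appearing here has not converged
(
  sum (cx_ipam_requested_ips{job="kube-system/azure-cns"}) by (instance)
-
  sum (cx_ipam_total_ips{job="kube-system/azure-cns"}) by (instance)
) != 0
```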

To view the current IP count assigned to pods, per node:
```promql
sum (cx_ipam_pod_allocated_ips{job="kube-system/azure-cns"}) by (instance)
```
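
Dividing this by the allocated count gives the share of each node's reserved IPs that pods are using. A sketch, assuming `cx_ipam_pod_allocated_ips` and `cx_ipam_total_ips` carry the meanings given in the metric list:

```promql
# fraction of a node's reserved IPs currently assigned to pods
# (multiply by 100 for a percentage; a node with zero reserved IPs yields NaN)
  sum (cx_ipam_pod_allocated_ips{job="kube-system/azure-cns"}) by (instance)
/
  sum (cx_ipam_total_ips{job="kube-system/azure-cns"}) by (instance)
```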

## Visualizing
A sample Grafana dashboard is included at [grafana.json](grafana.json).

Visualizations included are:
- Per Node
  - CNS Status (Up/Down)
  - Requested IPs
  - Reserved IPs
  - Used IPs
  - Requested/Reserved/Used vs Time
- Per Cluster
  - Total Reserved IPs vs Time
  - Total Used IPs vs Time
  - Reserved and Assigned vs Time
  - Cluster Subnet Utilization Percentage vs Time
  - Cluster Subnet Utilization Total vs Time
  - Node Headroom (how many additional Nodes can be added to the Cluster based on the Subnet capacity)
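
Cluster-wide totals like those in the Per Cluster panels can be approximated by summing the per-node metrics without grouping by `instance`. The expressions below are a sketch of that kind of aggregation; they are not copied from `grafana.json`, and the dashboard's actual queries may differ. Run each expression separately:

```promql
# IPs reserved from the Subnet by all nodes in the cluster
sum (cx_ipam_total_ips{job="kube-system/azure-cns"})

# IPs assigned to pods across the whole cluster
sum (cx_ipam_pod_allocated_ips{job="kube-system/azure-cns"})
```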
