|
| 1 | +# Azure CNS metrics |
| 2 | +azure-cns exposes metrics via Prometheus on `:10092/metrics` |
| 3 | + |
| 4 | +## Scraping |
| 5 | +Prometheus can be configured using these examples: |
| 6 | +- a [podMonitor](podMonitor.yaml), if using prometheus-operator or kube-prometheus |
| 7 | +- manually via this equivalent [scrape_config](scrape_config.yaml) |
| 8 | + |
| 9 | +## Monitoring |
| 10 | +To view all available CNS metrics once Prometheus is correctly configured to scrape: |
| 11 | +```promql |
| 12 | +count ({job="kube-system/azure-cns"}) by (__name__) |
| 13 | +``` |
| 14 | + |
| 15 | +CNS exposes standard Go and Prometheus metrics such as `go_goroutines`, `go_gc*`, `up`, and more. |
| 16 | + |
| 17 | +Metrics designed to be customer-facing are generally prefixed with `cx_` and can be listed similarly: |
| 18 | +```promql |
| 19 | +count ({__name__=~"cx.*",job="kube-system/azure-cns"}) by (__name__) |
| 20 | +``` |
| 21 | +At time of writing, the following cx metrics are exposed (key metrics in **bold**): |
| 22 | +- **cx_ipam_available_ips** (IPs reserved by the Node but not assigned to Pods yet) |
| 23 | +- cx_ipam_batch_size |
| 24 | +- cx_ipam_current_available_ips |
| 25 | +- cx_ipam_expect_available_ips |
| 26 | +- **cx_ipam_max_ips** (maximum IPs the Node can reserve from the Subnet) |
| 27 | +- cx_ipam_pending_programming_ips |
| 28 | +- cx_ipam_pending_release_ips |
| 29 | +- **cx_ipam_pod_allocated_ips** (IPs assigned to Pods on the Node) |
| 30 | +- cx_ipam_requested_ips |
| 31 | +- **cx_ipam_total_ips** (IPs reserved by the Node from the Subnet) |
| 32 | + |
| 33 | +These metrics may be used to gain insight into the current state of the cluster's IPAM. |
| 34 | + |
| 35 | +For example, to view the current IP count requested by each node: |
| 36 | +```promql |
| 37 | +sum (cx_ipam_requested_ips{job="kube-system/azure-cns"}) by (instance) |
| 38 | +``` |
| 39 | +To view the current IP count allocated to each node: |
| 40 | +```promql |
| 41 | +sum (cx_ipam_total_ips{job="kube-system/azure-cns"}) by (instance) |
| 42 | +``` |
| 43 | +> Note: if these two values aren't converging after some time, that indicates an IP provisioning error. |
| 44 | + |
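|  | +To check convergence directly, one option is to subtract the two series per node. This is a sketch built only from the two queries above, not a query taken from the included dashboard; how long a non-zero result should be tolerated is left as an assumption for the operator: |
|  | +```promql |
|  | +# Sketch: requested minus reserved IPs per node; 0 means the two values have converged. |
|  | +sum (cx_ipam_requested_ips{job="kube-system/azure-cns"}) by (instance) - |
|  | +sum (cx_ipam_total_ips{job="kube-system/azure-cns"}) by (instance) |
|  | +``` |
|  | + |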
| 45 | +To view the current IP count assigned to pods, per node: |
| 46 | +```promql |
| 47 | +sum (cx_ipam_pod_allocated_ips{job="kube-system/azure-cns"}) by (instance) |
| 48 | +``` |
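|  | +The per-node queries can also be combined. As a minimal sketch (an illustration, not necessarily how the sample dashboard computes it), the fraction of each node's reserved IPs that is currently assigned to Pods is: |
|  | +```promql |
|  | +# Sketch: share of reserved IPs in use per node, from 0 to 1. |
|  | +# A node with no reserved IPs divides by zero and yields NaN or +Inf. |
|  | +sum (cx_ipam_pod_allocated_ips{job="kube-system/azure-cns"}) by (instance) / |
|  | +sum (cx_ipam_total_ips{job="kube-system/azure-cns"}) by (instance) |
|  | +``` |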
| 49 | + |
| 50 | +## Visualizing |
| 51 | +A sample Grafana dashboard is included at [grafana.json](grafana.json). |
| 52 | + |
| 53 | +Visualizations included are: |
| 54 | +- Per Node |
| 55 | + - CNS Status (Up/Down) |
| 56 | + - Requested IPs |
| 57 | + - Reserved IPs |
| 58 | + - Used IPs |
| 59 | + - Requested/Reserved/Used vs Time |
| 60 | +- Per Cluster |
| 61 | + - Total Reserved IPs vs Time |
| 62 | + - Total Used IPs vs Time |
| 63 | + - Reserved and Assigned vs Time |
| 64 | + - Cluster Subnet Utilization Percentage vs Time |
| 65 | + - Cluster Subnet Utilization Total vs Time |
| 66 | + - Node Headroom (how many additional Nodes can be added to the Cluster based on the Subnet capacity) |
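|  | + |
|  | +An Up/Down status like the one listed above can also be checked ad hoc with the standard `up` metric, which Prometheus sets to 1 when a scrape succeeds and 0 when it fails. The query below is a sketch under that assumption; the panel in the sample dashboard may use a different expression: |
|  | +```promql |
|  | +# Sketch: list CNS scrape targets that Prometheus currently fails to scrape. |
|  | +up{job="kube-system/azure-cns"} == 0 |
|  | +``` |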