Resources ‐ Resources Utilization

hoberger-rh edited this page May 25, 2026 · 3 revisions

Description

The Resources Utilization rule collects and aggregates cluster resource utilization data across all nodes. This is an informational rule that provides comprehensive visibility into how cluster resources are allocated and consumed.

Collected Metrics:

  • Capacity: Total resource capacity per node
  • Allocatable: Resources available for pod scheduling (capacity minus system reservations)
  • Requests: Sum of all pod resource requests on the node
  • Limits: Sum of all pod resource limits on the node
  • Utilization Levels: Categorization of resource usage (low < 50%, medium 50-74%, high ≥ 75%)
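
The utilization thresholds above can be sketched as a small function. This is a minimal illustration, not the rule's actual implementation: the function name `utilization_level` is hypothetical, and treating fractional values between 74% and 75% as medium is an assumption, since the page only lists whole-number ranges.

```python
def utilization_level(percentage: float) -> str:
    """Map a utilization percentage to the level names listed above."""
    if percentage < 0:
        raise ValueError("percentage must be non-negative")
    if percentage < 50:
        return "low"      # below 50%
    if percentage < 75:
        return "medium"   # 50% up to, but not including, 75% (assumed for fractions)
    return "high"         # 75% and above


if __name__ == "__main__":
    for value in (10, 50, 74, 75, 120):
        print(value, utilization_level(value))
```

Values above 100% are possible for limits, because limits can be overcommitted, and they fall into the high level in this sketch.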

Resource Categories:

  • Core Resources: cpu, memory, ephemeral-storage, hugepages-*
  • Extended Resources: Custom resources (GPUs, SR-IOV virtual functions, FPGAs, etc.)
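
The split between the two categories could be expressed roughly as follows. This is a sketch under assumptions: `resource_category` is a hypothetical helper, and anything that is not in the core list is simply treated as extended here. How the rule handles other built-in node resources such as `pods` is not stated on this page.

```python
# Core resource names taken from the list above; hugepages-* is matched by prefix.
CORE_RESOURCES = {"cpu", "memory", "ephemeral-storage"}


def resource_category(name: str) -> str:
    """Return "core" or "extended" for a node resource name."""
    if name in CORE_RESOURCES or name.startswith("hugepages-"):
        return "core"
    # Assumption: everything else (for example nvidia.com/gpu) is extended.
    return "extended"


if __name__ == "__main__":
    for name in ("cpu", "hugepages-2Mi", "nvidia.com/gpu"):
        print(name, "->", resource_category(name))
```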

This rule runs at the orchestrator level and aggregates data from all cluster nodes to provide a cluster-wide resource utilization view.

Value

Understanding resource utilization is critical for:

  • Capacity Planning: Identify when the cluster needs scaling (nodes near capacity)
  • Cost Optimization: Detect over-provisioned resources (low utilization)
  • Performance Troubleshooting: Find resource contention or throttling
  • Scheduling Issues: Understand why pods are pending (insufficient allocatable resources)
  • Resource Quotas: Validate that workload resource requests align with actual capacity
  • Hardware Validation: Ensure nodes expose expected resources (GPUs, SR-IOV devices, hugepages)
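
For the scheduling case, the underlying check is whether a pod's requests fit into what is left of a node's allocatable resources after existing requests are subtracted. The sketch below illustrates only that arithmetic. The function name `fits` and the plain-number inputs (CPU in millicores, memory in MiB) are assumptions for illustration, and the real scheduler also applies many other filters such as taints, affinity, and topology constraints.

```python
def fits(allocatable: dict, requested: dict, pod_request: dict) -> bool:
    """Return True if every resource the pod requests still fits on the node.

    allocatable: resource name -> amount available for scheduling
    requested:   resource name -> sum of requests of pods already on the node
    pod_request: resource name -> amount requested by the new pod
    """
    return all(
        requested.get(name, 0) + amount <= allocatable.get(name, 0)
        for name, amount in pod_request.items()
    )


if __name__ == "__main__":
    node_allocatable = {"cpu": 4000, "memory": 8192}
    node_requested = {"cpu": 3500, "memory": 2048}
    print(fits(node_allocatable, node_requested, {"cpu": 500, "memory": 1024}))
    print(fits(node_allocatable, node_requested, {"cpu": 1000}))
```

A resource that the node does not expose at all, such as a GPU on a node without GPUs, counts as zero allocatable, so a pod requesting it never fits.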

Diagnostics

Manually check resource utilization using these commands:

# List all nodes with CPU and memory capacity and allocatable resources
# (node roles are encoded in label keys, so use the ROLES column of plain `oc get nodes` for them)
oc get nodes -o custom-columns=NAME:.metadata.name,CPU-CAPACITY:.status.capacity.cpu,CPU-ALLOCATABLE:.status.allocatable.cpu,MEMORY-CAPACITY:.status.capacity.memory,MEMORY-ALLOCATABLE:.status.allocatable.memory

# Get detailed resource breakdown for a specific node
oc describe node <node-name>

# Check allocated resources percentage (from "Allocated resources" section)
oc describe node <node-name> | grep -A 10 "Allocated resources:"

# Show full capacity and allocatable maps for all nodes (extended resources such as GPUs and SR-IOV appear here)
oc get nodes -o json | jq '.items[] | {name: .metadata.name, capacity: .status.capacity, allocatable: .status.allocatable}'

# Check per-pod resource requests and limits on a node (the "Non-terminated Pods" table)
oc describe node <node-name> | sed -n '/Non-terminated Pods:/,/Allocated resources:/p'
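
The percentages shown in the "Allocated resources" section of `oc describe node` are requests or limits divided by the node's allocatable amount. To reproduce such a percentage from raw quantity strings, a sketch along the following lines can help. The helper names `parse_quantity` and `percent_of` are hypothetical. Only the common suffixes are handled (exponent notation and the P/E suffixes are not), and truncating the result to a whole number is an assumption.

```python
from decimal import Decimal

# Multipliers for common Kubernetes quantity suffixes.
_SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30, "Ti": Decimal(2) ** 40,
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6,
    "G": Decimal(10) ** 9, "T": Decimal(10) ** 12,
    "m": Decimal("0.001"),
}


def parse_quantity(quantity: str) -> Decimal:
    """Convert a quantity such as "500m", "2" or "4Gi" to base units."""
    # Check two-letter suffixes first so "Mi" is not mistaken for another suffix.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(quantity)


def percent_of(used: str, allocatable: str) -> int:
    """Whole-number percentage of allocatable consumed (truncated)."""
    total = parse_quantity(allocatable)
    if total == 0:
        return 0
    return int(parse_quantity(used) / total * 100)


if __name__ == "__main__":
    print(percent_of("500m", "2"))    # CPU: 0.5 of 2 cores
    print(percent_of("1Gi", "4Gi"))   # memory
```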

Interpreting Results

The rule returns resource data with utilization levels for each node:

  • low (< 50%): Healthy utilization, room for growth
  • medium (50-74%): Moderate utilization, monitor for growth
  • high (≥ 75%): Near capacity, consider scaling or rebalancing workloads

Example Output Structure:

{
  "nodes": [
    {
      "name": "worker-0",
      "roles": ["worker"],
      "schedulable": true,
      "core_resources": {
        "cpu": {
          "capacity": "16 cores",
          "allocatable": "15800m",
          "requests": {
            "allocated": "6933m",
            "percentage": "43%",
            "utilization_level": "low"
          },
          "limits": {
            "allocated": "2660m",
            "percentage": "16%",
            "utilization_level": "low"
          }
        },
        "memory": {
          "capacity": "64Gi",
          "allocatable": "62Gi",
          "requests": {
            "allocated": "25724Mi",
            "percentage": "40%",
            "utilization_level": "low"
          },
          "limits": {
            "allocated": "30212Mi",
            "percentage": "47%",
            "utilization_level": "low"
          }
        },
        "ephemeral-storage": {
          "capacity": "191655242229B",
          "allocatable": "176303616Ki",
          "requests": {
            "allocated": "0",
            "percentage": "0%",
            "utilization_level": "low"
          },
          "limits": {
            "allocated": "0",
            "percentage": "0%",
            "utilization_level": "low"
          }
        }
      },
      "extended_resources": {
        "nvidia.com/gpu": {
          "capacity": "2",
          "allocatable": "2",
          "requests": {
            "allocated": "1",
            "percentage": "50%",
            "utilization_level": "medium"
          }
        }
      }
    }
  ]
}
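
Output in the structure above can be scanned for entries that reached the high level. The following is a minimal sketch, assuming the field names shown in the example. The function `high_utilization` and the trimmed sample report with node `worker-1` are hypothetical and exist only for illustration.

```python
import json


def high_utilization(report: dict) -> list:
    """Return (node, resource, kind, percentage) tuples flagged as high."""
    findings = []
    for node in report.get("nodes", []):
        for section in ("core_resources", "extended_resources"):
            for resource, data in node.get(section, {}).items():
                # Extended resources may carry only "requests", so use .get().
                for kind in ("requests", "limits"):
                    entry = data.get(kind)
                    if entry and entry.get("utilization_level") == "high":
                        findings.append(
                            (node["name"], resource, kind, entry["percentage"])
                        )
    return findings


# Hypothetical, trimmed report used only to exercise the function.
sample = json.loads("""
{
  "nodes": [
    {
      "name": "worker-1",
      "core_resources": {
        "memory": {
          "requests": {"percentage": "80%", "utilization_level": "high"},
          "limits": {"percentage": "60%", "utilization_level": "medium"}
        }
      },
      "extended_resources": {
        "nvidia.com/gpu": {
          "requests": {"percentage": "50%", "utilization_level": "medium"}
        }
      }
    }
  ]
}
""")

if __name__ == "__main__":
    for finding in high_utilization(sample):
        print(finding)
```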

Resources