feat(ci): add HPA pod autoscaling validation to inference workflow #163

Merged
dims merged 1 commit into NVIDIA:main from dims:add-hpa-gpu-metrics-test on Feb 20, 2026

dims (Collaborator) commented Feb 20, 2026

Summary

  • Add pod autoscaling (HPA) validation to the H100 inference workflow, covering CNCF AI conformance requirement pod_autoscaling
  • Create HPA manifest targeting vLLM worker with gpu_utilization custom metric from prometheus-adapter
  • Use maxReplicas=1 to validate the metrics pipeline (DCGM exporter → Prometheus → prometheus-adapter → HPA) without triggering actual scaling
  • Add HPA debug diagnostics on failure
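
For illustration, the HPA described above could look roughly like the following sketch. It builds an `autoscaling/v2` manifest as a plain dict with `maxReplicas` pinned to 1. The target kind, target name, namespace, metric type (`Pods`), and threshold are hypothetical placeholders and are not taken from this PR.

```python
import json


def build_hpa_manifest(target_kind, target_name, namespace,
                       metric_name="gpu_utilization", target_average="80"):
    """Return an autoscaling/v2 HorizontalPodAutoscaler manifest as a dict.

    All argument values are supplied by the caller; the defaults for the
    threshold and the Pods metric type are assumptions for illustration.
    """
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{target_name}-hpa", "namespace": namespace},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",  # assumed; depends on the target kind
                "kind": target_kind,
                "name": target_name,
            },
            # min == max == 1: the HPA still fetches metrics,
            # but it can never change the replica count.
            "minReplicas": 1,
            "maxReplicas": 1,
            "metrics": [
                {
                    "type": "Pods",
                    "pods": {
                        "metric": {"name": metric_name},
                        "target": {
                            "type": "AverageValue",
                            "averageValue": target_average,
                        },
                    },
                }
            ],
        },
    }


# Hypothetical target; the real workload name is not given in this PR.
manifest = build_hpa_manifest("Deployment", "vllm-worker", "inference")
print(json.dumps(manifest, indent=2))
```

Because `kubectl apply -f -` accepts JSON as well as YAML, the printed manifest can be piped to it directly.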

Test plan

  • Verify HPA reads gpu_utilization from prometheus-adapter (AbleToScale=True, currentMetrics non-empty)
  • Verify existing inference test steps are unaffected
  • Verify HPA cleanup runs unconditionally
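
The first check in the test plan can be sketched as a small helper over the JSON that `kubectl get hpa <name> -o json` returns. The function name and the sample objects are illustrative only; they are not the workflow's actual implementation or real cluster output.

```python
def hpa_metrics_ready(hpa):
    """Return True if the HPA reports AbleToScale=True and has current metrics.

    `hpa` is the parsed JSON of a HorizontalPodAutoscaler object.
    """
    status = hpa.get("status") or {}
    # Map condition type -> status string ("True" / "False" / "Unknown").
    conditions = {
        c.get("type"): c.get("status")
        for c in (status.get("conditions") or [])
    }
    able_to_scale = conditions.get("AbleToScale") == "True"
    # currentMetrics stays empty or null until the HPA has read a value.
    has_metrics = bool(status.get("currentMetrics"))
    return able_to_scale and has_metrics


# Illustrative inputs (not real cluster output).
healthy = {
    "status": {
        "conditions": [{"type": "AbleToScale", "status": "True"}],
        "currentMetrics": [{"type": "Pods"}],
    }
}
no_metrics = {
    "status": {
        "conditions": [{"type": "AbleToScale", "status": "True"}],
        "currentMetrics": None,
    }
}
print(hpa_metrics_ready(healthy), hpa_metrics_ready(no_metrics))
```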

Validate the custom metrics pipeline (DCGM → Prometheus →
prometheus-adapter → custom metrics API) that HPA consumes for
GPU-aware pod autoscaling. Queries the custom.metrics.k8s.io API
directly for gpu_utilization, gpu_memory_used, and gpu_power_usage
metrics.

DCGM exporter runs as a DaemonSet in gpu-operator namespace, so
Prometheus labels GPU metrics with namespace=gpu-operator. We query
that namespace to validate the full metrics pipeline.

Dynamo uses PodCliqueSets (not Deployments), so we validate the
metrics API availability rather than creating an HPA object.

This covers CNCF AI Conformance requirement #8b (pod autoscaling).

Signed-off-by: Davanum Srinivas <dsrinivas@nvidia.com>
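
As a rough sketch of the direct API query the commit message describes, the helpers below build the custom metrics API paths for the three GPU metrics in the `gpu-operator` namespace and check that a response contains samples. The API version (`v1beta1`), the per-pod wildcard path, and the helper names are assumptions; the workflow itself may query the API differently.

```python
API_ROOT = "/apis/custom.metrics.k8s.io/v1beta1"  # assumed API version
GPU_METRICS = ("gpu_utilization", "gpu_memory_used", "gpu_power_usage")


def pod_metric_path(namespace, metric):
    """Path for a metric across all pods ('*') in a namespace."""
    return f"{API_ROOT}/namespaces/{namespace}/pods/*/{metric}"


def has_samples(response, metric):
    """Return True if a MetricValueList response holds samples for `metric`.

    In v1beta1 each item carries the metric name in `metricName`.
    """
    items = response.get("items") or []
    return any(item.get("metricName") == metric for item in items)


# DCGM exporter pods live in gpu-operator, so that is the namespace queried.
for name in GPU_METRICS:
    print(pod_metric_path("gpu-operator", name))
```

Each printed path can be passed to `kubectl get --raw <path>`, and the parsed JSON response handed to `has_samples`.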
dims force-pushed the add-hpa-gpu-metrics-test branch from 8fbe5ab to 46148df on February 20, 2026 11:58
dims merged commit e55f9a2 into NVIDIA:main on Feb 20, 2026
13 checks passed