Skip to content

Add inference performance validation to GKE inference-dynamo overlay #937

@yuanchen8911

Description

@yuanchen8911

Problem

aicr validate --phase performance against the GKE H100 COS Dynamo
inference recipe is a no-op:

[cli] phase requested but no checks defined in recipe;
      phase will be empty: phase=performance
[cli] running validation phase: phase=performance catalog=4 selected=0
[cli] phase completed: phase=performance status=skipped

The validator catalog ships 4 performance-phase validators, but the
recipe's overlay doesn't subscribe to any of them, so the phase
reports status=skipped. End-to-end perf signal is silently missing
for the GKE flow.

Root cause

recipes/overlays/h100-gke-cos-inference-dynamo.yaml has no
validation.performance block. The EKS counterpart
recipes/overlays/h100-eks-ubuntu-inference-dynamo.yaml has one
(added in #641, "feat(validator): add inference performance
validation"). PR #641 wired the new validator into the EKS overlay
but didn't extend it to GKE — a sibling-overlay omission, not an
intentional gap (git history shows no performance block ever
existed on the GKE overlay).

Reproduction

aicr recipe --service gke --accelerator h100 --os cos \
  --intent inference --platform dynamo --output gke-rec.yaml
aicr validate --recipe gke-rec.yaml --phase performance

Expected: at least one performance-phase check runs against the
deployed inference workload.
Actual: validators=0 passed=0 failed=0 status=skipped.

Suggested fix

Add a performance block to
recipes/overlays/h100-gke-cos-inference-dynamo.yaml mirroring the
EKS overlay's stanza:

performance:
  checks:
    - inference-perf
  constraints:
    - name: inference-throughput
      value: ">= 5000"   # tok/s — placeholder, tune for H100+COS
    - name: inference-ttft-p99
      value: "<= 200"    # ms p99 — placeholder, tune for H100+COS

The EKS thresholds are noted in-source as "placeholder thresholds —
pending empirical tuning". Adopting them on GKE keeps the two flows
aligned and unblocks --phase performance. Threshold tuning for
COS / H100-mega-80gb (vs EKS Ubuntu / H100-80gb) is a follow-up.

Context

Surfaced while validating a freshly-deployed vllm-agg.yaml workload
on GKE H100 COS — --phase deployment (4/4 pass) and --phase conformance (11/11 pass) both ran; only --phase performance was
silently empty.

Related: #641 (where this should have been extended).

Metadata

Metadata

Assignees

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions