Summary
The L40 accelerator is declared in pkg/recipe/criteria.go (CriteriaAcceleratorL40 = "l40") but has zero overlays in recipes/overlays/. A user running aicr recipe --accelerator l40 --service <any> cannot resolve a usable recipe.
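The gap can be illustrated with a minimal Go sketch. It assumes overlays are discovered by a file-name prefix of the form <accelerator>-, which matches the file names proposed in this issue but is not confirmed against the real resolver. The overlaysFor helper and the sample file names are hypothetical; only the constant's name and value come from the issue text.

```go
package main

import (
	"fmt"
	"strings"
)

// CriteriaAcceleratorL40 mirrors the value quoted in this issue from
// pkg/recipe/criteria.go.
const CriteriaAcceleratorL40 = "l40"

// overlaysFor is a hypothetical helper (not an AICR API). It returns the
// overlay file names that start with "<accelerator>-", assuming the
// "<accelerator>-<service>-..." naming used by the files proposed here.
func overlaysFor(accelerator string, files []string) []string {
	var matches []string
	prefix := accelerator + "-"
	for _, f := range files {
		if strings.HasPrefix(f, prefix) {
			matches = append(matches, f)
		}
	}
	return matches
}

func main() {
	// Illustrative file names only; this is not a listing of recipes/overlays/.
	files := []string{
		"h100-eks-inference.yaml",
		"rtx-pro-6000-lke-inference.yaml",
	}
	// With no "l40-" files present, the accelerator has nothing to resolve.
	fmt.Println(len(overlaysFor(CriteriaAcceleratorL40, files))) // prints 0
}
```

A check along these lines could run in CI to flag any declared accelerator that has no overlay files, assuming the naming convention holds.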
Motivation / Context
Surfaced as an explicit "out of scope (track separately)" item in #969 — the validation phase coverage audit excluded L40 because no overlays exist to audit. Filing this so the gap has a dedicated tracker.
L40 / L40S are workhorse inference cards (Ada Lovelace, 48GB) and a common cost-per-throughput ($/TPS) choice for inference on the major clouds. Without overlays, AICR cannot serve this inference user segment without manual recipe authoring.
L40 / L40S cloud SKU availability
| Cloud | SKU |
|---|---|
| AWS EKS | g6e.{2,4,8,12,16,24,48}xlarge (L40S, 1–8x) |
| GCP GKE | g2-standard-* (L4 only; note: not L40/L40S) |
| Azure AKS | Standard_NV*ads_A10_v5 (A10; note: not L40), L40S on newer NVads-v6 series as it lands |
| OCI OKE | BM.GPU.L40S.4 (4x L40S 48GB) |
| Lambda Labs / CoreWeave / etc. | L40S widely available |
Note: This issue intentionally groups L40 and L40S because the criteria type is a single l40 enum value. If L40S vs L40 needs distinct overlays (different power/TDP, different NCCL profiles), file a separate issue to split the enum.
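As a rough illustration of the grouping described in the note, the sketch below maps both SKU names onto the single enum value. The normalizeAccelerator function is hypothetical and is not taken from pkg/recipe/criteria.go; it only shows what "one enum value for two SKUs" means in practice, and where a later enum split would change behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// CriteriaAcceleratorL40 mirrors the value quoted in this issue.
const CriteriaAcceleratorL40 = "l40"

// normalizeAccelerator is a hypothetical helper (not an AICR API). It folds
// the L40 and L40S SKU names into the single "l40" criteria value and reports
// whether the SKU was recognized.
func normalizeAccelerator(sku string) (string, bool) {
	switch strings.ToLower(strings.TrimSpace(sku)) {
	case "l40", "l40s":
		return CriteriaAcceleratorL40, true
	}
	return "", false
}

func main() {
	for _, sku := range []string{"L40", "L40S", "A10"} {
		value, ok := normalizeAccelerator(sku)
		fmt.Printf("%s -> %q (recognized: %t)\n", sku, value, ok)
	}
}
```

If the enum is later split, the "l40s" case would return its own value instead of sharing CriteriaAcceleratorL40.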
Suggested scope
Minimum viable for the first PR, extensions as follow-ups:
PR 1 (minimum): L40S on EKS (best-attested cloud):
- l40-eks-inference.yaml (primary use case)
- l40-eks-ubuntu-inference.yaml
- l40-eks-ubuntu-inference-dynamo.yaml and/or l40-eks-ubuntu-inference-nim.yaml (matches the H100 inference pattern)
- Per-accelerator constraint: Deployment.gpu-operator.version floor. L40 supported since v23.6; recommend >= v23.9.0 baseline
- No NCCL bandwidth threshold needed for single-node inference (the primary L40 use case); revisit if multi-node training overlays are added
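The version floor above amounts to a "greater than or equal" comparison on the gpu-operator version. The Go sketch below shows one way such a check could work, assuming versions follow a plain vMAJOR.MINOR.PATCH form. Both parseVersion and meetsFloor are hypothetical helpers written for illustration; the actual constraint syntax and evaluator used by AICR overlays are not shown in this issue.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "v23.9.0" (or "23.9.0") into [3]int{23, 9, 0}.
// Assumption: exactly three numeric components, no pre-release suffix.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(strings.TrimPrefix(v, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected MAJOR.MINOR.PATCH, got %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid component %q in %q: %w", p, v, err)
		}
		out[i] = n
	}
	return out, nil
}

// meetsFloor reports whether version >= floor, comparing major, then minor,
// then patch.
func meetsFloor(version, floor string) (bool, error) {
	v, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	f, err := parseVersion(floor)
	if err != nil {
		return false, err
	}
	for i := range v {
		if v[i] != f[i] {
			return v[i] > f[i], nil
		}
	}
	return true, nil // all components equal
}

func main() {
	const floor = "v23.9.0" // recommended baseline from this issue
	for _, version := range []string{"v23.6.0", "v23.9.0", "v24.3.0"} {
		ok, err := meetsFloor(version, floor)
		fmt.Println(version, ok, err)
	}
}
```

A component-wise integer comparison is used rather than a string comparison, because "v23.10.0" would otherwise sort below "v23.9.0".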
PR 2+: Same patterns for GKE, OKE; AKS once a non-A10 SKU is current there.
Training overlays are deferred. L40 is rarely chosen for multi-node training; if a use case emerges, file a follow-up issue with the NCCL threshold spec.
Each PR should:
- […] recipes/registry.yaml if accelerator-specific component pins are needed
- […] TestOverlayValidationPhaseFloor (deployment + conformance inherited from service-root via PR "feat(recipe): deliver deployment-phase floor at per-accelerator wildcards" #1001)
- […] (make bom-docs) if any chart pin differs
Out of scope (file separately)
- L40 multi-node training overlays: defer until requested.
- L40 vs L40S enum split: if SKU-specific config diverges, file a separate pkg/recipe/criteria.go change.
- A10 / A40 / L4: same Ada/Ampere inference-card class but distinct SKUs; not declared in criteria.go today.
Related
- recipes/overlays/h100-*-inference*.yaml (reference pattern)
- recipes/overlays/rtx-pro-6000-lke-*.yaml (single-node inference card pattern reference)