docs: document criteria-wildcard overlay composition by yuanchen8911 · Pull Request #657 · NVIDIA/aicr

yuanchen8911 · 2026-04-23T19:37:38Z

Summary

Document the criteria-wildcard overlay pattern (e.g., gb200-any-training.yaml, b200-any-training.yaml) as a third composition mechanism alongside spec.base inheritance and spec.mixins composition. Adds header comments to the two existing wildcard overlays so readers don't mistake them for dead code.

Motivation / Context

The -any-* overlays are picked up by the resolver via wildcard criteria matching, not by any explicit spec.base or spec.mixins reference. This behavior is used in production (e.g., the GB200 NCCL bandwidth target applies to every GB200 + training query regardless of service), but wasn't explained anywhere user-facing. A reviewer reasonably assumed gb200-any-training.yaml was dead code because nothing references it — the docs didn't surface the third composition mechanism.

Fixes: N/A
Related: ADR-005: Overlay Refactoring
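
The wildcard matching described above can be sketched in a few lines of Python. This is a minimal illustration, not the resolver's Go implementation: the `matches` and `find_matching` helpers are hypothetical names, and the criteria listed for `gb200-eks-training` and `b200-any-training` are assumptions inferred from their file names.

```python
ANY = "any"  # wildcard value, as in `service: any`

# Assumed criteria, inferred from the overlay file names.
OVERLAYS = {
    "gb200-any-training": {"service": ANY, "accelerator": "gb200", "intent": "training"},
    "gb200-eks-training": {"service": "eks", "accelerator": "gb200", "intent": "training"},
    "b200-any-training": {"service": ANY, "accelerator": "b200", "intent": "training"},
}


def matches(criteria, query):
    """True when every non-wildcard criterion equals the query's value."""
    return all(
        wanted == ANY or query.get(field) == wanted
        for field, wanted in criteria.items()
    )


def find_matching(query):
    """Return every overlay whose criteria match, not just the best one."""
    return [name for name, criteria in OVERLAYS.items() if matches(criteria, query)]


eks = {"service": "eks", "accelerator": "gb200", "intent": "training", "os": "ubuntu"}
oke = {"service": "oke", "accelerator": "gb200", "intent": "training", "os": "ubuntu"}

print(find_matching(eks))  # ['gb200-any-training', 'gb200-eks-training']
print(find_matching(oke))  # ['gb200-any-training']
```

Nothing references `gb200-any-training` by name in this sketch, yet it is returned for both queries, which is why such a file can look like dead code to a reader who only searches for references.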

Type of Change

Documentation update

Component(s) Affected

Recipe engine / data (pkg/recipe)
Docs/examples (docs/, examples/)

Implementation Notes

docs/contributor/data.md — Adds a ### Criteria-Wildcard Overlays subsection under ## Multi-Level Inheritance, alongside the existing ### Mixin Composition section. Covers:

How the resolver's all-match behavior drives this pattern
Worked example with gb200-any-training.yaml and resulting appliedOverlays list
-any- naming convention
When to prefer criteria-wildcard overlays vs mixins (table)
Caveat: only appropriate when values are uniform across the wildcard dimension (H100 NCCL targets diverge by cloud, so they correctly stay inline)

docs/integrator/recipe-development.md — Adds a cross-cutting-overlays bullet to the overview and a short inline example. Links to the contributor-doc section for details.

recipes/overlays/{gb200,b200}-any-training.yaml — 8-line header comment on each, explaining the pattern and pointing to the doc section.

No code changes. No schema changes. Pure documentation + YAML comments.
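
The uniformity caveat above can be expressed as a simple check. The sketch below is illustrative only: `placement` is a hypothetical helper, the GB200 services listed are examples, and the H100 values are placeholder strings rather than real targets.

```python
def placement(values_by_service):
    """Suggest where a setting belongs, given its value in each service."""
    distinct = set(values_by_service.values())
    return "wildcard overlay" if len(distinct) == 1 else "inline per service"


# GB200: the PR describes one NCCL target regardless of service.
gb200 = {"eks": ">= 720", "oke": ">= 720"}
# H100: placeholder strings standing in for cloud-specific targets.
h100 = {"eks": "<eks-target>", "gke": "<gke-target>"}

print(placement(gb200))  # wildcard overlay
print(placement(h100))   # inline per service
```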

Testing

yamllint recipes/overlays/gb200-any-training.yaml recipes/overlays/b200-any-training.yaml
# Exit 0 — no issues

# Sanity check: gb200-any-training still loads and contributes nccl-all-reduce-bw >= 720
aicr recipe --service eks --accelerator gb200 --intent training --os ubuntu | grep -E "gb200-any-training|>= 720"
# appliedOverlays includes gb200-any-training
# nccl-all-reduce-bw constraint with value '>= 720' present in hydrated recipe

Full make qualify skipped per CLAUDE.local.md for doc-only + YAML-comment-only changes. make lint skipped because no Go files changed (no golangci-lint gate applies). YAML changes are comment additions only — cannot affect overlay parsing or merge behavior. Sanity-queried the recipe to confirm no regression.

Risk Assessment

Low — Doc-only and YAML-comment-only. No behavior change.

Rollout notes: N/A

Checklist

I updated docs if user-facing behavior changed (this PR is the doc update)
Changes follow existing patterns in the codebase
Commits are cryptographically signed (git commit -S)

coderabbitai · 2026-04-23T19:44:54Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


📝 Walkthrough

Adds documentation describing "criteria-wildcard overlays" and clarifies resolver behavior when criteria matching yields multiple independent maximal leaves: candidate de-duplication filters out ancestor matches, each maximal leaf’s spec.base chain is resolved independently, and results are merged (reintroducing ancestors via inheritance-chain resolution). Revises Example 4 and the Recipe Generation Process mermaid flow to reflect base-as-seed, maximal-leaf handling, and parallel chain merges. Adds an end-to-end wildcard-overlay example showing appliedOverlays, contrasts wildcard overlays with mixins (with guidance on uniformity across wildcard dimensions), and inserts header comments into two overlay YAML files. Documentation-only changes.
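
The ancestor de-duplication step mentioned in this walkthrough can be sketched as follows. It is a simplified Python model, not the Go `filterToMaximalLeaves` function: the `PARENT` map encodes assumed `spec.base` links, and the matched list is an illustrative input.

```python
# Assumed spec.base links (child -> parent); None means the root base spec.
PARENT = {
    "eks": None,
    "eks-training": "eks",
    "gb200-eks-training": "eks-training",
    "gb200-any-training": None,
}


def ancestors(name):
    """All overlays reachable by following spec.base upward from name."""
    found = set()
    parent = PARENT[name]
    while parent is not None:
        found.add(parent)
        parent = PARENT[parent]
    return found


def filter_to_maximal_leaves(matched):
    """Drop any matched overlay that is an ancestor of another matched overlay."""
    shadowed = set()
    for name in matched:
        shadowed |= ancestors(name)
    return [name for name in matched if name not in shadowed]


matched = ["eks", "eks-training", "gb200-any-training", "gb200-eks-training"]
print(filter_to_maximal_leaves(matched))
# ['gb200-any-training', 'gb200-eks-training']
```

The wildcard overlay survives the filter because it is a sibling of the service-specific leaf, not an ancestor of it.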

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and concisely summarizes the main change: documenting the criteria-wildcard overlay composition mechanism as a new pattern in the recipe engine.
Description check	✅ Passed	The description is directly related to the changeset, clearly explaining the motivation, implementation details, testing performed, and risk assessment for the documentation updates.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/contributor/data.md`:
- Around line 471-479: The fenced code block that shows the YAML structure
beginning with "appliedOverlays:" is missing a language identifier; update that
fenced block to include the "yaml" language tag (e.g., change ``` to ```yaml) so
markdownlint stops flagging it and the YAML list (appliedOverlays, - base, -
gb200-any-training, etc.) is correctly annotated.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 22b7b3a3-6df9-4c89-9d46-8559e4385876

📥 Commits

Reviewing files that changed from the base of the PR and between 997589a and 346a85e.

📒 Files selected for processing (4)

docs/contributor/data.md
docs/integrator/recipe-development.md
recipes/overlays/b200-any-training.yaml
recipes/overlays/gb200-any-training.yaml

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

docs/contributor/data.md (1)
471-479: ⚠️ Potential issue | 🟡 Minor

Add language identifier to fenced code block.

The code block showing the appliedOverlays list is missing a language specifier. This was flagged by markdownlint and in a previous review comment.
📝 Proposed fix
-```
+```yaml
 appliedOverlays:
   - base
   - monitoring-hpa

As per coding guidelines, all fenced code blocks should have a language specified.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/contributor/data.md` around lines 471 - 479, The fenced code block that
shows the appliedOverlays list is missing a language identifier; update the
opening fence from ``` to ```yaml so the block containing "appliedOverlays", "-
base", "- monitoring-hpa", "- gb200-any-training", "- eks", "- eks-training", "-
gb200-eks-training" is marked as YAML and satisfies markdownlint and the
project's coding guidelines.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/contributor/data.md`:
- Line 469: Replace the phrase "merged together" with the simpler word "merged"
in the sentence that describes how the two maximal leaves (`gb200-eks-training`
and `gb200-any-training`) are combined; locate the line containing "When a query
specifies `{service: eks, accelerator: gb200, intent: training}`" and update the
ending to read "...Their inheritance chains are resolved and merged."

---

Duplicate comments:
In `@docs/contributor/data.md`:
- Around line 471-479: The fenced code block that shows the appliedOverlays list
is missing a language identifier; update the opening fence from ``` to ```yaml
so the block containing "appliedOverlays", "- base", "- monitoring-hpa", "-
gb200-any-training", "- eks", "- eks-training", "- gb200-eks-training" is marked
as YAML and satisfies markdownlint and the project's coding guidelines.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 60cef04c-9378-4fc6-8aa7-b3daf8603562

📥 Commits

Reviewing files that changed from the base of the PR and between 346a85e and 03fad42.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/contributor/data.md`:
- Line 706: Add a blank line immediately before the fenced code block starting
with "```yaml" in the docs contributor file so the Markdown linter passes;
locate the fenced block (the snippet beginning with "```yaml" and the YAML
content "User Query: { service: "eks", os: "ubuntu", accelerator: "gb200",
intent: "training" }") and insert one blank line above it.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: c0821ab6-5da3-4538-b5f0-961e73eb377f

📥 Commits

Reviewing files that changed from the base of the PR and between f3ff9ed and 66569c3.


coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

docs/contributor/data.md (1)

705-706: ⚠️ Potential issue | 🟡 Minor

Add a blank line before the fenced YAML block.

Markdownlint MD031 is still triggered here; insert one empty line between the heading line and ```yaml.

Proposed fix

 **Example 4: Multiple Maximal Matches (Fully Specific Query)**
+
 ```yaml
 User Query: { service: "eks", os: "ubuntu", accelerator: "gb200", intent: "training" }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/contributor/data.md` around lines 705 - 706, Add a single blank line
between the heading "**Example 4: Multiple Maximal Matches (Fully Specific
Query)**" and the following fenced YAML block delimiter "```yaml" to satisfy
markdownlint MD031; locate the heading text and insert one empty line
immediately after it so the file shows a blank line before the ```yaml fence.
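
Both fence findings in this review (a missing language tag, and a missing blank line before the fence as in MD031) can be approximated with a short script. This is a rough sketch and not markdownlint's implementation: `lint_fences` is a hypothetical helper, and it only handles backtick fences.

```python
FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def lint_fences(markdown):
    """Return (line_number, message) pairs for opening fences with problems."""
    findings = []
    lines = markdown.splitlines()
    in_fence = False
    for index, line in enumerate(lines):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if not in_fence:
            if stripped == FENCE:
                findings.append((index + 1, "fence has no language"))
            if index > 0 and lines[index - 1].strip() != "":
                findings.append((index + 1, "no blank line before fence"))
        in_fence = not in_fence
    return findings


sample = "\n".join([
    "**Example 4: Multiple Maximal Matches**",
    FENCE + "yaml",
    "User Query: {}",
    FENCE,
    "",
    FENCE,
    "appliedOverlays:",
    FENCE,
])

print(lint_fences(sample))
# [(2, 'no blank line before fence'), (6, 'fence has no language')]
```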

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/contributor/data.md`:
- Around line 450-470: Update the docs/example in docs/contributor/data.md to
explicitly call out the GB200-on-EKS exception: add a short note explaining that
GB200 on EKS is handled by transport-explicit variants (nccl-all-reduce-bw-net /
nccl-all-reduce-bw-nvls) and NOT matched by the default nccl-all-reduce-bw
validator; adjust the example and/or wildcard explanation for the
gb200-any-training.yaml snippet so it does not imply EKS will match the default
nccl-all-reduce-bw rule, and reference the implementation in
validators/performance/nccl_all_reduce_bw_constraint.go (mentioning
variantDefault vs variantNET/variantNVLS) to guide readers to the authoritative
behavior.

---

Duplicate comments:
In `@docs/contributor/data.md`:
- Around line 705-706: Add a single blank line between the heading "**Example 4:
Multiple Maximal Matches (Fully Specific Query)**" and the following fenced YAML
block delimiter "```yaml" to satisfy markdownlint MD031; locate the heading text
and insert one empty line immediately after it so the file shows a blank line
before the ```yaml fence.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 3dedd548-6b36-47de-8d0b-35e0b43f3fa3

📥 Commits

Reviewing files that changed from the base of the PR and between 66569c3 and 347dc2e.


mchmarny

Good PR — this fills a real documentation gap. The criteria-wildcard pattern is non-obvious and the "dead code" misunderstanding is exactly the failure mode you'd expect without docs.

Verified the technical claims against the Go code (FindMatchingOverlays, filterToMaximalLeaves, initBaseMergedSpec, Specificity()) — the documentation accurately describes the resolver mechanics. The mermaid flowchart correction (removing the incorrectly-shown gb200-eks-ubuntu-training for a no-os query) is a good fix of a pre-existing error.

One thing to fix: the recipe-development.md inline YAML example omits the checks: field that exists in both the actual file and the data.md example — see inline comment.

Minor suggestions: readability of the opening paragraph, and a note about merge ordering when wildcard and specific overlays set the same constraint name.

mchmarny · 2026-04-23T21:03:24Z

+    service: any         # Wildcard — matches eks, oke, gke, etc.
+    accelerator: gb200
+    intent: training
+  validation:
+    performance:
+      constraints:
+        - name: nccl-all-reduce-bw
+          value: ">= 720"
+```
+
+Only use this pattern when the content is truly uniform across the wildcard dimension — if values diverge per service, keep them inline in each service-specific overlay. See [Data Architecture](../contributor/data.md#criteria-wildcard-overlays) for when to use wildcard overlays vs mixins.


This example is labeled # gb200-any-training.yaml but omits the checks: field. The actual file has both checks: and constraints: under validation.performance, and the data.md example in this same PR includes them correctly:

validation:
  performance:
    checks:                      # ← missing here
      - nccl-all-reduce-bw
    constraints:
      - name: nccl-all-reduce-bw
        value: ">= 720"

Since the comment implies this is the file content, readers will expect it to match. Add the checks: block to keep the two doc pages consistent.

mchmarny · 2026-04-23T21:03:35Z

 - Conflict detection: a mixin constraint or component that conflicts with the inheritance chain or a previously applied mixin produces an error
 - When a snapshot is provided, mixin constraints are evaluated against it after merging; if any fail, the entire composed candidate is invalid and falls back to base-only output. In plain query mode (no snapshot), mixin constraints are merged but not evaluated

+### Criteria-Wildcard Overlays


Nit: this sentence packs a lot of machinery into one paragraph. Consider splitting it into two sentences after "...not just one". The parenthetical reference to two different anchor links mid-sentence makes it harder to parse on first read.

Suggestion:

The resolver picks them up automatically because FindMatchingOverlays can return multiple independent maximal-leaf overlays for a single query. The criteria matching algorithm returns every overlay that matches, and ancestors of a matched leaf are filtered out — but sibling leaves whose criteria independently match are kept and their inheritance chains are resolved and merged in parallel. See Criteria Matching Algorithm and Recipe Generation Process for details.

yuanchen8911 · 2026-04-24

Moved the two xref links to a trailing "See … for details." sentence; kept the middle sentence intact since its "siblings kept, ancestors filtered" contrast is the whole point of the section.

mchmarny · 2026-04-23T21:03:44Z

+| Content applies based on query criteria | Content applies based on explicit opt-in |
+| The set of consumers is determined by criteria matching | The set of consumers is an enumerated list of overlays |
+| Adopt-by-default is desired for new matching overlays | Each consumer should reference it explicitly |


Good table. One gap: it doesn't mention ordering/priority. When a criteria-wildcard overlay and a service-specific leaf both set the same constraint name, which wins? The merge algorithm section says "same-named constraints are overridden" — so ordering matters. Worth adding a row or a note clarifying that wildcard overlays (lower specificity) are merged first and can be overridden by more-specific leaves.

yuanchen8911 · 2026-04-24

Added a "Precedence" paragraph after the table with a concrete gb200-any-training vs gb200-eks-training example. Verified against FindMatchingOverlays (pkg/recipe/metadata_store.go:280, sort ascending on Specificity()): wildcards merge first, and more-specific leaves override.

Went with a paragraph rather than a table row since precedence is cross-cutting, not a wildcard-vs-mixin distinction.
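
The precedence rule discussed in this thread can be sketched as below. It is a simplified Python model of "sort ascending on specificity, then let later values override", not the Go merge code: `specificity` and `merge_constraints` are hypothetical helpers, the criteria are assumed from the file names, and the EKS-specific value is a placeholder used only to show the override.

```python
ANY = "any"


def specificity(criteria):
    """Count the criteria fields that are not wildcards."""
    return sum(1 for value in criteria.values() if value != ANY)


def merge_constraints(overlays):
    """Merge least-specific first so later, more-specific values win."""
    merged = {}
    for overlay in sorted(overlays, key=lambda o: specificity(o["criteria"])):
        for constraint in overlay["constraints"]:
            merged[constraint["name"]] = constraint["value"]
    return merged


wildcard = {
    "name": "gb200-any-training",
    "criteria": {"service": ANY, "accelerator": "gb200", "intent": "training"},
    "constraints": [{"name": "nccl-all-reduce-bw", "value": ">= 720"}],
}
specific = {
    "name": "gb200-eks-training",
    "criteria": {"service": "eks", "accelerator": "gb200", "intent": "training"},
    # Hypothetical override, used only to show precedence.
    "constraints": [{"name": "nccl-all-reduce-bw", "value": "<eks-specific-target>"}],
}

print(merge_constraints([specific, wildcard]))
# {'nccl-all-reduce-bw': '<eks-specific-target>'}
```

The input order does not matter here because the overlays are sorted by specificity before merging; with only the wildcard overlay present, the `>= 720` value is kept.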

mchmarny · 2026-04-23T21:03:56Z

+(base.yaml is the root spec, not an overlay candidate: FindMatchingOverlays
+iterates s.Overlays only. The base spec is always applied as the seed for
+the merged output — it is not selected by criteria matching.)
+
+Maximal leaves (after filterToMaximalLeaves):
+  - monitoring-hpa             (no matching descendant)
+  - gb200-any-training         (no matching descendant)
+  - gb200-eks-ubuntu-training  (most-specific overlay; eks, eks-training,
+                                gb200-eks-training are ancestors and are filtered out)
+
+Result: Each maximal leaf's inheritance chain is resolved and merged onto
+the base spec. Ancestors removed by the filter re-enter the output via
+chain resolution (step 3), so the final appliedOverlays is
+[base, monitoring-hpa, gb200-any-training, eks, eks-training,
+gb200-eks-training, gb200-eks-ubuntu-training].
 ```



This is a big improvement over the original Example 4. The old version incorrectly showed gb200-eks-ubuntu-training matching a query that included os: "ubuntu" without explaining the maximal-leaf filtering. The new version clearly shows the pre-filter → filter → result pipeline, and the parenthetical about base.yaml being held in s.Base separately is a useful clarification.

One minor thing: monitoring-hpa has intent: any which means all its criteria are wildcards — the label says "Specificity: 0" which is correct per the Specificity() implementation. But the table shows it alongside eks at "Specificity: 1" in the numbering (rows 1 and 2). Might be clearer to add a column label like "(row number)" vs "(specificity score)" so readers don't conflate the two.
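
The chain-resolution step in the revised Example 4 can be reproduced with a small model. This is a sketch under assumptions, not the resolver itself: the `PARENT` map encodes `spec.base` links inferred from the example, and `chain` and `applied_overlays` are hypothetical helper names.

```python
# Assumed spec.base links (child -> parent); None means the root base spec.
PARENT = {
    "monitoring-hpa": None,
    "gb200-any-training": None,
    "eks": None,
    "eks-training": "eks",
    "gb200-eks-training": "eks-training",
    "gb200-eks-ubuntu-training": "gb200-eks-training",
}


def chain(name):
    """Root-first inheritance chain ending at name."""
    links = [name]
    parent = PARENT[name]
    while parent is not None:
        links.append(parent)
        parent = PARENT[parent]
    return list(reversed(links))


def applied_overlays(leaves):
    """Seed with base, then append each leaf's chain, skipping repeats."""
    applied = ["base"]
    for leaf in leaves:
        for name in chain(leaf):
            if name not in applied:
                applied.append(name)
    return applied


leaves = ["monitoring-hpa", "gb200-any-training", "gb200-eks-ubuntu-training"]
print(applied_overlays(leaves))
# ['base', 'monitoring-hpa', 'gb200-any-training', 'eks', 'eks-training',
#  'gb200-eks-training', 'gb200-eks-ubuntu-training']
```

The ancestors removed by the maximal-leaf filter reappear here because each leaf's chain is walked from its root, which matches the final `appliedOverlays` list quoted in the example.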

mchmarny · 2026-04-23T21:04:03Z

+# This overlay is NOT referenced by any recipe via spec.base or spec.mixins.
+# The resolver picks it up for every GB200+training query because its
+# `service: any` criterion wildcard-matches any service (EKS, OKE, etc.),
+# contributing the GB200-wide NCCL bandwidth target without duplicating it
+# in each service-specific overlay.
+#
+# See docs/contributor/data.md#criteria-wildcard-overlays for details.
+


Good addition — this is exactly the kind of header comment that prevents someone from deleting the file thinking it's dead code. The cross-reference to the doc section is a nice touch.

yuanchen8911 · 2026-04-23T22:54:24Z

@mchmarny thanks for the review — fixed all four. PTAL.

Clarify the third composition mechanism alongside spec.base inheritance and spec.mixins composition. The resolver returns every overlay whose criteria match a query, so overlays like gb200-any-training.yaml and b200-any-training.yaml get applied automatically without being referenced by any other recipe.

- docs/contributor/data.md: add Criteria-Wildcard Overlays subsection with worked example, naming convention, and when-to-use-which guidance comparing wildcard overlays vs mixins
- docs/integrator/recipe-development.md: add cross-cutting overlay bullet to the overview and a short inline example
- recipes/overlays/{gb200,b200}-any-training.yaml: add header comments pointing to the new doc section so readers don't assume these files are dead code

yuanchen8911 requested review from a team as code owners April 23, 2026 19:37

yuanchen8911 added area/recipes area/docs P2 Minor defects; minor implications (no SLA commitment) documentation labels Apr 23, 2026

github-actions Bot added the size/M label Apr 23, 2026

yuanchen8911 requested review from ayuskauskas and mchmarny April 23, 2026 19:40

coderabbitai Bot reviewed Apr 23, 2026

Comment thread docs/contributor/data.md Outdated

yuanchen8911 force-pushed the docs/overlay-composition-mechanisms branch 2 times, most recently from 03fad42 to f3ff9ed Compare April 23, 2026 19:47

coderabbitai Bot reviewed Apr 23, 2026

Comment thread docs/contributor/data.md Outdated

yuanchen8911 force-pushed the docs/overlay-composition-mechanisms branch 3 times, most recently from d625238 to 347dc2e Compare April 23, 2026 19:57

coderabbitai Bot reviewed Apr 23, 2026

Comment thread docs/contributor/data.md

yuanchen8911 force-pushed the docs/overlay-composition-mechanisms branch from 347dc2e to ccf5686 Compare April 23, 2026 20:01

coderabbitai Bot reviewed Apr 23, 2026

Comment thread docs/contributor/data.md

yuanchen8911 force-pushed the docs/overlay-composition-mechanisms branch from ccf5686 to 7fe6fb3 Compare April 23, 2026 20:05

github-actions Bot added size/L and removed size/M labels Apr 23, 2026

yuanchen8911 force-pushed the docs/overlay-composition-mechanisms branch from 7fe6fb3 to ead34b5 Compare April 23, 2026 20:41

mchmarny reviewed Apr 23, 2026

mchmarny removed the P2 Minor defects; minor implications (no SLA commitment) label Apr 23, 2026

mchmarny assigned yuanchen8911 Apr 23, 2026

yuanchen8911 requested a review from mchmarny April 23, 2026 22:53

yuanchen8911 enabled auto-merge (squash) April 23, 2026 23:19

yuanchen8911 modified the milestone: v0.12 Apr 23, 2026

yuanchen8911 added P1 Usable but not as documented; workaround exists (6-month SLA) P2 Minor defects; minor implications (no SLA commitment) and removed P1 Usable but not as documented; workaround exists (6-month SLA) P2 Minor defects; minor implications (no SLA commitment) labels Apr 23, 2026

yuanchen8911 force-pushed the docs/overlay-composition-mechanisms branch 2 times, most recently from 363fb81 to c72a957 Compare April 24, 2026 04:26

yuanchen8911 force-pushed the docs/overlay-composition-mechanisms branch from c72a957 to 4d9630a Compare April 24, 2026 04:39

mchmarny approved these changes Apr 24, 2026

Merge branch 'main' into docs/overlay-composition-mechanisms

f83765d

yuanchen8911 merged commit fd16996 into NVIDIA:main Apr 24, 2026
65 checks passed
