docs: document criteria-wildcard overlay composition#657
docs: document criteria-wildcard overlay composition#657yuanchen8911 merged 2 commits intoNVIDIA:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds documentation describing "criteria-wildcard overlays" and clarifies resolver behavior when criteria matching yields multiple independent maximal leaves: candidate de-duplication filters out ancestor matches, each maximal leaf’s Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/contributor/data.md`:
- Around line 471-479: The fenced code block that shows the YAML structure
beginning with "appliedOverlays:" is missing a language identifier; update that
fenced block to include the "yaml" language tag (e.g., change ``` to ```yaml) so
markdownlint stops flagging it and the YAML list (appliedOverlays, - base, -
gb200-any-training, etc.) is correctly annotated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 22b7b3a3-6df9-4c89-9d46-8559e4385876
📒 Files selected for processing (4)
docs/contributor/data.mddocs/integrator/recipe-development.mdrecipes/overlays/b200-any-training.yamlrecipes/overlays/gb200-any-training.yaml
03fad42 to
f3ff9ed
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
docs/contributor/data.md (1)
471-479:⚠️ Potential issue | 🟡 MinorAdd language identifier to fenced code block.
The code block showing the
appliedOverlayslist is missing a language specifier. This was flagged by markdownlint and in a previous review comment.📝 Proposed fix
-``` +```yaml appliedOverlays: - base - monitoring-hpaAs per coding guidelines, all fenced code blocks should have a language specified.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/contributor/data.md` around lines 471 - 479, The fenced code block that shows the appliedOverlays list is missing a language identifier; update the opening fence from ``` to ```yaml so the block containing "appliedOverlays", "- base", "- monitoring-hpa", "- gb200-any-training", "- eks", "- eks-training", "- gb200-eks-training" is marked as YAML and satisfies markdownlint and the project's coding guidelines.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/contributor/data.md`:
- Line 469: Replace the phrase "merged together" with the simpler word "merged"
in the sentence that describes how the two maximal leaves (`gb200-eks-training`
and `gb200-any-training`) are combined; locate the line containing "When a query
specifies `{service: eks, accelerator: gb200, intent: training}`" and update the
ending to read "...Their inheritance chains are resolved and merged."
---
Duplicate comments:
In `@docs/contributor/data.md`:
- Around line 471-479: The fenced code block that shows the appliedOverlays list
is missing a language identifier; update the opening fence from ``` to ```yaml
so the block containing "appliedOverlays", "- base", "- monitoring-hpa", "-
gb200-any-training", "- eks", "- eks-training", "- gb200-eks-training" is marked
as YAML and satisfies markdownlint and the project's coding guidelines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 60cef04c-9378-4fc6-8aa7-b3daf8603562
📒 Files selected for processing (4)
docs/contributor/data.mddocs/integrator/recipe-development.mdrecipes/overlays/b200-any-training.yamlrecipes/overlays/gb200-any-training.yaml
d625238 to
347dc2e
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/contributor/data.md`:
- Line 706: Add a blank line immediately before the fenced code block starting
with "```yaml" in the docs contributor file so the Markdown linter passes;
locate the fenced block (the snippet beginning with "```yaml" and the YAML
content "User Query: { service: "eks", os: "ubuntu", accelerator: "gb200",
intent: "training" }") and insert one blank line above it.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: c0821ab6-5da3-4538-b5f0-961e73eb377f
📒 Files selected for processing (4)
docs/contributor/data.mddocs/integrator/recipe-development.mdrecipes/overlays/b200-any-training.yamlrecipes/overlays/gb200-any-training.yaml
347dc2e to
ccf5686
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
docs/contributor/data.md (1)
705-706:⚠️ Potential issue | 🟡 MinorAdd a blank line before the fenced YAML block.
Markdownlint MD031 is still triggered here; insert one empty line between the heading line and ```yaml.
Proposed fix
**Example 4: Multiple Maximal Matches (Fully Specific Query)** + ```yaml User Query: { service: "eks", os: "ubuntu", accelerator: "gb200", intent: "training" }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/contributor/data.md` around lines 705 - 706, Add a single blank line between the heading "**Example 4: Multiple Maximal Matches (Fully Specific Query)**" and the following fenced YAML block delimiter "```yaml" to satisfy markdownlint MD031; locate the heading text and insert one empty line immediately after it so the file shows a blank line before the ```yaml fence.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/contributor/data.md`:
- Around line 450-470: Update the docs/example in docs/contributor/data.md to
explicitly call out the GB200-on-EKS exception: add a short note explaining that
GB200 on EKS is handled by transport-explicit variants (nccl-all-reduce-bw-net /
nccl-all-reduce-bw-nvls) and NOT matched by the default nccl-all-reduce-bw
validator; adjust the example and/or wildcard explanation for the
gb200-any-training.yaml snippet so it does not imply EKS will match the default
nccl-all-reduce-bw rule, and reference the implementation in
validators/performance/nccl_all_reduce_bw_constraint.go (mentioning
variantDefault vs variantNET/variantNVLS) to guide readers to the authoritative
behavior.
---
Duplicate comments:
In `@docs/contributor/data.md`:
- Around line 705-706: Add a single blank line between the heading "**Example 4:
Multiple Maximal Matches (Fully Specific Query)**" and the following fenced YAML
block delimiter "```yaml" to satisfy markdownlint MD031; locate the heading text
and insert one empty line immediately after it so the file shows a blank line
before the ```yaml fence.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 3dedd548-6b36-47de-8d0b-35e0b43f3fa3
📒 Files selected for processing (4)
docs/contributor/data.mddocs/integrator/recipe-development.mdrecipes/overlays/b200-any-training.yamlrecipes/overlays/gb200-any-training.yaml
ccf5686 to
7fe6fb3
Compare
7fe6fb3 to
ead34b5
Compare
mchmarny
left a comment
There was a problem hiding this comment.
Good PR — this fills a real documentation gap. The criteria-wildcard pattern is non-obvious and the "dead code" misunderstanding is exactly the failure mode you'd expect without docs.
Verified the technical claims against the Go code (FindMatchingOverlays, filterToMaximalLeaves, initBaseMergedSpec, Specificity()) — the documentation accurately describes the resolver mechanics. The mermaid flowchart correction (removing the incorrectly-shown gb200-eks-ubuntu-training for a no-os query) is a good fix of a pre-existing error.
One thing to fix: the recipe-development.md inline YAML example omits the checks: field that exists in both the actual file and the data.md example — see inline comment.
Minor suggestions: readability of the opening paragraph, and a note about merge ordering when wildcard and specific overlays set the same constraint name.
| service: any # Wildcard — matches eks, oke, gke, etc. | ||
| accelerator: gb200 | ||
| intent: training | ||
| validation: | ||
| performance: | ||
| constraints: | ||
| - name: nccl-all-reduce-bw | ||
| value: ">= 720" | ||
| ``` | ||
|
|
||
| Only use this pattern when the content is truly uniform across the wildcard dimension — if values diverge per service, keep them inline in each service-specific overlay. See [Data Architecture](../contributor/data.md#criteria-wildcard-overlays) for when to use wildcard overlays vs mixins. |
There was a problem hiding this comment.
This example is labeled # gb200-any-training.yaml but omits the checks: field. The actual file has both checks: and constraints: under validation.performance, and the data.md example in this same PR includes them correctly:
validation:
performance:
checks: # ← missing here
- nccl-all-reduce-bw
constraints:
- name: nccl-all-reduce-bw
value: \">= 720\"Since the comment implies this is the file content, readers will expect it to match. Add the checks: block to keep the two doc pages consistent.
| - Conflict detection: a mixin constraint or component that conflicts with the inheritance chain or a previously applied mixin produces an error | ||
| - When a snapshot is provided, mixin constraints are evaluated against it after merging; if any fail, the entire composed candidate is invalid and falls back to base-only output. In plain query mode (no snapshot), mixin constraints are merged but not evaluated | ||
|
|
||
| ### Criteria-Wildcard Overlays |
There was a problem hiding this comment.
Nit: this sentence packs a lot of machinery into one paragraph. Consider splitting it into two sentences after "...not just one". The parenthetical reference to two different anchor links mid-sentence makes it harder to parse on first read.
Suggestion:
The resolver picks them up automatically because
FindMatchingOverlayscan return multiple independent maximal-leaf overlays for a single query. The criteria matching algorithm returns every overlay that matches, and ancestors of a matched leaf are filtered out — but sibling leaves whose criteria independently match are kept and their inheritance chains are resolved and merged in parallel. See Criteria Matching Algorithm and Recipe Generation Process for details.
There was a problem hiding this comment.
Moved the two xref links to a trailing "See … for details." sentence; kept the middle sentence intact since its "siblings kept, ancestors filtered" contrast is the whole point of the section.
| | Content applies based on query criteria | Content applies based on explicit opt-in | | ||
| | The set of consumers is determined by criteria matching | The set of consumers is an enumerated list of overlays | | ||
| | Adopt-by-default is desired for new matching overlays | Each consumer should reference it explicitly | |
There was a problem hiding this comment.
Good table. One gap: it doesn't mention ordering/priority. When a criteria-wildcard overlay and a service-specific leaf both set the same constraint name, which wins? The merge algorithm section says "same-named constraints are overridden" — so ordering matters. Worth adding a row or a note clarifying that wildcard overlays (lower specificity) are merged first and can be overridden by more-specific leaves.
There was a problem hiding this comment.
Added a "Precedence" paragraph after the table with a concrete gb200-any-training vs gb200-eks-training example. Verified against FindMatchingOverlays (pkg/recipe/metadata_store.go:280, sort ascending on Specificity()) — wildcards merge first, more-specific leaves override.
Went with a paragraph rather than a table row since precedence is cross-cutting, not a wildcard-vs-mixin distinction.
| (base.yaml is the root spec, not an overlay candidate: FindMatchingOverlays | ||
| iterates s.Overlays only. The base spec is always applied as the seed for | ||
| the merged output — it is not selected by criteria matching.) | ||
|
|
||
| Maximal leaves (after filterToMaximalLeaves): | ||
| - monitoring-hpa (no matching descendant) | ||
| - gb200-any-training (no matching descendant) | ||
| - gb200-eks-ubuntu-training (most-specific overlay; eks, eks-training, | ||
| gb200-eks-training are ancestors and are filtered out) | ||
|
|
||
| Result: Each maximal leaf's inheritance chain is resolved and merged onto | ||
| the base spec. Ancestors removed by the filter re-enter the output via | ||
| chain resolution (step 3), so the final appliedOverlays is | ||
| [base, monitoring-hpa, gb200-any-training, eks, eks-training, | ||
| gb200-eks-training, gb200-eks-ubuntu-training]. | ||
| ``` | ||
|
|
There was a problem hiding this comment.
This is a big improvement over the original Example 4. The old version incorrectly showed gb200-eks-ubuntu-training matching a query that included os: "ubuntu" without explaining the maximal-leaf filtering. The new version clearly shows the pre-filter → filter → result pipeline, and the parenthetical about base.yaml being held in s.Base separately is a useful clarification.
One minor thing: monitoring-hpa has intent: any which means all its criteria are wildcards — the label says "Specificity: 0" which is correct per the Specificity() implementation. But the table shows it alongside eks at "Specificity: 1" in the numbering (rows 1 and 2). Might be clearer to add a column label like "(row number)" vs "(specificity score)" so readers don't conflate the two.
| # This overlay is NOT referenced by any recipe via spec.base or spec.mixins. | ||
| # The resolver picks it up for every GB200+training query because its | ||
| # `service: any` criterion wildcard-matches any service (EKS, OKE, etc.), | ||
| # contributing the GB200-wide NCCL bandwidth target without duplicating it | ||
| # in each service-specific overlay. | ||
| # | ||
| # See docs/contributor/data.md#criteria-wildcard-overlays for details. | ||
|
|
There was a problem hiding this comment.
Good addition — this is exactly the kind of header comment that prevents someone from deleting the file thinking it's dead code. The cross-reference to the doc section is a nice touch.
|
@mchmarny thanks for the review — fixed all four. PTAL. |
363fb81 to
c72a957
Compare
Clarify the third composition mechanism alongside spec.base inheritance
and spec.mixins composition. The resolver returns every overlay whose
criteria match a query, so overlays like gb200-any-training.yaml and
b200-any-training.yaml get applied automatically without being referenced
by any other recipe.
- docs/contributor/data.md: add Criteria-Wildcard Overlays subsection
with worked example, naming convention, and when-to-use-which
guidance comparing wildcard overlays vs mixins
- docs/integrator/recipe-development.md: add cross-cutting overlay
bullet to the overview and a short inline example
- recipes/overlays/{gb200,b200}-any-training.yaml: add header comments
pointing to the new doc section so readers don't assume these files
are dead code
c72a957 to
4d9630a
Compare
Summary
Document the criteria-wildcard overlay pattern (e.g.,
gb200-any-training.yaml,b200-any-training.yaml) as a third composition mechanism alongsidespec.baseinheritance andspec.mixinscomposition. Adds header comments to the two existing wildcard overlays so readers don't mistake them for dead code.Motivation / Context
The
-any-*overlays are picked up by the resolver via wildcard criteria matching, not by any explicitspec.baseorspec.mixinsreference. This behavior is used in production (e.g., the GB200 NCCL bandwidth target applies to every GB200 + training query regardless of service), but wasn't explained anywhere user-facing. A reviewer reasonably assumedgb200-any-training.yamlwas dead code because nothing references it — the docs didn't surface the third composition mechanism.Fixes: N/A
Related: ADR-005: Overlay Refactoring
Type of Change
Component(s) Affected
pkg/recipe)docs/,examples/)Implementation Notes
docs/contributor/data.md— Adds a### Criteria-Wildcard Overlayssubsection under## Multi-Level Inheritance, alongside the existing### Mixin Compositionsection. Covers:gb200-any-training.yamland resultingappliedOverlayslist-any-naming conventiondocs/integrator/recipe-development.md— Adds a cross-cutting-overlays bullet to the overview and a short inline example. Links to the contributor-doc section for details.recipes/overlays/{gb200,b200}-any-training.yaml— 8-line header comment on each, explaining the pattern and pointing to the doc section.No code changes. No schema changes. Pure documentation + YAML comments.
Testing
Full
make qualifyskipped perCLAUDE.local.mdfor doc-only + YAML-comment-only changes.make lintskipped because no Go files changed (nogolangci-lintgate applies). YAML changes are comment additions only — cannot affect overlay parsing or merge behavior. Sanity-queried the recipe to confirm no regression.Risk Assessment
Rollout notes: N/A
Checklist
git commit -S)