docs(reference): add authoritative node labels and annotations reference by resker · Pull Request #254 · NVIDIA/topograph

resker · 2026-04-17T06:11:07Z

Summary

Adds docs/reference/node-labels.md — the authoritative reference for every label and annotation key written by Topograph, requested in #179.

Covers:

- The four default label keys (network.topology.nvidia.com/{accelerator,leaf,spine,core}) with topology type and semantics
- Per-provider matrix showing which providers emit accelerator (block) vs. leaf/spine/core (tree), including value format details (IB providers produce ClusterUUID.CliqueId; NetQ uses NMX DomainUUID)
- Relationship to nvidia.com/gpu.clique: on MNNVL systems the IB accelerator value matches; on non-MNNVL systems (B200/B300) gpu.clique is not set at all, making Topograph the only topology source
- FNV-64a hash truncation behavior for label values >63 chars
- Helm topologyNodeLabels configuration for customizing key prefixes
- "Without Topograph" reference: standard Kubernetes + cloud provider + GPU Operator labels that are available by default
- topograph.nvidia.com/ annotation keys (internal bookkeeping)
- NVSentinel integration pattern for topology-aware fault blast-radius analysis
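
To illustrate the blast-radius idea in the last item, here is a minimal Go sketch that groups nodes by a shared topology label value. It is not taken from Topograph or NVSentinel: `peersByLabel`, the node names, and the label values are hypothetical, and the input is a plain map rather than a Kubernetes client. Only the two label keys come from the list above.

```go
package main

import (
	"fmt"
	"sort"
)

// Label keys as listed in the PR summary.
const (
	labelAccelerator = "network.topology.nvidia.com/accelerator"
	labelLeaf        = "network.topology.nvidia.com/leaf"
)

// peersByLabel returns the sorted names of nodes that carry the same value
// for key as the faulty node. nodes maps node name to that node's labels.
// Hypothetical helper for illustration only.
func peersByLabel(nodes map[string]map[string]string, faulty, key string) []string {
	want, ok := nodes[faulty][key]
	if !ok || want == "" {
		// The faulty node has no such label, so no peers can be inferred.
		return nil
	}
	var peers []string
	for name, labels := range nodes {
		if name != faulty && labels[key] == want {
			peers = append(peers, name)
		}
	}
	sort.Strings(peers)
	return peers
}

func main() {
	// Made-up nodes and label values.
	nodes := map[string]map[string]string{
		"node-a": {labelAccelerator: "nvl-1", labelLeaf: "leaf-1"},
		"node-b": {labelAccelerator: "nvl-1", labelLeaf: "leaf-1"},
		"node-c": {labelAccelerator: "nvl-2", labelLeaf: "leaf-1"},
		"node-d": {labelLeaf: "leaf-2"},
	}
	fmt.Println("same accelerator domain:", peersByLabel(nodes, "node-a", labelAccelerator))
	fmt.Println("same leaf switch:", peersByLabel(nodes, "node-a", labelLeaf))
}
```

Because long label values may be replaced by a hash, a comparison like this should treat values as opaque strings and only test them for equality.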

Test plan

- Every label key verified against pkg/engines/k8s/labeler.go constants
- Provider matrix cross-checked against each provider's topology emission
- ClusterUUID.CliqueId format verified against pkg/providers/infiniband/common.go (nvidia-smi -q | grep "ClusterUUID\|CliqueId") and NVIDIA/k8s-device-plugin/internal/lm/imex.go
- DomainUUID format verified against pkg/providers/netq/nmx.go
- GPU_FABRIC_STATE_COMPLETED behavior verified against NVIDIA/go-nvlib/pkg/nvlib/device/device.go:IsFabricAttached()
- FNV-64a behavior verified against pkg/engines/k8s/labeler.go:checkLabel

Closes #179

Documents all Kubernetes node labels and annotations set by Topograph, including the four network.topology.nvidia.com/ labels written by the k8s and slinky engines, the topograph.nvidia.com/ annotation keys, FNV-64a hash truncation behavior for long values, and an NVSentinel integration example. Closes NVIDIA#179 Signed-off-by: Rob Esker <resker@nvidia.com>

… node-labels
- Clarify accelerator value format per provider: IB providers use ClusterUUID.CliqueId (same as nvidia.com/gpu.clique), NetQ uses NMX DomainUUID (different identifier format)
- Add note that gpu.clique is not set on non-MNNVL systems and Topograph is the only topology source in those environments
- Add nvidia.com/gpu.clique to the "Without Topograph" label table
Signed-off-by: Rob Esker <resker@nvidia.com>

The companion NVSentinel PR is not being filed at this time; removing the dormant-link trailer lets the NVSentinel integration section stand on its own as a self-contained reference. Signed-off-by: Rob Esker <resker@nvidia.com>

codecov · 2026-04-17T06:13:04Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.46%. Comparing base (1875ab8) to head (4480ab4).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #254   +/-   ##
=======================================
  Coverage   68.46%   68.46%           
=======================================
  Files          82       82           
  Lines        4842     4842           
=======================================
  Hits         3315     3315           
  Misses       1395     1395           
  Partials      132      132


greptile-apps · 2026-04-17T06:15:39Z

Greptile Summary

Adds docs/reference/node-labels.md as the authoritative reference for every label and annotation key written by Topograph (closes #179). All annotation key constants were verified against pkg/topology/topology.go, all four default label key constants were verified against pkg/engines/k8s/labeler.go, the FNV-64a hash truncation with x-prefix was confirmed from checkLabel(), the cw provider's incompatible vertex structure is now correctly documented, and the lambdai and netq block-value formats were cross-checked against their respective provider code.

Confidence Score: 5/5

Safe to merge — documentation only, all key facts verified against source code, one minor notation ambiguity that does not affect correctness.

All P0/P1 concerns from prior review rounds have been addressed: the x-prefixed FNV hash is now documented correctly, and cw/lambdai are now included in the matrix with accurate descriptions. The single remaining finding (lambdai value format notation) is a P2 style suggestion that does not affect functional correctness.

No files require special attention.

Important Files Changed

| Filename | Overview |
|---|---|
| docs/reference/node-labels.md | New authoritative reference for all Topograph node labels and annotations; all annotation keys, label key constants, provider matrix entries, and hash-truncation behavior verified against source code; one minor notation ambiguity in the lambdai block value format. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    Provider["Provider\n(aws / gcp / oci / nebius / netq /\ninfiniband-bm / infiniband-k8s /\nlambdai / cw / dra)"]
    Provider -->|GenerateTopologyConfig| Root["Vertex root"]
    Root -->|topology/block| BlockRoot["Block root\n(optional)"]
    Root -->|topology/tree| TreeRoot["Tree root\n(optional)"]

    BlockRoot --> Block["Block vertex\n(NVLink domain / CapacityBlock)"]
    Block --> NodeB["Compute node"]

    TreeRoot --> Core["Core switch\n(optional)"]
    Core --> Spine["Spine switch\n(optional)"]
    Spine --> Leaf["Leaf switch"]
    Leaf --> NodeT["Compute node"]

    Block -->|checkLabel| AccLabel["network.topology.nvidia.com/accelerator"]
    Leaf -->|checkLabel| LeafLabel["network.topology.nvidia.com/leaf"]
    Spine -->|checkLabel| SpineLabel["network.topology.nvidia.com/spine"]
    Core -->|checkLabel| CoreLabel["network.topology.nvidia.com/core"]

    checkLabel{"len > 63?"}
    AccLabel --> checkLabel
    checkLabel -->|No| RawVal["raw value"]
    checkLabel -->|Yes| Hash["x-prefixed FNV-64a hex\ne.g. x3e4f1a2b3c4d5e6f"]

Reviews (4): Last reviewed commit: "docs(reference): fix broken engine links..."

greptile-apps · 2026-04-17T06:15:42Z

+
+### Label value behavior
+
+Label values are used as-is when they are 63 characters or shorter (the Kubernetes label value limit). Values longer than 63 characters are replaced with their **FNV-64a hash** (hex-encoded) to stay within the limit. This means two nodes with the same long switch identifier will carry the same hash value — locality is preserved, but the original identifier is not recoverable from the label alone.


FNV-64a hash value includes an x prefix

The description says "hex-encoded," but checkLabel() in pkg/engines/k8s/labeler.go formats the value as fmt.Sprintf("x%x", h.Sum64()) — the resulting label value is a lowercase hex string with a leading x (e.g., x3e4f1a2b3c4d5e6f), not a bare hex string. Operators who grep for or compare label values need to know this prefix is present.

Suggested change

Label values are used as-is when they are 63 characters or shorter (the Kubernetes label value limit). Values longer than 63 characters are replaced with their **FNV-64a hash** (hex-encoded) to stay within the limit. This means two nodes with the same long switch identifier will carry the same hash value — locality is preserved, but the original identifier is not recoverable from the label alone.

Label values are used as-is when they are 63 characters or shorter (the Kubernetes label value limit). Values longer than 63 characters are replaced with their **FNV-64a hash** (formatted as `x` followed by lowercase hex, e.g., `x3e4f1a2b3c4d5e6f`) to stay within the limit. This means two nodes with the same long switch identifier will carry the same hash value — locality is preserved, but the original identifier is not recoverable from the label alone.

The hash is rendered as fmt.Sprintf("x%x", h.Sum64()) in pkg/engines/k8s/labeler.go:checkLabel, producing an x-prefixed lowercase hex string. Previous wording said 'hex-encoded' without the prefix, which matters for operators parsing or filtering label values. Signed-off-by: Rob Esker <resker@nvidia.com>

greptile-apps · 2026-04-17T12:04:44Z

+|---|---|---|
+| `aws` | Yes (CapacityBlockId) | Yes |
+| `gcp` | No | Yes |
+| `oci` | No | Yes |
+| `nebius` | No | Yes |
+| `netq` | Yes (NMX `DomainUUID`) | Yes (Spectrum-X switch hierarchy) |
+| `dra` | Yes (reads `nvidia.com/gpu.clique`) | No |
+| `infiniband-bm` | Yes (`ClusterUUID.CliqueId`) | Yes (IB switch hierarchy) |
+| `infiniband-k8s` | Yes (`ClusterUUID.CliqueId`) | Yes (IB switch hierarchy) |
+


Provider matrix omits lambdai and cw

Two registered providers are absent from the matrix even though both write topology labels. lambdai (pkg/providers/lambdai/instance_topology.go) sets AcceleratorID = NVLink.DomainID + "." + NVLink.CliqueID and passes it through ToThreeTierGraph, so it emits both block (accelerator) and tree (leaf/spine/core) labels. cw (pkg/providers/cw/provider.go) calls ib.GenerateTopologyConfig and wraps the output as a bare tree root (no toGraph() call), so it emits tree-only labels.

Suggested additions to the matrix:

Suggested change

 |---|---|---|
-| `aws` | Yes (CapacityBlockId) | Yes |
-| `gcp` | No | Yes |
-| `oci` | No | Yes |
-| `nebius` | No | Yes |
-| `netq` | Yes (NMX `DomainUUID`) | Yes (Spectrum-X switch hierarchy) |
-| `dra` | Yes (reads `nvidia.com/gpu.clique`) | No |
-| `infiniband-bm` | Yes (`ClusterUUID.CliqueId`) | Yes (IB switch hierarchy) |
-| `infiniband-k8s` | Yes (`ClusterUUID.CliqueId`) | Yes (IB switch hierarchy) |
+| `aws` | Yes (CapacityBlockId) | Yes |
+| `cw` | No | Yes (IB switch hierarchy) |
+| `gcp` | No | Yes |
+| `lambdai` | Yes (`NVLink.DomainID.CliqueID`) | Yes |
+| `oci` | No | Yes |
+| `nebius` | No | Yes |
+| `netq` | Yes (NMX `DomainUUID`) | Yes (Spectrum-X switch hierarchy) |
+| `dra` | Yes (reads `nvidia.com/gpu.clique`) | No |
+| `infiniband-bm` | Yes (`ClusterUUID.CliqueId`) | Yes (IB switch hierarchy) |
+| `infiniband-k8s` | Yes (`ClusterUUID.CliqueId`) | Yes (IB switch hierarchy) |

Folds high-value context from the unpublished blog draft into the repository docs so the blog can reference canonical sources rather than being the sole host of this material.
- README.md: expand "Motivation and Problem Statement" with the disaggregated inference scenario (prefill/decode separation, KV cache transfer, NVLink vs. Ethernet performance cliff)
- docs/providers/infiniband.md: add a "why automate IB discovery" paragraph with the scale argument (hand-maintained labels work at 32 nodes, break at 1,000)
- docs/providers/netq.md: add "Observed vs. Intended Topology" section describing NetQ's distinctive ability to observe degraded-but-up links via live telemetry
- CONTRIBUTING.md: add "Community" section with Kubernetes Slack channels (#topology-aware-scheduling, #gpu-nvidia) and "Areas where contributions are especially welcome" covering new providers, on-prem fabrics, KEP-5732 Workload API, KEP-4962 upstream label schema, and Grove ClusterTopology integration
The KEP-4962 note on docs/reference/node-labels.md is intentionally deferred to a follow-up PR once NVIDIA#254 lands, since node-labels.md does not yet exist on main.
Signed-off-by: Rob Esker <resker@nvidia.com>

Folds high-value context from the unpublished blog draft into the repository docs so the blog can reference canonical sources rather than being the sole host of this material.
- README.md: expand "Motivation and Problem Statement" with the disaggregated inference scenario (prefill/decode separation, KV cache transfer, NVLink vs. Ethernet performance cliff)
- docs/providers/infiniband.md: add a "why automate IB discovery" paragraph with the scale argument (hand-maintained labels work at 32 nodes, break at 1,000)
- docs/providers/netq.md: add "Observed vs. Intended Topology" section describing NetQ's distinctive ability to observe degraded-but-up links via live telemetry
- CONTRIBUTING.md: add "Community" section with Kubernetes Slack channels (#topology-aware-scheduling, #gpu-nvidia) and a pointer to the pinned roadmap/focus-areas issue
The KEP-4962 note on docs/reference/node-labels.md is intentionally deferred to a follow-up PR once NVIDIA#254 lands, since node-labels.md does not yet exist on main. Forward-looking contribution areas are captured as a pinned GitHub issue rather than inline in CONTRIBUTING.md, per the convention that CONTRIBUTING.md describes how to contribute while roadmap/direction lives in a pinned issue.
Signed-off-by: Rob Esker <resker@nvidia.com>


Addresses Greptile's P1 finding on NVIDIA#254. Both are registered providers that write topology labels but were missing from the matrix:
- lambdai: emits ClusterUUID.CliqueID-style accelerator via NVLink.DomainID + NVLink.CliqueID, plus tree labels via ToThreeTierGraph (pkg/providers/lambdai/instance_topology.go)
- cw: calls ib.GenerateTopologyConfig and wraps the output as a bare tree root (pkg/providers/cw/provider.go), emitting tree labels only
Signed-off-by: Rob Esker <resker@nvidia.com>

greptile-apps · 2026-04-18T03:37:24Z

+| Provider | Block (`accelerator`) | Tree (`leaf`/`spine`/`core`) |
+|---|---|---|
+| `aws` | Yes (CapacityBlockId) | Yes |
+| `cw` | No | Yes (IB switch hierarchy) |


cw tree column is incorrect — no labels are emitted

cw.GenerateTopologyConfig returns a raw treeRoot vertex where children are keyed by switch IDs, not wrapped under the "topology/tree" key. Both the k8s labeler (ApplyNodeLabels, labeler.go:77) and the Slinky engine's initTree (translate/topology.go:118) gate all tree processing on root.Vertices[topology.TopologyTree] — a key that is never present in the cw output. As a result, the cw provider produces no node labels at all through either engine.

Compare cw's return value:

// cw/provider.go — switch IDs as keys, no "topology/tree" wrapper treeRoot := &topology.Vertex{Vertices: make(map[string]*topology.Vertex)} for _, v := range roots { treeRoot.Vertices[v.ID] = v // keys are switch identifiers } return treeRoot, nil

With netq's correct structure:

// netq/provider.go — properly wrapped root.Vertices[topology.TopologyTree] = treeRoot

The tree column for cw should be "No" until the provider is updated to wrap its output, or the code should be fixed and the doc updated to "Yes" once the provider emits the expected structure.

Suggested change

| `cw` | No | Yes (IB switch hierarchy) |

| `cw` | No | No (vertex structure incompatible with labeler — see pkg/providers/cw/provider.go) |

Two fixes from the PR NVIDIA#254 review:
- Fern Check was failing because line 7 points at `../engines.md`, which is not a single file; `docs/engines/` is a directory with per-engine files. Fixed the Kubernetes and Slinky engine links to point at `../engines/k8s.md` and `../engines/slinky.md` respectively.
- Greptile's latest P1 finding is correct: `cw.GenerateTopologyConfig` returns a tree root whose children are switch IDs directly, not wrapped under `topology.TopologyTree`. Both the k8s labeler (`ApplyNodeLabels`, `labeler.go:77`) and Slinky's `initTree` (`translate/topology.go:118`) gate processing on `root.Vertices[topology.TopologyTree]`, which is never present in cw's output. Compare with `netq/provider.go:92`, which correctly wraps `root.Vertices[topology.TopologyTree] = treeRoot`. The cw provider emits zero labels today; updated the matrix row to reflect current behavior with a note about the underlying issue.
Signed-off-by: Rob Esker <resker@nvidia.com>

Adds a short subsection under `## Labels` covering KEP-4962 ("Standardizing the Representation of Cluster Network Topology"), which is pre-GA and still under upstream review at kubernetes/enhancements#4962 (draft PR #4965). Notes that the KEP's framing allows vendor prefixes like `network.topology.nvidia.com/*` to coexist with the standard `topology.kubernetes.io/` keys rather than replace them, and that Topograph will evaluate aligning or providing both if and when the KEP reaches GA. For now, the `network.topology.nvidia.com/*` keys remain authoritative for Topograph-deployed clusters. This note was deferred from PR NVIDIA#264 because the target file (`docs/reference/node-labels.md`) did not yet exist on `main`; it was introduced by PR NVIDIA#254, which merged 2026-04-19. Signed-off-by: Rob Esker <resker@nvidia.com>

Adds back two rows to the Documentation Impact Evaluation table in `AGENTS.md` and `.claude/CLAUDE.md` that were removed from PR NVIDIA#269 to avoid cross-referencing content not yet on `main`: - Chart template row pointing at `docs/engines/k8s.md` "Exposing the Topograph API" section (added to main by PR NVIDIA#259) - Label or annotation key row pointing at `docs/reference/node-labels.md` (added to main by PR NVIDIA#254) Both gating PRs have now merged, so the rows can be restored without broken cross-references. Paired edit preserves the byte-identical invariant between `AGENTS.md` and `.claude/CLAUDE.md` from line 6 onward (verified with `cmp`). Signed-off-by: Rob Esker <resker@nvidia.com>

)

* docs(reference): add KEP-4962 upstream-alignment note to node-labels.md

Adds a short subsection under `## Labels` covering KEP-4962 ("Standardizing the Representation of Cluster Network Topology"), which is pre-GA and still under upstream review at kubernetes/enhancements#4962 (draft PR #4965). Notes that the KEP's framing allows vendor prefixes like `network.topology.nvidia.com/*` to coexist with the standard `topology.kubernetes.io/` keys rather than replace them, and that Topograph will evaluate aligning or providing both if and when the KEP reaches GA. For now, the `network.topology.nvidia.com/*` keys remain authoritative for Topograph-deployed clusters.

This note was deferred from PR #264 because the target file (`docs/reference/node-labels.md`) did not yet exist on `main`; it was introduced by PR #254, which merged 2026-04-19.

Signed-off-by: Rob Esker <resker@nvidia.com>

* docs(agents): restore two Doc-Impact Evaluation table rows

Adds back two rows to the Documentation Impact Evaluation table in `AGENTS.md` and `.claude/CLAUDE.md` that were removed from PR #269 to avoid cross-referencing content not yet on `main`:
- Chart template row pointing at `docs/engines/k8s.md` "Exposing the Topograph API" section (added to main by PR #259)
- Label or annotation key row pointing at `docs/reference/node-labels.md` (added to main by PR #254)

Both gating PRs have now merged, so the rows can be restored without broken cross-references. Paired edit preserves the byte-identical invariant between `AGENTS.md` and `.claude/CLAUDE.md` from line 6 onward (verified with `cmp`).

Signed-off-by: Rob Esker <resker@nvidia.com>

---------

Signed-off-by: Rob Esker <resker@nvidia.com>

…284)

* docs(fern): restore Reference section to nav

`docs/reference/node-labels.md` has been on main since PR #254 merged (2026-04-19) but is not listed in `docs/index.yml`, the Fern nav source-of-truth. The last successful Fern publish (2026-04-20T16:52Z) resolved 13 pages, exactly the count declared in the nav without a Reference section, confirming the page is invisible on the live site at https://topograph.docs.buildwithfern.com/topograph despite being on the filesystem.

Add a Reference section listing `reference/node-labels.md`. This is the paired-file update that PR #254 should have included.

Verified filesystem vs. nav discrepancy:

| docs/ file | In index.yml? |
|---|---|
| overview.md | yes |
| architecture.md + api.md | yes |
| providers/{aws,gcp,oci,nebius,... | yes (7) |
| engines/{slurm,k8s,slinky}.md | yes (3) |
| reference/node-labels.md | NO (fixed) |

Signed-off-by: Rob Esker <resker@nvidia.com>

* docs(fern): label current version as v0.3.0

The published Fern site shows a version label of "dev" because `fern/docs.yml` declared the only version as `display-name: dev`. The repo is at tagged release v0.3.0 with no divergence between the released content and the in-flight docs, so "dev" is misleading: readers see "development docs" framing when what they're actually looking at corresponds to the v0.3.0 tag.

Rename the version to `v0.3.0`. No content split yet: both versions would point at the same `docs/index.yml` anyway. When content actually diverges (post-v0.4.0 breaking doc changes, or a doc change that only applies to the future release), re-introduce a separate `dev` entry alongside v0.3.0 and repoint paths to differentiated content.

Drop the pre-existing `/topograph/dev/index.html` -> `/topograph/dev/` redirect because the `/dev/` destination no longer exists after the rename.

Not adding `/dev` -> new-version forwarding: site has only been live since 2026-04-17 (four days), so dev-URL bookmarks are unlikely; the Fern root redirect + sidebar navigation recovers any user who lands on a 404. If dead-bookmark complaints surface later, adding a one-line redirect is trivial.

Verified: `fern check` reports 0 errors (1 pre-existing warning about the NVIDIA-green accent-color contrast ratio, unrelated to this PR). `fern docs dev` renders the sidebar with all five sections intact (Getting Started, Architecture, Providers, Engines, Reference) and the version label reads "v0.3.0".

Signed-off-by: Rob Esker <resker@nvidia.com>

* chore(ci): surface fern errors in publish-fern-docs workflow

The publish step used `OUTPUT=$(fern generate --docs 2>&1)` to capture fern's output so the script could grep for the "Published docs to" URL and post it to the GitHub Actions step summary. Side effect: on non-zero exit from `fern generate`, `bash -e` aborted the step BEFORE the subsequent `echo "$OUTPUT"` ran, so fern's error output never reached the Actions log. All three failed publishes between 2026-04-17 and 2026-04-20 (plus the 2026-04-18 manual dispatches) logged only "Process completed with exit code 1" with no diagnostic context.

Switch to `fern generate ... 2>&1 | tee /tmp/fern-output.log`. The `tee` streams fern's stdout/stderr to the Actions log in real time (visible in both success and failure cases) and mirrors to a file that the URL-extraction grep reads after. `set -o pipefail` is added so the step still fails on fern's non-zero exit (tee's own exit status would otherwise mask it). The `|| true` on grep ensures a missing URL in fern's output does not fail the step on its own.

Net effect: future publish failures are diagnosable from the Actions log directly. No behavior change on the success path.

No ability to test the failure path locally without reproducing a deliberate fern error; validated only that the happy-path shell snippet is syntactically valid. Future failures will surface the actual fern error, which is the whole point.

Signed-off-by: Rob Esker <resker@nvidia.com>

---------

Signed-off-by: Rob Esker <resker@nvidia.com>

resker added 3 commits April 17, 2026 01:06

resker requested a review from dmitsh as a code owner April 17, 2026 06:11

greptile-apps Bot reviewed Apr 17, 2026


resker mentioned this pull request Apr 17, 2026

docs: small-batch reconciliation of blog draft content into repo docs #264

Merged

4 tasks

resker mentioned this pull request Apr 18, 2026

docs(agents): add Documentation Impact Evaluation section and refresh stale references #269

Merged

4 tasks

greptile-apps Bot reviewed Apr 18, 2026


dmitsh approved these changes Apr 19, 2026


dmitsh merged commit e3fd2b3 into NVIDIA:main Apr 19, 2026
3 checks passed

resker mentioned this pull request Apr 19, 2026

docs: post-#254 follow-ups — KEP-4962 note + Doc-Impact table rows #278

Merged

4 tasks

resker mentioned this pull request Apr 21, 2026

docs(fern): restore Reference section, label v0.3.0, surface errors #284

Merged

4 tasks

dmitsh merged 6 commits into NVIDIA:main from resker:docs/reference-node-labels


2 participants

