docs(reference): add authoritative node labels and annotations reference #254
Documents all Kubernetes node labels and annotations set by Topograph, including the four network.topology.nvidia.com/ labels written by the k8s and slinky engines, the topograph.nvidia.com/ annotation keys, FNV-64a hash truncation behavior for long values, and an NVSentinel integration example. Closes NVIDIA#179 Signed-off-by: Rob Esker <resker@nvidia.com>
… node-labels
- Clarify accelerator value format per provider: IB providers use ClusterUUID.CliqueId (same as nvidia.com/gpu.clique), NetQ uses NMX DomainUUID (different identifier format)
- Add note that gpu.clique is not set on non-MNNVL systems and Topograph is the only topology source in those environments
- Add nvidia.com/gpu.clique to the "Without Topograph" label table
Signed-off-by: Rob Esker <resker@nvidia.com>
The companion NVSentinel PR is not being filed at this time; removing the dormant-link trailer lets the NVSentinel integration section stand on its own as a self-contained reference. Signed-off-by: Rob Esker <resker@nvidia.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.

Coverage Diff

| | main | #254 | +/- |
|---|---|---|---|
| Coverage | 68.46% | 68.46% | |
| Files | 82 | 82 | |
| Lines | 4842 | 4842 | |
| Hits | 3315 | 3315 | |
| Misses | 1395 | 1395 | |
| Partials | 132 | 132 | |
Greptile Summary
Confidence Score: 5/5
Safe to merge: documentation only, all key facts verified against source code, one minor notation ambiguity that does not affect correctness.
All P0/P1 concerns from prior review rounds have been addressed: the x-prefixed FNV hash is now documented correctly, and cw/lambdai are now included in the matrix with accurate descriptions. The single remaining finding (lambdai value format notation) is a P2 style suggestion that does not affect functional correctness.
No files require special attention.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
Provider["Provider\n(aws / gcp / oci / nebius / netq /\ninfiniband-bm / infiniband-k8s /\nlambdai / cw / dra)"]
Provider -->|GenerateTopologyConfig| Root["Vertex root"]
Root -->|topology/block| BlockRoot["Block root\n(optional)"]
Root -->|topology/tree| TreeRoot["Tree root\n(optional)"]
BlockRoot --> Block["Block vertex\n(NVLink domain / CapacityBlock)"]
Block --> NodeB["Compute node"]
TreeRoot --> Core["Core switch\n(optional)"]
Core --> Spine["Spine switch\n(optional)"]
Spine --> Leaf["Leaf switch"]
Leaf --> NodeT["Compute node"]
Block -->|checkLabel| AccLabel["network.topology.nvidia.com/accelerator"]
Leaf -->|checkLabel| LeafLabel["network.topology.nvidia.com/leaf"]
Spine -->|checkLabel| SpineLabel["network.topology.nvidia.com/spine"]
Core -->|checkLabel| CoreLabel["network.topology.nvidia.com/core"]
checkLabel{"len > 63?"}
AccLabel --> checkLabel
checkLabel -->|No| RawVal["raw value"]
checkLabel -->|Yes| Hash["x-prefixed FNV-64a hex\ne.g. x3e4f1a2b3c4d5e6f"]
Reviews (4): Last reviewed commit: "docs(reference): fix broken engine links..."

### Label value behavior

Label values are used as-is when they are 63 characters or shorter (the Kubernetes label value limit). Values longer than 63 characters are replaced with their **FNV-64a hash** (hex-encoded) to stay within the limit. This means two nodes with the same long switch identifier will carry the same hash value: locality is preserved, but the original identifier is not recoverable from the label alone.
FNV-64a hash value includes an `x` prefix

The description says "hex-encoded," but `checkLabel()` in `pkg/engines/k8s/labeler.go` formats the value as `fmt.Sprintf("x%x", h.Sum64())`. The resulting label value is a lowercase hex string with a leading `x` (e.g., `x3e4f1a2b3c4d5e6f`), not a bare hex string. Operators who grep for or compare label values need to know this prefix is present.

Suggested change:

Label values are used as-is when they are 63 characters or shorter (the Kubernetes label value limit). Values longer than 63 characters are replaced with their **FNV-64a hash** (formatted as `x` followed by lowercase hex, e.g., `x3e4f1a2b3c4d5e6f`) to stay within the limit. This means two nodes with the same long switch identifier will carry the same hash value: locality is preserved, but the original identifier is not recoverable from the label alone.
The hash is rendered as fmt.Sprintf("x%x", h.Sum64()) in
pkg/engines/k8s/labeler.go:checkLabel, producing an x-prefixed lowercase
hex string. Previous wording said 'hex-encoded' without the prefix,
which matters for operators parsing or filtering label values.
Signed-off-by: Rob Esker <resker@nvidia.com>
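The truncation rule discussed in this thread can be sketched in Go. This is a minimal illustration under stated assumptions, not the actual `checkLabel` source: the function name `labelValue` and the constant `maxLabelValueLen` are hypothetical. Only the 63-character limit, the FNV-64a hash, and the `fmt.Sprintf("x%x", h.Sum64())` format come from the review.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// maxLabelValueLen is the Kubernetes label value limit cited in the doc.
// The constant name is hypothetical.
const maxLabelValueLen = 63

// labelValue is a hypothetical stand-in for checkLabel. Values that fit
// within the limit pass through unchanged; longer values are replaced by
// "x" followed by the lowercase hex FNV-64a hash of the value.
func labelValue(v string) string {
	if len(v) <= maxLabelValueLen {
		return v
	}
	h := fnv.New64a()
	h.Write([]byte(v)) // Write on a hash.Hash never returns an error
	return fmt.Sprintf("x%x", h.Sum64())
}
```

Because `%x` does not zero-pad, the hashed form is `x` followed by at most 16 hex digits, so a pattern that assumes a fixed width could miss some values. Call `labelValue` from your own `main` or test to try it.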
| Provider | Block (`accelerator`) | Tree (`leaf`/`spine`/`core`) |
|---|---|---|
| `aws` | Yes (CapacityBlockId) | Yes |
| `gcp` | No | Yes |
| `oci` | No | Yes |
| `nebius` | No | Yes |
| `netq` | Yes (NMX `DomainUUID`) | Yes (Spectrum-X switch hierarchy) |
| `dra` | Yes (reads `nvidia.com/gpu.clique`) | No |
| `infiniband-bm` | Yes (`ClusterUUID.CliqueId`) | Yes (IB switch hierarchy) |
| `infiniband-k8s` | Yes (`ClusterUUID.CliqueId`) | Yes (IB switch hierarchy) |
Provider matrix omits `lambdai` and `cw`

Two registered providers are absent from the matrix even though both write topology labels. `lambdai` (`pkg/providers/lambdai/instance_topology.go`) sets `AcceleratorID = NVLink.DomainID + "." + NVLink.CliqueID` and passes it through `ToThreeTierGraph`, so it emits both block (accelerator) and tree (leaf/spine/core) labels. `cw` (`pkg/providers/cw/provider.go`) calls `ib.GenerateTopologyConfig` and wraps the output as a bare tree root (no `toGraph()` call), so it emits tree-only labels.

Suggested additions to the matrix:

| Provider | Block (`accelerator`) | Tree (`leaf`/`spine`/`core`) |
|---|---|---|
| `aws` | Yes (CapacityBlockId) | Yes |
| `cw` | No | Yes (IB switch hierarchy) |
| `gcp` | No | Yes |
| `lambdai` | Yes (`NVLink.DomainID.CliqueID`) | Yes |
| `oci` | No | Yes |
| `nebius` | No | Yes |
| `netq` | Yes (NMX `DomainUUID`) | Yes (Spectrum-X switch hierarchy) |
| `dra` | Yes (reads `nvidia.com/gpu.clique`) | No |
| `infiniband-bm` | Yes (`ClusterUUID.CliqueId`) | Yes (IB switch hierarchy) |
| `infiniband-k8s` | Yes (`ClusterUUID.CliqueId`) | Yes (IB switch hierarchy) |
Folds high-value context from the unpublished blog draft into the repository docs so the blog can reference canonical sources rather than being the sole host of this material.
- README.md: expand "Motivation and Problem Statement" with the disaggregated inference scenario (prefill/decode separation, KV cache transfer, NVLink vs. Ethernet performance cliff)
- docs/providers/infiniband.md: add a "why automate IB discovery" paragraph with the scale argument (hand-maintained labels work at 32 nodes, break at 1,000)
- docs/providers/netq.md: add "Observed vs. Intended Topology" section describing NetQ's distinctive ability to observe degraded-but-up links via live telemetry
- CONTRIBUTING.md: add "Community" section with Kubernetes Slack channels (#topology-aware-scheduling, #gpu-nvidia) and "Areas where contributions are especially welcome" covering new providers, on-prem fabrics, KEP-5732 Workload API, KEP-4962 upstream label schema, and Grove ClusterTopology integration
The KEP-4962 note on docs/reference/node-labels.md is intentionally deferred to a follow-up PR once NVIDIA#254 lands, since node-labels.md does not yet exist on main.
Signed-off-by: Rob Esker <resker@nvidia.com>
Folds high-value context from the unpublished blog draft into the repository docs so the blog can reference canonical sources rather than being the sole host of this material.
- README.md: expand "Motivation and Problem Statement" with the disaggregated inference scenario (prefill/decode separation, KV cache transfer, NVLink vs. Ethernet performance cliff)
- docs/providers/infiniband.md: add a "why automate IB discovery" paragraph with the scale argument (hand-maintained labels work at 32 nodes, break at 1,000)
- docs/providers/netq.md: add "Observed vs. Intended Topology" section describing NetQ's distinctive ability to observe degraded-but-up links via live telemetry
- CONTRIBUTING.md: add "Community" section with Kubernetes Slack channels (#topology-aware-scheduling, #gpu-nvidia) and a pointer to the pinned roadmap/focus-areas issue
The KEP-4962 note on docs/reference/node-labels.md is intentionally deferred to a follow-up PR once NVIDIA#254 lands, since node-labels.md does not yet exist on main. Forward-looking contribution areas are captured as a pinned GitHub issue rather than inline in CONTRIBUTING.md, per the convention that CONTRIBUTING.md describes how to contribute while roadmap/direction lives in a pinned issue.
Signed-off-by: Rob Esker <resker@nvidia.com>
Addresses Greptile's P1 finding on NVIDIA#254. `lambdai` and `cw` are both registered providers that write topology labels but were missing from the matrix:
- lambdai: emits ClusterUUID.CliqueID-style accelerator via NVLink.DomainID + NVLink.CliqueID, plus tree labels via ToThreeTierGraph (pkg/providers/lambdai/instance_topology.go)
- cw: calls ib.GenerateTopologyConfig and wraps the output as a bare tree root (pkg/providers/cw/provider.go), emitting tree labels only
Signed-off-by: Rob Esker <resker@nvidia.com>
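The matrix notation `NVLink.DomainID.CliqueID` can read like a field path, so a small sketch of the composition may help. It assumes only what the review states (`AcceleratorID = NVLink.DomainID + "." + NVLink.CliqueID`). The struct `nvlinkInfo` and function `acceleratorID` are hypothetical names, not the provider's actual types.

```go
package main

// nvlinkInfo is a hypothetical stand-in for the NVLink fields the
// lambdai provider reads; the real type may differ.
type nvlinkInfo struct {
	DomainID string
	CliqueID string
}

// acceleratorID joins the two identifiers with a dot, following the
// composition quoted in the review comment.
func acceleratorID(n nvlinkInfo) string {
	return n.DomainID + "." + n.CliqueID
}
```

For example, a `DomainID` of `domain-a` and a `CliqueID` of `3` would give `domain-a.3` under this sketch.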
| Provider | Block (`accelerator`) | Tree (`leaf`/`spine`/`core`) |
|---|---|---|
| `aws` | Yes (CapacityBlockId) | Yes |
| `cw` | No | Yes (IB switch hierarchy) |
`cw` tree column is incorrect: no labels are emitted

`cw.GenerateTopologyConfig` returns a raw `treeRoot` vertex where children are keyed by switch IDs, not wrapped under the `"topology/tree"` key. Both the k8s labeler (`ApplyNodeLabels`, `labeler.go:77`) and the Slinky engine's `initTree` (`translate/topology.go:118`) gate all tree processing on `root.Vertices[topology.TopologyTree]`, a key that is never present in the cw output. As a result, the cw provider produces no node labels at all through either engine.

Compare cw's return value:

    // cw/provider.go: switch IDs as keys, no "topology/tree" wrapper
    treeRoot := &topology.Vertex{Vertices: make(map[string]*topology.Vertex)}
    for _, v := range roots {
        treeRoot.Vertices[v.ID] = v // keys are switch identifiers
    }
    return treeRoot, nil

With netq's correct structure:

    // netq/provider.go: properly wrapped
    root.Vertices[topology.TopologyTree] = treeRoot

The tree column for cw should be "No" until the provider is updated to wrap its output, or the code should be fixed and the doc updated to "Yes" once the provider emits the expected structure.

Suggested change:

| `cw` | No | No (vertex structure incompatible with labeler; see pkg/providers/cw/provider.go) |
Two fixes from the PR NVIDIA#254 review:
- Fern Check was failing because line 7 points at `../engines.md`, which is not a single file: `docs/engines/` is a directory with per-engine files. Fixed the Kubernetes and Slinky engine links to point at `../engines/k8s.md` and `../engines/slinky.md` respectively.
- Greptile's latest P1 finding is correct: `cw.GenerateTopologyConfig` returns a tree root whose children are switch IDs directly, not wrapped under `topology.TopologyTree`. Both the k8s labeler (`ApplyNodeLabels`, `labeler.go:77`) and Slinky's `initTree` (`translate/topology.go:118`) gate processing on `root.Vertices[topology.TopologyTree]`, which is never present in cw's output. Compare with `netq/provider.go:92`, which correctly wraps `root.Vertices[topology.TopologyTree] = treeRoot`. The cw provider emits zero labels today; updated the matrix row to reflect current behavior with a note about the underlying issue.
Signed-off-by: Rob Esker <resker@nvidia.com>
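The gating behavior described above can be illustrated with a small self-contained sketch. It is a simplified model, not the Topograph source: `Vertex`, `treeChildren`, `bareRoot`, and `wrappedRoot` are hypothetical names, and the only details taken from the review are the `"topology/tree"` key and the fact that tree processing is skipped when that key is absent.

```go
package main

import "sort"

// Vertex is a simplified, hypothetical stand-in for topology.Vertex.
type Vertex struct {
	ID       string
	Vertices map[string]*Vertex
}

// topologyTree mirrors the "topology/tree" key named in the review.
const topologyTree = "topology/tree"

// treeChildren models the gate: it returns the sorted IDs found under
// the tree key, or nil when the key is absent from the root.
func treeChildren(root *Vertex) []string {
	tree, ok := root.Vertices[topologyTree]
	if !ok {
		return nil
	}
	ids := make([]string, 0, len(tree.Vertices))
	for id := range tree.Vertices {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

// bareRoot builds cw-style output: switch IDs are direct keys of the root.
func bareRoot(switches ...*Vertex) *Vertex {
	root := &Vertex{Vertices: make(map[string]*Vertex)}
	for _, s := range switches {
		root.Vertices[s.ID] = s
	}
	return root
}

// wrappedRoot builds netq-style output: the tree sits under the tree key.
func wrappedRoot(switches ...*Vertex) *Vertex {
	return &Vertex{Vertices: map[string]*Vertex{
		topologyTree: bareRoot(switches...),
	}}
}
```

In this model, `treeChildren(bareRoot(a, b))` finds nothing while `treeChildren(wrappedRoot(a, b))` finds both switches, which matches the review's conclusion that the unwrapped root yields no labels.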
Adds a short subsection under `## Labels` covering KEP-4962
("Standardizing the Representation of Cluster Network Topology"),
which is pre-GA and still under upstream review at
kubernetes/enhancements#4962 (draft PR #4965). Notes that the KEP's
framing allows vendor prefixes like `network.topology.nvidia.com/*`
to coexist with the standard `topology.kubernetes.io/` keys rather
than replace them, and that Topograph will evaluate aligning or
providing both if and when the KEP reaches GA. For now, the
`network.topology.nvidia.com/*` keys remain authoritative for
Topograph-deployed clusters.
This note was deferred from PR NVIDIA#264 because the target file
(`docs/reference/node-labels.md`) did not yet exist on `main`; it
was introduced by PR NVIDIA#254, which merged 2026-04-19.
Signed-off-by: Rob Esker <resker@nvidia.com>
Adds back two rows to the Documentation Impact Evaluation table in `AGENTS.md` and `.claude/CLAUDE.md` that were removed from PR NVIDIA#269 to avoid cross-referencing content not yet on `main`: - Chart template row pointing at `docs/engines/k8s.md` "Exposing the Topograph API" section (added to main by PR NVIDIA#259) - Label or annotation key row pointing at `docs/reference/node-labels.md` (added to main by PR NVIDIA#254) Both gating PRs have now merged, so the rows can be restored without broken cross-references. Paired edit preserves the byte-identical invariant between `AGENTS.md` and `.claude/CLAUDE.md` from line 6 onward (verified with `cmp`). Signed-off-by: Rob Esker <resker@nvidia.com>
) * docs(reference): add KEP-4962 upstream-alignment note to node-labels.md

Adds a short subsection under `## Labels` covering KEP-4962 ("Standardizing the Representation of Cluster Network Topology"), which is pre-GA and still under upstream review at kubernetes/enhancements#4962 (draft PR #4965). Notes that the KEP's framing allows vendor prefixes like `network.topology.nvidia.com/*` to coexist with the standard `topology.kubernetes.io/` keys rather than replace them, and that Topograph will evaluate aligning or providing both if and when the KEP reaches GA. For now, the `network.topology.nvidia.com/*` keys remain authoritative for Topograph-deployed clusters.

This note was deferred from PR #264 because the target file (`docs/reference/node-labels.md`) did not yet exist on `main`; it was introduced by PR #254, which merged 2026-04-19.

Signed-off-by: Rob Esker <resker@nvidia.com>

* docs(agents): restore two Doc-Impact Evaluation table rows

Adds back two rows to the Documentation Impact Evaluation table in `AGENTS.md` and `.claude/CLAUDE.md` that were removed from PR #269 to avoid cross-referencing content not yet on `main`:
- Chart template row pointing at `docs/engines/k8s.md` "Exposing the Topograph API" section (added to main by PR #259)
- Label or annotation key row pointing at `docs/reference/node-labels.md` (added to main by PR #254)

Both gating PRs have now merged, so the rows can be restored without broken cross-references. Paired edit preserves the byte-identical invariant between `AGENTS.md` and `.claude/CLAUDE.md` from line 6 onward (verified with `cmp`).

Signed-off-by: Rob Esker <resker@nvidia.com>

---------

Signed-off-by: Rob Esker <resker@nvidia.com>
…284) * docs(fern): restore Reference section to nav

`docs/reference/node-labels.md` has been on main since PR #254 merged (2026-04-19) but is not listed in `docs/index.yml`, the Fern nav source-of-truth. The last successful Fern publish (2026-04-20T16:52Z) resolved 13 pages, exactly the count declared in the nav without a Reference section, confirming the page is invisible on the live site at https://topograph.docs.buildwithfern.com/topograph despite being on the filesystem.

Add a Reference section listing `reference/node-labels.md`. This is the paired-file update that PR #254 should have included.

Verified filesystem vs. nav discrepancy:

| docs/ file | In index.yml? |
|---|---|
| overview.md | yes |
| architecture.md + api.md | yes |
| providers/{aws,gcp,oci,nebius,... | yes (7) |
| engines/{slurm,k8s,slinky}.md | yes (3) |
| reference/node-labels.md | NO (fixed) |

Signed-off-by: Rob Esker <resker@nvidia.com>

* docs(fern): label current version as v0.3.0

The published Fern site shows a version label of "dev" because `fern/docs.yml` declared the only version as `display-name: dev`. The repo is at tagged release v0.3.0 with no divergence between the released content and the in-flight docs, so "dev" is misleading: readers see "development docs" framing when what they're actually looking at corresponds to the v0.3.0 tag.

Rename the version to `v0.3.0`. No content split yet: both versions would point at the same `docs/index.yml` anyway. When content actually diverges (post-v0.4.0 breaking doc changes, or a doc change that only applies to the future release), re-introduce a separate `dev` entry alongside v0.3.0 and repoint paths to differentiated content.

Drop the pre-existing `/topograph/dev/index.html` -> `/topograph/dev/` redirect because the `/dev/` destination no longer exists after the rename.

Not adding `/dev` -> new-version forwarding: site has only been live since 2026-04-17 (four days), so dev-URL bookmarks are unlikely; the Fern root redirect + sidebar navigation recovers any user who lands on a 404. If dead-bookmark complaints surface later, adding a one-line redirect is trivial.

Verified: `fern check` reports 0 errors (1 pre-existing warning about the NVIDIA-green accent-color contrast ratio, unrelated to this PR). `fern docs dev` renders the sidebar with all five sections intact (Getting Started, Architecture, Providers, Engines, Reference) and the version label reads "v0.3.0".

Signed-off-by: Rob Esker <resker@nvidia.com>

* chore(ci): surface fern errors in publish-fern-docs workflow

The publish step used `OUTPUT=$(fern generate --docs 2>&1)` to capture fern's output so the script could grep for the "Published docs to" URL and post it to the GitHub Actions step summary. Side effect: on non-zero exit from `fern generate`, `bash -e` aborted the step BEFORE the subsequent `echo "$OUTPUT"` ran, so fern's error output never reached the Actions log. All three failed publishes between 2026-04-17 and 2026-04-20 (plus the 2026-04-18 manual dispatches) logged only "Process completed with exit code 1" with no diagnostic context.

Switch to `fern generate ... 2>&1 | tee /tmp/fern-output.log`. The `tee` streams fern's stdout/stderr to the Actions log in real time (visible in both success and failure cases) and mirrors to a file that the URL-extraction grep reads after. `set -o pipefail` is added so the step still fails on fern's non-zero exit (tee's own exit status would otherwise mask it). The `|| true` on grep ensures a missing URL in fern's output does not fail the step on its own.

Net effect: future publish failures are diagnosable from the Actions log directly. No behavior change on the success path.

No ability to test the failure path locally without reproducing a deliberate fern error; validated only that the happy-path shell snippet is syntactically valid. Future failures will surface the actual fern error, which is the whole point.

Signed-off-by: Rob Esker <resker@nvidia.com>

---------

Signed-off-by: Rob Esker <resker@nvidia.com>
Summary
Adds `docs/reference/node-labels.md`, the authoritative reference for every label and annotation key written by Topograph, requested in #179.

Covers:
- The four labels (`network.topology.nvidia.com/{accelerator,leaf,spine,core}`) with topology type and semantics
- `accelerator` (block) vs. `leaf`/`spine`/`core` (tree), including value format details (IB providers produce `ClusterUUID.CliqueId`; NetQ uses NMX `DomainUUID`)
- `nvidia.com/gpu.clique`: on MNNVL systems the IB `accelerator` value matches; on non-MNNVL systems (B200/B300) `gpu.clique` is not set at all, making Topograph the only topology source
- `topologyNodeLabels` configuration for customizing key prefixes
- `topograph.nvidia.com/` annotation keys (internal bookkeeping)

Test plan
- `pkg/engines/k8s/labeler.go` constants
- `ClusterUUID.CliqueId` format verified against `pkg/providers/infiniband/common.go` (`nvidia-smi -q | grep "ClusterUUID\|CliqueId"`) and `NVIDIA/k8s-device-plugin/internal/lm/imex.go`
- `DomainUUID` format verified against `pkg/providers/netq/nmx.go`
- `GPU_FABRIC_STATE_COMPLETED` behavior verified against `NVIDIA/go-nvlib/pkg/nvlib/device/device.go:IsFabricAttached()`
- `pkg/engines/k8s/labeler.go:checkLabel`

Closes #179