What's Changed
- chore(github): add issue template by @dmitsh in #244
- docs: add InfiniBand, NetQ, and DRA provider documentation, update README by @resker in #243
- feat(k8s): node observer to retry failed requests after delay by @dmitsh in #242
- feat: Slinky Engine - Dynamic Nodes Reconciliation by @ravisoundar in #241
- chore(github): add copy-pr-bot.yaml config by @dmitsh in #245
- docs(readme): add 'Motivation and Problem Statement' section by @dmitsh in #247
- fix: Topology Spec with the switch hierarchy by @ravisoundar in #246
- chore(deps): Bump github.com/moby/spdystream from 0.5.0 to 0.5.1 by @dependabot[bot] in #251
- chore(github): set gh runners by @dmitsh in #250
- docs: add Fern documentation scaffolding by @pdmack in #249
- chore: fix print_env.sh template placeholder and copyright year by @resker in #252
- docs: add AGENTS.md and .claude/CLAUDE.md for AI coding agents by @resker in #253
- build: add make qualify pre-push aggregator by @resker in #256
- docs: small-batch reconciliation of blog draft content into repo docs by @resker in #264
- docs: document gpu.clique relationship and non-MNNVL topology source by @resker in #255
- docs: organize docs into sections by @dmitsh in #267
- docs(agents): add Documentation Impact Evaluation section and refresh stale references by @resker in #269
- docs(k8s): add Kubernetes access patterns section by @resker in #259
- docs(fern): add fern/README.md describing the Fern docs workflow by @resker in #270
- chore(github): update fern github actions by @dmitsh in #268
- docs(governance): add CODE_OF_CONDUCT, SECURITY, and pull request template by @resker in #273
- docs(reference): add authoritative node labels and annotations reference by @resker in #254
- chore(go): update go version and packages by @dmitsh in #277
- docs: post-#254 follow-ups — KEP-4962 note + Doc-Impact table rows by @resker in #278
- fix(ci): extract PR metadata from ref_name for copy-PR-bot push trigger by @pdmack in #280
- fix(docs/ci): self-close img tags and add MDX safety check to CI by @pdmack in #282
- chore(chart): add values schema, helm test hooks, and chart README by @resker in #275
- fix(ci): extend MDX img check to fern/ and .mdx files by @pdmack in #283
- feat(chart): add Gateway API (HTTPRoute) support by @resker in #276
- chore: standardize SPDX file headers by @pdmack in #286
- chore: unify naming in Helm values examples by @dmitsh in #288
- docs(fern): restore Reference section, label v0.3.0, surface errors by @resker in #284
- docs(reference): clarify accelerator / Capacity Block / clique semantics by @resker in #289
- fix(ci): include docs/ in Fern preview artifact by @resker in #290
- chore(chart): declare kubeVersion ">=1.27.0-0" in chart + subcharts by @resker in #291
- docs(get-started): add install-path quickstarts for Kubernetes and Slurm by @resker in #292
- feat(slinky): support podSelector in partition topologies by @dmitsh in #295
- fix: Docker buildkit proxy setting for nv-gh-runners by @ravisoundar in #296
- fix(slurm/slinky): by @dmitsh in #297
- chore(docs): update logo by @dmitsh in #298
- chore(docs): add documentation about 'test' provider and update test payloads by @dmitsh in #299
- fix(topograph): flatten error messages into single-line strings by @dmitsh in #301
- fix(ci): restore branches filter on Fern publish push trigger by @resker in #304
- docs(reference): clarify gpu.clique distinguishability vs proximity by @resker in #305
- feat(dsx): provider simulator by @ravisoundar in #287
- fix(ci): harden Fern docs CI and configure custom domain by @pdmack in #308
- feat(topograph): refactor topology graph to reduce complexity by @dmitsh in #306
- chore(topograph): remove obsolete toposim and protobufs by @dmitsh in #310
- feat(model): refactor simulation models to include node attributes by @dmitsh in #309
- feat(slinky): Support ConfigUpdateMode parameter by @ravisoundar in #300
- feat(fern): add versioned docs dropdown with CI version stamping by @pdmack in #313
- feat(model): allow to specify nodes explicitly and implicitly by @dmitsh in #311
- fix(fern): pin frozen version content via git archive at publish time by @pdmack in #316
- fix(fern): align v0.3.0 nav with actual tag content by @pdmack in #318
- fix(slinky): Skip exec scontrol show partition for DynamicNodes by @ravisoundar in #319
- feat(nscale): implement topology provider by @dmitsh in #239
- fix(ci): populate frozen version content in preview build and surface fern errors by @pdmack in #322
- fix(slinky): tighten pods exec RBAC for partition discovery by @dmitsh in #320
- chore(topograph): release v0.4.0 by @dmitsh in #323
- feat(fern): add v0.4.0 version entry and automate version registration by @pdmack in #325
- chore(docs): add nscale docs by @dmitsh in #326
- fix(ci): use heredoc for version entry insertion to avoid YAML parse error by @pdmack in #327
- chore(docs): update docs by @dmitsh in #328
- fix(ci): use yq for YAML manipulation in publish workflow by @pdmack in #329
- fix(ci): install yq to user-local path with arch detection by @pdmack in #330
- fix(helm): minor fixes by @dmitsh in #331
- feat(build): opt-in env-var knobs for downstream packaging by @resker in #333
- fix(build): quote remaining $REPO_HOME-derived paths in build-deb.sh by @resker in #335
- fix(build): quote DEB_OUTPUT_DIR-derived paths in build-deb.sh by @resker in #334
- feat(ci): port AICR publish pattern — tag resolution, pruning, PR persistence by @pdmack in #337
- test(charts): Helm Chart Tests by @ravisoundar in #336
- feat(engine): Graph engine simulation by @ravisoundar in #314
- fix(ci): skip version registration for latest release and sort by semver by @pdmack in #338
- feat(fern): adopt global-theme nvidia, remove per-repo theme assets by @pdmack in #339
- fix(fern): add multi-source to enable global theme JS assets by @pdmack in #340
- fix(k8s): prefer GPU clique label for accelerator domains by @dmitsh in #341
- feat(helm): add node-data-broker ConfigMap mounts by @dmitsh in #347
- fix(server): preserve replacement timer in trailing delay queue by @dmitsh in #348
- chore(topograph): simplify node attributes by @dmitsh in #349
- fix(server): snapshot queue completion results by @fallintoplace in #351
- feat: Empty Block Complementing by @ravisoundar in #343
- feat: Restart deployments when there are changes to config maps by @ravisoundar in #346
- fix: formatting by @dmitsh in #354
- refactor(providers): replace string map helper with mapstructure by @dmitsh in #355
- fix(providers): reuse retrying HTTP helper by @dmitsh in #356
- docs(helm): use chart repo install instructions by @dmitsh in #357
- namespace added to Helm Chart by @kevin-kho in #345
- chore: update CODEOWNERS by @dmitsh in #358
- fix(provider/nebius): read metadata from IMDS by @dmitsh in #353
- fix: Ensuring the correct helm version for the chart test by @ravisoundar in #359
- chore(helm): default image tag to chart appVersion by @giuliocalzo in #360
- test(helm): replace in-house chart test script with helm-unittest by @giuliocalzo in #361
- fix(chart): support existing service accounts with managed RBAC by @dmitsh in #364
- feat(slinky): allow gpu.clique to override provider accelerator domains by @dmitsh in #342
- fix(node-observer): refresh topology after API restarts by @dmitsh in #367
- docs: add CHANGELOG and wire it into agent guidance by @giuliocalzo in #369
- refactor(server): replace golang-lru with local cache by @dmitsh in #371
- feat(node-data-broker): run broker as main container with health probes by @giuliocalzo in #368
- feat(node-observer): wait for topograph health in-process by @dmitsh in #370
- chore(release): prepare v0.5.0 by @dmitsh in #372
New Contributors
- @pdmack made their first contribution in #249
- @fallintoplace made their first contribution in #351
- @giuliocalzo made their first contribution in #360
Full Changelog: v0.3.0...v0.5.0