v0.5.0

Latest

@dmitsh released this 30 Jun 18:04
253cb10

What's Changed

  • chore(github): add issue template by @dmitsh in #244
  • docs: add InfiniBand, NetQ, and DRA provider documentation, update README by @resker in #243
  • feat(k8s): node observer to retry failed requests after delay by @dmitsh in #242
  • feat: Slinky Engine - Dynamic Nodes Reconciliation by @ravisoundar in #241
  • chore(github): add copy-pr-bot.yaml config by @dmitsh in #245
  • docs(readme): add 'Motivation and Problem Statement' section by @dmitsh in #247
  • fix: Topology Spec with the switch hierarchy by @ravisoundar in #246
  • chore(deps): Bump github.com/moby/spdystream from 0.5.0 to 0.5.1 by @dependabot[bot] in #251
  • chore(github): set gh runners by @dmitsh in #250
  • docs: add Fern documentation scaffolding by @pdmack in #249
  • chore: fix print_env.sh template placeholder and copyright year by @resker in #252
  • docs: add AGENTS.md and .claude/CLAUDE.md for AI coding agents by @resker in #253
  • build: add make qualify pre-push aggregator by @resker in #256
  • docs: small-batch reconciliation of blog draft content into repo docs by @resker in #264
  • docs: document gpu.clique relationship and non-MNNVL topology source by @resker in #255
  • docs: organize docs into sections by @dmitsh in #267
  • docs(agents): add Documentation Impact Evaluation section and refresh stale references by @resker in #269
  • docs(k8s): add Kubernetes access patterns section by @resker in #259
  • docs(fern): add fern/README.md describing the Fern docs workflow by @resker in #270
  • chore(github): update fern github actions by @dmitsh in #268
  • docs(governance): add CODE_OF_CONDUCT, SECURITY, and pull request template by @resker in #273
  • docs(reference): add authoritative node labels and annotations reference by @resker in #254
  • chore(go): update go version and packages by @dmitsh in #277
  • docs: post-#254 follow-ups — KEP-4962 note + Doc-Impact table rows by @resker in #278
  • fix(ci): extract PR metadata from ref_name for copy-PR-bot push trigger by @pdmack in #280
  • fix(docs/ci): self-close img tags and add MDX safety check to CI by @pdmack in #282
  • chore(chart): add values schema, helm test hooks, and chart README by @resker in #275
  • fix(ci): extend MDX img check to fern/ and .mdx files by @pdmack in #283
  • feat(chart): add Gateway API (HTTPRoute) support by @resker in #276
  • chore: standardize SPDX file headers by @pdmack in #286
  • chore: unify naming in Helm values examples by @dmitsh in #288
  • docs(fern): restore Reference section, label v0.3.0, surface errors by @resker in #284
  • docs(reference): clarify accelerator / Capacity Block / clique semantics by @resker in #289
  • fix(ci): include docs/ in Fern preview artifact by @resker in #290
  • chore(chart): declare kubeVersion ">=1.27.0-0" in chart + subcharts by @resker in #291
  • docs(get-started): add install-path quickstarts for Kubernetes and Slurm by @resker in #292
  • feat(slinky): support podSelector in partition topologies by @dmitsh in #295
  • fix: Docker buildkit proxy setting for nv-gh-runners by @ravisoundar in #296
  • fix(slurm/slinky): by @dmitsh in #297
  • chore(docs): update logo by @dmitsh in #298
  • chore(docs): add documentation about 'test' provider and update test payloads by @dmitsh in #299
  • fix(topograph): flatten error messages into single-line strings by @dmitsh in #301
  • fix(ci): restore branches filter on Fern publish push trigger by @resker in #304
  • docs(reference): clarify gpu.clique distinguishability vs proximity by @resker in #305
  • feat(dsx): provider simulator by @ravisoundar in #287
  • fix(ci): harden Fern docs CI and configure custom domain by @pdmack in #308
  • feat(topograph): refactor topology graph to reduce complexity by @dmitsh in #306
  • chore(topograph): remove obsolete toposim and protobufs by @dmitsh in #310
  • feat(model): refactor simulation models to include node attributes by @dmitsh in #309
  • feat(slinky): Support ConfigUpdateMode parameter by @ravisoundar in #300
  • feat(fern): add versioned docs dropdown with CI version stamping by @pdmack in #313
  • feat(model): allow to specify nodes explicitly and implicitly by @dmitsh in #311
  • fix(fern): pin frozen version content via git archive at publish time by @pdmack in #316
  • fix(fern): align v0.3.0 nav with actual tag content by @pdmack in #318
  • fix(slinky): Skip exec scontrol show partition for DynamicNodes by @ravisoundar in #319
  • feat(nscale): implement topology provider by @dmitsh in #239
  • fix(ci): populate frozen version content in preview build and surface fern errors by @pdmack in #322
  • fix(slinky): tighten pods exec RBAC for partition discovery by @dmitsh in #320
  • chore(topograph): release v0.4.0 by @dmitsh in #323
  • feat(fern): add v0.4.0 version entry and automate version registration by @pdmack in #325
  • chore(docs): add nscale docs by @dmitsh in #326
  • fix(ci): use heredoc for version entry insertion to avoid YAML parse error by @pdmack in #327
  • chore(docs): update docs by @dmitsh in #328
  • fix(ci): use yq for YAML manipulation in publish workflow by @pdmack in #329
  • fix(ci): install yq to user-local path with arch detection by @pdmack in #330
  • fix(helm): minor fixes by @dmitsh in #331
  • feat(build): opt-in env-var knobs for downstream packaging by @resker in #333
  • fix(build): quote remaining $REPO_HOME-derived paths in build-deb.sh by @resker in #335
  • fix(build): quote DEB_OUTPUT_DIR-derived paths in build-deb.sh by @resker in #334
  • feat(ci): port AICR publish pattern — tag resolution, pruning, PR persistence by @pdmack in #337
  • test(charts): Helm Chart Tests by @ravisoundar in #336
  • feat(engine): Graph engine simulation by @ravisoundar in #314
  • fix(ci): skip version registration for latest release and sort by semver by @pdmack in #338
  • feat(fern): adopt global-theme nvidia, remove per-repo theme assets by @pdmack in #339
  • fix(fern): add multi-source to enable global theme JS assets by @pdmack in #340
  • fix(k8s): prefer GPU clique label for accelerator domains by @dmitsh in #341
  • feat(helm): add node-data-broker ConfigMap mounts by @dmitsh in #347
  • fix(server): preserve replacement timer in trailing delay queue by @dmitsh in #348
  • chore(topograph): simplify node attributes by @dmitsh in #349
  • fix(server): snapshot queue completion results by @fallintoplace in #351
  • feat: Empty Block Complementing by @ravisoundar in #343
  • feat: Restart deployments when there are changes to config maps by @ravisoundar in #346
  • fix: formatting by @dmitsh in #354
  • refactor(providers): replace string map helper with mapstructure by @dmitsh in #355
  • fix(providers): reuse retrying HTTP helper by @dmitsh in #356
  • docs(helm): use chart repo install instructions by @dmitsh in #357
  • namespace added to Helm Chart by @kevin-kho in #345
  • chore: update CODEOWNERS by @dmitsh in #358
  • fix(provider/nebius): read metadata from IMDS by @dmitsh in #353
  • fix: Ensuring the correct helm version for the chart test by @ravisoundar in #359
  • chore(helm): default image tag to chart appVersion by @giuliocalzo in #360
  • test(helm): replace in-house chart test script with helm-unittest by @giuliocalzo in #361
  • fix(chart): support existing service accounts with managed RBAC by @dmitsh in #364
  • feat(slinky): allow gpu.clique to override provider accelerator domains by @dmitsh in #342
  • fix(node-observer): refresh topology after API restarts by @dmitsh in #367
  • docs: add CHANGELOG and wire it into agent guidance by @giuliocalzo in #369
  • refactor(server): replace golang-lru with local cache by @dmitsh in #371
  • feat(node-data-broker): run broker as main container with health probes by @giuliocalzo in #368
  • feat(node-observer): wait for topograph health in-process by @dmitsh in #370
  • chore(release): prepare v0.5.0 by @dmitsh in #372

New Contributors

Full Changelog: v0.3.0...v0.5.0