v0.8.8

github-actions released this 17 Jun 07:48
b6add4a

0.8.8 (2026-06-17)

Features

  • AMD/Vulkan runtime image selection (hardware.gpu.runtime) (#727) (1a4544f)
  • crd: make GPU resource name configurable to support AMD/Vulkan/Intel scheduling (#709) (c88becf)
  • gateway: active HTTP health checks on the ModelRouter BTP for fast backend ejection (#662) (#704) (ba99060)
  • gateway: event-driven route-level ejection of unhealthy backends (#662) (#706) (815f2bf)
  • gateway: gateway-scoped audit access log + fail-loud auditLog in Gateway mode (2c) (#703) (b874b5e)
  • gateway: header-only data-classification routing + fail-closed sensitive guard (2e-core) (#707) (0249665)
  • gateway: InferenceService Envoy AI Gateway exposure (MVP) (#692) (3b095dc)
  • gateway: ModelRouter dataPlane Gateway mode with cross-tier failover (2a) (#693) (2842634)
  • gateway: ModelRouter JWT authentication via SecurityPolicy (2d-core) (#695) (73a2ea9)
  • gateway: ModelRouter per-team model allowlists via SecurityPolicy authorization (2d.2) (#702) (94428b4)
  • gateway: ModelRouter token budgets and 429 enforcement (2b) (#694) (627e85a)
  • metal-agent: withdraw endpoint when runtime is unhealthy (#662) (#705) (5ed9395)
  • selfupdate: bound download size + GC old agent versions (#690) (5205a62)
  • webhook: ModelRouter validating webhook for apply-time honest-boundary rejection (#708) (13d9321)

Bug Fixes

  • cache: restore shared model cache as the default (perService becomes opt-in) (#732) (44ab7dc)
  • per-node model cache so GPU on a second node can schedule (#728) (#729) (79bccce)

Documentation

  • DGX Spark (GB10) on MicroK8s setup guide (#717) (bf7d7a7)
  • fix DGX Spark guide for ARM64 (GPU operator + GB10 image) (#718) (45a4237)
  • proposal for owned AMD/Vulkan runtime image and build pipeline (#726) (3a1a150)