v0.8.8

github-actions released this 17 Jun 07:48

b6add4a

0.8.8 (2026-06-17)

Features

AMD/Vulkan runtime image selection (hardware.gpu.runtime) (#727) (1a4544f)
crd: make GPU resource name configurable to support AMD/Vulkan/Intel scheduling (#709) (c88becf)
gateway: active HTTP health checks on the ModelRouter BTP for fast backend ejection (#662) (#704) (ba99060)
gateway: event-driven route-level ejection of unhealthy backends (#662) (#706) (815f2bf)
gateway: gateway-scoped audit access log + fail-loud auditLog in Gateway mode (2c) (#703) (b874b5e)
gateway: header-only data-classification routing + fail-closed sensitive guard (2e-core) (#707) (0249665)
gateway: InferenceService Envoy AI Gateway exposure (MVP) (#692) (3b095dc)
gateway: ModelRouter dataPlane Gateway mode with cross-tier failover (2a) (#693) (2842634)
gateway: ModelRouter JWT authentication via SecurityPolicy (2d-core) (#695) (73a2ea9)
gateway: ModelRouter per-team model allowlists via SecurityPolicy authorization (2d.2) (#702) (94428b4)
gateway: ModelRouter token budgets and 429 enforcement (2b) (#694) (627e85a)
metal-agent: withdraw endpoint when runtime is unhealthy (#662) (#705) (5ed9395)
selfupdate: bound download size + GC old agent versions (#690) (5205a62)
webhook: ModelRouter validating webhook for apply-time honest-boundary rejection (#708) (13d9321)

Bug Fixes

cache: restore shared model cache as the default (perService becomes opt-in) (#732) (44ab7dc)
per-node model cache so GPU on a second node can schedule (#728) (#729) (79bccce)

Documentation

DGX Spark (GB10) on MicroK8s setup guide (#717) (bf7d7a7)
fix DGX Spark guide for ARM64 (GPU operator + GB10 image) (#718) (45a4237)
proposal for owned AMD/Vulkan runtime image and build pipeline (#726) (3a1a150)
