BareMetalRT 0.13.12

@brianhabana released this 22 Jun 12:21

What's new in 0.13.12

Model loading and serving are now reliable instead of fingers-crossed, and the node no longer misreports its state.

  • Loads retry instead of failing on a transient hiccup (VRAM blip, slow import, flaky download).
  • No leaked GPU memory between model loads — a worker that times out or crashes is now reaped, so the next load no longer fails for lack of VRAM ("Impossible to fit in kvCache").
  • The node tells the truth — after an unload or an unexpected drop it shows idle (not a stale ready), and a crashed inference worker now self-heals (reloads) instead of erroring on every chat.
  • A broken tokenizer is now reported as a real error, instead of leaving a silent node that accepts chats and then returns 503 forever.
  • A corrupt model registry can't wipe your downloaded models: the file is backed up and the load fails loudly instead of silently resetting to empty.
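
For readers curious about the last point, here is a minimal sketch of the "back up and fail loud" pattern. It is not BareMetalRT's actual code: the JSON file format, the function name `load_registry`, and the `RegistryCorruptError` type are all assumptions made for illustration.

```python
import json
import shutil
import time
from pathlib import Path


class RegistryCorruptError(RuntimeError):
    """Hypothetical error type: the registry file exists but cannot be parsed."""


def load_registry(path: Path) -> dict:
    """Load a registry file (assumed to be a JSON object).

    A missing file is treated as a legitimately empty registry.
    An unreadable file is copied aside and an error is raised; it is
    never silently replaced with an empty registry.
    """
    if not path.exists():
        return {}
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
        if not isinstance(data, dict):
            raise ValueError("registry root is not a JSON object")
    except ValueError as exc:  # also covers JSONDecodeError and UnicodeDecodeError
        backup = path.with_name(f"{path.name}.corrupt-{int(time.time())}")
        shutil.copy2(path, backup)  # keep the bad file for inspection or recovery
        raise RegistryCorruptError(
            f"{path} is corrupt; a copy was saved to {backup}"
        ) from exc
    return data
```

The key design choice is that only a missing file maps to "empty"; a file that is present but unparseable stops the load, so nothing downstream can overwrite it with an empty registry.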

Experimental (off by default): a new native serving path with rock-solid model swapping, opt-in via BMRT_NATIVE_RESIDENCY=1. Normal behavior is unchanged.
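
One way to opt in for a single launch, without changing system-wide settings, is to start the app from a small script with the variable set in the child process environment. This is only a sketch: the helper name `env_with_native_residency` is made up for illustration, and the executable path in the usage note is a placeholder, since these notes do not state where the app is installed.

```python
import os

FLAG = "BMRT_NATIVE_RESIDENCY"  # variable name taken from the release notes


def env_with_native_residency(base=None):
    """Return a copy of the environment with the experimental flag set to "1".

    The caller's environment (or the mapping passed in) is not modified.
    """
    env = dict(os.environ if base is None else base)
    env[FLAG] = "1"
    return env
```

Usage would look like `subprocess.Popen([r"<path to the BareMetalRT executable>"], env=env_with_native_residency())`. Launching without the variable gives the normal, unchanged behavior.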


Installation

  1. Download and run BareMetalRT-0.13.12-Setup.exe
  2. The installer sets up everything it needs: the Python runtime and GPU libraries are bundled. No separate CUDA Toolkit or TensorRT install is required; the installer only runs a quick NVIDIA driver check.
  3. The app opens on your machine at http://localhost:8080. Sign in once to link this GPU to your account. After that it runs locally: no further login is needed and nothing leaves your machine.

System Requirements

  • Windows 10/11 (64-bit)
  • NVIDIA GPU (RTX 2000-series or newer recommended)
  • A recent NVIDIA driver (580+) — no system CUDA Toolkit or TensorRT needed
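
To check the driver requirement before installing, you can query the driver version with `nvidia-smi`, which ships with the NVIDIA driver. The sketch below is a manual pre-check, not the installer's own check; the function names are illustrative, and it assumes "580+" refers to the major version number.

```python
import subprocess

MIN_DRIVER_MAJOR = 580  # from the System Requirements list above


def driver_major(version: str) -> int:
    """Extract the major version from a string such as '581.29'."""
    return int(version.strip().split(".")[0])


def meets_minimum(version: str, minimum: int = MIN_DRIVER_MAJOR) -> bool:
    """True if the driver's major version is at least the minimum."""
    return driver_major(version) >= minimum


def installed_driver_version() -> str:
    """Ask nvidia-smi for the driver version (one line per GPU; use the first)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if nvidia-smi reports a failure
    ).stdout
    return out.splitlines()[0].strip()
```

Calling `meets_minimum(installed_driver_version())` on the target machine gives a yes/no answer. If `nvidia-smi` is not found, `subprocess.run` raises `FileNotFoundError`, which usually means no NVIDIA driver is installed.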