Release BareMetalRT 0.13.12 · baremetalrt/dist

What's new in 0.13.12

Model loading and serving are now reliable instead of fingers-crossed, and the node no longer misreports its state.

Loads retry instead of failing on a transient hiccup (VRAM blip, slow import, flaky download).
No leaked GPU memory between model loads — a worker that times out or crashes is now reaped, so the next load no longer fails for lack of VRAM ("Impossible to fit in kvCache").
The node tells the truth — after an unload or an unexpected drop it shows idle (not a stale ready), and a crashed inference worker now self-heals (reloads) instead of erroring on every chat.
A broken tokenizer now surfaces as a real error, instead of leaving a silent node that accepts chats and then returns 503 forever.
A corrupt model registry can't wipe your downloaded models: the corrupt registry is backed up and the failure is reported loudly instead of silently resetting to empty.
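
For readers curious what "retry instead of failing on a transient hiccup" can look like in practice, here is a minimal sketch of the general retry-with-backoff pattern. It is not BareMetalRT's actual code: `TransientLoadError`, `load_with_retry`, and all parameters are hypothetical names chosen for illustration, and the real retry policy is not described in these notes.

```python
import random
import time


class TransientLoadError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a brief VRAM shortage)."""


def load_with_retry(load_fn, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call load_fn until it succeeds or the attempts run out.

    Only TransientLoadError is retried; any other exception propagates at once.
    """
    for attempt in range(1, attempts + 1):
        try:
            return load_fn()
        except TransientLoadError:
            if attempt == attempts:
                raise  # out of retries: surface the last transient error
            # Exponential backoff with a little jitter between attempts.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 10))
```

The point of separating transient errors from everything else is that a permanent problem (such as a broken tokenizer) should fail immediately rather than be retried.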

Experimental (off by default): a new native serving path with rock-solid model swapping, opt-in via BMRT_NATIVE_RESIDENCY=1. Normal behavior is unchanged.
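
One way to opt in from a script is to set the variable only for the process you launch, so the rest of your session keeps the default behavior. The sketch below assumes the node reads `BMRT_NATIVE_RESIDENCY` from its process environment at startup, which these notes imply but do not state; `native_residency_env` is a hypothetical helper name.

```python
import os


def native_residency_env(base=None):
    """Return a copy of the environment with BMRT_NATIVE_RESIDENCY=1 set.

    The input mapping (os.environ by default) is left unmodified.
    """
    env = dict(os.environ if base is None else base)
    env["BMRT_NATIVE_RESIDENCY"] = "1"  # opt in to the experimental path
    return env
```

Pass the result as `env=` to `subprocess.Popen` together with whatever command you normally use to start the node (the command itself is not specified here). To go back to normal behavior, launch without the variable.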

Installation

Download and run BareMetalRT-0.13.12-Setup.exe
The installer sets up everything it needs: the Python runtime and GPU libraries are bundled. No separate CUDA Toolkit or TensorRT install is required; the installer only runs a quick NVIDIA driver check.
The app opens on your machine at http://localhost:8080. Sign in once to link this GPU to your account. After that it runs locally: no login, and nothing leaves your machine.

System Requirements

Windows 10/11 (64-bit)
NVIDIA GPU (RTX 2000-series or newer recommended)
A recent NVIDIA driver (580+) — no system CUDA Toolkit or TensorRT needed
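
If you want to check the driver requirement yourself before installing, the following sketch queries `nvidia-smi` (which ships with the NVIDIA driver) and compares the major version. It assumes "580+" means a major version of at least 580; the function names are illustrative and this is not the check the installer itself performs.

```python
import subprocess


def parse_driver_major(version_text):
    """Return the major version from text such as '580.88' (first line only)."""
    first_line = version_text.strip().splitlines()[0]
    return int(first_line.split(".")[0])


def installed_driver_major():
    """Ask nvidia-smi for the driver version.

    Raises if nvidia-smi is missing or exits with an error.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    # With several GPUs, nvidia-smi prints one line per GPU; the first is enough.
    return parse_driver_major(result.stdout)


def meets_minimum(major, minimum=580):
    """True when the driver's major version is at least the assumed minimum."""
    return major >= minimum
```

Calling `meets_minimum(installed_driver_major())` on the target machine returns `True` when the installed driver satisfies the assumed threshold.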
