On device AI
MTSistemi edited this page Jun 9, 2026 · 1 revision
SkillFishOS can run local LLMs on the BC-250's integrated GPU, accelerated via Vulkan, so nothing leaves the machine.
- Ollama as the model runner, accelerated via Vulkan on the Cyan Skillfish GPU.
- OpenWebUI as the chat front-end.
- A one-click AI panel (skillfish-ai-panel) that starts/stops the whole stack.
- A first-run wizard installs the stack and lets you pick a model from a curated list of 30+ options that fit the hardware (≤14B).
- Live readout of CPU / GPU / VRAM / RAM.
- A slider to grow the shared memory (GTT) available to the model — see Memory VRAM and GTT.
- Turn the engine off with one click to free the GPU and memory for gaming.
- Bigger models need more memory: raise the GTT budget (and/or the UMA VRAM) before loading a large model.
- When idle, the GPU governor drops the card to 350 MHz between prompts and ramps it up under inference. For sustained throughput you can pick Performance (see GPU Governor and Tuning). For pure compute the 2230 MHz point can help, given adequate voltage and cooling, but the shipped safe cap is 2200 MHz.
- Unlocking Compute Units (40-CU) gives roughly 1.8× the GPU's compute throughput.
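To make the "bigger models need more memory" point concrete, here is a rough sketch that estimates a model's footprint and compares it with what the amdgpu driver currently reports. It is not part of SkillFishOS. The helper names are hypothetical, the 1.2 overhead factor for KV cache and buffers is a rule-of-thumb assumption, and treating free VRAM plus free GTT as one budget is a simplification. The `mem_info_*` files are the standard amdgpu sysfs counters, and whether they are exposed the same way on this kernel is also an assumption.

```python
from pathlib import Path

GIB = 1024 ** 3


def estimate_model_gib(params_billion, bits_per_weight=4.0, overhead=1.2):
    """Rough footprint in GiB: quantised weights times a fudge factor.

    The overhead factor (KV cache, runtime buffers) is an assumption,
    not a measured value for this hardware.
    """
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / GIB


def read_amdgpu_memory(drm_root="/sys/class/drm"):
    """Return {counter: GiB} from the first card exposing amdgpu mem_info files."""
    for card in sorted(Path(drm_root).glob("card*")):
        counters = {}
        for name in ("vram_total", "vram_used", "gtt_total", "gtt_used"):
            path = card / "device" / f"mem_info_{name}"
            if path.is_file():
                counters[name] = int(path.read_text().strip()) / GIB
        if counters:
            return counters
    return {}


def fits(params_billion, mem, bits_per_weight=4.0):
    """True if the estimate fits in free VRAM + free GTT (a simplification)."""
    free = (mem.get("vram_total", 0.0) - mem.get("vram_used", 0.0)
            + mem.get("gtt_total", 0.0) - mem.get("gtt_used", 0.0))
    return estimate_model_gib(params_billion, bits_per_weight) <= free
```

Calling `fits(14, read_amdgpu_memory())` before loading a 14B model gives a quick hint as to whether the GTT slider needs raising first. Treat the result as a ballpark only, since context length and quantisation format change the real footprint.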
Nothing here phones home — the models and the chat run entirely on your BC-250.
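Because everything runs locally, scripts on the same machine can talk to the model runner directly, without going through OpenWebUI. The sketch below uses Ollama's HTTP API (`/api/generate` and `/api/tags`) with only the Python standard library. It assumes the SkillFishOS stack leaves Ollama on its default address, `127.0.0.1:11434`, which this page does not confirm. The function names are illustrative, and the model name must be one you have already installed.

```python
import json
import urllib.request

# Ollama's default listen address; assumed unchanged by the SkillFishOS stack.
OLLAMA_URL = "http://127.0.0.1:11434"


def build_generate_request(prompt, model, base_url=OLLAMA_URL):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, model, base_url=OLLAMA_URL, timeout=300):
    """Send the prompt to the local runner and return the generated text."""
    request = build_generate_request(prompt, model, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


def list_models(base_url=OLLAMA_URL, timeout=10):
    """Return the names of the models installed in the local Ollama instance."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return [entry["name"] for entry in json.loads(response.read())["models"]]
```

With the engine switched on in the AI panel, `list_models()` shows what is installed and `generate("Hello", model=list_models()[0])` sends a prompt to the first entry. Both calls fail with a connection error while the engine is off, which is expected.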