On device AI

MTSistemi edited this page Jun 9, 2026 · 1 revision

On-device AI

SkillFishOS can run local LLMs on the BC-250's integrated GPU with Vulkan acceleration. Nothing leaves the machine.

The stack

Ollama as the model runner, accelerated via Vulkan on the Cyan Skillfish GPU.
OpenWebUI as the chat front-end.
A one-click AI panel (skillfish-ai-panel) that starts/stops the whole stack.
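
Because Ollama is the model runner, anything that speaks its HTTP API can use the stack, not only OpenWebUI. The following is a minimal sketch, assuming Ollama is listening on its default local address (`http://localhost:11434`) and that the stack has been started from the panel. The helper names `build_generate_request` and `generate` are hypothetical and are not part of SkillFishOS.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; adjust if your setup differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, url: str = OLLAMA_URL, timeout: float = 300.0) -> str:
    """Send the prompt to the local Ollama server and return the generated text."""
    request = build_generate_request(model, prompt, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, Ollama returns one JSON object with a "response" field.
    return body["response"]
```

Call `generate(model_name, "Say hello in one sentence.")` with the name of a model you installed through the wizard (`ollama list` shows the installed names). The request goes to localhost only, so the prompt and the reply stay on the machine.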

The AI panel

A first-run wizard installs the stack and lets you pick a model from a curated list of 30+ options that fit the hardware (up to 14B parameters).
Live readout of CPU / GPU / VRAM / RAM.
A slider to grow the shared memory (GTT) available to the model — see Memory VRAM and GTT.
Turn the engine off with one click to free the GPU and memory for gaming.
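
If you want the same VRAM and GTT numbers outside the panel, the amdgpu kernel driver exposes byte counters in sysfs. The sketch below reads them directly. It is not how skillfish-ai-panel is implemented (the source does not say), and it assumes the GPU is `card0`, which may differ on your system. The function names are hypothetical.

```python
from pathlib import Path

# Assumption: the GPU is card0; the index can differ on other setups.
DEFAULT_DEVICE_DIR = Path("/sys/class/drm/card0/device")

# amdgpu sysfs counters, all reported in bytes.
MEMORY_FILES = {
    "vram_used": "mem_info_vram_used",
    "vram_total": "mem_info_vram_total",
    "gtt_used": "mem_info_gtt_used",
    "gtt_total": "mem_info_gtt_total",
}


def read_memory_usage(device_dir: Path = DEFAULT_DEVICE_DIR) -> dict[str, int]:
    """Return the amdgpu memory counters found under device_dir, in bytes."""
    usage = {}
    for key, filename in MEMORY_FILES.items():
        path = device_dir / filename
        if path.is_file():  # skip counters this kernel/driver does not expose
            usage[key] = int(path.read_text().strip())
    return usage


def format_usage(usage: dict[str, int]) -> str:
    """Render used/total pairs in MiB, one line per memory pool."""
    lines = []
    for pool in ("vram", "gtt"):
        used = usage.get(f"{pool}_used")
        total = usage.get(f"{pool}_total")
        if used is not None and total is not None:
            lines.append(f"{pool.upper()}: {used // 2**20} / {total // 2**20} MiB")
    return "\n".join(lines)
```

Running `print(format_usage(read_memory_usage()))` before and after turning the engine off is a quick way to confirm that the GPU memory was actually released.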

Tips

Bigger models need more memory: raise the GTT budget (and/or the UMA VRAM) before loading a large model.
The GPU governor drops the card to 350 MHz when it is idle between prompts and ramps it up under inference. For sustained throughput you can pick Performance (see GPU Governor and Tuning). For pure compute the 2230 MHz point can help, given adequate voltage and cooling, but the shipped safe cap is 2200 MHz.
Unlocking Compute Units (40-CU) multiplies the GPU's compute throughput by roughly 1.8×.
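
To judge how much memory to budget before loading a large model, a back-of-the-envelope estimate is often enough: weights take roughly (parameter count × bits per weight ÷ 8) bytes, plus some headroom for the KV cache and runtime buffers. The sketch below encodes that rule of thumb. The 4-bit quantization and the 20% overhead are illustrative assumptions, not figures from SkillFishOS, and real usage depends on the model format and context length.

```python
def estimate_model_memory_gib(
    params_billion: float,
    bits_per_weight: float,
    overhead_fraction: float = 0.2,  # assumption: headroom for KV cache and buffers
) -> float:
    """Rough memory footprint of a quantized model, in GiB."""
    weight_bytes = params_billion * 1e9 * bits_per_weight / 8
    total_bytes = weight_bytes * (1 + overhead_fraction)
    return total_bytes / 2**30


def fits_budget(
    params_billion: float,
    bits_per_weight: float,
    budget_gib: float,
    overhead_fraction: float = 0.2,
) -> bool:
    """True if the estimated footprint is within the given memory budget."""
    estimate = estimate_model_memory_gib(params_billion, bits_per_weight, overhead_fraction)
    return estimate <= budget_gib
```

Under these assumptions a 14B model at 4 bits per weight comes out at about 7.8 GiB, so `fits_budget(14, 4, 10)` is true while `fits_budget(14, 4, 6)` is not. Treat the result as a starting point for the GTT slider rather than a guarantee.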

Nothing here phones home — the models and the chat run entirely on your BC-250.