InferenceBridge is a local LLM desktop app for running GGUF models on your own hardware with a shared OpenAI-compatible API and a clean desktop UI.
Built with Tauri (Rust) and React, it wraps llama-server from llama.cpp and manages model discovery, downloads, loading, streaming chat, session history, and API serving in one app.
Pre-built installers are available on the Releases page.
| Platform | Status |
|---|---|
| Windows (x64) | NSIS installer and MSI |
| macOS (Apple Silicon) | DMG |
| Linux (Ubuntu/Debian x64) | DEB and AppImage |
You do not need Rust, Node.js, or llama.cpp preinstalled. InferenceBridge can download and manage llama-server from inside the app.
- Browse and download GGUF models from Hugging Face
- Auto-detect local model directories and scan for `.gguf` files
- Load and unload models from the GUI or API
- Shared desktop UI and OpenAI-compatible API on the same local app state
- Streaming chat, session history, and context monitoring
- Vision image paste support for compatible models
- Interactive in-app API editor and logs workspace
- Optional API key protection for the public endpoint
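The directory scan in the feature list above amounts to a recursive search for `.gguf` files. A minimal sketch of that idea (illustrative only, not the app's actual code):

```python
from pathlib import Path

def scan_gguf(root: str) -> list[str]:
    """Recursively collect .gguf model files under a directory, sorted by path."""
    return sorted(str(p) for p in Path(root).rglob("*.gguf"))
```

Subdirectories are included, matching the "scan" behavior described above.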
- Download and install the latest release from Releases.
- Open the app and go to Settings > llama-server.
- Download the managed `llama-server` build for your machine.
- Add or scan model directories, or download a model from the Browse tab.
- Load a model from the Models tab.
- Chat in the app or point external tools at `http://127.0.0.1:8800/v1`.
If you already have GGUF files on disk, add their folder under Settings > Model Directories.
Base URL: `http://127.0.0.1:8800/v1`
If you set an API key in Settings, pass it as a Bearer token. If no API key is configured, the local endpoint is open.
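In client code, the optional key maps to a conditional Authorization header. A small sketch (header names follow the OpenAI convention the API mirrors; the helper name is mine):

```python
def auth_headers(api_key: str = "") -> dict[str, str]:
    """Build request headers; attach a Bearer token only when a key is configured."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```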
| Method | Path | Description |
|---|---|---|
| GET | `/v1/health` | Health check |
| GET | `/v1/models` | List discovered models |
| GET | `/v1/models/{name}` | Get details for one model |
| POST | `/v1/models/load` | Begin loading a model |
| POST | `/v1/models/unload` | Unload the active model |
| POST | `/v1/models/stats` | Get status or load progress for a model |
| POST | `/v1/chat/completions` | OpenAI-style chat completions |
| POST | `/v1/completions` | Text completions |
| GET | `/v1/context/status` | Context and KV-cache status |
| GET | `/v1/sessions` | List saved chat sessions |
| POST | `/v1/sessions` | Create a chat session |
| DELETE | `/v1/sessions/{id}` | Delete a chat session |
| GET | `/v1/sessions/{id}/messages` | Get session messages |
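`/v1/models/load` starts a load and `/v1/models/stats` reports progress, which suggests a load-then-poll pattern from client code. As a hedged sketch, here is request preparation with Python's stdlib `urllib` (endpoint paths from the table; the `post` helper is mine, and the stats response shape is not assumed here):

```python
import json
import urllib.request

BASE = "http://127.0.0.1:8800/v1"

def post(path: str, payload: dict, api_key: str = "") -> urllib.request.Request:
    """Prepare a JSON POST to the local API; pass the result to urlopen() to send."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        BASE + path, data=json.dumps(payload).encode(), headers=headers, method="POST"
    )

# Start loading a model, then poll stats until the app reports it ready.
load_req = post("/models/load", {"model": "Qwen3-14B-Q4_K_M.gguf"})
stats_req = post("/models/stats", {"model": "Qwen3-14B-Q4_K_M.gguf"})
```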
```sh
curl "http://127.0.0.1:8800/v1/models"
```

```sh
curl -X POST "http://127.0.0.1:8800/v1/models/load" \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"Qwen3-14B-Q4_K_M.gguf\"}"
```

```sh
curl -X POST "http://127.0.0.1:8800/v1/models/stats" \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"Qwen3-14B-Q4_K_M.gguf\"}"
```

```sh
curl "http://127.0.0.1:8800/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-key-here" \
  -d '{
    "model": "Qwen3-14B-Q4_K_M.gguf",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'
```

The Debug tab includes:
- API serve controls
- a built-in API editor
- recent request history
- logs
- raw prompt and parse trace views
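With `"stream": true` in the chat request, responses arrive as OpenAI-style server-sent events: one `data:` line per chunk, terminated by `data: [DONE]`. A hedged sketch of joining the content deltas (chunk shape follows the usual OpenAI convention; verify against the raw views in the Debug tab):

```python
import json

def collect_deltas(sse_lines):
    """Join content deltas from an OpenAI-style streaming chat response."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank separators and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))  # first chunk may carry only a role
    return "".join(parts)
```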
See docs/04-debug-api-workspace.md for example flows and cURL snippets.
InferenceBridge stores configuration in:
| OS | Path |
|---|---|
| Windows | %APPDATA%\InferenceBridge\inference-bridge.toml |
| macOS | ~/Library/Application Support/InferenceBridge/inference-bridge.toml |
| Linux | ~/.config/InferenceBridge/inference-bridge.toml |
Example:

```toml
[server]
host = "127.0.0.1"
port = 8800
api_key = ""
autostart = true

[models]
scan_dirs = [
  "C:\\Users\\You\\models",
]

[process]
gpu_layers = -1
threads = 0
```

Requirements:
- Rust 1.75+
- Node.js 18+
- platform build tools for Tauri
```sh
git clone https://github.com/AssassinUKG/InferenceBridge
cd InferenceBridge
npm install
npm run tauri build
```

Release bundles are written to `src-tauri/target/release/bundle/`.
For development:

```sh
npm run tauri dev
```

This repo uses three GitHub Actions workflows:

- `CI` runs quick validation on pushes and pull requests, including `npm run build` and `cargo check`
- `Build` builds desktop artifacts on pushes and by manual trigger from the Actions tab, and uploads them to the workflow run
- `Release` creates a draft GitHub Release when you push a version tag like `v0.1.0`
Push your branch:

```sh
git push origin master
```

Or open the Actions tab on GitHub and run the Build workflow manually.

To cut a release, tag a version and push the tag:

```sh
git tag v0.1.0
git push origin v0.1.0
```

That will run the Release workflow and create a draft release with platform installers.
- docs/01-migration-design-note.md
- docs/02-architecture.md
- docs/03-implementation-plan.md
- docs/04-debug-api-workspace.md
- docs/05-inference-runtime-roadmap.md
MIT. See LICENSE.



