InferenceBridge is a local LLM desktop app for running GGUF models on your own hardware with a shared OpenAI-compatible API and a clean desktop UI.
Built with Tauri (Rust) and React, it wraps llama-server from llama.cpp and manages model discovery, downloads, loading, streaming chat, session history, and API serving in one app.
Pre-built installers are available on the Releases page.
| Platform | Status |
|---|---|
| Windows (x64) | NSIS installer and MSI |
| macOS (Apple Silicon) | DMG |
| Linux (Ubuntu/Debian x64) | DEB and AppImage |
You do not need Rust, Node.js, or llama.cpp preinstalled. InferenceBridge can download and manage llama-server from inside the app.
- Browse and download GGUF models from Hugging Face
- Auto-detect local model directories and scan for `.gguf` files
- Load and unload models from the GUI or API
- Shared desktop UI and OpenAI-compatible API on the same local app state
- Streaming chat, session history, and context monitoring
- Vision image paste support for compatible models
- Interactive in-app API editor and logs workspace
- Optional API key protection for the public endpoint
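The directory scan in the feature list above amounts to a recursive search for `.gguf` files. A minimal sketch of that idea (illustrative only, not the app's actual code):

```python
from pathlib import Path

def scan_gguf(root: str) -> list[str]:
    """Recursively collect .gguf model files under a directory, sorted by path."""
    return sorted(str(p) for p in Path(root).rglob("*.gguf"))
```

Subdirectories are included, matching the "scan" behavior described above.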
- Download and install the latest release from Releases.
- Open the app and go to Settings > llama-server.
- Download the managed `llama-server` build for your machine.
- Add or scan model directories, or download a model from the Browse tab.
- Load a model from the Models tab.
- Chat in the app or point external tools at `http://127.0.0.1:8800/v1`.
If you already have GGUF files on disk, add their folder under Settings > Model Directories.
Base URL: `http://127.0.0.1:8800/v1`
If you set an API key in Settings, pass it as a Bearer token. If no API key is configured, the local endpoint is open.
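In client code, the optional key maps to a conditional Authorization header. A small sketch (header names follow the OpenAI convention the API mirrors; the helper name is mine):

```python
def auth_headers(api_key: str = "") -> dict[str, str]:
    """Build request headers; attach a Bearer token only when a key is configured."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers
```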
| Method | Path | Description |
|---|---|---|
| GET | `/v1/health` | Health check |
| GET | `/v1/models` | List discovered models |
| GET | `/v1/models/{name}` | Get details for one model |
| POST | `/v1/models/load` | Begin loading a model |
| POST | `/v1/models/unload` | Unload the active model |
| POST | `/v1/models/stats` | Get status or load progress for a model |
| POST | `/v1/chat/completions` | OpenAI-style chat completions |
| POST | `/v1/completions` | Text completions |
| GET | `/v1/context/status` | Context and KV-cache status |
| GET | `/v1/sessions` | List saved chat sessions |
| POST | `/v1/sessions` | Create a chat session |
| DELETE | `/v1/sessions/{id}` | Delete a chat session |
| GET | `/v1/sessions/{id}/messages` | Get session messages |
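`/v1/models/load` starts a load and `/v1/models/stats` reports progress, which suggests a load-then-poll pattern from client code. As a hedged sketch, here is request preparation with Python's stdlib `urllib` (endpoint paths from the table; the `post` helper is mine, and the stats response shape is not assumed here):

```python
import json
import urllib.request

BASE = "http://127.0.0.1:8800/v1"

def post(path: str, payload: dict, api_key: str = "") -> urllib.request.Request:
    """Prepare a JSON POST to the local API; pass the result to urlopen() to send."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        BASE + path, data=json.dumps(payload).encode(), headers=headers, method="POST"
    )

# Start loading a model, then poll stats until the app reports it ready.
load_req = post("/models/load", {"model": "Qwen3-14B-Q4_K_M.gguf"})
stats_req = post("/models/stats", {"model": "Qwen3-14B-Q4_K_M.gguf"})
```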
```sh
curl "http://127.0.0.1:8800/v1/models"
```

```sh
curl -X POST "http://127.0.0.1:8800/v1/models/load" \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"Qwen3-14B-Q4_K_M.gguf\"}"
```

```sh
curl -X POST "http://127.0.0.1:8800/v1/models/stats" \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"Qwen3-14B-Q4_K_M.gguf\"}"
```

```sh
curl "http://127.0.0.1:8800/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-key-here" \
  -d '{
    "model": "Qwen3-14B-Q4_K_M.gguf",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false
  }'
```

The Debug tab includes:
- API serve controls
- a built-in API editor
- recent request history
- logs
- raw prompt and parse trace views
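With `"stream": true` in the chat request, responses arrive as OpenAI-style server-sent events: one `data:` line per chunk, terminated by `data: [DONE]`. A hedged sketch of joining the content deltas (chunk shape follows the usual OpenAI convention; verify against the raw views in the Debug tab):

```python
import json

def collect_deltas(sse_lines):
    """Join content deltas from an OpenAI-style streaming chat response."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank separators and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))  # first chunk may carry only a role
    return "".join(parts)
```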
See docs/04-debug-api-workspace.md for example flows and cURL snippets.
InferenceBridge stores configuration in:
| OS | Path |
|---|---|
| Windows | %APPDATA%\InferenceBridge\inference-bridge.toml |
| macOS | ~/Library/Application Support/InferenceBridge/inference-bridge.toml |
| Linux | ~/.config/InferenceBridge/inference-bridge.toml |
Example:

```toml
[server]
host = "127.0.0.1"
port = 8800
api_key = ""
autostart = true

[models]
scan_dirs = [
  "C:\\Users\\You\\models",
]

[process]
gpu_layers = -1
threads = 0
```

Requirements:
- Rust 1.75+
- Node.js 18+
- platform build tools for Tauri
```sh
git clone https://github.com/AssassinUKG/InferenceBridge
cd InferenceBridge
npm install
npm run tauri build
```

Release bundles are written to `src-tauri/target/release/bundle/`.
For development:

```sh
npm run tauri dev
```

This repo uses three GitHub Actions workflows:

- `CI` runs quick validation on pushes and pull requests, including `npm run build` and `cargo check`
- `Build` builds desktop artifacts on pushes and by manual trigger from the Actions tab, and uploads them to the workflow run
- `Release` creates a draft GitHub Release when you push a version tag like `v0.1.0`
Push your branch:

```sh
git push origin master
```

Or open the Actions tab on GitHub and run the Build workflow manually.

To cut a release, tag a version and push the tag:

```sh
git tag v0.1.0
git push origin v0.1.0
```

That will run the Release workflow and create a draft release with platform installers.
- docs/01-migration-design-note.md
- docs/02-architecture.md
- docs/03-implementation-plan.md
- docs/04-debug-api-workspace.md
- docs/05-inference-runtime-roadmap.md
MIT. See LICENSE.



