A sleek GUI frontend for llama.cpp
Search, download, and chat with local LLMs in one app.
## Model Management
- Browse and download GGUF models directly from HuggingFace
- Sort by downloads, likes, date, or trending
- View available quantizations (Q4_K_M, Q8_0, IQ3_S, etc.) with file sizes
- Download with progress bar, speed display, ETA, and resume support
- Recursive model folder scanning with automatic detection
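Quantization detection of this kind can be sketched as a filter over a repo's file listing. This is an illustrative assumption about how quant labels are pulled from GGUF filenames, not LlamaLink's actual implementation:

```python
# Sketch: pick GGUF quantization variants out of a HuggingFace repo file
# listing. Filenames and parsing logic here are illustrative assumptions.
import re

def list_quants(repo_files):
    """Return (quant_label, filename) pairs for GGUF files in a listing."""
    quants = []
    for name in repo_files:
        if not name.endswith(".gguf"):
            continue
        # Quant labels like Q4_K_M, Q8_0, IQ3_S typically appear in the name.
        m = re.search(r"(IQ|Q)\d\w*", name, re.IGNORECASE)
        quants.append((m.group(0).upper() if m else "unknown", name))
    return quants

files = ["model-Q4_K_M.gguf", "model-Q8_0.gguf", "README.md"]
print(list_quants(files))
```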
## Server Control
- Launch and manage llama-server with full parameter control
- Or connect to an already-running server (any OpenAI-compatible endpoint)
- Auto-detect llama-server from PATH and common install locations
- Context size, GPU layers, threads, flash attention, mlock toggles
- Embedded server log viewer
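A launcher along these lines assembles the llama-server command from the UI settings. Flag names follow llama.cpp's CLI (`-m`, `-c`, `-ngl`, `-t`, `--flash-attn`, `--mlock`) and may vary by llama.cpp version; the settings dict and helper are a sketch, not the app's code:

```python
# Illustrative sketch of building a llama-server command line from settings.
# Flag names follow llama.cpp's CLI and may differ between versions;
# the settings dict shape is an assumption.

def build_server_cmd(server_path, model_path, settings):
    cmd = [server_path, "-m", model_path,
           "-c", str(settings.get("ctx_size", 4096)),
           "-ngl", str(settings.get("gpu_layers", 0)),
           "-t", str(settings.get("threads", 4)),
           "--port", str(settings.get("port", 8080))]
    if settings.get("flash_attn"):
        cmd.append("--flash-attn")
    if settings.get("mlock"):
        cmd.append("--mlock")
    return cmd

cmd = build_server_cmd("llama-server", "model.gguf",
                       {"ctx_size": 8192, "gpu_layers": 35, "mlock": True})
print(" ".join(cmd))
```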
## Chat Interface
- Streaming responses with live token-by-token display
- Markdown rendering: code blocks, inline code, bold, italic
- Tokens/sec speed display during and after generation
- System prompt support
- Parameter presets: Default, Creative, Precise, Code, Roleplay
- Adjustable temperature, top_p, top_k, repeat penalty, max tokens
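Token-by-token streaming from an OpenAI-compatible endpoint arrives as `data:` lines carrying JSON deltas. A minimal parser sketch, assuming the standard chat-completions streaming format (this is not LlamaLink's actual code):

```python
# Sketch: extract the text delta from one OpenAI-style streaming line.
# The payload shape assumed here is the standard chat-completions delta.
import json

def parse_sse_line(line):
    """Return the text delta from a streaming line, or None."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":  # end-of-stream sentinel
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

sample = 'data: {"choices":[{"delta":{"content":"Hello"}}]}'
print(parse_sse_line(sample))  # -> Hello
```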
## Chat History
- Auto-saves conversations locally
- Load, export (Markdown / JSON / Text), and delete past chats
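A Markdown export of a saved chat can be sketched like this; the role/content message format is an assumption for illustration, not LlamaLink's storage schema:

```python
# Illustrative sketch of exporting a chat to Markdown. The message dicts
# (role/content) are an assumed shape, not the app's actual schema.

def export_markdown(title, messages):
    lines = [f"# {title}", ""]
    for msg in messages:
        lines.append(f"**{msg['role'].capitalize()}:**")
        lines.append("")
        lines.append(msg["content"])
        lines.append("")
    return "\n".join(lines)

chat = [{"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"}]
print(export_markdown("Demo chat", chat))
```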
## Design
- Catppuccin Mocha dark theme throughout
- Responsive split-panel layout
- Window position and all settings persist between sessions
Download LlamaLink.exe from Releases and run it. No installation required.
```
git clone https://github.com/SysAdminDoc/LlamaLink.git
cd LlamaLink
python llamalink.py
```

Dependencies (PyQt6, requests) are auto-installed on first run.
1. Download a model - Go to the "Download Models" tab, search for a model (e.g. `llama`, `qwen`, `mistral`), pick a quant, and download it
2. Set server path - Browse to your `llama-server.exe` (auto-detected if on PATH)
3. Select model - Your downloaded model appears automatically in the dropdown
4. Start server - Click "Start Server" and wait for the "Running" indicator
5. Chat - Switch to the Chat tab and start talking
Uncheck "Launch server", enter the URL (e.g. http://127.0.0.1:8080), and click Connect. Works with any OpenAI-compatible API endpoint.
- llama.cpp - Download from llama.cpp releases
- Python 3.8+ (if running from source)
- NVIDIA GPU recommended (auto-detected, CPU-only works too)
Public models work without authentication. For gated/private models, set the HF_TOKEN environment variable:
```
set HF_TOKEN=hf_your_token_here
python llamalink.py
```

To build a standalone executable:

```
pip install pyinstaller
pyinstaller llamalink.spec
```

The executable will be in `dist/LlamaLink.exe`.
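For the gated-model downloads mentioned above, the `HF_TOKEN` variable is typically passed as a bearer token on Hugging Face requests. A minimal sketch (the helper is hypothetical; the header format follows the Hugging Face Hub convention):

```python
# Sketch: build an Authorization header from HF_TOKEN when it is set.
# The helper name is hypothetical; the Bearer header format follows the
# Hugging Face Hub convention.
import os

def hf_headers():
    token = os.environ.get("HF_TOKEN")
    return {"Authorization": f"Bearer {token}"} if token else {}

os.environ["HF_TOKEN"] = "hf_example"  # demo only
print(hf_headers())  # -> {'Authorization': 'Bearer hf_example'}
```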
MIT