A sleek GUI frontend for llama.cpp
Search, download, and chat with local LLMs in one app.
## Model Management
- Browse and download GGUF models directly from HuggingFace
- Sort by downloads, likes, date, or trending
- View available quantizations (Q4_K_M, Q8_0, IQ3_S, etc.) with file sizes
- Download with progress bar, speed display, ETA, and resume support
- Recursive model folder scanning with automatic detection
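Quantization detection of this kind can be sketched as a filter over a repo's file listing. This is an illustrative assumption about how quant labels are pulled from GGUF filenames, not LlamaLink's actual implementation:

```python
# Sketch: pick GGUF quantization variants out of a HuggingFace repo file
# listing. Filenames and parsing logic here are illustrative assumptions.
import re

def list_quants(repo_files):
    """Return (quant_label, filename) pairs for GGUF files in a listing."""
    quants = []
    for name in repo_files:
        if not name.endswith(".gguf"):
            continue
        # Quant labels like Q4_K_M, Q8_0, IQ3_S typically appear in the name.
        m = re.search(r"(IQ|Q)\d\w*", name, re.IGNORECASE)
        quants.append((m.group(0).upper() if m else "unknown", name))
    return quants

files = ["model-Q4_K_M.gguf", "model-Q8_0.gguf", "README.md"]
print(list_quants(files))
```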
## Server Control
- Launch and manage llama-server with full parameter control
- Or connect to an already-running server (any OpenAI-compatible endpoint)
- Auto-detect llama-server from PATH and common install locations
- Context size, GPU layers, threads, flash attention, mlock toggles
- Embedded server log viewer
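A launcher along these lines assembles the llama-server command from the UI settings. Flag names follow llama.cpp's CLI (`-m`, `-c`, `-ngl`, `-t`, `--flash-attn`, `--mlock`) and may vary by llama.cpp version; the settings dict and helper are a sketch, not the app's code:

```python
# Illustrative sketch of building a llama-server command line from settings.
# Flag names follow llama.cpp's CLI and may differ between versions;
# the settings dict shape is an assumption.

def build_server_cmd(server_path, model_path, settings):
    cmd = [server_path, "-m", model_path,
           "-c", str(settings.get("ctx_size", 4096)),
           "-ngl", str(settings.get("gpu_layers", 0)),
           "-t", str(settings.get("threads", 4)),
           "--port", str(settings.get("port", 8080))]
    if settings.get("flash_attn"):
        cmd.append("--flash-attn")
    if settings.get("mlock"):
        cmd.append("--mlock")
    return cmd

cmd = build_server_cmd("llama-server", "model.gguf",
                       {"ctx_size": 8192, "gpu_layers": 35, "mlock": True})
print(" ".join(cmd))
```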
## Chat Interface
- Streaming responses with live token-by-token display
- Markdown rendering: code blocks, inline code, bold, italic
- Tokens/sec speed display during and after generation
- System prompt support
- Parameter presets: Default, Creative, Precise, Code, Roleplay
- Adjustable temperature, top_p, top_k, repeat penalty, max tokens
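Token-by-token streaming from an OpenAI-compatible endpoint arrives as `data:` lines carrying JSON deltas. A minimal parser sketch, assuming the standard chat-completions streaming format (this is not LlamaLink's actual code):

```python
# Sketch: extract the text delta from one OpenAI-style streaming line.
# The payload shape assumed here is the standard chat-completions delta.
import json

def parse_sse_line(line):
    """Return the text delta from a streaming line, or None."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":  # end-of-stream sentinel
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

sample = 'data: {"choices":[{"delta":{"content":"Hello"}}]}'
print(parse_sse_line(sample))  # -> Hello
```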
## Chat History
- Auto-saves conversations locally
- Load, export (Markdown / JSON / Text), and delete past chats
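A Markdown export of a saved chat can be sketched like this; the role/content message format is an assumption for illustration, not LlamaLink's storage schema:

```python
# Illustrative sketch of exporting a chat to Markdown. The message dicts
# (role/content) are an assumed shape, not the app's actual schema.

def export_markdown(title, messages):
    lines = [f"# {title}", ""]
    for msg in messages:
        lines.append(f"**{msg['role'].capitalize()}:**")
        lines.append("")
        lines.append(msg["content"])
        lines.append("")
    return "\n".join(lines)

chat = [{"role": "user", "content": "Hi"},
        {"role": "assistant", "content": "Hello!"}]
print(export_markdown("Demo chat", chat))
```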
## Design
- Catppuccin Mocha dark theme throughout
- Responsive split-panel layout
- Window position and all settings persist between sessions
Download LlamaLink.exe from Releases and run it. No installation required.
```
git clone https://github.com/SysAdminDoc/LlamaLink.git
cd LlamaLink
python llamalink.py
```

Dependencies (PyQt6, requests) are auto-installed on first run.
1. Download a model - Go to the "Download Models" tab, search for a model (e.g. `llama`, `qwen`, `mistral`), pick a quant, and download it
2. Set server path - Browse to your `llama-server.exe` (auto-detected if on PATH)
3. Select model - Your downloaded model appears automatically in the dropdown
4. Start server - Click "Start Server" and wait for the "Running" indicator
5. Chat - Switch to the Chat tab and start talking
Uncheck "Launch server", enter the URL (e.g. http://127.0.0.1:8080), and click Connect. Works with any OpenAI-compatible API endpoint.
- llama.cpp - Download from llama.cpp releases
- Python 3.8+ (if running from source)
- NVIDIA GPU recommended (auto-detected, CPU-only works too)
Public models work without authentication. For gated/private models, set the HF_TOKEN environment variable:
```
set HF_TOKEN=hf_your_token_here
python llamalink.py
```

To build a standalone executable:

```
pip install pyinstaller
pyinstaller llamalink.spec
```

The executable will be in `dist/LlamaLink.exe`.
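For the gated-model downloads mentioned above, the `HF_TOKEN` variable is typically passed as a bearer token on Hugging Face requests. A minimal sketch (the helper is hypothetical; the header format follows the Hugging Face Hub convention):

```python
# Sketch: build an Authorization header from HF_TOKEN when it is set.
# The helper name is hypothetical; the Bearer header format follows the
# Hugging Face Hub convention.
import os

def hf_headers():
    token = os.environ.get("HF_TOKEN")
    return {"Authorization": f"Bearer {token}"} if token else {}

os.environ["HF_TOKEN"] = "hf_example"  # demo only
print(hf_headers())  # -> {'Authorization': 'Bearer hf_example'}
```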
MIT