An opinionated CLI for managing local LLM inference. Handles vLLM (systemd) and NIM (Docker) model switching, live health checks, and registry searches.
- Two provider types: vllm (systemd service + env-file symlink) and nim (Docker, TensorRT-LLM)
- Interactive TUI: fuzzy model picker and multi-step add wizard (bubbletea)
- Atomic symlink swap: zero-gap model.env rotation
- Registry search: HuggingFace and NGC API
- Validation: quantization mismatch, GPU memory, served-model-name alias checks
- Privilege escalation: prompts for sudo if not running as root (like systemctl)
- State tracking: persists active model, provider, and container ID across reboots
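
As a rough illustration of the validation checks in the list above, the Go sketch below flags a very high gpu_memory_utilization, a missing served-model-name alias, and a quantization setting that does not match the model ID. The names ServeConfig and Validate are hypothetical and are not taken from marlin's source. The 0.95 threshold and the gn100 alias come from the examples in this README, while the AWQ name-matching rule is only an assumed heuristic.

```go
package main

import (
	"fmt"
	"strings"
)

// ServeConfig mirrors the [serve] keys used in this README.
// The struct itself is hypothetical, not marlin's actual schema type.
type ServeConfig struct {
	Quantization         string
	GPUMemoryUtilization float64
	ServedModelName      []string
}

// Validate returns human-readable warnings for one model config.
func Validate(modelID string, c ServeConfig, alias string) []string {
	var warns []string

	// GPU memory: warn when utilization exceeds 0.95.
	if c.GPUMemoryUtilization > 0.95 {
		warns = append(warns, fmt.Sprintf(
			"serve.gpu_memory_utilization %.3f is very high (>0.95)",
			c.GPUMemoryUtilization))
	}

	// Alias: the alias clients expect must be one of the served names.
	found := false
	for _, name := range c.ServedModelName {
		if name == alias {
			found = true
			break
		}
	}
	if !found {
		warns = append(warns, fmt.Sprintf(
			"serve.served_model_name does not include alias %q", alias))
	}

	// Quantization mismatch (assumed heuristic): an "AWQ" model ID
	// should be paired with an awq* quantization setting.
	if strings.Contains(strings.ToLower(modelID), "awq") &&
		!strings.HasPrefix(c.Quantization, "awq") {
		warns = append(warns, fmt.Sprintf(
			"model ID suggests AWQ but serve.quantization is %q", c.Quantization))
	}
	return warns
}

func main() {
	cfg := ServeConfig{
		Quantization:         "awq_marlin",
		GPUMemoryUtilization: 0.97,
		ServedModelName:      []string{"gn100", "qwen25-72b"},
	}
	for _, w := range Validate("Qwen/Qwen2.5-72B-Instruct-AWQ", cfg, "gn100") {
		fmt.Println("[warn]", w)
	}
}
```

Returning warnings as a slice rather than failing on the first one lets a caller print every problem in a single pass.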
# .deb (Ubuntu/Debian)
curl -LO https://github.com/DavidXArnold/marlin/releases/latest/download/marlin_linux_arm64.deb
sudo dpkg -i marlin_linux_arm64.deb
# .rpm (RHEL/Fedora)
curl -LO https://github.com/DavidXArnold/marlin/releases/latest/download/marlin_linux_arm64.rpm
sudo rpm -i marlin_linux_arm64.rpm

# From source
git clone https://github.com/DavidXArnold/marlin.git
cd marlin
make install

Copy the example config and edit for your environment:
sudo cp configs/marlin.toml.example /etc/marlin/config.toml
sudo $EDITOR /etc/marlin/config.toml

Key paths (all overridable in config):
| Setting | Default | Purpose |
|---|---|---|
| paths.models_dir | /etc/marlin/models | TOML model configs and rendered .env files |
| paths.active_symlink | /etc/marlin/model.env | Symlink pointing at the active model's env file |
| paths.secrets_env | /etc/marlin/secrets.env | HF_TOKEN and NGC_API_KEY |
| paths.state_file | /var/lib/marlin/state.toml | Active model/provider state |
| paths.nim_cache | /var/cache/nim | Host path mounted into NIM containers |
| server.alias | gn100 | Served-model-name alias expected by clients |

Secrets file format (/etc/marlin/secrets.env):
HF_TOKEN=hf_...
NGC_API_KEY=nvapi-...
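
To make the KEY=VALUE format above concrete, here is a minimal Go sketch of a dotenv-style loader. ParseDotenv is a hypothetical name, and the handling of comments, blank lines, and quoted values is an assumption about what such a loader typically accepts, not a description of marlin's secrets package.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// ParseDotenv reads KEY=VALUE lines, skipping blank lines and # comments.
// Values wrapped in matching single or double quotes are unquoted.
func ParseDotenv(content string) (map[string]string, error) {
	out := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	lineNo := 0
	for sc.Scan() {
		lineNo++
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, val, ok := strings.Cut(line, "=")
		key = strings.TrimSpace(key)
		if !ok || key == "" {
			return nil, fmt.Errorf("line %d: expected KEY=VALUE", lineNo)
		}
		val = strings.TrimSpace(val)
		if len(val) >= 2 && (val[0] == '"' || val[0] == '\'') && val[len(val)-1] == val[0] {
			val = val[1 : len(val)-1]
		}
		out[key] = val
	}
	return out, sc.Err()
}

func main() {
	// Placeholder values only; real tokens belong in the secrets file.
	env, err := ParseDotenv("# tokens\nHF_TOKEN=hf_example\nNGC_API_KEY=\"nvapi-example\"\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Report only whether each key is set, so secret values are never printed.
	for _, k := range []string{"HF_TOKEN", "NGC_API_KEY"} {
		_, ok := env[k]
		fmt.Printf("%s set: %v\n", k, ok)
	}
}
```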
SLUG              TYPE  STATUS    MODEL ID
----              ----  ------    --------
qwen25-72b-awq    vllm  working   Qwen/Qwen2.5-72B-Instruct-AWQ ◀ active
llama-3.1-8b-nim  nim   untested  nvcr.io/nim/meta/llama-3.1-8b-instruct:latest

Switch the active inference model. Presents an interactive fuzzy picker when no
argument is given. Prompts for sudo if not already root, then:
- Validates the target model config
- Writes the rendered .env file to models_dir
- Atomically replaces the active_symlink
- Restarts the vLLM systemd unit (or stops the old NIM container and starts the new one)

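
The atomic replacement step above is commonly implemented by creating a temporary symlink next to the real one and renaming it into place, since rename replaces its destination atomically on POSIX filesystems. The Go sketch below shows that technique under stated assumptions: SwapSymlink, demo, and the .tmp suffix are hypothetical, and marlin's own implementation may differ.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// SwapSymlink points link at target with no window in which link is missing.
// The temporary link is created in the same directory so the rename stays
// on one filesystem.
func SwapSymlink(target, link string) error {
	tmp := link + ".tmp"
	_ = os.Remove(tmp) // clear a stale temp link left by an earlier crash
	if err := os.Symlink(target, tmp); err != nil {
		return err
	}
	if err := os.Rename(tmp, link); err != nil {
		os.Remove(tmp)
		return err
	}
	return nil
}

// demo swaps a link between two env files in a scratch directory and
// returns the base name of the file the link finally points at.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "swap-demo")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	a := filepath.Join(dir, "a.env")
	b := filepath.Join(dir, "b.env")
	for _, p := range []string{a, b} {
		if err := os.WriteFile(p, []byte("MODEL=example\n"), 0o600); err != nil {
			return "", err
		}
	}

	link := filepath.Join(dir, "model.env")
	if err := SwapSymlink(a, link); err != nil { // first activation
		return "", err
	}
	if err := SwapSymlink(b, link); err != nil { // switch to another model
		return "", err
	}
	got, err := os.Readlink(link)
	if err != nil {
		return "", err
	}
	return filepath.Base(got), nil
}

func main() {
	name, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("model.env ->", name)
}
```

Removing the old link and then creating a new one would leave a brief gap in which a restarting service finds no env file. The rename avoids that gap.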
marlin switch qwen25-72b-awq
marlin switch # interactive picker

Interactive wizard for creating a new model config. Steps:
provider type → model ID or NIM image → slug → quantization → GPU memory → served names → confirm.
Writes a .toml file to paths.models_dir.
marlin add
marlin add Qwen/Qwen2.5-72B-Instruct-AWQ

Run validation checks without switching.
marlin validate qwen25-72b-awq
# [warn] serve.gpu_memory_utilization 0.970 is very high (>0.95)

active model : qwen25-72b-awq
provider     : vllm
api health   : ready at http://localhost:8000/v1

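
A health probe like the one behind the "api health" line above can be sketched in Go as a GET against the models endpoint of an OpenAI-compatible server. This is an assumption-laden sketch: Ready and demo are hypothetical names, and the README does not state which endpoint marlin actually queries. The demo uses a local stub server so it runs without a real inference backend.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Ready reports whether GET <baseURL>/models answers with HTTP 200.
// baseURL is expected to end in /v1, for example http://localhost:8000/v1.
func Ready(baseURL string, timeout time.Duration) bool {
	client := &http.Client{Timeout: timeout}
	resp, err := client.Get(baseURL + "/models")
	if err != nil {
		return false // connection refused, timeout, DNS failure, ...
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

// demo probes a stub server while it is running and again after it stops.
func demo() (up, down bool) {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/models", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"object":"list","data":[]}`)
	})
	srv := httptest.NewServer(mux)
	up = Ready(srv.URL+"/v1", 2*time.Second)
	srv.Close()
	down = Ready(srv.URL+"/v1", 2*time.Second)
	return up, down
}

func main() {
	up, down := demo()
	fmt.Println("ready while stub server is running:", up)
	fmt.Println("ready after stub server is stopped:", down)
}
```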
Stream inference service logs via journalctl (vLLM) or docker logs (NIM).
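
The split between journalctl and docker logs described above could look roughly like the Go sketch below, which builds the argument vector for each provider type. LogsArgv is a hypothetical helper, and the unit and container names in the example are placeholders. The flags themselves (-u, -n, and -f for journalctl; --tail and -f for docker logs) are standard options of those tools.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// LogsArgv builds the command line for reading logs of the active provider.
// name is a systemd unit for "vllm" and a container name or ID for "nim".
func LogsArgv(provider, name string, follow bool, lines int) ([]string, error) {
	n := strconv.Itoa(lines)
	switch provider {
	case "vllm":
		argv := []string{"journalctl", "-u", name, "-n", n}
		if follow {
			argv = append(argv, "-f")
		}
		return argv, nil
	case "nim":
		argv := []string{"docker", "logs", "--tail", n}
		if follow {
			argv = append(argv, "-f")
		}
		return append(argv, name), nil
	default:
		return nil, fmt.Errorf("unknown provider type %q", provider)
	}
}

func main() {
	// "vllm.service" and "nim-container" are placeholder names.
	examples := []struct{ provider, name string }{
		{"vllm", "vllm.service"},
		{"nim", "nim-container"},
	}
	for _, ex := range examples {
		argv, err := LogsArgv(ex.provider, ex.name, true, 200)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(strings.Join(argv, " "))
	}
}
```

The returned slice can be passed to exec.Command(argv[0], argv[1:]...) with stdout and stderr attached to the terminal.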
marlin logs -f
marlin logs --lines 200

Search HuggingFace and NGC for models.
marlin search "Qwen 72B"
marlin search --registry ngc llama

Each model is a TOML file in paths.models_dir. Example for a vLLM model:
[model]
id = "Qwen/Qwen2.5-72B-Instruct-AWQ"
type = "vllm"
status = "working"
notes = "Best for tool-calling on GN100"
[serve]
quantization = "awq_marlin"
gpu_memory_utilization = 0.90
max_model_len = 131072
served_model_name = ["gn100", "qwen25-72b"]
tool_call_parser = "hermes"

For a NIM model:
[model]
image = "nvcr.io/nim/meta/llama-3.1-8b-instruct:latest"
type = "nim"
status = "untested"

make test # run tests
make coverage # coverage report + gate (85%)
make coverage-html # open HTML report
make lint # golangci-lint
make check # lint + coverage (CI target)
make build # compile to bin/marlin

Integration tests that require a running server are tagged Integration and skipped by default:
MARLIN_TEST_HOST=localhost:8000 make integration

# Requires a running inference server
MARLIN_TEST_MODEL=qwen25-72b-awq make e2e

cmd/            Cobra commands (switch, add, list, validate, status, logs, search)
internal/
  config/       Global config + per-model TOML schema
  provider/     Provider interface + VLLMProvider + NIMProvider
  service/      Systemd wrapper (IsActive, Restart, Stop, Logs)
  state/        Persistent state (active model, provider, container ID)
  ui/           bubbletea TUI (fuzzy picker, add wizard, confirm)
  validate/     Model config validation (quantization, GPU mem, aliases)
  registry/     HuggingFace + NGC registry clients
  secrets/      Dotenv secrets loader
  privilege/    Sudo re-exec escalation
  vllm/         OpenAI-compatible health + model list client
pkg/
  render/       Env file renderer (model config → KEY=VALUE)

GNU General Public License v3.0 — see LICENSE.
