Prompt Assistant

Mooshieblob1 edited this page Jun 19, 2026 · 1 revision

MooshieUI ships a local LLM, served from a GGUF model by a llama.cpp server, that helps you write prompts. It runs on your own machine, so prompts are not sent to any external service.

What it does

  • Enhance - rewrites and improves your existing prompt.
  • Compose - generates a prompt from a plain-language description. You choose a length (short, medium, or detailed) and whether to include artist tags.

How it runs

  • The assistant starts a local llama.cpp server on demand.
  • It is hardware-aware: it offloads layers to the GPU when there is room and falls back to CPU if ComfyUI is already using your VRAM.
  • An idle watchdog stops the server automatically when it has not been used for a while, freeing memory.
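The two behaviors above (GPU offload with a CPU fallback, and an idle watchdog) can be illustrated with a minimal Python sketch. This is not MooshieUI's actual implementation: the names `choose_gpu_layers` and `IdleWatchdog`, the per-layer size estimate, the VRAM reserve, and the timeout are all assumptions made for illustration.

```python
import time


def choose_gpu_layers(free_vram_mib, layer_size_mib, total_layers, reserve_mib=512):
    """Return how many model layers to offload to the GPU (hypothetical helper).

    A result of 0 means "run on CPU only", e.g. when ComfyUI already
    occupies most of the VRAM. reserve_mib is an assumed safety margin.
    """
    usable = free_vram_mib - reserve_mib
    if usable <= 0 or layer_size_mib <= 0:
        return 0
    return min(total_layers, usable // layer_size_mib)


class IdleWatchdog:
    """Tracks the last use of the server and reports when it has gone idle."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_used = clock()

    def touch(self):
        """Record activity; call this on every assistant request."""
        self._last_used = self._clock()

    def is_idle(self):
        """True once timeout_s seconds have passed since the last touch()."""
        return self._clock() - self._last_used >= self.timeout_s
```

In a real launcher, the layer count would typically be passed to the llama.cpp server through its `--n-gpu-layers` option, and a background loop would poll `is_idle()` and stop the server process when it returns true.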

Hosted / Docker notes

  • In GPU Docker builds you can point the assistant at a specific llama.cpp binary directory with the MOOSHIEUI_LLAMA_BIN_DIR environment variable.
  • On multi-GPU hosts you may want to pin the llama server to a specific device (for example by setting CUDA_VISIBLE_DEVICES) so it does not compete with ComfyUI.
  • The model used by the assistant is configured by the deployment. If no assistant model is configured, a fallback tag-generation model may load instead.
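As a rough illustration of the first two notes, the sketch below shows how a launcher might resolve the binary directory from MOOSHIEUI_LLAMA_BIN_DIR and build an environment that pins the server to one GPU. The function names, the default directory, and the assumption that the binary is named `llama-server` inside that directory are hypothetical and not taken from MooshieUI's source.

```python
from pathlib import Path


def resolve_llama_server(env, default_dir="/opt/llama.cpp/bin"):
    """Return the path of the llama.cpp server binary (hypothetical helper).

    Uses MOOSHIEUI_LLAMA_BIN_DIR when it is set and non-empty; otherwise
    falls back to default_dir, which is an assumed placeholder location.
    """
    bin_dir = env.get("MOOSHIEUI_LLAMA_BIN_DIR") or default_dir
    return Path(bin_dir) / "llama-server"


def server_env(base_env, gpu_index=None):
    """Copy base_env and optionally pin the child process to one CUDA device."""
    env = dict(base_env)  # never mutate the caller's environment
    if gpu_index is not None:
        env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env
```

A caller could pass `os.environ` as `env` and hand the result of `server_env(os.environ, gpu_index=1)` to `subprocess.Popen(..., env=...)`, so that only the llama server, and not ComfyUI, is restricted to the second GPU.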

See also

  • Prompting Guide for wildcards, presets, scheduling, and the rest of the prompt system.
