koboldcpp-1.116

@LostRuins LostRuins released this 27 Jun 13:56

  • NEW: Added Krea 2 Turbo support (Recommended). This is an extremely powerful image generation model with incredible knowledge.
  • NEW: Added Ideogram 4 support.
  • NEW: Added Boogu Edit support. A simple but good image editing model.
  • NEW: The built-in llama.cpp Web UI has been updated and now supports MCP servers, thanks to @henk717
  • Changed to attempt to use llama.cpp jinja tool call response parser first. This will allow better out-of-box support for new tool calling models. Please report any issues encountered.
  • Fix: MTP support now works correctly, and speeds should be good across the board for both Qwen and Gemma assistant MTP.
  • Added a failsafe target for MacOS builds that does not use Metal, only Accelerate. This can be enabled with --failsafe for MacOS builds.
  • Fixed a regression in Hugging Face Xet file download speed. The fix was needed because of changes on the HF side.
  • Added an option to save TTS outputs as MP3. This can be done from the API as well, by setting response_format to mp3 as per OpenAI docs.
  • Added an advanced CLI-only flag --allow-config-onready, which re-enables --onready commands from config files. This flag can only be manually triggered directly from the terminal and itself cannot be saved. This allows power users to run --onready commands when swapping .kcpps configs via API or the Web UI.
  • Fixed drafting during save and reload states, and allowed drafting when an mmproj is loaded (but avoid using them together). The default draft amount is now 4.
  • Increase image gen prompt length limit to 3000 characters.
  • Cap n_outputs for MTP (thanks @Pento95)
  • Fixed a regression with Qwen VL mmproj vision quality.
  • Load image generation weights eagerly
  • Added Jinja toggle to the Quick Tab
  • Added experimental support for uploading reference audio for LTX2.3 (Audio-To-Video)
  • Updated Kobold Lite, multiple fixes and improvements
  • Merged fixes, new model support, and improvements from upstream
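
The MP3 option for TTS mentioned above can be exercised from the API. Below is a minimal sketch, assuming KoboldCpp is running at the default http://localhost:5001 and serves the OpenAI-style /v1/audio/speech route. The `model` and `voice` values are placeholders for illustration, not names confirmed by these notes, and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # default local address given in these notes


def build_speech_request(text, voice="default", base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style TTS request asking for MP3."""
    payload = {
        "model": "tts",            # placeholder value, assumption
        "input": text,
        "voice": voice,            # placeholder value, assumption
        "response_format": "mp3",  # the setting described in the changelog
    }
    return urllib.request.Request(
        url=base_url + "/v1/audio/speech",  # assumed OpenAI-compatible route
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_speech(text, out_path="speech.mp3"):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_speech_request(text)) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Calling `save_speech("Hello from KoboldCpp")` with a TTS model loaded should write the audio to `speech.mp3`; the request builder is kept separate so the payload can be inspected without a running server.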

Download and run koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), each of which is a one-file PyInstaller build for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try the oldpc version instead (Cuda11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can download our rolling ROCm binary here if you use Linux.
If you're on a modern Mac (M-Series), you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools
Newer rolling experimental builds can be found here. These are auto-updated and may be unstable.
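
The download guidance above can be summarized as a small decision function. This is only an illustrative sketch: the function name and its parameters are hypothetical, and for the oldpc and nocuda variants it returns the variant label used in these notes rather than an exact file name.

```python
def choose_build(os_name, has_nvidia_gpu, old_hardware=False):
    """Return the build suggested by the release notes for a given setup.

    os_name: "windows", "linux", or "macos" (M-Series Macs only).
    has_nvidia_gpu: False for AMD or CPU-only machines, or if CUDA is not needed.
    old_hardware: True for an older CPU or older NVIDIA GPU.
    """
    os_name = os_name.lower()
    if os_name == "macos":
        return "koboldcpp-mac-arm64"
    if os_name not in ("windows", "linux"):
        raise ValueError("unsupported OS: " + os_name)
    if not has_nvidia_gpu:
        # AMD users are advised to try the Vulkan option in this build first.
        return "nocuda"
    if old_hardware:
        return "oldpc"  # Cuda11 + AVX1 variant
    return "koboldcpp.exe" if os_name == "windows" else "koboldcpp-linux-x64"
```

For example, `choose_build("linux", has_nvidia_gpu=True)` returns `koboldcpp-linux-x64`. The rolling ROCm binary for AMD on Linux is a separate download and is not covered by this sketch.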

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect at the following address (or use the full KoboldAI client):
http://localhost:5001
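
Besides opening that address in a browser, the same server can be called from a script. Here is a minimal sketch, assuming the KoboldAI-style /api/v1/generate route and a response shaped like `{"results": [{"text": ...}]}`; the helper names are hypothetical and the `max_length` value is an arbitrary choice for illustration.

```python
import json
import urllib.request


def build_generate_request(prompt, max_length=80,
                           base_url="http://localhost:5001"):
    """Build (but do not send) a text generation request."""
    payload = {"prompt": prompt, "max_length": max_length}
    return urllib.request.Request(
        url=base_url + "/api/v1/generate",  # assumed KoboldAI-style route
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # Assumed response shape: {"results": [{"text": "..."}]}
    return body["results"][0]["text"]
```

With a model loaded, `print(generate("Once upon a time"))` should print a continuation of the prompt.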

For more information, be sure to run the program from the command line with the --help flag. You can also refer to the readme and the wiki.