koboldcpp-1.116

NEW: Added Krea 2 Turbo support (Recommended). This is an extremely powerful image generation model with incredible knowledge.
- Grab and load the Krea 2 Turbo template containing all necessary files here.
- Recommended settings for Krea 2 Turbo: 8 steps at 1.0 guidance cfg. Prompt with descriptive natural language.
NEW: Added Ideogram 4 support.
- Grab and load the Ideogram 4 template containing all necessary files here.
- Note: Ideogram4 prompts should be in JSON format. Read how to prompt it here.
- For newbies, Krea2 is recommended instead. Thanks @wbruna for the sync and testing.
NEW: Added Boogu Edit support. A simple but good image editing model.
- Grab and load the Boogu Edit template containing all necessary files here.
NEW: The built-in llama.cpp Web UI has been updated and now supports MCP servers, thanks to @henk717
Changed to attempt to use llama.cpp jinja tool call response parser first. This will allow better out-of-box support for new tool calling models. Please report any issues encountered.
Fix: MTP support has been fixed, and speeds should now be good across the board for both Qwen and Gemma assistant MTP.
Added a failsafe target for MacOS builds that does not use Metal, only Accelerate. This can be enabled with --failsafe for MacOS builds.
Fixed a regression with huggingface xet file download speed, this was needed due to changes on the HF side.
Added an option to save TTS outputs as MP3. This can be done from the API as well, by setting response_format to mp3 as per OpenAI docs.
Added an advanced CLI-only flag --allow-config-onready, which re-enables --onready commands from config files. This flag can only be manually triggered directly from the terminal and itself cannot be saved. This allows power users to run --onready commands when swapping .kcpps configs via API or the Web UI.
Fixed drafting during save and reload states, allow drafting when mmproj is loaded (but avoid using them together). Default draft amount set to 4.
Increase image gen prompt length limit to 3000 characters.
Cap n_outputs for MTP (thanks @Pento95)
Fixed a regression with Qwen VL mmproj vision quality.
Load image generation weights eagerly
Added Jinja toggle to the Quick Tab
Added experimental support for uploading reference audio for LTX2.3 (Audio-To-Video)
Updated Kobold Lite, multiple fixes and improvements
Merged fixes, new model support, and improvements from upstream

Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try oldpc version instead (Cuda11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can download our rolling ROCm binary here if you use Linux.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools
Newer rolling experimental builds can be found here, these are auto-updated and may be unstable.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
and then once loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

koboldcpp-1.116

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

koboldcpp-1.116

Contributors

Uh oh!