Skip to content

koboldcpp-1.115

Latest

Choose a tag to compare

@LostRuins LostRuins released this 12 Jun 13:44

koboldcpp-1.115

sad.mp4

(Base Image = Klein 9B, To Video = LTX2.3, Add Music = AceStep 1.5, all 100% done in Kobold)

  • NEW: Full support for the Anthropic /v1/messages API has been implemented, including multimodal and tool calling support.
  • Added an easy toggle for default reasoning effort in the GUI launcher, can also be set with --reasoningeffort. This uses --gendefaults under the hood to set API defaults.
  • Added support for the Gemma4 UV (12B) models and their vision/audio mmprojs.
  • Improvements for smartcache implementation for RNN models.
  • MTP and Gemma assistant models are now supported.
    • To enable MTP if it's built into the model, use --usemtp or the GUI launcher toggle. Otherwise, simply selecting the MTP drafter (e.g. gemma-4-26b-A4B-it-assistant-Q4_0.gguf as a --draftmodel along with the main model e.g. gemma-4-26B-A4B-it-UD-Q4_K_M.gguf is sufficient.
    • To see the drafting hitrate, use --debugmode. A good sample prompt is "Give me the first 100 integers".
  • Increased max prompt length for image gen. Increased launcher context size defaults to 16k.
  • Updates and various image gen fixes from @wbruna
  • Accept more SSL file extensions
  • Video generation reference image system overhauled.
    • You can now select videos to end on a specific frame in SDUI Img2Img, as well as specify both a start and end frame by uploaded 2 reference images in txt2img.
    • The reference image selector in SDUI has been overhauled and should work much better now.
    • Lower default VAE tiling threshold to 640, but you might want to lower it even more on low VRAM systems
    • Allow selecting up to 32 FPS if extended limits selected in settings.
  • NEW: Allow limiting VRAM for image and video generation with --sdvramlimit. This will guarantee that you can load any image gen model by splitting the compute graph.
  • If music gen was used without a --musicllm, fall back to a loaded text LLM.
  • Reordered CLI args to be more alphabetical
  • Retire the clip_quantize tool, since it can now be done with the regular gguf_quantize
  • Updated Kobold Lite, multiple fixes and improvements
    • Fixed a bug with incorrect multimodal images/audio order sent to backend
  • Merged fixes, new model support, and improvements from upstream

Download and run the koboldcpp.exe (Windows) or koboldcpp-linux-x64 (Linux), which is a one-file pyinstaller for NVIDIA GPU users.
If you have an older CPU or older NVIDIA GPU and koboldcpp does not work, try oldpc version instead (Cuda11 + AVX1).
If you don't have an NVIDIA GPU, or do not need CUDA, you can use the nocuda version which is smaller.
If you're using AMD, we recommend trying the Vulkan option in the nocuda build first, for best support. Alternatively, you can download our rolling ROCm binary here if you use Linux.
If you're on a modern MacOS (M-Series) you can use the koboldcpp-mac-arm64 MacOS binary.
Click here for .gguf conversion and quantization tools
Newer rolling experimental builds can be found here, these are auto-updated and may be unstable.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
and then once loaded, you can connect like this (or use the full koboldai client):
http://localhost:5001

For more information, be sure to run the program from command line with the --help flag. You can also refer to the readme and the wiki.