
koboldcpp-1.37.1

  • NEW: KoboldCpp now comes with an embedded Horde Worker, which lets anyone share their ggml models with the AI Horde without downloading additional dependencies. --hordeconfig now accepts 5 parameters: [hordemodelname] [hordegenlength] [hordemaxctx] [hordeapikey] [hordeworkername]. Filling in all 5 will start a Horde worker for you that serves Horde requests automatically in the background (see the launch example after this list). For the previous behavior, exclude the last 2 parameters and continue using your own Horde worker (e.g. HaidraScribe/KAIHordeBridge). This feature can also be enabled via the GUI.
  • Added support for LLAMA2 70B models. This should work automatically; GQA will be set to 8 if a 70B model is detected.
  • Fixed a bug with mirostat v2 that was causing overly deterministic results. Please try it again. (Credit: @ycros)
  • Added additional information to /api/extra/perf about the last generation, including the stopping reason and generated token counts (see the query example after this list).
  • Exposed the --tensor_split parameter, which works exactly like it does upstream. CUDA only.
  • Added Kepler as a CUDA target on henky's suggestion. I can't guarantee it will work since I don't have a K80 to test with, but it might.
  • Retained support for --blasbatchsize 1024 after it was removed upstream. Scratch & KV buffer sizes will be larger when using this.
  • Minor bugfixes, pulled other upstream fixes and optimizations, and updated Kobold Lite (chat mode improvements).
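
For reference, here is a rough sketch of a launch command using the new embedded Horde worker. The model filename, Horde model name, API key and worker name are placeholders, not real values:

```
# Sketch only: the model file, Horde model name, API key and worker name are placeholders.
# Providing all 5 --hordeconfig values starts the embedded Horde worker in the background.
koboldcpp.exe --model llama2-70b.ggmlv3.q4_0.bin --hordeconfig MyModel 512 2048 YOUR_HORDE_API_KEY MyWorkerName

# Providing only the first 3 values keeps the previous behavior (no embedded worker),
# so you can continue using your own bridge such as HaidraScribe or KAIHordeBridge.
koboldcpp.exe --model llama2-70b.ggmlv3.q4_0.bin --hordeconfig MyModel 512 2048
```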

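To inspect the new per-generation info, you can poll the endpoint after a generation completes; this is just an illustrative curl call against a locally running instance on the default port:

```
# Illustrative only: query the perf endpoint on a local instance (default port 5001).
# Per the notes above, the response now includes the last generation's
# stopping reason and generated token counts.
curl http://localhost:5001/api/extra/perf
```
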
Hotfix 1.37.1

  • Fixed CLBlast to work correctly with LLAMA2 70B
  • The embedded Horde worker now sends a Client-Agent header in addition to the Bridge Agent and User Agent
  • Changed rms_norm_eps to 5e-6 for better results with both LLAMA 1 and LLAMA 2
  • Fixed some streaming bugs in Lite

To use, download and run koboldcpp.exe, which is a single-file pyinstaller build.
If you don't need CUDA, you can use koboldcpp_nocuda.exe, which is much smaller.

Run it from the command line with the desired launch parameters (see --help), or manually select the model in the GUI.
Once the model is loaded, you can connect with your browser (or use the full KoboldAI client) at:
http://localhost:5001
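
If you prefer to test the API directly, a minimal generation request looks roughly like the sketch below; the prompt and max_length values are just placeholders sent to the KoboldAI-compatible endpoint on the default port:

```
# Minimal sketch: send a generation request to a locally running instance.
# "prompt" and "max_length" are placeholder values, not required defaults.
curl -X POST http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Once upon a time", "max_length": 50}'
```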

For more information, be sure to run the program from command line with the --help flag.