v0.3.2: Patch Fix

@PanQiWei PanQiWei released this 26 Jul 11:25
· 247 commits to main since this release

Overview

  • Fix a CUDA kernel bug that prevented desc_act and group_size from being used together
  • Improve the user experience of manual installation
  • Improve the user experience of loading quantized models
  • Add perplexity_utils.py to calculate perplexity (PPL) so that results can be compared fairly with other libraries
  • Remove the save_dir argument from the from_quantized method; only the model_name_or_path argument is now supported
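
For readers unfamiliar with the PPL metric mentioned above, the following is a minimal sketch of the standard definition: perplexity is the exponential of the mean negative log-likelihood per token. It is not the code in perplexity_utils.py; the function name `perplexity` and the toy input are assumptions made purely for illustration.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities.

    Hypothetical helper for illustration; not part of the library.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy input: a model that assigns probability 0.25 to each of four tokens.
uniform_log_probs = [math.log(0.25)] * 4
print(perplexity(uniform_log_probs))  # approximately 4, up to floating-point rounding
```

Because the value depends on tokenization, context length, and how the evaluation text is windowed, two libraries are only comparable when those settings match, which is presumably why a shared utility helps.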

Full Change Log

What's Changed

  • Fix cuda bug by @qwopqwop200 in #202
  • Fix revision and other huggingface_hub kwargs in .from_quantized() by @TheBloke in #205
  • Change the install script so it attempts to build the CUDA extension in all cases by @TheBloke in #206
  • Add a central version number by @TheBloke in #207
  • Add Safetensors metadata saving, with some values saved to each .safetensor file by @TheBloke in #208
  • [FEATURE] Implement perplexity metric to compare against llama.cpp by @casperbh96 in #166
  • Fix error raised when CUDA kernels are not installed by @PanQiWei in #209
  • Fix build on non-CUDA machines after #206 by @casperbh96 in #212
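
As background for the Safetensors metadata change (#208), the sketch below shows how the optional `__metadata__` string-to-string map can be read from a safetensors header using only the standard library. It relies on the published file layout (an 8-byte little-endian header length followed by a JSON header). The helper names and the `example_key` entry are hypothetical; the keys the library actually writes are not listed in these notes.

```python
import json
import os
import struct
import tempfile


def read_safetensors_metadata(path):
    """Return the `__metadata__` map from a safetensors header, or {} if absent."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the JSON header length as an unsigned little-endian integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    return header.get("__metadata__", {})


def write_header_only_file(path, metadata):
    """Write a header-only file (no tensors), used here just for the demonstration."""
    header = json.dumps({"__metadata__": metadata}).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header)))
        f.write(header)


with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.safetensors")
    # "example_key" is a placeholder, not a key the library is known to write.
    write_header_only_file(demo_path, {"example_key": "example_value"})
    print(read_safetensors_metadata(demo_path))
```

In practice the safetensors package exposes the same information through its own loading API; the manual parse is shown only to make the file layout concrete.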

New Contributors

  • @casperbh96 made their first contribution in #166

Full Changelog: v0.3.0...v0.3.2