v0.3.2: Patch Fix

PanQiWei released this 26 Jul 11:25

Overview

Fix a CUDA kernel bug that prevented desc_act and group_size from being used together
Improve the user experience of manual installation
Improve the user experience of loading quantized models
Add perplexity_utils.py to calculate perplexity (PPL) so that results can be compared fairly with other libraries
Remove the save_dir argument from the from_quantized method; only the model_name_or_path argument is now supported
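
For callers that still pass save_dir, the rename described in the last item can be handled with a small shim. The sketch below is illustrative only: migrate_from_quantized_kwargs is a hypothetical helper that is not part of AutoGPTQ, and the path and device values are placeholders.

```python
from typing import Any, Dict


def migrate_from_quantized_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper (not part of AutoGPTQ): rename a legacy
    `save_dir` keyword to `model_name_or_path`."""
    migrated = dict(kwargs)  # shallow copy so the caller's dict is untouched
    if "save_dir" in migrated:
        legacy_path = migrated.pop("save_dir")
        current = migrated.get("model_name_or_path")
        # Refuse to guess when both keywords are given and disagree.
        if current is not None and current != legacy_path:
            raise ValueError(
                "both save_dir and model_name_or_path were given with different values"
            )
        migrated["model_name_or_path"] = legacy_path
    return migrated


# Placeholder values, for illustration only.
old_style = {"save_dir": "./my-quantized-model", "device": "cuda:0"}
new_style = migrate_from_quantized_kwargs(old_style)
print(new_style)
```

The migrated dictionary could then be unpacked into the from_quantized call, assuming the remaining keywords are ones the installed version accepts.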

Full Change Log

What's Changed

Fix cuda bug by @qwopqwop200 in #202
Fix revision and other huggingface_hub kwargs in .from_quantized() by @TheBloke in #205
Change the install script so it attempts to build the CUDA extension in all cases by @TheBloke in #206
Add a central version number by @TheBloke in #207
Add Safetensors metadata saving, with some values saved to each .safetensor file by @TheBloke in #208
[FEATURE] Implement perplexity metric to compare against llama.cpp by @casperbh96 in #166
Fix error raised when CUDA kernels are not installed by @PanQiWei in #209
Fix build on non-CUDA machines after #206 by @casperbh96 in #212
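
As background for the perplexity metric added in #166, the standard definition of perplexity is the exponential of the mean negative log-likelihood per token. The sketch below shows only that definition in plain Python; it is not the code in perplexity_utils.py, and it assumes the per-token log-probabilities (natural log) have already been obtained from a model.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities:
    exp of the mean negative log-likelihood."""
    if not token_log_probs:
        raise ValueError("at least one token log-probability is required")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Toy input: every token is assigned probability 0.25, so the
# perplexity should come out close to 4.
uniform_over_four = [math.log(0.25)] * 8
print(perplexity(uniform_over_four))
```

Comparisons across libraries are likely to be meaningful only when the dataset, tokenization, and context window match, which is presumably why a shared utility was added.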

New Contributors

@casperbh96 made their first contribution in #166

Full Changelog: v0.3.0...v0.3.2
