v0.3.2: Patch Fix
Overview
- Fix a CUDA kernel bug that prevented `desc_act` and `group_size` from being used together
- Improve the user experience of manual installation
- Improve the user experience of loading quantized models
- Add `perplexity_utils.py` to calculate PPL so that the results can be compared fairly with other libraries
- Remove the `save_dir` argument from `from_quantized`; only the `model_name_or_path` argument is now supported in this method
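
For callers that still pass `save_dir`, the rename can be handled before the call reaches `from_quantized`. The helper below is a minimal sketch and not part of the library: the function name `migrate_from_quantized_kwargs`, the example path, and the `device` key are all assumptions made for illustration.

```python
def migrate_from_quantized_kwargs(kwargs):
    """Return a copy of kwargs with a legacy save_dir mapped to model_name_or_path.

    Hypothetical helper for illustration only; it is not shipped with the library.
    """
    updated = dict(kwargs)  # never mutate the caller's dict
    if "save_dir" in updated:
        legacy = updated.pop("save_dir")
        current = updated.get("model_name_or_path")
        # Refuse to guess when both names are given with different values.
        if current is not None and current != legacy:
            raise ValueError("save_dir and model_name_or_path disagree")
        updated["model_name_or_path"] = legacy
    return updated


# Example: keyword arguments written against the older signature.
old_call = {"save_dir": "./opt-125m-4bit", "device": "cuda:0"}
new_call = migrate_from_quantized_kwargs(old_call)
print(new_call)
```

The returned dictionary no longer contains `save_dir`, so it can be unpacked into a call that only accepts `model_name_or_path`.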
Full Change Log
What's Changed
- Fix cuda bug by @qwopqwop200 in #202
- Fix `revision` and other huggingface_hub kwargs in .from_quantized() by @TheBloke in #205
- Change the install script so it attempts to build the CUDA extension in all cases by @TheBloke in #206
- Add a central version number by @TheBloke in #207
- Add Safetensors metadata saving, with some values saved to each .safetensor file by @TheBloke in #208
- [FEATURE] Implement perplexity metric to compare against llama.cpp by @casperbh96 in #166
- Fix error raised when CUDA kernels are not installed by @PanQiWei in #209
- Fix build on non-CUDA machines after #206 by @casperbh96 in #212
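
For readers unfamiliar with the perplexity metric referenced in #166, the standard definition is the exponential of the mean negative log-likelihood per token. The snippet below is a minimal sketch of that definition only; it is not the code in `perplexity_utils.py`, and the function name `perplexity` and its input format (a list of natural-log token probabilities) are assumptions.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    Illustrative sketch of the textbook formula, not the library's implementation.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood per token.
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)


# Four tokens, each predicted with probability 0.25.
uniform = [math.log(0.25)] * 4
print(perplexity(uniform))
```

A model that assigns every token probability 1/k has perplexity k, which is why lower values indicate better predictions.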
New Contributors
- @casperbh96 made their first contribution in #166
Full Changelog: v0.3.0...v0.3.2