Releases: bitsandbytes-foundation/bitsandbytes
Latest `main` wheel (pre-release)
This pre-release contains the latest development wheels for all supported platforms, rebuilt automatically on every commit to the `main` branch.
How to install:
Pick the correct command for your platform and run it in your terminal:
Linux (ARM/aarch64)
pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_aarch64.whl
Linux (x86_64)
pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-manylinux_2_24_x86_64.whl
Windows (x86_64)
pip install --force-reinstall https://github.com/bitsandbytes-foundation/bitsandbytes/releases/download/continuous-release_main/bitsandbytes-1.33.7.preview-py3-none-win_amd64.whl
Note:
These wheels are updated automatically with every commit to `main` and become available as soon as the python-package.yml workflow finishes.
The version number is replaced with 1.33.7-preview to keep the download link stable; this does not affect the installed version at all:
> pip install https://.../bitsandbytes-1.33.7-preview-py3-none-manylinux_2_24_x86_64.whl
Collecting bitsandbytes==1.33.7rc0
...
Successfully installed bitsandbytes-0.46.0.dev0
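To verify which development build actually got installed (the preview number in the wheel filename is only a stable-link placeholder), you can print the package version. A minimal check, assuming `bitsandbytes` imports cleanly in your environment:

```python
# Confirms the real installed version rather than the 1.33.7 filename placeholder.
import bitsandbytes

print(bitsandbytes.__version__)  # e.g. a dev version such as 0.46.0.dev0
```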
0.47.0
Highlights:
- FSDP2 compatibility for Params4bit (#1719)
- Bugfix for 4bit quantization with large block sizes (#1721)
- Further removal of previously deprecated code (#1669)
- Improved CPU coverage (#1628)
- Include NVIDIA Volta support in CUDA 12.8 and 12.9 builds (#1715)
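As a rough illustration of the 4-bit quantization path touched by #1721, here is a minimal round-trip sketch. It assumes a CUDA device is available; the tensor shape and `blocksize=256` are illustrative choices, not values taken from the release notes:

```python
import torch
import bitsandbytes.functional as F

# Quantize to NF4 with a larger-than-default block size, then dequantize.
x = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
q, state = F.quantize_4bit(x, blocksize=256, quant_type="nf4")
x_hat = F.dequantize_4bit(q, state)

# The reconstruction error should stay small relative to the tensor's scale.
print((x - x_hat).abs().mean().item())
```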
What's Changed
- Enable CPU/XPU native and ipex path by @jiqing-feng in #1628
- Fix CI regression by @matthewdouglas in #1666
- Add CPU + IPEX to nightly CI by @matthewdouglas in #1667
- Fix params4bit passing bnb quantized by @mklabunde in #1665
- Deprecation cleanup by @matthewdouglas in #1669
- CI workflow: bump torch 2.7.0 to 2.7.1 by @matthewdouglas in #1670
- Improvement for torch.compile support on Params4bit by @matthewdouglas in #1673
- Fixed a bug in test_fw_bit_quant testing on CPU by @Egor-Krivov in #1675
- doc fix signature for 8-bit optim by @ved1beta in #1660
- Apply clang-format rules by @matthewdouglas in #1678
- Add clang-format by @matthewdouglas in #1677
- HPU (Intel gaudi) support for bnb unit tests by @ckvermaAI in #1680
- CI: Setup HPU nightly tests by @matthewdouglas in #1681
- Update test_kbit_backprop unit test by @ckvermaAI in #1682
- Update README.md by @matthewdouglas in #1684
- Enable ROCm backend with custom ops integration by @pnunna93 in #1683
- Fix AdamW documentation by @agupta2304 in #1686
- Make minor improvements to optimizer.py by @agupta2304 in #1687
- Add CUDA 12.9 build by @matthewdouglas in #1689
- CI: Test with PyTorch 2.8.0 RC by @matthewdouglas in #1693
- Automatically call CMake as part of PEP 517 build by @mgorny in #1512
- fix log by @jiqing-feng in #1697
- [XPU] Add inference benchmark for XPU by @Egor-Krivov in #1696
- Add kernel registration for 8bit and 32bit optimizers by @Egor-Krivov in #1706
- Create FUNDING.yml by @matthewdouglas in #1714
- Add Volta support in cu128/cu129 builds by @matthewdouglas in #1715
- Fix Params4bit tensor subclass handling by @ved1beta in #1719
- [CUDA] Fixing quantization uint8 packing bug for NF4 and FP4 by @Mhmd-Hisham in #1721
New Contributors
- @mklabunde made their first contribution in #1665
- @agupta2304 made their first contribution in #1686
- @mgorny made their first contribution in #1512
- @Mhmd-Hisham made their first contribution in #1721
Full Changelog: 0.46.0...0.47.0
0.46.1
What's Changed
- Fix params4bit passing bnb quantized by @mklabunde in #1665
- Improvement for torch.compile support on Params4bit by @matthewdouglas in #1673
- doc fix signature for 8-bit optim by @ved1beta in #1660
- Fix AdamW documentation by @agupta2304 in #1686
- Make minor improvements to optimizer.py by @agupta2304 in #1687
- Add CUDA 12.9 build by @matthewdouglas in #1689
- Automatically call CMake as part of PEP 517 build by @mgorny in #1512
New Contributors
- @mklabunde made their first contribution in #1665
- @agupta2304 made their first contribution in #1686
- @mgorny made their first contribution in #1512
Full Changelog: 0.46.0...0.46.1
0.46.0: torch.compile() support; custom ops refactor; Linux aarch64 wheels
Highlights
- Support for `torch.compile` without graph breaks for LLM.int8().
  - Compatible with PyTorch 2.4+, but PyTorch 2.6+ is recommended.
  - Experimental CPU support is included.
- Support for `torch.compile` without graph breaks for 4bit.
  - Compatible with PyTorch 2.4+ for `fullgraph=False`.
  - Requires PyTorch 2.8 nightly for `fullgraph=True`.
- We are now publishing wheels for CUDA Linux aarch64 (sbsa)!
  - Targets are Turing generation and newer: sm75, sm80, sm90, and sm100.
- PyTorch Custom Operators refactoring and integration:
  - We have refactored most of the library code to integrate better with PyTorch via the `torch.library` and custom ops APIs. This helps enable our `torch.compile` and additional hardware compatibility efforts.
  - End-users do not need to change the way they are using `bitsandbytes`.
- Unit tests have been cleaned up for increased determinism and most are now device-agnostic.
- A new nightly CI runs unit tests for CPU (Windows x86-64, Linux x86-64/aarch64) and CUDA (Linux/Windows x86-64).
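As a hedged sketch of the `torch.compile` support highlighted above, the snippet below compiles a single LLM.int8() inference layer. The layer size and input shape are arbitrary, and a CUDA device with PyTorch 2.6+ is assumed:

```python
import torch
import bitsandbytes as bnb

# Int8 inference layer; moving it to the GPU quantizes the weights to int8.
layer = bnb.nn.Linear8bitLt(256, 256, has_fp16_weights=False).cuda()

# Per this release, LLM.int8() inference should compile without graph breaks.
compiled = torch.compile(layer)
x = torch.randn(8, 256, dtype=torch.float16, device="cuda")
print(compiled(x).shape)
```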
Compatibility Changes
- Support for Python 3.8 is dropped.
- Support for PyTorch < 2.2.0 is dropped.
- CUDA 12.6 and 12.8 builds are now compatible with `manylinux_2_24` (previously `manylinux_2_34`).
- Many APIs that were previously marked as deprecated have now been removed.
- New deprecations:
  - bnb.autograd.get_inverse_transform_indices()
  - bnb.autograd.undo_layout()
  - bnb.functional.create_quantile_map()
  - bnb.functional.estimate_quantiles()
  - bnb.functional.get_colrow_absmax()
  - bnb.functional.get_row_absmax()
  - bnb.functional.histogram_scatter_add_2d()
What's Changed
- PyTorch Custom Operator Integration by @matthewdouglas in #1544
- Bump CUDA 12.8.0 build to CUDA 12.8.1 by @matthewdouglas in #1575
- Drop Python 3.8 support. by @matthewdouglas in #1574
- Test cleanup by @matthewdouglas in #1576
- Fix: Return tuple in get_cuda_version_tuple by @DevKimbob in #1580
- Fix torch.compile issue for LLM.int8() with threshold=0 by @matthewdouglas in #1581
- fix for missing cpu lib by @Titus-von-Koeller in #1585
- Fix #1588 - torch compatability for <=2.4 by @matthewdouglas in #1590
- Add autoloading for backend packages by @matthewdouglas in #1593
- Specify blocksize by @cyr0930 in #1586
- fix typo getitem by @ved1beta in #1597
- fix: Improve CUDA version detection and error handling by @ved1beta in #1599
- Support LLM.int8() inference with torch.compile by @matthewdouglas in #1594
- Updates for device agnosticism by @matthewdouglas in #1601
- Stop building for CUDA toolkit < 11.8 by @matthewdouglas in #1605
- fix intel cpu/xpu installation by @jiqing-feng in #1613
- Support 4bit torch.compile fullgraph with PyTorch nightly by @matthewdouglas in #1616
- Improve torch.compile support for int8 with torch>=2.8 nightly by @matthewdouglas in #1617
- Add simple op implementations for CPU by @matthewdouglas in #1602
- Set up nightly CI for unit tests by @matthewdouglas in #1619
- point to correct latest continuous release main by @winglian in #1621
- ARM runners (faster than cross compilation qemu) by @johnnynunez in #1539
- Linux aarch64 CI updates by @matthewdouglas in #1622
- Moved int8_mm_dequant from CPU to default backend by @Egor-Krivov in #1626
- Refresh content for README.md by @matthewdouglas in #1620
- C lib loading: add fallback with sensible error msg by @Titus-von-Koeller in #1615
- Switch CUDA builds to use Rocky Linux 8 container by @matthewdouglas in #1638
- Improvements to test suite by @matthewdouglas in #1636
- Additional CI runners by @matthewdouglas in #1639
- CI runner updates by @matthewdouglas in #1643
- Optimizer backwards compatibility fix by @matthewdouglas in #1647
- General cleanup & test improvements by @matthewdouglas in #1646
- Add torch.compile tests by @matthewdouglas in #1648
- Documentation Cleanup by @matthewdouglas in #1644
- simplified non_sign_bits by @ved1beta in #1649
New Contributors
- @DevKimbob made their first contribution in #1580
- @cyr0930 made their first contribution in #1586
- @ved1beta made their first contribution in #1597
- @winglian made their first contribution in #1621
- @Egor-Krivov made their first contribution in #1626
Full Changelog: 0.45.4...0.46.0
Multi-Backend Preview
Tag: continuous-release_multi-backend-refactor (update compute_type_is_set attr, #1623)
0.45.5
This is a minor release that affects CPU-only usage of bitsandbytes. The CPU build of the library was inadvertently omitted from the v0.45.4 wheels.
Full Changelog: 0.45.4...0.45.5
0.45.4
This is a minor release that affects CPU-only usage of bitsandbytes. There is one bugfix and improved system compatibility on Linux.
What's Changed
- Build: use ubuntu-22.04 instead of 24.04 for CPU build (glibc compat) by @matthewdouglas in #1538
- Fix CPU dequantization to use nested dequantized scaling constant by @zyklotomic in #1549
New Contributors
- @zyklotomic made their first contribution in #1549
Full Changelog: 0.45.3...0.45.4
0.45.3
Overview
This is a small patch release containing a few bug fixes.
Additionally, this release contains a CUDA 12.8 build which adds the sm100 and sm120 targets for NVIDIA Blackwell GPUs.
What's Changed
- Fix #1490 by @matthewdouglas in #1496
- Blackwell binaries! by @johnnynunez in #1491
- Bug fix: Update create_dynamic_map to always return a float32 tensor by @mitchellgoffpc in #1521
- Update cuda versions in error messages by @FxMorin in #1520
- QuantState.to(): move code tensor with others to correct device by @matthewdouglas in #1528
- Installation doc updates by @matthewdouglas in #1529
New Contributors
- @mitchellgoffpc made their first contribution in #1521
- @FxMorin made their first contribution in #1520
Full Changelog: 0.45.2...0.45.3
0.45.2
This patch release fixes a compatibility issue with Triton 3.2 in PyTorch 2.6. When importing `bitsandbytes` without any GPUs visible in an environment with Triton installed, a RuntimeError may be raised:
`RuntimeError: 0 active drivers ([]). There should only be one.`
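A hypothetical repro of the scenario described above (illustrative only; it simply hides all GPUs before importing the library):

```python
import os

# Hide all GPUs before torch/bitsandbytes are imported.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# With Triton 3.2 installed and no visible GPUs, versions before 0.45.2 could fail
# here with "RuntimeError: 0 active drivers ([]). There should only be one."
import bitsandbytes as bnb

print(bnb.__version__)
```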
Full Changelog: 0.45.1...0.45.2
0.45.1
Overview
This is a patch release containing compatibility fixes.
Highlights
- Compatibility for `triton>=3.2.0`
- Moved package configuration to `pyproject.toml`
- Build system: initial support for NVIDIA Blackwell B100 GPUs, RTX 50 Blackwell series GPUs and Jetson Thor Blackwell.
  - Note: Binaries built for these platforms are not included in this release. They will be included in future releases upon the availability of the upcoming CUDA Toolkit 12.7 and 12.8.
- Packaging: wheels will no longer include unit tests. (#1478)
- Sets the minimum PyTorch version to 2.0.0.
What's Changed
- Add installation doc for bnb on Ascend NPU by @ji-huazhong in #1442
- (chore) Remove unused dotfiles by @matthewdouglas in #1445
- Remove triton.ops, copy necessary bits here by @bertmaher in #1413
- chore: migrate config files to `pyproject.toml` by @SauravMaheshkar in #1373
- cleanup: remove unused kernels/C++ code by @matthewdouglas in #1458
- (Deps) Require torch 2.x and minor updates by @matthewdouglas in #1459
- FSDP-QLoRA doc updates for TRL integration by @blbadger in #1471
- Initial support blackwell by @johnnynunez in #1481
- (build) include Ada/Hopper targets in cu118 build by @matthewdouglas in #1487
- Exclude tests from distribution by @akx in #1486
New Contributors
- @bertmaher made their first contribution in #1413
- @SauravMaheshkar made their first contribution in #1373
- @blbadger made their first contribution in #1471
- @johnnynunez made their first contribution in #1481
Full Changelog: 0.45.0...0.45.1