Remove deprecated APIs: research module, non-blockwise optimizers, and legacy quantization functions #1871
TimDettmers wants to merge 1 commit into main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
TimDettmers
left a comment
PR Review: #1871 — Remove deprecated APIs: research module, non-blockwise optimizers, and legacy quantization functions
Classification: Deprecation/Removal (major)
Size: Very large (43 files, +32 / -3191 lines), but almost entirely deletions of already-deprecated code
Author: TimDettmers (maintainer)
Comprehensive removal of all remaining deprecated symbols that have been emitting FutureWarning since v0.45.0 (December 2024). This is the third and final cleanup round following v0.47.0 and v0.49.0. The scope is well-documented in the PR body, CI is fully green across all platforms, and the commit is clean.
Blocking issue (1):
1. Removing MatmulLtState.CxB will crash TGI and vLLM at runtime
The PR removes the CxB, CxBt, and formatB attributes from the MatmulLtState dataclass. While these are indeed dead within bitsandbytes itself (never assigned a non-None value since v0.45.0), both TGI and vLLM still access state.CxB in their 8-bit inference forward paths:
TGI (`server/text_generation_server/layers/bnb.py`):

```python
if self.state.CB is not None and self.state.CxB is not None:
    del self.state.CB
    self.weight.data = self.state.CxB
```

vLLM (`vllm/model_executor/layers/quantization/bitsandbytes.py`):

```python
    and matmul_states[i].CxB is not None
):
    del matmul_states[i].CB
    qweight[offsets[i]:offsets[i+1]] = matmul_states[i].CxB
```

Today, `CxB` is always `None`, so these branches never execute, but the attribute access itself (`state.CxB`) will raise `AttributeError` after this PR. Both projects would crash on any 8-bit inference call.
Recommendation: Keep CxB, CxBt, and formatB as deprecated stub attributes on MatmulLtState (e.g. CxB: Optional[torch.Tensor] = None # Deprecated: always None, kept for downstream compat), or coordinate removal with TGI and vLLM. Alternatively, override __getattr__ to return None for these removed attributes. This is the only change needed — the rest of the PR is clean.
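A minimal sketch of the stub-attribute option (the field names come from the review above; omitting the live fields and typing `formatB` as `Optional[str]` are assumptions, not the actual class definition):

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class MatmulLtState:
    # ... live fields unchanged ...

    # Deprecated stubs: always None since v0.45.0, kept only so downstream
    # accesses like `state.CxB` (TGI, vLLM) do not raise AttributeError.
    CxB: Optional[torch.Tensor] = None
    CxBt: Optional[torch.Tensor] = None
    formatB: Optional[str] = None  # assumption: the original default is not shown in the review
```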
Downstream Impact
Risk level: MEDIUM (upgradable to LOW if CxB is retained)
Affected APIs:
- `F.quantize()`, `F.dequantize()`, `F.quantize_no_absmax()`, `F.dequantize_no_absmax()` — not used by any downstream project
- `F.optimizer_update_8bit()` — not used by any downstream project
- `F.percentile_clipping()` — not used by any downstream project
- `percentile_clipping` and `block_wise` optimizer constructor params — not passed by Transformers trainer
- `bitsandbytes.research.*` — not used by any downstream project
- `bnb.nn.SwitchBackLinear*`, `bnb.nn.StandardLinear` — not used by any downstream project
- `MatmulLtState.CxB` — accessed by TGI and vLLM (see blocking issue)
- `MatmulLtState.formatB` — not accessed by any downstream project
- `MatmulLtState._tile_indices` / `tile_indices` property — not accessed by any downstream project
Affected projects:
- Transformers: Not affected. Does not use any removed APIs. Optimizer calls do not pass `percentile_clipping` or `block_wise`.
- PEFT: Not affected. Does not use any removed APIs.
- Accelerate: Not affected. Does not use any removed APIs.
- TGI: Affected — accesses `state.CxB` in the 8-bit forward path. Will crash with `AttributeError`.
- vLLM: Affected — accesses `matmul_states[i].CxB` in the 8-bit forward path. Will crash with `AttributeError`.
Recommendation: Retain CxB (and optionally CxBt, formatB) as deprecated stubs on MatmulLtState for at least one release cycle, then coordinate removal with TGI/vLLM. The rest of the removal is safe to merge.
Suggestions (non-blocking):
- LAMB/LARS behavior change: The PR description notes that LAMB and LARS previously defaulted to `block_wise=False` and now use blockwise quantization. This is a subtle behavioral change for users of `LAMB8bit` and `LARS8bit` — the numerics will differ. The PR description documents this as a breaking change, which is appropriate. Consider adding a note to the release notes that `LAMB8bit`/`LARS8bit` users may see different training dynamics.
- `test_int8_double_quant` marker fix: Good catch removing the incorrect `@pytest.mark.deprecated` marker from a test that covers live functionality. This ensures it runs in the default test suite.
Cross-PR conflicts:
- PR #1869 (Fix `GlobalOptimManager.override_config`): Touches `bitsandbytes/optim/optimizer.py` and `tests/test_optim.py`. The changes are in different sections of `get_config()` — #1869 adds a `pid2config` fallback, while this PR removes the `percentile_clipping`/`block_wise` lines. Trivial merge conflict, easily resolved.
- PR #1861 (Fix AdEMAMix scheduler guard): Touches `bitsandbytes/optim/ademamix.py` and `bitsandbytes/optim/optimizer.py`. It changes the `t_alpha`/`t_beta3` defaults in `get_config()`, while this PR removes `percentile_clipping`/`block_wise` from the same method. Trivial merge conflict.
- Recommend merging #1869 and #1861 first (smaller, independent fixes), then rebasing this PR.
- Security: Clear (maintainer PR, pure deletion of deprecated code)
- Downstream impact: MEDIUM (TGI/vLLM `CxB` access — see blocking issue)
- Tests: Adequate (deprecated tests removed, live tests retained, `test_int8_double_quant` marker fixed)
- CI: All checks pass (builds, lint, CPU tests, CUDA tests across 6 GPU/CUDA configs, ROCm, XPU, Windows)
- Serialization: No impact (removed APIs do not affect `Params4bit`, `Int8Params`, `QuantState`, or checkpoint formats)
- Cross-PR conflicts: Trivial overlaps with #1869 and #1861 on `optimizer.py`
- Commit hygiene: Single well-structured commit with comprehensive description
The `MatmulLtState.CxB` attribute has always been `None` since bitsandbytes v0.45.0 (December 2024), when the col32/ColAmpere tensor layout system was removed. The conditional blocks that checked `state.CxB is not None` have therefore never executed. bitsandbytes is removing the `CxB` attribute entirely in an upcoming release (see bitsandbytes-foundation/bitsandbytes#1871), which would cause an AttributeError here. This commit removes the dead code proactively.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `MatmulLtState.CxB` attribute has always been `None` since bitsandbytes v0.45.0 (December 2024), when the col32/ColAmpere tensor layout system was removed. The conditional block that checked `matmul_states[i].CxB is not None` has therefore never executed. bitsandbytes is removing the `CxB` attribute entirely in an upcoming release (see bitsandbytes-foundation/bitsandbytes#1871), which would cause an AttributeError here. This commit removes the dead code proactively.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d legacy quantization functions

Remove all remaining deprecated code that has been emitting FutureWarning since v0.45.0 (December 2024). Two prior cleanup rounds (v0.47.0, v0.49.0) already removed the easier items; this finishes the job.

- Delete quantize(), dequantize(), quantize_no_absmax(), dequantize_no_absmax(), optimizer_update_8bit(), percentile_clipping(), and the str2optimizer8bit dispatch table from functional.py
- Remove the non-blockwise 8-bit optimizer path from Optimizer2State and Optimizer1State; LAMB/LARS now use blockwise quantization
- Remove percentile_clipping and block_wise parameters from all ~33 optimizer class constructors
- Delete bitsandbytes/research/ (FP8 matmul, SwitchBack)
- Delete bitsandbytes/nn/triton_based_modules.py, SwitchBackLinearBnb, and the orphaned bitsandbytes/triton/ kernel directory
- Remove dead MatmulLtState fields (CxB, CxBt, formatB, _tile_indices)
- Delete test_deprecated.py, test_triton.py; clean test_autograd.py, test_optim.py, test_functional.py
- Remove benchmarking/switchback/ and update docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
49a768a to 62d6963
These APIs were formally deprecated in v0.45.0 (December 2024) as part of the LLM.int8() refactoring in PR #1401. They have been emitting `FutureWarning` at runtime for over a year. Two prior cleanup rounds already removed the easier items: v0.47.0 removed `arange`, `_mul`, `get_special_format_str`, `get_tensor_stream`, `pre_call`/`post_call`, and the layout transform functions; v0.49.0 removed `igemmlt`, `mm_dequant`, `double_quant`, `vectorwise_quant`/`vectorwise_dequant`/`vectorwise_mm_dequant`, `dequant_min_max`, `extract_outliers`, and `pipeline_test`. This PR removes everything that remains.

43 files changed, ~30 lines added, ~3,190 lines removed. All pre-commit hooks pass. The full removal was verified by importing the package and confirming every deleted symbol is gone while all live functions (`quantize_blockwise`, `dequantize_blockwise`, `optimizer_update_8bit_blockwise`, `int8_double_quant`, etc.) remain intact.
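A minimal sketch of that kind of import check (not the exact script used for the PR; the symbol lists are taken from the description above):

```python
import bitsandbytes as bnb
import bitsandbytes.functional as F

removed = [
    "quantize", "dequantize", "quantize_no_absmax",
    "dequantize_no_absmax", "optimizer_update_8bit", "percentile_clipping",
]
kept = [
    "quantize_blockwise", "dequantize_blockwise",
    "optimizer_update_8bit_blockwise", "int8_double_quant",
]

# Deleted symbols must be gone from functional.py ...
assert all(not hasattr(F, name) for name in removed)
# ... while the live blockwise API remains intact.
assert all(hasattr(F, name) for name in kept)
# The research module and SwitchBack layers are no longer exported.
assert not hasattr(bnb, "research")
assert not hasattr(bnb.nn, "SwitchBackLinear")
```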
Removed: Legacy dynamic quantization functions

`quantize()`, `dequantize()`, `quantize_no_absmax()`, and `dequantize_no_absmax()` in `bitsandbytes/functional.py` implemented an older dynamic 8-bit quantization scheme that used a single global absmax value for the entire tensor. This approach was superseded by blockwise quantization (`quantize_blockwise`/`dequantize_blockwise`), which quantizes independent blocks of 256 elements each, significantly reducing the impact of outliers. The `_no_absmax` variants were purely internal — their only caller was the `quantize()`/`dequantize()` wrapper pair. The only production code that still called these was the research FP8 matmul module, which is also removed in this PR. The functions, their `@deprecated` decorators, and the `typing_extensions.deprecated` import are all deleted.
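For reference, the surviving blockwise API is used roughly like this (a minimal sketch; the tensor shape, CUDA device, and explicit `blocksize=256` are assumptions, not values mandated by the PR):

```python
import torch
import bitsandbytes.functional as F

x = torch.randn(4096, 4096, device="cuda")

# One absmax scale per 256-element block, so a single outlier only
# distorts its own block instead of the whole tensor.
q, quant_state = F.quantize_blockwise(x, blocksize=256)
x_hat = F.dequantize_blockwise(q, quant_state)
```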
Removed: Non-blockwise 8-bit optimizer path

`optimizer_update_8bit()` was the non-blockwise 8-bit optimizer update function. It dispatched to legacy C kernels (`cadam_static_8bit_grad_32`, etc.) that quantized the entire optimizer state with a single global max value. The modern replacement, `optimizer_update_8bit_blockwise()`, uses per-block absmax arrays (block size 256) for much better numerical accuracy. The function, its `str2optimizer8bit` C function dispatch table, and both call sites in `Optimizer2State.update_step()` and `Optimizer1State.update_step()` are removed. The non-blockwise state initialization code (scalar `max1`/`new_max1`/`max2`/`new_max2` tensors) is also removed from both `init_state()` methods. LAMB and LARS previously defaulted to `block_wise=False`; they now use blockwise quantization like every other optimizer. The `block_wise` parameter itself is removed from all optimizer constructors since there is only one path now.
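In practice, 8-bit optimizers are now constructed without the removed kwargs. A minimal sketch (the toy `torch.nn.Linear` model and learning rate are illustrative, not taken from the PR):

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024)

# Blockwise 8-bit state is now the only path. Passing the removed
# block_wise= or percentile_clipping= kwargs raises TypeError.
opt = bnb.optim.LAMB8bit(model.parameters(), lr=1e-3)
```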
Removed: Percentile clipping

`percentile_clipping()` tracked a rolling window of the last 100 gradient norms and clipped gradients at a user-specified percentile to improve training stability. It called into CUDA C kernels (`cpercentile_clipping_g32`/`g16`) that have no triton or multi-backend equivalent, making it a CUDA-only feature that couldn't be extended to other backends. The function was already decorated `@deprecated` and emitting warnings. The `percentile_clipping` parameter is removed from all ~33 optimizer class constructors across 9 files (`adam.py`, `adamw.py`, `adagrad.py`, `lamb.py`, `lars.py`, `lion.py`, `rmsprop.py`, `sgd.py`, `ademamix.py`), the two base classes (`Optimizer2State`, `Optimizer1State`), the `get_config()` method, and the optimizer documentation.
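Users who depended on this can approximate the behavior outside the optimizer. A rough plain-PyTorch sketch of the same idea (rolling window of gradient norms, clip at a percentile); this is not the removed CUDA kernel, and `PercentileGradClipper` is a hypothetical helper, not part of bitsandbytes:

```python
from collections import deque

import torch


class PercentileGradClipper:
    """Keep the last `window` gradient norms and clip at the given percentile."""

    def __init__(self, percentile: int = 95, window: int = 100):
        self.percentile = percentile
        self.norms = deque(maxlen=window)

    def clip_(self, parameters) -> float:
        grads = [p.grad for p in parameters if p.grad is not None]
        total_norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(total_norm)
        # Threshold is the requested percentile of the recent-norm window.
        threshold = torch.quantile(
            torch.tensor(list(self.norms)), self.percentile / 100.0
        ).item()
        if total_norm > threshold:
            for g in grads:
                g.mul_(threshold / (total_norm + 1e-6))
        return total_norm


# Usage: call between loss.backward() and optimizer.step(), e.g.
# clipper = PercentileGradClipper(percentile=95)
# clipper.clip_(model.parameters())
```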
Removed: bitsandbytes.research module

The research module contained experimental FP8 matmul implementations (`matmul_fp8_mixed`, `matmul_fp8_global`) and their corresponding autograd classes (`MatMulFP8Mixed`, `MatMulFP8Global`), plus the `switchback_bnb` function and `SwitchBackBnb` autograd class. These were introduced in v0.38.1 as research prototypes for fake-FP8 quantized training and Int8 SwitchBack layers. They were never promoted to stable API, and their tests were already marked `@pytest.mark.deprecated` and excluded from the default test suite (some were also `@pytest.mark.skip`). The entire `bitsandbytes/research/` directory is deleted, and the `research` import is removed from `bitsandbytes/__init__.py`.
Removed: SwitchBack linear layers and triton kernels

`SwitchBackLinearBnb` in `bitsandbytes/nn/modules.py` was a non-triton SwitchBack layer that called `bnb.matmul_mixed()` — a function that doesn't exist in the public namespace, making the class effectively broken. `SwitchBackLinear`, `SwitchBackLinearGlobal`, `SwitchBackLinearVectorwise`, and `StandardLinear` in `bitsandbytes/nn/triton_based_modules.py` were triton-based SwitchBack variants. All are deleted along with their exports from `bitsandbytes/nn/__init__.py`. The `bitsandbytes/triton/` directory contained the underlying triton kernels (`int8_matmul_mixed_dequantize`, `quantize_rowwise`, etc.); with all SwitchBack consumers removed, these kernels are orphaned and are also deleted. The `benchmarking/switchback/` directory is removed as well.
Deprecated: Dead MatmulLtState fields

`CxB`, `CxBt`, `formatB`, and `_tile_indices` on the `MatmulLtState` class were vestiges of the old col32/ColAmpere tensor layout system that was removed in the v0.45.0 int8 refactoring. These fields have always been `None` (or unused) since then. They are now deprecated via `__getattr__`: accessing them returns `None` and emits a `FutureWarning`. They will be fully removed in the next bitsandbytes release. A stale comment referencing `CxB` in `MatMul4Bit` is cleaned up.
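A minimal sketch of what such a `__getattr__` shim can look like; the PR's actual implementation is not reproduced here, so the warning text and attribute set below are assumptions:

```python
import warnings
from dataclasses import dataclass

_DEPRECATED_STATE_ATTRS = {"CxB", "CxBt", "formatB", "tile_indices"}


@dataclass
class MatmulLtState:
    # ... live fields unchanged ...

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, i.e. for the
        # removed fields that are no longer declared on the dataclass.
        if name in _DEPRECATED_STATE_ATTRS:
            warnings.warn(
                f"MatmulLtState.{name} is deprecated and always None; "
                "it will be removed in a future release.",
                FutureWarning,
                stacklevel=2,
            )
            return None
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )
```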
Both TGI and vLLM access `state.CxB` in their 8-bit inference paths. PRs have been opened to remove this dead code from both projects.
Test and documentation cleanup

`tests/test_deprecated.py` (4 tests covering quantize/dequantize, percentile clipping, FP8 matmul, and FP8 linear) and `tests/test_triton.py` (1 test covering SwitchBackLinear) are deleted entirely. `test_adam_percentile_clipping` is removed from `test_optim.py`. The `switchback_bnb` parametrization is removed from `test_matmullt` in `test_autograd.py`. Stale `block_wise=True` and `block_wise=False` kwargs are cleaned from optimizer test constructors. `test_int8_double_quant` in `test_functional.py` was incorrectly marked `@pytest.mark.deprecated` despite testing the live `int8_double_quant` function — the marker is removed so it runs in the default suite. The `deprecated` pytest marker definition is removed from `pyproject.toml`, and `docs/source/optimizers.mdx` is updated to remove the percentile clipping example.
Breaking changes

This is a breaking change for anyone who:

- calls `F.quantize()`, `F.dequantize()`, `F.percentile_clipping()`, or `F.optimizer_update_8bit()` directly
- uses the `bnb.research.*` API
- uses `bnb.nn.SwitchBackLinear*` or `bnb.nn.SwitchBackLinearBnb`
- passes `percentile_clipping=` or `block_wise=` to any optimizer constructor
- uses `LAMB8bit` or `LARS8bit` and relied on the non-blockwise default
FutureWarningsince v0.45.0 (December 2024), or were already non-functional.Test plan
pre-commit run --all-filespasses (ruff, ruff-format, typos, clang-format, trailing whitespace, etc.)import bitsandbytes as bnb)percentile_clippingorblock_wise🤖 Generated with Claude Code