
Remove deprecated APIs: research module, non-blockwise optimizers, and legacy quantization functions#1871

Open
TimDettmers wants to merge 1 commit into main from deprecation

Conversation

@TimDettmers (Collaborator) commented Feb 16, 2026

These APIs were formally deprecated in v0.45.0 (December 2024) as part of the LLM.int8() refactoring in PR #1401. They have been emitting FutureWarning at runtime for over a year. Two prior cleanup rounds already removed the easier items: v0.47.0 removed arange, _mul, get_special_format_str, get_tensor_stream, pre_call/post_call, and the layout transform functions; v0.49.0 removed igemmlt, mm_dequant, double_quant, vectorwise_quant/vectorwise_dequant/vectorwise_mm_dequant, dequant_min_max, extract_outliers, and pipeline_test. This PR removes everything that remains.

43 files changed, ~30 lines added, ~3,190 lines removed. All pre-commit hooks pass. The full removal was verified by importing the package and confirming every deleted symbol is gone while all live functions (quantize_blockwise, dequantize_blockwise, optimizer_update_8bit_blockwise, int8_double_quant, etc.) remain intact.
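
A rough sketch of that check (the symbol lists below are abbreviated, not exhaustive):

import bitsandbytes as bnb
import bitsandbytes.functional as F

# Deleted in this PR vs. live replacements (abbreviated lists).
removed = ["quantize", "dequantize", "quantize_no_absmax", "dequantize_no_absmax",
           "optimizer_update_8bit", "percentile_clipping"]
kept = ["quantize_blockwise", "dequantize_blockwise",
        "optimizer_update_8bit_blockwise", "int8_double_quant"]

assert not any(hasattr(F, name) for name in removed)
assert all(hasattr(F, name) for name in kept)
assert not hasattr(bnb, "research")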

Removed: Legacy dynamic quantization functions

quantize(), dequantize(), quantize_no_absmax(), and dequantize_no_absmax() in bitsandbytes/functional.py implemented an older dynamic 8-bit quantization scheme that used a single global absmax value for the entire tensor. This approach was superseded by blockwise quantization (quantize_blockwise / dequantize_blockwise), which quantizes independent blocks of 256 elements each, significantly reducing the impact of outliers. The _no_absmax variants were purely internal — their only caller was the quantize()/dequantize() wrapper pair. The only production code that still called these was the research FP8 matmul module, which is also removed in this PR. The functions, their @deprecated decorators, and the typing_extensions.deprecated import are all deleted.
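
For anyone migrating, a hedged before/after sketch (tensor shape and device are arbitrary; the blockwise calls are the supported replacement):

import torch
import bitsandbytes.functional as F

x = torch.randn(4096, device="cuda")  # any floating-point tensor; CUDA assumed here

# Removed: q, state = F.quantize(x); x_hat = F.dequantize(q, state)
# Replacement: one absmax per block instead of one global absmax.
q, quant_state = F.quantize_blockwise(x)
x_hat = F.dequantize_blockwise(q, quant_state)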

Removed: Non-blockwise 8-bit optimizer path

optimizer_update_8bit() was the non-blockwise 8-bit optimizer update function. It dispatched to legacy C kernels (cadam_static_8bit_grad_32, etc.) that quantized the entire optimizer state with a single global max value. The modern replacement, optimizer_update_8bit_blockwise(), uses per-block absmax arrays (block size 256) for much better numerical accuracy. The function, its str2optimizer8bit C function dispatch table, and both call sites in Optimizer2State.update_step() and Optimizer1State.update_step() are removed. The non-blockwise state initialization code (scalar max1/new_max1/max2/new_max2 tensors) is also removed from both init_state() methods. LAMB and LARS previously defaulted to block_wise=False; they now use blockwise quantization like every other optimizer. The block_wise parameter itself is removed from all optimizer constructors since there is only one path now.
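
A hedged before/after sketch of the constructor change (the model and hyperparameters are placeholders):

import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()

# Previously accepted (with FutureWarning):
# opt = bnb.optim.Adam8bit(model.parameters(), lr=1e-3, block_wise=True)

# Now: blockwise 8-bit state is the only path, so the kwarg is gone.
opt = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)

# LAMB8bit previously defaulted to block_wise=False; it now uses the same
# blockwise quantization as the other 8-bit optimizers.
opt_lamb = bnb.optim.LAMB8bit(model.parameters(), lr=1e-3)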

Removed: Percentile clipping

percentile_clipping() tracked a rolling window of the last 100 gradient norms and clipped gradients at a user-specified percentile to improve training stability. It called into CUDA C kernels (cpercentile_clipping_g32/g16) that have no triton or multi-backend equivalent, making it a CUDA-only feature that couldn't be extended to other backends. The function was already decorated @deprecated and emitting warnings. The percentile_clipping parameter is removed from all ~33 optimizer class constructors across 9 files (adam.py, adamw.py, adagrad.py, lamb.py, lars.py, lion.py, rmsprop.py, sgd.py, ademamix.py), the two base classes (Optimizer2State, Optimizer1State), the get_config() method, and the optimizer documentation.
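
For users who relied on this, the idea can be approximated in plain PyTorch. The sketch below is purely illustrative (the class name and defaults are invented, and it does not reproduce the removed kernel's exact numerics):

import torch

class PercentileGradClipper:
    # Keep the last `window` global grad norms and clip at a chosen percentile.
    def __init__(self, percentile=5.0, window=100):
        self.percentile = percentile
        self.window = window
        self.norms = []

    def clip_(self, parameters):
        params = [p for p in parameters if p.grad is not None]
        if not params:
            return
        total_norm = torch.linalg.vector_norm(
            torch.stack([p.grad.detach().norm(2) for p in params])
        )
        self.norms = (self.norms + [total_norm.item()])[-self.window:]
        clip_value = float(torch.quantile(torch.tensor(self.norms), self.percentile / 100.0))
        torch.nn.utils.clip_grad_norm_(params, max_norm=clip_value)

Calling clip_(model.parameters()) between loss.backward() and optimizer.step() mimics the rolling-window behavior described above.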

Removed: bitsandbytes.research module

The research module contained experimental FP8 matmul implementations (matmul_fp8_mixed, matmul_fp8_global) and their corresponding autograd classes (MatMulFP8Mixed, MatMulFP8Global), plus the switchback_bnb function and SwitchBackBnb autograd class. These were introduced in v0.38.1 as research prototypes for fake-FP8 quantized training and Int8 SwitchBack layers. They were never promoted to stable API, and their tests were already marked @pytest.mark.deprecated and excluded from the default test suite (some were also @pytest.mark.skip). The entire bitsandbytes/research/ directory is deleted, and the research import is removed from bitsandbytes/__init__.py.

Removed: SwitchBack linear layers and triton kernels

SwitchBackLinearBnb in bitsandbytes/nn/modules.py was a non-triton SwitchBack layer that called bnb.matmul_mixed() — a function that doesn't exist in the public namespace, making the class effectively broken. SwitchBackLinear, SwitchBackLinearGlobal, SwitchBackLinearVectorwise, and StandardLinear in bitsandbytes/nn/triton_based_modules.py were triton-based SwitchBack variants. All are deleted along with their exports from bitsandbytes/nn/__init__.py. The bitsandbytes/triton/ directory contained the underlying triton kernels (int8_matmul_mixed_dequantize, quantize_rowwise, etc.); with all SwitchBack consumers removed, these kernels are orphaned and are also deleted. The benchmarking/switchback/ directory is removed as well.

Deprecated: Dead MatmulLtState fields

CxB, CxBt, formatB, and _tile_indices on the MatmulLtState class were vestiges of the old col32/ColAmpere tensor layout system that was removed in the v0.45.0 int8 refactoring. These fields have always been None (or unused) since then. They are now deprecated via __getattr__: accessing them returns None and emits a FutureWarning. They will be fully removed in the next bitsandbytes release. A stale comment referencing CxB in MatMul4Bit is cleaned up.
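
The shim looks roughly like the following (a sketch of the mechanism described above, not the exact code in MatmulLtState):

import warnings

_DEPRECATED_STATE_FIELDS = {"CxB", "CxBt", "formatB", "_tile_indices"}

class MatmulLtState:
    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. the field was never set.
        if name in _DEPRECATED_STATE_FIELDS:
            warnings.warn(
                f"MatmulLtState.{name} is deprecated, always None, and will be "
                "removed in the next release.",
                FutureWarning,
                stacklevel=2,
            )
            return None
        raise AttributeError(name)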

Both TGI and vLLM access state.CxB in their 8-bit inference paths. PRs have been opened to remove this dead code from both projects (see the referenced TGI and vLLM commits below).

Test and documentation cleanup

tests/test_deprecated.py (4 tests covering quantize/dequantize, percentile clipping, FP8 matmul, and FP8 linear) and tests/test_triton.py (1 test covering SwitchBackLinear) are deleted entirely. test_adam_percentile_clipping is removed from test_optim.py. The switchback_bnb parametrization is removed from test_matmullt in test_autograd.py. Stale block_wise=True and block_wise=False kwargs are cleaned from optimizer test constructors. test_int8_double_quant in test_functional.py was incorrectly marked @pytest.mark.deprecated despite testing the live int8_double_quant function — the marker is removed so it runs in the default suite. The deprecated pytest marker definition is removed from pyproject.toml, and docs/source/optimizers.mdx is updated to remove the percentile clipping example.

Breaking changes

This is a breaking change for anyone who:

  • Called F.quantize(), F.dequantize(), F.percentile_clipping(), or F.optimizer_update_8bit() directly
  • Used any bnb.research.* API
  • Used bnb.nn.SwitchBackLinear* or bnb.nn.SwitchBackLinearBnb
  • Passed percentile_clipping= or block_wise= to any optimizer constructor
  • Used LAMB8bit or LARS8bit and relied on the non-blockwise default

All of these have been emitting FutureWarning since v0.45.0 (December 2024), or were already non-functional.

Test plan

  • pre-commit run --all-files passes (ruff, ruff-format, typos, clang-format, trailing whitespace, etc.)
  • Package imports cleanly (import bitsandbytes as bnb)
  • All deleted symbols confirmed absent; all live symbols confirmed present
  • Optimizer constructors no longer accept percentile_clipping or block_wise
  • CI test suite passes

🤖 Generated with Claude Code

@github-actions commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@TimDettmers (Collaborator, Author) left a comment:


PR Review: #1871 — Remove deprecated APIs: research module, non-blockwise optimizers, and legacy quantization functions

Classification: Deprecation/Removal (major)
Size: Very large (43 files, +32 / -3191 lines), but almost entirely deletions of already-deprecated code
Author: TimDettmers (maintainer)

Comprehensive removal of all remaining deprecated symbols that have been emitting FutureWarning since v0.45.0 (December 2024). This is the third and final cleanup round following v0.47.0 and v0.49.0. The scope is well-documented in the PR body, CI is fully green across all platforms, and the commit is clean.


Blocking issue (1):

1. Removing MatmulLtState.CxB will crash TGI and vLLM at runtime

The PR removes the CxB, CxBt, and formatB attributes from the MatmulLtState dataclass. While these are indeed dead within bitsandbytes itself (never assigned a non-None value since v0.45.0), both TGI and vLLM still access state.CxB in their 8-bit inference forward paths:

TGI (server/text_generation_server/layers/bnb.py):

if self.state.CB is not None and self.state.CxB is not None:
    del self.state.CB
    self.weight.data = self.state.CxB

vLLM (vllm/model_executor/layers/quantization/bitsandbytes.py):

if (
    ...
    and matmul_states[i].CxB is not None
):
    del matmul_states[i].CB
    qweight[offsets[i]:offsets[i+1]] = matmul_states[i].CxB

Today, CxB is always None so these branches never execute, but the attribute access itself (state.CxB) will raise AttributeError after this PR. Both projects would crash on any 8-bit inference call.

Recommendation: Keep CxB, CxBt, and formatB as deprecated stub attributes on MatmulLtState (e.g. CxB: Optional[torch.Tensor] = None # Deprecated: always None, kept for downstream compat), or coordinate removal with TGI and vLLM. Alternatively, override __getattr__ to return None for these removed attributes. This is the only change needed — the rest of the PR is clean.
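
The stub option could look like this (only the field names come from MatmulLtState; everything else here is illustrative):

from dataclasses import dataclass
from typing import Optional
import torch

@dataclass
class MatmulLtState:
    CB: Optional[torch.Tensor] = None
    # Deprecated: always None since v0.45.0, kept only so downstream readers
    # (e.g. TGI, vLLM) do not hit AttributeError on plain attribute access.
    CxB: Optional[torch.Tensor] = None
    CxBt: Optional[torch.Tensor] = None
    formatB: Optional[str] = None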


Downstream Impact

Risk level: MEDIUM (reduced to LOW if CxB is retained)

Affected APIs:

  • F.quantize(), F.dequantize(), F.quantize_no_absmax(), F.dequantize_no_absmax() — not used by any downstream project
  • F.optimizer_update_8bit() — not used by any downstream project
  • F.percentile_clipping() — not used by any downstream project
  • percentile_clipping and block_wise optimizer constructor params — not passed by Transformers trainer
  • bitsandbytes.research.* — not used by any downstream project
  • bnb.nn.SwitchBackLinear*, bnb.nn.StandardLinear — not used by any downstream project
  • MatmulLtState.CxB — accessed by TGI and vLLM (see blocking issue)
  • MatmulLtState.formatB — not accessed by any downstream project
  • MatmulLtState._tile_indices / tile_indices property — not accessed by any downstream project

Affected projects:

  • Transformers: Not affected. Does not use any removed APIs. Optimizer calls do not pass percentile_clipping or block_wise.
  • PEFT: Not affected. Does not use any removed APIs.
  • Accelerate: Not affected. Does not use any removed APIs.
  • TGI: Affected — accesses state.CxB in 8-bit forward path. Will crash with AttributeError.
  • vLLM: Affected — accesses matmul_states[i].CxB in 8-bit forward path. Will crash with AttributeError.

Recommendation: Retain CxB (and optionally CxBt, formatB) as deprecated stubs on MatmulLtState for at least one release cycle, then coordinate removal with TGI/vLLM. The rest of the removal is safe to merge.


Suggestions (non-blocking):

  • LAMB/LARS behavior change: The PR description notes that LAMB and LARS previously defaulted to block_wise=False and now use blockwise quantization. This is a subtle behavioral change for users of LAMB8bit and LARS8bit — the numerics will differ. The PR description documents this as a breaking change, which is appropriate. Consider adding a note to the release notes that LAMB8bit/LARS8bit users may see different training dynamics.

  • test_int8_double_quant marker fix: Good catch removing the incorrect @pytest.mark.deprecated marker from a test that covers live functionality. This ensures it runs in the default test suite.


Cross-PR conflicts:

  • PR #1869 (Fix GlobalOptimManager.override_config): Touches bitsandbytes/optim/optimizer.py and tests/test_optim.py. The changes are in different sections of get_config(): #1869 adds a pid2config fallback, while this PR removes the percentile_clipping/block_wise lines. Trivial merge conflict, easily resolved.

  • PR #1861 (Fix AdEMAMix scheduler guard): Touches bitsandbytes/optim/ademamix.py and bitsandbytes/optim/optimizer.py. Changes t_alpha/t_beta3 defaults in get_config(), while this PR removes percentile_clipping/block_wise from the same method. Trivial merge conflict.

  • Recommend merging #1869 and #1861 first (smaller, independent fixes), then rebasing this PR.


  • Security: Clear (maintainer PR, pure deletion of deprecated code)
  • Downstream impact: MEDIUM (TGI/vLLM CxB access — see blocking issue)
  • Tests: Adequate (deprecated tests removed, live tests retained, test_int8_double_quant marker fixed)
  • CI: All checks pass (builds, lint, CPU tests, CUDA tests across 6 GPU/CUDA configs, ROCm, XPU, Windows)
  • Serialization: No impact (removed APIs do not affect Params4bit, Int8Params, QuantState, or checkpoint formats)
  • Cross-PR conflicts: Trivial overlaps with #1869 and #1861 on optimizer.py
  • Commit hygiene: Single well-structured commit with comprehensive description

TimDettmers added a commit to TimDettmers/text-generation-inference that referenced this pull request Feb 16, 2026
The `MatmulLtState.CxB` attribute has always been `None` since
bitsandbytes v0.45.0 (December 2024), when the col32/ColAmpere
tensor layout system was removed. The conditional blocks that
checked `state.CxB is not None` have therefore never executed.

bitsandbytes is removing the `CxB` attribute entirely in an
upcoming release (see bitsandbytes-foundation/bitsandbytes#1871),
which would cause an AttributeError here. This commit removes the
dead code proactively.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
TimDettmers added a commit to TimDettmers/vllm that referenced this pull request Feb 16, 2026
The `MatmulLtState.CxB` attribute has always been `None` since
bitsandbytes v0.45.0 (December 2024), when the col32/ColAmpere
tensor layout system was removed. The conditional block that
checked `matmul_states[i].CxB is not None` has therefore never
executed.

bitsandbytes is removing the `CxB` attribute entirely in an
upcoming release (see bitsandbytes-foundation/bitsandbytes#1871),
which would cause an AttributeError here. This commit removes the
dead code proactively.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove deprecated APIs: research module, non-blockwise optimizers, and legacy quantization functions

Remove all remaining deprecated code that has been emitting FutureWarning
since v0.45.0 (December 2024). Two prior cleanup rounds (v0.47.0, v0.49.0)
already removed the easier items; this finishes the job.

- Delete quantize(), dequantize(), quantize_no_absmax(),
  dequantize_no_absmax(), optimizer_update_8bit(), percentile_clipping(),
  and the str2optimizer8bit dispatch table from functional.py
- Remove the non-blockwise 8-bit optimizer path from Optimizer2State and
  Optimizer1State; LAMB/LARS now use blockwise quantization
- Remove percentile_clipping and block_wise parameters from all ~33
  optimizer class constructors
- Delete bitsandbytes/research/ (FP8 matmul, SwitchBack)
- Delete bitsandbytes/nn/triton_based_modules.py, SwitchBackLinearBnb,
  and the orphaned bitsandbytes/triton/ kernel directory
- Remove dead MatmulLtState fields (CxB, CxBt, formatB, _tile_indices)
- Delete test_deprecated.py, test_triton.py; clean test_autograd.py,
  test_optim.py, test_functional.py
- Remove benchmarking/switchback/ and update docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>