Remove deprecated APIs: research module, non-blockwise optimizers, and legacy quantization functions #1871
TimDettmers wants to merge 1 commit into main
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
TimDettmers
left a comment
PR Review: #1871 — Remove deprecated APIs: research module, non-blockwise optimizers, and legacy quantization functions
Classification: Deprecation/Removal (major)
Size: Very large (43 files, +32 / -3191 lines), but almost entirely deletions of already-deprecated code
Author: TimDettmers (maintainer)
Comprehensive removal of all remaining deprecated symbols that have been emitting FutureWarning since v0.45.0 (December 2024). This is the third and final cleanup round following v0.47.0 and v0.49.0. The scope is well-documented in the PR body, CI is fully green across all platforms, and the commit is clean.
Blocking issue (1):
1. Removing MatmulLtState.CxB will crash TGI and vLLM at runtime
The PR removes the CxB, CxBt, and formatB attributes from the MatmulLtState dataclass. While these are indeed dead within bitsandbytes itself (never assigned a non-None value since v0.45.0), both TGI and vLLM still access state.CxB in their 8-bit inference forward paths:
TGI (`server/text_generation_server/layers/bnb.py`):

```python
if self.state.CB is not None and self.state.CxB is not None:
    del self.state.CB
    self.weight.data = self.state.CxB
```

vLLM (`vllm/model_executor/layers/quantization/bitsandbytes.py`):

```python
    and matmul_states[i].CxB is not None
):
    del matmul_states[i].CB
    qweight[offsets[i]:offsets[i+1]] = matmul_states[i].CxB
```

Today, `CxB` is always `None`, so these branches never execute, but the attribute access itself (`state.CxB`) will raise `AttributeError` after this PR. Both projects would crash on any 8-bit inference call.
Recommendation: Keep CxB, CxBt, and formatB as deprecated stub attributes on MatmulLtState (e.g. CxB: Optional[torch.Tensor] = None # Deprecated: always None, kept for downstream compat), or coordinate removal with TGI and vLLM. Alternatively, override __getattr__ to return None for these removed attributes. This is the only change needed — the rest of the PR is clean.
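A minimal sketch of the stub-attribute option (the field names come from the review above; omitting the live fields and typing `formatB` as `Optional[str]` are assumptions, not the actual class definition):

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class MatmulLtState:
    # ... live fields unchanged ...

    # Deprecated stubs: always None since v0.45.0, kept only so downstream
    # accesses like `state.CxB` (TGI, vLLM) do not raise AttributeError.
    CxB: Optional[torch.Tensor] = None
    CxBt: Optional[torch.Tensor] = None
    formatB: Optional[str] = None  # assumption: the original default is not shown in the review
```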
Downstream Impact
Risk level: MEDIUM (upgradable to LOW if CxB is retained)
Affected APIs:
- `F.quantize()`, `F.dequantize()`, `F.quantize_no_absmax()`, `F.dequantize_no_absmax()` — not used by any downstream project
- `F.optimizer_update_8bit()` — not used by any downstream project
- `F.percentile_clipping()` — not used by any downstream project
- `percentile_clipping` and `block_wise` optimizer constructor params — not passed by Transformers trainer
- `bitsandbytes.research.*` — not used by any downstream project
- `bnb.nn.SwitchBackLinear*`, `bnb.nn.StandardLinear` — not used by any downstream project
- `MatmulLtState.CxB` — accessed by TGI and vLLM (see blocking issue)
- `MatmulLtState.formatB` — not accessed by any downstream project
- `MatmulLtState._tile_indices` / `tile_indices` property — not accessed by any downstream project
Affected projects:
- Transformers: Not affected. Does not use any removed APIs. Optimizer calls do not pass `percentile_clipping` or `block_wise`.
- PEFT: Not affected. Does not use any removed APIs.
- Accelerate: Not affected. Does not use any removed APIs.
- TGI: Affected — accesses `state.CxB` in the 8-bit forward path. Will crash with `AttributeError`.
- vLLM: Affected — accesses `matmul_states[i].CxB` in the 8-bit forward path. Will crash with `AttributeError`.
Recommendation: Retain CxB (and optionally CxBt, formatB) as deprecated stubs on MatmulLtState for at least one release cycle, then coordinate removal with TGI/vLLM. The rest of the removal is safe to merge.
Suggestions (non-blocking):
- LAMB/LARS behavior change: The PR description notes that LAMB and LARS previously defaulted to `block_wise=False` and now use blockwise quantization. This is a subtle behavioral change for users of `LAMB8bit` and `LARS8bit` — the numerics will differ. The PR description documents this as a breaking change, which is appropriate. Consider adding a note to the release notes that `LAMB8bit`/`LARS8bit` users may see different training dynamics.
- `test_int8_double_quant` marker fix: Good catch removing the incorrect `@pytest.mark.deprecated` marker from a test that covers live functionality. This ensures it runs in the default test suite.
Cross-PR conflicts:
- PR #1869 (Fix `GlobalOptimManager.override_config`): Touches `bitsandbytes/optim/optimizer.py` and `tests/test_optim.py`. The changes are in different sections of `get_config()` — #1869 adds a `pid2config` fallback, while this PR removes the `percentile_clipping`/`block_wise` lines. Trivial merge conflict, easily resolved.
- PR #1861 (Fix AdEMAMix scheduler guard): Touches `bitsandbytes/optim/ademamix.py` and `bitsandbytes/optim/optimizer.py`. It changes the `t_alpha`/`t_beta3` defaults in `get_config()`, while this PR removes `percentile_clipping`/`block_wise` from the same method. Trivial merge conflict.
- Recommend merging #1869 and #1861 first (smaller, independent fixes), then rebasing this PR.
- Security: Clear (maintainer PR, pure deletion of deprecated code)
- Downstream impact: MEDIUM (TGI/vLLM `CxB` access — see blocking issue)
- Tests: Adequate (deprecated tests removed, live tests retained, `test_int8_double_quant` marker fixed)
- CI: All checks pass (builds, lint, CPU tests, CUDA tests across 6 GPU/CUDA configs, ROCm, XPU, Windows)
- Serialization: No impact (removed APIs do not affect `Params4bit`, `Int8Params`, `QuantState`, or checkpoint formats)
- Cross-PR conflicts: Trivial overlaps with #1869 and #1861 on `optimizer.py`
- Commit hygiene: Single well-structured commit with comprehensive description
The `MatmulLtState.CxB` attribute has always been `None` since bitsandbytes v0.45.0 (December 2024), when the col32/ColAmpere tensor layout system was removed. The conditional blocks that checked `state.CxB is not None` have therefore never executed. bitsandbytes is removing the `CxB` attribute entirely in an upcoming release (see bitsandbytes-foundation/bitsandbytes#1871), which would cause an AttributeError here. This commit removes the dead code proactively.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The `MatmulLtState.CxB` attribute has always been `None` since bitsandbytes v0.45.0 (December 2024), when the col32/ColAmpere tensor layout system was removed. The conditional block that checked `matmul_states[i].CxB is not None` has therefore never executed. bitsandbytes is removing the `CxB` attribute entirely in an upcoming release (see bitsandbytes-foundation/bitsandbytes#1871), which would cause an AttributeError here. This commit removes the dead code proactively.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…d legacy quantization functions

Remove all remaining deprecated code that has been emitting FutureWarning since v0.45.0 (December 2024). Two prior cleanup rounds (v0.47.0, v0.49.0) already removed the easier items; this finishes the job.

- Delete quantize(), dequantize(), quantize_no_absmax(), dequantize_no_absmax(), optimizer_update_8bit(), percentile_clipping(), and the str2optimizer8bit dispatch table from functional.py
- Remove the non-blockwise 8-bit optimizer path from Optimizer2State and Optimizer1State; LAMB/LARS now use blockwise quantization
- Remove percentile_clipping and block_wise parameters from all ~33 optimizer class constructors
- Delete bitsandbytes/research/ (FP8 matmul, SwitchBack)
- Delete bitsandbytes/nn/triton_based_modules.py, SwitchBackLinearBnb, and the orphaned bitsandbytes/triton/ kernel directory
- Remove dead MatmulLtState fields (CxB, CxBt, formatB, _tile_indices)
- Delete test_deprecated.py, test_triton.py; clean test_autograd.py, test_optim.py, test_functional.py
- Remove benchmarking/switchback/ and update docs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
49a768a to 62d6963
These APIs were formally deprecated in v0.45.0 (December 2024) as part of the LLM.int8() refactoring in PR #1401. They have been emitting `FutureWarning` at runtime for over a year. Two prior cleanup rounds already removed the easier items: v0.47.0 removed `arange`, `_mul`, `get_special_format_str`, `get_tensor_stream`, `pre_call`/`post_call`, and the layout transform functions; v0.49.0 removed `igemmlt`, `mm_dequant`, `double_quant`, `vectorwise_quant`/`vectorwise_dequant`/`vectorwise_mm_dequant`, `dequant_min_max`, `extract_outliers`, and `pipeline_test`. This PR removes everything that remains.

43 files changed, ~30 lines added, ~3,190 lines removed. All pre-commit hooks pass. The full removal was verified by importing the package and confirming every deleted symbol is gone while all live functions (`quantize_blockwise`, `dequantize_blockwise`, `optimizer_update_8bit_blockwise`, `int8_double_quant`, etc.) remain intact.
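A minimal sketch of that kind of import check (not the exact script used for the PR; the symbol lists are taken from the description above):

```python
import bitsandbytes as bnb
import bitsandbytes.functional as F

removed = [
    "quantize", "dequantize", "quantize_no_absmax",
    "dequantize_no_absmax", "optimizer_update_8bit", "percentile_clipping",
]
kept = [
    "quantize_blockwise", "dequantize_blockwise",
    "optimizer_update_8bit_blockwise", "int8_double_quant",
]

# Deleted symbols must be gone from functional.py ...
assert all(not hasattr(F, name) for name in removed)
# ... while the live blockwise API remains intact.
assert all(hasattr(F, name) for name in kept)
# The research module and SwitchBack layers are no longer exported.
assert not hasattr(bnb, "research")
assert not hasattr(bnb.nn, "SwitchBackLinear")
```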
Removed: Legacy dynamic quantization functions

`quantize()`, `dequantize()`, `quantize_no_absmax()`, and `dequantize_no_absmax()` in `bitsandbytes/functional.py` implemented an older dynamic 8-bit quantization scheme that used a single global absmax value for the entire tensor. This approach was superseded by blockwise quantization (`quantize_blockwise`/`dequantize_blockwise`), which quantizes independent blocks of 256 elements each, significantly reducing the impact of outliers. The `_no_absmax` variants were purely internal — their only caller was the `quantize()`/`dequantize()` wrapper pair. The only production code that still called these was the research FP8 matmul module, which is also removed in this PR. The functions, their `@deprecated` decorators, and the `typing_extensions.deprecated` import are all deleted.
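For reference, the surviving blockwise API is used roughly like this (a minimal sketch; the tensor shape, CUDA device, and explicit `blocksize=256` are assumptions, not values mandated by the PR):

```python
import torch
import bitsandbytes.functional as F

x = torch.randn(4096, 4096, device="cuda")

# One absmax scale per 256-element block, so a single outlier only
# distorts its own block instead of the whole tensor.
q, quant_state = F.quantize_blockwise(x, blocksize=256)
x_hat = F.dequantize_blockwise(q, quant_state)
```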
Removed: Non-blockwise 8-bit optimizer path

`optimizer_update_8bit()` was the non-blockwise 8-bit optimizer update function. It dispatched to legacy C kernels (`cadam_static_8bit_grad_32`, etc.) that quantized the entire optimizer state with a single global max value. The modern replacement, `optimizer_update_8bit_blockwise()`, uses per-block absmax arrays (block size 256) for much better numerical accuracy. The function, its `str2optimizer8bit` C function dispatch table, and both call sites in `Optimizer2State.update_step()` and `Optimizer1State.update_step()` are removed. The non-blockwise state initialization code (scalar `max1`/`new_max1`/`max2`/`new_max2` tensors) is also removed from both `init_state()` methods. LAMB and LARS previously defaulted to `block_wise=False`; they now use blockwise quantization like every other optimizer. The `block_wise` parameter itself is removed from all optimizer constructors since there is only one path now.
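In practice, 8-bit optimizers are now constructed without the removed kwargs. A minimal sketch (the toy `torch.nn.Linear` model and learning rate are illustrative, not taken from the PR):

```python
import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024)

# Blockwise 8-bit state is now the only path. Passing the removed
# block_wise= or percentile_clipping= kwargs raises TypeError.
opt = bnb.optim.LAMB8bit(model.parameters(), lr=1e-3)
```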
Removed: Percentile clipping

`percentile_clipping()` tracked a rolling window of the last 100 gradient norms and clipped gradients at a user-specified percentile to improve training stability. It called into CUDA C kernels (`cpercentile_clipping_g32`/`g16`) that have no triton or multi-backend equivalent, making it a CUDA-only feature that couldn't be extended to other backends. The function was already decorated `@deprecated` and emitting warnings. The `percentile_clipping` parameter is removed from all ~33 optimizer class constructors across 9 files (`adam.py`, `adamw.py`, `adagrad.py`, `lamb.py`, `lars.py`, `lion.py`, `rmsprop.py`, `sgd.py`, `ademamix.py`), the two base classes (`Optimizer2State`, `Optimizer1State`), the `get_config()` method, and the optimizer documentation.
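Users who depended on this can approximate the behavior outside the optimizer. A rough plain-PyTorch sketch of the same idea (rolling window of gradient norms, clip at a percentile); this is not the removed CUDA kernel, and `PercentileGradClipper` is a hypothetical helper, not part of bitsandbytes:

```python
from collections import deque

import torch


class PercentileGradClipper:
    """Keep the last `window` gradient norms and clip at the given percentile."""

    def __init__(self, percentile: int = 95, window: int = 100):
        self.percentile = percentile
        self.norms = deque(maxlen=window)

    def clip_(self, parameters) -> float:
        grads = [p.grad for p in parameters if p.grad is not None]
        total_norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(total_norm)
        # Threshold is the requested percentile of the recent-norm window.
        threshold = torch.quantile(
            torch.tensor(list(self.norms)), self.percentile / 100.0
        ).item()
        if total_norm > threshold:
            for g in grads:
                g.mul_(threshold / (total_norm + 1e-6))
        return total_norm


# Usage: call between loss.backward() and optimizer.step(), e.g.
# clipper = PercentileGradClipper(percentile=95)
# clipper.clip_(model.parameters())
```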
Removed: bitsandbytes.research module

The research module contained experimental FP8 matmul implementations (`matmul_fp8_mixed`, `matmul_fp8_global`) and their corresponding autograd classes (`MatMulFP8Mixed`, `MatMulFP8Global`), plus the `switchback_bnb` function and `SwitchBackBnb` autograd class. These were introduced in v0.38.1 as research prototypes for fake-FP8 quantized training and Int8 SwitchBack layers. They were never promoted to stable API, and their tests were already marked `@pytest.mark.deprecated` and excluded from the default test suite (some were also `@pytest.mark.skip`). The entire `bitsandbytes/research/` directory is deleted, and the `research` import is removed from `bitsandbytes/__init__.py`.
Removed: SwitchBack linear layers and triton kernels

`SwitchBackLinearBnb` in `bitsandbytes/nn/modules.py` was a non-triton SwitchBack layer that called `bnb.matmul_mixed()` — a function that doesn't exist in the public namespace, making the class effectively broken. `SwitchBackLinear`, `SwitchBackLinearGlobal`, `SwitchBackLinearVectorwise`, and `StandardLinear` in `bitsandbytes/nn/triton_based_modules.py` were triton-based SwitchBack variants. All are deleted along with their exports from `bitsandbytes/nn/__init__.py`. The `bitsandbytes/triton/` directory contained the underlying triton kernels (`int8_matmul_mixed_dequantize`, `quantize_rowwise`, etc.); with all SwitchBack consumers removed, these kernels are orphaned and are also deleted. The `benchmarking/switchback/` directory is removed as well.
Deprecated: Dead MatmulLtState fields

`CxB`, `CxBt`, `formatB`, and `_tile_indices` on the `MatmulLtState` class were vestiges of the old col32/ColAmpere tensor layout system that was removed in the v0.45.0 int8 refactoring. These fields have always been `None` (or unused) since then. They are now deprecated via `__getattr__`: accessing them returns `None` and emits a `FutureWarning`. They will be fully removed in the next bitsandbytes release. A stale comment referencing `CxB` in `MatMul4Bit` is cleaned up.
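A minimal sketch of what such a `__getattr__` shim can look like; the PR's actual implementation is not reproduced here, so the warning text and attribute set below are assumptions:

```python
import warnings
from dataclasses import dataclass

_DEPRECATED_STATE_ATTRS = {"CxB", "CxBt", "formatB", "tile_indices"}


@dataclass
class MatmulLtState:
    # ... live fields unchanged ...

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, i.e. for the
        # removed fields that are no longer declared on the dataclass.
        if name in _DEPRECATED_STATE_ATTRS:
            warnings.warn(
                f"MatmulLtState.{name} is deprecated and always None; "
                "it will be removed in a future release.",
                FutureWarning,
                stacklevel=2,
            )
            return None
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {name!r}"
        )
```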
Both TGI and vLLM access `state.CxB` in their 8-bit inference paths. PRs have been opened to remove this dead code from both projects.
Test and documentation cleanup

`tests/test_deprecated.py` (4 tests covering quantize/dequantize, percentile clipping, FP8 matmul, and FP8 linear) and `tests/test_triton.py` (1 test covering SwitchBackLinear) are deleted entirely. `test_adam_percentile_clipping` is removed from `test_optim.py`. The `switchback_bnb` parametrization is removed from `test_matmullt` in `test_autograd.py`. Stale `block_wise=True` and `block_wise=False` kwargs are cleaned from optimizer test constructors. `test_int8_double_quant` in `test_functional.py` was incorrectly marked `@pytest.mark.deprecated` despite testing the live `int8_double_quant` function — the marker is removed so it runs in the default suite. The `deprecated` pytest marker definition is removed from `pyproject.toml`, and `docs/source/optimizers.mdx` is updated to remove the percentile clipping example.
Breaking changes

This is a breaking change for anyone who:

- calls `F.quantize()`, `F.dequantize()`, `F.percentile_clipping()`, or `F.optimizer_update_8bit()` directly
- uses the `bnb.research.*` API
- uses `bnb.nn.SwitchBackLinear*` or `bnb.nn.SwitchBackLinearBnb`
- passes `percentile_clipping=` or `block_wise=` to any optimizer constructor
- uses `LAMB8bit` or `LARS8bit` and relied on the non-blockwise default
FutureWarningsince v0.45.0 (December 2024), or were already non-functional.Test plan
pre-commit run --all-filespasses (ruff, ruff-format, typos, clang-format, trailing whitespace, etc.)import bitsandbytes as bnb)percentile_clippingorblock_wise🤖 Generated with Claude Code