feat: add end-to-end vLLM W4A8+FP8 mixed quantization pipeline by liusong1222 · Pull Request #255 · Tencent/AngelSlim

liusong1222 · 2026-03-10T07:35:42Z

feat: add end-to-end vLLM W4A8+FP8 mixed quantization pipeline for DeepSeek V3

Major changes:

- Add weight_quantize.py: standalone weight quantization module supporting
  FP8 blockwise and INT4 symmetric per-group quantization, with multi-process
  parallel processing of safetensors files (no full HF model loading required)
- Add VLLMCalibrateEngine in engine.py: unified engine for vLLM-based
  calibration (activation/MoE stats collection) and weight quantization,
  with support for skipping calibration when stats already exist
- Add CalibrateConfig in config_parser.py: YAML-driven calibration config
  (backend, tp_size, max_num_seqs, etc.) integrated into FullConfig
- Add pack_weight_to_int8_gpu() in packing_utils.py: pure PyTorch
  GPU-accelerated INT4→INT8 packing (no numpy dependency)
- Add YAML config deepseek_r1_w4a8_fp8_vllm.yaml and shell script for the
  DeepSeek R1 W4A8+FP8 quantization workflow
- Integrate the vLLM calibrate path into tools/run.py via vllm_calibrate_run()
- Delete standalone tools/run_vllm_calibrate.py (consolidated into engine)
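
For readers unfamiliar with the two weight-side techniques named above, here is a minimal plain-Python sketch of INT4 symmetric per-group quantization followed by INT4→INT8 packing. It is not the code from weight_quantize.py or pack_weight_to_int8_gpu() (which the description says is pure PyTorch on GPU); the function names, the group size, and the nibble order (low nibble first) are assumptions made for illustration only.

```python
from typing import List, Tuple

INT4_MIN, INT4_MAX = -8, 7  # signed 4-bit range


def quantize_int4_per_group(
    row: List[float], group_size: int
) -> Tuple[List[int], List[float]]:
    """Symmetric per-group quantization: one scale per group, no zero point."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    q: List[int] = []
    scales: List[float] = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group onto INT4_MAX.
        scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q.append(max(INT4_MIN, min(INT4_MAX, round(w / scale))))
    return q, scales


def pack_int4_pairs(q: List[int]) -> List[int]:
    """Pack two signed INT4 values into one byte (assumed: low nibble first)."""
    if len(q) % 2 != 0:
        raise ValueError("need an even number of INT4 values")
    return [((hi & 0xF) << 4) | (lo & 0xF) for lo, hi in zip(q[0::2], q[1::2])]


def unpack_int4_pairs(packed: List[int]) -> List[int]:
    """Inverse of pack_int4_pairs, restoring the sign of each nibble."""
    def to_signed(nibble: int) -> int:
        return nibble - 16 if nibble >= 8 else nibble

    out: List[int] = []
    for byte in packed:
        out.append(to_signed(byte & 0xF))
        out.append(to_signed(byte >> 4))
    return out


row = [0.7, -0.3, 0.1, 0.0, 1.4, -1.4, 0.2, 0.6]
q, scales = quantize_int4_per_group(row, group_size=4)
packed = pack_int4_pairs(q)
```

Packing halves the storage for the quantized values: the eight INT4 values in `q` fit in four bytes, and `unpack_int4_pairs(packed)` recovers them exactly. Only the rounding step in quantization loses information.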


yghstill · 2026-03-10T11:02:40Z

+        print("\n" + "=" * 80)
+        print("Calibration completed successfully!")
+        print(f"Results saved to: {output_dir}")
+        print("=" * 80)


Use the print_info function consistently for all of these.
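
A sketch of what this request amounts to for the four print() calls above. The real signature of the project's print_info is not shown in this thread, so the `print_info` below is a hypothetical stand-in built on the standard `logging` module, and `report_calibration_done` is an illustrative name only.

```python
import logging
from typing import List

logger = logging.getLogger("calibrate_sketch")


def print_info(message: str) -> None:
    """Hypothetical stand-in for the project's print_info helper."""
    logger.info(message)


def report_calibration_done(output_dir: str) -> List[str]:
    """Emit the completion banner through one logging helper instead of print()."""
    lines = [
        "=" * 80,
        "Calibration completed successfully!",
        f"Results saved to: {output_dir}",
        "=" * 80,
    ]
    for line in lines:
        print_info(line)
    return lines
```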

yghstill · 2026-03-10T11:07:34Z

+
+        return {"activation_stats": activation_stats, "moe_stats": moe_stats}
+
+    def quantize(


Could the quantize function be abstracted into the compressor/quant folder?

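quantize is the function in the engine that drives the top-level flow, so I think it is better left in the engine. I moved the _moe_expert_stats_to_input_scales function, which sat under quantize, into the quant folder, which trims the amount of quantize code in the engine.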

yghstill · 2026-03-10T11:13:47Z



+@dataclass
+class CalibrateConfig:


Could CalibrateConfig be placed under QuantizationConfig, similar to quant_method
--> calibrate_method?

I moved CalibrateConfig under CompressionConfig, alongside QuantizationConfig. Would this hierarchy work better?
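
The layout proposed in this reply can be sketched as follows. Only the class names and the fields backend, tp_size, and max_num_seqs come from this PR; the default values, the `quantization`/`calibrate` key names, the `quant_method` field type, and the `parse_compression` helper are assumptions for illustration, and a plain dict stands in for the parsed YAML.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class CalibrateConfig:
    backend: str = "vllm"      # assumed default
    tp_size: int = 1           # assumed default
    max_num_seqs: int = 1      # assumed default


@dataclass
class QuantizationConfig:
    quant_method: Optional[str] = None  # placeholder field for this sketch


@dataclass
class CompressionConfig:
    # CalibrateConfig sits next to QuantizationConfig, not inside it.
    quantization: QuantizationConfig = field(default_factory=QuantizationConfig)
    calibrate: Optional[CalibrateConfig] = None


def parse_compression(raw: Dict[str, Any]) -> CompressionConfig:
    """Build the nested config from a dict such as a parsed YAML section."""
    quant = QuantizationConfig(**raw.get("quantization", {}))
    calib_raw = raw.get("calibrate")
    calib = CalibrateConfig(**calib_raw) if calib_raw is not None else None
    return CompressionConfig(quantization=quant, calibrate=calib)


cfg = parse_compression(
    {
        "quantization": {"quant_method": "w4a8_fp8"},
        "calibrate": {"backend": "vllm", "tp_size": 8, "max_num_seqs": 4},
    }
)
```

Keeping calibration as a sibling section means a config without a `calibrate` block still parses, with `cfg.calibrate` left as `None`.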


liusong1222 added 2 commits March 10, 2026 15:27

feat: add end-to-end vLLM W4A8+FP8 mixed quantization pipeline for DeepSeek V3

9567357

simplify VLLMCalibrateEngine.prepare_data by calling DataLoaderFactory directly

7aac533

yghstill reviewed Mar 10, 2026


refactor: nest calibrate under compression, extract moe scale merge, unify logging

db32e63

yghstill previously approved these changes Mar 10, 2026


fix prepare_data in engine

3f5de17

liusong1222 dismissed yghstill’s stale review via 3f5de17 March 10, 2026 13:55

rename vllm calibration yaml

59f5707

yghstill approved these changes Mar 11, 2026


liusong1222 merged commit 9985e28 into Tencent:main Mar 11, 2026
5 checks passed

dawnranger pushed a commit to dawnranger/AngelSlim that referenced this pull request Mar 11, 2026

feat: add end-to-end vLLM W4A8+FP8 mixed quantization pipeline (Tencent#255)

3e28235


liusong1222 merged 5 commits into Tencent:main from liusong1222:feature/vllm_calibrate

2 participants

