From df3fd75b458d29e437888f7b4679d4ab7292752e Mon Sep 17 00:00:00 2001
From: Guo Wei
Date: Mon, 17 Nov 2025 16:55:38 +0800
Subject: [PATCH] Update description and code about GPTAQ in README.md

---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 85ce3127d..659106d49 100644
--- a/README.md
+++ b/README.md
@@ -292,12 +292,12 @@ model.save(quant_path)
 
 ### Quantization using GPTAQ (Experimental, not MoE compatible, and results may not be better than v1)
 
-Enable GPTAQ quantization by setting `v2 = True`.
+Enable GPTAQ quantization by setting `gptaq = True`.
 ```py
-# Note v2 is currently experimental, not MoE compatible, and requires 2-4x more vram to execute
-# We have many reports of v2 not working better or exceeding v1 so please use for testing only
+# Note GPTAQ is currently experimental, not MoE compatible, and requires 2-4x more VRAM to execute
+# We have many reports of GPTAQ not working better than or exceeding GPTQ, so please use it for testing only
 # If oom on 1 gpu, please set CUDA_VISIBLE_DEVICES=0,1 to 2 gpu and gptqmodel will auto use second gpu
-quant_config = QuantizeConfig(bits=4, group_size=128, v2=True)
+quant_config = QuantizeConfig(bits=4, group_size=128, gptaq=True)
 ```
 
 `Llama 3.1 8B-Instruct` quantized using `test/models/test_llama3_2.py`
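
For context, a minimal end-to-end sketch of how the renamed `gptaq` flag would be used, following the load/quantize/save flow the surrounding README already shows (`model.save(quant_path)` appears as hunk context above). The model id, calibration strings, and output path below are placeholder assumptions, not part of this patch.

```py
# Sketch only: GPTAQ quantization via the gptaq=True flag introduced in this patch.
# GPTQModel.load / model.quantize / model.save mirror the README's own flow;
# model_id, calibration_dataset, and quant_path are illustrative placeholders.
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"   # assumed example checkpoint
quant_path = "Llama-3.1-8B-Instruct-gptaq-4bit" # assumed output directory

# gptaq=True enables the experimental GPTAQ path (needs 2-4x more VRAM than GPTQ)
quant_config = QuantizeConfig(bits=4, group_size=128, gptaq=True)

# Tiny placeholder calibration set; real runs should use a few hundred samples
calibration_dataset = [
    "GPTAQ is an experimental quantization method in GPTQModel.",
    "Calibration text is used to estimate per-layer activation statistics.",
]

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset)
model.save(quant_path)
```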