From df3fd75b458d29e437888f7b4679d4ab7292752e Mon Sep 17 00:00:00 2001
From: Guo Wei
Date: Mon, 17 Nov 2025 16:55:38 +0800
Subject: [PATCH] Update description and code about GPTAQ in README.md

---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 85ce3127d..659106d49 100644
--- a/README.md
+++ b/README.md
@@ -292,12 +292,12 @@ model.save(quant_path)
 
 ### Quantization using GPTAQ (Experimental, not MoE compatible, and results may not be better than v1)
 
-Enable GPTAQ quantization by setting `v2 = True`.
+Enable GPTAQ quantization by setting `gptaq = True`.
 ```py
-# Note v2 is currently experimental, not MoE compatible, and requires 2-4x more vram to execute
-# We have many reports of v2 not working better or exceeding v1 so please use for testing only
+# Note GPTAQ is currently experimental, not MoE compatible, and requires 2-4x more VRAM to execute
+# We have many reports of GPTAQ not working better than or exceeding GPTQ, so please use it for testing only
 # If oom on 1 gpu, please set CUDA_VISIBLE_DEVICES=0,1 to 2 gpu and gptqmodel will auto use second gpu
-quant_config = QuantizeConfig(bits=4, group_size=128, v2=True)
+quant_config = QuantizeConfig(bits=4, group_size=128, gptaq=True)
 ```
 
 `Llama 3.1 8B-Instruct` quantized using `test/models/test_llama3_2.py`
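
For context, a minimal end-to-end sketch of how the renamed `gptaq` flag would be used, following the load/quantize/save flow the surrounding README already shows (`model.save(quant_path)` appears as hunk context above). The model id, calibration strings, and output path below are placeholder assumptions, not part of this patch.

```py
# Sketch only: GPTAQ quantization via the gptaq=True flag introduced in this patch.
# GPTQModel.load / model.quantize / model.save mirror the README's own flow;
# model_id, calibration_dataset, and quant_path are illustrative placeholders.
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"   # assumed example checkpoint
quant_path = "Llama-3.1-8B-Instruct-gptaq-4bit" # assumed output directory

# gptaq=True enables the experimental GPTAQ path (needs 2-4x more VRAM than GPTQ)
quant_config = QuantizeConfig(bits=4, group_size=128, gptaq=True)

# Tiny placeholder calibration set; real runs should use a few hundred samples
calibration_dataset = [
    "GPTAQ is an experimental quantization method in GPTQModel.",
    "Calibration text is used to estimate per-layer activation statistics.",
]

model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset)
model.save(quant_path)
```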