Merged
8 changes: 4 additions & 4 deletions README.md
@@ -292,12 +292,12 @@ model.save(quant_path)

### Quantization using GPTAQ (Experimental, not MoE compatible, and results may not be better than v1)

- Enable GPTAQ quantization by setting `v2 = True`.
+ Enable GPTAQ quantization by setting `gptaq = True`.
```py
- # Note v2 is currently experimental, not MoE compatible, and requires 2-4x more vram to execute
- # We have many reports of v2 not working better or exceeding v1 so please use for testing only
+ # Note GPTAQ is currently experimental, not MoE compatible, and requires 2-4x more VRAM to execute
+ # We have many reports of GPTAQ not working better than or exceeding GPTQ, so please use it for testing only
  # If oom on 1 gpu, please set CUDA_VISIBLE_DEVICES=0,1 to 2 gpu and gptqmodel will auto use second gpu
- quant_config = QuantizeConfig(bits=4, group_size=128, v2=True)
+ quant_config = QuantizeConfig(bits=4, group_size=128, gptaq=True)
```
`Llama 3.1 8B-Instruct` quantized using `test/models/test_llama3_2.py`
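For context on where the renamed flag sits, here is a minimal sketch of the surrounding quantization flow, based on the load/quantize/save pattern in the README this diff edits; the model id, `calibration_dataset`, and `quant_path` are placeholder assumptions, not values from this PR:

```py
# Sketch only: model id, calibration_dataset, and quant_path are placeholders
from gptqmodel import GPTQModel, QuantizeConfig

# gptaq=True replaces the old v2=True flag renamed in this PR
quant_config = QuantizeConfig(bits=4, group_size=128, gptaq=True)

model = GPTQModel.load("meta-llama/Llama-3.1-8B-Instruct", quant_config)
model.quantize(calibration_dataset)  # calibration_dataset: a list of sample texts
model.save(quant_path)               # the model.save(quant_path) call shown in the hunk header
```

Note the 2-4x VRAM warning above: with a single GPU that runs out of memory, setting `CUDA_VISIBLE_DEVICES=0,1` lets gptqmodel spill onto the second GPU automatically.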
