From 1faee35db55f6dc6182e2d6b30dcdae2afb4fe34 Mon Sep 17 00:00:00 2001
From: Qubitium-ModelCloud
Date: Thu, 30 Oct 2025 22:22:48 +0800
Subject: [PATCH] Refine GPT-QModel description in README

Removed redundant information about quantization methods and improved
clarity on supported methods.
---
 README.md | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/README.md b/README.md
index f9a30b639..89b696f92 100644
--- a/README.md
+++ b/README.md
@@ -128,9 +128,7 @@ Fixed quantization of OPT and DeepSeek V2-Lite models. Fixed inference for DeepS
 ## What is GPT-QModel?
 
 GPT-QModel is a production ready LLM model compression/quantization toolkit with hw accelerated inference support for both cpu/gpu via HF Transformers, vLLM, and SGLang.
 
-Public and ModelCloud's internal tests have shown that GPTQ is on-par and/or exceeds other 4bit quantization methods in terms of both quality recovery and production-level inference speed for token latency and rps. GPTQ has the optimal blend of quality and inference speed you need in a real-world production deployment.
-
-GPT-QModel not only supports GPTQ but also QQQ, GPTQv2, Eora with more quantization methods and enhancements planned.
+GPT-QModel currently supports GPTQ, AWQ, QQQ, GPTAQ, EoRa, GAR with more quantization methods and enhancements planned.
 
 ## Quantization Support