Quantize Hugging Face models and evaluate their accuracy using llm-compressor and lm-eval.

| Scheme | Description | Calibration |
|---|---|---|
| nvfp4 | NVIDIA FP4 | Required |
| fp8-group | Per-group FP8 (E4M3), group_size=64 | Not needed |
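
To make the `fp8-group` row concrete, here is a minimal NumPy sketch of per-group scaling with `group_size=64`. It is a simulation under stated assumptions, not llm-compressor's implementation: the helper names (`round_to_e4m3`, `quantize_per_group`, `dequantize`) are hypothetical, and E4M3 rounding is emulated in floating point (3 mantissa bits, maximum value 448) rather than stored in a real FP8 dtype.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite E4M3 value


def round_to_e4m3(x):
    """Simulate rounding to the nearest E4M3-representable value."""
    a = np.minimum(np.abs(x), FP8_E4M3_MAX)           # saturate at the E4M3 maximum
    # Exponent of each value, floored at -6 (smallest normal exponent),
    # so smaller magnitudes fall on the subnormal grid.
    e = np.floor(np.log2(np.maximum(a, 2.0 ** -6)))
    step = 2.0 ** (e - 3)                             # 3 mantissa bits per binade
    return np.sign(x) * np.round(a / step) * step


def quantize_per_group(w, group_size=64):
    """Split each row into groups and give every group its own scale."""
    rows, cols = w.shape
    assert cols % group_size == 0, "columns must be divisible by group_size"
    g = w.reshape(rows, cols // group_size, group_size)
    # One scale per group maps the group's largest magnitude onto 448.
    scale = np.abs(g).max(axis=-1, keepdims=True) / FP8_E4M3_MAX
    scale = np.where(scale == 0, 1.0, scale)          # avoid dividing by zero
    return round_to_e4m3(g / scale), scale


def dequantize(q, scale):
    """Undo the scaling and restore the original 2-D shape."""
    return (q * scale).reshape(q.shape[0], -1)


# Illustrative random weights, not taken from any model.
w = np.random.default_rng(0).normal(size=(4, 128))
q, scale = quantize_per_group(w, group_size=64)
print("max abs error:", np.max(np.abs(w - dequantize(q, scale))))
```

Because the scales come directly from the weights themselves, this style of weight quantization needs no calibration data, which is consistent with the "Not needed" entry in the table.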
./setup.sh
source .venv/bin/activate

python quantize.py --model deepseek-ai/DeepSeek-V2-Lite --scheme fp8-group
python quantize.py --model deepseek-ai/DeepSeek-V2-Lite --scheme nvfp4

Compare baseline vs quantized on standard benchmarks:

python eval.py \
--baseline deepseek-ai/DeepSeek-V2-Lite \
--quantized DeepSeek-V2-Lite-FP8-Group \
--tasks gsm8k \
--limit 100

./publish.sh carlyou/DeepSeek-V2-Lite-FP8-Group DeepSeek-V2-Lite-FP8-Group

| Model | Scheme | Base Model |
|---|---|---|
| carlyou/DeepSeek-V2-Lite-FP8-Group | fp8-group | deepseek-ai/DeepSeek-V2-Lite |
| carlyou/DeepSeek-V2-Lite-NVFP4 | nvfp4 | deepseek-ai/DeepSeek-V2-Lite |
| carlyou/DeepSeek-Coder-V2-Lite-Instruct-NVFP4 | nvfp4 | deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct |
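
For readers who want to reproduce the baseline-versus-quantized comparison by hand, the sketch below shows one way to compute per-metric deltas from two result dictionaries. It does not describe how `eval.py` works internally. The function name `compare_results` is hypothetical, the input is assumed to look like the per-task `results` mapping that lm-eval produces, and the numbers are placeholders, not measurements.

```python
def _is_number(v):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def compare_results(baseline, quantized):
    """Return (task, metric, base, quant, delta) for every shared numeric metric."""
    rows = []
    for task in sorted(baseline.keys() & quantized.keys()):
        base_metrics, quant_metrics = baseline[task], quantized[task]
        for metric in sorted(base_metrics.keys() & quant_metrics.keys()):
            b, q = base_metrics[metric], quant_metrics[metric]
            if _is_number(b) and _is_number(q):
                rows.append((task, metric, b, q, q - b))
    return rows


# Placeholder values for illustration only; metric names are assumptions.
baseline = {"gsm8k": {"exact_match": 0.5, "alias": "gsm8k"}}
quantized = {"gsm8k": {"exact_match": 0.25, "alias": "gsm8k"}}

rows = compare_results(baseline, quantized)
for task, metric, b, q, delta in rows:
    print(f"{task:10s} {metric:15s} base={b:.3f} quant={q:.3f} delta={delta:+.3f}")
```

A negative delta means the quantized model scored lower than the baseline on that metric. With a small `--limit`, differences of a few points can fall within sampling noise.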