carlyou/llm-quant

llm-quant

Quantize Hugging Face models with llm-compressor and evaluate their accuracy against the unquantized baseline with lm-eval.

Supported Schemes

| Scheme | Description | Calibration |
| --- | --- | --- |
| nvfp4 | NVIDIA FP4 | Required |
| fp8-group | Per-group FP8 (E4M3), group_size=64 | Not needed |
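
To make the fp8-group row concrete, here is a minimal pure-Python sketch of per-group FP8 (E4M3) quantization with group_size=64. It only illustrates the arithmetic and is not llm-compressor's implementation. The function names `round_to_e4m3` and `quantize_fp8_group` are hypothetical, and the sketch assumes the saturating E4M3 variant with a maximum finite value of 448 and absmax-based scales. Because each scale is derived from the group's own values, no sample data is involved, which is consistent with the "Not needed" calibration entry above.

```python
import math

E4M3_MAX = 448.0  # largest finite value in FP8 E4M3 (assumed saturating variant)


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp returns mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) == e - 1.
    # Exponents below -6 fall into the subnormal range, which shares one step size.
    exp = max(math.frexp(mag)[1] - 1, -6)
    step = 2.0 ** (exp - 3)  # 3 mantissa bits give 8 steps per power of two
    return sign * min(round(mag / step) * step, E4M3_MAX)


def quantize_fp8_group(values, group_size=64):
    """Quantize and dequantize a flat list of floats, one scale per group.

    Returns (dequantized_values, scales).
    """
    dequantized, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        amax = max(abs(v) for v in group)
        # Map the group's largest magnitude onto the largest E4M3 value.
        scale = amax / E4M3_MAX if amax > 0 else 1.0
        scales.append(scale)
        dequantized.extend(round_to_e4m3(v / scale) * scale for v in group)
    return dequantized, scales


weights = [0.01 * i for i in range(128)]
dequantized, scales = quantize_fp8_group(weights)
print(len(scales))  # 2 groups of 64
```

With one scale per 64 values, an outlier only coarsens the resolution of its own group rather than the whole tensor, which is the usual motivation for group-wise scaling.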

Setup

./setup.sh
source .venv/bin/activate

Quantize

python quantize.py --model deepseek-ai/DeepSeek-V2-Lite --scheme fp8-group
python quantize.py --model deepseek-ai/DeepSeek-V2-Lite --scheme nvfp4
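
To run both schemes in one go, the commands above can be generated from a small helper. This is a sketch only: `SCHEME_SUFFIX`, `output_name`, and `quantize_command` are hypothetical names, and the output naming (model name plus a scheme suffix) is an assumption inferred from the directory and model names used elsewhere in this README, not something `quantize.py` is confirmed to do.

```python
import shlex

# Assumed mapping from scheme to output suffix, inferred from names such as
# DeepSeek-V2-Lite-FP8-Group and DeepSeek-V2-Lite-NVFP4.
SCHEME_SUFFIX = {"fp8-group": "FP8-Group", "nvfp4": "NVFP4"}


def output_name(model: str, scheme: str) -> str:
    """Guess the output directory name for a model/scheme pair (assumption)."""
    return f"{model.split('/')[-1]}-{SCHEME_SUFFIX[scheme]}"


def quantize_command(model: str, scheme: str) -> list:
    """Build the quantize.py invocation shown above as an argument list."""
    return ["python", "quantize.py", "--model", model, "--scheme", scheme]


for scheme in SCHEME_SUFFIX:
    cmd = quantize_command("deepseek-ai/DeepSeek-V2-Lite", scheme)
    # Print rather than execute; pass cmd to subprocess.run(cmd, check=True) to run it.
    print(shlex.join(cmd))
```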

Evaluate

Compare the baseline and quantized models on standard benchmarks:

python eval.py \
  --baseline deepseek-ai/DeepSeek-V2-Lite \
  --quantized DeepSeek-V2-Lite-FP8-Group \
  --tasks gsm8k \
  --limit 100
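
The comparison itself comes down to a per-metric difference between two result sets. The sketch below assumes the nested `results[task][metric]` layout and the `exact_match,strict-match` metric key that recent lm-eval versions write for gsm8k. `compare_metric` is a hypothetical helper, not part of `eval.py`, and the scores are placeholders rather than measured results.

```python
def compare_metric(baseline: dict, quantized: dict, task: str, metric: str) -> dict:
    """Return baseline score, quantized score, and absolute/relative change."""
    base = baseline["results"][task][metric]
    quant = quantized["results"][task][metric]
    delta = quant - base
    return {
        "baseline": base,
        "quantized": quant,
        "delta": delta,
        # Relative change is undefined when the baseline score is zero.
        "relative": delta / base if base else float("nan"),
    }


# Placeholder scores for illustration only; these are NOT measured results.
baseline_results = {"results": {"gsm8k": {"exact_match,strict-match": 0.40}}}
quantized_results = {"results": {"gsm8k": {"exact_match,strict-match": 0.38}}}

report = compare_metric(
    baseline_results, quantized_results, "gsm8k", "exact_match,strict-match"
)
print(f"delta={report['delta']:+.3f} relative={report['relative']:+.1%}")
```

With `--limit 100`, each score is computed over only 100 samples, so small differences between baseline and quantized scores can be within sampling noise.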

Publish

./publish.sh carlyou/DeepSeek-V2-Lite-FP8-Group DeepSeek-V2-Lite-FP8-Group

Published Models

| Model | Scheme | Base Model |
| --- | --- | --- |
| carlyou/DeepSeek-V2-Lite-FP8-Group | fp8-group | deepseek-ai/DeepSeek-V2-Lite |
| carlyou/DeepSeek-V2-Lite-NVFP4 | nvfp4 | deepseek-ai/DeepSeek-V2-Lite |
| carlyou/DeepSeek-Coder-V2-Lite-Instruct-NVFP4 | nvfp4 | deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct |
