This document presents step-by-step instructions for auto-round.
The mbpp dataset from the Hugging Face Hub is adopted as the default calibration data and is downloaded automatically. To use a custom dataset, please follow our dataset code; see the Hugging Face documentation on loading datasets for more details.
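For reference, here is a minimal sketch of what the automatic download amounts to, using the Hugging Face `datasets` library. The choice of the `text` field and the preprocessing are assumptions for illustration; the actual dataset code in this repo may differ.

```python
from datasets import load_dataset

# Downloads mbpp from the Hugging Face Hub on first use and caches it locally.
calib_data = load_dataset("mbpp", split="train")

# The "text" field holds the natural-language problem description; using it
# directly as calibration text is an assumption, not necessarily what main.py does.
samples = [example["text"] for example in calib_data]
print(samples[0])
```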
Enter the examples folder, then run one of the commands below.
- Default Settings:

```bash
CUDA_VISIBLE_DEVICES=0 python3 main.py --model_name Salesforce/codegen25-7b-multi --amp --bits 4 --group_size -1 --enable_minmax_tuning --use_quant_input
```
- Reduced GPU Memory Usage and Adjusted Training Batch Size:

```bash
CUDA_VISIBLE_DEVICES=0 python3 main.py --model_name Salesforce/codegen25-7b-multi --amp --bits 4 --group_size -1 --low_gpu_mem_usage --train_bs 1 --gradient_accumulate_steps 8
```
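With these settings the effective batch size is train_bs × gradient_accumulate_steps = 1 × 8 = 8: gradients are accumulated over 8 micro-batches before each update, so only one sample needs to fit on the GPU at a time.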
- Utilizing the AdamW Optimizer: include the flag `--adam`. Note that AdamW is less effective than sign gradient descent in many scenarios we tested. An example command is shown below.
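As a sketch, this is the default-settings command with AdamW enabled; only the `--adam` flag is new, all other flags are as above:

```bash
CUDA_VISIBLE_DEVICES=0 python3 main.py --model_name Salesforce/codegen25-7b-multi --amp --bits 4 --group_size -1 --enable_minmax_tuning --use_quant_input --adam
```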
- Running the Original SignRound:

```bash
CUDA_VISIBLE_DEVICES=0 python3 main.py --model_name Salesforce/codegen25-7b-multi --amp --bits 4 --group_size -1 --iters 400 --lr 0.0025 --minmax_lr 0.0025
```

Note that `--enable_minmax_tuning` is strongly recommended.
Please follow https://github.com/bigcode-project/bigcode-evaluation-harness to evaluate the model. Currently we only support evaluating the fake-quantized model.
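As a hedged sketch of such an evaluation run (the flag names follow the bigcode-evaluation-harness CLI; the model path is a placeholder for wherever the fake-quantized model was saved):

```bash
# Run from a clone of bigcode-evaluation-harness.
# Sampling settings match those reported in the table below (t=0.2, n_samples=20).
accelerate launch main.py \
  --model /path/to/fake_quantized_model \
  --tasks humaneval \
  --temperature 0.2 \
  --n_samples 20 \
  --batch_size 10 \
  --allow_code_execution
```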
| Model | Method | HumanEval top1 (t=0.2, n_samples=20) |
|---|---|---|
| Salesforce/codegen25-7b-multi | FP16 | 0.2854 |
| Salesforce/codegen25-7b-multi | AutoRound (use_quant_input=False) | 0.2841 |
If you find SignRound useful for your research, please cite our paper:
```bibtex
@article{cheng2023optimize,
  title={Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}
```