This is the code to replicate the instruction tuning experiments in the paper Scaling Sparse Fine-Tuning to Large Language Models (arXiv:2401.16405).
For our Sparse Fine-Tuning (SFT) implementation, which builds on the Hugging Face PEFT library, please visit peft.
Important: this code requires our PEFT implementation (included here as the peft submodule) and will not work with the upstream Hugging Face PEFT!
First, install the Python libraries and initialise the peft submodule. You can set SFT_EXPERIMENT_DIR to your preferred path for storing models and results.
Due to an issue with flash-attn, you have to install it separately, after installing the other requirements. Note that flash-attn requires CUDA 11.6 or higher and a torch build with matching CUDA support.
pip install -r requirements.txt
pip install flash-attn==2.2.2
git submodule update --init --recursive
cd peft
python setup.py develop
export SFT_EXPERIMENT_DIR=./results
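As an optional sanity check (not part of the original setup steps), you can confirm that torch was built against a suitable CUDA toolkit and that flash-attn imports cleanly before launching any training:

python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
python -c "import flash_attn; print(flash_attn.__version__)"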
Next, prepare the training and evaluation data.
Note that our original experiments used the Flan v2 50K sub-mixture stored on Beaker, which now requires authorisation for access. Hence, we now rely on an unofficial snapshot from the Hugging Face Hub.
./scripts/prepare_train_data.sh
./scripts/prepare_eval_data.sh
Note: the training and evaluation scripts use the Llama 2 models. Make sure you have access to them; you can request access here.
To fine-tune an LLM with PEFT, run the following command. You can specify your preferred LLM, PEFT method, quantisation, and hyper-parameters inside the script file (see the hypothetical sketch after the command).
./scripts/finetune_peft_with_accelerate.sh
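For illustration only, the configurable settings mentioned above are shell variables near the top of the script. The names below are hypothetical placeholders, not necessarily the ones actually used; check finetune_peft_with_accelerate.sh for the real variable names.

# Hypothetical sketch of the settings block; the actual variable names in
# scripts/finetune_peft_with_accelerate.sh may differ.
MODEL_NAME=meta-llama/Llama-2-7b-hf   # base LLM to fine-tune
PEFT_METHOD=sft                       # PEFT method, e.g. sparse fine-tuning or LoRA
QUANTIZATION=4bit                     # quantisation setting
LEARNING_RATE=2e-5                    # example hyper-parameter
NUM_EPOCHS=2                          # example hyper-parameter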
Finally, you can run evaluation on all benchmarks (MMLU, Big-Bench Hard (BBH), GSM, TyDiQA, Codex-HumanEval) with the following script. Remember to specify the path to the trained PEFT parameters as PEFT and set the desired quantisation inside the script.
./scripts/eval_all.sh
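As an illustrative sketch only (the path and the quantisation variable name are hypothetical; only PEFT is named above), the relevant lines inside scripts/eval_all.sh might look like:

# Illustrative values only; adjust to your own run and to the actual
# variable names used in scripts/eval_all.sh.
PEFT=$SFT_EXPERIMENT_DIR/llama-2-7b-sft   # path to the trained PEFT parameters
QUANTIZATION=4bit                         # desired quantisation for evaluation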
Our code and setup for the instruction tuning experiments build on open-instruct.
Please use the following snippet to cite our work.
@misc{ansell2024scaling,
      title={Scaling Sparse Fine-Tuning to Large Language Models},
      author={Alan Ansell and Ivan Vulić and Hannah Sterz and Anna Korhonen and Edoardo M. Ponti},
      year={2024},
      eprint={2401.16405},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}