This is the code to replicate the instruction tuning experiments in the paper Scaling Sparse Fine-Tuning to Large Language Models (arXiv:2401.16405).
For our Sparse Fine-Tuning (SFT) implementation, which builds on the Hugging Face PEFT library, please visit peft.
Important: this code requires our PEFT implementation (included here as the peft submodule) and will not work with the upstream Hugging Face PEFT!
First, install the Python libraries and initialise the peft submodule. You can set SFT_EXPERIMENT_DIR to your preferred path for storing models and results.
Due to an issue with flash-attn, you have to install it separately, after installing the other requirements. Note that flash-attn requires CUDA 11.6 or higher and a torch build with matching CUDA support.
pip install -r requirements.txt
pip install flash-attn==2.2.2
git submodule update --init --recursive
cd peft
python setup.py develop
export SFT_EXPERIMENT_DIR=./results
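As an optional sanity check (not part of the original setup steps), you can confirm that torch was built against a suitable CUDA toolkit and that flash-attn imports cleanly before launching any training:

python -c "import torch; print(torch.version.cuda, torch.cuda.is_available())"
python -c "import flash_attn; print(flash_attn.__version__)"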
Next, prepare the training and evaluation data.
Note that our original experiments used the Flan v2 50K sub-mixture stored on Beaker, which now requires authorisation for access. Hence, we now rely on an unofficial snapshot from the Hugging Face Hub.
./scripts/prepare_train_data.sh
./scripts/prepare_eval_data.sh
Note: the training and evaluation scripts use the Llama 2 models. Make sure you have access to them; you can request access here.
To fine-tune an LLM with PEFT, run the following command. You can specify your preferred LLM, PEFT method, quantisation, and hyper-parameters inside the script file (see the hypothetical sketch after the command).
./scripts/finetune_peft_with_accelerate.sh
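For illustration only, the configurable settings mentioned above are shell variables near the top of the script. The names below are hypothetical placeholders, not necessarily the ones actually used; check finetune_peft_with_accelerate.sh for the real variable names.

# Hypothetical sketch of the settings block; the actual variable names in
# scripts/finetune_peft_with_accelerate.sh may differ.
MODEL_NAME=meta-llama/Llama-2-7b-hf   # base LLM to fine-tune
PEFT_METHOD=sft                       # PEFT method, e.g. sparse fine-tuning or LoRA
QUANTIZATION=4bit                     # quantisation setting
LEARNING_RATE=2e-5                    # example hyper-parameter
NUM_EPOCHS=2                          # example hyper-parameter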
Finally, you can run evaluation on all benchmarks (MMLU, Big-Bench Hard (BBH), GSM, TyDiQA, Codex-HumanEval) with the following script. Remember to specify the path to the trained PEFT parameters as PEFT and set the desired quantisation inside the script.
./scripts/eval_all.sh
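As an illustrative sketch only (the path and the quantisation variable name are hypothetical; only PEFT is named above), the relevant lines inside scripts/eval_all.sh might look like:

# Illustrative values only; adjust to your own run and to the actual
# variable names used in scripts/eval_all.sh.
PEFT=$SFT_EXPERIMENT_DIR/llama-2-7b-sft   # path to the trained PEFT parameters
QUANTIZATION=4bit                         # desired quantisation for evaluation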
Our code and setup for the instruction tuning experiments build on open-instruct.
Please use the following snippet to cite our work.
@misc{ansell2024scaling,
      title={Scaling Sparse Fine-Tuning to Large Language Models},
      author={Alan Ansell and Ivan Vulić and Hannah Sterz and Anna Korhonen and Edoardo M. Ponti},
      year={2024},
      eprint={2401.16405},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}