
Scaling Sparse Fine-Tuning to Large Language Models

This is the code to replicate the instruction-tuning experiments in the paper Scaling Sparse Fine-Tuning to Large Language Models (arXiv:2401.16405).

For our Sparse Fine-Tuning (SFT) implementation, built on the Hugging Face PEFT library, please see the peft submodule.

Important: this code requires our PEFT implementation and will not work with upstream Hugging Face PEFT!

Figure: the Sparse Fine-Tuning phases.

Setup

First, install the Python libraries and initialise the peft submodule.

You can set SFT_EXPERIMENT_DIR to your preferred path for storing models and results.

Due to a build issue with flash-attn, you have to install it separately after installing the other requirements. Note that flash-attn requires CUDA 11.6 or higher and a PyTorch build compiled against CUDA 11.6 or higher.

pip install -r requirements.txt
pip install flash-attn==2.2.2 
git submodule update --init --recursive
cd peft
python setup.py develop
export SFT_EXPERIMENT_DIR=./results
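
To sanity-check the environment before moving on, a short snippet like the one below can confirm that CUDA is visible and that the separate flash-attn install succeeded. It is not part of the repository, just a convenience check.

# Optional environment check (not part of the repository): verify that CUDA is
# available and that the separately installed flash-attn package imports cleanly.
import torch

assert torch.cuda.is_available(), "flash-attn needs a CUDA-capable GPU"
print(f"torch {torch.__version__}, CUDA {torch.version.cuda}")

import flash_attn  # fails here if the separate flash-attn install went wrong
print(f"flash-attn {flash_attn.__version__}")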

Next, prepare train and eval data.

Note that our original experiments were run on the Flan v2 50K sub-mixture stored on Beaker, which now requires authorisation to access. Hence, we now rely on an unofficial snapshot from the Hugging Face Hub.

./scripts/prepare_train_data.sh
./scripts/prepare_eval_data.sh
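
For illustration only, the training-data step boils down to pulling the Flan v2 sub-mixture from an unofficial Hugging Face Hub snapshot and subsampling 50K examples, roughly as sketched below. The dataset id and output path are placeholders; the script above is the authoritative source.

# Hedged sketch of the data-preparation step. The repo id and output path are
# PLACEHOLDERS; the actual snapshot and paths are set in ./scripts/prepare_train_data.sh.
from datasets import load_dataset

flan_v2 = load_dataset("some-org/flan-v2-snapshot", split="train")  # hypothetical repo id
flan_v2_50k = flan_v2.shuffle(seed=42).select(range(50_000))
flan_v2_50k.to_json("data/flan_v2_50k.jsonl")  # hypothetical output path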

Train

Note: the training and evaluation scripts use the Llama-2 models. Make sure you have been granted access to them; access can be requested via the meta-llama organisation on the Hugging Face Hub.
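
For example, assuming you manage access via the Hugging Face Hub, you can authenticate and check that the gated checkpoint is reachable with something like the following (not part of the repository):

# Authenticate with the Hugging Face Hub and check that the gated Llama-2
# checkpoint is reachable. Assumes your account has been granted Llama-2 access.
from huggingface_hub import login
from transformers import AutoTokenizer

login()  # prompts for an access token, or pass token="hf_..." explicitly
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # fails if access is missing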

To fine-tune an LLM with PEFT, run the following command.

You can specify your preferred LLM, PEFT method, quantisation, and hyper-parameters inside the script file.

./scripts/finetune_peft_with_accelerate.sh
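
To give a rough idea of what the script configures, the sketch below shows the generic PEFT wiring using an upstream-style LoRA config. It is only an illustration: the paper's experiments use the SFT method from our PEFT fork, and the real arguments (model, PEFT method, quantisation, hyper-parameters) live in the script above.

# Illustrative sketch only: generic PEFT wiring with an upstream-style LoRA config.
# The actual experiments use the SFT method from our PEFT fork; see
# ./scripts/finetune_peft_with_accelerate.sh for the real configuration.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.bfloat16
)
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # shows how few parameters are trainable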

Eval

Finally, you can run evaluation on all benchmarks (MMLU, BBH, GSM, TyDiQA, Codex-HumanEval) with the following script. Remember to specify the path to the trained PEFT parameters via the PEFT variable and set the desired quantisation inside the script.

./scripts/eval_all.sh
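
For reference, loading the trained parameters for inference typically looks like the sketch below, assuming an upstream-PEFT-style API; the adapter path is a placeholder for whatever you pass as PEFT, and the eval script handles quantisation itself.

# Sketch only: load the quantised base model and attach the trained PEFT parameters.
# The adapter path is a PLACEHOLDER for the directory you set as PEFT in the script.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", quantization_config=bnb
)
model = PeftModel.from_pretrained(base, "path/to/trained/peft")  # placeholder path
model.eval()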

Acknowledgements

Our code and setup for the instruction-tuning experiments build on open-instruct.

Citation

Please use the following snippet to cite our work.

@misc{ansell2024scaling,
      title={Scaling Sparse Fine-Tuning to Large Language Models}, 
      author={Alan Ansell and Ivan Vulić and Hannah Sterz and Anna Korhonen and Edoardo M. Ponti},
      year={2024},
      eprint={2401.16405},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
