Generalized extrapolatioN scalE (GeNE) is a straightforward and effective method applied to the interpolation function of positional embeddings to achieve "train short, test long". Experimental results show that GeNE notably improves long-context language modeling. By randomly scaling the extrapolation ratio during finetuning, GeNE achieves stable extrapolation to 64k contexts while training on sequences of only 16k length.
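As a rough illustration of the idea (not this repository's exact implementation), position interpolation rescales the position indices fed to RoPE, and GeNE-style training samples a fresh extrapolation ratio per step. The function names and the `[1, 4]` sampling range below are assumptions chosen to match the 16k-to-64k setting:

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE rotation angles for the given (possibly rescaled) positions."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions.float(), inv_freq)  # shape: (seq_len, dim/2)

def interpolated_angles(seq_len: int, dim: int, scale: float) -> torch.Tensor:
    """Position interpolation: compress position indices by `scale` so that
    a context `scale` times longer maps back into the trained range."""
    positions = torch.arange(seq_len) / scale
    return rope_angles(positions, dim)

def sample_scale(min_scale: float = 1.0, max_scale: float = 4.0) -> float:
    """Randomly sampled extrapolation ratio per training step
    (the [1, 4] range is an assumption; 16k * 4 = 64k)."""
    return float(torch.empty(1).uniform_(min_scale, max_scale))

# During finetuning, each batch would use a freshly sampled scale:
angles = interpolated_angles(seq_len=16384, dim=128, scale=sample_scale())
```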
We have uploaded our training and test data to Google Cloud. The files can be downloaded from this link and must be placed manually in the `data` directory.
We have released our finetuned models on Hugging Face:
| Model | Link |
|---|---|
| llama2-gene-64k-base | Huggingface 🤗 |
| llama2-gene-64k-chat | Huggingface 🤗 |
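The checkpoints can be loaded with the standard `transformers` API. The repository id below is a placeholder; substitute the actual organization name from the links in the table above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "your-org/llama2-gene-64k-base" is a placeholder repo id; use the
# actual Hugging Face link from the table above.
model_id = "your-org/llama2-gene-64k-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
```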
To finetune a Llama2 checkpoint, for example, use the following command:
```bash
torchrun --nproc_per_node 8 finetune.py --training_config_path ./configs/ptrain_16k_64k.yaml
```

Custom training can be achieved by modifying the config files in `configs`; a sketch of such a config follows.
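For orientation, a training config roughly follows the pattern below. All field names and values here are illustrative guesses, not the actual schema of `ptrain_16k_64k.yaml`; check the files in `configs` for the real options:

```yaml
# Hypothetical sketch of a training config; field names are assumptions,
# not the actual contents of ./configs/ptrain_16k_64k.yaml.
model_path: path-to-llama2-checkpoint
data_dir: ./data
train_length: 16384   # finetuning context length (16k)
target_length: 65536  # extrapolation target (64k)
min_scale: 1.0        # range for the randomly sampled extrapolation ratio
max_scale: 4.0
learning_rate: 2.0e-5
num_epochs: 1
```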
Run the following command to evaluate the PPL of the model:
```bash
cd ./evaluation
deepspeed --num_gpus 8 ppl.py --model_path path-to-your-checkpoint --tokenizer_path path-to-tokenizer
```
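For reference, long-context perplexity is conventionally computed by feeding the full sequence and exponentiating the average token-level negative log-likelihood. The sketch below is a generic single-GPU version under that assumption, not the repository's DeepSpeed script:

```python
import math
import torch

@torch.no_grad()
def perplexity(model, input_ids: torch.Tensor) -> float:
    """Perplexity of one long sequence; input_ids has shape (1, seq_len).
    HuggingFace causal LMs shift labels internally when `labels` is passed."""
    out = model(input_ids=input_ids, labels=input_ids)
    return math.exp(out.loss.item())

# Usage (model/tokenizer loaded as in the snippet above):
#   ids = tokenizer(long_text, return_tensors="pt").input_ids.to(model.device)
#   print(perplexity(model, ids))
```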
To evaluate the accuracy of passkey retrieval, use the following command:

```bash
deepspeed --num_gpus 4 passkey.py --model_path path-to-your-checkpoint --tokenizer_path path-to-tokenizer
```
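Passkey retrieval follows the standard long-context setup: a random passkey is buried inside long filler text and the model must repeat it. The prompt template below is a commonly used variant and an assumption about what `passkey.py` does, not its exact wording:

```python
import random

FILLER = ("The grass is green. The sky is blue. The sun is yellow. "
          "Here we go. There and back again. ")

def build_passkey_prompt(num_fillers: int, passkey: int, depth: float = 0.5) -> str:
    """Bury the passkey at a relative `depth` inside repeated filler text."""
    key_line = f"The pass key is {passkey}. Remember it. {passkey} is the pass key. "
    n_before = int(num_fillers * depth)
    return (
        "There is important info hidden inside a lot of irrelevant text. "
        "Find it and memorize it.\n"
        + FILLER * n_before + key_line + FILLER * (num_fillers - n_before)
        + "\nWhat is the pass key? The pass key is"
    )

# A retrieval counts as correct if the generated continuation contains the key:
prompt = build_passkey_prompt(num_fillers=400, passkey=random.randint(10000, 99999))
```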