Generalized extrapolatioN scalE (GeNE) is a straightforward and effective method applied to the interpolation function of positional embeddings to achieve "train short, test long". Experimental results show that GeNE notably improves long-context language modeling. By randomly scaling the extrapolation ratio during finetuning, GeNE achieves stable extrapolation to 64k contexts while training on sequences of only 16k length.
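As a rough illustration of the idea (not this repository's exact implementation), position interpolation rescales the position indices fed to RoPE, and GeNE-style training samples a fresh extrapolation ratio per step. The function names and the `[1, 4]` sampling range below are assumptions chosen to match the 16k-to-64k setting:

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE rotation angles for the given (possibly rescaled) positions."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions.float(), inv_freq)  # shape: (seq_len, dim/2)

def interpolated_angles(seq_len: int, dim: int, scale: float) -> torch.Tensor:
    """Position interpolation: compress position indices by `scale` so that
    a context `scale` times longer maps back into the trained range."""
    positions = torch.arange(seq_len) / scale
    return rope_angles(positions, dim)

def sample_scale(min_scale: float = 1.0, max_scale: float = 4.0) -> float:
    """Randomly sampled extrapolation ratio per training step
    (the [1, 4] range is an assumption; 16k * 4 = 64k)."""
    return float(torch.empty(1).uniform_(min_scale, max_scale))

# During finetuning, each batch would use a freshly sampled scale:
angles = interpolated_angles(seq_len=16384, dim=128, scale=sample_scale())
```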
We have uploaded our training and test data to Google Cloud. The files can be downloaded from this link and must be placed manually in the `data` directory.
We have released our finetuned models on Hugging Face:
| Model | Link |
|---|---|
| llama2-gene-64k-base | Huggingface 🤗 |
| llama2-gene-64k-chat | Huggingface 🤗 |
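The checkpoints can be loaded with the standard `transformers` API. The repository id below is a placeholder; substitute the actual organization name from the links in the table above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "your-org/llama2-gene-64k-base" is a placeholder repo id; use the
# actual Hugging Face link from the table above.
model_id = "your-org/llama2-gene-64k-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
```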
To finetune a Llama2 checkpoint, for example, use the following command:
```bash
torchrun --nproc_per_node 8 finetune.py --training_config_path ./configs/ptrain_16k_64k.yaml
```

Custom training can be achieved by modifying the config files in `configs`; a sketch of such a config follows.
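For orientation, a training config roughly follows the pattern below. All field names and values here are illustrative guesses, not the actual schema of `ptrain_16k_64k.yaml`; check the files in `configs` for the real options:

```yaml
# Hypothetical sketch of a training config; field names are assumptions,
# not the actual contents of ./configs/ptrain_16k_64k.yaml.
model_path: path-to-llama2-checkpoint
data_dir: ./data
train_length: 16384   # finetuning context length (16k)
target_length: 65536  # extrapolation target (64k)
min_scale: 1.0        # range for the randomly sampled extrapolation ratio
max_scale: 4.0
learning_rate: 2.0e-5
num_epochs: 1
```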
Run the following command to evaluate the PPL of the model:
```bash
cd ./evaluation
deepspeed --num_gpus 8 ppl.py --model_path path-to-your-checkpoint --tokenizer_path path-to-tokenizer
```
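For reference, long-context perplexity is conventionally computed by feeding the full sequence and exponentiating the average token-level negative log-likelihood. The sketch below is a generic single-GPU version under that assumption, not the repository's DeepSpeed script:

```python
import math
import torch

@torch.no_grad()
def perplexity(model, input_ids: torch.Tensor) -> float:
    """Perplexity of one long sequence; input_ids has shape (1, seq_len).
    HuggingFace causal LMs shift labels internally when `labels` is passed."""
    out = model(input_ids=input_ids, labels=input_ids)
    return math.exp(out.loss.item())

# Usage (model/tokenizer loaded as in the snippet above):
#   ids = tokenizer(long_text, return_tensors="pt").input_ids.to(model.device)
#   print(perplexity(model, ids))
```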
To evaluate the accuracy of passkey retrieval, use the following command:

```bash
deepspeed --num_gpus 4 passkey.py --model_path path-to-your-checkpoint --tokenizer_path path-to-tokenizer
```
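Passkey retrieval follows the standard long-context setup: a random passkey is buried inside long filler text and the model must repeat it. The prompt template below is a commonly used variant and an assumption about what `passkey.py` does, not its exact wording:

```python
import random

FILLER = ("The grass is green. The sky is blue. The sun is yellow. "
          "Here we go. There and back again. ")

def build_passkey_prompt(num_fillers: int, passkey: int, depth: float = 0.5) -> str:
    """Bury the passkey at a relative `depth` inside repeated filler text."""
    key_line = f"The pass key is {passkey}. Remember it. {passkey} is the pass key. "
    n_before = int(num_fillers * depth)
    return (
        "There is important info hidden inside a lot of irrelevant text. "
        "Find it and memorize it.\n"
        + FILLER * n_before + key_line + FILLER * (num_fillers - n_before)
        + "\nWhat is the pass key? The pass key is"
    )

# A retrieval counts as correct if the generated continuation contains the key:
prompt = build_passkey_prompt(num_fillers=400, passkey=random.randint(10000, 99999))
```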