PyTorch implementation of the paper *KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation*, accepted at NeurIPS 2022. This repository is adapted from the awesome gpt-neox library.
- This repository was developed from commit 450b58c4ad7f36c319ca0b2f089c7349f34d8c3b of gpt-neox and later bumped to commit 738b87e73775e2cef4ea0a898b655f5d717cb8a0 to pick up some upstream bug fixes (irrelevant to this project). We keep only the main branch.
- We remove the .github/ folder as it is not needed in our experiments.
- The original gpt-neox readme is renamed to README_gpt_neox.md.
- The config files used in our experiments are stored in kerple_configs/.
- The two proposed positional embeddings are called ParallelKerplePower and ParallelKerpleLog in this repository. A simple grep will point you to our implementation.
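For a quick sense of what these modules compute: the power variant biases each attention logit by -r1 * |m - n|^r2 and the log variant by -r1 * log(1 + r2 * |m - n|), where r1 and r2 are learnable positive scalars per head and |m - n| is the query-key distance. Below is a minimal, self-contained PyTorch sketch of the two biases; initialization, positivity constraints, and model parallelism in the actual ParallelKerplePower/ParallelKerpleLog modules differ.

```python
import torch

def kerple_power_bias(seq_len, r1, r2):
    # Power variant: bias[h, m, n] = -r1[h] * |m - n| ** r2[h].
    # r1 should be positive and r2 in (0, 2] to keep the kernel valid.
    pos = torch.arange(seq_len)
    dist = (pos[None, :] - pos[:, None]).abs().float()        # (L, L) matrix of |m - n|
    return -r1[:, None, None] * dist.pow(r2[:, None, None])   # (H, L, L)

def kerple_log_bias(seq_len, r1, r2):
    # Log variant: bias[h, m, n] = -r1[h] * log(1 + r2[h] * |m - n|).
    pos = torch.arange(seq_len)
    dist = (pos[None, :] - pos[:, None]).abs().float()
    return -r1[:, None, None] * torch.log1p(r2[:, None, None] * dist)

# Hypothetical usage: 12 heads with positive initializations; the result is
# added to the (H, L, L) attention logits before the softmax.
r1 = torch.rand(12) + 0.1
r2 = torch.rand(12) + 0.1
bias = kerple_log_bias(512, r1, r2)
```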
Please refer to the original readme README_gpt_neox.md for details. We use the Host Setup without fused kernels.
Warning: These datasets are huge! Please make sure you have at least 250 GB of disk space before downloading them all.
We use the three preconfigured datasets from the original gpt-neox repository:
python prepare_data.py -d ./data openwebtext2
python prepare_data.py -d ./data arxiv
python prepare_data.py -d ./data github
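If you want to verify the free space programmatically before starting the downloads, a small standard-library check like the following works (the 250 GB threshold simply mirrors the warning above):

```python
import os
import shutil

data_dir = "./data"            # same target passed to prepare_data.py via -d
os.makedirs(data_dir, exist_ok=True)

free_gb = shutil.disk_usage(data_dir).free / 1e9
required_gb = 250              # rough total for all three datasets
if free_gb < required_gb:
    raise SystemExit(f"Only {free_gb:.0f} GB free; need at least {required_gb} GB.")
print(f"{free_gb:.0f} GB free -- OK to download.")
```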
Please refer to the original readme README_gpt_neox.md for details.
python generate_ymls.py
bash train.sh
bash test.sh
We release 6 pretrained checkpoints: kerple_log and kerple_power, each pretrained on the three datasets above.
- Please navigate to Releases to download the checkpoints.
- You can right-click the filename, copy the link address, and use wget to download it directly in a command-line environment.
- Once the files are downloaded, unzip them and leave them in the current directory.
- Run test.sh, and the extrapolation performance should be very close to the numbers reported in Table 3 of the paper.
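For context, the extrapolation numbers in Table 3 are perplexities measured at inference lengths longer than the training length. The sketch below shows one generic way to compute perplexity over non-overlapping windows of a chosen evaluation length; it assumes a causal LM that maps (batch, length) token ids to (batch, length, vocab) logits, and it is not the exact evaluation code behind test.sh.

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def eval_perplexity(model, token_ids, eval_len):
    # Split the token stream into non-overlapping windows of eval_len,
    # accumulate next-token NLL, and return exp(mean NLL).
    nll_sum, count = 0.0, 0
    for start in range(0, token_ids.numel() - eval_len, eval_len):
        chunk = token_ids[start : start + eval_len].unsqueeze(0)  # (1, eval_len)
        logits = model(chunk)                                     # (1, eval_len, vocab)
        nll = F.cross_entropy(logits[0, :-1], chunk[0, 1:], reduction="sum")
        nll_sum += nll.item()
        count += eval_len - 1
    return math.exp(nll_sum / count)

# e.g. sweep evaluation lengths to see how perplexity degrades (or not)
# beyond the training length:
# for L in (512, 1024, 2048, 4096):
#     print(L, eval_perplexity(model, test_tokens, L))
```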