Reformer TF2

An implementation of the Reformer in TensorFlow 2.

Dependency

TensorFlow >= 2.3.0

Usage

Use Docker >= 19.03

docker pull tensorflow/tensorflow:2.3.0-gpu
./docker_tf2.sh

Don't forget to change the volume mapping path in docker_tf2.sh to your own directory.

Download the enwik8 dataset

The enwik8 data was downloaded from the Hutter Prize page.

Unicode code points are used directly as vocabulary indices.
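
Because the vocabulary is just the code points themselves, no tokenizer is needed. Below is a minimal sketch of turning raw enwik8 text into model inputs; encode_codepoints is hypothetical and not the repo's actual data loader.

import numpy as np

def encode_codepoints(text, seq_len):
    """Map each character to its Unicode code point and pad/trim to seq_len."""
    ids = [ord(ch) for ch in text[:seq_len]]
    ids += [0] * (seq_len - len(ids))            # zero-pad chunks shorter than seq_len
    return np.array(ids, dtype=np.int32)

with open("enwik8", encoding="utf-8", errors="replace") as f:
    raw = f.read()

example = encode_codepoints(raw, seq_len=4096)   # one training example of length 4096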

Set hyperparameters

model_tf = TFReformerLM(
    num_tokens= FLAGS.vocab_size,
    emb = FLAGS.embedding_size,
    depth = 6,   # depth 8 with full attention does not run at batch size 4
    max_seq_len = FLAGS.seq_len,
    heads = 8,
    lsh_dropout = 0.1,
    causal = True,        # auto-regressive or not
    bucket_size = 64,     # average size of qk per bucket, 64 was recommended in paper
    n_hashes = 4,         # 4 is permissible per author, 8 is the best but slower
    ff_chunks = 8,      # number of chunks for feedforward layer, make higher if there are memory issues
    weight_tie = True,   # tie parameters of each layer for no memory per additional depth
    attn_chunks = 8,        # process lsh attention in chunks, only way for memory to fit when scaling to 16k tokens
    use_full_attn = False   # use full self attention, for comparison

)

A single 16 GB Tesla V100 is enough to run the Reformer with the above settings at a sequence length of 4096 and a batch size of 4.
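
Assuming TFReformerLM follows the usual tf.keras.Model calling convention (token ids in, logits out), a quick smoke test of the model_tf instance built above might look like the sketch below; the concrete numbers are illustrative and not taken from the repo.

import tensorflow as tf

# Hypothetical smoke test: one forward pass on random token ids.
batch_size, seq_len, vocab_size = 4, 4096, 256   # assumed values for illustration
dummy_ids = tf.random.uniform(
    (batch_size, seq_len), minval=0, maxval=vocab_size, dtype=tf.int32)

logits = model_tf(dummy_ids)    # expected shape: (batch_size, seq_len, num_tokens)
print(logits.shape)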

Train

python main.py

main.py also includes generation code; you can generate sequences with generator/seq_generator.py.
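
Purely as an illustration of what such a generator does (the function and its arguments below are assumptions, not the actual seq_generator.py API), autoregressive sampling over code points can be sketched as:

import tensorflow as tf

def generate(model, prompt_ids, steps, temperature=1.0):
    """Sample one token at a time from the model's next-token distribution."""
    ids = list(prompt_ids)
    for _ in range(steps):
        # In practice the input may need padding to a bucket-aligned length for LSH attention.
        x = tf.constant([ids], dtype=tf.int32)            # shape (1, current_len)
        logits = model(x)[0, -1] / temperature            # logits for the next token
        next_id = int(tf.random.categorical(logits[None, :], 1)[0, 0])
        ids.append(next_id)
    return "".join(chr(i) for i in ids)                   # decode code points back to text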

Reference

This code is based on

https://github.com/akanyaani/gpt-2-tensorflow2.0

https://github.com/cerebroai/reformers

https://github.com/lucidrains/reformer-pytorch