Run a pretrained ELMo model to compute single-sentence perplexity.
Modified from AllenAI's bilm-tf.
pip install tensorflow-gpu==1.2 h5py
python setup.py install
- Data file format: each line in a file under data/ is a sentence to calculate perplexity for.
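A minimal sketch of preparing such a data file (the sentences and the path data/sents.txt are illustrative; sents.txt matches the file name used by the split command in this README):

```python
import os

# Hypothetical sample sentences; in practice these come from your corpus.
sentences = [
    "The quick brown fox jumps over the lazy dog .",
    "ELMo computes contextual word representations .",
]

# Write one sentence per line, the format the evaluation expects.
os.makedirs("data", exist_ok=True)
with open("data/sents.txt", "w") as f:
    for sentence in sentences:
        f.write(sentence + "\n")
```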
- Split the data file into pieces, one sentence per piece:
cd data
split sents.txt -d -l 1 -a 4 cs
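For reference, a Python equivalent of the split command above: it writes each line of sents.txt to its own file named cs0000, cs0001, ... (4-digit numeric suffixes, matching the -d -a 4 flags). The sample input written first is only for illustration.

```python
import os

os.makedirs("data", exist_ok=True)

# Illustrative input file; normally sents.txt already exists.
with open("data/sents.txt", "w") as f:
    f.write("first sentence .\nsecond sentence .\n")

# Equivalent of: split sents.txt -d -l 1 -a 4 cs
with open("data/sents.txt") as f:
    for i, line in enumerate(f):
        with open(f"data/cs{i:04d}", "w") as piece:
            piece.write(line)
```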
- Run the evaluation script.
sh evaluate.sh
- The perplexity scores are shown in stdout:
...
5946: 129.57085
5947: 1412.2032
5948: 5172.711
5949: 2126.5542
...
Each output line shows the sentence's line number followed by its perplexity (not normalized by sentence length).
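If you want length-normalized perplexities, a small conversion sketch follows. It assumes the printed score is exp of the summed token losses (an assumption about the script's output, not confirmed by this README); under that assumption the normalized perplexity is the n-th root of the unnormalized one.

```python
def normalize_ppl(unnormalized_ppl, n_tokens):
    # Assuming unnormalized_ppl == exp(total token loss), the usual
    # length-normalized perplexity exp(total loss / n_tokens) is the
    # n_tokens-th root of the unnormalized score.
    return unnormalized_ppl ** (1.0 / n_tokens)
```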
To fine-tune ELMo on an additional corpus, first download the pretrained model into models/.
The TensorFlow checkpoint is available by downloading these files:
vocabulary | checkpoint 1 2 3 | options

models
|-- vocab-2016-09-10.txt
|-- checkpoint
|-- options.json
|-- model.ckpt-935588.meta
|-- model.ckpt-935588.index
|-- model.ckpt-935588.data-00000-of-00001
Then use the following script.
sh finetune.sh
After fine-tuning the model, run the evaluation again to see the effect of fine-tuning.
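To compare scores before and after fine-tuning, the stdout lines shown above (e.g. "5946: 129.57085") can be parsed into a dict keyed by line number. The before/after numbers below are made up purely for illustration:

```python
def parse_scores(stdout_lines):
    # Parse lines of the form "5946: 129.57085" into {line_number: perplexity}.
    scores = {}
    for line in stdout_lines:
        idx, value = line.split(":")
        scores[int(idx)] = float(value)
    return scores

# Hypothetical before/after outputs, for illustration only.
before = parse_scores(["5946: 129.57085", "5947: 1412.2032"])
after = parse_scores(["5946: 101.3", "5947: 988.6"])
improved = [i for i in before if after[i] < before[i]]
```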