Shuailong/bilm-tf
Use ELMo as a Language Model

Purpose

Run a pretrained ELMo model to compute single-sentence perplexity.
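As background, sentence perplexity is derived from the model's per-token log-probabilities. A minimal sketch (using hypothetical natural-log probabilities, not output from this repo) of both the unnormalized variant reported by this project and the length-normalized variant:

```python
import math

def sentence_perplexity(token_logprobs):
    """Compute perplexity from per-token natural-log probabilities.

    Returns (unnormalized, normalized): the unnormalized form is
    exp(-total log-prob) and grows with sentence length; the
    normalized form divides by the token count first.
    """
    total = sum(token_logprobs)
    unnormalized = math.exp(-total)
    normalized = math.exp(-total / len(token_logprobs))
    return unnormalized, normalized

# Hypothetical log-probabilities for a 3-token sentence
unnorm, norm = sentence_perplexity([-2.0, -1.5, -2.5])
```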

Modified from AllenAI's bilm-tf.

Installation

pip install tensorflow-gpu==1.2 h5py
python setup.py install

Run Evaluation

  1. Data file format: each line in the file is one sentence whose perplexity will be calculated. data

  2. Split the data file into pieces, one sentence per piece:

cd data
split sents.txt -d -l 1 -a 4 cs

  3. Run the evaluation script:

sh evaluate.sh

  4. The perplexity scores are printed to stdout:
...
5946: 129.57085
5947: 1412.2032
5948: 5172.711
5949: 2126.5542
...

Each line shows the sentence line number followed by its perplexity (not normalized by sentence length).
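If you want to post-process these scores, the output above can be parsed line by line; a small sketch, assuming the exact `line_number: score` format shown:

```python
def parse_scores(lines):
    """Parse 'line_number: perplexity' pairs from the evaluation output."""
    scores = {}
    for line in lines:
        idx, _, value = line.partition(":")
        scores[int(idx)] = float(value)
    return scores

output = ["5946: 129.57085", "5947: 1412.2032"]
scores = parse_scores(output)
```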

Finetune

To finetune ELMo on an additional corpus, first download the pretrained model to models/.

The tensorflow checkpoint is available by downloading these files:

vocabulary checkpoint options 1 2 3

|--models
    |--vocab-2016-09-10.txt
    |--checkpoint
        |--checkpoint
        |--options.json
        |--model.ckpt-935588.meta
        |--model.ckpt-935588.index
        |--model.ckpt-935588.data-00000-of-00001
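Before launching finetuning, it can help to confirm the layout above is in place. A hypothetical helper (`missing_files` is not part of this repo) that checks for the expected files:

```python
from pathlib import Path

def missing_files(models_dir):
    """Return the expected pretrained-model files that are absent."""
    models_dir = Path(models_dir)
    ckpt = models_dir / "checkpoint"
    required = [
        models_dir / "vocab-2016-09-10.txt",
        ckpt / "checkpoint",
        ckpt / "options.json",
        ckpt / "model.ckpt-935588.meta",
        ckpt / "model.ckpt-935588.index",
        ckpt / "model.ckpt-935588.data-00000-of-00001",
    ]
    return [str(p) for p in required if not p.exists()]

# Report anything still missing from models/
if missing_files("models"):
    print("Missing:", missing_files("models"))
```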

Then use the following script.

sh finetune.sh

After finetuning the model, you can run the evaluation again to see the effect of finetuning.

About

SenMaking ELMo baseline