Requirements
- bert-base-uncased.30522.768d.vec (download here. Put bert-base-uncased.30522.768d.vec in the main directory.)
- Also, put source.txt and target.txt in the main directory.
Then, run the following:
python train.py --input==source.txt --output==target.txt
Since unknown tokens affect the result, we use an approach described in Unsupervised Sentence Compression using Denoising Auto-Encoders to generate rare words that are not in the vocabulary.