Sources for Sentence Paraphraser project

Overview

These sources are intended to allow reproducing the experiments described in the project report.

Setting up

  1. Clone the repository:
    git clone https://github.com/delkind/paraphraser.git
    cd paraphraser
  2. Run the setup script:
    ./setup.sh
  3. Note that before running any of the scripts in the instructions below, you must activate the virtual environment (a complete setup session is shown after this list):
    source ./.env/bin/activate
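
For reference, a complete first-time setup session (using only the commands listed above) looks like this:
    git clone https://github.com/delkind/paraphraser.git
    cd paraphraser
    ./setup.sh
    source ./.env/bin/activate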

Pre-trained universal embeddings (InferSent) experiment

Creating paraphrases of the Bible BBE corpus in the style of the YLT corpus

  1. Download the pre-trained TCNN- and LSTM-based decoders and the pre-built universal embeddings for the Bible dataset by running
    ./dl_uni_emb_files.sh
  2. To calculate the BLEU score for both models on n random samples, run (an example session is shown after this list)
    ./uni_emb_calc_bleu.sh --samples <n>
  3. To emit the original sentences (GOLD) file, run
    ./uni_emb_create_gold.sh
  4. To emit the LSTM model predictions file, run
    ./uni_emb_lstm_pred.sh
  5. To emit the TCNN model predictions file, run
    ./uni_emb_tcnn_pred.sh
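
As an example, the following session downloads the pre-trained artifacts and produces all of the evaluation outputs; the sample count of 1000 is purely illustrative, not a value prescribed by the project:
    source ./.env/bin/activate
    ./dl_uni_emb_files.sh
    ./uni_emb_calc_bleu.sh --samples 1000
    ./uni_emb_create_gold.sh
    ./uni_emb_lstm_pred.sh
    ./uni_emb_tcnn_pred.sh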

Re-building experiment models and embeddings

The instructions above assume the use of pre-trained models and pre-built embeddings to produce the predictions and evaluate the experiment results. Below are instructions for re-building and re-training the models and embeddings instead of using the pre-built ones.

Reproducing sentence embedding creation

  1. Set up the InferSent data files by running
    ./setup_infersent.sh
  2. Install PyTorch by following the official PyTorch installation instructions for your platform. We haven't provided a script since the installation differs substantially depending on the platform.
  3. Create embeddings from the YLT and BBE Bible corpora by running
    ./create_uni_emb.sh
  4. Verify that the exp/uni_embed/embeddings.h5 file has been created (a quick check is sketched after this list)
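
A quick way to check the result is sketched below; it assumes the h5py package is available in the virtual environment (e.g. via pip install h5py) and makes no assumption about the internal layout of the file:
    ls -lh exp/uni_embed/embeddings.h5
    python -c "import h5py; print(list(h5py.File('exp/uni_embed/embeddings.h5', 'r').keys()))"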

Reproducing model training

We have experimented with decoders based on the LSTM and Temporal CNN (TCNN) architectures. To train the LSTM-based decoder, run
    ./uni_emb_train_lstm.sh
To train the TCNN-based decoder, run
    ./uni_emb_train_tcnn.sh
The number of training epochs can be set for either script with the --epochs <n> parameter, where n is the number of epochs (see the example below); the default is to train for 10 epochs. The model is saved (and subsequently overwritten) after each epoch.
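
For example, to train the LSTM-based decoder for 20 epochs (the epoch count here is purely illustrative):
    source ./.env/bin/activate
    ./uni_emb_train_lstm.sh --epochs 20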

About

Final project for the NLP Course
