SMERTI for Semantic Text Exchange

Code for the SMERTI pipeline for Semantic Text Exchange, presented in "Keep Calm and Switch On! Preserving Sentiment and Fluency in Semantic Text Exchange", published at EMNLP-IJCNLP 2019. You can cite it as follows:

@inproceedings{feng-etal-2019-keep,
    title = "Keep Calm and Switch On! Preserving Sentiment and Fluency in Semantic Text Exchange",
    author = "Feng, Steven Y. and Li, Aaron W. and Hoey, Jesse",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov, year = "2019", address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1272",
    doi = "10.18653/v1/D19-1272", pages = "2701--2711",
}

Authors: Steven Y. Feng, Aaron W. Li, Jesse Hoey

Poster and other resources can be found here.

Note: direct inquiries to stevenyfeng@gmail.com or open an issue here.

Diagram of SMERTI pipeline, with ERM (Entity Replacement Module), SMM (Similarity Masking Module), and TIM (Text Infilling Module). Starting string on top left, ending string on top right, given replacement entity (RE) of *rainy*.
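The three modules in the diagram can be illustrated with a toy sketch. This is not the repository's implementation: it assumes the SMM masks words whose similarity to the original entity exceeds a threshold and collapses adjacent masks into one, and the mask token, the hand-written similarity scores, the threshold, and the fixed infill are all hypothetical stand-ins. In the actual pipelines the TIM is a trained RNN or Transformer.

```python
from typing import Callable, Dict, List

MASK = "[mask]"  # assumed mask token; the repository's token may differ


def replace_entity(tokens: List[str], original: str, replacement: str) -> List[str]:
    """ERM stand-in: swap each occurrence of the original entity for the RE."""
    return [replacement if tok == original else tok for tok in tokens]


def mask_similar(
    tokens: List[str],
    original: str,
    replacement: str,
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[str]:
    """SMM stand-in: mask tokens too similar to the original entity.

    Runs of adjacent masked tokens collapse into a single MASK.
    """
    masked: List[str] = []
    for tok in tokens:
        if tok != replacement and similarity(tok, original) > threshold:
            if not masked or masked[-1] != MASK:
                masked.append(MASK)
        else:
            masked.append(tok)
    return masked


def fill_masks(tokens: List[str], infill: Callable[[List[str]], List[str]]) -> List[str]:
    """TIM stand-in: ask `infill` for replacement tokens at each MASK."""
    filled: List[str] = []
    for tok in tokens:
        filled.extend(infill(tokens) if tok == MASK else [tok])
    return filled


# Hand-written toy scores standing in for a learned similarity model.
TOY_SIMILARITY: Dict[str, float] = {"wear": 0.5, "sunscreen": 0.8}


def toy_similarity(word: str, entity: str) -> float:
    return TOY_SIMILARITY.get(word, 0.0) if entity == "sunny" else 0.0


def toy_infill(context: List[str]) -> List[str]:
    # A trained RNN or Transformer would generate this from the context.
    return ["bring", "an", "umbrella"]


tokens = "it is sunny outside so i must wear sunscreen".split()
replaced = replace_entity(tokens, "sunny", "rainy")
masked = mask_similar(replaced, "sunny", "rainy", toy_similarity, threshold=0.4)
result = fill_masks(masked, toy_infill)
print(" ".join(masked))  # it is rainy outside so i must [mask]
print(" ".join(result))  # it is rainy outside so i must bring an umbrella
```

Keeping the similarity function and the infiller as plain callables makes it easy to swap the toy stand-ins for real models without touching the control flow.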

Datasets can be obtained from the following sources:

News Headlines
Yelp Reviews
Amazon Reviews (user review data, 18 GB)

Following is a brief description of each file:

1. Dataset_preprocessing preprocesses the reviews (e.g. Yelp and Amazon) and news headlines, generates training and testing splits, and writes them to .txt files. (Data Preprocessing and Preparation)
2. Dataset_masking generates masked versions of the training and testing data from step 1 for reviews and news headlines, and writes them to .txt files. (Data Preprocessing and Preparation)
3. RNN_training and Transformer_training train the TIM for SMERTI-RNN and SMERTI-Transformer, respectively, based on the .txt files generated in steps 1 and 2. (Text Infilling Module (TIM) Training)
4. RNN_final_pipeline and Transformer_final_pipeline are the final pipelines for SMERTI-RNN and SMERTI-Transformer, respectively. Each includes the ERM, SMM, and TIM components of that pipeline variation. The bottom of each file also includes a section that generates output from the evaluation lines produced in step 5 and writes it to .txt files, to be used in the metric calculations in step 6. (Final SMERTI Pipelines)
5. Evaluation_setup generates the evaluation lines for the datasets. This includes choosing nouns, choosing evaluation lines per noun, and writing them to .txt files. (Evaluation)
6. Final_evaluation calculates metrics including SPA, BLEU, CSS, Perplexity, and SLOR using SMERTI's output on the evaluation lines, and includes functions to write results to .txt files. (Evaluation)
7. SLOR_normalization normalizes all SLOR values calculated in step 6 to a [0,1] interval. (Evaluation)
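As a rough illustration of the SLOR and normalization steps, the sketch below computes SLOR from precomputed log-probabilities using its standard definition (sentence log-probability minus the summed unigram log-probabilities, divided by sentence length) and then rescales a list of scores to [0,1]. Min-max scaling is an assumption here; the SLOR_normalization file may use a different scheme, and the function names are hypothetical.

```python
from typing import List, Sequence


def slor(sentence_logprob: float, unigram_logprobs: Sequence[float]) -> float:
    """SLOR = (log p_LM(S) - sum of unigram log-probs) / |S|."""
    length = len(unigram_logprobs)
    if length == 0:
        raise ValueError("sentence must contain at least one token")
    return (sentence_logprob - sum(unigram_logprobs)) / length


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        # All scores identical: the scale is undefined, so map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up log-probabilities, for illustration only.
score = slor(-10.0, [-4.0, -4.0, -4.0, -4.0])
print(score)  # 1.5
print(min_max_normalize([score, -0.5, 0.5]))  # [1.0, 0.0, 0.5]
```

Because min-max scaling depends on the extremes of the set being normalized, scores are only comparable when all systems under comparison are normalized together.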

GenAug SMERTI-Transformer

The "GenAug SMERTI-Transformer" folder contains the SMERTI training and inference code for GenAug, presented in "GenAug: Data Augmentation for Finetuning Text Generators", published at the EMNLP 2020 DeeLIO Workshop.
