Data-efficient Neural Text Compression with Interactive Learning

In this project, we develop a general framework for Interactive Text Compression. We propose an interactive text compression model that uses active learning methods for data-efficient learning.

If you reuse this software, please use the following citation:

@inproceedings{tubiblio111696,
    title = {Data-efficient Neural Text Compression with Interactive Learning},
    author = {P.V.S., Avinesh and Meyer, Christian M.},
    publisher = {Association for Computational Linguistics},
    booktitle = {Proceedings of the 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics},
    pages = {2543--2554},
    month = jun,
    year = {2019},
    location = {Minneapolis, USA},
}

Abstract: Neural sequence-to-sequence models have been successfully applied to text compression. However, these models were trained on huge automatically induced parallel corpora, which are only available for a few domains and tasks. In this paper, we propose a novel interactive setup to neural text compression that enables transferring a model to new domains and compression tasks with minimal human supervision. This is achieved by employing active learning, which intelligently samples from a large pool of unlabeled data. Using this setup, we can successfully adapt a model trained on small data of 40k samples for a headline generation task to a general text compression dataset at an acceptable compression quality with just 500 sampled instances annotated by a human.

Contact person: Avinesh P.V.S., first_name AT aiphes.tu-darmstadt.de, first_name.last_name AT gmail DOT com

http://www.ukp.tu-darmstadt.de/

http://www.tu-darmstadt.de/

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Processing Data

python tools/process_google.py
sh preprocess.sh
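The `process_google.py` and `preprocess.sh` scripts are part of this repository; as a rough illustration of the kind of work such a step performs, the following is a minimal sketch (with hypothetical input format and function name, not the repo's actual code) that splits tab-separated `sentence<TAB>compression` pairs into parallel tokenized source/target lists, the usual input shape for OpenNMT-style preprocessing.

```python
# Hypothetical sketch: turn "src<TAB>tgt" lines into parallel
# tokenized source/target lists, skipping malformed rows.
def split_pairs(lines):
    """Return (sources, targets) as lists of lowercased token lists."""
    sources, targets = [], []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip rows without exactly one tab
        src, tgt = parts
        sources.append(src.lower().split())
        targets.append(tgt.lower().split())
    return sources, targets

src, tgt = split_pairs(["The fox jumped .\tFox jumped .", "bad line"])
# src == [["the", "fox", "jumped", "."]], tgt == [["fox", "jumped", "."]]
```

The actual scripts handle the Google sentence-compression data format and vocabulary building; this only shows the parallel-file idea.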

Run the text compression models Seq2Seq-Gen and Pointer-Gen

sh train.sh
sh test_msr.sh
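The Pointer-Gen model mixes generating words from the vocabulary with copying words from the source, following the pointer-generator idea (See et al., 2017). The toy sketch below (plain Python lists, not the repo's tensor code) shows how the final output distribution combines the vocabulary softmax and the attention-weighted copy distribution via a gate `p_gen`.

```python
# Toy pointer-generator mixture:
# P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention mass on
# source positions holding word w.
def pointer_gen_dist(p_vocab, attn, src_ids, p_gen, vocab_size):
    """Mix the generation and copy distributions into one over the vocab."""
    final = [p_gen * p for p in p_vocab]  # scaled vocabulary distribution
    assert len(final) == vocab_size
    for pos, word_id in enumerate(src_ids):
        # route copy probability mass to the source word's vocab id
        final[word_id] += (1.0 - p_gen) * attn[pos]
    return final

dist = pointer_gen_dist(
    p_vocab=[0.7, 0.2, 0.1],  # softmax over a 3-word toy vocabulary
    attn=[0.5, 0.5],          # attention over two source tokens
    src_ids=[2, 2],           # both source tokens are vocab word 2
    p_gen=0.8,                # gate: probability of generating vs. copying
    vocab_size=3,
)
# dist == [0.56, 0.16, 0.28]; the 0.2 copy mass lands on word 2
```

Real implementations also extend the vocabulary with out-of-vocabulary source words, which this sketch omits.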

Interactive Active Learning Sampling

python active_learning.py
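The repository's `active_learning.py` implements the interactive sampling strategies from the paper. As a generic illustration (not the repo's code), pool-based active learning with uncertainty sampling ranks the unlabeled pool by a model uncertainty score and hands the top-k instances to a human annotator:

```python
# Generic pool-based active learning sketch: pick the k unlabeled
# instances the model is least confident about.
def select_for_annotation(pool, uncertainty, k):
    """Return the k pool items with the highest uncertainty scores."""
    ranked = sorted(range(len(pool)),
                    key=lambda i: uncertainty(pool[i]),
                    reverse=True)
    return [pool[i] for i in ranked[:k]]

# Toy uncertainty: sentence length, standing in for a real score such
# as the model's average per-token negative log-likelihood.
pool = ["short one",
        "a much longer unlabeled sentence here",
        "mid length sentence"]
picked = select_for_annotation(pool, uncertainty=lambda s: len(s.split()), k=1)
# picked == ["a much longer unlabeled sentence here"]
```

In the paper's setup, the selected instances are compressed by a human and added to the training data, after which the model is retrained and the loop repeats.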

Evaluate Results

files2rouge system_output.txt reference.txt
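To illustrate what files2rouge measures, here is a simplified ROUGE-1 F1 computation as unigram overlap between a system output and a reference (the real tool also reports ROUGE-2 and ROUGE-L and handles stemming and multi-sentence files):

```python
from collections import Counter

# Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall
# between a system output and a single reference.
def rouge1_f1(system, reference):
    sys_counts = Counter(system.split())
    ref_counts = Counter(reference.split())
    overlap = sum((sys_counts & ref_counts).values())  # clipped matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(sys_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

score = rouge1_f1("fox jumped over dog",
                  "the fox jumped over the lazy dog")
# precision = 4/4, recall = 4/7, F1 = 8/11
```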