Data-efficient Neural Text Compression with Interactive Learning

In this project, we develop a general framework for Interactive Text Compression. We propose an interactive text compression model that uses active learning methods for data-efficient learning.

If you reuse this software, please use the following citation:

@inproceedings{tubiblio111696,
    title = {Data-efficient Neural Text Compression with Interactive Learning},
    author = {P.V.S., Avinesh and Meyer, Christian M.},
    publisher = {Association for Computational Linguistics},
    booktitle = {Proceedings of the 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics},
    pages = {2543--2554},
    month = jun,
    year = {2019},
    location = {Minneapolis, USA},
}

Abstract: Neural sequence-to-sequence models have been successfully applied to text compression. However, these models were trained on huge automatically induced parallel corpora, which are only available for a few domains and tasks. In this paper, we propose a novel interactive setup for neural text compression that enables transferring a model to new domains and compression tasks with minimal human supervision. This is achieved by employing active learning, which intelligently samples from a large pool of unlabeled data. Using this setup, we successfully adapt a model trained on a small dataset of 40k samples for a headline generation task to a general text compression dataset, reaching acceptable compression quality with just 500 sampled instances annotated by a human.

Contact person: Avinesh P.V.S., first_name AT aiphes.tu-darmstadt.de, first_name.last_name AT gmail DOT com

http://www.ukp.tu-darmstadt.de/

http://www.tu-darmstadt.de/

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Processing Data

python tools/process_google.py
sh preprocess.sh

Run the text compression models Seq2Seq-Gen and Pointer-Gen

sh train.sh
sh test_msr.sh

Interactive Active Learning Sampling

python active_learning.py
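
The active learning step selects the pool instances the current model is most uncertain about, so that the human annotator compresses only the most informative samples. The repository's active_learning.py implements the actual strategies from the paper; the snippet below is only a minimal sketch of uncertainty-based sampling, in which decode_fn (a hook returning the token-level log-probabilities of the model's decoded compression) and the dummy model are hypothetical placeholders, not part of this codebase:

import math
import random

def uncertainty_score(token_log_probs):
    # Average negative log-probability of the decoded tokens:
    # higher means the model is less certain about its output.
    return -sum(token_log_probs) / max(len(token_log_probs), 1)

def sample_for_annotation(unlabeled_pool, decode_fn, budget=500):
    # Rank the pool by uncertainty and return the `budget` most
    # uncertain source sentences for human annotation.
    scored = [(uncertainty_score(decode_fn(src)), src) for src in unlabeled_pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [src for _, src in scored[:budget]]

if __name__ == "__main__":
    random.seed(0)
    # Toy stand-in for a trained compression model: random log-probs.
    def dummy_decode(src):
        return [math.log(random.uniform(0.1, 1.0)) for _ in src.split()]

    pool = ["unlabeled sentence number %d" % i for i in range(1000)]
    print(sample_for_annotation(pool, dummy_decode, budget=5))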

Evaluate Results

files2rouge system_output.txt reference.txt
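
files2rouge prints the standard ROUGE-1/2/L scores for the two line-aligned files. If the Perl ROUGE backend is hard to set up, a pure-Python approximation can be computed with Google's rouge-score package; this is an alternative offered for convenience, not the evaluation used in the paper:

# pip install rouge-score
from rouge_score import rouge_scorer

# Score one system compression against its reference; in practice,
# iterate over the aligned lines of system_output.txt and reference.txt.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(
    "the quick brown fox jumps over the lazy dog",  # reference
    "a quick brown fox jumps over a lazy dog",      # system output
)
for name, result in scores.items():
    print("%s: F1 = %.4f" % (name, result.fmeasure))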
