This repository contains code accompanying the paper “CATS: Customizable Abstractive Topic-based Summarization”, published in the ACM Transactions on Information Systems (TOIS) journal, 2021.
The code was developed with Python 2.7 and TensorFlow 1.4. The implementation builds on the code releases for Pointer-Generator Networks here and the TextSum project.
Obtaining the non-anonymized CNN/DailyMail dataset used in the paper: to obtain the dataset, we encourage users to download and preprocess it as described here. We use exactly the same chunked-data setting.
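For orientation, the chunked .bin files produced by the preprocessing linked above store length-prefixed serialized tf.train.Example protos (an 8-byte little-endian length, then the record bytes). The sketch below reads such records without depending on TensorFlow; it is a hedged illustration of the assumed format, so verify it against your own chunked data:

```python
import struct

def read_chunk(path):
    """Yield raw serialized records from a chunked .bin file.

    Assumed format (pointer-generator style, to be verified against
    your data): each record is an 8-byte little-endian length prefix
    followed by that many bytes of a serialized tf.train.Example.
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # end of file
            (length,) = struct.unpack("<q", header)
            yield f.read(length)
```

Each yielded byte string can then be parsed with TensorFlow's tf.train.Example if you need the article/abstract fields.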
The LDA models used in our paper can be obtained from here. The current code release has been tested with the pre-trained LDA model with 150 topics. You can reference one of the provided LDA topic models in the TopicModel class in data.py.
To train the model, run:
python run_summarization.py --mode=train --data_path=/path/to/chunked/train_* --vocab_path=/path/to/vocab --log_root=/path/to/a/log/directory --exp_name=myexperiment
This will create a subdirectory of your specified log_root called myexperiment, where all checkpoints will be saved. The model will then start training on the train_*.bin files.
As stated in the paper, no topic information is used at test time. To decode without topic information, we used the basic pointer-generator model code here. After downloading that code, you can decode with:
python run_summarization.py --mode=decode --data_path=/path/to/chunked/val_* --vocab_path=/path/to/vocab --log_root=/path/to/a/log/directory --exp_name=myexperiment
Please note that the decode command must use the same settings as the training job (plus any decode-specific flags such as beam_size).
This will repeatedly load random examples from your specified data file and generate a summary for each using beam search. The results are printed to screen.
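For readers unfamiliar with the decoding procedure, beam search keeps the beam_size highest-scoring partial hypotheses at each step. The following is a simplified, self-contained toy sketch of the idea, not the repository's TensorFlow decoder; the step_logprobs interface is hypothetical:

```python
def beam_search(step_logprobs, beam_size, max_len, start, stop):
    """Toy beam search sketch (illustration only, not the repo's decoder).

    step_logprobs(prefix) -> dict of {next_token: log_probability}.
    Expands each hypothesis, keeps the beam_size best unfinished ones,
    and returns the highest-scoring complete sequence.
    """
    beam = [([start], 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beam:
            for tok, lp in step_logprobs(tokens).items():
                candidates.append((tokens + [tok], score + lp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = []
        for tokens, score in candidates:
            if tokens[-1] == stop:
                finished.append((tokens, score))  # hypothesis is complete
            else:
                beam.append((tokens, score))
            if len(beam) == beam_size:
                break
        if not beam:
            break
    pool = finished or beam
    return max(pool, key=lambda c: c[1])[0]
```

In the real model, step_logprobs would come from the decoder's output distribution; length normalization and coverage penalties, which practical decoders often add, are omitted here for brevity.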
If you would like to evaluate on the entire validation or test set and obtain ROUGE scores, set the flag single_pass=1. This processes the whole dataset in order, writes the generated summaries to file, and then runs evaluation with pyrouge.
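pyrouge wraps the official ROUGE-1.5.5 Perl toolkit, which is what produces the reported scores. As a rough, self-contained illustration of what a unigram-overlap score measures, here is a toy ROUGE-1 F1 (not a substitute for pyrouge; the function name is ours):

```python
from collections import Counter

def rouge1_f1(candidate_tokens, reference_tokens):
    """Toy ROUGE-1 F1 on unigram overlap (illustration only;
    reported scores should come from pyrouge / ROUGE-1.5.5)."""
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / float(len(candidate_tokens))
    recall = overlap / float(len(reference_tokens))
    return 2 * precision * recall / (precision + recall)
```

The official toolkit additionally handles stemming, sentence splitting, and multiple references, so its numbers will differ from this sketch.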