Controllable Text Summarization Survey

This repository collects papers on controllable text summarization (CTS) and accompanies our paper, "Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey".

You can cite our paper as follows:

@misc{urlana2023controllable,
      title={Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey}, 
      author={Ashok Urlana and Pruthwik Mishra and Tathagato Roy and Rahul Mishra},
      year={2023},
      eprint={2311.09212},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

We group the papers by controllable aspect: Length, Coverage, Style, Abstractivity, Salience, Entity, Topic, Role, Diversity, and Structure.

Length

Paper	Datasets Used
MACSUM: Controllable Summarization with Mixed Attributes TACL-2023 code data	CNN Daily Mail, QMSum
Abstractive Document Summarization with Summary-length Prediction EACL-2023	CNNDM, NYT, WikiHow
Length Control in Abstractive Summarization by Pretraining Information Selection ACL-2022 code	CNN-DailyMail, XSUM
Generating Multiple-Length Summaries via Reinforcement Learning for Unsupervised Sentence Summarization EMNLP-2022 code	DUC2004
A Character-Level Length-Control Algorithm for Non-Autoregressive Sentence Summarization NeurIPS-2022 code	Gigaword, DUC2004
CTRLSUM: Towards Generic Controllable Text Summarization EMNLP-2022 code	CNNDM, arXiv, BIGPATENT
A New Approach to Overgenerating and Scoring Abstractive Summaries NAACL-2021 code data	Gigaword, Newsroom
Controllable Summarization with Constrained Markov Decision Process TACL-2021 code	CNNDM, Newsroom, DUC-2002
Lenatten: An effective length controlling unit for text summarization ACL-2021 code	CNNDM
Interpretable multi headed attention for abstractive summarization at controllable lengths COLING-2020	MSR Narratives and Thinking-Machines
Positional Encoding to Control Output Sequence Length NAACL-2019 code	JAMUS corpus (Japanese), with summaries of varying character counts
Global Optimization under Length Constraint for Neural Text Summarization ACL-2019	CNNDM, Mainichi
A Large-Scale Multi-Length Headline Corpus for Analyzing Length-Constrained Headline Generation Model Evaluation INLG-2019 data	JAMUS corpus (Japanese), with summaries of varying character counts
Controllable Abstractive Summarization ACL-NMT(W)-2018	CNN-DailyMail
Unsupervised Sentence Compression using Denoising Auto-Encoders CoNLL-2018 code	Gigaword
Controlling Length in Abstractive Summarization Using a Convolutional Neural Network EMNLP-2018 code	CNNDM, DMQA
Controlling Output Length in Neural Encoder-Decoders EMNLP-2016 code	DUC2004, Gigaword
A Neural Attention Model for Abstractive Sentence Summarization EMNLP-2015	NYT, DUC2004

Coverage

Paper	Datasets Used
MACSUM: Controllable Summarization with Mixed Attributes TACL-2023 code data	CNN Daily Mail, QMSum
SWING: Balancing Coverage and Faithfulness for Dialogue Summarization EACL-2023 code	DIALOG-SUM, SAMSUM
Unsupervised Multi-Granularity Summarization EMNLP-2022 data	GranuDUC, MultiNews, DUC2004, Arxiv
Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities NeurIPS-2022 code data	Multi-LexSum
Controllable Abstractive Dialogue Summarization with Sketch Supervision ACL-IJCNLP-2021 code	SAMSum
SemSUM: Semantic Dependency Guided Neural Abstractive Summarization AAAI-2020 data	Gigaword, DUC2004 and MSR abstractive summarization dataset
Get to the point: Summarization with pointer generator networks ACL-2017 code	CNNDM

Style

Paper	Datasets Used
Overview of the BioLaySumm 2023 Shared Task on Lay Summarization of Biomedical Research Articles ACL-BioNLP(W)-2023	PLOS and eLife
Generating Summaries with Controllable Readability Levels EMNLP-2023 code	CNNDM
HYDRASUM: Disentangling Style Features in Text Summarization with Multi-Decoder Models EMNLP-2022 code	CNN Daily Mail, XSUM, Newsroom
Readability Controllable Biomedical Document Summarization EMNLP-2022 data	TS and PLS
Inference time style control for summarization NAACL-2021 code	CNNDM
Hooks in the Headline: Learning to Generate Headlines with Controlled Styles ACL-2020 code	NYT, CNN
Generating Formality-tuned Summaries Using Input-dependent Rewards CoNLL-2019	CNN Daily Mail + Webis-TLDR-17 corpus
Controllable Abstractive Summarization ACL-NMT(W)-2018	CNN-DailyMail

Abstractivity

Paper	Datasets Used
Controllable Summarization with Constrained Markov Decision Process TACL-2021 code	CNNDM, Newsroom, DUC-2002
Controlling the Amount of Verbatim Copying in Abstractive Summarization AAAI-2020 code	Gigaword, Newsroom
Improving Abstraction in Text Summarization EMNLP-2018	CNNDM
Get to the point: Summarization with pointer generator networks ACL-2017 code	CNNDM
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents AAAI-2017 code	CNN/DM, DUC2002

Salience

Paper	Datasets Used
Incorporating Question Answering-Based Signals into Abstractive Summarization via Salient Span Selection EACL-2023	CNNDM, XSUM, NYTimes
SOCRATIC Pretraining: Question-Driven Pretraining for Controllable Summarization ACL-2023 code	QMSum and SQuALITY
Guiding Generation for Abstractive Text Summarization based on Key Information Guide Network NAACL-HLT-2018	CNNDM
SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents AAAI-2017 code	CNN/DM, DUC2002

Entity

Paper	Datasets Used
SOCRATIC Pretraining: Question-Driven Pretraining for Controllable Summarization ACL-2023 code	QMSum and SQuALITY
Extractive Entity-Centric Summarization as Sentence Selection using Bi-Encoders AACL-2022	EntSum
CTRLSUM: Towards Generic Controllable Text Summarization EMNLP-2022 code	CNNDM, arXiv, BIGPATENT
ENTSUM: A Data Set for Entity-Centric Summarization ACL-2022 code data	CNNDM, NYT
Controllable Summarization with Constrained Markov Decision Process TACL-2021 code	CNNDM, Newsroom, DUC-2002
Controllable Neural Dialogue Summarization with Personal Named Entity Planning EMNLP-2021 code	SAMSum
Controllable Abstractive Sentence Summarization with Guiding Entities COLING-2020 code	Gigaword, DUC2004
Controllable Abstractive Summarization ACL-NMT(W)-2018	CNN-DailyMail

Topic

Paper	Datasets Used
MACSUM: Controllable Summarization with Mixed Attributes TACL-2023 code data	CNN Daily Mail, QMSum
Topic-aware Multimodal Summarization AACL-2022 code data	MSMO
NEWTS: A Corpus for News Topic-Focused Summarization ACL-2022 data	NEWTS
ASPECTNEWS: Aspect-Oriented Summarization of News Documents ACL-2022 code data	ASPECTNEWS
Aspect-controllable opinion summarization EMNLP-2021 code	SPACE, OPOSUM+
Decision-Focused Summarization EMNLP-2021 code data	Yelp's businesses, reviews, and user data
CATS: Customizable Abstractive Topic-based Summarization ACM-2021 code	CNNDM, AMI, ICSI, ADSE
WikiAsp: A Dataset for Multi-domain Aspect-based Summarization TACL-2021 code data	WikiAsp
Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach EMNLP-2020 code	CNN-DailyMail, MA News, All the News
OPINIONDIGEST: A Simple Framework for Opinion Summarization ACL-2020 code	Hotel, Yelp
Read what you need: Controllable Aspect-based Opinion Summarization of Tourist Reviews SIGIR-2020 code data	Tourism Reviews
Generating topic-oriented summaries using neural attention NAACL-HLT-2018	CNNDM
Vocabulary Tailored Summary Generation ACL-2018	CNNDM

Role

Paper	Datasets Used
Other Roles Matter! Enhancing Role-Oriented Dialogue Summarization via Role Interactions ACL-2022 code data	CSDS, MC
Towards Modeling Role-Aware Centrality for Dialogue Summarization AACL-2022 data	CSDS, MC
CSDS: A fine-grained Chinese dataset for customer service dialogue summarization EMNLP-2021 code data	CSDS

Diversity

Paper	Datasets Used
A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation ACL-2022 code	CNN/DailyMail, XSum, and SQuAD (question generation)

Structure

Paper	Datasets Used
STRONG – Structure Controllable Legal Opinion Summary Generation IJCNLP-AACL-2023	CanLII
SentBS: Sentence-level beam search for controllable summarization EMNLP-2022 code	Meta Review Dataset (MReD)
MReD: A Meta-Review Dataset for Structure-Controllable Text Generation ACL-2022 code data	MReD
Planning with Learned Entity Prompts for Abstractive Summarization TACL-2021	CNN/DailyMail, XSum, SAMSum, and BillSum
