Controllable Text Summarization Survey


This repository collects the papers covered in our survey on controllable text summarization (CTS), "Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey".

You can cite our paper as follows:

@misc{urlana2023controllable,
      title={Controllable Text Summarization: Unraveling Challenges, Approaches, and Prospects -- A Survey}, 
      author={Ashok Urlana and Pruthwik Mishra and Tathagato Roy and Rahul Mishra},
      year={2023},
      eprint={2311.09212},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
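As a minimal sketch of how the entry above might be used, the LaTeX document below cites the survey by its key. The file name `references.bib` and the `plain` bibliography style are assumptions chosen for illustration; any style and file name would work.

```latex
% Assumes the BibTeX entry above is saved as references.bib
% (hypothetical file name) next to this .tex file.
\documentclass{article}
\begin{document}

Controllable text summarization is surveyed
in~\cite{urlana2023controllable}.

% "plain" is an arbitrary choice of bibliography style.
\bibliographystyle{plain}
\bibliography{references}

\end{document}
```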

We group the papers by the aspect they control: Length, Coverage, Style, Abstractivity, Salience, Entity, Topic, Role, Diversity, and Structure.

Length

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| MACSUM: Controllable Summarization with Mixed Attributes | TACL-2023 | code, data | CNN Daily Mail, QMSum |
| Abstractive Document Summarization with Summary-length Prediction | EACL-2023 | - | CNNDM, NYT, WikiHow |
| Length Control in Abstractive Summarization by Pretraining Information Selection | ACL-2022 | code | CNN-DailyMail, XSUM |
| Generating Multiple-Length Summaries via Reinforcement Learning for Unsupervised Sentence Summarization | EMNLP-2022 | code | DUC2004 |
| A Character-Level Length-Control Algorithm for Non-Autoregressive Sentence Summarization | NeurIPS-2022 | code | Gigaword, DUC2004 |
| CTRLSUM: Towards Generic Controllable Text Summarization | EMNLP-2022 | code | CNNDM, arXiv, BIGPATENT |
| A New Approach to Overgenerating and Scoring Abstractive Summaries | NAACL-2021 | code, data | Gigaword, Newsroom |
| Controllable Summarization with Constrained Markov Decision Process | TACL-2021 | code | CNNDM, Newsroom, DUC-2002 |
| Lenatten: An effective length controlling unit for text summarization | ACL-2021 | code | CNNDM |
| Interpretable multi headed attention for abstractive summarization at controllable lengths | COLING-2020 | - | MSR Narratives and Thinking-Machines |
| Positional Encoding to Control Output Sequence Length | NAACL-2019 | code | JAMUS corpus (Japanese), with summaries of different character lengths |
| Global Optimization under Length Constraint for Neural Text Summarization | ACL-2019 | - | CNNDM, Mainichi |
| A Large-Scale Multi-Length Headline Corpus for Analyzing Length-Constrained Headline Generation Model Evaluation | INLG-2019 | data | JAMUS corpus (Japanese), with summaries of different character lengths |
| Controllable Abstractive Summarization | ACL-NMT(W)-2018 | - | CNN-DailyMail |
| Unsupervised Sentence Compression using Denoising Auto-Encoders | CoNLL-2018 | code | Gigaword |
| Controlling Length in Abstractive Summarization Using a Convolutional Neural Network | EMNLP-2018 | code | CNNDM, DMQA |
| Controlling Output Length in Neural Encoder-Decoders | EMNLP-2016 | code | DUC2004, Gigaword |
| A Neural Attention Model for Abstractive Sentence Summarization | EMNLP-2015 | - | NYT, DUC2004 |

Coverage

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| MACSUM: Controllable Summarization with Mixed Attributes | TACL-2023 | code, data | CNN Daily Mail, QMSum |
| SWING: Balancing Coverage and Faithfulness for Dialogue Summarization | EACL-2023 | code | DIALOG-SUM, SAMSUM |
| Unsupervised Multi-Granularity Summarization | EMNLP-2022 | data | GranuDUC, MultiNews, DUC2004, Arxiv |
| Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities | NeurIPS-2022 | code, data | Multi-LexSum |
| Controllable Abstractive Dialogue Summarization with Sketch Supervision | ACL-IJCNLP-2021 | code | SAMSum |
| SemSUM: Semantic Dependency Guided Neural Abstractive Summarization | AAAI-2020 | data | Gigaword, DUC2004, and MSR abstractive summarization dataset |
| Get to the point: Summarization with pointer generator networks | ACL-2017 | code | CNNDM |

Style

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| Overview of the BioLaySumm 2023 Shared Task on Lay Summarization of Biomedical Research Articles | ACL-BioNLP(W)-2023 | - | PLOS and eLife |
| Generating Summaries with Controllable Readability Levels | EMNLP-2023 | code | CNNDM |
| HYDRASUM: Disentangling Style Features in Text Summarization with Multi-Decoder Models | EMNLP-2022 | code | CNN Daily Mail, XSUM, Newsroom |
| Readability Controllable Biomedical Document Summarization | EMNLP-2022 | data | TS and PLS |
| Inference time style control for summarization | NAACL-2021 | code | CNNDM |
| Hooks in the Headline: Learning to Generate Headlines with Controlled Styles | ACL-2020 | code | NYT, CNN |
| Generating Formality-tuned Summaries Using Input-dependent Rewards | CoNLL-2019 | - | CNN Daily Mail + Webis-TLDR-17 corpus |
| Controllable Abstractive Summarization | ACL-NMT(W)-2018 | - | CNN-DailyMail |

Abstractivity

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| Controllable Summarization with Constrained Markov Decision Process | TACL-2021 | code | CNNDM, Newsroom, DUC-2002 |
| Controlling the Amount of Verbatim Copying in Abstractive Summarization | AAAI-2020 | code | Gigaword, Newsroom |
| Improving Abstraction in Text Summarization | EMNLP-2018 | - | CNNDM |
| Get to the point: Summarization with pointer generator networks | ACL-2017 | code | CNNDM |
| SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents | AAAI-2017 | code | CNN/DM, DUC2002 |

Salience

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| Incorporating Question Answering-Based Signals into Abstractive Summarization via Salient Span Selection | EACL-2023 | - | CNNDM, XSUM, NYTimes |
| SOCRATIC Pretraining: Question-Driven Pretraining for Controllable Summarization | ACL-2023 | code | QMSum and SQuALITY |
| Guiding Generation for Abstractive Text Summarization based on Key Information Guide Network | NAACL-HLT-2018 | - | CNNDM |
| SummaRuNNer: A Recurrent Neural Network based Sequence Model for Extractive Summarization of Documents | AAAI-2017 | code | CNN/DM, DUC2002 |

Entity

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| SOCRATIC Pretraining: Question-Driven Pretraining for Controllable Summarization | ACL-2023 | code | QMSum and SQuALITY |
| Extractive Entity-Centric Summarization as Sentence Selection using Bi-Encoders | AACL-2022 | - | EntSum |
| CTRLSUM: Towards Generic Controllable Text Summarization | EMNLP-2022 | code | CNNDM, arXiv, BIGPATENT |
| ENTSUM: A Data Set for Entity-Centric Summarization | ACL-2022 | code, data | CNNDM, NYT |
| Controllable Summarization with Constrained Markov Decision Process | TACL-2021 | code | CNNDM, Newsroom, DUC-2002 |
| Controllable Neural Dialogue Summarization with Personal Named Entity Planning | EMNLP-2021 | code | SAMSum |
| Controllable Abstractive Sentence Summarization with Guiding Entities | COLING-2020 | code | Gigaword, DUC2004 |
| Controllable Abstractive Summarization | ACL-NMT(W)-2018 | - | CNN-DailyMail |

Topic

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| MACSUM: Controllable Summarization with Mixed Attributes | TACL-2023 | code, data | CNN Daily Mail, QMSum |
| Topic-aware Multimodal Summarization | AACL-2022 | code, data | MSMO |
| NEWTS: A Corpus for News Topic-Focused Summarization | ACL-2022 | data | NEWTS |
| ASPECTNEWS: Aspect-Oriented Summarization of News Documents | ACL-2022 | code, data | ASPECTNEWS |
| Aspect-controllable opinion summarization | EMNLP-2021 | code | SPACE, OPOSUM+ |
| Decision-Focused Summarization | EMNLP-2021 | code, data | Yelp's businesses, reviews, and user data |
| CATS: Customizable Abstractive Topic-based Summarization | ACM-2021 | code | CNNDM, AMI, ICSI, ADSE |
| WikiAsp: A Dataset for Multi-domain Aspect-based Summarization | TACL-2021 | code, data | WikiAsp |
| Summarizing Text on Any Aspects: A Knowledge-Informed Weakly-Supervised Approach | EMNLP-2020 | code | CNN-DailyMail, MA News, All the News |
| OPINIONDIGEST: A Simple Framework for Opinion Summarization | ACL-2020 | code | Hotel, Yelp |
| Read what you need: Controllable Aspect-based Opinion Summarization of Tourist Reviews | SIGIR-2020 | code, data | Tourism Reviews |
| Generating topic-oriented summaries using neural attention | NAACL-HLT-2018 | - | CNNDM |
| Vocabulary Tailored Summary Generation | ACL-2018 | - | CNNDM |

Role

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| Other Roles Matter! Enhancing Role-Oriented Dialogue Summarization via Role Interactions | ACL-2022 | code, data | CSDS, MC |
| Towards Modeling Role-Aware Centrality for Dialogue Summarization | AACL-2022 | data | CSDS, MC |
| CSDS: A fine-grained Chinese dataset for customer service dialogue summarization | EMNLP-2021 | code, data | CSDS |

Diversity

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| A Well-Composed Text is Half Done! Composition Sampling for Diverse Conditional Generation | ACL-2022 | code | CNN/DailyMail, XSum, and SQuAD (question generation) |

Structure

| Paper | Venue | Resources | Datasets Used |
| --- | --- | --- | --- |
| STRONG – Structure Controllable Legal Opinion Summary Generation | IJCNLP-AACL-2023 | - | CanLII |
| SentBS: Sentence-level beam search for controllable summarization | EMNLP-2022 | code | Meta Review Dataset (MReD) |
| MReD: A Meta-Review Dataset for Structure-Controllable Text Generation | ACL-2022 | code, data | MReD |
| Planning with Learned Entity Prompts for Abstractive Summarization | TACL-2021 | - | CNN/DailyMail, XSum, SAMSum, and BillSum |
