DivSumm summarization dataset

Dataset introduced in the paper "Analyzing the Dialect Diversity in Multi-document Summaries" (COLING 2022) by Olubusayo Olabisi, Aaron Hudson, Antonie Jetter, and Ameeta Agrawal.

DivSumm is a novel dataset consisting of dialect-diverse tweets and human-written extractive and abstractive summaries. It contains 90 tweets on each of 25 topics, written in three English dialects (African-American, Hispanic and White), with two reference summaries per input.

Directories

input_docs - 90 tweets per topic, evenly distributed among the 3 dialects (30 per dialect); 25 topics in total

abstractive - Two annotators were asked to summarize each topic in 5 sentences using their own words.

extractive - Two annotators were asked to select 5 tweets from each topic that summarized the input tweets.
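
As a rough sketch of how the layout above might be explored after cloning, the Python snippet below derives the corpus-level sizes implied by the figures in this README and lists whatever files sit under the three directories. The helper names (`expected_counts`, `list_dataset_files`) and the local path `DivSumm` are hypothetical, and because file names and formats inside each directory are not described here, the code only enumerates paths and does not try to parse them.

```python
from pathlib import Path

# Figures stated in this README.
NUM_TOPICS = 25
TWEETS_PER_TOPIC = 90
NUM_DIALECTS = 3

# Top-level directories named in this README.
SUBDIRS = ("input_docs", "abstractive", "extractive")


def expected_counts():
    """Derive corpus-level sizes from the per-topic figures above."""
    total = NUM_TOPICS * TWEETS_PER_TOPIC
    return {
        "tweets_total": total,
        "tweets_per_dialect_per_topic": TWEETS_PER_TOPIC // NUM_DIALECTS,
        "tweets_per_dialect_total": total // NUM_DIALECTS,
    }


def list_dataset_files(root):
    """Map each top-level directory name to the sorted files found beneath it.

    Hypothetical helper: it makes no assumption about file names or formats,
    and returns an empty list for a directory that is missing.
    """
    root = Path(root)
    listing = {}
    for name in SUBDIRS:
        folder = root / name
        if folder.is_dir():
            listing[name] = sorted(p for p in folder.rglob("*") if p.is_file())
        else:
            listing[name] = []
    return listing


if __name__ == "__main__":
    print(expected_counts())
    # "DivSumm" is an assumed path to a local clone of this repository.
    for name, files in list_dataset_files("DivSumm").items():
        print(f"{name}: {len(files)} file(s)")
```

Inspecting the listed files first, before writing any parser, avoids guessing at a per-topic or per-annotator file structure that this README does not specify.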

Paper

You can find our paper here. If you use this dataset in your work, please cite our paper:

@inproceedings{olabisi-etal-2022-analyzing,
    title = "Analyzing the Dialect Diversity in Multi-document Summaries",
    author = "Olabisi, Olubusayo  and Hudson, Aaron  and Jetter, Antonie  and Agrawal, Ameeta",
    booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
    month = oct,
    year = "2022",
}
