TobeyYang/Yahoo-News-Dataset

Yahoo! News Dataset

2021.1.22: Updated the download link.

This repository contains the Yahoo! News dataset introduced in the paper "Read, Attend and Comment: A Deep Architecture for Automatic News Comment Generation" (EMNLP 2019). The dataset is available here.

We built this dataset by crawling news articles and their associated comments from Yahoo! News. The side information associated with each news article includes:

  • Paragraph. After pre-processing, we retain the paragraph structure of each news article.
  • Category. There are 31 news categories; their distribution is shown in Figure 1.
  • Wiki-Entities. The Wikipedia entities mentioned in the news articles are extracted.
  • Vote. Each comment has upvote, downvote and abusevote information from news readers.
  • Sentiment. Each comment is annotated with POSITIVE, NEGATIVE or NEUTRAL by Yahoo!.
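
The fields listed above could be held in memory roughly as follows. This is only a sketch for illustration: the class names, attribute names, and the category and entity values in the usage note are assumptions, and the released files may be organized differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Sentiment labels named in the description above.
SENTIMENTS = {"POSITIVE", "NEGATIVE", "NEUTRAL"}


@dataclass
class Comment:
    """One reader comment with its vote counts and sentiment label.

    Attribute names are hypothetical, not the dataset's actual keys.
    """
    text: str
    upvotes: int = 0
    downvotes: int = 0
    abusevotes: int = 0
    sentiment: str = "NEUTRAL"

    def __post_init__(self) -> None:
        if self.sentiment not in SENTIMENTS:
            raise ValueError(f"unknown sentiment label: {self.sentiment!r}")


@dataclass
class NewsArticle:
    """One news article with its side information (hypothetical layout)."""
    paragraphs: List[str]              # paragraph structure is retained
    category: str                      # one of the 31 news categories
    wiki_entities: List[str] = field(default_factory=list)
    comments: List[Comment] = field(default_factory=list)

    def top_comment(self) -> Optional[Comment]:
        """Return the comment with the most upvotes, or None if there are none."""
        return max(self.comments, key=lambda c: c.upvotes, default=None)
```

For example, `NewsArticle(paragraphs=["First paragraph."], category="sports", comments=[Comment("Nice game", upvotes=3)]).top_comment()` returns the single comment; the category value here is made up.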

Figure 1: Distribution of the 31 news categories.

After pre-processing, we randomly sample a training set, a validation set, and a test set. Please refer to the paper for more details.
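
A random split of this kind can be sketched as below. The function name, the seed, and the order in which the subsets are carved out are assumptions for illustration; the exact sampling procedure used for the dataset is described in the paper, not here.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_split(
    items: Sequence[T], n_valid: int, n_test: int, seed: int = 0
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle `items` and return (train, valid, test) with fixed subset sizes.

    Hypothetical helper: the seed and slicing order are illustrative choices.
    """
    if n_valid < 0 or n_test < 0 or n_valid + n_test > len(items):
        raise ValueError("validation and test sizes do not fit the data")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, reproducible per seed
    valid = shuffled[:n_valid]
    test = shuffled[n_valid:n_valid + n_test]
    train = shuffled[n_valid + n_test:]
    return train, valid, test
```

Calling `random_split(range(160_515), n_valid=5_000, n_test=3_160)` reproduces the split sizes reported for this dataset (152,355 / 5,000 / 3,160), though not the actual assignment of articles to splits.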

| | Train | Validation | Test |
|---|---|---|---|
| # News | 152,355 | 5,000 | 3,160 |
| Avg. # Comments per News | 20.6 | 20.5 | 20.5 |
| Avg. # Upvotes per Comment | 31.4 | 30.2 | 32.0 |
| Avg. # Downvotes per Comment | 4.8 | 4.8 | 4.9 |
| Avg. # Abusevotes per Comment | 0.05 | 0.05 | 0.05 |
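
As a rough sanity check, the approximate number of comments in each split can be derived from the statistics above. The figures are estimates only, because the reported averages are rounded; the dictionary layout and function names are assumptions for illustration.

```python
# Statistics copied from the table above.
SPLITS = {
    "train": {"news": 152_355, "avg_comments": 20.6},
    "validation": {"news": 5_000, "avg_comments": 20.5},
    "test": {"news": 3_160, "avg_comments": 20.5},
}


def approx_comment_count(split: str) -> int:
    """Estimate the number of comments in a split as news count x average."""
    stats = SPLITS[split]
    return round(stats["news"] * stats["avg_comments"])


def split_share(split: str) -> float:
    """Fraction of all news articles that fall into the given split."""
    total_news = sum(s["news"] for s in SPLITS.values())
    return SPLITS[split]["news"] / total_news


if __name__ == "__main__":
    for name in SPLITS:
        print(f"{name}: ~{approx_comment_count(name):,} comments, "
              f"{split_share(name):.1%} of news")
```

By this estimate the training set holds roughly 3.1 million comments and about 95% of the news articles.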

Citation

If you use this dataset in your research work, please cite our EMNLP2019 paper.

@inproceedings{yang-etal-2019-read,
    title = "Read, Attend and Comment: A Deep Architecture for Automatic News Comment Generation",
    author = "Yang, Ze  and
      Xu, Can  and
      Wu, Wei  and
      Li, Zhoujun",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1512",
    doi = "10.18653/v1/D19-1512",
    pages = "5076--5088",
}

About

Yahoo! news dataset of DeepCom (EMNLP2019)
