"Rossiya Segodnya" news dataset

This repository contains a news dataset presented in the paper:

Daniil Gavrilov, Pavel Kalaidin, and Valentin Malykh. Self-Attentive Model for Headline Generation. 41st European Conference on Information Retrieval, 2019. arXiv:1901.07786 [cs.CL]

To download the dataset, use the direct link or clone the repository with Git LFS.

Description

The full dataset contains 1,003,869 Russian-language news documents published from January 2010 to December 2014.

ria_20.json contains the first 20 news documents from the dataset.
ria_1k.json contains the first 1000 news documents from the dataset.
ria.json.gz is the full dataset, compressed with gzip.

Dataset format: each row contains a JSON document with two fields: text is the document body, and title is the news headline.
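
The format above can be read with a few lines of Python. This is a minimal sketch, not code shipped with the dataset: it assumes that "each row" means one JSON object per line and that the files are UTF-8 encoded. The helper name iter_documents is hypothetical.

```python
import gzip
import json
from typing import Dict, Iterator


def iter_documents(path: str) -> Iterator[Dict[str, str]]:
    """Yield one document per non-empty line of the file at `path`.

    Assumption: one JSON object per line, UTF-8 encoded.
    Paths ending in ".gz" are decompressed on the fly.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines, if any
                yield json.loads(line)
```

Under these assumptions, iterating over iter_documents("ria_20.json") would yield dictionaries with the keys text and title, and the same call on ria.json.gz would stream the full dataset without unpacking it to disk first.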

License

This data is licensed by the Rossiya Segodnya news agency (ria.ru) under the CC-BY-ND-NC license. The license text is available in the LICENSE file, and a Russian version of the same license is available in LICENSE.ru.

Misc

If you use the data in your research, please consider citing the paper mentioned above:

@inproceedings{gavrilov2018self,
	title={Self-Attentive Model for Headline Generation},
	author={Gavrilov, Daniil and  Kalaidin, Pavel and  Malykh, Valentin},
	booktitle={Proceedings of the 41st European Conference on Information Retrieval},
	year={2019}
}
