MisinformationCorpusSinhala

A dataset consisting of 3576 documents in Sinhala, drawn from Sri Lankan news websites and fact-checking operations, each annotated as CREDIBLE, FALSE, PARTIAL or UNCERTAIN. Each record has fields for the content of the document, its classification, the web domain from which the document was retrieved, and the date on which the document was published.

Paper (covering methodology and results of machine learning classification): https://lirneasia.net/2021/07/a-corpus-and-machine-learning-models-for-fake-news-classification-in-sinhala/

Update as of Nov 2022: please note that some parts of the original corpus were corrupted, for reasons unknown to us. This repo restores the files.
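
As a quick sanity check after downloading, it can help to count how many documents carry each label. The following is a minimal sketch using only the Python standard library. The helper name `count_labels` is hypothetical, and the header used for the classification column is not stated in this README, so it is passed in as a parameter; check the first row of Corpus.csv for the actual name.

```python
import csv
from collections import Counter

# The four annotation labels named in this README.
LABELS = ("CREDIBLE", "FALSE", "PARTIAL", "UNCERTAIN")


def count_labels(csv_file, label_column):
    """Count documents per label in an already-open CSV file object.

    label_column is an assumption supplied by the caller: this README does
    not state the header names, so read them from the file's first row.
    """
    reader = csv.DictReader(csv_file)
    headers = reader.fieldnames or []
    if label_column not in headers:
        raise KeyError(f"column {label_column!r} not found; headers are {headers}")

    counts = Counter()
    for row in reader:
        # Normalise case and whitespace; anything unexpected goes to OTHER
        # so that damaged or empty cells are visible rather than dropped.
        label = (row[label_column] or "").strip().upper()
        counts[label if label in LABELS else "OTHER"] += 1
    return counts
```

To run it against the corpus, open the file with `open("Corpus.csv", newline="", encoding="utf-8-sig")` and pass the file object together with the real header name. The UTF-8 encoding is also an assumption. A non-zero OTHER count suggests rows worth inspecting by hand.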

Using this work

This dataset is released under a CC BY 4.0 license. This license allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. For more information, see https://creativecommons.org/licenses/by/4.0/

Citing this work

@misc{jayawickrama2021sinhala,
   title={A corpus and machine learning models for fake news classification in {Sinhala}},
   author={Vihanga Jayawickrama and Asanka Ranasinghe and Dimuthu C. Attanayake and Yudhanjaya Wijeratne},
   year={2021},
   primaryClass={cs.CL}
}
