MaazAmjad/Urdu-News-Augmented-Dataset

    Annotated Fake News Dataset in Urdu and Augmentation using Machine Translation
                          ===========================

                            March 03, 2020
                            
                       Maaz Amjad, Grigori Sidorov, Alisa Zhila

                   Natural Language and Text Processing Laboratory
                   Center for Computing Research (CIC)
                   Instituto Politécnico Nacional (IPN)
                   Ciudad de México (Mexico City), Mexico  

CONTENTS

  1. Introduction
  2. Feedback
  3. Citation Info
  4. Acknowledgments

1. Introduction

This dataset accompanies the paper by Amjad, M., Sidorov, G., and Zhila, A., "Data Augmentation using Machine Translation for Fake News Detection in the Urdu Language" (2020), LREC 2020 (accepted).

This language resource contains a dataset of 900 news articles originally written in Urdu, each annotated as real or fake. It also contains an augmentation dataset of 400 news articles translated from English to Urdu with the Google Translate MT system, as well as a number of combinations of these datasets for exploring the effect of augmentation. The original English Fake News dataset is available from https://web.eecs.umich.edu/~mihalcea/downloads.html#FakeNews.
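
As an illustration of what "a combination of these datasets" can mean, the following is a minimal sketch that mixes the original articles with a sample of the augmented ones. It is not the procedure used in the paper: the function name `combine`, the `(text, label)` tuple layout, and the placeholder articles are all assumptions made for the example, and no file layout of this repository is implied.

```python
import random
from typing import List, Tuple

# Hypothetical in-memory representation: (article text, label),
# where label is "real" or "fake". Not the repository's file format.
Example = Tuple[str, str]


def combine(original: List[Example],
            augmented: List[Example],
            n_augmented: int,
            seed: int = 0) -> List[Example]:
    """Return all original examples plus a random sample of
    n_augmented augmented examples, shuffled reproducibly."""
    if not 0 <= n_augmented <= len(augmented):
        raise ValueError("n_augmented must be between 0 and len(augmented)")
    rng = random.Random(seed)
    extra = rng.sample(augmented, n_augmented)
    combined = list(original) + extra
    rng.shuffle(combined)
    return combined


# Placeholder data with the sizes stated above (900 original, 400 augmented).
# The alternating labels are illustrative only, not the real class balance.
original = [(f"original article {i}", "real" if i % 2 == 0 else "fake")
            for i in range(900)]
augmented = [(f"translated article {i}", "real" if i % 2 == 0 else "fake")
             for i in range(400)]

mixed = combine(original, augmented, n_augmented=200)
print(len(mixed))
```

Varying `n_augmented` from 0 to 400 gives a family of training sets for comparing a classifier with and without machine-translated data.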


2. Feedback

To learn how this dataset was built (including the crawling and annotation techniques) and how we ran our fake news detection experiments for the Urdu language with it, please read our paper, cited in Section 3.

For further questions or inquiries about this dataset, please contact Maaz Amjad (maazamjad@phystech.edu).


3. Citation Info

This dataset and the accompanying resources are free to use. If you publish work that uses this dataset, please cite the following publication:

@inproceedings{Maazaug2020,
author = {Maaz Amjad and Grigori Sidorov and Alisa Zhila},
title = {Annotated Fake News Dataset in Urdu and Augmentation using Machine Translation},
booktitle = {Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)},
pages = {2530--2535},
year = {2020}
}

4. Acknowledgments

This work was done with the partial support of CONACYT project 240844 and SIP-IPN project 20195719.
