Urdu-threat-detection-FIRE2022

                            Urdu Threatening Tweet Dataset
                          ===========================

1. Introduction

This README describes the Urdu Threatening Tweet Dataset, which contains "threatening" and "non-threatening" tweets.

2. Data collection

The "Urdu Threatening Tweet Dataset" contains "threatening" and "non-threatening" tweets in the Urdu language. More details on the data collection are provided in the paper cited in Section 6.

3. Dataset structure

The repository contains Baseline, Training Dataset.xlsx, test_data.xls, Test_data_with_truth_labels.xltx, and Test_with_truth_labels_Statistic.ipynb. The file names indicate the role of each file: for example, Training Dataset.xlsx holds the labeled training tweets, and Test_data_with_truth_labels.xltx holds the test tweets together with their truth labels.

The class distribution for each of the two tasks is shown below:

| Class           | Entries |
|-----------------|---------|
| Threatening     | 1,782   |
| Non-Threatening | 1,782   |
| Total Tweets    | 3,564   |

| Dataset           | Class      | Entries |
|-------------------|------------|---------|
| Group Threat      | Group      | 1,341   |
| Individual Threat | Individual | 441     |
| Total Tweets      |            | 1,782   |
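
As a quick sanity check on the numbers above, the following minimal sketch computes the share of each class per task. The counts are copied from the two tables; the variable and function names are illustrative and not part of the dataset or its baseline code.

```python
# Counts copied from the two class-distribution tables above.
task1_counts = {"Threatening": 1782, "Non-Threatening": 1782}
task2_counts = {"Group": 1341, "Individual": 441}


def class_shares(counts):
    """Return each class's share of the total, as a fraction in [0, 1]."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Every threatening tweet carries exactly one target label, so the second
# table must sum to the "Threatening" row of the first table.
assert sum(task2_counts.values()) == task1_counts["Threatening"]

for name, counts in [("Task 1", task1_counts), ("Task 2", task2_counts)]:
    shares = class_shares(counts)
    summary = ", ".join(f"{label}: {share:.1%}" for label, share in shares.items())
    print(f"{name} -> {summary}")
```

The first task is balanced, while the second is skewed roughly three to one toward group threats, which may be worth keeping in mind when choosing evaluation metrics.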

4. Task and Dataset Description

This shared task consists of two binary classification tasks.

The first task is to classify a given tweet as "threatening" or "non-threatening".
The second task applies only to tweets classified as "threatening": each such tweet should be further classified as a threat directed at an "individual" or a "group".
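
The dependency between the two tasks can be sketched as a simple cascade. This is only an illustration under stated assumptions: `classify_tweet`, `is_threatening`, and `targets_group` are hypothetical names, and the two callables stand in for whatever trained models a participant provides. It is not the baseline shipped with this repository.

```python
from typing import Callable, Optional, Tuple

THREATENING = "threatening"
NON_THREATENING = "non-threatening"


def classify_tweet(
    text: str,
    is_threatening: Callable[[str], bool],
    targets_group: Callable[[str], bool],
) -> Tuple[str, Optional[str]]:
    """Run both tasks in sequence and return (task 1 label, task 2 label).

    The task 2 label is None when the tweet is judged non-threatening,
    because target identification only applies to threatening tweets.
    """
    if not is_threatening(text):
        return NON_THREATENING, None
    target = "group" if targets_group(text) else "individual"
    return THREATENING, target


# Toy stand-ins for real classifiers (placeholders, not real models).
def toy_threat(text: str) -> bool:
    return "THREAT" in text


def toy_group(text: str) -> bool:
    return "ALL" in text


print(classify_tweet("hello", toy_threat, toy_group))
print(classify_tweet("THREAT to ALL", toy_threat, toy_group))
```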

The format of the Training Dataset.xlsx file is as follows:

The first column contains the tweet text to be classified as "threatening" or "non-threatening". If the tweet is classified as "threatening", it should be further classified as a threat directed at an "individual" or a "group".

The second column contains the truth label "1" or "0".

a. The truth label "1" corresponds to a threatening tweet.

b. The truth label "0" corresponds to a non-threatening tweet.

The third column contains the truth label "1", "0", or "2".

a. The truth label "1" corresponds to a threat directed at a "group" (G).

b. The truth label "0" corresponds to a threat directed at an "individual" (Single = S).

c. The truth label "2" corresponds to a non-threatening tweet.
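
The label encoding above can be turned into readable names with a small helper. The sketch below is illustrative: `decode_row` and the two dictionaries are hypothetical names, and the consistency check assumes that label "2" in the third column appears exactly when the second column is "0", which is how the description above reads. Rows would first need to be loaded from Training Dataset.xlsx with a spreadsheet library of your choice (not shown here).

```python
# Label encodings as described for the second and third columns.
THREAT_LABELS = {1: "threatening", 0: "non-threatening"}
TARGET_LABELS = {1: "group", 0: "individual", 2: "non-threatening"}


def decode_row(threat_label, target_label):
    """Map the integer labels of one row to readable names.

    Raises ValueError for unknown labels or for an inconsistent pair,
    for example a non-threatening tweet that carries a target label.
    """
    # Spreadsheet readers may return ints, floats, or strings; normalize to int.
    threat_label, target_label = int(threat_label), int(target_label)
    if threat_label not in THREAT_LABELS or target_label not in TARGET_LABELS:
        raise ValueError(f"unknown label pair: {threat_label}, {target_label}")
    # Assumption: target label 2 is reserved for non-threatening tweets,
    # so it must appear exactly when the threat label is 0.
    if (threat_label == 0) != (target_label == 2):
        raise ValueError(f"inconsistent label pair: {threat_label}, {target_label}")
    return THREAT_LABELS[threat_label], TARGET_LABELS[target_label]


print(decode_row(1, 1))
print(decode_row(1, 0))
print(decode_row(0, 2))
```

Raising on inconsistent pairs instead of silently correcting them makes annotation or parsing problems visible early.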

5. Feedback

If you want to know how this dataset was built (including an explanation of the crawling and annotation techniques) and how we ran our experiments on identifying threatening tweets in Urdu using this dataset, you can read our paper, DOI: 10.1109/ACCESS.2021.3112500

For further questions or inquiries about this dataset, you can contact Maaz Amjad (h.maazamjad@gmail.com)

6. Citation Info

This dataset and the other resources can be used for free, but if you publish a paper using this dataset, please cite this publication:

@ARTICLE{maaz9536729,
  author={Amjad, Maaz and Ashraf, Noman and Zhila, Alisa and Sidorov, Grigori and Zubiaga, Arkaitz and Gelbukh, Alexander},
  journal={IEEE Access}, 
  title={Threatening Language Detection and Target Identification in Urdu Tweets}, 
  year={2021},
  volume={9},
  number={},
  pages={128302-128313},
  doi={10.1109/ACCESS.2021.3112500}}
