Constraint-AAAI-2021-Hostility-Detection-in-Hindi-Using-Multilingual-Finetuned-Features

Published research work at CONSTRAINT 2021, a workshop of the AAAI 2021 (Core A*) conference. Publisher: Springer CCIS. Paper: https://link.springer.com/chapter/10.1007/978-3-030-73696-5_19


Workshop Details

  1. Workshop Name: CONSTRAINT 2021 (Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situation)
  2. Workshop Page: https://constraint-shared-task-2021.github.io/
  3. Conference Name: 35th AAAI Conference on Artificial Intelligence 2021 (Core A*)
  4. Conference Page: https://aaai.org/Conferences/AAAI-21/

Authors

  1. Arkadipta De (https://github.com/Arko98) [IIT Hyderabad M.Tech in Artificial Intelligence] [Main and Corresponding Author]
  2. Venkatesh Elangovan (https://github.com/venkateshelangovan) [IIT Hyderabad M.Tech in Artificial Intelligence]
  3. Kaushal Kumar Maurya (https://github.com/kaushal0494) [IIT Hyderabad PhD in Computer Science and Engineering]
  4. Dr. Maunendra Sankar Desarkar (https://www.iith.ac.in/~maunendra/) [Assistant Professor, Computer Science and Engineering Department, IIT Hyderabad]

Contact:

  1. Arkadipta De - ai20mtech14002@iith.ac.in
  2. Venkatesh E - ai20mtech14005@iith.ac.in
  3. Kaushal Kumar Maurya - cs18resch11003@iith.ac.in

Citation Details

For research purposes, please cite the following:

@InProceedings{10.1007/978-3-030-73696-5_19,
author="De, Arkadipta
and Elangovan, Venkatesh
and Maurya, Kaushal Kumar
and Desarkar, Maunendra Sankar",
editor="Chakraborty, Tanmoy
and Shu, Kai
and Bernard, H. Russell
and Liu, Huan
and Akhtar, Md Shad",
title="Coarse and Fine-Grained Hostility Detection in Hindi Posts Using Fine Tuned Multilingual Embeddings",
booktitle="Combating Online Hostile Posts in Regional Languages during Emergency Situation",
year="2021",
publisher="Springer International Publishing",
address="Cham",
pages="201--212",
abstract="Due to the wide adoption of social media platforms like Facebook, Twitter, etc., there is an emerging need of detecting online posts that can go against the community acceptance standards. The hostility detection task has been well explored for resource-rich languages like English, but is unexplored for resource-constrained languages like Hindi due to the unavailability of large suitable data. We view this hostility detection as a multi-label multi-class classification problem. We propose an effective neural network-based technique for hostility detection in Hindi posts. We leverage pre-trained multilingual Bidirectional Encoder Representations of Transformer (mBERT) to obtain the contextual representations of Hindi posts. We have performed extensive experiments including different pre-processing techniques, pre-trained models, neural architectures, hybrid strategies, etc. Our best performing neural classifier model includes One-vs-the-Rest approach where we obtained 92.60{\%}, 81.14{\%}, 69.59{\%}, 75.29{\%} and 73.01{\%} F1 scores for hostile, fake, hate, offensive, and defamation labels respectively. The proposed model (https://github.com/Arko98/Hostility-Detection-in-Hindi-Constraint-2021) outperformed the existing baseline models and emerged as the state-of-the-art model for detecting hostility in the Hindi posts.",
isbn="978-3-030-73696-5"
}
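
The abstract above summarizes the approach: contextual representations of Hindi posts from pre-trained multilingual BERT, fed to a One-vs-the-Rest classifier. The full implementation is in the notebooks of this repository; the snippet below is only a minimal sketch of that idea using the Hugging Face transformers library and scikit-learn, with an illustrative model name, pooling choice, and classifier head that are not the exact configuration from the paper.

# Minimal sketch (not the notebooks' exact pipeline): mBERT contextual
# representations for Hindi posts plus a One-vs-Rest classifier on top.
import numpy as np
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
mbert = AutoModel.from_pretrained("bert-base-multilingual-cased")

def embed(posts):
    # [CLS]-token embedding as the post representation (an illustrative pooling choice).
    batch = tokenizer(posts, padding=True, truncation=True, max_length=128,
                      return_tensors="pt")
    with torch.no_grad():
        out = mbert(**batch)
    return out.last_hidden_state[:, 0, :].numpy()

# Toy multi-label targets over (fake, hate, offensive, defamation); the real data
# comes from the provided .xlxs files.
posts = ["पहला उदाहरण पोस्ट", "दूसरा उदाहरण पोस्ट", "तीसरा उदाहरण पोस्ट", "चौथा उदाहरण पोस्ट"]
labels = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(embed(posts), labels)
print(ovr.predict(embed(["नया पोस्ट"])))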

Instructions for Fine-Tuning

General Instructions:
  a. Run all code files in Google Colab.
  b. Use the links to download or load the fine-tuned models into the Colab local directory (these fine-tuned models were used to produce the reported results); one way to do this is shown in the sketch below.
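
The model links in this README are Google Drive files and folders. One convenient way to pull them into the Colab working directory is the gdown package; this is a suggestion, not part of the original notebooks, and the output file name below is illustrative.

# Suggested helper (not part of the original notebooks): fetch the released
# checkpoints from Google Drive into the Colab working directory with gdown.
# !pip install gdown
import gdown

# Single-file checkpoint (XLM-R coarse-grained); the output name is illustrative.
gdown.download(
    "https://drive.google.com/file/d/1zJAZBNTgICJdVCUBblMVvICoHN7ivE8c/view?usp=sharing",
    output="XLMR_Coarse_Checkpoint.bin", fuzzy=True, quiet=False)

# Folder checkpoints (e.g. mBERT coarse-grained) are Drive folders, so use download_folder.
gdown.download_folder(
    "https://drive.google.com/drive/folders/1GD-3vX7tV2caTi8k5kF208aeDvqrWNez?usp=sharing",
    quiet=False)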

  1. Load the provided bin_train.xlxs and bin_validate.xlxs into Coarse-Grained_Finetune (mBERT).ipynb and run. It will return two .npy files:
     a. Train_Probs_Coarse_mBERT.npy
     b. Test_Probs_Coarse_mBERT.npy
     Note: Already fine-tuned model link: https://drive.google.com/drive/folders/1GD-3vX7tV2caTi8k5kF208aeDvqrWNez?usp=sharing

  2. Load the provided bin_train.xlxs and bin_validate.xlxs into Coarse-Grained_Finetune (XLMR).ipynb and run. It will return two .npy files:
     a. Train_Probs_Coarse_XLMR.npy
     b. Test_Probs_Coarse_XLMR.npy
     Note: Already fine-tuned model link: https://drive.google.com/file/d/1zJAZBNTgICJdVCUBblMVvICoHN7ivE8c/view?usp=sharing
     Additionally, this code will produce two more .npy files:
     a. Train_Labels.npy (train labels from bin_train.xlxs)
     b. Test_Labels.npy (validation labels from bin_validate.xlxs)

  3. Load the following files into the Colab local directory:
     a. Train_Probs_Coarse_mBERT.npy
     b. Test_Probs_Coarse_mBERT.npy
     c. Train_Probs_Coarse_XLMR.npy
     d. Test_Probs_Coarse_XLMR.npy
     e. Train_Labels.npy
     f. Test_Labels.npy
     Then run Coarse_Grained_Ensemble_Finetune.ipynb (a minimal sketch of this ensemble step follows this list). It will give the following two files:
     a. Task_1_Pred_Labels.npy (predicted labels for bin_validate.xlxs) [0 = non_hostile, 1 = hostile]
     b. Task_1_Best.h5 (ensemble model's saved weights after fine-tuning)
     Note: Already fine-tuned model link: https://drive.google.com/file/d/1Jxbh675wwhYGZ_zqGjtrlxILMcXgSxAF/view?usp=sharing

  4. Using Task_1_Pred_Labels.npy, we then change the bin_validate.xlxs dataset into four versions (Fake, Hate, Defamation and Offensive). Each file has labels such as {Fake, Non-Fake}, {Hate, Non-Hate}, etc. We provide the files as follows:
     a. fake_train.xlxs and fake_validate.xlxs
     b. hate_train.xlxs and hate_validate.xlxs
     c. offensive_train.xlxs and offensive_validate.xlxs
     d. defamation_train.xlxs and defamation_validate.xlxs

  5. Load fileset a, b, or c (from Point 4) into Fine_Grained_Finetune (Fake,Hate,Offensive).ipynb and run three times, once per fileset. Each run returns one .npy file:
     a. Final_Fake_validation_Pred_Label.npy for fileset a
     b. Final_Hate_validation_Pred_Label.npy for fileset b
     c. Final_Offensive_validation_Pred_Label.npy for fileset c
     Note: Model link (for fileset a): https://drive.google.com/file/d/1MakVA0Tq2Lw7NVUbPNYMS6-mIC4xoVt8/view?usp=sharing
     Note: Model link (for fileset b): https://drive.google.com/file/d/1-HQpdByfHOSfK1K_EVDLDUxzAm9yzcYf/view?usp=sharing
     Note: Model link (for fileset c): https://drive.google.com/file/d/1hZHEIdXI3RDwzJwV0xUiqn9Ux9fhmpqN/view?usp=sharing

  6. Load fileset d (from Point 4) into Fine_Grained_Finetune (Defamation).ipynb and run. It will return the following .npy file:
     a. Final_Defamation_validation_Pred_Label.npy for fileset d
     Note: Model link (for fileset d): https://drive.google.com/drive/folders/1lFfOVSikbP-aRWHev_NoEdLYYS38mw6E?usp=sharing

  7. Load the following four files into Final_Result_Validation.ipynb and run:
     a. Final_Fake_validation_Pred_Label.npy
     b. Final_Hate_validation_Pred_Label.npy
     c. Final_Offensive_validation_Pred_Label.npy
     d. Final_Defamation_validation_Pred_Label.npy
     It will return:
     a. answer.csv (final submission file for the validation dataset)
     b. The complete result according to the official evaluation script (released by the organizers)
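
Step 3 above trains a small ensemble over the class probabilities produced by the mBERT and XLM-R notebooks. The exact architecture and hyperparameters are in Coarse_Grained_Ensemble_Finetune.ipynb; the sketch below is only a minimal illustration of the idea, assuming each probability file has shape (N, num_classes), the label files have shape (N,), and using an arbitrary small Keras network. The weighted-F1 print at the end mirrors the kind of score the evaluation reports but is not the official evaluation script.

import numpy as np
import tensorflow as tf
from sklearn.metrics import f1_score

# Concatenate the per-model class probabilities into one feature vector per post.
train_x = np.concatenate([np.load("Train_Probs_Coarse_mBERT.npy"),
                          np.load("Train_Probs_Coarse_XLMR.npy")], axis=1)
test_x = np.concatenate([np.load("Test_Probs_Coarse_mBERT.npy"),
                         np.load("Test_Probs_Coarse_XLMR.npy")], axis=1)
train_y = np.load("Train_Labels.npy")
test_y = np.load("Test_Labels.npy")

# Small dense network over the concatenated probabilities (layer sizes are assumptions).
ensemble = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(train_x.shape[1],)),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # 0 = non_hostile, 1 = hostile
])
ensemble.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
ensemble.fit(train_x, train_y, validation_data=(test_x, test_y),
             epochs=20, batch_size=32)

pred = (ensemble.predict(test_x) > 0.5).astype(int).ravel()
np.save("Task_1_Pred_Labels.npy", pred)
ensemble.save("Task_1_Best.h5")   # the notebook saves its trained ensemble as Task_1_Best.h5
print("Coarse-grained weighted F1:", f1_score(test_y, pred, average="weighted"))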

Instructions for Model-Inference

General Instructions:
  a. Run all code files in Google Colab.
  b. For reproducibility, always load the pre-trained models from the provided links and do not run training; directly run inference.
  c. Use the links to download or load the fine-tuned models into the Colab local directory (these fine-tuned models were used to produce the reported results).
  d. Change paths accordingly.

  1. Load the provided bin_train.xlxs and Validation.xlxs (for producing validation results) / Hindi_Test.xlxs (for producing test results) into Coarse-Grained_Finetune (mBERT).ipynb and run. It will return one .npy file:
     a. Test_Probs_Coarse_mBERT.npy
     Note: Already fine-tuned model link: https://drive.google.com/drive/folders/1GD-3vX7tV2caTi8k5kF208aeDvqrWNez?usp=sharing

  2. Load the provided bin_train.xlxs and Validation.xlxs (for producing validation results) / Hindi_Test.xlxs (for producing test results) into Coarse-Grained_Finetune (XLMR).ipynb and run. It will return one .npy file:
     a. Test_Probs_Coarse_XLMR.npy
     Note: Already fine-tuned model link: https://drive.google.com/file/d/1zJAZBNTgICJdVCUBblMVvICoHN7ivE8c/view?usp=sharing

  3. Load the following files into the Colab local directory:
     a. Test_Probs_Coarse_mBERT.npy
     b. Test_Probs_Coarse_XLMR.npy
     Then run Coarse_Grained_Ensemble_Finetune.ipynb. It will give the following file:
     a. Task_1_Pred_Labels.npy [0 = non_hostile, 1 = hostile]
     Note: Already fine-tuned model link: https://drive.google.com/file/d/1Jxbh675wwhYGZ_zqGjtrlxILMcXgSxAF/view?usp=sharing

  4. Using Task_1_Pred_Labels.npy, separate the predicted hostile posts from Validation.xlxs and create a new data file named Hostile_Validate.xlxs, OR separate the predicted hostile posts from Hindi_Test.xlxs and create a new data file named Hostile_Hindi_Test.xlxs (see the sketch after this list).

  5. Load Hostile_Validate.xlxs (for producing validation results) / Hostile_Hindi_Test.xlxs (for producing test results) into Fine_Grained_Finetune (Fake,Hate,Offensive).ipynb and run three times, once for each setting of the dataset parameter (fake, hate, offensive) and the Label_name parameter (Fake, Hate, Offensive). Each run returns one .npy file:
     a. Pred_Fake_Label.npy for dataset = 'fake' and Label_name = 'Fake'
     b. Pred_Hate_Label.npy for dataset = 'hate' and Label_name = 'Hate'
     c. Pred_Offensive_Label.npy for dataset = 'offensive' and Label_name = 'Offensive'
     Note: Model link (for fake): https://drive.google.com/file/d/1MakVA0Tq2Lw7NVUbPNYMS6-mIC4xoVt8/view?usp=sharing
     Note: Model link (for hate): https://drive.google.com/file/d/1-HQpdByfHOSfK1K_EVDLDUxzAm9yzcYf/view?usp=sharing
     Note: Model link (for offensive): https://drive.google.com/file/d/1hZHEIdXI3RDwzJwV0xUiqn9Ux9fhmpqN/view?usp=sharing

  6. Load defamation_train.xlxs and Hostile_Validate.xlxs (for producing validation results) / Hostile_Hindi_Test.xlxs (for producing test results) into Fine_Grained_Finetune (Defamation).ipynb and run. It will return the following .npy file:
     a. Pred_Defamation_Label.npy for defamation
     Note: Model link (for defamation): https://drive.google.com/drive/folders/1lFfOVSikbP-aRWHev_NoEdLYYS38mw6E?usp=sharing

  7. Load the following files into Final_Result.ipynb and run (change paths accordingly):
     a. Pred_Fake_Label.npy
     b. Pred_Hate_Label.npy
     c. Pred_Offensive_Label.npy
     d. Pred_Defamation_Label.npy
     e. Task_1_Pred_Labels.npy (from Point 3)
     f. Hostile_Validate.xlxs (for producing validation results) / Hostile_Hindi_Test.xlxs (for producing test results)
     It will return:
     a. Result.csv (final submission file for validation OR test)
     b. The complete result according to the official evaluation script (released by the organizers)
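
Steps 4 and 7 above are plain data manipulation around the notebooks: filter the posts the coarse model marked hostile, then map the fine-grained predictions (which exist only for that hostile subset) back onto the full post list. Final_Result.ipynb and the organizers' evaluation script define the actual file formats; the sketch below only illustrates the idea with pandas/NumPy, and the output column names ("Unique ID", "Labels Set"), the .xlsx output name, and the label strings are assumptions.

import numpy as np
import pandas as pd

# --- Step 4: keep only the posts the coarse ensemble predicted as hostile. ---
validation = pd.read_excel("Validation.xlxs")          # or Hindi_Test.xlxs for test results
coarse = np.load("Task_1_Pred_Labels.npy")             # one entry per post, 1 = hostile
hostile_only = validation[coarse == 1].reset_index(drop=True)
# The README names this file Hostile_Validate.xlxs; a standard .xlsx name is used here.
hostile_only.to_excel("Hostile_Validate.xlsx", index=False)

# --- Steps 5-6 (the fine-grained notebooks) then produce the Pred_*_Label.npy files. ---

# --- Step 7: map the fine-grained predictions back onto the full post list. ---
fine = {
    "fake": np.load("Pred_Fake_Label.npy"),
    "hate": np.load("Pred_Hate_Label.npy"),
    "offensive": np.load("Pred_Offensive_Label.npy"),
    "defamation": np.load("Pred_Defamation_Label.npy"),
}
final_labels, j = [], 0                                # j indexes the hostile subset
for is_hostile in coarse:
    if is_hostile == 1:
        tags = [name for name, p in fine.items() if p[j] == 1]
        final_labels.append(",".join(tags) if tags else "hostile")
        j += 1
    else:
        final_labels.append("non-hostile")

# Column names and label strings here are illustrative, not the official submission format.
pd.DataFrame({"Unique ID": np.arange(1, len(final_labels) + 1),
              "Labels Set": final_labels}).to_csv("Result.csv", index=False)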

Fine-Tuned Released Model Checkpoints

  1. mBERT Coarse Grained : https://drive.google.com/drive/folders/1GD-3vX7tV2caTi8k5kF208aeDvqrWNez?usp=sharing
  2. XLMR Coarse Grained : https://drive.google.com/file/d/1zJAZBNTgICJdVCUBblMVvICoHN7ivE8c/view?usp=sharing
  3. Coarse Grained : https://drive.google.com/file/d/1Jxbh675wwhYGZ_zqGjtrlxILMcXgSxAF/view?usp=sharing
  4. Fine Grained Fake : https://drive.google.com/file/d/1MakVA0Tq2Lw7NVUbPNYMS6-mIC4xoVt8/view?usp=sharing
  5. Fine Grained Hate : https://drive.google.com/file/d/1-HQpdByfHOSfK1K_EVDLDUxzAm9yzcYf/view?usp=sharing
  6. Fine Grained Offensive : https://drive.google.com/file/d/1hZHEIdXI3RDwzJwV0xUiqn9Ux9fhmpqN/view?usp=sharing
  7. Fine Grained Defamation: https://drive.google.com/drive/folders/1lFfOVSikbP-aRWHev_NoEdLYYS38mw6E?usp=sharing

Note: The dataset can be found at the official dataset repository: https://github.com/mohit19014/Hindi-Hostility-Detection-CONSTRAINT-2021
