GBV-Resources

This repository serves as a comprehensive collection of resources for the automated identification of online Gender-Based Violence (GBV) and related phenomena.

For further details, see our systematic review paper:

Gavin Abercrombie, Aiqi Jiang, Poppy Gerrard-Abbott, Ioannis Konstas, and Verena Rieser. 2023. Resources for Automated Identification of Online Gender-Based Violence: A Systematic Review. Proceedings of the 7th Workshop on Online Abuse and Harms (WOAH). Association for Computational Linguistics.

BibTeX:

@inproceedings{abercrombie-etal-2023-resources,
    title = {Resources for Automated Identification of Online {G}ender-{B}ased {V}iolence: A Systematic Review},
    author = {Abercrombie, Gavin and Jiang, Aiqi and Gerrard-Abbott, Poppy and Konstas, Ioannis and Rieser, Verena},
    booktitle = {Proceedings of the 7th Workshop on Online Abuse and Harms (WOAH)},
    month = {July},
    year = {2023},
    address = {Toronto},
    publisher = {Association for Computational Linguistics},
}

Contribute to this list

Something missing?

You can contribute to this list by editing this file and making a pull request.

Please follow this template and add details at the bottom of the list.

Template:

| Reference | Title | Dataset URL | GBV characterisation | Platform | Language | Modality | Sampling | Date of data | Annotators | IRB | Non-aggregated labels | Data Statement |

Reference: Link to publication or description of the resource
Title: Publication or dataset name
Dataset URL: Link to the dataset
GBV characterisation: How GBV is described (e.g. 'misogyny', 'gender' as a hate speech target)
Platform: e.g. Twitter, TikTok etc.
Language: e.g. Basque, Scottish Gaelic, Mi’kmaq etc.
Modality: e.g. Text, Video, Meme etc.
Sampling: How the source data was sampled e.g. keywords, targeted accounts, random etc.
Date of data: The dates between which the source data was produced/published
Annotators: The number and type of annotators e.g. 3 students, 10000 crowdworkers, 5 per item etc.
IRB: Whether the study passed ethical review by an Institutional Review Board or similar
Non-aggregated labels: Whether the non-aggregated labels have been released
Data Statement: Whether there is a data statement (Bender & Friedman, TACL 2018) describing the resource
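
As a rough illustration of the template above, the following Python sketch assembles the thirteen fields into a single pipe-delimited row. The helper name `format_row` and the sample entry are hypothetical and not part of this repository; the values are placeholders, not a real dataset.

```python
# Column names copied from the template, in order.
COLUMNS = [
    "Reference", "Title", "Dataset URL", "GBV characterisation", "Platform",
    "Language", "Modality", "Sampling", "Date of data", "Annotators", "IRB",
    "Non-aggregated labels", "Data Statement",
]


def format_row(entry: dict) -> str:
    """Return one pipe-delimited table row (hypothetical helper)."""
    missing = [name for name in COLUMNS if name not in entry]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    # Escape literal pipes so a value cannot split into extra cells.
    cells = [str(entry[name]).strip().replace("|", "\\|") for name in COLUMNS]
    return "| " + " | ".join(cells) + " |"


# Placeholder values only; this does not describe a real resource.
example = {name: "Unknown" for name in COLUMNS}
example.update({
    "Reference": "Author et al., 2024",
    "Title": "Example corpus",
    "Dataset URL": "N/A",
    "IRB": "No",
    "Non-aggregated labels": "No",
    "Data Statement": "No",
})
print(format_row(example))
```

Raising an error on missing fields, rather than silently leaving cells empty, makes it easier to spot an incomplete entry before opening a pull request. Fields whose value is not known can be entered as "Unknown", as in the existing rows.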

Datasets

Reference	Title	Dataset URL	GBV characterisation	Platform	Language	Modality	Sampling	Date of data	Annotators	IRB	Non-aggregated labels	Data Statement
Al-Hassan and Al-Dossari, 2022	Detection of hate speech in Arabic tweets using deep learning	N/A	Sexism	Twitter	Arabic	Text	Keywords	Unknown	2 volunteers	No	No	No
Almanea and Poesio, 2022	ArMIS - The Arabic Misogyny and Sexism Corpus with Annotator Subjective Disagreements	https://codalab.lisn.upsaclay.fr/competitions/6146#learn_the_details-get_starting_kit	Misogyny, Sexism	Twitter	Arabic	Text	Keywords	October 2020	3 main annotators and 32 others. Self-defined beliefs and gender	No	Yes	No
Alsafari et al., 2020	Hate and offensive speech detection on Arabic social media	https://github.com/sbalsefri/ArabicHateSpeechDataset	Gender as category	Twitter	Arabic (Gulf)	Text	keywords, hashtags, profiles	April - September 2019	3: 2 women, 1 man	No	No	No
Anzovino et al., 2018	Automatic Identification and Classification of Misogynistic Language on Twitter	https://amievalita2018.wordpress.com/data/	Misogyny	Twitter	English	Text	keywords, hashtags, mentions of potentially harassed users, self-declared misogynist profiles	2017	3 experts + crowdworkers	No	No	Yes
Assenmacher et al., 2021	RP-Mod & RP-Crowd: Moderator- and Crowd-Annotated German News Comment Datasets	https://zenodo.org/record/5291339#.Y6RfyOLP3S6	Sexism	Rheinische Post	German	Text	Comments blocked by community managers	Nov. 2018 - June 2020	5 per item	No	No	No
Basile et al., 2019	SemEval-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in Twitter	https://competitions.codalab.org/competitions/19935	Women as target	Twitter	English, Spanish	Text	Victims of hate accounts; identified haters; keywords	July 2018 - Sept. 2018 + from earlier datasets	Crowdworkers	No	No	No
Bhattacharya et al., 2020	Developing a Multilingual Annotated Corpus of Misogyny and Aggression	https://sites.google.com/view/trac2/shared-task?pli=1	Misogyny	Facebook, Twitter, YouTube	Bangla, English, Hindi, code-mixed	Text	Topics	Unknown	4 linguists 'expected to have a centrist or left-leaning political orientation'	No	No	No
Borkan et al., 2019	Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification	https://www.tensorflow.org/datasets/catalog/civil_comments	Gender as category, subgroups: Male, Female, Transgender, Other gender	Comment forums	English	Text	Unknown	Unknown	Crowdworkers	Unknown	No	No
Bosco et al., 2018	Overview of the EVALITA 2018 Hate Speech Detection Task	http://www.di.unito.it/~tutreeb/haspeede-evalita18/data.html	'Gender issues'-based hate	Facebook, Twitter	Italian	Text	Facebook: targeted pages and groups; Twitter: keywords	Facebook: 2016; Twitter: 2017-2018	Facebook: bachelor students; Twitter: experts and crowdworkers	No	No	No
Cercas Curry et al., 2021	ConvAbuse: Data, Analysis, and Benchmarks for Nuanced Abuse Detection in Conversational AI	https://github.com/amandacurry/convabuse	Sexism, sexual harassment	Dialogue systems: ELIZA, CarbonBot (Facebook)	English	Text	Stratified keyword	CarbonBot: Oct. 2019 - Dec. 2020; ELIZA: Dec. 2002 - Nov. 2007	6 female and 2 non-binary Gender Studies students	Yes	Yes	Yes
Chiril et al., 2021	"Be nice to your wife! The restaurants are closed": Can Gender Stereotype Detection Improve Sexism Classification?	https://bit.ly/FrenchGenderStereotypes	Sexism	Twitter	French	Text	Keywords, personal names, hashtags	Unknown	1 male, 1 female students in Linguistics and Communication and Gender	No	No	No
Chiril et al., 2019	Multilingual and Multitarget Hate Speech Detection in Tweets	N/A	Sexism	Twitter	French	Text	Keywords	Oct. 2017 - May 2018	2 female and 1 male students in Communication and Gender	No	No	No
Chiril et al., 2020	An Annotated Corpus for Sexism Detection in French Tweets	https://github.com/patriChiril/An-Annotated-Corpus-for-Sexism-Detection-in-French-Tweets	Sexism	Twitter	French	Text	Keywords, hashtags, personal names	Oct. 2017 - May 2018	3 female and 2 male Communication and Gender students	No	No	No
Chung and Lin, 2021	TOCAB: A Dataset for Chinese Abusive Language Processing	http://nlp.cse.ntou.edu.tw/resources/TOCAB/	Sex (gender, sexual orientation, or gender identity) as abuse category	PTT (Taiwanese bulletin board)	Chinese	Text	Popular posts	Mar. 2019 - June 2019	12 students	No	No	No
Das et al., 2022	Hate Speech and Offensive Language Detection in Bengali	https://github.com/hate-alert/Bengali_Hate	Gender as target	Twitter	Bengali	Text	Keywords	Unknown	4 Computer Science students	No	No	No
El Ansari et al., 2020	A Dataset to Support Sexist Content Detection in Arabic Text	N/A	Sexism, discrimination and Violence Against Women	Twitter	Arabic	Text	Keywords	2018	Volunteers	No	No	No
Fanton et al., 2021	Human-in-the-Loop for Data Collection: a Multi-Target Counter Narrative Dataset to Fight Online Hate Speech	https://github.com/marcoguerini/CONAN	Women as target	Semi-synthetic text	English	Text	Unknown	Unknown	3 student interns	No	No	No
Fersini et al., 2018	Overview of the Task on Automatic Misogyny Identification at IberEval 2018	https://amiibereval2018.wordpress.com/important-dates/data/	Misogyny	Twitter	English, Spanish	Text	Keywords	July 2017 - Nov. 2017	Unknown + crowdworkers	No	No	No
Fersini et al., 2020	AMI @ EVALITA2020: Automatic Misogyny Identification	https://github.com/dnozza/ami2020	Misogyny	Twitter	Italian	Text	Unknown	2018 + 2020	Unknown	No	No	No
Fersini et al., 2022	SemEval-2022 Task 5: Multimedia Automatic Misogyny Identification	https://competitions.codalab.org/competitions/34175	Misogyny	Twitter, Reddit; meme sites e.g., 9GAG, Knowyourmeme, Imgur	English	Memes	Threads with women as the subject; anti-women accounts; target victim accounts; keywords and hashtags	Unknown	Unknown	No	No	No
García-Díaz et al., 2021	Detecting misogyny in Spanish tweets. An approach based on linguistics features and word embeddings	https://pln.inf.um.es/corpora/misogyny/misocorpus-spanish-2020.rar	Misogyny	Twitter	Spanish	Text	Target accounts; geographical locations; keywords	Unknown	5 women, 2 men: authors, 2 colleagues, 1 student	No	No	No
Gomez et al., 2021	Exploring hate speech detection in multimodal publications	https://drive.google.com/file/d/1S9mMhZFkntNnYdO-1dZXwF_8XIiFcmlF	Sexism	Twitter	English	Image + text	Keywords	Sept. 2018 - Feb. 2019	3 crowdworkers per item	No	No	No
Gong et al., 2021	Abusive Language Detection in Heterogeneous Contexts: Dataset Collection and the Role of Supervised Attention	N/A	Gender and sexuality as target	YouTube	English	Text	Keywords	2017	17 psychology students, incl. 3 graduate students studying bullying and related phenomena	No	No	No
Grosz and Conde-Cespedes, 2020	Automatic Detection of Sexist Statements Commonly Used at the Workplace	https://github.com/dylangrosz/Automatic_Detection_of_Sexist_Statements_Commonly_Used_at_the_Workplace	Sexism	Twitter, work-related quotes, press quotes, faculty/student submissions	English	Text	Keywords	Unknown	Authors: 1 male, 1 female	No	No	No
Guellil et al., 2021	Sexism detection: The first corpus in Algerian dialect with a code-switching in Arabic/ French and English	N/A	Sexism	YouTube	Arabic (Algerian)	Text	Keywords and manually selected video IDs	Feb. - Mar. 2019	3 Algerian Arabic speakers	No	No	No
Guest et al., 2021	An Expert Annotated Dataset for the Detection of Online Misogyny	https://github.com/ellamguest/online-misogyny-eacl2021	Misogyny	Reddit	English	Text	Targeted subreddits	Feb. - May 2020	6 annotators trained (by the authors) to identify misogynistic content	No	No	Yes
Hewitt et al., 2016	The problem of identifying misogynist language on Twitter (and other online social spaces)	N/A	Misogyny	Twitter	English	Text	Keywords	Unknown	1 researcher	No	No	No
Höfels et al., 2022	CoRoSeOf - An Annotated Corpus of Romanian Sexist and Offensive Tweets	https://aclanthology.org/2022.lrec-1.243/	Sexism	Twitter	Romanian	Text	Keywords	May - Sept. 2021	10: 7 female, 3 male students in Languages and Literature and Modern Applied Languages 'with an interest/knowledge in gender studies'	No	Yes	Partial
Ibrohim et al., 2019	Multi-label Hate Speech and Abusive Language Detection in Indonesian Twitter	https://github.com/okkyibrohim/id-multi-label-hate-speech-and-abusive-language-detection	Hate based on gender as category	Twitter	Indonesian	Text	Previous datasets + keywords	Mar. - Oct. 2018 + old data	30 from different backgrounds. 3 per item.	No	No	No
Jha and Mamidi, 2017	When does a compliment become sexist? Analysis and classification of ambivalent sexism using Twitter data	https://github.com/AkshitaJha/NLP_CSS_2017	Sexism	Twitter	English	Text	Keywords	Unknown	Authors + three 23-year-old non-activist feminists	No	No	No
Jiang et al., 2022	SWSR: A Chinese dataset and lexicon for online sexism detection	https://zenodo.org/record/4773875#.Y5DTMYLP3ao	Sexism	Sina Weibo	Chinese	Text	Keywords	June 2015 - June 2020	3: 2 female and 1 male PhD students	No	No	No
Jeong et al., 2022	KOLD: Korean Offensive Language Dataset	https://github.com/boychaboy/KOLD	Gender and sexual orientation as target	NAVER news, YouTube	Korean	Text	Keywords	Mar. 2020 - Mar. 2022	3,124 crowdworkers	Yes	No	No
Kennedy et al., 2020	Constructing interval variables via faceted Rasch measurement and multitask deep learning: a hate speech application	https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech	Gender identity as target, sexist speech	Twitter, Reddit, YouTube	English	Text	Stratified sampling with identity relevance and hate speech hypothesis scores	Mar. - Aug. 2019	7,912 crowdworkers	No	Yes	No
Kennedy et al., 2022	Introducing the Gab Hate Corpus: defining and applying hate-based rhetoric to social media posts at scale	https://osf.io/edua3/	Gender identity as target	Gab	English	Text	Targeted data source (Gab)	Jan. - Oct. 2018	Min. 3 per item: undergraduate research assistants	No	No	No
Kirk et al., 2023	SemEval-2023 Task 10: Explainable Detection of Online Sexism	https://github.com/rewire-online/edos	Sexism	Gab, Reddit	English	Text	Targeted data sources	Aug. 2016 - Oct. 2018	19 women	No	Yes	Yes
Kumar et al., 2018	Aggression-annotated Corpus of Hindi-English Code-mixed Data	https://github.com/kraiyani/Facebook-Post-Aggression-Identification	Gendered aggression	Facebook, Twitter	Hindi-English	Text	Targeted pages, hashtags	Unknown	4 PhD Linguistics students	No	No	No
Kwarteng et al., 2022	Misogynoir: challenges in detecting intersectional hate	https://github.com/kwartengj/Snam2022	Misogynoir (misogyny aimed at Black women)	Twitter	English	Text	Unknown	Unknown	3	No	No	No
Lee et al., 2022	K-MHaS: A Multi-label Hate Speech Detection Dataset in Korean Online News Comment	https://github.com/adlnlp/K-MHaS	Gender or sexual orientation as category	Korean entertainment news aggregation platform, Korean News Comments	Korean	Text	Random	Jan. 2018 - June 2020	5 Korean speakers	No	No	Partial
Leite et al., 2020	Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis	https://github.com/JAugusto97/ToLD-Br	Misogyny	Twitter	Portuguese (Brazilian)	Text	Keywords, hashtags, targeted users	July - Aug. 2019	42 volunteers at a university; 3 per item	No	Yes	Yes
Lynn et al., 2019	Urban Dictionary definitions dataset for misogyny speech detection	https://ieee-dataport.org/documents/urban-dictionary-definitions-dataset-misogyny-speech-detection	Misogyny	Urban Dictionary	English	Text	Keywords	1999 - 2006	3 independent researchers	No	No	No
Mathew et al., 2021	HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection	https://github.com/hate-alert/HateXplain	Women as target	Gab, Twitter	English	Text	Keywords	Gab: unknown; Twitter: Jan. 2019 - June 2020	253 crowdworkers	No	No	No
Mulki and Ghanem, 2021	Let-Mi: An Arabic Levantine Twitter Dataset for Misogynistic Language	https://github.com/bilalghanem/let-mi	Misogyny	Twitter	Arabic (Levantine)	Text	Targeted journalists' accounts	Oct. - Nov. 2019	3 Levantine speakers: 1 male, 2 female	No	No	No
Mollas et al., 2022	ETHOS: a multi-label hate speech detection dataset	https://github.com/intelligence-csd-auth-gr/Ethos-Hate-Speech-Dataset	Gender as category	Reddit, YouTube	English	Text	Automated classification, targeted subreddits	Unknown - Oct. 2017	Crowdworkers - 5 per item	No	No	No
Moon et al., 2020	BEEP! Korean Corpus of Online News Comments for Toxic Speech Detection	https://github.com/kocohub/korean-hate-speech	'Gender bias' as category	NAVER news	Korean	Text	Article views, stratified sampling by Wilson score; random	Jan. 2018 - Feb. 2020	32: 29 crowdworkers, 3 NLP researchers; 3 per item	No	No	No
Ousidhoum et al., 2019	Multilingual and Multi-Aspect Hate Speech Analysis	https://github.com/HKUST-KnowComp/MLMA_hate_speech	Gender as target	Twitter	Arabic, English, French	Text	Keywords	Unknown	Crowdworkers; 3 per item	No	No	No
Petrak and Krenn, 2022	Misogyny classification of German newspaper forum comments	N/A	Misogyny, sexism	Austrian newspaper	German	Text	Unknown	Unknown	8: 7 experienced moderators, 3 male, 5 female	No	No	No
Plaza et al., 2023	Overview of EXIST 2023: sEXism Identification in Social NeTworks	http://nlp.uned.es/exist2023/	Sexism	Gab, Twitter	English, Spanish	Text	Keywords, random	Sept. 2021 - Sept. 2022	6 crowdworkers: 2 social/demographic parameters: gender (male/female), age (18-22/23-45/46+)	No	No	No
de Pelle and Moreira, 2017	Offensive Comments in the Brazilian Web: a dataset and baseline results	https://github.com/rogersdepelle/OffComBR	Sexism as category	Globo news	Portuguese (Brazilian)	Text	Targeted website sections	Unknown	5 volunteers; 3 per item	No	No	No
Rizwan et al., 2020	Hate-Speech and Offensive Language Detection in Roman Urdu	https://github.com/haroonshakeel/roman_urdu_hate_speech	Sexism as category	Twitter	Urdu	Text	Keywords	Unknown	3	No	No	No
Rodríguez-Sánchez et al., 2020	Automatic Classification of Sexism in Social Networks: An Empirical Study on Twitter Data	https://github.com/franciscorodriguez92/MeTwo	Sexism	Twitter	Spanish	Text	Keywords, random	July - Dec. 2018	4	No	No	No
Rodríguez-Sánchez et al., 2021	Overview of EXIST 2021: sEXism Identification in Social neTworks	http://nlp.uned.es/exist2021/	Sexism	Gab, Twitter	English, Spanish	Text	Keywords, hashtags	Twitter: Dec. 2020 - Feb. 2021; Gab: Sept. 2016 - Aug. 2019 (Spanish), Aug. 2016 - Aug. 2019 (English)	7: 5 crowdworkers, 2 experts in gender issues (1 man, 1 woman)	No	No	No
Rodríguez-Sánchez et al., 2022	Overview of EXIST 2022: sEXism Identification in Social neTworks	http://nlp.uned.es/exist2022/	Sexism	Gab, Twitter	English, Spanish	Text	Keywords	Jan. 2022	6 experts in gender issues (3 men, 3 women)	No	No	No
Romim et al., 2022	BD-SHS: A Benchmark Dataset for Learning to Detect Online Bangla Hate Speech in Different Social Contexts	https://github.com/naurosromim/hate-speech-dataset-for-Bengali-social-media	Gender as category, male/female as targets	Facebook, TikTok, YouTube	Bangla	Text	Keywords, topics	2017 - unknown	50 students (32 male, 18 female)	No	No	No
Samory et al., 2021	“Call me sexist, but...” : Revisiting Sexism Detection Using Psychological Scales and Adversarial Samples	https://search.gesis.org/research_data/SDN-10.7802-2251?doi=10.7802/2251	Sexism	Twitter	English	Text	Keywords, hashtags	2008 - 2019	5 crowdworkers	No	No	No
Sharifirad and Jacovi, 2019	Learning and Understanding Different Categories of Sexism Using Convolutional Neural Network's Filters	https://github.com/simarad1525/Dataset-to-detect-different-types-of-sexist-language	Sexism	Twitter	English	Text	Hashtags	Unknown	13: 1 male and 12 female non-activists	No	No	No
Sharifirad and Matwin, 2019	When a Tweet is Actually Sexist. A more Comprehensive Classification of Different Online Harassment Categories and The Challenges in NLP	https://github.com/simarad1525/ECML_SIMAH_dataset_competition	Sexism	Twitter	English	Text	Hashtags	Unknown	Crowdworkers	No	No	No
Strathern and Pfeffer, 2022	Identifying Different Layers of Online Misogyny	Forthcoming	Misogyny	Twitter	English	Text	Specific user handle: @realamberheard	2019-2021	2: 1 author, 1 student	No	No	No
Talat, 2016	Are You a Racist or Am I Seeing Things? Annotator Influence on Hate Speech Detection on Twitter	https://github.com/zeeraktalat/hatespeech	Sexism as category	Twitter	English	Text	Hashtags	Unknown	1 expert (feminist and anti-racism activists) + 3 others per item	No	Yes	No
Talat and Hovy, 2016	Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter	https://github.com/zeeraktalat/hatespeech	Sexism as category	Twitter	English	Text	Hashtags	Unknown	2: author + woman studying gender studies, non-activist feminist	No	No	No
Toosi, 2019	Twitter sentiment analysis	https://www.kaggle.com/datasets/arkhoshghalb/twitter-sentiment-analysis-hatred-speech	Sexism	Twitter	English	Text	Unknown	Unknown	Unknown	No	No	No
Vidgen et al., 2021	Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection	https://github.com/bvidgen/Dynamically-Generated-Hate-Speech-Dataset	Gender as target	Synthetically generated social media	English	Text	N/A	Ongoing	Recruited	No	No	Yes
Yadav et al., 2023	LAHM : Large Annotated Dataset for Multi-Domain and Multilingual Hate Speech Identification	N/A	Sexism as category	Twitter	Arabic, English, French, German, Hindi, Spanish	Text	Keywords	Unknown	Unknown	No	No	No
Zeinert et al., 2021	Annotating Online Misogyny	https://huggingface.co/datasets/strombergnlp/bajer_danish_misogyny	Misogyny	Facebook, Twitter, Reddit	Danish	Text	Keywords	Unknown	8 recruited: 6 female, 2 male	No	No	Yes
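
For readers who want to query the list programmatically, here is a minimal Python sketch. It assumes the table has been exported as tab-separated text with the header row shown above; the function names `load_table` and `filter_rows` are hypothetical, and the two sample rows are copied from the table purely to keep the example self-contained.

```python
import csv
import io

HEADER = [
    "Reference", "Title", "Dataset URL", "GBV characterisation", "Platform",
    "Language", "Modality", "Sampling", "Date of data", "Annotators", "IRB",
    "Non-aggregated labels", "Data Statement",
]

# Two rows copied from the table; a real script would read the full export.
ROWS = [
    ["Al-Hassan and Al-Dossari, 2022",
     "Detection of hate speech in Arabic tweets using deep learning",
     "N/A", "Sexism", "Twitter", "Arabic", "Text", "Keywords", "Unknown",
     "2 volunteers", "No", "No", "No"],
    ["Guest et al., 2021",
     "An Expert Annotated Dataset for the Detection of Online Misogyny",
     "https://github.com/ellamguest/online-misogyny-eacl2021",
     "Misogyny", "Reddit", "English", "Text", "Targeted subreddits",
     "Feb. - May 2020",
     "6 annotators trained (by the authors) to identify misogynistic content",
     "No", "No", "Yes"],
]
SAMPLE_TSV = "\n".join("\t".join(cells) for cells in [HEADER, *ROWS])


def load_table(text):
    """Parse tab-separated text into a list of dicts keyed by column name."""
    # QUOTE_NONE keeps titles that begin with a quotation mark intact.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return list(reader)


def filter_rows(rows, column, value):
    """Keep rows whose cell contains `value` (case-insensitive substring)."""
    needle = value.lower()
    return [row for row in rows if needle in (row.get(column) or "").lower()]


table = load_table(SAMPLE_TSV)
for row in filter_rows(table, "Data Statement", "yes"):
    print(row["Reference"], "-", row["Dataset URL"])
```

Substring matching is used because many cells hold several values (for example "English, Spanish" or "Gab, Twitter"), so an exact comparison would miss multilingual or multi-platform resources.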
