# Analysis of Hateful Memes by Type

Hate speech continues to be an important challenge, and multimodal hate speech remains an especially difficult machine learning problem. Hate speech is defined as a direct attack (violent or dehumanizing speech, harmful stereotypes, statements of inferiority, expressions of contempt, disgust or dismissal, cursing, and calls for exclusion or segregation) against people on the basis of protected characteristics (race, ethnicity, national origin, disability, religious affiliation, caste, sexual orientation, sex, gender identity and serious disease). We operationalize this definition by making fine-grained labels for protected classes and attack types available as additional annotations on the Hateful Memes dataset.


Task A (multi-label): For each meme, detect the protected category. Protected categories are: race, disability, religion, nationality, sex. If the meme is not_hateful the protected category is: pc_empty.

Task B (multi-label): For each meme, detect the attack type. Attack types are: contempt, mocking, inferiority, slurs, exclusion, dehumanizing, inciting_violence. If the meme is not_hateful the attack type is: attack_empty.

- **Protected Category** -> `gold_pc`: *race, disability, religion, nationality, sex*. If the meme is not_hateful the protected category is: *pc_empty*.

- **Attack Type** -> `gold_attack`: *contempt, mocking, inferiority, slurs, exclusion, dehumanizing, inciting_violence*. If the meme is not_hateful the attack type is: *attack_empty*.
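
Because both tasks are multi-label, a meme can carry several labels at once (for example `sex` and `disability`). One common way to feed such labels to a classifier is a multi-hot vector with one slot per label. The following is a minimal sketch of that encoding, not the method used in this notebook. The names `PC_LABELS`, `ATTACK_LABELS` and `multi_hot` are hypothetical, and mapping `pc_empty` / `attack_empty` to an all-zero vector is an assumption; treating them as an extra class would be an equally valid choice.

```python
# Label vocabularies as listed in the task descriptions above.
PC_LABELS = ["race", "disability", "religion", "nationality", "sex"]
ATTACK_LABELS = [
    "contempt", "mocking", "inferiority", "slurs",
    "exclusion", "dehumanizing", "inciting_violence",
]


def multi_hot(labels, vocabulary):
    """Return a 0/1 list with one slot per vocabulary entry.

    Labels that are not in the vocabulary (such as the placeholders
    pc_empty and attack_empty) are ignored, so a not_hateful meme
    maps to an all-zero vector. This is an assumption of the sketch.
    """
    present = set(labels)
    return [1 if name in present else 0 for name in vocabulary]


# A meme annotated with two protected categories.
example_pc = multi_hot(["sex", "disability"], PC_LABELS)

# A not_hateful meme.
empty_pc = multi_hot(["pc_empty"], PC_LABELS)

print(example_pc)
print(empty_pc)
```

The same helper works for Task B by passing `ATTACK_LABELS` as the vocabulary.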

In [14]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [4]:
!ls data/annotations

dev_seen.json   dev_unseen.json test.jsonl      train.json


In [26]:
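# The annotation files are read as JSON Lines (one record per line),
# including those with a .json extension, hence lines=True.
dev_seen = pd.read_json("data/annotations/dev_seen.json", lines=True)
test = pd.read_json("data/annotations/test.jsonl", lines=True)
train = pd.read_json("data/annotations/train.json", lines=True)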



In [27]:
train

| | id | set_name | img | text | gold_hate | gold_pc | gold_attack | pc | attack |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 42953 | train | img/42953.png | its their character not their color that matters | [not_hateful] | [pc_empty] | [attack_empty] | | |
| 1 | 23058 | train | img/23058.png | don't be afraid to love again everyone is not ... | [not_hateful] | [pc_empty] | [attack_empty] | | |
| 2 | 13894 | train | img/13894.png | putting bows on your pet | [not_hateful] | [pc_empty] | [attack_empty] | | |
| 3 | 37408 | train | img/37408.png | i love everything and everybody! except for sq... | [not_hateful] | [pc_empty] | [attack_empty] | | |
| 4 | 82403 | train | img/82403.png | everybody loves chocolate chip cookies, even h... | [not_hateful] | [pc_empty] | [attack_empty] | | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 8495 | 10423 | train | img/10423.png | nobody wants to hang auschwitz me | [hateful] | [religion] | [mocking] | [[religion], [religion], [religion]] | [[mocking], [mocking], [mocking]] |
| 8496 | 98203 | train | img/98203.png | when god grants you a child after 20 years of ... | [hateful] | [nationality] | [dehumanizing] | [[nationality], [nationality], [religion]] | [[dehumanizing], [inciting_violence], []] |
| 8497 | 36947 | train | img/36947.png | gays on social media: equality! body positivit... | [hateful] | [sex] | [exclusion] | [[sex], [sex], [sex]] | [[exclusion], [exclusion], [exclusion]] |
| 8498 | 16492 | train | img/16492.png | having a bad day? you could be a siamese twin ... | [hateful] | [sex, disability] | [inferiority] | [[sex, disability], [sex, disability], [sex, d... | [[], [inferiority], [inferiority]] |
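
Each `gold_pc` and `gold_attack` cell holds a list of labels, so counting how often a label occurs means flattening those lists first. The sketch below shows one way to do that with the standard library. `label_counts` is a hypothetical helper, and the sample is hand-copied from the `gold_pc` values of the nine rows displayed above, so the numbers describe only those rows and not the full training set. It also assumes the cells are real Python lists after loading, not strings.

```python
from collections import Counter


def label_counts(label_lists):
    """Count label occurrences across an iterable of label lists.

    A meme with two labels contributes one count to each of them.
    """
    counts = Counter()
    for labels in label_lists:
        counts.update(labels)
    return counts


# gold_pc values of the nine rows displayed above (not the full dataset).
sample_gold_pc = [
    ["pc_empty"], ["pc_empty"], ["pc_empty"], ["pc_empty"], ["pc_empty"],
    ["religion"], ["nationality"], ["sex"], ["sex", "disability"],
]

counts = label_counts(sample_gold_pc)
for label, n in counts.most_common():
    print(label, n)
```

Since iterating over a pandas Series yields its values, the same helper could be called as `label_counts(train["gold_pc"])` under the list assumption stated above.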


# Analysis of Protected Category [gold_pc]

In [20]:
test_set_enselble2 = pd.read_csv("test_set_enselble2.csv")
test_unseen = pd.read_json("test_unseen.jsonl", lines = True)