Note: a GPU runtime will speed up processing significantly.

In [None]:
!pip install --quiet transformers datasets evaluate accelerate tqdm pandas numpy torch



In [None]:
import pandas as pd
import numpy as np
import torch
import re

from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
from torch.nn.functional import softmax
from tqdm.autonotebook import tqdm



# I. Datasets
We preprocess each dataset so that it has a `text` column containing the actual text and a binary `is_hate_speech` column indicating whether the text is hateful.

## 1. Manually annotated tweets dataset
Davidson, T., Warmsley, D., Macy, M., & Weber, I. (2017, May). Automated hate speech detection and the problem of offensive language. In Proceedings of the international AAAI conference on web and social media (Vol. 11, No. 1, pp. 512-515)

- [Paper](https://ojs.aaai.org/index.php/ICWSM/article/view/14955/14805)
- [GitHub](https://github.com/t-davidson/hate-speech-and-offensive-language)

In [None]:
df_1 = pd.read_csv("./datasets/1.csv")

df_1.head()

Unnamed: 0.1,Unnamed: 0,count,hate_speech,offensive_language,neither,class,tweet
0,0,3,0,0,3,2,!!! RT @mayasolovely: As a woman you shouldn't...
1,1,3,0,3,0,1,!!!!! RT @mleew17: boy dats cold...tyga dwn ba...
2,2,3,0,3,0,1,!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...
3,3,3,0,2,1,1,!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...
4,4,6,0,6,0,1,!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...


`class` = class label assigned by the majority of CrowdFlower (CF) users.\
  0 - hate speech\
  1 - offensive language\
  2 - neither

In [None]:
df_1["is_hate_speech"] = df_1["class"].apply(lambda x: 0 if x == 2 else 1)
df_1 = df_1.rename(columns={"tweet": "text"})

df_1 = df_1[["text", "is_hate_speech"]]

df_1.head()

Unnamed: 0,text,is_hate_speech
0,!!! RT @mayasolovely: As a woman you shouldn't...,0
1,!!!!! RT @mleew17: boy dats cold...tyga dwn ba...,1
2,!!!!!!! RT @UrKindOfBrand Dawg!!!! RT @80sbaby...,1
3,!!!!!!!!! RT @C_G_Anderson: @viva_based she lo...,1
4,!!!!!!!!!!!!! RT @ShenikaRoberts: The shit you...,1
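
Note that the mapping above marks both class 0 (hate speech) and class 1 (offensive language) as positive; only class 2 (neither) becomes 0. The following is a minimal sketch of that rule on hypothetical class values (the list `sample_classes` and the helper name `to_binary` are illustrative, not part of the dataset or the notebook):

```python
from collections import Counter

CLASS_NAMES = {0: "hate speech", 1: "offensive language", 2: "neither"}

def to_binary(class_id):
    """Only class 2 ("neither") maps to 0; classes 0 and 1 both map to 1."""
    return 0 if class_id == 2 else 1

# Hypothetical class values, used only for illustration
sample_classes = [2, 1, 1, 1, 1, 0]

# Count how many examples of each original class land on each binary label
tally = Counter((CLASS_NAMES[c], to_binary(c)) for c in sample_classes)
for (name, label), n in sorted(tally.items()):
    print(f"{name:<20} -> is_hate_speech={label} (n={n})")
```

This is worth keeping in mind when reading the class balance of this dataset, since the positive class includes offensive language as well as hate speech.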


In [None]:
df_1.count()

text              24783
is_hate_speech    24783
dtype: int64

In [None]:
hate_speech_count = df_1[df_1["is_hate_speech"] == 1].shape[0]
hate_speech_percentage = hate_speech_count / df_1.shape[0] * 100

print("Hate speech examples:", hate_speech_count)
print("Hate speech percentage:", hate_speech_percentage)

Hate speech examples: 20620
Hate speech percentage: 83.20219505306056
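
The same count and percentage are computed for every dataset in this notebook. A possible way to avoid repeating the cell is a small helper; this is only a sketch, and the name `label_balance` and the toy labels are assumptions made for illustration. It accepts any iterable of 0/1 or boolean labels, so a pandas column such as `df_1["is_hate_speech"]` should work as input too.

```python
def label_balance(labels):
    """Return (positive_count, positive_percentage) for 0/1 or bool labels."""
    labels = list(labels)
    positives = sum(1 for value in labels if value)
    percentage = positives / len(labels) * 100 if labels else 0.0
    return positives, percentage

# Toy labels for illustration only (mix of ints and bools, as in the notebook)
toy_labels = [1, 0, 1, 1, False, True]
count, pct = label_balance(toy_labels)
print("Hate speech examples:", count)
print(f"Hate speech percentage: {pct:.2f}")
```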


## 2. Hate speech intervention datasets from Gab and Reddit
Qian, J., Bethke, A., Liu, Y., Belding, E., & Wang, W. Y. (2019). A benchmark dataset for learning to intervene in online hate speech. arXiv preprint arXiv:1909.04251

Manually labeled dataset.

- [Paper](https://arxiv.org/pdf/1909.04251)
- [GitHub](https://github.com/jing-qian/A-Benchmark-Dataset-for-Learning-to-Intervene-in-Online-Hate-Speech)

In [None]:
df_2_gab = pd.read_csv("./datasets/2 - gab.csv")
df_2_reddit = pd.read_csv("./datasets/2 - reddit.csv")

In [None]:
df_2_gab.head()

Unnamed: 0,id,text,hate_speech_idx,response
0,1. 39869714\n,1. i joined gab to remind myself how retarded ...,[1],"[""Using words that insult one group while defe..."
1,1. 39845588\n2. \t39848775\n3. \t\t39911017\n,1. This is what the left is really scared of. ...,[3],['You can disagree with someones opinion witho...
2,1. 37485560\n2. \t37528625\n,1. It makes you an asshole.\n2. \tGive it to a...,[2],['Your argument is more rational if you leave ...
3,1. 39787626\n2. \t39794481\n,1. So they manage to provide a whole lot of da...,[2],"[""You shouldn't generalize a specific group or..."
4,1. 37957930\n2. \t39953348\n3. \t\t39965219\n,"1. Hi there, i,m Keith, i hope you are doing w...",[3],['If someone is rude it is better to ignore th...


In [None]:
df_2_reddit.head()

Unnamed: 0,id,text,hate_speech_idx,response
0,1. e8q18lf\n2. \te8q9w5s\n3. \t\te8qbobk\n4. \...,1. A subsection of retarded Hungarians? Ohh bo...,[1],"[""I don't see a reason why it's okay to insult..."
1,1. e9c6naz\n2. \te9d03a5\n3. \t\te9d8e4d\n,"1. > ""y'all hear sumn?"" by all means I live i...",[3],['Persons with disabilities is the accepted te...
2,1. e84rl2i\n2. \te84w60l\n3. \t\te8544rn\n4. \...,1. wouldn't the defenders or whatever they are...,,
3,1. e7kq72n\n2. \te7m24ar\n,1. Because the Japanese aren't retarded and kn...,[1],"[""It's not right for anyone of any gender to b..."
4,1. e7hdgoh\n2. \te7iyj6a\n3. \t\te7j6iho\n4. \...,1. That might be true if we didn't have an exa...,"[2, 3]","[""You shouldn't be bringing up sensitive topic..."


In [None]:
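import ast

def preprocess_second_dataset(df):
  # Keep only the conversation text and the indices of the hateful lines
  df = df.drop(columns=["id", "response"])
  df = df.dropna(subset=["hate_speech_idx"]).copy()
  df["hate_speech_idx"] = df["hate_speech_idx"].replace("n/a", "[]")

  new_df_rows = []
  for _, row in df.iterrows():
    text = row["text"].strip()
    # Parse the list literal safely instead of calling eval()
    hate_lines = ast.literal_eval(row["hate_speech_idx"])

    # Split the conversation into its numbered lines ("1. ...", "2. ...")
    lines = re.split(r'\b\d+\.\s*', text)
    lines = [line.strip() for line in lines if line.strip()]
    # hate_speech_idx is 1-based, so number the lines from 1
    for line_number, line in enumerate(lines, start=1):
      is_hate_speech = line_number in hate_lines
      new_df_rows.append({"text": line, "is_hate_speech": is_hate_speech})

  return pd.DataFrame(new_df_rows)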
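
To make the splitting step concrete, here is a small worked example on made-up conversation strings, using the same regular expression as the function above. The helper name `split_numbered` and the sample texts are hypothetical, and the index list is parsed here with `ast.literal_eval`. The second call illustrates a possible limitation: a number followed by a period inside a comment is also treated as a line marker, which would shift the line numbering. Whether this occurs in the actual data is not checked here.

```python
import ast
import re

def split_numbered(text):
    """Split a conversation on '<number>.' markers, dropping empty pieces."""
    pieces = re.split(r'\b\d+\.\s*', text.strip())
    return [piece.strip() for piece in pieces if piece.strip()]

# Made-up conversation in the dataset's "1. ...\n2. \t..." layout
conversation = "1. first comment\n2. \tsecond comment\n3. \t\tthird comment\n"
hate_lines = ast.literal_eval("[2]")  # 1-based indices of hateful lines

for line_number, line in enumerate(split_numbered(conversation), start=1):
    print(line_number, repr(line), line_number in hate_lines)

# Possible pitfall: "3." inside the first comment is also treated as a marker
print(split_numbered("1. see rule 3. above\n2. ok"))
```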

In [None]:
df_2_gab = preprocess_second_dataset(df_2_gab)
df_2_gab

Unnamed: 0,text,is_hate_speech
0,i joined gab to remind myself how retarded jew...,True
1,This is what the left is really scared of. ...,False
2,That literally looks like a monkey. Why are we...,False
3,Dumb Cunt,True
4,It makes you an asshole.,False
...,...,...
31728,fucking retard,True
31729,Twitter BANNED me and reported me to the FBI f...,False
31730,"I was routinely suspended for saying fuck, and...",True
31731,They got me for dropping the cunt bomb... a lot.,True


In [None]:
df_2_gab.count()

text              31733
is_hate_speech    31733
dtype: int64

In [None]:
hate_speech_count = df_2_gab[df_2_gab["is_hate_speech"] == 1].shape[0]
hate_speech_percentage = hate_speech_count / df_2_gab.shape[0] * 100

print("Hate speech examples:", hate_speech_count)
print("Hate speech percentage:", hate_speech_percentage)

Hate speech examples: 13999
Hate speech percentage: 44.114959190747804


In [None]:
df_2_reddit = preprocess_second_dataset(df_2_reddit)
df_2_reddit

Unnamed: 0,text,is_hate_speech
0,A subsection of retarded Hungarians? Ohh boy. ...,True
1,Hiii. Just got off work. 444 is mainly the typ...,False
2,wow i guess soyboys are the same in every country,False
3,Owen Benjamin's soyboy song goes for every cou...,False
4,"> ""y'all hear sumn?"" by all means I live in a...",False
...,...,...
17450,"OP, stop being a faggot and post videos next t...",True
17451,"In this 20 minute long video, Top Hate and Cha...",False
17452,"No clue whos these e-celebs are, but at this p...",True
17453,"I didn’t insult you, why would you insult me?",False


In [None]:
df_2_reddit.count()

text              17455
is_hate_speech    17455
dtype: int64

In [None]:
hate_speech_count = df_2_reddit[df_2_reddit["is_hate_speech"] == 1].shape[0]
hate_speech_percentage = hate_speech_count / df_2_reddit.shape[0] * 100

print("Hate speech examples:", hate_speech_count)
print("Hate speech percentage:", hate_speech_percentage)

Hate speech examples: 5257
Hate speech percentage: 30.117444858206817


## 3. CAD: the Contextual Abuse Dataset (Reddit)
Vidgen, B., Nguyen, D., Margetts, H., Rossini, P., & Tromble, R. (2021). Introducing CAD: the contextual abuse dataset

- [Paper](https://aclanthology.org/2021.naacl-main.182.pdf)
- [GitHub](https://github.com/dongpng/cad_naacl2021)

In [None]:
df_3 = pd.read_csv("./datasets/3.tsv", sep='\t')

df_3

Unnamed: 0,id,info_id,info_subreddit,info_subreddit_id,info_id.parent,info_id.link,info_thread.id,info_order,info_image.saved,annotation_Primary,...,annotation_Target_top.level.category,annotation_highlighted,meta_author,meta_created_utc,meta_date,meta_day,meta_permalink,split,subreddit_seen,meta_text
0,cad_1,alywla-post,Drama,t5_2rd2l,,,2,02-post,0,Neutral,...,,,RedGT2033,1548999908,2019-02-01T05:45:08Z,2019-02-01T00:00:00Z,/r/Drama/comments/alywla/centrist_daddy_gets_d...,exclude_empty,1,
1,cad_2,am027u-post,conspiracy,t5_2qh4r,,,3,03-post,0,Neutral,...,,,G0LD3NDAWN,1549010283,2019-02-01T08:38:03Z,2019-02-01T00:00:00Z,/r/conspiracy/comments/am027u/what_are_your_op...,test,0,. I just watched a 4 hour long disclosure buff...
2,cad_3,am80hq-post,subredditcancer,t5_2yv5q,,,5,05-post,0,Neutral,...,,,SpecialThrowaway6,1549062528,2019-02-01T23:08:48Z,2019-02-01T00:00:00Z,/r/subredditcancer/comments/am80hq/banned_and_...,exclude_empty,1,
3,cad_4,amcs27-post,Drama,t5_2rd2l,,,9,09-post,0,Neutral,...,,,[deleted],1549101640,2019-02-02T10:00:40Z,2019-02-02T00:00:00Z,/r/Drama/comments/amcs27/at_rworldnews_10_hour...,exclude_empty,1,
4,cad_5,aml76e-post,Drama,t5_2rd2l,,,12,12-post,0,Neutral,...,,,KristenLuvsCATS,1549159930,2019-02-03T02:12:10Z,2019-02-03T00:00:00Z,/r/Drama/comments/aml76e/guy_gets_in_trouble_w...,exclude_empty,1,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
27489,cad_27490,byw36e-title,CCJ2,t5_32cg1,,,1113,1113-title,0,IdentityDirectedAbuse,...,"ethnicity, gender",drinking too much hot water from your rainies?,daytradekorea,1560161607,2020-06-10T00:01:15Z,2020-06-10T00:00:00Z,/r/CCJ2/comments/byw36e/drinking_too_much_hot_...,test,0,Drinking too much hot water from your Rainies?
27490,cad_27491,au160j-title,ImGoingToHellForThis,t5_2s7yq,,,210,210-title,imageNotSaved,IdentityDirectedAbuse,...,"religion, gender",night ninjas,McBeefer69,1550962092,2020-02-23T00:01:15Z,2020-02-23T00:00:00Z,/r/ImGoingToHellForThis/comments/au160j/night_...,dev,0,Night ninjas
27491,cad_27492,b63iy8-title,conspiracy,t5_2qh4r,,,499,499-title,0,Neutral,...,,,BeezelyBillyBub,1553687505,2020-03-27T00:01:15Z,2020-03-27T00:00:00Z,/r/conspiracy/comments/b63iy8/the_decline_of_e...,test,0,The Decline of Euro/American Males and The Ris...
27492,cad_27493,avrmje-title,ShitPoliticsSays,t5_2vcl0,,,241,241-title,0,PersonDirectedAbuse,...,,r/environment thinks feinstein shouldn't hold ...,rpoliticssucksass,1551363574,2019-02-28T14:19:34Z,2019-02-28T00:00:00Z,/r/ShitPoliticsSays/comments/avrmje/renvironme...,train,1,r/environment thinks Feinstein shouldn't hold ...


In [None]:
hate_values = ["IdentityDirectedAbuse", "AffiliationDirectedAbuse", "PersonDirectedAbuse"]
non_hate_values = ["Slur", "CounterSpeech", "Neutral"]

df_3["annotation_Primary"].unique()

array(['Neutral', 'AffiliationDirectedAbuse', 'Slur',
       'PersonDirectedAbuse', 'IdentityDirectedAbuse', 'CounterSpeech'],
      dtype=object)

In [None]:
df_3["is_hate_speech"] = df_3["annotation_Primary"].apply(lambda x: 0 if x in non_hate_values else 1)
df_3 = df_3.rename(columns={"meta_text": "text"})

df_3 = df_3.dropna(subset=["text"])

df_3 = df_3[["text", "is_hate_speech"]]

df_3

Unnamed: 0,text,is_hate_speech
1,. I just watched a 4 hour long disclosure buff...,0
9,उसके लिए हमे शत शत अभारी है [linebreak] [lin...,0
12,I just got laid off. I don't even know what to...,0
16,Security has been ever increasing on the south...,0
19,"My best friend, who I grew up with and had a H...",0
...,...,...
27489,Drinking too much hot water from your Rainies?,1
27490,Night ninjas,1
27491,The Decline of Euro/American Males and The Ris...,0
27492,r/environment thinks Feinstein shouldn't hold ...,1


In [None]:
df_3.count()

text              26329
is_hate_speech    26329
dtype: int64

In [None]:
hate_speech_count = df_3[df_3["is_hate_speech"] == 1].shape[0]
hate_speech_percentage = hate_speech_count / df_3.shape[0] * 100

print("Hate speech examples:", hate_speech_count)
print("Hate speech percentage:", hate_speech_percentage)

Hate speech examples: 5062
Hate speech percentage: 19.22594857381594


# II. Models

## 1. HateXplain
BERT fine-tuned on the HateXplain dataset.

[Paper](https://arxiv.org/pdf/2012.10289)

In [None]:
torch.device("cuda" if torch.cuda.is_available() else "cpu")

device(type='cuda')

In [None]:
# Model_Rational_Label is defined in the local models.py
from models import Model_Rational_Label

def hateXplain(df):
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

  tokenizer = AutoTokenizer.from_pretrained("Hate-speech-CNERG/bert-base-uncased-hatexplain-rationale-two")
  model = Model_Rational_Label.from_pretrained("Hate-speech-CNERG/bert-base-uncased-hatexplain-rationale-two")
  model.to(device)

  total_samples = len(df)
  correct_predictions = 0

  pbar = tqdm(total=total_samples, desc="Processing")

  for index, row in df.iterrows():
    text = row['text']
    label = row['is_hate_speech']

    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    inputs.to(device)

    with torch.no_grad():
      prediction_logits, _ = model(input_ids=inputs['input_ids'], attention_mask=inputs['attention_mask'])
      probabilities = softmax(prediction_logits, dim=1)

    prob_list = probabilities.squeeze().tolist()
    prob_hate_speech = prob_list[1]

    model_prediction = 1 if prob_hate_speech > 0.5 else 0

    if model_prediction == label:
      correct_predictions += 1

    pbar.set_postfix({'Correct Predictions': correct_predictions})
    pbar.update(1)

  pbar.close()

  accuracy = correct_predictions / total_samples
  print(f"Accuracy: {accuracy * 100:.2f}%")
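
The decision rule inside the loop above applies a softmax to the two output logits and predicts hate speech when the probability at index 1 exceeds 0.5. For two classes this amounts to checking whether the second logit is larger than the first. The sketch below reproduces that rule in plain Python on hypothetical logits; the names `softmax_py` and `predict_from_logits` are illustrative, and it assumes, as the function above does, that index 1 is the hateful class.

```python
import math

def softmax_py(logits):
    """Numerically stable softmax over a short list of logits."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_from_logits(logits, threshold=0.5):
    """Return 1 if the probability at index 1 exceeds the threshold, else 0."""
    return 1 if softmax_py(logits)[1] > threshold else 0

# Hypothetical logit pairs, for illustration only
for logits in ([2.0, -1.0], [-0.5, 0.7], [0.3, 0.3]):
    probs = softmax_py(logits)
    print(logits, [round(p, 3) for p in probs], predict_from_logits(logits))
```

With a strict `>` comparison, an exact tie between the two logits is classified as 0.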

### a) Manually annotated tweets dataset

In [None]:
hateXplain(df_1)

Processing: 100%|█| 24783/24783 [04:05<00:00, 100.95it/s, Correct Predictions=13

Accuracy: 53.29%





### b) Hate speech intervention datasets from Gab and Reddit

In [None]:
hateXplain(df_2_gab)

Processing: 100%|█| 31733/31733 [05:17<00:00, 99.88it/s, Correct Predictions=265

Accuracy: 83.79%





In [None]:
hateXplain(df_2_reddit)

Processing: 100%|█| 17455/17455 [03:04<00:00, 94.36it/s, Correct Predictions=149

Accuracy: 85.60%





### c) CAD: the Contextual Abuse Dataset (Reddit)

In [None]:
hateXplain(df_3)

Processing: 100%|█| 26329/26329 [04:29<00:00, 97.82it/s, Correct Predictions=221

Accuracy: 84.10%





## 2. RoBERTa
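RoBERTa model trained through round 4 (R4) of the dynamically generated dataset described in the paper below (`facebook/roberta-hate-speech-dynabench-r4-target`).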

[Paper](https://arxiv.org/pdf/2012.15761)

In [None]:
def roBERTa(df):
  device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

  tokenizer = AutoTokenizer.from_pretrained("facebook/roberta-hate-speech-dynabench-r4-target")
  model = AutoModelForSequenceClassification.from_pretrained("facebook/roberta-hate-speech-dynabench-r4-target")
  model.to(device)

  total_samples = len(df)
  correct_predictions = 0

  pbar = tqdm(total=total_samples, desc="Processing")

  for index, row in df.iterrows():
    text = row['text']
    label = row['is_hate_speech']

    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    inputs.to(device)

    with torch.no_grad():
      outputs = model(**inputs)

    predicted_class = torch.argmax(outputs.logits).item()
    model_prediction = model.config.id2label[predicted_class]
    # confidence_score = torch.softmax(outputs.logits, dim=1).tolist()[0][predicted_class]

    model_prediction = 1 if model_prediction == 'hate' else 0

    if model_prediction == label:
      correct_predictions += 1

    pbar.set_postfix({'Correct Predictions': correct_predictions})
    pbar.update(1)

  pbar.close()

  accuracy = correct_predictions / total_samples
  print(f"Accuracy: {accuracy * 100:.2f}%")
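
Both evaluation functions in this section share the same loop: predict, compare with the label, count the matches. One possible refactoring, shown here only as a sketch, separates that loop from the model and feeds the texts in batches rather than one at a time. The names `evaluate`, `predict_fn`, and `keyword_stub` are hypothetical; `keyword_stub` is a stand-in for a real model wrapper and is not meant as a classifier.

```python
def evaluate(predict_fn, texts, labels, batch_size=32):
    """Return the accuracy of predict_fn over texts and labels.

    predict_fn is an assumed callable mapping a list of strings
    to a list of 0/1 predictions of the same length.
    """
    texts, labels = list(texts), list(labels)
    correct = 0
    for start in range(0, len(texts), batch_size):
        batch_texts = texts[start:start + batch_size]
        batch_labels = labels[start:start + batch_size]
        predictions = predict_fn(batch_texts)
        # int(y) lets boolean labels (True/False) compare with 0/1 predictions
        correct += sum(int(p == int(y)) for p, y in zip(predictions, batch_labels))
    return correct / len(texts) if texts else 0.0

def keyword_stub(batch):
    """Stand-in predictor used only to exercise the loop."""
    return [1 if "bad" in text.lower() else 0 for text in batch]

texts = ["a bad example", "a fine example", "BAD again", "neutral"]
labels = [1, 0, 1, True]
accuracy = evaluate(keyword_stub, texts, labels, batch_size=2)
print(f"Accuracy: {accuracy * 100:.2f}%")
```

A real `predict_fn` would wrap the tokenizer and model calls used above; that part is left out here because it depends on the specific model's output format.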

### a) Manually annotated tweets dataset

In [None]:
roBERTa(df_1)

Processing: 100%|█| 24783/24783 [04:06<00:00, 100.69it/s, Correct Predictions=23


Accuracy: 93.12%


### b) Hate speech intervention datasets from Gab and Reddit

In [None]:
roBERTa(df_2_gab)

Processing: 100%|█| 31733/31733 [05:18<00:00, 99.71it/s, Correct Predictions=259

Accuracy: 81.82%





In [None]:
roBERTa(df_2_reddit)

Processing: 100%|█| 17455/17455 [03:03<00:00, 95.37it/s, Correct Predictions=143

Accuracy: 82.47%





### c) CAD: the Contextual Abuse Dataset (Reddit)

In [None]:
roBERTa(df_3)

Processing: 100%|█| 26329/26329 [04:29<00:00, 97.68it/s, Correct Predictions=217

Accuracy: 82.70%
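
For context, the accuracies above can be set against a majority-class baseline, that is, the accuracy obtained by always predicting the more frequent label. The sketch below derives that baseline from the hate speech percentages printed in section I (rounded to two decimals) and lists it next to the reported accuracies. The dictionary layout and the name `majority_baseline` are assumptions made for illustration.

```python
# Positive-class percentages from section I (rounded) and accuracies reported above
results = {
    "Tweets (Davidson et al.)": {"positive_pct": 83.20, "HateXplain": 53.29, "RoBERTa": 93.12},
    "Gab (Qian et al.)": {"positive_pct": 44.11, "HateXplain": 83.79, "RoBERTa": 81.82},
    "Reddit (Qian et al.)": {"positive_pct": 30.12, "HateXplain": 85.60, "RoBERTa": 82.47},
    "CAD": {"positive_pct": 19.23, "HateXplain": 84.10, "RoBERTa": 82.70},
}

def majority_baseline(positive_pct):
    """Accuracy (in %) of always predicting the more frequent class."""
    return max(positive_pct, 100.0 - positive_pct)

for name, row in results.items():
    baseline = majority_baseline(row["positive_pct"])
    print(
        f"{name:<26} baseline={baseline:6.2f}%  "
        f"HateXplain={row['HateXplain']:6.2f}%  RoBERTa={row['RoBERTa']:6.2f}%"
    )
```

Under these numbers the baseline is roughly 83.2% for the tweets dataset, 55.9% for Gab, 69.9% for Reddit, and 80.8% for CAD.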



