# Introduction to Natural Language Processing 2 Lab04

## Introduction

We want to sell a moderation API tackling toxic content on Twitter. We found a collection of labelled tweets on [HuggingFace](https://huggingface.co/datasets/tweet_eval).  
We want to train a model to predict the toxicity of a tweet. Two of its subsets seem close to our needs: `hate` and `offensive`.

We will use the `hate` subset because it covers the most severe form of toxicity: the first priority of our moderation API is to detect hate speech, not merely offensive language.

## Load the dataset


In [1]:
from datasets import load_dataset
dataset = load_dataset('tweet_eval', 'hate')



## Evaluating the dataset

In [2]:
dataset

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 9000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 2970
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 1000
    })
})

The dataset is composed of 3 splits: `train`, `test` and `validation`.  
The `train` split is composed of **9,000** tweets.  
The `test` split is composed of **2,970** tweets.  
The `validation` split is composed of **1,000** tweets.  


Each split is composed of two features: `text` and `label`.


In [3]:
print("Number of non hate tweets in each split:")
print(dataset.filter(lambda example: example['label'] == 0).num_rows)
print("Number of hate tweets in each split:")
print(dataset.filter(lambda example: example['label'] == 1).num_rows)


Number of non hate tweets in each split:
{'train': 5217, 'test': 1718, 'validation': 573}
Number of hate tweets in each split:
{'train': 3783, 'test': 1252, 'validation': 427}


We can observe that the labels are binary: `0` for non-hate tweets and `1` for hate tweets.  
The dataset is moderately unbalanced: in every split, roughly 58% of the tweets are non-hate and 42% are hate.
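
To put a number on this imbalance, here is a minimal sketch that turns the counts printed above into proportions. The counts are copied by hand from the output of the previous cell, and `class_proportions` is a hypothetical helper, not part of the `datasets` library.

```python
# Counts copied from the output of the filter cell above
non_hate_counts = {'train': 5217, 'test': 1718, 'validation': 573}
hate_counts = {'train': 3783, 'test': 1252, 'validation': 427}

def class_proportions(neg: dict, pos: dict) -> dict:
    """Return, for each split, the (non-hate, hate) share of tweets."""
    result = {}
    for split in neg:
        total = neg[split] + pos[split]
        result[split] = (neg[split] / total, pos[split] / total)
    return result

for split, (p_neg, p_pos) in class_proportions(non_hate_counts, hate_counts).items():
    print(f"{split:<10} non-hate: {p_neg:.1%}  hate: {p_pos:.1%}")
```

The non-hate share stays between 57% and 58% in all three splits, so the imbalance is the same for training and evaluation.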

Let's see some examples of tweets and their labels.

In [4]:
batch = dataset['train'][:10]  # slicing a Dataset returns a dict of columns
for text, label in zip(batch['text'], batch['label']):
    print(f"Label: {label} | {text}")

Label: 0 | @user nice new signage. Are you not concerned by Beatlemania -style hysterical crowds crongregating on you…
Label: 1 | A woman who you fucked multiple times saying yo dick small is a compliment you know u hit that spot 😎
Label: 1 | @user @user real talk do you have eyes or were they gouged out by a rapefugee?
Label: 1 | your girlfriend lookin at me like a groupie in this bitch!
Label: 0 | Hysterical woman like @user
Label: 0 | Me flirting- So tell me about your father...
Label: 0 | The Philippine Catholic bishops' work for migrant workers should focus on families who are "paying the great...
Label: 1 | I AM NOT GOING AFTER YOUR EX BF YOU LIEING SACK OF SHIT ! I'm done with you dude that's why I dumped your ass cause your a lieing 😂😡 bitch
Label: 0 | When cuffin season is finally over
Label: 1 | Send home migrants not in need of protection, Peter Dutton tells UN, HEY DUTTON HOW ABOUT THE ONES THAT HAVE STAYED AND NOT LEFT THE COUNTRY WHEN THEY SHOULD OVERSTAYERS ? WHY DONT YO

In most of the hate tweets, we can observe swear words, insults and vulgarity. Words written in capital letters also seem to be used to emphasise the hate.
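
The capital-letters observation can be checked quantitatively. The sketch below computes, for each label, the average share of fully upper-case words per tweet; `uppercase_ratio` and `mean_ratio_by_label` are hypothetical helpers, and the two toy tweets (shortened from the examples above) only stand in for the real columns.

```python
def uppercase_ratio(text: str) -> float:
    """Share of words written entirely in capitals, ignoring @mentions and 1-letter words."""
    words = [
        w for w in text.split()
        if not w.startswith('@') and len(w) > 1 and any(c.isalpha() for c in w)
    ]
    if not words:
        return 0.0
    return sum(w.isupper() for w in words) / len(words)

def mean_ratio_by_label(texts, labels) -> dict:
    """Average uppercase_ratio of the tweets of each label."""
    sums, counts = {}, {}
    for text, label in zip(texts, labels):
        sums[label] = sums.get(label, 0.0) + uppercase_ratio(text)
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

# Toy data; in the notebook, pass dataset['train']['text'] and dataset['train']['label']
toy_texts = ["I AM NOT GOING AFTER YOUR EX BF", "Me flirting- So tell me about your father..."]
toy_labels = [1, 0]
print(mean_ratio_by_label(toy_texts, toy_labels))
```

One-letter words are skipped so that a lone "I" or "A" is not counted as shouting.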

Now let's use [BERTopic](https://github.com/MaartenGr/BERTopic) to extract the topics within the data, and the main topics within each class.

In [5]:
! pip install bertopic



In [6]:
from bertopic import BERTopic
from umap import UMAP

umap_model = UMAP(random_state=42)
topic_model = BERTopic(umap_model=umap_model, embedding_model="all-MiniLM-L6-v2")



In [7]:
topics, _ = topic_model.fit_transform(dataset['train']['text'])

In [8]:
topic_model.visualize_topics()

In [9]:
topic_model.visualize_barchart()

In [10]:
topic_model.get_topic_info()

| Topic | Count | Name |
|---:|---:|---|
| -1 | 1942 | `-1_you_bitch_women_your` |
| 0 | 4433 | `0_the_to_of_in` |
| 1 | 349 | `1_bitch_cunt_user_you` |
| 2 | 273 | `2_rape_women_woman_user` |
| 3 | 112 | `3_men_all_not_women` |
| 4 | 106 | `4_hoe_hoes_ho_you` |
| 5 | 99 | `5_bitch_whore_shit_stupid` |
| 6 | 87 | `6_dick_my_bitches_you` |
| 7 | 84 | `7_skank_you_user_re` |
| 8 | 75 | `8_me_when_someone_ever` |


TODO : What do you think about the results? How do you think it could impact a model trained on these data?
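
One element worth noting: topic 0, which gathers about half of the 9,000 training tweets (4,433), is named after the stopwords `the`, `to`, `of` and `in`. By default, BERTopic builds its topic representations with a `CountVectorizer` that keeps stopwords, so frequent function words can dominate a large cluster. The sketch below reproduces the effect with a plain word count on made-up tweets, with and without a small hand-picked stopword list (an illustrative stand-in for a real list such as scikit-learn's `"english"` one). In BERTopic itself, a common remedy is to pass `vectorizer_model=CountVectorizer(stop_words="english")` to `BERTopic`.

```python
from collections import Counter

# Illustrative mini stopword list; real lists are much longer
STOPWORDS = {"the", "to", "of", "in", "a", "and", "is"}

def top_words(texts, k=3, stopwords=frozenset()):
    """Return the k most frequent lower-cased words, skipping any in `stopwords`."""
    counts = Counter(
        word
        for text in texts
        for word in text.lower().split()
        if word not in stopwords
    )
    return [word for word, _ in counts.most_common(k)]

# Made-up tweets
toy_tweets = [
    "the wall is in the news",
    "go back to the border",
    "the border is in the news",
]
print(top_words(toy_tweets))                       # dominated by stopwords
print(top_words(toy_tweets, stopwords=STOPWORDS))  # content words surface
```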

## Evaluate a model

We wanted to use a RoBERTa model on our dataset, but the original RoBERTa, released in 2019, was pretrained on books, Wikipedia and web text rather than on tweets.  
Fortunately, Cardiff NLP provides a RoBERTa model trained on tweets and fine-tuned on this very dataset [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-hate).


In [11]:
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer
import numpy as np
from scipy.special import softmax
import csv
import torch
from sklearn.metrics import precision_score, f1_score, recall_score, accuracy_score
from tqdm import tqdm
import urllib.request

In [12]:
# Preprocess text (username and link placeholders)
def preprocess(text):
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)
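
To see what `preprocess` does in isolation, here is a standalone copy of it (same logic as the cell above) applied to a made-up tweet: every mention becomes `@user`, matching how the dataset already anonymises users, and every token starting with `http` becomes `http`. Because it only inspects space-separated tokens, a mention glued to punctuation, such as `(@alice`, is left untouched.

```python
def preprocess(text: str) -> str:
    """Replace user mentions with '@user' and links with 'http' (copy of the cell above)."""
    new_text = []
    for t in text.split(" "):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return " ".join(new_text)

print(preprocess("@bob look at https://t.co/abc (@alice too)"))
# -> @user look at http (@alice too)
```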

In [13]:
MODEL = "cardiffnlp/twitter-roberta-base-hate"

tokenizer = AutoTokenizer.from_pretrained(MODEL)

In [14]:
# Download the label mapping (tab-separated: class index, then label name)
mapping_link = "https://raw.githubusercontent.com/cardiffnlp/tweeteval/main/datasets/hate/mapping.txt"
with urllib.request.urlopen(mapping_link) as f:
    lines = f.read().decode('utf-8').split("\n")
csvreader = csv.reader(lines, delimiter='\t')
labels = [row[1] for row in csvreader if len(row) > 1]
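
The parsing step can be tried without network access. The sketch below feeds an in-memory string to the same `csv.reader` logic. The string's content is an assumption inferred from the labels printed further down (index `0` is `not-hate`, index `1` is `hate`), not a copy of the downloaded file.

```python
import csv
import io

# Assumed (not downloaded) content of mapping.txt: class index, a tab, then the label name
mapping_txt = "0\tnot-hate\n1\thate\n"

csvreader = csv.reader(io.StringIO(mapping_txt), delimiter='\t')
labels = [row[1] for row in csvreader if len(row) > 1]
print(labels)  # position i holds the name of class i
```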

In [15]:
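# PyTorch model
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
# TensorFlow alternative: model = TFAutoModelForSequenceClassification.from_pretrained(MODEL)
# Save a local copy in the relative directory cardiffnlp/twitter-roberta-base-hate
model.save_pretrained(MODEL)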

Now let's evaluate the model on the `test` split.

In [16]:
with torch.no_grad():
    y_true = dataset['test']['label']
    y_pred = []
    for tweet in tqdm(dataset['test']['text']):
        inputs = tokenizer(preprocess(tweet), return_tensors="pt")
        outputs = model(**inputs)
        logits = outputs[0]
        probs = softmax(logits.numpy(), axis=1)
        y_pred.append(np.argmax(probs))
    print(f"Precision: {precision_score(y_true, y_pred, average='macro')}")
    print(f"Recall: {recall_score(y_true, y_pred, average='macro')}")
    print(f"F1: {f1_score(y_true, y_pred, average='macro')}")
    print(f"Accuracy: {accuracy_score(y_true, y_pred)}")
    

100%|██████████| 2970/2970 [08:17<00:00,  5.98it/s]

Precision: 0.6944830293835869
Recall: 0.6271265160841606
F1: 0.5547114323640363
Accuracy: 0.5767676767676768





The results are disappointing: on the `test` split, the macro-averaged precision is 0.69, the recall 0.63 and the F1-score 0.55, for an accuracy of 0.58.
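
A useful reference point is the trivial classifier that always predicts the majority class, `not-hate`. The sketch below computes its accuracy and macro-averaged F1 from the test-split counts printed earlier (1,718 non-hate and 1,252 hate tweets), using a small hand-written `macro_f1` helper (hypothetical; in the notebook, `f1_score(..., average='macro', zero_division=0)` from scikit-learn does the same job).

```python
def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores; an undefined F1 counts as 0."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn          # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Gold labels rebuilt from the test-split counts printed earlier
baseline_true = [0] * 1718 + [1] * 1252
baseline_pred = [0] * len(baseline_true)   # always predict "not-hate"

baseline_acc = sum(t == p for t, p in zip(baseline_true, baseline_pred)) / len(baseline_true)
print(f"Majority baseline accuracy: {baseline_acc:.3f}")
print(f"Majority baseline macro F1: {macro_f1(baseline_true, baseline_pred):.3f}")
```

This yields an accuracy of about 0.578 and a macro F1 of about 0.366. The model's accuracy (0.577) is therefore no better than always answering `not-hate`, even though its macro F1 (0.55) is well above the baseline's.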

Now let's extract the 50 tweets the model is most confident are hate tweets, the 50 it is most confident are non-hate (lowest hate probability), and the 50 tweets around the median hate probability.

In [17]:
# Compute the hate-class probability of every test tweet (keys are the tweet texts)

tweets_hate_rate = {}
with torch.no_grad():
    for tweet in tqdm(dataset['test']['text']):
        inputs = tokenizer(preprocess(tweet), return_tensors="pt")
        outputs = model(**inputs)
        logits = outputs[0]
        probs = softmax(logits.numpy(), axis=1)
        tweets_hate_rate[tweet] = probs[0][1]
        

100%|██████████| 2970/2970 [06:43<00:00,  7.35it/s]


In [18]:
# Sort (tweet, hate probability) pairs once, from lowest to highest probability
ranked = sorted(tweets_hate_rate.items(), key=lambda x: x[1])
middle = len(ranked) // 2
top_50_best_rate = ranked[::-1][:50]               # highest hate probability
top_50_worst_rate = ranked[:50]                    # lowest hate probability
top_50_mid_rate = ranked[middle - 25:middle + 25]  # 50 tweets around the median

In [19]:
import pandas as pd
pd.set_option("display.max_colwidth", None) 
pd.DataFrame(top_50_best_rate, columns=['Tweet', 'Hate Rate'])

| | Tweet | Hate Rate |
|---:|---|---:|
| 0 | @user @user @user @user Good shut it all down let's get this wall built, if was up to me illegals would leave this country by Catapulting not ICE. #DeportByCatapulting #StopOpenBorders #BuildThatWall #MAGA | 0.989568 |
| 1 | Think about that folks, keep them out of USA @user @user #BuildTheWall #LockHerUp #EndDACA | 0.989146 |
| 2 | Mollie Tibbetts And The Low Illegal Crime Rate Lie - American Thinker #MollieTibbetts #BuildTheDamnWall #BuildTheWall #BuildTheWallNow #BuildThatWall #MAGA #RedNationRising #KeepAmericaGreat #KAG2020 #KAG TW502 | 0.988874 |
| 3 | @user @user @user @user @user @user #BuildTheWallNow @user @user BEFORE the Midterm elections #BuildTheDamnWall #EndChainMigration #MandateEverify #NoAmnesty #KeepPheonixPromise National Security | 0.988597 |
| 4 | #DeportThemAll The only money any illegals should recieve should be a ticket back to where they are from! Nothing More!#EndIllegalImmigration #EndDACA#EndChainMigration #NoAmnesty#EndVisaLottery #EndSanctuaryCities #NoWelfareForImmigrants#AmericaFirst #BuildTheWall | 0.988451 |
| 5 | They just keep on coming. Build that damn wall and deport. Please! @user #BuildThatWall | 0.988386 |
| 6 | @user @user Absolutely! No DACA, No Amnesty, No Sanctury Cities, Deport All Illegal Aliens, Build The Wall! #MAGA #2A #NoDACA #DeportThemAll #BuildTheWall #OBAMAgate #StandWithICE #PEDOgate #LockThemAllUp #BanIslam #TRUMP2020 @user @user | 0.98837 |
| 7 | @user Yes so we need to BUILD The Wall! Blood is on your hands evertime an MS13 gang member or DUI illegal kills an American! #BuildTheWall #FundTheWall #EndVisaLottery #EnactEVerify #EndChainMigration 183 billion yearly taxpayers pay for illegals! Go home! | 0.988178 |
| 8 | @user #BuildTheDamnWall #BuildThatDamnWall #BuildThatWall #BuildTheWall4MollieTibbetts #BuildTheWallNow Need I say more? Solves ALL the problems!! | 0.987793 |
| 9 | @user Its time to give hunting licenses to border homes and make it year round open season on this invading force! Free guns and ammo to those willing to defend our borders!!! #BuildThatWall #BuildThatDamnWallNow | 0.987752 |


In [20]:
pd.DataFrame(top_50_mid_rate, columns=['Tweet', 'Hate Rate'])

| | Tweet | Hate Rate |
|---:|---|---:|
| 0 | America, weve been INVADED! This is not immigration, it is an invasion-thousands now entrenched & prepared to cause mayhem, street battles far worse than gang warfare. DEMS have been highjacked by communists/socialists/subversives! #MAGA #IllegalAliens @user @user | 0.850301 |
| 1 | @user You ASAP yams built body Bitch, wasn't I your body goals a few weeks ago? Fuck outta here orangutang titty flopping hoe. | 0.850506 |
| 2 | So illegal alien tweets are blocked by twitter because its considered hate speech #illegalalien I actually would love for all the #illegalaliens to get legal status and become actual citizens. I welcome all people to enjoy our great country, legally. | 0.851045 |
| 3 | @user @user cnn the CORRUPT NEWS NETWORK #1 in fake news, enemy of the people, lowest in ratings and falling. Pathetic. #MAGA #BUILDTHEWALL #NODACA | 0.851342 |
| 4 | I talk a lot of shit about hoes but Fr tho fellas why would you want to be broke fuckin wit a female you can't even get the bitch no fries 🍟 | 0.851521 |
| 5 | @user @user @user @user @user @user #VOTE #GOP IN #DELAWARE @user @user @user #RESEARCH #Delaware #COONS #Carney #illegals #gangs,#wilmington #delaware #1 #CrimeTown look at fact #noSanctuary #nodaca #dems #drive #DELAWARE #BANKRUPT!#GANGS #DRUGS @user @user TELL THE #TRUTH | 0.85168 |
| 6 | Fucking cheating bitch whore. | 0.851925 |
| 7 | No Filter Ill Smack The Shit outta You that Bitch the next Bitch && Whoever Tf Else | 0.851955 |
| 8 | #IllegalAliens are in the USA illegally and they are alien to our soil. Hence the term #IllegalAlien | 0.852239 |
| 9 | @user @user #ILLEGALS aren't kept in cages you fake Bishop! THAT pic was a #PublicityStunt at a whacko #Soros funded #Protest Fool! #ILLEGAL #FamiliesBelongTogether in #Mexico ! #DeportThemAll EVERY #Illegal is a criminal! #VoteRed 4 #AmericanDreamers AND #AmericanFamiliesFirst | 0.85229 |
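
The median hate rate lies around 0.85, so the slice above shows tweets the model still considers hateful with high confidence, not tweets it is unsure about. If the goal is the tweets the model is most uncertain about, a more direct criterion is the distance between the hate probability and 0.5. A minimal sketch, assuming `tweets_hate_rate` is the `{tweet: hate probability}` dictionary built above; `most_uncertain` is a hypothetical helper and a toy dictionary stands in for the real one.

```python
def most_uncertain(rates: dict, k: int = 50):
    """Return the k (tweet, probability) pairs whose probability is closest to 0.5."""
    return sorted(rates.items(), key=lambda item: abs(item[1] - 0.5))[:k]

# Toy stand-in for tweets_hate_rate
toy_probs = {"tweet a": 0.98, "tweet b": 0.52, "tweet c": 0.03, "tweet d": 0.45}
print(most_uncertain(toy_probs, k=2))
# -> [('tweet b', 0.52), ('tweet d', 0.45)]
```

In the notebook, `pd.DataFrame(most_uncertain(tweets_hate_rate), columns=['Tweet', 'Hate Rate'])` would display these tweets like the tables above.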


In [21]:
pd.DataFrame(top_50_worst_rate, columns=['Tweet', 'Hate Rate'])

| | Tweet | Hate Rate |
|---:|---|---:|
| 0 | The latest Pundits' Post! w/ articles tweeted by @user @user @user #flipboard10fortoday #enddaca | 0.008872 |
| 1 | How did people respond to @user tweets today?Friday 07 Sep 2018 16:00:26 UTCTop hashtags:#fridayfeeiing#buildthatwall#fridaymotivation#retweet#maga#nikeMost influential accounts:@adventuregirl@WeSupport45@DailyCaller@MichaelRapaport@LouiseMensch | 0.01209 |
| 2 | Very excited that @user is sponsoring BITCH! @user @user #womendirect #ALOT! | 0.015839 |
| 3 | Only 14 more days to #BackToSchool #SendThemBack | 0.021851 |
| 4 | Fake News: Iconic Crying Migrant Girl Never Separated from Mother, Says Father via @user @user @user | 0.022212 |
| 5 | Whenever HP is on I have this sick need to watch it & bitch the whole time. Scene that grind my gears: "HARRY DID U PUT UR NAME IN GOF!?"📢 | 0.023171 |
| 6 | THE WHOLE #LUSTFORLIFE ALBUM IS JUST LIKE BITCH STOP ITS STILL 10AM HERE AND IM ALREADY ROLLING ON THE FLOOR AND AH… | 0.023525 |
| 7 | 37 seconds ago I was tearing up over how overworked 1D was, i've snapped out of it! im a whole new bitch! MOVING AL… | 0.023668 |
| 8 | @user Bitch😭😭😭 dude I didn't even bother to read the whole sentence😕 I almost went to Instagram to check😑 | 0.023793 |
| 9 | I preordered 7 wings concept book and they all came without the lenticular i was a depressed bitch for a whole week | 0.023833 |


In [22]:
mean_hate_rate = np.mean(list(tweets_hate_rate.values()))
print("The mean hate rate is:", mean_hate_rate)

The mean hate rate is: 0.7372107


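The model assigns a high hate probability to most tweets: the mean hate rate is 0.74 and about half of the test tweets get a hate rate above 0.85. Yet we saw earlier that the dataset is unbalanced, with only about 42% of hate tweets in the `test` split (1,252 out of 2,970).  
The model therefore over-predicts the `hate` class, which seems to be the main reason for its poor scores.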
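
Since the model predicts `hate` far more often than the labels do, one possible correction (not tried in this notebook) is to replace the implicit 0.5 threshold of `argmax` with a threshold tuned on the `validation` split. The sketch below scans candidate thresholds and keeps the one maximising macro F1. `best_threshold` and `macro_f1` are hypothetical helpers, and the toy probabilities and labels only stand in for the validation hate probabilities and `dataset['validation']['label']`.

```python
def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores; an undefined F1 counts as 0."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def best_threshold(probs, y_true, candidates=None):
    """Return (threshold, macro F1) maximising macro F1 when predicting hate iff prob >= threshold."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    best_thr, best_score = 0.5, -1.0
    for thr in candidates:
        y_pred = [int(p >= thr) for p in probs]
        score = macro_f1(y_true, y_pred)
        if score > best_score:          # keep the first threshold reaching the best score
            best_thr, best_score = thr, score
    return best_thr, best_score

# Toy validation data: the "model" gives high hate probabilities to every tweet
toy_val_probs = [0.95, 0.90, 0.88, 0.80, 0.75, 0.70]
toy_val_labels = [1, 1, 1, 0, 0, 0]
print(best_threshold(toy_val_probs, toy_val_labels))
```

With the default 0.5 threshold, this toy model labels everything as hate (macro F1 of 1/3), whereas a threshold just above 0.80 separates the two classes. Tuning on `validation` rather than `test` keeps the test scores an honest estimate.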

In [23]:
wrong_examples = []
nb_examples = 5
test_texts = dataset['test']['text']
y_true = dataset['test']['label']
with torch.no_grad():
    # Stop at the first `nb_examples` tweets whose predicted label differs from the gold label
    for tweet, true_label in tqdm(zip(test_texts, y_true), total=len(test_texts)):
        if len(wrong_examples) == nb_examples:
            break
        inputs = tokenizer(preprocess(tweet), return_tensors="pt")
        logits = model(**inputs)[0]
        pred = int(np.argmax(logits.numpy()))  # argmax of the logits == argmax of the softmax
        if pred != true_label:
            wrong_examples.append((tweet, pred, true_label))
            
for example in wrong_examples:
    print("Tweet:", example[0])
    print("Predicted label:", labels[example[1]])
    print("True label:", labels[example[2]])
    print("")

  0%|          | 11/2970 [00:01<08:28,  5.82it/s]

Tweet: @user , you are correct that Reid certainly is a weasel. Sadly, we've got our own weasels; @user Sen McConnell & @user .The corrupt Mueller investigation w/be STOPPED if those 3 did their jobs.#MAGA #KAG #POTUS #Trump #NEWS #VoteRed #NoDACA #USA
Predicted label: hate
True label: not-hate

Tweet: Whoever just unfollowed me you a bitch
Predicted label: not-hate
True label: hate

Tweet: @user @user @user Always #NoDACA.I AM BORN IN #USA AND #USA FIRST.
Predicted label: hate
True label: not-hate

Tweet: @user @user Like he ever kept out any threats. He's lying as usual. #BuildThatWall
Predicted label: hate
True label: not-hate

Tweet: @user @user They can scrim whoever they fucking want this isn't a fucking chall you dumb bitch
Predicted label: hate
True label: not-hate






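Some tweets in the test split seem to be mislabelled. For instance, the last tweet above, "@user @user They can scrim whoever they fucking want this isn't a fucking chall you dumb bitch", is labelled `not-hate` although it arguably should be labelled `hate`.  
Moreover, tweets containing political hashtags such as #MAGA, #NoDACA or #BuildThatWall seem to be wrongly classified as hate: these hashtags appear to drive their high hate scores, as in the table of the 50 most confident hate predictions.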
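
One way to test the hashtag hypothesis is to compare the average hate probability of tweets containing a given hashtag with that of the other tweets. A minimal sketch, assuming `tweets_hate_rate` is the `{tweet: hate probability}` dictionary computed earlier; `mean_rate_with_hashtag` is a hypothetical helper and the toy dictionary (made-up texts and probabilities) only stands in for the real one.

```python
def mean_rate_with_hashtag(rates: dict, hashtag: str):
    """Mean hate probability of tweets containing `hashtag` vs. the other tweets.

    Matching is a case-insensitive substring test, so "#BuildThatWall" also
    matches "#BuildThatWallNow".
    """
    tag = hashtag.lower()
    with_tag = [p for tweet, p in rates.items() if tag in tweet.lower()]
    without_tag = [p for tweet, p in rates.items() if tag not in tweet.lower()]

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(with_tag), mean(without_tag)

# Toy stand-in for tweets_hate_rate
toy_hashtag_rates = {
    "tweet one #BuildThatWall": 0.9,
    "tweet two #buildthatwall": 0.7,
    "tweet three": 0.2,
    "tweet four #BackToSchool": 0.4,
}
print(mean_rate_with_hashtag(toy_hashtag_rates, "#BuildThatWall"))
```

On the real data, a large gap between the two means for #MAGA, #NoDACA or #BuildThatWall would support the hypothesis; comparing with the gold labels of those tweets would then show whether the high scores are justified.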

## Annotate data

In [31]:
import random

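# Build a sample of 100 tweets to annotate, at least 20% of which
# are predicted as hate by the model
target_perc = 0.2
nb_tweets = 100

# 80% of the sample: tweets drawn at random from the test split
tweets = random.sample(dataset['test']['text'], int(nb_tweets * (1 - target_perc)))

# Remaining 20%: the first test tweets, in dataset order, predicted as hate
with torch.no_grad():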
    for tweet in tqdm(dataset['test']['text']):
        if tweet in tweets:
            continue
        if (len(tweets) == nb_tweets):
            break
        inputs = tokenizer(preprocess(tweet), return_tensors="pt")
        outputs = model(**inputs)
        logits = outputs[0]
        probs = softmax(logits.numpy(), axis=1)
        if np.argmax(probs) == 1:
            tweets.append(tweet)

print("Number of tweets:", len(tweets))
pd.DataFrame(tweets, columns=['Tweet']).to_csv('tweets.csv', index=False)


  1%|          | 27/2970 [00:05<09:46,  5.02it/s]

Number of tweets: 100
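
As written, the cell above is not reproducible (no random seed) and its hate part always consists of the first predicted-hate tweets in dataset order. The sketch below is one alternative: it takes precomputed predictions, seeds its own random generator, and draws both parts at random. `build_annotation_sample` is a hypothetical helper; in the notebook, `texts` and `preds` would be `dataset['test']['text']` and the `y_pred` list from the evaluation cell, while toy lists stand in for them here.

```python
import random

def build_annotation_sample(texts, preds, nb_tweets=100, target_perc=0.2, seed=42):
    """Mix random tweets with tweets predicted as hate (pred == 1), reproducibly."""
    rng = random.Random(seed)
    nb_hate = int(round(nb_tweets * target_perc))
    hate_idx = [i for i, p in enumerate(preds) if p == 1]
    chosen = set(rng.sample(hate_idx, nb_hate))              # guaranteed predicted-hate part
    others = [i for i in range(len(texts)) if i not in chosen]
    chosen.update(rng.sample(others, nb_tweets - nb_hate))   # random part (may also contain hate)
    sample = [texts[i] for i in sorted(chosen)]
    rng.shuffle(sample)                                      # no ordering cue for annotators
    return sample

# Toy stand-ins for the test texts and the model's predictions
toy_texts = [f"tweet {i}" for i in range(1000)]
toy_preds = [1 if i % 3 == 0 else 0 for i in range(1000)]
sample = build_annotation_sample(toy_texts, toy_preds)
print(len(sample))
```

The result can be saved exactly like the original cell, e.g. `pd.DataFrame(sample, columns=['Tweet']).to_csv('tweets.csv', index=False)`.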





In [None]:
from sklearn.metrics import cohen_kappa_score
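
The cell above only imports `cohen_kappa_score`, presumably to measure the agreement between annotators on `tweets.csv`. To make explicit what that score measures, here is a minimal pure-Python version of Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each annotator's label frequencies. The two label lists are made-up placeholders, not real annotations; on ordinary inputs, `cohen_kappa_score(annotator_a, annotator_b)` should give the same value.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same, non-empty list of items")
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n     # observed agreement
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)    # chance agreement
    if p_e == 1:
        return float("nan")  # undefined: both annotators used one identical label
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations of 10 tweets (1 = hate, 0 = not-hate)
annotator_a = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
annotator_b = [1, 0, 1, 1, 0, 0, 0, 0, 1, 0]
print(round(cohen_kappa(annotator_a, annotator_b), 3))
```

Here the annotators agree on 8 tweets out of 10 (p_o = 0.8) but would agree on 52% of them by chance (p_e = 0.52), so κ is only about 0.58.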

