# LAB08 - NLP2

## Introduction

We need to choose between the `hate` and `offensive` datasets, with the specific requirement that the data can be used commercially. We will therefore use the `offensive` dataset, as HuggingFace indicates that permission is needed to use the `hate` dataset for commercial purposes. On the datasets' HuggingFace pages, we can see the following terms:

![image](img/licenses.png)
![image](img/hate-license.png)

The `hate` dataset is under the CC-BY-NC-4.0 license, which states the following:

![image](img/terms.png)

Thus, we will use the `offensive` dataset, which is under no license.

## Evaluating the dataset

#### 1. Describe the dataset. Look at the splits, proportion of classes, and see what you can figure out by just looking at the text.

In [None]:
%pip install datasets
%pip install bertopic
%pip install sentence_transformers
%pip install shap statsmodels

In [166]:
import numpy as np
import json
import pandas as pd
import shap
import statsmodels.stats.inter_rater as ir
import random

from typing import List, Tuple

from datasets import load_dataset
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from umap import UMAP

from transformers import AutoModelForSequenceClassification
from transformers import pipeline
from transformers import AutoTokenizer

from sklearn.metrics import classification_report

In [167]:
dataset = load_dataset("tweet_eval", "offensive")

Found cached dataset tweet_eval (/Users/quentinfisch/.cache/huggingface/datasets/tweet_eval/offensive/1.1.0/12aee5282b8784f3e95459466db4cdf45c6bf49719c25cdb0743d71ed0410343)


In [168]:
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 11916
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 860
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 1324
    })
})


The dataset has 3 splits: `train`, `validation` and `test`. The `train` split contains 11,916 samples, the `validation` split 1,324 samples and the `test` split 860 samples. Let's take a look at some samples:

In [4]:
NB_PRINT_SAMPLES = 5
for i in range(NB_PRINT_SAMPLES):
    print(dataset['train'][i])

{'text': '@user Bono... who cares. Soon people will understand that they gain nothing from following a phony celebrity. Become a Leader of your people instead or help and support your fellow countrymen.', 'label': 0}
{'text': '@user Eight years the republicans denied obama’s picks. Breitbarters outrage is as phony as their fake president.', 'label': 1}
{'text': '@user Get him some line help. He is gonna be just fine. As the game went on you could see him progressing more with his reads. He brought what has been missing. The deep ball presence. Now he just needs a little more time', 'label': 0}
{'text': '@user @user She is great. Hi Fiona!', 'label': 0}
{'text': "@user She has become a parody unto herself? She has certainly taken some heat for being such an....well idiot. Could be optic too  Who know with Liberals  They're all optics.  No substance", 'label': 1}


This data is a collection of tweets, each with a binary label indicating whether the tweet is offensive: `0` for non-offensive and `1` for offensive.
Tweets labelled as offensive seem to be mostly insults or politically incorrect statements; some are labelled offensive for being politically incorrect even though they contain no insult. The last tweet printed above is labelled offensive, probably because it contains the word "idiot".
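The question also asks for the proportion of classes, which the samples above do not show. Below is a minimal sketch using `collections.Counter`; the helper name `class_proportions` is ours. The example counts (620 non-offensive and 240 offensive tweets) are the test-split supports reported by the classification reports later in this notebook.

```python
from collections import Counter
from typing import Dict, Iterable


def class_proportions(labels: Iterable[int]) -> Dict[int, float]:
    """Return the fraction of samples carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}


# test split: 620 non-offensive (0) and 240 offensive (1) tweets
test_labels = [0] * 620 + [1] * 240
print(class_proportions(test_labels))
```

On the notebook data, something like `{split: class_proportions(dataset[split]["label"]) for split in dataset}` should give the proportions for every split. On the test split, roughly 72% of the tweets are non-offensive and 28% are offensive.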

#### 2. Use BERTopic to extract the topics within the data, and the main topics within each class.

In [136]:
SEED = 42
umap_model = UMAP(random_state=SEED)

model = SentenceTransformer('all-MiniLM-L6-v2')

topic_model = BERTopic(language="english", calculate_probabilities=True, embedding_model=model, umap_model=umap_model)
topics, probs = topic_model.fit_transform(dataset['train']['text'])

In [159]:
topic_model.get_topic_freq().head(10)

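|    | Topic | Count |
|---:|------:|------:|
|  3 |     0 |  3939 |
|  4 |    -1 |  3069 |
|  5 |     1 |   991 |
| 28 |     2 |   382 |
| 13 |     3 |   317 |
| 27 |     4 |   262 |
| 21 |     5 |   195 |
| 14 |     6 |   161 |
| 25 |     7 |   153 |
| 17 |     8 |   113 |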


Let's visualize the main topics within each class (top 20 topics):

In [161]:
topic_per_class = topic_model.topics_per_class(dataset['train']['text'], topics)
topic_model.visualize_topics_per_class(topic_per_class, top_n_topics=20)
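A numeric view of the main topics per class can complement the plot. The sketch below assumes `topics` is the list returned by `fit_transform` and `labels` is the matching `label` column. The helper name `top_topics_per_class` and the toy data are hypothetical. The function drops BERTopic's outlier topic `-1` and uses `pandas.crosstab` with `normalize="columns"`, so the topic shares within each class sum to 1.

```python
from typing import Dict, List

import pandas as pd


def top_topics_per_class(topics: List[int], labels: List[int], n: int = 5) -> Dict[int, pd.Series]:
    """Return, for each label, the n topics covering the largest share of its documents."""
    frame = pd.DataFrame({"topic": topics, "label": labels})
    # topic -1 is BERTopic's outlier bucket, not a real topic
    frame = frame[frame["topic"] != -1]
    # rows = topics, columns = labels, each column sums to 1
    shares = pd.crosstab(frame["topic"], frame["label"], normalize="columns")
    return {label: shares[label].nlargest(n) for label in shares.columns}


# hypothetical toy data: 8 documents, labels 0 = non-offensive, 1 = offensive
toy_topics = [0, 0, 1, -1, 2, 0, 1, 1]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
for label, top in top_topics_per_class(toy_topics, toy_labels, n=2).items():
    print(label, top.to_dict())
```

In the notebook, this would be called as `top_topics_per_class(topics, dataset['train']['label'])`. The returned topic ids can then be looked up with `topic_model.get_topic(topic_id)`.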

#### 3. What do you think about the results? How do you think it could impact a model trained on these data?

The most frequent topic is `she_you_is_he`, which is just a list of pronouns. This makes sense, as pronouns are very common in English. The second most frequent entry, topic `-1`, is BERTopic's outlier topic: it gathers the tweets that were not assigned to any topic. The remaining topics mainly cover diverse political subjects. Some topics are surprising, such as `nfl_football_he_game`, but it makes sense that controversial tweets also appear on this topic.

A model trained on these data might be biased towards politics, since most topics are political. This might be a problem if the model is used to classify tweets that are not related to politics.

#### 4. (Bonus) By default, BERTopic extracts single keywords. Play with the model to extract bigrams or more. See if you can go deeper in your analysis.

In [149]:
topic_model.get_topics().keys()

dict_keys([-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78])

In [157]:
topic_model.visualize_term_rank()

The c-TF-IDF score measures how important a word is to a topic: the higher the score, the more important the word is for that topic. The topic with the highest scores is topic 50 (bono_u2_asshold_his_tax_dublin_inc), followed by topics related to sexual words, popular brands or common words.
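To make the c-TF-IDF score more concrete, here is a small NumPy sketch of the weighting described in the BERTopic documentation, $W_{t,c} = \mathrm{tf}_{t,c} \cdot \log\left(1 + \frac{A}{f_t}\right)$. Here $\mathrm{tf}_{t,c}$ is the frequency of term $t$ in topic $c$, normalised per topic; $f_t$ is the frequency of $t$ across all topics; and $A$ is the average number of words per topic. This is an illustration only. BERTopic's own `ClassTfidfTransformer` has extra options (e.g. BM25 weighting) that are not reproduced here, and the toy counts are made up.

```python
import numpy as np


def c_tf_idf(counts: np.ndarray) -> np.ndarray:
    """counts[c, t] = occurrences of term t in the documents of topic c.

    Assumes every term occurs at least once (otherwise f_t would be 0).
    """
    counts = counts.astype(float)
    # term frequency, L1-normalised within each topic
    tf = counts / counts.sum(axis=1, keepdims=True)
    # f_t: frequency of each term across all topics
    term_freq = counts.sum(axis=0)
    # A: average number of words per topic
    avg_words = counts.sum(axis=1).mean()
    idf = np.log(1.0 + avg_words / term_freq)
    return tf * idf


# hypothetical toy counts: 2 topics x 3 terms
toy_counts = np.array([[3, 1, 0],
                       [0, 1, 2]])
print(np.round(c_tf_idf(toy_counts), 3))
```

In this toy example, the term shared by both topics ends up with a lower weight than each topic's own distinctive term.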

Let's now see what topics are extracted when we use bigrams.

In [162]:
topic_model = BERTopic(language="english", calculate_probabilities=True, embedding_model=model, umap_model=umap_model, n_gram_range=(1, 2))
topics, probs = topic_model.fit_transform(dataset['train']['text'])
topic_per_class = topic_model.topics_per_class(dataset['train']['text'], topics)
topic_model.visualize_topics_per_class(topic_per_class, top_n_topics=20)

Using bigrams does not change much: topic names are more repetitive, but the topics remain in the same domains as before.

## Evaluate a model

#### 1. Evaluate their model on the test split of the dataset you picked, using precision, recall, and F1-score.

We pick the RoBERTa model fine-tuned on the offensive dataset (`cardiffnlp/twitter-roberta-base-offensive`) and use it through the HuggingFace `transformers` library.

In [11]:
!rm -rf cardiffnlp

In [12]:
MODEL = "cardiffnlp/twitter-roberta-base-offensive"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

In [13]:
roberta_pipeline = pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer
)


In [27]:
predictions = roberta_pipeline(dataset['test']['text'])
predictions

[{'label': 'offensive', 'score': 0.856173038482666},
 {'label': 'offensive', 'score': 0.6463530659675598},
 {'label': 'non-offensive', 'score': 0.5957751274108887},
 {'label': 'non-offensive', 'score': 0.7610461711883545},
 {'label': 'non-offensive', 'score': 0.9149481654167175},
 {'label': 'non-offensive', 'score': 0.9536920189857483},
 {'label': 'non-offensive', 'score': 0.7719836831092834},
 {'label': 'non-offensive', 'score': 0.9647331237792969},
 {'label': 'offensive', 'score': 0.8978868722915649},
 {'label': 'non-offensive', 'score': 0.505090594291687},
 {'label': 'offensive', 'score': 0.7437807321548462},
 {'label': 'non-offensive', 'score': 0.9143494963645935},
 {'label': 'non-offensive', 'score': 0.9382737278938293},
 {'label': 'non-offensive', 'score': 0.6899848580360413},
 {'label': 'non-offensive', 'score': 0.5975456237792969},
 {'label': 'offensive', 'score': 0.8460323810577393},
 {'label': 'non-offensive', 'score': 0.7833716869354248},
 {'label': 'non-offensive', 'score':

In [43]:
def evaluate(predictions: List, labels: List) -> None:
    """
    Evaluate the predictions of a model.

    ## Parameters
    predictions: List
        The predictions of a model.
    labels: List
        The labels of the test set.
    """
    predictions = [0 if p['label'] == "non-offensive" else 1 for p in predictions]
    print(classification_report(labels, predictions))

In [44]:
evaluate(predictions, dataset['test']['label'])

              precision    recall  f1-score   support

           0       0.88      0.93      0.91       620
           1       0.80      0.67      0.73       240

    accuracy                           0.86       860
   macro avg       0.84      0.80      0.82       860
weighted avg       0.86      0.86      0.86       860



The macro-averaged F1-score is 0.82, which is pretty good. The model is better at classifying non-offensive tweets than offensive ones: the F1-score is 0.91 for non-offensive tweets and 0.73 for offensive tweets, in particular because of the lower recall on the offensive class (0.67). The test set is also imbalanced, with 620 non-offensive tweets against 240 offensive ones, which likely explains why the model is better at classifying the majority class, non-offensive tweets.
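The two averages in the report can be recomputed from the per-class rows. The macro average is the unweighted mean of the per-class F1-scores, while the weighted average weights each class by its support. Here is a quick check with the numbers reported above:

```python
# per-class F1-scores and supports from the RoBERTa classification report above
f1_per_class = {0: 0.91, 1: 0.73}
support = {0: 620, 1: 240}

# macro average: every class counts the same
macro_f1 = sum(f1_per_class.values()) / len(f1_per_class)
# weighted average: every class counts proportionally to its number of test tweets
weighted_f1 = sum(f1_per_class[c] * support[c] for c in f1_per_class) / sum(support.values())

print(f"macro F1    = {macro_f1:.2f}")
print(f"weighted F1 = {weighted_f1:.2f}")
```

This gives 0.82 and 0.86, matching the `macro avg` and `weighted avg` rows. The weighted average is pulled up by the larger non-offensive class, so on an imbalanced set like this one it hides part of the weaker offensive-class performance.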

#### 2. Look for prediction failures. Extract the top 5 misclassified tweets (highest score in wrong class) for each class and discuss what could be wrong with the model.

In [48]:
def get_top_misclassified(predictions: List, labels: List, nb_samples: int = 5) -> Tuple[List, List]:
    """
    Extract nb_samples misclassified samples per class.

    ## Parameters
    predictions: List
        The predictions of a model.
    labels: List
        The labels of the test set.
    nb_samples: int
        The number of samples to extract per class.

    ## Returns
    offensive_missclassified: List
        The top nb_samples offensive samples that were missclassified.
    non_offensive_missclassified: List
        The top nb_samples non-offensive samples that were missclassified.
    """
    offensive_missclassified = []
    non_offensive_missclassified = []
    
    for i, p in enumerate(predictions):
        if p['label'] == "non-offensive" and labels[i] == 1:
            offensive_missclassified.append((i, p['score']))
        elif p['label'] == "offensive" and labels[i] == 0:
            non_offensive_missclassified.append((i, p['score']))
    offensive_missclassified.sort(key=lambda x: x[1], reverse=True)
    non_offensive_missclassified.sort(key=lambda x: x[1], reverse=True)

    return offensive_missclassified[:nb_samples], non_offensive_missclassified[:nb_samples]

In [49]:
offensive_missclassified, non_offensive_missclassified = get_top_misclassified(predictions, dataset['test']['label'])

In [164]:
for i, score in offensive_missclassified:
    print(score, dataset['test']["text"][i])

0.9338217973709106 #Liberals / #Democrats THIS is what you stand for. If not, then #WalkAway
0.919756293296814 #Liberals Are Reaching Peak Desperation To Call On #PhillipRuddock To Talk With #Turnbull To Convince Him To Help with #WentworthVotes 18 Sept 2018  @user #Auspol #LNP #NSWpol @user  @user @user #LNPMemes
0.9112220406532288 #NoPasaran: Unity demo to oppose the far-right in #London – #antifa #Oct13 — Enough is Enough!
0.9081719517707825 #BREXIT deal HAS been reached - and will be unveiled at special summit in NOVEMBER, Has @user sold out the #UK to the eu??? She better have not or the @user are finished!! @user
0.895766019821167 #America  ... tear down that #Wall! #tcot #partisanship #Trump #thewall #Borderwall #liberty #civilsociety #think #Conservatives #Democrats #Progressives #liberals #Independent #libertarians #GOP #DNC #CriticalThinking


In [165]:
for i, score in non_offensive_missclassified:
    print(score, dataset['test']["text"][i])

0.9010688662528992 Are you fucking serious?
0.8939998745918274 @user I guess that’s where swamp ass originated
0.8576400279998779 An American Tail really is one of the most underrated animations ever ever ever. Fuck I cried in this scene
0.8492959141731262 @user @user Bull crap. You know she doesn't care.  She is trying to get attention for her Presidential run.  Do you see any other Senator giving nonsense?  Nope.
0.8418039083480835 #Room25 is actually incredible, Noname is the shit, always has been,  and I’m seein her in like 5 days in Melbourne. Life is good. Have a nice day.


Regarding offensive tweets misclassified as non-offensive, the model seems to struggle with tweets that contain no insult but are politically charged. For example, the second tweet is classified as non-offensive, but it expresses a strong opinion about Liberals. The model is also very confident about these misclassifications, with four of the five scores above 0.90.

Regarding non-offensive tweets misclassified as offensive, the model seems to flag them because they contain profanity or crude words ("fucking", "swamp ass", "Fuck", "Bull crap", "the shit"). However, these words are not used to attack anyone; they mostly act as intensifiers. These errors are therefore understandable.


#### 3. Extract the top 10 tweets your model is most confident about in the target class (offensive or hateful), the top 10 in the neutral class, and the top 10 your model is most uncertain about. Do you believe the model is doing a great job?

In [52]:
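with open('tweets.json') as f:
    data = json.load(f)

df = pd.DataFrame(data)
df = df.dropna()
# drop duplicates *before* resetting the index, so that positions 0..n-1 match
# the order of df['text'].tolist() used for the predictions below
df = df.drop_duplicates(subset=['text'])
df = df.reset_index(drop=True)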

df

|      | id | id_str | text | lang | created_at |
|---:|---|---|---|---|---|
| 0 | 1410492618790817793 | 1410492618790817793 | YOU BETTER SUCK HIS DICK KOZY I SEE YOU WITH K... | en | Thu Jul 01 06:57:00 +0000 2021 |
| 1 | 1410492618769780742 | 1410492618769780742 | I still canr believe it.😭😭😭😭😭 | en | Thu Jul 01 06:57:00 +0000 2021 |
| 2 | 1410492618790686720 | 1410492618790686720 | You should raise the webform....how would they... | en | Thu Jul 01 06:57:00 +0000 2021 |
| 3 | 1410492618803335174 | 1410492618803335174 | im tired too but this is so entertaining i cant | en | Thu Jul 01 06:57:00 +0000 2021 |
| 4 | 1410492618778157059 | 1410492618778157059 | Fuckof | en | Thu Jul 01 06:57:00 +0000 2021 |
| ... | ... | ... | ... | ... | ... |
| 9995 | 1410721732642492418 | 1410721732642492418 | Because It’s My Business: Hear Tabitha Brown’s... | en | Thu Jul 01 22:07:25 +0000 2021 |
| 9996 | 1410721732659322881 | 1410721732659322881 | comer pipoca enquanto assisto girl from nowher... | en | Thu Jul 01 22:07:25 +0000 2021 |
| 9997 | 1410721736841056259 | 1410721736841056259 | They will be mad with me if they’re not 504Boy... | en | Thu Jul 01 22:07:26 +0000 2021 |
| 9998 | 1410721736828411905 | 1410721736828411905 | Omg so beautiful 😍😍😍 | en | Thu Jul 01 22:07:26 +0000 2021 |


In [53]:
tweets_preds = roberta_pipeline(df['text'].tolist())

In [69]:
def extract_top_tweets(predictions: List, nb_samples: int = 10) -> Tuple[List, List, List]:
    """
    Extract the top nb_samples offensive, non-offensive and uncertain tweets.

    ## Parameters
    predictions: List
        The predictions of a model.
    nb_samples: int
        The number of samples to extract per class.

    ## Returns
    top_offensive_tweets: List
        The top nb_samples offensive tweets.
    top_non_offensive_tweets: List
        The top nb_samples non-offensive tweets.
    top_uncertain_tweets: List
        The top nb_samples uncertain tweets.
    """
    top_offensive_tweets = []
    top_non_offensive_tweets = []
    top_uncertain_tweets = []

    for i, p in enumerate(predictions):
        if p['label'] == "non-offensive":
            top_non_offensive_tweets.append((i, p['score']))
        elif p['label'] == "offensive":
            top_offensive_tweets.append((i, p['score']))
        top_uncertain_tweets.append((i, p['score']))
    top_offensive_tweets.sort(key=lambda x: x[1], reverse=True)
    top_non_offensive_tweets.sort(key=lambda x: x[1], reverse=True)
    top_uncertain_tweets.sort(key=lambda x: x[1])
    
    return top_offensive_tweets[:nb_samples], top_non_offensive_tweets[:nb_samples], top_uncertain_tweets[:nb_samples]

In [70]:
top_offensive_tweets, top_non_offensive_tweets, top_uncertain_tweets = extract_top_tweets(tweets_preds)

In [72]:
for i, score in top_offensive_tweets:
    print(score, df['text'][i])

0.9484737515449524 Stop with the slow mo it make it look bad
0.9465802907943726 morninggggg
0.9437395930290222 YOU GET ITTT and same omg, i think the last time i had one of those was in 2018 but its so good
0.9423408508300781 Me too. Buck up; you are not alone. Good people agree, and we are all in this together.
0.9394720792770386 sexy
0.9366845488548279 she should pay attention more omg it’s so annoying :/
0.9325129985809326 Or how about rather than playing a game and tweeting about it you pull your finger out and reply to your backers you are letting down every single day. Shame on you
0.9313321113586426 Finally a good take ❤️
0.9305409789085388 desoff
0.9289026856422424 mans is lifting two of you.


In [73]:
for i, score in top_non_offensive_tweets:
    print(score, df['text'][i])

0.9816755652427673 Cool
0.9815455079078674 Man you guys really know how to make a mofo feel totally socially inept.
0.9814001321792603  LITERALLY  WAKE  UP  RN  WHERE  R  U
0.9809073805809021 I was referred to her by a friend online and I thought is a scam ...but I was moved to try and here I earned.. just want to share this to people too. @user
0.9806414246559143 Deja vu

BLINKS U KNOW WHAT 2 DO 
#PremiosMTVMIAW
#MTVLAKPOPROSE 
#MTVLAFANDOMBLINKS
@user
0.9802667498588562 Who doesn’t love u
0.9802438020706177 one person followed me and 3 people unfollowed me // automatically checked by http
0.979927659034729 Just the way I make money and would not marry or even date who doesn’t make her own money, same applies to cooking. I know how to cook and I won’t settle with a woman who can’t cook unless she’s Rich/wealthy or sum. You can’t be broke and still be lazy
0.9798141121864319 July
0.9793971180915833 Good afternoon&lt;333


In [74]:
for i, score in top_uncertain_tweets:
    print(score, df['text'][i])

0.5004608631134033 Johor recorded most suicide cases for two consecutive years - 2019, 2020. As of May 2021, Selangor recorded the highest number of suicide cases, with 117 or 25% of 468 cases reported this year
0.500519871711731 the part about this that scares me the most is that i drink heavily i take vyvanse i occasionally smoke and i use retinol every single night. like that baby would be gambling with its life keeping me out of the loop bruh
0.5005561709403992 STOP HE WAITED TILL 3:25 HUH
0.50067538022995 😲 stay safe.
0.500801682472229 too scared to soend my own money bc my mom gets notifs if i buy something🤣🤣🙏
0.5008642673492432 Having my nipples pierced again makes me feel closer to the person i was.
0.5010842084884644 And you didn't need the Americans at all this time!
0.5010976195335388 You feel like that, perhaps something in your past may have informed that. So maybe try understand what about sharing a milestone with loved ones is deeply making you feel like you're attention

In [77]:
print(df['text'][0], tweets_preds[0])

YOU BETTER SUCK HIS DICK KOZY I SEE YOU WITH KNUCKLES GET EM GYAAAAL {'label': 'offensive', 'score': 0.8737722039222717}


Looking at the tweets the model is most confident about, the top "offensive" predictions are mostly harmless tweets. For example, *"morninggggg"* is classified as offensive with a score of $0.946$, which makes no sense. The top non-offensive predictions look more sensible than the top offensive ones. The uncertain ones also make sense, as some are about drinking, contain sex-related vocabulary, or use unusual phrasing. Overall, the model is not doing a great job. In this use case, however, false positives seem preferable to false negatives, so it is not that bad. It can still detect clearly offensive tweets such as the last example, which is good.
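The `text-classification` pipeline only returns the winning label and its score. Assuming the model's two outputs go through a softmax (two labels, `offensive` and `non-offensive`), the probability of the other label is `1 - score`. Every prediction can therefore be mapped to a single P(offensive). Ranking by distance to 0.5 gives the same "most uncertain" tweets as sorting by the lowest score. The same value can also be thresholded below 0.5 if false negatives are considered costlier than false positives. The helper names below are hypothetical.

```python
from typing import Dict, List


def offensive_probability(pred: Dict) -> float:
    """Map a pipeline output {'label', 'score'} to P(offensive), assuming two softmax classes."""
    return pred["score"] if pred["label"] == "offensive" else 1.0 - pred["score"]


def rank_by_uncertainty(preds: List[Dict]) -> List[int]:
    """Indices of predictions sorted from most to least uncertain (closest to 0.5 first)."""
    return sorted(range(len(preds)), key=lambda i: abs(offensive_probability(preds[i]) - 0.5))


def flag_offensive(preds: List[Dict], threshold: float = 0.5) -> List[int]:
    """Binary decisions; lowering the threshold trades false negatives for false positives."""
    return [int(offensive_probability(p) >= threshold) for p in preds]


toy_preds = [
    {"label": "offensive", "score": 0.95},
    {"label": "non-offensive", "score": 0.51},
    {"label": "offensive", "score": 0.60},
]
print(rank_by_uncertainty(toy_preds))
print(flag_offensive(toy_preds, threshold=0.4))
```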

#### 4. (Bonus) Use SHAP on the provided tweets, or manually written texts, to see if you can find topics on which the model is biased.

In [62]:
some_tweets = df['text'].tolist()[:10]
some_tweets

['YOU BETTER SUCK HIS DICK KOZY I SEE YOU WITH KNUCKLES GET EM GYAAAAL',
 'I still canr believe it.😭😭😭😭😭',
 'You should raise the webform....how would they know then that you completed ur medicals',
 'im tired too but this is so entertaining i cant',
 'Fuckof',
 'People 😋',
 'Even if they didn’t exploit people to acquire their riches, how are you gonna be okay literally wasting thousands and thousands of dollars while there are still people who are homeless? While there are people skipping life saving medical treatments bc of the cost?',
 'He rather have used the cash to buy some clothes that dont resemble a duvet.',
 'Baret',
 'Bloody awesome!']

In [65]:
explainer = shap.Explainer(roberta_pipeline)
shap_values = explainer(some_tweets)

AttributeError: module 'numpy' has no attribute 'bool'.
`np.bool` was a deprecated alias for the builtin `bool`. To avoid this error in existing code, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

In [None]:
shap.plots.text(shap_values[:10])

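SHAP fails here because the installed SHAP version still uses `np.bool`, an alias that was deprecated in NumPy 1.20 and removed in NumPy 1.24. Upgrading SHAP to a release that no longer uses the alias, or pinning `numpy<1.24`, should make this cell run again.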
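While SHAP is broken, a much cruder alternative is occlusion (leave-one-word-out): remove each word in turn and measure how much the offensive score drops. This is not SHAP, since it ignores interactions between words, but it can already hint at words the model over-relies on. Below is a minimal sketch; `occlusion_attributions` and the toy scoring function are hypothetical.

```python
from typing import Callable, List, Tuple


def occlusion_attributions(text: str, score_fn: Callable[[str], float]) -> List[Tuple[str, float]]:
    """For each word, the drop in score_fn when that word is removed from the text."""
    words = text.split()
    base_score = score_fn(text)
    attributions = []
    for i, word in enumerate(words):
        reduced_text = " ".join(words[:i] + words[i + 1:])
        attributions.append((word, base_score - score_fn(reduced_text)))
    return attributions


# hypothetical stand-in for P(offensive): high as soon as "idiot" appears
def toy_score(text: str) -> float:
    return 0.9 if "idiot" in text.lower().split() else 0.1


for word, delta in occlusion_attributions("what an idiot", toy_score):
    print(f"{word:>6} {delta:+.2f}")
```

With the notebook's `roberta_pipeline`, `score_fn` would return the probability of the `offensive` label for a text: the pipeline score when the predicted label is `offensive`, one minus it otherwise. Running it on `some_tweets`, or on hand-written sentence pairs that differ only by a group or political term, could reveal words that push the score up regardless of context.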

#### 5. What are the advantages of using a pre-trained transformer vs naive Bayes? Think about training, and usage in production.

The advantages of using a pre-trained transformer are the following:
- The model is already trained on a large amount of data, so we only need to fine-tune it (or, as here, reuse an already fine-tuned checkpoint) instead of training it from scratch. This is a huge advantage, because training such a model requires a lot of time, data and compute.
- The model captures a lot of contextual information and can be adapted to many different tasks, including tasks unrelated to the one it was originally trained on, which is not the case for a Naive Bayes model.

The advantages of using a Naive Bayes model are the following:
- The model is very simple and can be trained very quickly, even on a small amount of data.
- Since the model is very simple, it is much easier to understand how it works and to debug it.

In terms of predictive performance, the pre-trained transformer is much better than the Naive Bayes model. However, the Naive Bayes model is easier to understand and to debug, and it may do comparatively better on a small dataset. It also requires far fewer resources to run in production: no GPU, a smaller memory footprint and lower latency. So this is a trade-off between performance and simplicity.

#### 6. Train a naive Bayes model on the data, and compare its results with this model.

In [60]:
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report

# define a bag-of-words + Naive Bayes pipeline
# (named nb_pipeline so it does not shadow transformers.pipeline imported earlier)
nb_pipeline = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('classifier', MultinomialNB())
])

# train the model on the train split
nb_pipeline.fit(dataset['train']['text'], dataset['train']['label'])

# evaluate the model on the test split
nb_predictions = nb_pipeline.predict(dataset['test']['text'])
print(classification_report(dataset['test']['label'], nb_predictions))

              precision    recall  f1-score   support

           0       0.84      0.87      0.85       620
           1       0.62      0.56      0.59       240

    accuracy                           0.78       860
   macro avg       0.73      0.71      0.72       860
weighted avg       0.78      0.78      0.78       860



The macro-averaged F1-score of the Naive Bayes model is 0.72, much lower than the 0.82 of the RoBERTa model. The gap is widest on the offensive class (F1-score of 0.59 vs 0.73). This confirms what we said in the previous question: the RoBERTa model performs much better than the Naive Bayes model.

## Annotate data

#### 1. Extract about 100 tweets containing at least 20% of your target class (offensive/hateful), from the 10K tweets provided. You can use the pretrained model to help you find tweets in the target class.

In [85]:
offensive_tweets = []
for i, p in enumerate(tweets_preds):
    if p['label'] == "offensive":
        offensive_tweets.append((df["text"][i], p["label"]))
    if len(offensive_tweets) == 30:
        break

non_offensive_tweets = []
for i, p in enumerate(tweets_preds):
    if p['label'] == "non-offensive":
        non_offensive_tweets.append((df["text"][i], p["label"]))
    if len(non_offensive_tweets) == 70:
        break

extracted_tweets = offensive_tweets + non_offensive_tweets

# shuffle the tweets
random.seed(SEED)
random.shuffle(extracted_tweets)
extracted_tweets[:10]

[('agree.', 'non-offensive'),
 ('I guess looking at interannotator metrics such as fleiss kappa may be useful to understand how consistent they are? You could look at some kind of document similarity to infer information about the related pages?',
  'non-offensive'),
 ('Tried and tested 😂😂😂', 'non-offensive'),
 ('setiap lihat teman2ku baru main hades and they’re feeing horny for no reason yeah that’s the point babe this game and the fandom are so horny',
  'offensive'),
 ('Free Britney', 'non-offensive'),
 ('Correction: Eastern WA will see *smoke from the #LavaFire in CA',
  'non-offensive'),
 ('Fuckof', 'offensive'),
 ('literally', 'non-offensive'),
 ('feeling so sexy and principled at my horrifically corrupt and unfulfilling blue collar job',
  'offensive'),
 ('VOTE FOR CHANGBIN RN', 'non-offensive')]

In [86]:
extracted_tweets_without_labels = [t[0] for t in extracted_tweets]

In [90]:
extracted_tweets_df = pd.DataFrame(extracted_tweets_without_labels, columns=["text"])
extracted_tweets_df["label"] = ""
extracted_tweets_df.to_csv("extracted_tweets.csv", index=False)

#### 2. Write down an annotation guideline.

Annotation Guideline for Tweet Classification

Objective:
The aim of this annotation guideline is to provide clear instructions for annotating tweets into three distinct classes: "neutral", "offensive" and "can't tell". It aims to ensure consistent annotation across annotators by defining the target classes, giving examples of ambiguous cases, and clarifying the meaning of the "can't tell" class.

1. Target Classes:
   a. Neutral: Tweets that do not contain offensive or biased language, and express a neutral or non-controversial sentiment.
   b. Offensive: Tweets that contain offensive, abusive, derogatory, or inappropriate language targeting individuals or groups.
   c. Can't tell: Use this class when the tweet is too ambiguous, lacks context, or the annotator cannot confidently determine whether it belongs to the "neutral" or "offensive" class.

2. Characteristics of Each Class:
   a. Neutral:
      - The tweet presents a non-controversial or unbiased opinion.
      - It does not contain any offensive language, personal attacks, or discriminatory content.
      - The sentiment expressed in the tweet is neither positive nor negative.
      - The tweet expresses an opinion that is not likely to provoke strong reactions.
      - It does not contain any profanity, vulgar language, or sexually explicit content.

   b. Offensive:
      - The tweet includes explicit or implicit offensive language, hate speech, or derogatory remarks targeting individuals or groups based on attributes such as race, gender, religion, ethnicity, etc.
      - It contains personal attacks, threats, or intends to demean or harm others.
      - The tweet may provoke anger, disgust, or be considered inappropriate or disrespectful.
      - It takes sides in political or social discussions that are controversial or sensitive in nature.
      - It contains profanity, vulgar language, or sexually explicit content.
      - It contains offensive or abusive terms that are used to insult others.
      - The tweet expresses extreme political or religious views that are likely to provoke strong reactions.

   c. Can't tell:
      - Select this class if the tweet is ambiguous, lacks sufficient context, or contains language that makes it difficult to confidently assign it to "neutral" or "offensive."
      - The tweet might be written in an unclear or sarcastic tone, making it hard to discern the true intent.
      - The tweet could be in a language or cultural context that is unfamiliar to the annotator.
      
3. Examples of Ambiguous Cases:
   a. Ambiguous "neutral" cases:
      - Tweets that contain mild sarcasm or irony that might be mistaken for offensive language without proper context.
      - Statements that mention controversial topics without expressing a clear opinion (e.g. "People speak out more and more about racism these days": this example should not be considered offensive because it does not express a clear opinion about racism, it only states that people talk about it more and more).
      - Tweets with ambiguous humor that could be perceived as offensive without further clarification.

   b. Ambiguous "offensive" cases:
      - Tweets that mention sensitive topics but do not directly contain offensive language (e.g. "Nazis were the enemy of Europe in WWII", this example does not contain direct offensive language, but the topic is sensitive and could be considered offensive by some people).
      - Statements that criticize public figures or institutions without using offensive language (e.g. "The president is not doing a good job": although this example contains no offensive language and only expresses an opinion about the president, it should nonetheless be labelled offensive under this guideline because it criticizes a public figure).
      - Tweets that include euphemisms, coded language, or implicit offensive content (e.g. "I don't like people who are not like me": this example does not contain any offensive language, but it could be considered offensive by some people because it implies that the author dislikes people who are different from them).

4. Handling "Can't Tell" Class:
   - The "can't tell" class should be used sparingly when there is genuine uncertainty or lack of information to make a clear determination.
   - Annotators should strive to provide clear and well-supported annotations for as many tweets as possible.
   - Whenever possible, annotators should seek additional context, or utilize external resources to aid in the classification process.

Consistency is key in maintaining high-quality annotations. Annotators should review and familiarize themselves with this guideline thoroughly before starting the annotation process.

#### 4. Evaluate your inter-annotator agreement using Fleiss' kappa

In [94]:
df = pd.read_csv("extracted_tweets2.csv")
df

|     | text | label |
|---:|---|---|
| 0 | agree. | non-offensive |
| 1 | I guess looking at interannotator metrics such... | non-offensive |
| 2 | Tried and tested 😂😂😂 | non-offensive |
| 3 | setiap lihat teman2ku baru main hades and they... | offensive |
| 4 | Free Britney | non-offensive |
| ... | ... | ... |
| 95 | He rather have used the cash to buy some cloth... | non-offensive |
| 96 | Come one come all into 6988 | non-offensive |
| 97 | Make up, dress up, like a princess \nfor him s... | offensive |
| 98 | ⠀⠀⠀"Tell me then, YOU BROKEDICK SON OF A BITCH... | offensive |


In [120]:
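df_annotators = df.drop(columns=["text"])
# binary encoding: 1 = offensive, 0 = anything else ("can't tell" is also mapped to 0)
df_annotators = df_annotators.applymap(lambda x: 1 if x == "offensive" else 0)
# label2 is a copy of label, standing in for a second annotator
df_annotators["label2"] = df_annotators["label"]
df_annotators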

|     | label | label2 |
|---:|---:|---:|
| 0 | 0 | 0 |
| 1 | 0 | 0 |
| 2 | 0 | 0 |
| 3 | 1 | 1 |
| 4 | 0 | 0 |
| ... | ... | ... |
| 95 | 0 | 0 |
| 96 | 0 | 0 |
| 97 | 1 | 1 |
| 98 | 1 | 1 |


In [122]:
vals, _ = ir.aggregate_raters(df_annotators.values)
ir.fleiss_kappa(vals, method='fleiss')

1.0

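We obtained a Fleiss' kappa of 1.0, which means perfect agreement between the two label columns. However, `label2` is a copy of `label` in the cell above, so this score is guaranteed by construction and does not measure how clear our guideline is. A meaningful kappa requires labels from several annotators who worked independently.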
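To see why duplicating a column forces the score to 1.0, here is Fleiss' kappa written out in NumPy, following its standard definition. Let $N$ be the number of tweets, $n$ the number of annotators, and $n_{ij}$ the number of annotators who put tweet $i$ in category $j$. The agreement on tweet $i$ is $P_i = \frac{\sum_j n_{ij}^2 - n}{n(n-1)}$, and $p_j = \frac{\sum_i n_{ij}}{N n}$ is the share of category $j$. Then $\kappa = \frac{\bar{P} - P_e}{1 - P_e}$, with $\bar{P}$ the mean of the $P_i$ and $P_e = \sum_j p_j^2$. Because `label2` is a copy of `label` in the cell above, every $P_i$ equals 1, and so does $\kappa$. The helpers below are hypothetical re-implementations, not the statsmodels code.

```python
import numpy as np


def ratings_to_counts(ratings: np.ndarray, n_categories: int) -> np.ndarray:
    """(tweets x annotators) matrix of category ids -> (tweets x categories) count table."""
    ratings = np.asarray(ratings)
    return np.stack([(ratings == j).sum(axis=1) for j in range(n_categories)], axis=1)


def fleiss_kappa(counts: np.ndarray) -> float:
    """Fleiss' kappa from a count table where every row sums to the number of annotators."""
    counts = np.asarray(counts, dtype=float)
    n_items = counts.shape[0]
    n_raters = counts[0].sum()
    # share of all annotations that fall in each category
    p_j = counts.sum(axis=0) / (n_items * n_raters)
    # observed agreement on each item
    p_i = (np.square(counts).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # agreement expected by chance (undefined if only one category is ever used)
    p_e = np.square(p_j).sum()
    return float((p_bar - p_e) / (1 - p_e))


# two annotators who always agree -> kappa = 1
same = np.array([[0, 0], [0, 0], [1, 1], [1, 1]])
# two annotators who agree on half the tweets, exactly as often as chance -> kappa = 0
half = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
print(fleiss_kappa(ratings_to_counts(same, 2)), fleiss_kappa(ratings_to_counts(half, 2)))
```

A meaningful estimate needs the labels of at least two independent annotators. It should ideally use three categories (non-offensive, offensive, can't tell) rather than collapsing "can't tell" into 0 as the encoding above does.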

#### 6. (Bonus) Evaluate the model on your data. Use a majority vote for labels (remove majority "can't tell") and compute the precision, recall, and F1-score.

In [None]:
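annotated_preds = roberta_pipeline(df["text"].tolist())  # predict on the annotated tweets, not on the 10K tweets
evaluate(annotated_preds, df_annotators["label"].tolist())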
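The cell above uses a single label column. With several independent annotators, the majority vote and the removal of "can't tell" majorities asked for in the question could look like the sketch below. It assumes that each tweet comes with a list of raw annotator labels and that the third class is spelled `can't tell`. Both are assumptions, since the CSV above only shows `offensive` and `non-offensive`. Ties are treated as having no majority and are dropped.

```python
from collections import Counter
from typing import List, Optional, Tuple

CANT_TELL = "can't tell"  # assumed spelling of the third class


def majority_vote(annotations: List[str]) -> Optional[str]:
    """Most frequent label, or None on a tie or when the majority is "can't tell"."""
    ranked = Counter(annotations).most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # tie: no clear majority
    if top_label == CANT_TELL:
        return None
    return top_label


def gold_and_predictions(annotations_per_tweet: List[List[str]],
                         predicted_labels: List[str]) -> Tuple[List[int], List[int]]:
    """Keep tweets with a usable majority and encode both sides as 1 = offensive, 0 = otherwise."""
    y_true, y_pred = [], []
    for annotations, predicted in zip(annotations_per_tweet, predicted_labels):
        gold = majority_vote(annotations)
        if gold is None:
            continue
        y_true.append(int(gold == "offensive"))
        y_pred.append(int(predicted == "offensive"))
    return y_true, y_pred


toy_annotations = [
    ["offensive", "offensive", "non-offensive"],
    [CANT_TELL, CANT_TELL, "offensive"],
    ["non-offensive", "non-offensive", "non-offensive"],
]
print(gold_and_predictions(toy_annotations, ["offensive", "offensive", "offensive"]))
```

The two returned lists can then be passed to `classification_report(y_true, y_pred)`, with `predicted_labels` taken from `[p['label'] for p in roberta_pipeline(df['text'].tolist())]`.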