# Introduction to Natural Language Processing 2 Lab08

**This lab is mainly about data and model analysis. There is very little code. Make sure you send back a proper report with your code, guideline, annotated sheets, and theoretical answers.**


---

## Introduction (1 point)

Your company wants to sell a moderation API tackling toxic content on Twitter. They ask you to come up with a model which detects toxic tweets. You remember your NLP classes, start looking for existing models or datasets, and find a collection of [academic Twitter datasets on the HuggingFace hub](https://huggingface.co/datasets/tweet_eval). In particular, the `hate` and `offensive` datasets seem close to what you are looking for.

1. (1 point) Pick one of the datasets between `hate` and `offensive`, and justify your choice. Remember that it is for a commercial application.

We chose the `hate` subset because it contains many insults and comments that a commercial moderation API must catch. Comments labeled as offensive are often much more acceptable and less toxic than messages labeled as hateful.

## Evaluating the dataset (5 points)

Before using the data to train a model, you have the right reflex and start with a data analysis.

1. (1 point) Describe the dataset. Look at the splits, proportion of classes, and see what you can figure out by just looking at the text.

In [29]:
%%capture
! pip install transformers
! pip install datasets
! pip install bertopic

In [96]:
import pandas as pd
import numpy as np

from datasets import load_dataset, Dataset

from transformers import pipeline, AutoTokenizer, TFAutoModelForSequenceClassification, AutoModelForSequenceClassification
from bertopic import BERTopic

from sklearn.metrics import precision_recall_fscore_support

from umap import UMAP

SEED = 42

umap_model = UMAP(random_state=SEED)

In [36]:
dataset = load_dataset("tweet_eval", "hate")
splits = ["train", "validation", "test"]

for split in splits:

  # Label 1 = hateful tweet, label 0 = non-hateful tweet
  hateful_rate = len(dataset[split].filter(lambda x: x["label"] == 1)) / len(dataset[split])

  print("Split:", split, ", proportion of hateful tweet:", hateful_rate)

print("\n" + "Hateful tweet exemples:")
print(np.array(dataset["train"].filter(lambda x: x["label"] == 1)["text"][:10]))

print("\n" + "Non-hateful tweets exemples:")
print(np.array(dataset["train"].filter(lambda x: x["label"] == 0)["text"][:10]))






Split: train , proportion of hateful comment: 0.42033333333333334
Split: validation , proportion of hateful comment: 0.427
Split: test , proportion of hateful comment: 0.42154882154882156

Hateful comment exemples:
['A woman who you fucked multiple times saying yo dick small is a compliment you know u hit that spot 😎'
 '@user @user real talk do you have eyes or were they gouged out by a rapefugee?'
 'your girlfriend lookin at me like a groupie in this bitch!'
 "I AM NOT GOING AFTER YOUR EX BF YOU LIEING SACK OF SHIT ! I'm done with you dude that's why I dumped your ass cause your a lieing 😂😡 bitch"
 'Send home migrants not in need of protection, Peter Dutton tells UN, HEY DUTTON HOW ABOUT THE ONES THAT HAVE STAYED AND NOT LEFT THE COUNTRY WHEN THEY SHOULD OVERSTAYERS ? WHY DONT YOU GO AND ROUND ALL THEM UP ?'
 'Cory Booker and Kamala Harris competing for Most Hysterical Woman at the Kavanaugh hearings, Coulter hilariously tweeted.And yes, liberals immediately got triggered on Twitter, 

The class proportions are almost identical across the three splits (about 42% hateful tweets), so the splits appear to be stratified.

Judging from the examples, the hateful tweets contain many insults. These tweets should definitely be removed by the moderation API.

2. (3 points) Use [BERTopic](https://github.com/MaartenGr/BERTopic) to extract the topics within the data, and the main topics within each class. Please, think about [fixing the random seed](https://stackoverflow.com/questions/71320201/how-to-fix-random-seed-for-bertopic).
  * A [good model](https://github.com/MaartenGr/BERTopic#embedding-models) for sentence similarity is `all-MiniLM-L6-v2`, as it is [fast, light, and pretty accurate](https://www.sbert.net/docs/pretrained_models.html). You can use another one, but make sure to document your choice.

In [80]:
def get_main_topics(data: list) -> np.ndarray:
  '''
    Returns the main word of each topic extracted from the data.

    Parameters:
        data (list): List of strings from which topics are extracted.

    Returns:
        main_topics (numpy.ndarray): Highest-scoring word of each extracted topic.
  '''

  model = BERTopic(embedding_model="all-MiniLM-L6-v2", umap_model=umap_model)
  model.fit_transform(data)
  # get_topics() maps each topic id to a list of (word, score) pairs
  topics = model.get_topics()
  # Keep the word with the highest score in each topic
  main_topics = [max(topic, key=lambda word: word[1])[0] for topic in topics.values()]
  return np.array(main_topics)

topics = get_main_topics(dataset["train"].filter(lambda x: x["label"] == 0)["text"])
hate_topics = get_main_topics(dataset["train"].filter(lambda x: x["label"] == 1)["text"])

print("Hateful tweets topics:")
print(hate_topics)

print("Non-hateful tweets topics:")
print(topics)

Hateful tweets topics:
['you' 'user' 'illegal']
Non-hateful tweets topics:
['you' 'the' 'rape' 'men' 'whore' 'when' 'user' 'cunt' 'hysterical' 'me'
 'my' 'cunt' 'skank' 'hysterical' 'ram' 'when' 'drunk' 'kunt' 'skank'
 'hoe' 'ram' 'yesallmen' 'maledominance' 'nelly' 'talent' 'rape' 'me'
 'weekend' 'burundi' 'trash' 'unfollowing' 'blog' 'men' 'hole' 'cans' 'he'
 'flirting' 'sacred' 'count' 'life']


3. (1 point) What do you think about the results? How do you think it could impact a model trained on these data?


Surprisingly, there are far more topics in the tweets labeled as non-hateful, and some of these topics (such as "rape", "whore", "cunt") seem more toxic than the topics of the tweets labeled as hateful. This could hurt a model trained on these data: it may learn that such vocabulary belongs to the non-hateful class and let toxic tweets through.

4. **Bonus** By default, BERTopic extracts single keywords. Play with the model to extract bigrams or more. See if you can go deeper in your analysis.
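
For the bonus question, one possible approach (a sketch that was not run for this report) is to pass a scikit-learn `CountVectorizer` with `ngram_range=(1, 2)` as BERTopic's `vectorizer_model`, so that topic keywords may be bigrams. The function name `make_ngram_topic_model` and the English stop-word removal are assumptions made for illustration. Removing stop words might also avoid uninformative topic words such as "you" or "the".

```python
def make_ngram_topic_model(seed: int = 42, max_ngram: int = 2):
    """Build a BERTopic model whose topic keywords can span 1..max_ngram words."""
    # Imported lazily because these packages are heavy to load.
    from bertopic import BERTopic
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    # The vectorizer only changes how topics are described, not how tweets are clustered.
    vectorizer = CountVectorizer(ngram_range=(1, max_ngram), stop_words="english")
    return BERTopic(
        embedding_model="all-MiniLM-L6-v2",
        umap_model=UMAP(random_state=seed),  # fixed seed for reproducible topics
        vectorizer_model=vectorizer,
    )


# Usage (assumes `dataset` is loaded as in the cells above):
# hate_texts = dataset["train"].filter(lambda x: x["label"] == 1)["text"]
# topic_model = make_ngram_topic_model()
# topic_model.fit_transform(hate_texts)
# print(topic_model.get_topic_info().head(10))
```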

## Evaluate a model (5 points)

You were thinking about fine-tuning a [RoBERTa](https://arxiv.org/abs/1907.11692) model on the dataset, but RoBERTa was trained on 2019 data, which does not include any tweets. Moreover, pretraining a model from scratch can be costly. Fortunately, a [reliable entity](https://github.com/cardiffnlp) pretrained RoBERTa on tweets and even fine-tuned it on both datasets [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-offensive?text=I+like+you.+I+love+you) and [here](https://huggingface.co/cardiffnlp/twitter-roberta-base-hate?text=I+like+you.+I+love+you).

1. (2 points) Evaluate their model on the test split of the dataset you picked, using precision, recall, and F1-score.

In [93]:
checkpoint = "cardiffnlp/twitter-roberta-base-hate"

model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

classifier = pipeline('text-classification', model=model, tokenizer=tokenizer)


predictions = classifier.predict(dataset["test"]["text"])
predictions[:5]

[{'label': 'LABEL_1', 'score': 0.8879395127296448},
 {'label': 'LABEL_0', 'score': 0.726382851600647},
 {'label': 'LABEL_1', 'score': 0.7972471117973328},
 {'label': 'LABEL_1', 'score': 0.873728334903717},
 {'label': 'LABEL_1', 'score': 0.7790350317955017}]

In [95]:
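# Each prediction is a dict such as {'label': 'LABEL_1', 'score': 0.88}, so compare its "label" field
predicted_labels = [0 if pred["label"] == "LABEL_0" else 1 for pred in predictions]
# average='binary' reports the scores of the hateful class (label 1)
precision, recall, fscore, _ = precision_recall_fscore_support(dataset["test"]["label"], predicted_labels, average='binary')
print("Precision:", precision)
print("Recall:", recall)
print("F1-score:", fscore)

Precision: 0.42154882154882156
Recall: 0.42154882154882156
F1-score: 0.42154882154882156

These three identical values match the proportion of hateful tweets in the test split (0.4215). They were obtained with the comparison `pred == "LABEL_0"`, which compares a whole prediction dict to a string and is always false, so every tweet was counted as hateful and the micro-averaged scores collapsed to accuracy. They do not measure the model's actual performance; the cell has to be re-run to obtain the real scores.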
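
As a reminder of what these metrics measure, here is a minimal sketch that computes precision, recall, and F1 for the target class from raw counts. The helper `binary_prf` and the toy labels are made up for illustration. It also shows the behaviour of a degenerate classifier that flags every tweet: recall is perfect, while precision and accuracy both fall to the share of hateful tweets.

```python
def binary_prf(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the `positive` class, computed from raw counts."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels (made up): 2 hateful tweets out of 5.
y_true = [1, 0, 1, 0, 0]
always_hate = [1, 1, 1, 1, 1]  # degenerate classifier that flags everything

precision, recall, f1 = binary_prf(y_true, always_hate)
accuracy = sum(t == p for t, p in zip(y_true, always_hate)) / len(y_true)
print("precision:", precision, "recall:", recall, "f1:", round(f1, 3), "accuracy:", accuracy)
```

Reporting the scores of the target class (or a macro average) makes this kind of failure visible, whereas micro-averaged precision, recall, and F1 all equal accuracy in single-label classification.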


To see how the model would fare on production data, you have 10K English tweets and replies available in the tweets.json file (taken from the [internet archive](https://archive.org/details/archiveteam-twitter-stream-2021-07)). Note that the language was filtered using the Twitter API, so there might still be tweets in languages other than English. The JSON fields were trimmed to the minimum and the text was already preprocessed to mask user handles and URLs, like the tweets in your dataset.

2. (3 points) Extract the top 50 tweets your model is most confident about in the target class (offensive or hateful), the top 50 in the neutral class, and the top 50 your model is most uncertain about. Do you believe the model is doing a great job? For at least 2 tweets the model wrongly classified in your target class, try explaining what could have gone wrong.

In [98]:
jsondata = pd.read_json('tweets.json')
json_dataset = Dataset.from_pandas(jsondata)
predictions = classifier.predict(json_dataset["text"])

In [132]:
texts = np.array(json_dataset["text"])
# "score" is the probability of the predicted label, so with two classes the other label has 1 - score
scores = np.array([pred["score"] for pred in predictions])
is_hate = np.array([pred["label"] == "LABEL_1" for pred in predictions])
hate_prob = np.where(is_hate, scores, 1 - scores)

def top_k_texts(values, k=50):
  # Texts of the k largest values, kept in their original order
  return texts[np.sort(np.argsort(-values)[:k])]

most_confident_preds = top_k_texts(1 - hate_prob)   # highest P(neutral)
hate_most_confident_preds = top_k_texts(hate_prob)  # highest P(hate)
least_confident_preds = top_k_texts(-scores)        # lowest score of the predicted label, i.e. closest to 0.5

print("Top 50 most confident 'neutral' predictions:")
for pred in most_confident_preds:
  print("- ", pred)

print("\n============================================================\n")

print("Top 50 most confident 'hate' predictions:")
for pred in hate_most_confident_preds:
  print("- ", pred)

print("\n============================================================\n")

print("Top 50 least confident predictions:")
for pred in least_confident_preds:
  print("- ", pred)

Top 50 most confident 'neutral' predictions:
-  Just wondering why students on student allowance weren't judged worthy of receiving additional financial support today along with all other beneficiaries @user @user @user
-  Now this is what you call ‘community’ 😁👇🏽          King's Lynn: The community shop with a chair for those who 'are not OK' http
-  Gilmour Space raises $46 million http
-  http Ten Representatives Join Coalition Supporting Local Radio Freedom Act
-  Studios and Hollywood Unions Extend Talks on Return to Work Rules http
-  Tears grips my heart knowing how hard I’ve struggled to find a legit trader.I never wanted to do this but I’ll be guilty if I don’t share this good news to people @user is the best trader so far Thanks for helping me and i will forever be grateful to you
@user
-  EXCLUSIVE: New Liberals lawyers to take on case against Christian Porter http @user
-  Heimlich Maneuver on Yourself When You're Alone and Choking | ProCPR http
-  Just posted a video @ New

3. **Bonus** Use [SHAP](https://github.com/slundberg/shap/tree/45b85c1837283fdaeed7440ec6365a886af4a333#natural-language-example-transformers) on the provided tweets, or manually written texts, to see if you can find topics on which the model is biased.
4. **Bonus** Train a naive Bayes model on the data, and compare its results with this model.
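
For the naive Bayes bonus, the sketch below shows the mechanics of a multinomial naive Bayes classifier with add-one smoothing, written with the standard library only so that it runs anywhere. The class `NaiveBayes` and the toy corpus are made up for illustration; in practice, scikit-learn's `MultinomialNB` combined with a `CountVectorizer` would be the usual choice, trained on the train split and evaluated on the test split with the same metrics as above.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior of the class
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up toy corpus, only to show the mechanics (1 = target class, 0 = neutral).
train_texts = [
    "you are awful trash",
    "awful people go home",
    "lovely day today",
    "see you at lunch today",
]
train_labels = [1, 1, 0, 0]

nb_model = NaiveBayes().fit(train_texts, train_labels)
print(nb_model.predict("awful trash"), nb_model.predict("lunch today"))
```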

## Annotate data (7 points)

Regardless of the model's performance, you decide to annotate your own collection of tweets.

1. (1 point) Extract about 100 tweets containing at least 20% of your target class (offensive/hateful), from the 10K tweets provided. You can use the pretrained model to help you find tweets in the target class.
2. (3 points) Altogether, write down an annotation guideline (which should be at least 2/3 of a page long).
    * What does the target class look like?
    * Any examples you could provide for ambiguous cases?
    * Keep a "Can't tell / not annotable" class. Make sure you document what this class means in your guideline.
3. (1 point) Every person in your group is going to annotate these tweets separately. So if you are 4, annotate them 4 times.
    * Typically, create a Google sheet or an Excel document, one tab per person, in each tab one column for the text, and another for the class.
4. (2 points) Evaluate your inter-annotator agreement using Cohen Kappa (if you are 2) or Fleiss Kappa.
    * If, like your teacher, you have issues making the [NLTK implementation](https://www.nltk.org/_modules/nltk/metrics/agreement.html) work on the latest version of Python (3.10+), you can use the [scikit-learn implementation](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cohen_kappa_score.html) of Cohen Kappa, and compute a matrix by pair of annotators.
    * What does the score mean? Are you doing a good job annotating the data and, if not, why?
5. **Bonus** Iterate on your annotation guideline with what you learned. Please send both versions in your report.
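
The extraction step (question 1) could be sketched as follows, assuming each tweet comes with the model's probability for the target class. The function `sample_for_annotation`, its parameters, and the synthetic inputs are hypothetical. Since the model's predictions are not ground truth, the 20% only holds for the predicted class; the real proportion is known once the tweets are annotated.

```python
import random


def sample_for_annotation(texts, hate_probs, n_total=100, n_target=20,
                          threshold=0.5, seed=42):
    """Pick n_total texts, n_target of which the model assigns to the target class."""
    rng = random.Random(seed)
    flagged = [t for t, p in zip(texts, hate_probs) if p >= threshold]
    others = [t for t, p in zip(texts, hate_probs) if p < threshold]
    n_flagged = min(n_target, len(flagged))
    n_others = min(n_total - n_flagged, len(others))
    sample = rng.sample(flagged, n_flagged) + rng.sample(others, n_others)
    # Shuffle so that annotators cannot guess the model's prediction from the position.
    rng.shuffle(sample)
    return sample


# Synthetic stand-in for the 10K tweets: every tenth tweet is "flagged" by the model.
texts = [f"tweet {i}" for i in range(1000)]
hate_probs = [0.9 if i % 10 == 0 else 0.1 for i in range(1000)]

sample = sample_for_annotation(texts, hate_probs)
print(len(sample), "tweets selected for annotation")
```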

Please provide the annotation sheets, the guideline, and the inter-annotator agreement in your report.
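
To illustrate the agreement computation, here is a minimal sketch of Cohen's kappa computed for every pair of annotators, using only the standard library. The annotations are made up, and the label encoding (1 = hateful, 0 = not hateful, -1 = can't tell) is an assumption. Kappa compares the observed agreement with the agreement expected by chance: 1 means perfect agreement, and 0 means no better than chance.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two annotators who labeled the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # convention: both annotators used one identical label everywhere
    return (observed - expected) / (1 - expected)


# Made-up annotations (1 = hateful, 0 = not hateful, -1 = can't tell).
annotations = {
    "annotator_1": [1, 0, 0, 1, 0, -1, 0, 1],
    "annotator_2": [1, 0, 0, 0, 0, -1, 0, 1],
    "annotator_3": [1, 0, 1, 1, 0, 0, 0, 1],
}

for (name_a, a), (name_b, b) in combinations(annotations.items(), 2):
    print(f"{name_a} vs {name_b}: kappa = {cohen_kappa(a, b):.3f}")
```

With more than two annotators, the pairwise values can be arranged in a matrix, which also helps spot an annotator who systematically disagrees with the others.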
