# This notebook examines tweets about 'Drag Race' and 'Love Island' that an earlier step labeled for hate speech and sentiment. It loads a pre-trained RoBERTa hate-speech model with the `transformers` library and uses the `captum` library to explain and visualize the model's predictions with the Integrated Gradients method.

Importing the pandas library.

In [None]:
import pandas as pd

Installing the `transformers` library for working with pre-trained NLP models and the `captum` library for model interpretability.

In [None]:
!pip install transformers
!pip install captum

Collecting transformers
  Downloading transformers-4.32.0-py3-none-any.whl (7.5 MB)
Collecting huggingface-hub<1.0,>=0.15.1 (from transformers)
  Downloading huggingface_hub-0.16.4-py3-none-any.whl (268 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.3.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)

Reading a CSV file containing classified sentiment data for drag race tweets and storing it in a pandas DataFrame.

In [None]:
drag_tweets = pd.read_csv('Notebook_2_3_4_dragrace_tweets.csv')

Filtering the `drag_tweets` DataFrame to include only the tweets classified with the 'hate' label and storing the result in `drag_tweets_hate`.

In [None]:
drag_tweets_hate = drag_tweets[drag_tweets['Predicted_label'] == 'hate']
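
The filter above is a boolean mask applied to the DataFrame. The sketch below reproduces the pattern on a few invented rows; only the column names `Predicted_label` and `Classifier_score` come from the CSV used in this notebook. The extra confidence threshold is a hypothetical addition for illustration, not a step this notebook performs.

```python
import pandas as pd

# Toy stand-in for the tweets CSV; all rows are invented for illustration.
toy_tweets = pd.DataFrame({
    "Text": ["tweet a", "tweet b", "tweet c", "tweet d"],
    "Predicted_label": ["hate", "nothate", "hate", "nothate"],
    "Classifier_score": [0.97, 0.88, 0.55, 0.99],
})

# Boolean mask: True where the classifier assigned the 'hate' label.
is_hate = toy_tweets["Predicted_label"] == "hate"
toy_hate = toy_tweets[is_hate]

# Hypothetical extra step: keep only confident 'hate' predictions.
confident_hate = toy_tweets[is_hate & (toy_tweets["Classifier_score"] >= 0.9)]

print(len(toy_hate), len(confident_hate))
```

Note that the filtered frame keeps the index labels of the original rows, which matters when rows are later selected by position.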

Setting the pandas display option to show the full content of each column without truncation.

In [None]:
pd.set_option('display.max_colwidth', None)

Accessing and displaying the third row of the `drag_tweets_hate` DataFrame.

In [None]:
drag_tweets_hate.iloc[2]

Datetime                                                                    2023-06-01 18:33:00+00:00
Tweet Id                                                                          1664339280146079755
Text                   Oh look, it's the gay ass bitch Joey Jay! 😍🤩 #DragRace https://t.co/j9bodSfFFp
Username                                                                              StabilnoLabilno
Predicted_label                                                                                  hate
Classifier_score                                                                              0.96734
Predicted_sentiment                                                                          positive
Sentiment_scores                                                                             0.738002
Name: 33, dtype: object
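
The `Name: 33` line in the output is the row's index label carried over from `drag_tweets`, whereas `iloc[2]` selects by position within the filtered frame. A small sketch with invented rows (the frame `toy` is hypothetical) shows the difference between positional and label-based access:

```python
import pandas as pd

# Invented labels; only the column name matches the notebook's data.
toy = pd.DataFrame(
    {"Predicted_label": ["nothate", "hate", "nothate", "hate", "hate"]}
)

# Filtering keeps the original index labels 1, 3 and 4.
toy_hate = toy[toy["Predicted_label"] == "hate"]

third_by_position = toy_hate.iloc[2]  # third remaining row, by position
same_row_by_label = toy_hate.loc[4]   # the same row, by its original label

print(third_by_position.name)
```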

Reading a CSV file containing classified sentiment data for 'Love Island' tweets and storing it in a pandas DataFrame.

In [None]:
love_tweets = pd.read_csv('Notebook_2_loveisland_tweets.csv')

Filtering the `love_tweets` DataFrame to include only the tweets classified with the 'hate' label and storing the result in `love_tweets_hate`.

In [None]:
love_tweets_hate = love_tweets[love_tweets['Predicted_label'] == 'hate']

Displaying the first 100 rows of the `love_tweets_hate` DataFrame, which contains 'Love Island' tweets classified with the 'hate' label.

Importing the `torch` library, part of the PyTorch deep learning framework.

In [None]:
import torch

Setting up the device for PyTorch computations, using a CUDA-enabled GPU if available, or the CPU otherwise.

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

Importing necessary classes and loading a pre-trained RoBERTa model for hate speech detection along with the corresponding tokenizer.

In [None]:
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from captum.attr import IntegratedGradients

# Load the model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("facebook/roberta-hate-speech-dynabench-r4-target", return_dict=False)
tokenizer = AutoTokenizer.from_pretrained("facebook/roberta-hate-speech-dynabench-r4-target")

Downloading (…)lve/main/config.json:   0%|          | 0.00/816 [00:00<?, ?B/s]

Downloading model.safetensors:   0%|          | 0.00/499M [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/1.11k [00:00<?, ?B/s]

Downloading (…)olve/main/vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

Downloading (…)olve/main/merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

Defining a custom PyTorch module named `SurrogateModel` to wrap the existing RoBERTa model, allowing for custom handling of embeddings and logits. Creating an instance of this surrogate model.

In [None]:
class SurrogateModel(torch.nn.Module):
    def __init__(self, model):
        super(SurrogateModel, self).__init__()
        self.model = model

    def forward(self, emb):
        # Use only the logits output
        return self.model(inputs_embeds=emb)[0]

surrogate_model = SurrogateModel(model)
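
The wrapper exists because gradient-based attribution needs a forward function that takes one differentiable tensor (the embeddings) and returns a logits tensor. The sketch below shows the same pattern on a tiny stand-in network so it runs without downloading RoBERTa. `ToyClassifier` and `EmbeddingsToLogits` are hypothetical names, and the toy model only mimics the `inputs_embeds` keyword and the tuple output used with `return_dict=False`.

```python
import torch


class ToyClassifier(torch.nn.Module):
    """Hypothetical stand-in for the RoBERTa model: accepts `inputs_embeds`
    and returns a tuple whose first element is the logits."""

    def __init__(self, hidden_size=8, num_labels=2):
        super().__init__()
        self.classifier = torch.nn.Linear(hidden_size, num_labels)

    def forward(self, input_ids=None, inputs_embeds=None):
        pooled = inputs_embeds.mean(dim=1)   # (batch, hidden)
        return (self.classifier(pooled),)    # tuple, as with return_dict=False


class EmbeddingsToLogits(torch.nn.Module):
    """Same pattern as SurrogateModel: one tensor in, logits tensor out."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, emb):
        return self.model(inputs_embeds=emb)[0]


torch.manual_seed(0)
wrapped = EmbeddingsToLogits(ToyClassifier())
emb = torch.randn(1, 5, 8, requires_grad=True)  # (batch, tokens, hidden)
logits = wrapped(emb)

# Gradients reach the embeddings, which is what attribution methods rely on.
logits[0, 1].backward()
print(logits.shape, emb.grad.shape)
```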

Accessing the `id2label` attribute of the model's configuration to obtain a mapping from numeric class IDs to human-readable labels.

In [None]:
model.config.id2label

{0: 'nothate', 1: 'hate'}
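
As a minimal sketch of how this mapping is used, the snippet below turns a pair of logits into a probability and a readable label. The logit values are invented for illustration, and `softmax` is a small helper written here rather than a library call.

```python
import math

# Mapping as printed above by model.config.id2label.
id2label = {0: "nothate", 1: "hate"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Invented logits for one tweet, in class-id order.
example_logits = [-0.3, 0.3]
probs = softmax(example_logits)

# The predicted class is the index with the highest probability.
pred_id = max(range(len(probs)), key=probs.__getitem__)

print(id2label[pred_id], round(probs[pred_id], 2))
```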

Importing specific modules from the `captum` library related to attribution and visualization.

In [None]:
import captum.attr as attr
from captum.attr import visualization as viz

Commented code demonstrating how to use Integrated Gradients to explain a prediction made by the RoBERTa model for a given sentence.

In [None]:
# Create an instance of the IntegratedGradients class
ig = IntegratedGradients(surrogate_model)

# Define the sentence
sentence = "to all the boys this season (except ron) i’m so sorry for how all these psycho women treated you #loveisland #LoveIslandUK"

# Encode the sentence; special tokens (<s>, </s>) are added around it
inputs = tokenizer.encode_plus(sentence, return_tensors='pt', add_special_tokens=True)

# Get the input ids tensor
input_ids = inputs['input_ids']

# Get the word embeddings only: the model adds position embeddings and
# LayerNorm itself when it receives them through inputs_embeds
embeddings = model.roberta.embeddings.word_embeddings(input_ids)

# Run the model forward
model.zero_grad()
outputs = model(input_ids)

# Get the prediction
prediction = torch.argmax(outputs[0])

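# Calculate attributions with Integrated Gradients and sum over the embedding dimension (one score per token)
attributions = ig.attribute(inputs=embeddings, target=prediction)
attributions_sum = attributions.sum(dim=-1).squeeze(0)

# Recover the tokens from input_ids so that <s> and </s> are included and the
# list lines up with attributions_sum; tokenizer.tokenize(sentence) omits them
tokens = tokenizer.convert_ids_to_tokens(input_ids[0])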

# Print the attributions for each token
for token, attribution in zip(tokens, attributions_sum):
    print(f"{token}: {attribution.item()}")


to: 0.005703249442299111
Ġall: -0.15845322437345738
Ġthe: 0.09488827653006504
Ġboys: -0.0008455397996363252
Ġthis: 0.0271462471793234
Ġseason: 0.03720827359162518
Ġ(: -0.12310148483287017
except: -0.10783116708646277
Ġr: -0.038939239103849985
on: -0.06033520583942734
): -0.10870488822738561
Ġi: -0.13659175415892905
âĢ: -0.1274873801250756
Ļ: 0.027803399406990267
m: 0.005201486971912896
Ġso: 0.1285839129377941
Ġsorry: 0.12341028821802044
Ġfor: 0.07363307059522295
Ġhow: -0.10237083257680424
Ġall: 0.04642534076308989
Ġthese: -0.09638162132632189
Ġpsycho: 0.5028284913459755
Ġwomen: 0.6220171720909309
Ġtreated: 0.180092419114054
Ġyou: -0.025114818699469257
Ġ#: 0.22629191290052836
love: -0.369499028485138
is: -0.2549108534346244
land: -0.05151981471579359
Ġ#: 0.01851559084126783
Love: 0.1918040002533875
Is: 0.046408015689690385
land: 0.023480527697325074
UK: 0.015318906354987123
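
Integrated Gradients attributes a prediction by averaging the model's gradient along a straight path from a baseline input to the actual input, then scaling by the difference between the two. Because the cell above passes no `baselines`, Captum uses an all-zero baseline. The sketch below implements the idea from scratch on a toy logistic model, not on RoBERTa: `model_fn`, `model_grad`, the weights, and the inputs are all invented, and the midpoint Riemann sum is a simplification rather than Captum's own integration scheme. It also checks the completeness property: the attributions sum to the difference in model output between input and baseline.

```python
import math


def model_fn(x, weights, bias):
    """Toy differentiable model: logistic-regression probability."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def model_grad(x, weights, bias):
    """Analytic gradient of model_fn with respect to x."""
    p = model_fn(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]


def integrated_gradients(x, baseline, weights, bias, n_steps=200):
    """Midpoint Riemann-sum approximation of Integrated Gradients."""
    avg_grad = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        # Point on the straight line from the baseline to the input.
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        grad = model_grad(point, weights, bias)
        avg_grad = [a + g / n_steps for a, g in zip(avg_grad, grad)]
    # Scale the averaged gradient by the input-baseline difference.
    return [(xi - b) * a for xi, b, a in zip(x, baseline, avg_grad)]


weights, bias = [2.0, -1.0, 0.5], 0.1
x, baseline = [1.0, 2.0, -1.0], [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, weights, bias)
delta = model_fn(x, weights, bias) - model_fn(baseline, weights, bias)

print(attributions)
print(sum(attributions), delta)
```

A positive attribution means the feature pushed the output up relative to the baseline, and a negative one means it pushed the output down. The same reading applies to the per-token scores printed by the notebook.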


Commented code demonstrating how to prepare and visualize the explanation of a prediction made by the RoBERTa model using Integrated Gradients.

In [None]:
import captum.attr as attr
from captum.attr import visualization as viz

# Get prediction
pred_class = torch.max(outputs[0], 1)[1].item()
pred_prob = torch.max(torch.softmax(outputs[0], dim=1)).item()

# Prepare data for VisualizationDataRecord
tokens = tokenizer.convert_ids_to_tokens(input_ids[0]) # convert input_ids to tokens
attributions_sum = attributions_sum.detach().numpy()

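# Make VisualizationDataRecord (positional arguments, in Captum's order)
vis_data_record = viz.VisualizationDataRecord(
                        attributions_sum,        # word_attributions
                        pred_prob,               # pred_prob
                        pred_class,              # pred_class
                        pred_class,              # true_class: no gold label here, so the prediction is reused
                        str(pred_class),         # attr_class
                        attributions_sum.sum(),  # attr_score
                        tokens,                  # raw_input_ids (token strings to display)
                        None)                    # convergence_score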

# Make the records (only one record here)
vis_data_records = [vis_data_record]

# Visualize the explanation
viz.visualize_text(vis_data_records)



| True Label | Predicted Label | Attribution Label | Attribution Score | Word Importance |
|---|---|---|---|---|
| 1.0 | 1 (0.65) | 1.0 | 0.33 | #s to Ġall Ġthe Ġboys Ġthis Ġseason Ġ( except Ġr on ) Ġi âĢ Ļ m Ġso Ġsorry Ġfor Ġhow Ġall Ġthese Ġpsycho Ġwomen Ġtreated Ġyou Ġ# love is land Ġ# Love Is land UK #/s |
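
The word-importance row lists byte-level BPE pieces, where `Ġ` marks a token that was preceded by a space, so hashtags such as `#loveisland` are spread over several tokens. One way to read the result at word level is to merge the pieces and add up their scores. The sketch below does that. `merge_subword_attributions` is a hypothetical helper, summing is only one possible aggregation, and the scores are invented placeholders rather than the notebook's values.

```python
def merge_subword_attributions(tokens, scores, word_start="Ġ"):
    """Group byte-level BPE tokens into words and sum their scores.

    A token starting with `word_start` opens a new word; any other token
    continues the current word.
    """
    words, totals = [], []
    for token, score in zip(tokens, scores):
        if token.startswith(word_start) or not words:
            words.append(token.lstrip(word_start))
            totals.append(score)
        else:
            words[-1] += token
            totals[-1] += score
    return list(zip(words, totals))


# Tokens taken from the output above; the scores are invented placeholders.
tokens = ["Ġ#", "love", "is", "land", "Ġ#", "Love", "Is", "land", "UK"]
scores = [0.2, -0.3, -0.2, -0.1, 0.0, 0.2, 0.1, 0.0, 0.1]

merged = merge_subword_attributions(tokens, scores)
print(merged)
```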


