# This notebook analyzes tweets about 'Drag Race'. It uses a pre-trained RoBERTa hate-speech classifier from the `transformers` library, then applies the `captum` library to identify which token in each tweet contributes most to the 'hate' prediction. It ends with a frequency analysis of these top-contributing tokens.

Installing the `transformers` library for working with pre-trained NLP models and the `captum` library for model interpretability.

In [None]:
!pip install transformers
!pip install captum



Importing `pandas` for data manipulation and analysis, and PyTorch for tensor computation.

In [None]:
import pandas as pd
import torch

Reading the CSV file `Notebook_2_3_4_dragrace_tweets.csv`, which holds the tweets together with their predicted hate-speech labels and sentiment, and storing its content in a pandas DataFrame named `drag_df`.

In [None]:
drag_df = pd.read_csv('Notebook_2_3_4_dragrace_tweets.csv')

Displaying the `drag_df` DataFrame containing classified sentiment data for drag race tweets.

In [None]:
drag_df

Unnamed: 0,Datetime,Tweet Id,Text,Username,Predicted_label,Classifier_score,Predicted_sentiment,Sentiment_scores
0,2023-06-01 23:59:13+00:00,1664421373329166336,Y’all already spoiled the challenge winner and...,RyneMarshall,nothate,0.904485,positive,0.542502
1,2023-06-01 23:09:25+00:00,1664408839276359681,Almost time for Sasha Colby! #DragRace #PrideM...,StarianBlake,nothate,0.939533,positive,0.926387
2,2023-06-01 23:05:00+00:00,1664407728297754624,https://t.co/yxRjGfzup5……\nCollectible Hardcov...,MarkColeAuthor,nothate,0.990348,neutral,0.724652
3,2023-06-01 22:45:47+00:00,1664402894358020096,This week's The Fame Games is a true horror sh...,amoraappetit,nothate,0.926366,negative,0.595444
4,2023-06-01 22:38:18+00:00,1664401010469609473,"(for those interested, this is ""And Don't F&am...",ReinaDeLaIsla,nothate,0.983931,positive,0.857734
...,...,...,...,...,...,...,...,...
4995,2023-05-13 04:05:53+00:00,1657235692634664960,all stars 8 episode 1 reaction thread! spoiler...,adorejantrixsy,nothate,0.998698,neutral,0.745488
4996,2023-05-13 04:02:51+00:00,1657234928512258052,naysha is hot in and out of drag yet it doesn’...,tux_masque,nothate,0.570454,neutral,0.484673
4997,2023-05-13 04:01:30+00:00,1657234587049922561,How Jinkx Monsoon Won #rupaulsdragrace season ...,The_DudeandI,nothate,0.994871,neutral,0.613986
4998,2023-05-13 04:00:40+00:00,1657234379461267457,Someone never likes the choreography #allstars...,CheetahMuneca,hate,0.512131,negative,0.847118


Filtering the `drag_df` DataFrame to include only the tweets classified with the 'hate' label and storing the result in `drag_tweets_hate`.

In [None]:
drag_tweets_hate = drag_df[drag_df['Predicted_label'] == 'hate']

Extracting the text of tweets classified with the 'hate' label from the `drag_tweets_hate` DataFrame and storing it in a list named `drag_sentences`.

In [None]:
drag_sentences = drag_tweets_hate['Text'].to_list()

Setting up the device for PyTorch computations, using a CUDA-enabled GPU if available, or the CPU otherwise.

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

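Importing the RoBERTa classes from `transformers`, loading the `facebook/roberta-hate-speech-dynabench-r4-target` model and its tokenizer, switching the model to evaluation mode, and moving it to the selected device.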

In [None]:
from transformers import RobertaTokenizer, RobertaForSequenceClassification

# Load the model and tokenizer
model = RobertaForSequenceClassification.from_pretrained("facebook/roberta-hate-speech-dynabench-r4-target")
tokenizer = RobertaTokenizer.from_pretrained("facebook/roberta-hate-speech-dynabench-r4-target")
model.eval()
model.to(device)

RobertaForSequenceClassification(
  (roberta): RobertaModel(
    (embeddings): RobertaEmbeddings(
      (word_embeddings): Embedding(50265, 768, padding_idx=1)
      (position_embeddings): Embedding(514, 768, padding_idx=1)
      (token_type_embeddings): Embedding(1, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): RobertaEncoder(
      (layer): ModuleList(
        (0-11): 12 x RobertaLayer(
          (attention): RobertaAttention(
            (self): RobertaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): RobertaSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
             

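Wrapping the model in a `SurrogateModel` that takes input embeddings instead of token ids and returns only the logits. Integrated Gradients needs a continuous input to differentiate with respect to, and token ids are discrete.

In [None]: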
class SurrogateModel(torch.nn.Module):
    def __init__(self, model):
        super(SurrogateModel, self).__init__()
        self.model = model

    def forward(self, emb):
        # Use only the logits output
        return self.model(inputs_embeds=emb)[0]

surrogate_model = SurrogateModel(model)

surrogate_model.to(device)

SurrogateModel(
  (model): RobertaForSequenceClassification(
    (roberta): RobertaModel(
      (embeddings): RobertaEmbeddings(
        (word_embeddings): Embedding(50265, 768, padding_idx=1)
        (position_embeddings): Embedding(514, 768, padding_idx=1)
        (token_type_embeddings): Embedding(1, 768)
        (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
      (encoder): RobertaEncoder(
        (layer): ModuleList(
          (0-11): 12 x RobertaLayer(
            (attention): RobertaAttention(
              (self): RobertaSelfAttention(
                (query): Linear(in_features=768, out_features=768, bias=True)
                (key): Linear(in_features=768, out_features=768, bias=True)
                (value): Linear(in_features=768, out_features=768, bias=True)
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (output): RobertaSelfOutput(
                (dense):

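Using Integrated Gradients to compute an attribution for each token in the tweets from `drag_sentences`, with the model's predicted class as the target. For each tweet, the token with the largest attribution is stored in the `max_contrib_words` list. The list is then printed, followed by each token's share of the total. In the output, `Ġ` marks a token that begins with a space, and `</s>` is RoBERTa's end-of-sequence token.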
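As a rough illustration of what Integrated Gradients computes, the sketch below applies it to a small hand-written function instead of RoBERTa. Everything in it (`integrated_gradients`, `f`, `grad_f`, the all-zero baseline, the midpoint Riemann sum) is a hypothetical stand-in chosen for illustration. It is not how `captum` implements the method.

```python
def integrated_gradients(grad_fn, x, baseline, n_steps=50):
    """Approximate IG along the straight path from baseline to x (midpoint Riemann sum)."""
    grad_sums = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps  # midpoint of the k-th interval in [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            grad_sums[i] += g
    # Scale the average gradient by each input's distance from the baseline
    return [(xi - b) * s / n_steps for xi, b, s in zip(x, baseline, grad_sums)]


# Toy "model" (assumption, for illustration only): f(x) = 3*x0^2 + 2*x1
def f(x):
    return 3 * x[0] ** 2 + 2 * x[1]


def grad_f(x):
    # Analytic gradient of f
    return [6 * x[0], 2.0]


x = [1.0, 2.0]
baseline = [0.0, 0.0]
attr = integrated_gradients(grad_f, x, baseline)

print(attr)                          # one attribution per input feature
print(sum(attr), f(x) - f(baseline)) # the two values should agree
```

The attributions sum to `f(x) - f(baseline)`, which is the completeness property of Integrated Gradients. The notebook cell that follows applies the same idea to RoBERTa's input embeddings through `captum`, which uses an all-zero baseline when none is passed.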

In [None]:
from collections import Counter
from tqdm import tqdm
from captum.attr import IntegratedGradients


# Create an instance of the IntegratedGradients class
ig = IntegratedGradients(surrogate_model)  # Ensure 'surrogate_model' is defined

# Initialize the list to store maximum contributing words
max_contrib_words = []

# Iterate through each sentence, wrapping the iterable with tqdm
for sentence in tqdm(drag_sentences, desc="Processing sentences"):

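    # Encode the sentence to get the input ids (special tokens included)
    inputs = tokenizer.encode_plus(sentence, return_tensors='pt', add_special_tokens=True, truncation=True)

    # Move the input ids to the device
    input_ids = inputs['input_ids'].to(device)

    # Get the word embeddings only: when called with inputs_embeds, the model
    # adds position embeddings and applies LayerNorm itself, so passing the
    # full output of model.roberta.embeddings would apply them twice
    embeddings = model.roberta.embeddings.word_embeddings(input_ids)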

    # Run the model forward to get the predicted class (no gradients needed here)
    with torch.no_grad():
        outputs = model(input_ids)

    # Get the predicted class index
    prediction = torch.argmax(outputs[0], dim=-1).item()

    # Calculate attributions with Integrated Gradients, then sum over the
    # embedding dimension to get one score per token
    attributions = ig.attribute(inputs=embeddings, target=prediction)
    attributions_sum = attributions.sum(dim=-1).squeeze(0)

    # Convert the input ids back to token strings (special tokens included)
    tokens_with_special_tokens = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())

    # Find the token with the maximum attribution
    max_attrib_idx = attributions_sum.argmax().item()
    max_contrib_word = tokens_with_special_tokens[max_attrib_idx]

    # Append to the list
    max_contrib_words.append(max_contrib_word)

print(max_contrib_words)

# Displaying the share of each value in max_contrib_words as a percentage, most frequent first:
word_counts = Counter(max_contrib_words)
total_words = len(max_contrib_words)
percentage_counts = {word: (count / total_words) * 100 for word, count in word_counts.most_common()}

print("Word Percentages:")
for word, percentage in percentage_counts.items():
    print(f"{word}: {percentage:.2f}%")


Processing sentences: 100%|██████████| 430/430 [1:29:48<00:00, 12.53s/it]

['itch', 'Drag', 'Ġbitch', 'Drag', 'itch', 'Drag', 'Ġbitch', 'Ġass', 'The', 'Ġpussy', 'Ġbitch', '</s>', 'Race', 'Race', 'Ġwill', 'Ġhair', 'Drag', 'Ġass', 'He', 'Race', 'Ġis', '#', '.', 'k', 'Ġque', '@', 'Drag', '</s>', 'Stars', 'Race', 'Ġlike', 'Ġhope', 'Drag', 'Ġa', 'Drag', 'Drag', 'Ġvagina', 'Y', 'If', 'race', 'Drag', 'Drag', '</s>', 'Que', 'es', 'ĠThe', 'Race', 'ĠYou', 'Ġfucking', 'Drag', 'Ġcommunity', 'Ġbitch', 'ches', 'Race', 'ĠIts', 'Drag', 'Drag', 'Drag', 'Ġqueer', '</s>', 'Ġshut', 'Ġcurse', 'ĠThat', 'Ġpull', 'Ġshe', 'ches', 'I', 'Drag', 'She', 'ĠTik', 'ches', 'ĠSo', 'Drag', 'Drag', 'Ġass', 'Ġdrag', 'Drag', 'This', '</s>', 'Drag', 'ert', 'ina', 'Ġgay', '."', 'Drag', 'Ġfucking', 'Drag', 'Drag', 'Drag', 'Ġher', 'ĠI', 'Drag', 'OIL', 'Ġdrag', 'Drag', 'Ġqueer', 'Ġfucking', 'Ġass', 'Ġque', '</s>', '</s>', 'Race', 'Ġbitch', 'ĠBos', 'Miss', 'Drag', 'bo', 'Ġt', 'Drag', 'Id', 'Oh', 'Drag', 'Race', 'Drag', 'Ġc', 'Drag', 'Drag', 'Drag', 'Drag', 'Ġfucked', 'Ġate', 'Ġgays', 'Ġass', 'Drag', 'Ġ
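The raw list mixes special tokens such as `</s>` with tokens that differ only by the `Ġ` leading-space marker or by case (`Drag`, `Ġdrag`). A minimal sketch of one way to normalize the tokens before counting is shown below. The helper names `clean_token` and `top_token_percentages` and the hard-coded `SPECIAL_TOKENS` set are assumptions made for this example, and `sample` is just the first twelve entries of the output above.

```python
from collections import Counter

# RoBERTa's special tokens, hard-coded here so the example is self-contained
SPECIAL_TOKENS = {"<s>", "</s>", "<pad>", "<unk>", "<mask>"}


def clean_token(token):
    """Strip the byte-level BPE leading-space marker 'Ġ' and lowercase the token."""
    return token.lstrip("Ġ").lower()


def top_token_percentages(tokens):
    """Return (token, percentage) pairs, most frequent first, ignoring special tokens."""
    kept = [clean_token(t) for t in tokens if t not in SPECIAL_TOKENS]
    counts = Counter(kept)
    total = len(kept)
    return [(tok, 100 * n / total) for tok, n in counts.most_common()]


# First twelve entries of the printed max_contrib_words list
sample = ['itch', 'Drag', 'Ġbitch', 'Drag', 'itch', 'Drag',
          'Ġbitch', 'Ġass', 'The', 'Ġpussy', 'Ġbitch', '</s>']

percentages = top_token_percentages(sample)
for token, pct in percentages:
    print(f"{token}: {pct:.2f}%")
```

In the notebook itself, `tokenizer.all_special_tokens` could supply the set of special tokens instead of the hard-coded one. Fragments such as `itch` and `ches` remain subword pieces after this step. Mapping them back to whole words would need the neighbouring tokens from each tweet, which this list does not keep.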


