# Project Setup

Follow these steps to set up the necessary files and structure for the project.

## Folder Structure and Files

1. **Create a `data` Folder**: In the main project directory, create a folder named `data` and place the file `labeled_data_2.csv` inside it.

2. **Create a `.env` File**: In the main project directory, create a file named `.env`.

3. **Add Your Hugging Face API Key**:
   - Open the `.env` file and add the following line:
   
     ```plaintext
     HUGGINGFACE_API_KEY=your_api_key_here
     ```

   - Replace `your_api_key_here` with your actual Hugging Face API key.
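
Before logging in, it can help to fail early if the key is missing or still set to the placeholder. The sketch below uses only the standard library; `require_env` is a hypothetical helper name (not part of `python-dotenv` or `huggingface_hub`), and it assumes the `.env` file has already been loaded into the process environment, for example with `load_dotenv()`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper: assumes the .env file was already loaded
    (e.g. via load_dotenv()), so the variable is present in os.environ.
    """
    value = os.environ.get(name, "").strip()
    # Treat an empty value or the untouched placeholder as "not configured"
    if not value or value == "your_api_key_here":
        raise RuntimeError(
            f"{name} is not set. Add it to the .env file in the project root."
        )
    return value
```

A call such as `hf_api_key = require_env("HUGGINGFACE_API_KEY")` could then replace a plain `os.getenv` lookup.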

In [1]:
# Standard Library Imports
import os
import json
import time
import random
from dotenv import load_dotenv

# Data Handling
import pandas as pd
import numpy as np
from datasets import Dataset

# NLP and Transformers
import transformers
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
from nltk.corpus import stopwords
import torch
import torch.nn as nn
from sklearn.metrics import f1_score, precision_score, recall_score

# API and Hugging Face Integration
import requests
from huggingface_hub import login

# AI APIs
import google.generativeai as genai
from googleapiclient import discovery
from openai import OpenAI

# Visualization
import matplotlib.pyplot as plt

# Utilities
from tqdm import tqdm
import ast

# Load the variables from .env first, then read the Hugging Face API key
load_dotenv()
hf_api_key = os.getenv('HUGGINGFACE_API_KEY')
login(token=hf_api_key)

if torch.cuda.is_available():
    device = torch.device("cuda")
    device_name = torch.cuda.get_device_name(torch.cuda.current_device())
    print(f'Device in use: {device_name}')
else:
    device = torch.device("cpu")
    print('Device in use: CPU')

load_dotenv()



The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to C:\Users\caboo\.cache\huggingface\token
Login successful
Device in use: NVIDIA GeForce RTX 3080 Laptop GPU


True

# Reading in data

In [2]:
###   VALIDATION DATASET   ###
df = pd.read_csv('data/labeled_data_2.csv')
# A comment counts as 'combined' if it is labelled hateful or toxic
df['combined'] = df['hateful'] | df['toxic']
print(df.head())

print(df.shape)


                                                text        timestamp  \
0     Expensive eh now that Uglyfoods closed down :(   30/1/2023 1:04   
1                How dare you.. wan go lim kopi ah??   4/5/2022 18:57   
2  Yeah the governments can politick all they wan...  28/6/2022 13:44   
3               Hijacks event, then complains. Wild.   12/7/2022 7:29   
4  Hate to break it to you. But once someone accu...   23/8/2023 2:08   

              username                                               link  \
0      MangoDangoLango  /r/singapore/comments/10nqt5h/rsingapore_rando...   
1               900122  /r/SingaporeRaw/comments/ui0rmg/dont_take_offe...   
2  DisillusionedSinkie  /r/singapore/comments/vmb197/malaysias_top_tal...   
3            nehjipain  /r/singapore/comments/vx42x1/nus_student_tried...   
4          KeenStudent  /r/singapore/comments/15ybdme/sorry_doesnt_cut...   

      link_id   parent_id       id subreddit_id  \
0  t3_10nqt5h  t1_j6dwxo8  j6fuv4x     t5_2qh8c

# Cleaning

In [3]:
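# Work on a copy so that later column assignments do not modify df
df_normalized = df.copy()

### removing deleted or removed text ###
df_normalized = df_normalized[df_normalized['text'] != '[deleted]']
df_normalized = df_normalized[df_normalized['text'] != '[removed]']
# .copy() avoids SettingWithCopyWarning when columns are added later
df_normalized = df_normalized.dropna(subset=['text']).copy()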


# 2 Finding the best labeller
https://huggingface.co/sileod/deberta-v3-base-tasksource-toxicity


Originally, we planned to separate toxic and hate content to allow for a more nuanced analysis. However, after evaluating our approach, we realized that both types of content would lead to the same final recommendations. To streamline our process, we decided to combine toxic and hate categories instead of retaining them separately. This way, we focused our analysis on the combined data to drive straightforward, actionable insights. <br>

After testing, we chose `sileod/deberta-v3-base-tasksource-toxicity`: of the models tested, it has the highest combined F1 score (0.675) and the highest hate F1 score (0.573), and it labels the text data in a relatively short time (12s).

| Model                                         | Best Toxic F1 Score | Toxic Threshold | Best Hate F1 Score | Hate Threshold | Combined Best F1 Score | Combined Threshold | Time Taken |
|-----------------------------------------------|----------------------|-----------------|--------------------|----------------|------------------------|--------------------|------------|
| sileod/deberta-v3-base-tasksource-toxicity    | 0.547368            | 0.01            | 0.573034           | 0.04          | 0.675079               | 0.01               | 12s        |
| unitary/toxic-bert                            | 0.543689            | 0.00            | 0.513889           | 0.40          | 0.648649               | 0.00               | 4s         |
| GroNLP/hateBERT                               | 0.554455            | 0.40            | 0.397849           | 0.38          | 0.651584               | 0.38               | 4s         |
| textdetox/xlmr-large-toxicity-classifier      | 0.540146            | 0.00            | 0.493671           | 0.05          | 0.645598               | 0.00               | 4s         |
| facebook/roberta-hate-speech-dynabench-r4-target | 0.540146         | 0.00            | 0.422360           | 0.04          | 0.645598               | 0.00               | 4s         |
| cointegrated/rubert-tiny-toxicity             | 0.540146            | 0.00            | 0.429268           | 0.04          | 0.645598               | 0.00               | 1s         |
| badmatr11x/distilroberta-base-offensive-hateful-speech-text-multiclassification | 0.540146 | 0.00 | 0.391421 | 0.00 | 0.645598 | 0.00 | 2s         |
| citizenlab/distilbert-base-multilingual-cased-toxicity | 0.540146 | 0.00 | 0.48062  | 0.57 | 0.645598 | 0.00 | 3s         |
| GANgstersDev/singlish-hate-offensive-finetuned-model-v2.0.1 | 0.543689 | 0.00 | 0.395722 | 0.00 | 0.648649 | 0.00 | 3s         |
| Hate-speech-CNERG/dehatebert-mono-english     | 0.543689            | 0.00            | 0.476190           | 0.08          | 0.648649               | 0.00               | 4s         |
| cardiffnlp/twitter-roberta-base-hate          | 0.540146            | 0.00            | 0.423077           | 0.04          | 0.645598               | 0.00               | 4s         |
| Hate-speech-CNERG/bert-base-uncased-hatexplain | 0.571429           | 0.04            | 0.444444           | 0.04          | 0.658824               | 0.04               | 6s         |
| mrm8488/distilroberta-finetuned-tweets-hate-speech | 0.556122          | 0.04            | 0.391421           | 0.00          | 0.646226               | 0.04               | 2s         |
| meta-llama/Llama-3.2-1B-Instruct              | 0.0992908           | NaN             | 0.211382           | NaN           | 0.364341               | NaN                | 30s        |
| meta-llama/Llama-3.2-3B-Instruct              | 0.493023            | NaN             | 0.390244           | NaN           | 0.534483               | NaN                | 14min      |
| aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct | 0.514286         | NaN             | 0.324786           | NaN           | 0.517857               | NaN                | 53min      |

The full test results for each model are shown below.
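
The F1 and threshold columns in the table above come from a threshold sweep: each model score is compared against 101 evenly spaced thresholds, and the threshold with the highest F1 score is kept. The sketch below illustrates that idea in plain Python. The `f1` function is a hand-written stand-in for `sklearn.metrics.f1_score`, and the scores and labels are made up for illustration; they are not taken from the dataset.

```python
def f1(true_labels, predicted_labels):
    """F1 for the positive class, computed from two lists of booleans."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t and p for t, p in pairs)
    fp = sum((not t) and p for t, p in pairs)
    fn = sum(t and (not p) for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, true_labels, steps=101):
    """Sweep thresholds from 0 to 1 and keep the first one with the highest F1."""
    best_t, best_f1 = 0.0, 0.0
    for i in range(steps):
        t = i / (steps - 1)
        predicted = [s >= t for s in scores]
        current = f1(true_labels, predicted)
        if current > best_f1:
            best_t, best_f1 = t, current
    return best_t, best_f1


# Made-up scores and labels, for illustration only
scores = [0.02, 0.10, 0.35, 0.80, 0.90]
labels = [False, False, True, True, True]
threshold, score = best_threshold(scores, labels)
```

Because the comparison is strict (`current > best_f1`), ties are resolved in favour of the lowest threshold, which is one reason several models in the table report a best threshold of 0.00.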

## 2.1 Testing Bert models

### 2.1.1 sileod/deberta-v3-base-tasksource-toxicity

In [4]:
# Choose model here
model = 'sileod/deberta-v3-base-tasksource-toxicity'

# Initialize the hate classifier
pipe = pipeline("text-classification", model=model, return_all_scores=True, device=device)

# Create a list to store the predicted scores for each text
df_normalized['temp_score'] = np.nan  # To store the hate score

test_response = pipe("I hate you")
# import pdb; pdb.set_trace()

# Process the texts and save the prediction scores
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying:"):
    text = row['text']
    
    # Skip invalid texts
    if not isinstance(text, str) or text.strip() == "":
        print(f"Invalid text at index {index}. Skipping row.")
        continue

    try:
        # Get predictions from pipe
        prediction = pipe(text)
        
        # Extract the score for 'hate' label
        for pred in prediction[0]:
            label = pred['label']
            score = pred['score']
            if label == 'hate':  # CHECK THE LABEL HERE
                df_normalized.at[index, 'temp_score'] = score
                break

    except Exception as e:
        print(f"Error processing hate speech at index {index}: {e}")

# Function to calculate F1 score for different thresholds and true label columns
def calculate_f1_for_threshold(df, threshold, true_labels):
    # Predict 'True' for hate if the score is above the threshold
    predicted_labels = df['temp_score'].apply(lambda x: True if x >= threshold else False)
    return f1_score(true_labels, predicted_labels)

# List of columns to compare against
label_columns = ['toxic', 'hateful', 'combined']

# Dictionary to store the best thresholds and F1 scores for each column
best_results = {}

# Iterate over each label column ('toxic', 'hateful', 'combined')
for label_column in label_columns:
    true_labels = df_normalized[label_column]
    
    best_threshold = 0
    best_f1 = 0
    
    # Search for the best threshold by calculating F1 score for different thresholds
    thresholds = np.linspace(0, 1, 101)  # Try thresholds between 0 and 1 in 0.01 increments
    
    for threshold in thresholds:
        f1 = calculate_f1_for_threshold(df_normalized, threshold, true_labels)
        if f1 > best_f1:
            best_f1 = f1
            best_threshold = threshold
    
    # Store the results for this label column
    best_results[label_column] = {'Best Threshold': best_threshold, 'Best F1 Score': best_f1}

    print(f"Best Threshold for {label_column}: {best_threshold}, Best F1 Score: {best_f1}")

# Apply the best threshold for each label column to label texts as hateful or not
for label_column in label_columns:
    best_threshold = best_results[label_column]['Best Threshold']
    df_normalized[f'temp_{label_column}'] = df_normalized['temp_score'] >= best_threshold

# Print the final counts of hateful and non-hateful texts for each label column
for label_column in label_columns:
    print(f"\nCounts for {label_column}:")
    print(df_normalized[f'temp_{label_column}'].value_counts())

del pipe


Classifying::   3%|▎         | 9/300 [00:00<00:11, 24.58it/s]You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
Classifying:: 100%|██████████| 300/300 [00:12<00:00, 23.15it/s]


Best Threshold for toxic: 0.01, Best F1 Score: 0.5473684210526316
Best Threshold for hateful: 0.04, Best F1 Score: 0.5730337078651685
Best Threshold for combined: 0.01, Best F1 Score: 0.6750788643533123

Counts for toxic:
temp_toxic
True     173
False    127
Name: count, dtype: int64

Counts for hateful:
temp_hateful
False    196
True     104
Name: count, dtype: int64

Counts for combined:
temp_combined
True     173
False    127
Name: count, dtype: int64
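
The progress output above includes the warning "You seem to be using the pipelines sequentially on GPU", which is emitted because the loop calls the pipeline once per row. One possible alternative, sketched below, is to pass all texts in a single call and let the pipeline batch them. `score_texts` and `stub_pipe` are hypothetical names; the stub stands in for a real pipeline so the example runs without downloading a model, and it assumes the pipeline returns one list of label/score dictionaries per input text.

```python
def score_texts(pipe, texts, target_label, batch_size=16):
    """Return the `target_label` score for each text, using a single pipeline call.

    Assumes `pipe` returns, for a list of strings, one list of
    {'label': ..., 'score': ...} dicts per string.
    """
    predictions = pipe(list(texts), batch_size=batch_size)
    scores = []
    for per_label in predictions:
        # None marks texts for which the target label was not returned
        score = next((p["score"] for p in per_label if p["label"] == target_label), None)
        scores.append(score)
    return scores


def stub_pipe(texts, batch_size=1):
    """Stand-in for a real pipeline so the sketch runs without a model download."""
    return [[{"label": "hate", "score": 0.9}, {"label": "not-hate", "score": 0.1}] for _ in texts]


scores = score_texts(stub_pipe, ["first comment", "second comment"], target_label="hate")
```

Note that with a single call, one failing input (for example an over-long text) would fail the whole batch, whereas the per-row loop above only skips that row.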


### 2.1.2 unitary/toxic-bert

In [5]:
# Choose model here
model = 'unitary/toxic-bert'

# Initialize the hate classifier
pipe = pipeline("text-classification", model=model, return_all_scores=True, device=device)

# Create a list to store the predicted scores for each text
df_normalized['temp_score'] = np.nan  # To store the hate score

test_response = pipe("I hate you")
# import pdb; pdb.set_trace()

# Process the texts and save the prediction scores
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying:"):
    text = row['text']
    
    # Skip invalid texts
    if not isinstance(text, str) or text.strip() == "":
        print(f"Invalid text at index {index}. Skipping row.")
        continue

    try:
        # Get predictions from pipe
        prediction = pipe(text)
        
        # Extract the score for the 'toxic' label
        for pred in prediction[0]:
            label = pred['label']
            score = pred['score']
            if label == 'toxic':  # CHECK THE LABEL HERE
                df_normalized.at[index, 'temp_score'] = score
                break

    except Exception as e:
        print(f"Error processing hate speech at index {index}: {e}")

# Function to calculate F1 score for different thresholds and true label columns
def calculate_f1_for_threshold(df, threshold, true_labels):
    # Predict 'True' for hate if the score is above the threshold
    predicted_labels = df['temp_score'].apply(lambda x: True if x >= threshold else False)
    return f1_score(true_labels, predicted_labels)

# List of columns to compare against
label_columns = ['toxic', 'hateful', 'combined']

# Dictionary to store the best thresholds and F1 scores for each column
best_results = {}

# Iterate over each label column ('toxic', 'hateful', 'combined')
for label_column in label_columns:
    true_labels = df_normalized[label_column]
    
    best_threshold = 0
    best_f1 = 0
    
    # Search for the best threshold by calculating F1 score for different thresholds
    thresholds = np.linspace(0, 1, 101)  # Try thresholds between 0 and 1 in 0.01 increments
    
    for threshold in thresholds:
        f1 = calculate_f1_for_threshold(df_normalized, threshold, true_labels)
        if f1 > best_f1:
            best_f1 = f1
            best_threshold = threshold
    
    # Store the results for this label column
    best_results[label_column] = {'Best Threshold': best_threshold, 'Best F1 Score': best_f1}

    print(f"Best Threshold for {label_column}: {best_threshold}, Best F1 Score: {best_f1}")

# Apply the best threshold for each label column to label texts as hateful or not
for label_column in label_columns:
    best_threshold = best_results[label_column]['Best Threshold']
    df_normalized[f'temp_{label_column}'] = df_normalized['temp_score'] >= best_threshold

# Print the final counts of hateful and non-hateful texts for each label column
for label_column in label_columns:
    print(f"\nCounts for {label_column}:")
    print(df_normalized[f'temp_{label_column}'].value_counts())

del pipe


  attn_output = torch.nn.functional.scaled_dot_product_attention(
Classifying:: 100%|██████████| 300/300 [00:05<00:00, 58.01it/s]


Best Threshold for toxic: 0.0, Best F1 Score: 0.5436893203883495
Best Threshold for hateful: 0.4, Best F1 Score: 0.5138888888888888
Best Threshold for combined: 0.0, Best F1 Score: 0.6486486486486487

Counts for toxic:
temp_toxic
True    300
Name: count, dtype: int64

Counts for hateful:
temp_hateful
False    230
True      70
Name: count, dtype: int64

Counts for combined:
temp_combined
True    300
Name: count, dtype: int64


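### 2.1.3 GroNLP/hateBERT (loves to fluctuate .-.)

The scores for this model change from run to run: the checkpoint has no trained classification head, so `classifier.weight` and `classifier.bias` are randomly initialized on load (see the warning in the cell output). This is likely why the figures in the summary table differ slightly from the output shown here.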

In [6]:
# Choose model here
model = 'GroNLP/hateBERT'

# Initialize the hate classifier
pipe = pipeline("text-classification", model=model, return_all_scores=True, device=device)

# Create a list to store the predicted scores for each text
df_normalized['temp_score'] = np.nan  # To store the hate score

test_response = pipe("I hate you")
# import pdb; pdb.set_trace()

# Process the texts and save the prediction scores
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying:"):
    text = row['text']
    
    # Skip invalid texts
    if not isinstance(text, str) or text.strip() == "":
        print(f"Invalid text at index {index}. Skipping row.")
        continue

    try:
        # Get predictions from pipe
        prediction = pipe(text)
        
        # Extract the score for 'LABEL_0' (the head is untrained, so this label has no fixed meaning)
        for pred in prediction[0]:
            label = pred['label']
            score = pred['score']
            if label == 'LABEL_0':  # CHECK THE LABEL HERE
                df_normalized.at[index, 'temp_score'] = score
                break

    except Exception as e:
        print(f"Error processing hate speech at index {index}: {e}")

# Function to calculate F1 score for different thresholds and true label columns
def calculate_f1_for_threshold(df, threshold, true_labels):
    # Predict 'True' for hate if the score is above the threshold
    predicted_labels = df['temp_score'].apply(lambda x: True if x >= threshold else False)
    return f1_score(true_labels, predicted_labels)

# List of columns to compare against
label_columns = ['toxic', 'hateful', 'combined']

# Dictionary to store the best thresholds and F1 scores for each column
best_results = {}

# Iterate over each label column ('toxic', 'hateful', 'combined')
for label_column in label_columns:
    true_labels = df_normalized[label_column]
    
    best_threshold = 0
    best_f1 = 0
    
    # Search for the best threshold by calculating F1 score for different thresholds
    thresholds = np.linspace(0, 1, 101)  # Try thresholds between 0 and 1 in 0.01 increments
    
    for threshold in thresholds:
        f1 = calculate_f1_for_threshold(df_normalized, threshold, true_labels)
        if f1 > best_f1:
            best_f1 = f1
            best_threshold = threshold
    
    # Store the results for this label column
    best_results[label_column] = {'Best Threshold': best_threshold, 'Best F1 Score': best_f1}

    print(f"Best Threshold for {label_column}: {best_threshold}, Best F1 Score: {best_f1}")

# Apply the best threshold for each label column to label texts as hateful or not
for label_column in label_columns:
    best_threshold = best_results[label_column]['Best Threshold']
    df_normalized[f'temp_{label_column}'] = df_normalized['temp_score'] >= best_threshold

# Print the final counts of hateful and non-hateful texts for each label column
for label_column in label_columns:
    print(f"\nCounts for {label_column}:")
    print(df_normalized[f'temp_{label_column}'].value_counts())

del pipe


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at GroNLP/hateBERT and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Classifying:: 100%|██████████| 300/300 [00:05<00:00, 57.23it/s]


Best Threshold for toxic: 0.48, Best F1 Score: 0.5521126760563381
Best Threshold for hateful: 0.44, Best F1 Score: 0.3967391304347826
Best Threshold for combined: 0.43, Best F1 Score: 0.65

Counts for toxic:
temp_toxic
True     243
False     57
Name: count, dtype: int64

Counts for hateful:
temp_hateful
True     294
False      6
Name: count, dtype: int64

Counts for combined:
temp_combined
True     296
False      4
Name: count, dtype: int64


### 2.1.4 textdetox/xlmr-large-toxicity-classifier

In [7]:
# Choose model here
model = 'textdetox/xlmr-large-toxicity-classifier'

# Initialize the hate classifier
pipe = pipeline("text-classification", model=model, return_all_scores=True, device=device)

# Create a list to store the predicted scores for each text
df_normalized['temp_score'] = np.nan  # To store the hate score

test_response = pipe("I hate you")
# import pdb; pdb.set_trace()

# Process the texts and save the prediction scores
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying:"):
    text = row['text']
    
    # Skip invalid texts
    if not isinstance(text, str) or text.strip() == "":
        print(f"Invalid text at index {index}. Skipping row.")
        continue

    try:
        # Get predictions from pipe
        prediction = pipe(text)
        
        # Extract the score for the 'toxic' label
        for pred in prediction[0]:
            label = pred['label']
            score = pred['score']
            if label == 'toxic':  # CHECK THE LABEL HERE
                df_normalized.at[index, 'temp_score'] = score
                break

    except Exception as e:
        print(f"Error processing hate speech at index {index}: {e}")

# Function to calculate F1 score for different thresholds and true label columns
def calculate_f1_for_threshold(df, threshold, true_labels):
    # Predict 'True' for hate if the score is above the threshold
    predicted_labels = df['temp_score'].apply(lambda x: True if x >= threshold else False)
    return f1_score(true_labels, predicted_labels)

# List of columns to compare against
label_columns = ['toxic', 'hateful', 'combined']

# Dictionary to store the best thresholds and F1 scores for each column
best_results = {}

# Iterate over each label column ('toxic', 'hateful', 'combined')
for label_column in label_columns:
    true_labels = df_normalized[label_column]
    
    best_threshold = 0
    best_f1 = 0
    
    # Search for the best threshold by calculating F1 score for different thresholds
    thresholds = np.linspace(0, 1, 101)  # Try thresholds between 0 and 1 in 0.01 increments
    
    for threshold in thresholds:
        f1 = calculate_f1_for_threshold(df_normalized, threshold, true_labels)
        if f1 > best_f1:
            best_f1 = f1
            best_threshold = threshold
    
    # Store the results for this label column
    best_results[label_column] = {'Best Threshold': best_threshold, 'Best F1 Score': best_f1}

    print(f"Best Threshold for {label_column}: {best_threshold}, Best F1 Score: {best_f1}")

# Apply the best threshold for each label column to label texts as hateful or not
for label_column in label_columns:
    best_threshold = best_results[label_column]['Best Threshold']
    df_normalized[f'temp_{label_column}'] = df_normalized['temp_score'] >= best_threshold

# Print the final counts of hateful and non-hateful texts for each label column
for label_column in label_columns:
    print(f"\nCounts for {label_column}:")
    print(df_normalized[f'temp_{label_column}'].value_counts())

del pipe


Classifying::  38%|███▊      | 115/300 [00:02<00:03, 56.80it/s]Token indices sequence length is longer than the specified maximum sequence length for this model (547 > 512). Running this sequence through the model will result in indexing errors
Classifying::  43%|████▎     | 128/300 [00:02<00:03, 54.69it/s]

Error processing hate speech at index 121: The expanded size of the tensor (547) must match the existing size (514) at non-singleton dimension 1.  Target sizes: [1, 547].  Tensor sizes: [1, 514]


Classifying:: 100%|██████████| 300/300 [00:05<00:00, 54.07it/s]


Best Threshold for toxic: 0.0, Best F1 Score: 0.5401459854014599
Best Threshold for hateful: 0.05, Best F1 Score: 0.4936708860759494
Best Threshold for combined: 0.0, Best F1 Score: 0.6455981941309256

Counts for toxic:
temp_toxic
True     299
False      1
Name: count, dtype: int64

Counts for hateful:
temp_hateful
False    216
True      84
Name: count, dtype: int64

Counts for combined:
temp_combined
True     299
False      1
Name: count, dtype: int64
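
The error at index 121 above occurs because that comment tokenizes to more than 512 tokens, so the row is left without a score. A possible fix is to ask the tokenizer to truncate long inputs. The sketch below assumes the pipeline forwards the extra keyword arguments `truncation` and `max_length` to its tokenizer, as `transformers` text-classification pipelines do; `classify_with_truncation` and `RecordingPipe` are hypothetical names, and the stub only records the arguments so the example runs without a model.

```python
def classify_with_truncation(pipe, text, max_length=512):
    """Call a text-classification pipeline, truncating over-long inputs.

    Assumes `pipe` passes `truncation` and `max_length` on to its tokenizer.
    """
    return pipe(text, truncation=True, max_length=max_length)


class RecordingPipe:
    """Stand-in for a real pipeline: records keyword arguments, returns a dummy score."""

    def __init__(self):
        self.calls = []

    def __call__(self, text, **kwargs):
        self.calls.append(kwargs)
        return [[{"label": "toxic", "score": 0.0}]]


pipe_stub = RecordingPipe()
prediction = classify_with_truncation(pipe_stub, "a very long comment " * 200)
```

With truncation, only the first `max_length` tokens are scored, so toxic content near the end of a long comment could be missed.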


### 2.1.5 facebook/roberta-hate-speech-dynabench-r4-target

In [8]:
# Choose model here
model = 'facebook/roberta-hate-speech-dynabench-r4-target'

# Initialize the hate classifier
pipe = pipeline("text-classification", model=model, return_all_scores=True, device=device)

# Create a list to store the predicted scores for each text
df_normalized['temp_score'] = np.nan  # To store the hate score

test_response = pipe("I hate you")
# import pdb; pdb.set_trace()

# Process the texts and save the prediction scores
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying:"):
    text = row['text']
    
    # Skip invalid texts
    if not isinstance(text, str) or text.strip() == "":
        print(f"Invalid text at index {index}. Skipping row.")
        continue

    try:
        # Get predictions from pipe
        prediction = pipe(text)
        
        # Extract the score for 'hate' label
        for pred in prediction[0]:
            label = pred['label']
            score = pred['score']
            if label == 'hate':  # CHECK THE LABEL HERE
                df_normalized.at[index, 'temp_score'] = score
                break

    except Exception as e:
        print(f"Error processing hate speech at index {index}: {e}")

# Function to calculate F1 score for different thresholds and true label columns
def calculate_f1_for_threshold(df, threshold, true_labels):
    # Predict 'True' for hate if the score is above the threshold
    predicted_labels = df['temp_score'].apply(lambda x: True if x >= threshold else False)
    return f1_score(true_labels, predicted_labels)

# List of columns to compare against
label_columns = ['toxic', 'hateful', 'combined']

# Dictionary to store the best thresholds and F1 scores for each column
best_results = {}

# Iterate over each label column ('toxic', 'hateful', 'combined')
for label_column in label_columns:
    true_labels = df_normalized[label_column]
    
    best_threshold = 0
    best_f1 = 0
    
    # Search for the best threshold by calculating F1 score for different thresholds
    thresholds = np.linspace(0, 1, 101)  # Try thresholds between 0 and 1 in 0.01 increments
    
    for threshold in thresholds:
        f1 = calculate_f1_for_threshold(df_normalized, threshold, true_labels)
        if f1 > best_f1:
            best_f1 = f1
            best_threshold = threshold
    
    # Store the results for this label column
    best_results[label_column] = {'Best Threshold': best_threshold, 'Best F1 Score': best_f1}

    print(f"Best Threshold for {label_column}: {best_threshold}, Best F1 Score: {best_f1}")

# Apply the best threshold for each label column to label texts as hateful or not
for label_column in label_columns:
    best_threshold = best_results[label_column]['Best Threshold']
    df_normalized[f'temp_{label_column}'] = df_normalized['temp_score'] >= best_threshold

# Print the final counts of hateful and non-hateful texts for each label column
for label_column in label_columns:
    print(f"\nCounts for {label_column}:")
    print(df_normalized[f'temp_{label_column}'].value_counts())

del pipe


Classifying::  39%|███▉      | 118/300 [00:02<00:03, 54.89it/s]Token indices sequence length is longer than the specified maximum sequence length for this model (532 > 512). Running this sequence through the model will result in indexing errors
Classifying::  44%|████▎     | 131/300 [00:02<00:02, 57.14it/s]

Error processing hate speech at index 121: The expanded size of the tensor (532) must match the existing size (514) at non-singleton dimension 1.  Target sizes: [1, 532].  Tensor sizes: [1, 514]


Classifying:: 100%|██████████| 300/300 [00:05<00:00, 54.22it/s]


Best Threshold for toxic: 0.0, Best F1 Score: 0.5401459854014599
Best Threshold for hateful: 0.04, Best F1 Score: 0.422360248447205
Best Threshold for combined: 0.0, Best F1 Score: 0.6455981941309256

Counts for toxic:
temp_toxic
True     299
False      1
Name: count, dtype: int64

Counts for hateful:
temp_hateful
False    213
True      87
Name: count, dtype: int64

Counts for combined:
temp_combined
True     299
False      1
Name: count, dtype: int64


### 2.1.6 cointegrated/rubert-tiny-toxicity 

In [9]:
# Choose model here
model = 'cointegrated/rubert-tiny-toxicity'

# Initialize the hate classifier
pipe = pipeline("text-classification", model=model, return_all_scores=True, device=device)

# Create a list to store the predicted scores for each text
df_normalized['temp_score'] = np.nan  # To store the hate score

test_response = pipe("I hate you")
# import pdb; pdb.set_trace()

# Process the texts and save the prediction scores
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying:"):
    text = row['text']
    
    # Skip invalid texts
    if not isinstance(text, str) or text.strip() == "":
        print(f"Invalid text at index {index}. Skipping row.")
        continue

    try:
        # Get predictions from pipe
        prediction = pipe(text)
        
        # Extract the score for the 'non-toxic' label
        for pred in prediction[0]:
            label = pred['label']
            score = pred['score']
            if label == 'non-toxic':  # CHECK THE LABEL HERE
                df_normalized.at[index, 'temp_score'] = score
                break

    except Exception as e:
        print(f"Error processing hate speech at index {index}: {e}")

# Function to calculate F1 score for different thresholds and true label columns
def calculate_f1_for_threshold(df, threshold, true_labels):
    # Predict True when the 'non-toxic' score is below the threshold
    predicted_labels = df['temp_score'].apply(lambda x: True if x < threshold else False)
    return f1_score(true_labels, predicted_labels)

# List of columns to compare against
label_columns = ['toxic', 'hateful', 'combined']

# Dictionary to store the best thresholds and F1 scores for each column
best_results = {}

# Iterate over each label column ('toxic', 'hateful', 'combined')
for label_column in label_columns:
    true_labels = df_normalized[label_column]
    
    best_threshold = 0
    best_f1 = 0
    
    # Search for the best threshold by calculating F1 score for different thresholds
    thresholds = np.linspace(0, 1, 101)  # Try thresholds between 0 and 1 in 0.01 increments
    
    for threshold in thresholds:
        f1 = calculate_f1_for_threshold(df_normalized, threshold, true_labels)
        if f1 > best_f1:
            best_f1 = f1
            best_threshold = threshold
    
    # Store the results for this label column
    best_results[label_column] = {'Best Threshold': best_threshold, 'Best F1 Score': best_f1}

    print(f"Best Threshold for {label_column}: {best_threshold}, Best F1 Score: {best_f1}")

# Apply the best threshold for each label column to label texts as hateful or not
for label_column in label_columns:
    best_threshold = best_results[label_column]['Best Threshold']
    df_normalized[f'temp_{label_column}'] = df_normalized['temp_score'] < best_threshold

# Print the final counts of hateful and non-hateful texts for each label column
for label_column in label_columns:
    print(f"\nCounts for {label_column}:")
    print(df_normalized[f'temp_{label_column}'].value_counts())

del pipe


Classifying::  38%|███▊      | 115/300 [00:00<00:01, 122.63it/s]Token indices sequence length is longer than the specified maximum sequence length for this model (553 > 512). Running this sequence through the model will result in indexing errors
Classifying::  47%|████▋     | 142/300 [00:01<00:01, 121.27it/s]

Error processing hate speech at index 121: The size of tensor a (553) must match the size of tensor b (512) at non-singleton dimension 1


Classifying:: 100%|██████████| 300/300 [00:02<00:00, 121.58it/s]


Best Threshold for toxic: 1.0, Best F1 Score: 0.5401459854014599
Best Threshold for hateful: 0.96, Best F1 Score: 0.4292682926829268
Best Threshold for combined: 1.0, Best F1 Score: 0.6455981941309256

Counts for toxic:
temp_toxic
True     299
False      1
Name: count, dtype: int64

Counts for hateful:
temp_hateful
False    169
True     131
Name: count, dtype: int64

Counts for combined:
temp_combined
True     299
False      1
Name: count, dtype: int64
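
This cell reads the score of the `non-toxic` label, so a text is flagged when its score falls *below* the threshold. The summary table appears to report these thresholds on the toxic scale, i.e. as one minus the values printed here (0.96 becomes 0.04, 1.0 becomes 0.00). The sketch below shows that the two rules flag the same texts; the scores are made up for illustration.

```python
def flip(value):
    """Convert a value on the 'non-toxic' scale to the toxic scale."""
    return 1.0 - value


# Made-up 'non-toxic' scores, for illustration only
non_toxic_scores = [0.99, 0.70, 0.20]
non_toxic_threshold = 0.96  # threshold printed for `hateful` in the output above

# Rule used in the cell: flag when the 'non-toxic' score is below the threshold
flagged_direct = [s < non_toxic_threshold for s in non_toxic_scores]

# Equivalent rule on the toxic scale: flag when (1 - score) exceeds (1 - threshold)
flagged_flipped = [flip(s) > flip(non_toxic_threshold) for s in non_toxic_scores]
```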


### 2.1.7 badmatr11x/distilroberta-base-offensive-hateful-speech-text-multiclassification

In [10]:
# Choose model here
model = 'badmatr11x/distilroberta-base-offensive-hateful-speech-text-multiclassification'

# Initialize the hate classifier
pipe = pipeline("text-classification", model=model, return_all_scores=True, device=device)

# Create a list to store the predicted scores for each text
df_normalized['temp_score'] = np.nan  # To store the hate score

test_response = pipe("I hate you")
# import pdb; pdb.set_trace()

# Process the texts and save the prediction scores
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying:"):
    text = row['text']
    
    # Skip invalid texts
    if not isinstance(text, str) or text.strip() == "":
        print(f"Invalid text at index {index}. Skipping row.")
        continue

    try:
        # Get predictions from pipe
        prediction = pipe(text)
        
        # Extract the score for the 'NEITHER' label
        for pred in prediction[0]:
            label = pred['label']
            score = pred['score']
            if label == 'NEITHER':  # CHECK THE LABEL HERE
                df_normalized.at[index, 'temp_score'] = score
                break

    except Exception as e:
        print(f"Error processing hate speech at index {index}: {e}")

# Function to calculate F1 score for different thresholds and true label columns
def calculate_f1_for_threshold(df, threshold, true_labels):
    # Predict True (hateful) when the 'NEITHER' score is below the threshold
    predicted_labels = df['temp_score'] < threshold
    return f1_score(true_labels, predicted_labels)

# List of columns to compare against
label_columns = ['toxic', 'hateful', 'combined']

# Dictionary to store the best thresholds and F1 scores for each column
best_results = {}

# Iterate over each label column ('toxic', 'hateful', 'combined')
for label_column in label_columns:
    true_labels = df_normalized[label_column]
    
    best_threshold = 0
    best_f1 = 0
    
    # Search for the best threshold by calculating F1 score for different thresholds
    thresholds = np.linspace(0, 1, 101)  # Try thresholds between 0 and 1 in 0.01 increments
    
    for threshold in thresholds:
        f1 = calculate_f1_for_threshold(df_normalized, threshold, true_labels)
        if f1 > best_f1:
            best_f1 = f1
            best_threshold = threshold
    
    # Store the results for this label column
    best_results[label_column] = {'Best Threshold': best_threshold, 'Best F1 Score': best_f1}

    print(f"Best Threshold for {label_column}: {best_threshold}, Best F1 Score: {best_f1}")

# Apply the best threshold for each label column to label texts as hateful or not
for label_column in label_columns:
    best_threshold = best_results[label_column]['Best Threshold']
    df_normalized[f'temp_{label_column}'] = df_normalized['temp_score'] < best_threshold

# Print the final counts of hateful and non-hateful texts for each label column
for label_column in label_columns:
    print(f"\nCounts for {label_column}:")
    print(df_normalized[f'temp_{label_column}'].value_counts())

del pipe


Classifying::  39%|███▊      | 116/300 [00:01<00:01, 99.77it/s] Token indices sequence length is longer than the specified maximum sequence length for this model (532 > 512). Running this sequence through the model will result in indexing errors
Classifying::  42%|████▏     | 127/300 [00:01<00:01, 99.08it/s]

Error processing hate speech at index 121: The expanded size of the tensor (532) must match the existing size (514) at non-singleton dimension 1.  Target sizes: [1, 532].  Tensor sizes: [1, 514]


Classifying:: 100%|██████████| 300/300 [00:03<00:00, 94.94it/s]


Best Threshold for toxic: 1.0, Best F1 Score: 0.5401459854014599
Best Threshold for hateful: 1.0, Best F1 Score: 0.3914209115281501
Best Threshold for combined: 1.0, Best F1 Score: 0.6455981941309256

Counts for toxic:
temp_toxic
True     299
False      1
Name: count, dtype: int64

Counts for hateful:
temp_hateful
True     299
False      1
Name: count, dtype: int64

Counts for combined:
temp_combined
True     299
False      1
Name: count, dtype: int64
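
This cell thresholds the `NEITHER` score with `<` instead of thresholding a hate score with `>=`. Assuming the scores for one comment sum to 1 (a softmax over mutually exclusive classes, which is an assumption about this model's configuration), a low `NEITHER` score is equivalent to a high combined score for the remaining classes. The label names other than `NEITHER` below are hypothetical.

```python
# Hypothetical three-class output for one comment; scores sum to 1.
prediction = [
    {"label": "HATE", "score": 0.15},       # hypothetical label name
    {"label": "OFFENSIVE", "score": 0.60},  # hypothetical label name
    {"label": "NEITHER", "score": 0.25},
]

scores = {p["label"]: p["score"] for p in prediction}
neither = scores["NEITHER"]
not_neither = sum(s for label, s in scores.items() if label != "NEITHER")

threshold = 0.4
flag_via_neither = neither < threshold               # rule used in the cell above
flag_via_complement = not_neither > 1.0 - threshold  # the same rule, restated
```

With the best threshold of 1.0 reported above, the rule `NEITHER < 1.0` flags every scored comment, which is consistent with the 299/1 split in the counts.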


### 2.1.8 citizenlab/distilbert-base-multilingual-cased-toxicity

In [11]:
# Choose model here
model = 'citizenlab/distilbert-base-multilingual-cased-toxicity'

# Initialize the hate classifier
pipe = pipeline("text-classification", model=model, return_all_scores=True, device=device)

# Create a list to store the predicted scores for each text
df_normalized['temp_score'] = np.nan  # To store the hate score

test_response = pipe("I hate you")
# import pdb; pdb.set_trace()

# Process the texts and save the prediction scores
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying:"):
    text = row['text']
    
    # Skip invalid texts
    if not isinstance(text, str) or text.strip() == "":
        print(f"Invalid text at index {index}. Skipping row.")
        continue

    try:
        # Get predictions from pipe
        prediction = pipe(text)
        
        # Extract the score for the 'toxic' label
        for pred in prediction[0]:
            label = pred['label']
            score = pred['score']
            if label == 'toxic':  # CHECK THE LABEL HERE
                df_normalized.at[index, 'temp_score'] = score
                break

    except Exception as e:
        print(f"Error processing hate speech at index {index}: {e}")

# Function to calculate F1 score for different thresholds and true label columns
def calculate_f1_for_threshold(df, threshold, true_labels):
    # Predict True (hateful) when the score is at or above the threshold
    predicted_labels = df['temp_score'] >= threshold
    return f1_score(true_labels, predicted_labels)

# List of columns to compare against
label_columns = ['toxic', 'hateful', 'combined']

# Dictionary to store the best thresholds and F1 scores for each column
best_results = {}

# Iterate over each label column ('toxic', 'hateful', 'combined')
for label_column in label_columns:
    true_labels = df_normalized[label_column]
    
    best_threshold = 0
    best_f1 = 0
    
    # Search for the best threshold by calculating F1 score for different thresholds
    thresholds = np.linspace(0, 1, 101)  # Try thresholds between 0 and 1 in 0.01 increments
    
    for threshold in thresholds:
        f1 = calculate_f1_for_threshold(df_normalized, threshold, true_labels)
        if f1 > best_f1:
            best_f1 = f1
            best_threshold = threshold
    
    # Store the results for this label column
    best_results[label_column] = {'Best Threshold': best_threshold, 'Best F1 Score': best_f1}

    print(f"Best Threshold for {label_column}: {best_threshold}, Best F1 Score: {best_f1}")

# Apply the best threshold for each label column to label texts as hateful or not
for label_column in label_columns:
    best_threshold = best_results[label_column]['Best Threshold']
    df_normalized[f'temp_{label_column}'] = df_normalized['temp_score'] >= best_threshold

# Print the final counts of hateful and non-hateful texts for each label column
for label_column in label_columns:
    print(f"\nCounts for {label_column}:")
    print(df_normalized[f'temp_{label_column}'].value_counts())

del pipe


Classifying::  38%|███▊      | 114/300 [00:01<00:02, 80.38it/s]Token indices sequence length is longer than the specified maximum sequence length for this model (553 > 512). Running this sequence through the model will result in indexing errors
Classifying::  44%|████▎     | 131/300 [00:01<00:02, 76.14it/s]

Error processing hate speech at index 121: The size of tensor a (553) must match the size of tensor b (512) at non-singleton dimension 1


Classifying:: 100%|██████████| 300/300 [00:03<00:00, 75.48it/s]


Best Threshold for toxic: 0.0, Best F1 Score: 0.5401459854014599
Best Threshold for hateful: 0.5700000000000001, Best F1 Score: 0.4806201550387597
Best Threshold for combined: 0.0, Best F1 Score: 0.6455981941309256

Counts for toxic:
temp_toxic
True     299
False      1
Name: count, dtype: int64

Counts for hateful:
temp_hateful
False    245
True      55
Name: count, dtype: int64

Counts for combined:
temp_combined
True     299
False      1
Name: count, dtype: int64
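
The threshold printed as `0.5700000000000001` is a floating-point artifact of the threshold grid, not a meaningful extra digit. A small sketch of two ways to get clean values, assuming the same 0.01 step as in the cell above:

```python
# Dividing integers gives the closest float to k/100 at every grid point.
thresholds = [k / 100 for k in range(101)]

# Alternatively, round a threshold only when reporting it.
raw = 57 * 0.01            # multiplying by the step can introduce rounding error
reported = round(raw, 2)

print(thresholds[57], reported)
```

With NumPy, rounding the existing grid (for example `np.round(np.linspace(0, 1, 101), 2)`) should have the same effect.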


### 2.1.9 GANgstersDev/singlish-hate-offensive-finetuned-model-v2.0.1

Class labels: `["neither", "offensive", "hate"]`

In [12]:
# Choose model here
model = 'GANgstersDev/singlish-hate-offensive-finetuned-model-v2.0.1'

# Initialize the hate classifier
pipe = pipeline("text-classification", model=model, return_all_scores=True, device=device)

# Create a list to store the predicted scores for each text
df_normalized['temp_score'] = np.nan  # To store the hate score

test_response = pipe("I hate you")
# import pdb; pdb.set_trace()

# Process the texts and save the prediction scores
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying:"):
    text = row['text']
    
    # Skip invalid texts
    if not isinstance(text, str) or text.strip() == "":
        print(f"Invalid text at index {index}. Skipping row.")
        continue

    try:
        # Get predictions from pipe
        prediction = pipe(text)
        
        # Extract the score for 'hate' label
        for pred in prediction[0]:
            label = pred['label']
            score = pred['score']
            if label == 'LABEL_2':  # CHECK THE LABEL HERE
                df_normalized.at[index, 'temp_score'] = score
                break

    except Exception as e:
        print(f"Error processing hate speech at index {index}: {e}")

# Function to calculate F1 score for different thresholds and true label columns
def calculate_f1_for_threshold(df, threshold, true_labels):
    # Predict True (hateful) when the score is at or above the threshold
    predicted_labels = df['temp_score'] >= threshold
    return f1_score(true_labels, predicted_labels)

# List of columns to compare against
label_columns = ['toxic', 'hateful', 'combined']

# Dictionary to store the best thresholds and F1 scores for each column
best_results = {}

# Iterate over each label column ('toxic', 'hateful', 'combined')
for label_column in label_columns:
    true_labels = df_normalized[label_column]
    
    best_threshold = 0
    best_f1 = 0
    
    # Search for the best threshold by calculating F1 score for different thresholds
    thresholds = np.linspace(0, 1, 101)  # Try thresholds between 0 and 1 in 0.01 increments
    
    for threshold in thresholds:
        f1 = calculate_f1_for_threshold(df_normalized, threshold, true_labels)
        if f1 > best_f1:
            best_f1 = f1
            best_threshold = threshold
    
    # Store the results for this label column
    best_results[label_column] = {'Best Threshold': best_threshold, 'Best F1 Score': best_f1}

    print(f"Best Threshold for {label_column}: {best_threshold}, Best F1 Score: {best_f1}")

# Apply the best threshold for each label column to label texts as hateful or not
for label_column in label_columns:
    best_threshold = best_results[label_column]['Best Threshold']
    df_normalized[f'temp_{label_column}'] = df_normalized['temp_score'] >= best_threshold

# Print the final counts of hateful and non-hateful texts for each label column
for label_column in label_columns:
    print(f"\nCounts for {label_column}:")
    print(df_normalized[f'temp_{label_column}'].value_counts())

del pipe


Classifying:: 100%|██████████| 300/300 [00:04<00:00, 69.06it/s]


Best Threshold for toxic: 0.0, Best F1 Score: 0.5436893203883495
Best Threshold for hateful: 0.0, Best F1 Score: 0.39572192513368987
Best Threshold for combined: 0.0, Best F1 Score: 0.6486486486486487

Counts for toxic:
temp_toxic
True    300
Name: count, dtype: int64

Counts for hateful:
temp_hateful
True    300
Name: count, dtype: int64

Counts for combined:
temp_combined
True    300
Name: count, dtype: int64
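
A best threshold of 0.0 with all 300 rows predicted `True` means the search found nothing better than flagging every comment. In that case recall is 1 and precision equals the share of positive labels, so F1 depends only on the label counts. The positive counts used below (112, 74, 144) are not stated in the notebook; they are inferred here because they reproduce the printed F1 scores, and should be treated as an assumption.

```python
def all_positive_f1(n_positive, n_total):
    """F1 when every row is predicted positive: precision = P/N, recall = 1."""
    precision = n_positive / n_total
    recall = 1.0
    return 2 * precision * recall / (precision + recall)

# Inferred positive counts (assumption) out of 300 rows.
for name, positives in [("toxic", 112), ("hateful", 74), ("combined", 144)]:
    print(name, all_positive_f1(positives, 300))
```

Any model whose best F1 on a label column matches this baseline is, on this sample, doing no better than the trivial rule for that column.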


### 2.1.10 Hate-speech-CNERG/dehatebert-mono-english

In [13]:
# Choose model here
model = 'Hate-speech-CNERG/dehatebert-mono-english'

# Initialize the hate classifier
pipe = pipeline("text-classification", model=model, return_all_scores=True, device=device)

# Create a list to store the predicted scores for each text
df_normalized['temp_score'] = np.nan  # To store the hate score

test_response = pipe("I hate you")
# import pdb; pdb.set_trace()

# Process the texts and save the prediction scores
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying:"):
    text = row['text']
    
    # Skip invalid texts
    if not isinstance(text, str) or text.strip() == "":
        print(f"Invalid text at index {index}. Skipping row.")
        continue

    try:
        # Get predictions from pipe
        prediction = pipe(text)
        
        # Extract the score for 'hate' label
        for pred in prediction[0]:
            label = pred['label']
            score = pred['score']
            if label == 'HATE':  # CHECK THE LABEL HERE
                df_normalized.at[index, 'temp_score'] = score
                break

    except Exception as e:
        print(f"Error processing hate speech at index {index}: {e}")

# Function to calculate F1 score for different thresholds and true label columns
def calculate_f1_for_threshold(df, threshold, true_labels):
    # Predict True (hateful) when the score is at or above the threshold
    predicted_labels = df['temp_score'] >= threshold
    return f1_score(true_labels, predicted_labels)

# List of columns to compare against
label_columns = ['toxic', 'hateful', 'combined']

# Dictionary to store the best thresholds and F1 scores for each column
best_results = {}

# Iterate over each label column ('toxic', 'hateful', 'combined')
for label_column in label_columns:
    true_labels = df_normalized[label_column]
    
    best_threshold = 0
    best_f1 = 0
    
    # Search for the best threshold by calculating F1 score for different thresholds
    thresholds = np.linspace(0, 1, 101)  # Try thresholds between 0 and 1 in 0.01 increments
    
    for threshold in thresholds:
        f1 = calculate_f1_for_threshold(df_normalized, threshold, true_labels)
        if f1 > best_f1:
            best_f1 = f1
            best_threshold = threshold
    
    # Store the results for this label column
    best_results[label_column] = {'Best Threshold': best_threshold, 'Best F1 Score': best_f1}

    print(f"Best Threshold for {label_column}: {best_threshold}, Best F1 Score: {best_f1}")

# Apply the best threshold for each label column to label texts as hateful or not
for label_column in label_columns:
    best_threshold = best_results[label_column]['Best Threshold']
    df_normalized[f'temp_{label_column}'] = df_normalized['temp_score'] >= best_threshold

# Print the final counts of hateful and non-hateful texts for each label column
for label_column in label_columns:
    print(f"\nCounts for {label_column}:")
    print(df_normalized[f'temp_{label_column}'].value_counts())

del pipe


Classifying:: 100%|██████████| 300/300 [00:05<00:00, 55.78it/s]


Best Threshold for toxic: 0.0, Best F1 Score: 0.5436893203883495
Best Threshold for hateful: 0.08, Best F1 Score: 0.47619047619047616
Best Threshold for combined: 0.0, Best F1 Score: 0.6486486486486487

Counts for toxic:
temp_toxic
True    300
Name: count, dtype: int64

Counts for hateful:
temp_hateful
False    206
True      94
Name: count, dtype: int64

Counts for combined:
temp_combined
True    300
Name: count, dtype: int64
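
F1 alone hides how a low threshold such as 0.08 trades precision for recall. A minimal sketch that reports all three at a given threshold, written in plain Python with made-up scores and labels. In the notebook the same numbers could come from `precision_score` and `recall_score`, which are already imported.

```python
import math

def prf_at_threshold(scores, labels, threshold):
    """Precision, recall and F1 when score >= threshold is predicted positive.
    NaN scores are counted as negative predictions."""
    preds = [(not math.isnan(s)) and s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Made-up example data, not taken from the dataset.
scores = [0.02, 0.10, 0.50, 0.90, 0.05, 0.30]
labels = [False, True, False, True, False, True]
precision, recall, f1 = prf_at_threshold(scores, labels, 0.08)
```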


### 2.1.11 cardiffnlp/twitter-roberta-base-hate

In [14]:
# Choose model here
model = 'cardiffnlp/twitter-roberta-base-hate'

# Initialize the hate classifier
pipe = pipeline("text-classification", model=model, return_all_scores=True, device=device)

# Create a list to store the predicted scores for each text
df_normalized['temp_score'] = np.nan  # To store the hate score

test_response = pipe("I hate you")
# import pdb; pdb.set_trace()

# Process the texts and save the prediction scores
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying:"):
    text = row['text']
    
    # Skip invalid texts
    if not isinstance(text, str) or text.strip() == "":
        print(f"Invalid text at index {index}. Skipping row.")
        continue

    try:
        # Get predictions from pipe
        prediction = pipe(text)
        
        # Extract the score for 'hate' label
        for pred in prediction[0]:
            label = pred['label']
            score = pred['score']
            if label == 'hate':  # CHECK THE LABEL HERE
                df_normalized.at[index, 'temp_score'] = score
                break

    except Exception as e:
        print(f"Error processing hate speech at index {index}: {e}")

# Function to calculate F1 score for different thresholds and true label columns
def calculate_f1_for_threshold(df, threshold, true_labels):
    # Predict True (hateful) when the score is at or above the threshold
    predicted_labels = df['temp_score'] >= threshold
    return f1_score(true_labels, predicted_labels)

# List of columns to compare against
label_columns = ['toxic', 'hateful', 'combined']

# Dictionary to store the best thresholds and F1 scores for each column
best_results = {}

# Iterate over each label column ('toxic', 'hateful', 'combined')
for label_column in label_columns:
    true_labels = df_normalized[label_column]
    
    best_threshold = 0
    best_f1 = 0
    
    # Search for the best threshold by calculating F1 score for different thresholds
    thresholds = np.linspace(0, 1, 101)  # Try thresholds between 0 and 1 in 0.01 increments
    
    for threshold in thresholds:
        f1 = calculate_f1_for_threshold(df_normalized, threshold, true_labels)
        if f1 > best_f1:
            best_f1 = f1
            best_threshold = threshold
    
    # Store the results for this label column
    best_results[label_column] = {'Best Threshold': best_threshold, 'Best F1 Score': best_f1}

    print(f"Best Threshold for {label_column}: {best_threshold}, Best F1 Score: {best_f1}")

# Apply the best threshold for each label column to label texts as hateful or not
for label_column in label_columns:
    best_threshold = best_results[label_column]['Best Threshold']
    df_normalized[f'temp_{label_column}'] = df_normalized['temp_score'] >= best_threshold

# Print the final counts of hateful and non-hateful texts for each label column
for label_column in label_columns:
    print(f"\nCounts for {label_column}:")
    print(df_normalized[f'temp_{label_column}'].value_counts())

del pipe


Classifying::  43%|████▎     | 129/300 [00:02<00:03, 56.72it/s]

Error processing hate speech at index 121: The expanded size of the tensor (532) must match the existing size (514) at non-singleton dimension 1.  Target sizes: [1, 532].  Tensor sizes: [1, 514]


Classifying:: 100%|██████████| 300/300 [00:05<00:00, 53.97it/s]


Best Threshold for toxic: 0.0, Best F1 Score: 0.5401459854014599
Best Threshold for hateful: 0.04, Best F1 Score: 0.4230769230769231
Best Threshold for combined: 0.0, Best F1 Score: 0.6455981941309256

Counts for toxic:
temp_toxic
True     299
False      1
Name: count, dtype: int64

Counts for hateful:
temp_hateful
True     186
False    114
Name: count, dtype: int64

Counts for combined:
temp_combined
True     299
False      1
Name: count, dtype: int64
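
Row 121 raised an exception, so its `temp_score` stays `NaN`. Any comparison with `NaN` evaluates to `False`, which would explain why one row is predicted `False` even at threshold 0.0. A small sketch of that behaviour and of counting unscored rows explicitly, using plain Python floats as stand-ins for the column values:

```python
import math

scores = [0.80, float("nan"), 0.01]  # stand-in values; the NaN marks a failed row

# NaN never satisfies >= or <, regardless of the threshold.
at_zero = [s >= 0.0 for s in scores]
below_one = [s < 1.0 for s in scores]

# Counting unscored rows makes the silent "False" visible.
n_unscored = sum(math.isnan(s) for s in scores)
scored = [s for s in scores if not math.isnan(s)]
```

On the DataFrame itself, `df_normalized['temp_score'].isna().sum()` gives the same count.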


### 2.1.12 Hate-speech-CNERG/bert-base-uncased-hatexplain

In [15]:
# Choose model here
model = 'Hate-speech-CNERG/bert-base-uncased-hatexplain'

# Initialize the hate classifier
pipe = pipeline("text-classification", model=model, return_all_scores=True, device=device)

# Create a list to store the predicted scores for each text
df_normalized['temp_score'] = np.nan  # To store the hate score

test_response = pipe("I hate you")
# import pdb; pdb.set_trace()

# Process the texts and save the prediction scores
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying:"):
    text = row['text']
    
    # Skip invalid texts
    if not isinstance(text, str) or text.strip() == "":
        print(f"Invalid text at index {index}. Skipping row.")
        continue

    try:
        # Get predictions from pipe
        prediction = pipe(text)
        
        # Extract the score for 'hate' label
        for pred in prediction[0]:
            label = pred['label']
            score = pred['score']
            if label == 'hate speech':  # CHECK THE LABEL HERE
                df_normalized.at[index, 'temp_score'] = score
                break

    except Exception as e:
        print(f"Error processing hate speech at index {index}: {e}")

# Function to calculate F1 score for different thresholds and true label columns
def calculate_f1_for_threshold(df, threshold, true_labels):
    # Predict True (hateful) when the score is at or above the threshold
    predicted_labels = df['temp_score'] >= threshold
    return f1_score(true_labels, predicted_labels)

# List of columns to compare against
label_columns = ['toxic', 'hateful', 'combined']

# Dictionary to store the best thresholds and F1 scores for each column
best_results = {}

# Iterate over each label column ('toxic', 'hateful', 'combined')
for label_column in label_columns:
    true_labels = df_normalized[label_column]
    
    best_threshold = 0
    best_f1 = 0
    
    # Search for the best threshold by calculating F1 score for different thresholds
    thresholds = np.linspace(0, 1, 101)  # Try thresholds between 0 and 1 in 0.01 increments
    
    for threshold in thresholds:
        f1 = calculate_f1_for_threshold(df_normalized, threshold, true_labels)
        if f1 > best_f1:
            best_f1 = f1
            best_threshold = threshold
    
    # Store the results for this label column
    best_results[label_column] = {'Best Threshold': best_threshold, 'Best F1 Score': best_f1}

    print(f"Best Threshold for {label_column}: {best_threshold}, Best F1 Score: {best_f1}")

# Apply the best threshold for each label column to label texts as hateful or not
for label_column in label_columns:
    best_threshold = best_results[label_column]['Best Threshold']
    df_normalized[f'temp_{label_column}'] = df_normalized['temp_score'] >= best_threshold

# Print the final counts of hateful and non-hateful texts for each label column
for label_column in label_columns:
    print(f"\nCounts for {label_column}:")
    print(df_normalized[f'temp_{label_column}'].value_counts())

del pipe


Classifying:: 100%|██████████| 300/300 [00:07<00:00, 42.52it/s]


Best Threshold for toxic: 0.04, Best F1 Score: 0.5714285714285714
Best Threshold for hateful: 0.04, Best F1 Score: 0.4444444444444444
Best Threshold for combined: 0.04, Best F1 Score: 0.6588235294117647

Counts for toxic:
temp_toxic
True     196
False    104
Name: count, dtype: int64

Counts for hateful:
temp_hateful
True     196
False    104
Name: count, dtype: int64

Counts for combined:
temp_combined
True     196
False    104
Name: count, dtype: int64


### 2.1.13 mrm8488/distilroberta-finetuned-tweets-hate-speech

In [16]:
# Choose model here
model = 'mrm8488/distilroberta-finetuned-tweets-hate-speech'

# Initialize the hate classifier
pipe = pipeline("text-classification", model=model, return_all_scores=True, device=device)

# Create a list to store the predicted scores for each text
df_normalized['temp_score'] = np.nan  # To store the hate score

test_response = pipe("I hate you")
# import pdb; pdb.set_trace()

# Process the texts and save the prediction scores
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying:"):
    text = row['text']
    
    # Skip invalid texts
    if not isinstance(text, str) or text.strip() == "":
        print(f"Invalid text at index {index}. Skipping row.")
        continue

    try:
        # Get predictions from pipe
        prediction = pipe(text)
        
        # Extract the score for 'hate' label
        for pred in prediction[0]:
            label = pred['label']
            score = pred['score']
            if label == 'LABEL_0':  # CHECK THE LABEL HERE
                df_normalized.at[index, 'temp_score'] = score
                break

    except Exception as e:
        print(f"Error processing hate speech at index {index}: {e}")

# Function to calculate F1 score for different thresholds and true label columns
def calculate_f1_for_threshold(df, threshold, true_labels):
    # Predict True (hateful) when the score is at or above the threshold
    predicted_labels = df['temp_score'] >= threshold
    return f1_score(true_labels, predicted_labels)

# List of columns to compare against
label_columns = ['toxic', 'hateful', 'combined']

# Dictionary to store the best thresholds and F1 scores for each column
best_results = {}

# Iterate over each label column ('toxic', 'hateful', 'combined')
for label_column in label_columns:
    true_labels = df_normalized[label_column]
    
    best_threshold = 0
    best_f1 = 0
    
    # Search for the best threshold by calculating F1 score for different thresholds
    thresholds = np.linspace(0, 1, 101)  # Try thresholds between 0 and 1 in 0.01 increments
    
    for threshold in thresholds:
        f1 = calculate_f1_for_threshold(df_normalized, threshold, true_labels)
        if f1 > best_f1:
            best_f1 = f1
            best_threshold = threshold
    
    # Store the results for this label column
    best_results[label_column] = {'Best Threshold': best_threshold, 'Best F1 Score': best_f1}

    print(f"Best Threshold for {label_column}: {best_threshold}, Best F1 Score: {best_f1}")

# Apply the best threshold for each label column to label texts as hateful or not
for label_column in label_columns:
    best_threshold = best_results[label_column]['Best Threshold']
    df_normalized[f'temp_{label_column}'] = df_normalized['temp_score'] >= best_threshold

# Print the final counts of hateful and non-hateful texts for each label column
for label_column in label_columns:
    print(f"\nCounts for {label_column}:")
    print(df_normalized[f'temp_{label_column}'].value_counts())

del pipe


Classifying::  39%|███▊      | 116/300 [00:01<00:02, 91.58it/s]Token indices sequence length is longer than the specified maximum sequence length for this model (532 > 512). Running this sequence through the model will result in indexing errors
Classifying::  45%|████▌     | 136/300 [00:01<00:01, 92.59it/s]

Error processing hate speech at index 121: The expanded size of the tensor (532) must match the existing size (514) at non-singleton dimension 1.  Target sizes: [1, 532].  Tensor sizes: [1, 514]


Classifying:: 100%|██████████| 300/300 [00:03<00:00, 87.22it/s]


Best Threshold for toxic: 0.04, Best F1 Score: 0.5561224489795918
Best Threshold for hateful: 0.0, Best F1 Score: 0.3914209115281501
Best Threshold for combined: 0.04, Best F1 Score: 0.6462264150943396

Counts for toxic:
temp_toxic
True     280
False     20
Name: count, dtype: int64

Counts for hateful:
temp_hateful
True     299
False      1
Name: count, dtype: int64

Counts for combined:
temp_combined
True     280
False     20
Name: count, dtype: int64
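
The cells for 2.1.7 through 2.1.13 differ only in the model name, the label whose score is read, and the direction of the comparison. A sketch of holding those settings in one table, with values copied from the cells above. `MODEL_SETTINGS` and `flag` are hypothetical names, and whether each generic `LABEL_n` really denotes the hateful class still needs to be checked against the model card.

```python
# (label to read, comparison direction) per model, copied from the cells above.
MODEL_SETTINGS = {
    "badmatr11x/distilroberta-base-offensive-hateful-speech-text-multiclassification": ("NEITHER", "below"),
    "citizenlab/distilbert-base-multilingual-cased-toxicity": ("toxic", "at_or_above"),
    "GANgstersDev/singlish-hate-offensive-finetuned-model-v2.0.1": ("LABEL_2", "at_or_above"),
    "Hate-speech-CNERG/dehatebert-mono-english": ("HATE", "at_or_above"),
    "cardiffnlp/twitter-roberta-base-hate": ("hate", "at_or_above"),
    "Hate-speech-CNERG/bert-base-uncased-hatexplain": ("hate speech", "at_or_above"),
    "mrm8488/distilroberta-finetuned-tweets-hate-speech": ("LABEL_0", "at_or_above"),
}


def flag(score, threshold, direction):
    """Apply the per-model comparison: 'below' for a neutral-class score,
    'at_or_above' for a hate or toxicity score."""
    if direction == "below":
        return score < threshold
    return score >= threshold


label, direction = MODEL_SETTINGS["cardiffnlp/twitter-roberta-base-hate"]
is_flagged = flag(0.2, 0.04, direction)
```

Keeping the direction next to the label makes it harder to copy a cell and forget to flip `<` and `>=`.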


## 2.2 Testing Llama models

### 2.2.1 meta-llama/Llama-3.2-1B-Instruct

In [17]:
pipe = pipeline("text-generation", model='meta-llama/Llama-3.2-1B-Instruct', device_map='auto')

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


In [18]:
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying toxicity and hatefulness"):
    text = row['text']
    messages = [
        {"role": "system", "content": "You are a moderator on the subreddit r/Singapore. You are reviewing a user's comment to determine if it is toxic or hateful. Please respond with only False (not toxic or hateful) or True (very toxic or hateful) for the following comment."},
        {"role": "user", "content": text},
    ]
    response = pipe(messages, max_new_tokens=50, do_sample=False, truncation=True)
    df_normalized.at[index, 'Llama_combined'] = (response[0]['generated_text'][2]['content'] == 'True')

df_normalized['Llama_combined'] = df_normalized['Llama_combined'].astype(bool)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)

In [19]:
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying toxicity and hatefulness"):
    text = row['text']
    messages = [
        {"role": "system", "content": "You are a moderator on the subreddit r/Singapore. You are reviewing a user's comment to determine if it is toxic. Please respond with only False (not toxic) or True (very toxic) for the following comment."},
        {"role": "user", "content": text},
    ]
    response = pipe(messages, max_new_tokens=50, do_sample=False, truncation=True)
    df_normalized.at[index, 'Llama_toxic'] = (response[0]['generated_text'][2]['content'] == 'True')

df_normalized['Llama_toxic'] = df_normalized['Llama_toxic'].astype(bool)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.

In [20]:
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying toxicity and hatefulness"):
    text = row['text']
    messages = [
        {"role": "system", "content": "You are a moderator on the subreddit r/Singapore. You are reviewing a user's comment to determine if it is hateful. Please respond with only False (not hateful) or True (very hateful) for the following comment."},
        {"role": "user", "content": text},
    ]
    response = pipe(messages, max_new_tokens=50, do_sample=False, truncation=True)
    df_normalized.at[index, 'Llama_hate'] = (response[0]['generated_text'][2]['content'] == 'True')

df_normalized['Llama_hate'] = df_normalized['Llama_hate'].astype(bool)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.

In [21]:
print(df_normalized['Llama_combined'].value_counts())
# print(df_normalized['combined'].value_counts())
f1 = f1_score(df_normalized['combined'], df_normalized['Llama_combined'])
print(f"F1 Score: {f1}")

print(df_normalized['Llama_toxic'].value_counts())
f1 = f1_score(df_normalized['toxic'], df_normalized['Llama_toxic'])
print(f"F1 Score: {f1}")

print(df_normalized['Llama_hate'].value_counts())
f1 = f1_score(df_normalized['hateful'], df_normalized['Llama_hate'])
print(f"F1 Score: {f1}")

Llama_combined
False    186
True     114
Name: count, dtype: int64
F1 Score: 0.3643410852713178
Llama_toxic
False    271
True      29
Name: count, dtype: int64
F1 Score: 0.09929078014184398
Llama_hate
False    251
True      49
Name: count, dtype: int64
F1 Score: 0.21138211382113822
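
F1 alone does not show whether a low score comes from missed positives or from false alarms. The sketch below spells out how precision, recall, and F1 are derived from the confusion counts. The helper name `precision_recall_f1` and the toy labels are assumptions made for illustration; on the real columns, `precision_score` and `recall_score` from `sklearn.metrics` (imported at the top of the notebook) compute the same quantities.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, f1) for two equal-length sequences of booleans."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Confusion counts for the positive (True) class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    # Each ratio falls back to 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1


# Made-up labels for illustration only; not taken from labeled_data_2.csv.
toy_true = [True, True, True, True, False, False, False, False]
toy_pred = [True, False, False, False, False, False, False, True]

precision, recall, f1 = precision_recall_f1(toy_true, toy_pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: precision=0.50 recall=0.25 f1=0.33
```

In this toy case the model predicts True rarely, so recall is the weaker component and pulls F1 down.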


In [22]:
del pipe
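
`del pipe` removes the notebook's reference to the pipeline; the memory is reclaimed only once no other references remain. A minimal sketch of an optional follow-up step is shown below, assuming the goal is to free memory before loading the next model in the same kernel. The helper name `free_accelerator_memory` is hypothetical; `gc.collect()` and `torch.cuda.empty_cache()` are standard calls, and whether they change anything depends on what else still references the model.

```python
import gc

try:
    import torch
except ImportError:  # keeps the sketch importable where PyTorch is not installed
    torch = None


def free_accelerator_memory() -> int:
    """Run garbage collection, then release PyTorch's unused cached GPU memory.

    Returns the number of unreachable objects found by the garbage collector.
    """
    collected = gc.collect()
    if torch is not None and torch.cuda.is_available():
        # Returns cached, currently unused blocks to the driver.
        torch.cuda.empty_cache()
    return collected


# Intended to be called right after `del pipe`.
free_accelerator_memory()
```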

### 2.2.2 meta-llama/Llama-3.2-3B-Instruct

In [4]:
pipe = pipeline("text-generation", model='meta-llama/Llama-3.2-3B-Instruct', device_map='auto')

Loading checkpoint shards: 100%|██████████| 2/2 [00:04<00:00,  2.33s/it]
Some parameters are on the meta device because they were offloaded to the cpu.


In [5]:
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying toxicity and hatefulness"):
    text = row['text']
    messages = [
        {"role": "system", "content": "You are a moderator on the subreddit r/Singapore. You are reviewing a user's comment to determine if it is toxic or hateful. Please respond with only False (not toxic or hateful) or True (very toxic or hateful) for the following comment."},
        {"role": "user", "content": text},
    ]
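    # Greedy decoding (do_sample=False) makes the reply deterministic for a given prompt.
    response = pipe(messages, max_new_tokens=50, do_sample=False, truncation=True)
    # generated_text holds the whole chat [system, user, assistant]; index 2 is the model's reply.
    # Exact match: any reply other than the bare string 'True' is stored as False.
    df_normalized.at[index, 'Llama_combined'] = (response[0]['generated_text'][2]['content'] == 'True')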

df_normalized['Llama_combined'] = df_normalized['Llama_combined'].astype(bool)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)
Classifying toxicity and hatefulness:   2%|▏         | 5/300 [00:14<13:37,  2.77s/it]

In [6]:
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying toxicity and hatefulness"):
    text = row['text']
    messages = [
        {"role": "system", "content": "You are a moderator on the subreddit r/Singapore. You are reviewing a user's comment to determine if it is toxic. Please respond with only False (not toxic) or True (very toxic) for the following comment."},
        {"role": "user", "content": text},
    ]
    response = pipe(messages, max_new_tokens=50, do_sample=False, truncation=True)
    df_normalized.at[index, 'Llama_toxic'] = (response[0]['generated_text'][2]['content'] == 'True')

df_normalized['Llama_toxic'] = df_normalized['Llama_toxic'].astype(bool)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Classifying toxicity and hatefulness:   2%|▏         | 6/300 [00:15<12:41,  2.59s/it]

In [7]:
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying toxicity and hatefulness"):
    text = row['text']
    messages = [
        {"role": "system", "content": "You are a moderator on the subreddit r/Singapore. You are reviewing a user's comment to determine if it is hateful. Please respond with only False (not hateful) or True (very hateful) for the following comment."},
        {"role": "user", "content": text},
    ]
    response = pipe(messages, max_new_tokens=50, do_sample=False, truncation=True)
    df_normalized.at[index, 'Llama_hate'] = (response[0]['generated_text'][2]['content'] == 'True')

df_normalized['Llama_hate'] = df_normalized['Llama_hate'].astype(bool)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
Classifying toxicity and hatefulness:   2%|▏         | 6/300 [00:14<12:11,  2.49s/it]

In [8]:
print(df_normalized['Llama_combined'].value_counts())
# print(df_normalized['combined'].value_counts())
f1 = f1_score(df_normalized['combined'], df_normalized['Llama_combined'])
print(f"F1 Score: {f1}")

print(df_normalized['Llama_toxic'].value_counts())
f1 = f1_score(df_normalized['toxic'], df_normalized['Llama_toxic'])
print(f"F1 Score: {f1}")

print(df_normalized['Llama_hate'].value_counts())
f1 = f1_score(df_normalized['hateful'], df_normalized['Llama_hate'])
print(f"F1 Score: {f1}")

Llama_combined
False    212
True      88
Name: count, dtype: int64
F1 Score: 0.5344827586206896
Llama_toxic
False    197
True     103
Name: count, dtype: int64
F1 Score: 0.4930232558139535
Llama_hate
False    251
True      49
Name: count, dtype: int64
F1 Score: 0.3902439024390244


In [9]:
del pipe

### 2.2.3 aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct

In [4]:
pipe = pipeline("text-generation", model='aisingapore/llama3-8b-cpt-sea-lionv2.1-instruct', device_map='auto')

Loading checkpoint shards: 100%|██████████| 4/4 [00:18<00:00,  4.73s/it]
Some parameters are on the meta device because they were offloaded to the disk and cpu.


In [5]:
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying toxicity and hatefulness"):
    text = row['text']
    messages = [
        {"role": "system", "content": "You are a moderator on the subreddit r/Singapore. You are reviewing a user's comment to determine if it is toxic or hateful. Please respond with only False (not toxic or hateful) or True (very toxic or hateful) for the following comment."},
        {"role": "user", "content": text},
    ]
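    # truncation=True has no effect for this model: the warning in this cell's output
    # reports that no maximum length is defined, so inputs are passed through untruncated.
    response = pipe(messages, max_new_tokens=50, do_sample=False, truncation=True)
    # generated_text holds the whole chat [system, user, assistant]; index 2 is the model's reply.
    # Exact match: any reply other than the bare string 'True' is stored as False.
    df_normalized.at[index, 'Llama_combined'] = (response[0]['generated_text'][2]['content'] == 'True')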

df_normalized['Llama_combined'] = df_normalized['Llama_combined'].astype(bool)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Starting from v4.46, the `logits` model output will have the same type as the model (except at train time, where it will always be FP32)
You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset
Classifying toxicity and hatefulness: 100%|██████████| 300/300 [53:04<00:00, 10.61s/it]
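
The classification loops compare the model's reply to the exact string `'True'`, so a reply such as `True.` or `true` is counted as False. The sketch below shows one possible, more tolerant parser. The name `parse_bool_reply` and the choice to return `None` for unparseable replies are assumptions, not part of the original notebook, and the F1 scores reported here were produced with the exact-match comparison, so swapping this in could change them.

```python
import re
from typing import Optional

# Matches a leading "true" or "false" as a whole word, ignoring case
# and any leading whitespace or punctuation.
_BOOL_PATTERN = re.compile(r"\W*(true|false)\b", re.IGNORECASE)


def parse_bool_reply(reply: str) -> Optional[bool]:
    """Map a free-text model reply to True, False, or None if neither is found."""
    match = _BOOL_PATTERN.match(reply)
    if match is None:
        return None
    return match.group(1).lower() == "true"


print(parse_bool_reply("True"))           # True
print(parse_bool_reply(" true.\n"))       # True
print(parse_bool_reply("False"))          # False
print(parse_bool_reply("I cannot say."))  # None
```

Returning `None` instead of False keeps refusals and off-format replies distinguishable from genuine negative predictions, so they can be counted separately before scoring.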


In [6]:
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying toxicity and hatefulness"):
    text = row['text']
    messages = [
        {"role": "system", "content": "You are a moderator on the subreddit r/Singapore. You are reviewing a user's comment to determine if it is toxic. Please respond with only False (not toxic) or True (very toxic) for the following comment."},
        {"role": "user", "content": text},
    ]
    response = pipe(messages, max_new_tokens=50, do_sample=False, truncation=True)
    df_normalized.at[index, 'Llama_toxic'] = (response[0]['generated_text'][2]['content'] == 'True')

df_normalized['Llama_toxic'] = df_normalized['Llama_toxic'].astype(bool)

Classifying toxicity and hatefulness: 100%|██████████| 300/300 [53:01<00:00, 10.61s/it]
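
The three prompts used for each model differ only in the label being asked about ("toxic or hateful", "toxic", "hateful"). A minimal sketch of building them from one template follows. `PROMPT_TEMPLATE`, `TASKS`, and `build_messages` are hypothetical names introduced here; the prompt wording and the column names are copied from the cells in this notebook.

```python
PROMPT_TEMPLATE = (
    "You are a moderator on the subreddit r/Singapore. "
    "You are reviewing a user's comment to determine if it is {label}. "
    "Please respond with only False (not {label}) or True (very {label}) "
    "for the following comment."
)

# Output column in df_normalized -> label phrase inserted into the prompt.
TASKS = {
    "Llama_combined": "toxic or hateful",
    "Llama_toxic": "toxic",
    "Llama_hate": "hateful",
}


def build_messages(label: str, text: str) -> list:
    """Return the chat messages for one comment and one label phrase."""
    return [
        {"role": "system", "content": PROMPT_TEMPLATE.format(label=label)},
        {"role": "user", "content": text},
    ]


# Placeholder comment text, for illustration only.
for column, label in TASKS.items():
    messages = build_messages(label, "example comment")
    print(column, "->", messages[0]["content"])
```

With a helper like this, the three per-model loops could be collapsed into a single loop over `TASKS`, which keeps the prompt wording consistent across tasks and models.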


In [7]:
for index, row in tqdm(df_normalized.iterrows(), total=df_normalized.shape[0], desc="Classifying toxicity and hatefulness"):
    text = row['text']
    messages = [
        {"role": "system", "content": "You are a moderator on the subreddit r/Singapore. You are reviewing a user's comment to determine if it is hateful. Please respond with only False (not hateful) or True (very hateful) for the following comment."},
        {"role": "user", "content": text},
    ]
    response = pipe(messages, max_new_tokens=50, do_sample=False, truncation=True)
    df_normalized.at[index, 'Llama_hate'] = (response[0]['generated_text'][2]['content'] == 'True')

df_normalized['Llama_hate'] = df_normalized['Llama_hate'].astype(bool)

Classifying toxicity and hatefulness: 100%|██████████| 300/300 [50:46<00:00, 10.15s/it]


In [8]:
print(df_normalized['Llama_combined'].value_counts())
# print(df_normalized['combined'].value_counts())
f1 = f1_score(df_normalized['combined'], df_normalized['Llama_combined'])
print(f"F1 Score: {f1}")

print(df_normalized['Llama_toxic'].value_counts())
f1 = f1_score(df_normalized['toxic'], df_normalized['Llama_toxic'])
print(f"F1 Score: {f1}")

print(df_normalized['Llama_hate'].value_counts())
f1 = f1_score(df_normalized['hateful'], df_normalized['Llama_hate'])
print(f"F1 Score: {f1}")

Llama_combined
False    220
True      80
Name: count, dtype: int64
F1 Score: 0.5178571428571429
Llama_toxic
False    202
True      98
Name: count, dtype: int64
F1 Score: 0.5142857142857142
Llama_hate
False    257
True      43
Name: count, dtype: int64
F1 Score: 0.3247863247863248


In [9]:
del pipe