In [123]:
import pandas as pd
import torch
import torch.nn as nn
from transformers import pipeline
from nltk.corpus import stopwords
import matplotlib.pyplot as plt
import os
from dotenv import load_dotenv
import time
from huggingface_hub import login
import google.generativeai as genai
import os
from tqdm import tqdm
from datasets import Dataset
import json
import openai
from openai import OpenAI
import anthropic
import re
from sklearn.metrics import f1_score
from googleapiclient import discovery

if torch.cuda.is_available():
    device = torch.device("cuda")
    device_name = torch.cuda.get_device_name(torch.cuda.current_device())
    print(f'Device in use: {device_name}')
else:
    device = torch.device("cpu")
    print('Device in use: CPU')

load_dotenv()

Device in use: CPU


True

# Reading in data

## Dataset Description

This dataset contains information on Reddit posts, with columns providing details about each post's metadata and toxicity/hatefulness ratings. It includes a set of 300 expertly-labeled comments that serve as the "ground truth" for toxicity and hatefulness.

### Columns

- **text**: The content of the Reddit post or comment.
- **timestamp**: The timestamp indicating when the post was created.
- **username**: The username of the Reddit user who posted the content.
- **link**: The URL link to the post.
- **link_id**: A unique identifier for the post link.
- **parent_id**: The identifier of the parent post or comment, indicating a reply relationship.
- **id**: A unique identifier for each post or comment.
- **subreddit_id**: The identifier of the subreddit where the post was made.
- **moderation**: Information regarding the moderation status of the post.
- **toxic**: A binary label indicating whether the post is classified as toxic (TRUE for toxic, FALSE for non-toxic).
- **hateful**: A binary label indicating whether the post is classified as hateful (TRUE for hateful, FALSE for non-hateful).

### Ground Truth Labeling

For this dataset: df_expert, a subset of 300 specifically chosen comments has been expertly labeled for toxicity and hatefulness. These labels (TRUE/FALSE) were manually assigned and represent the ground truth for evaluating toxicity and hatefulness. This subset allows for accurate model training and validation, ensuring that the classification aligns closely with expert judgment.

This dataset is intended for analyzing the prevalence of toxic and hateful content on Reddit by subreddit and user engagement. The `toxic` and `hateful` columns provide essential insights into problematic content, which may inform moderation strategies.


In [None]:
df_expert = pd.read_csv('../data/labeled_data_2.csv')

In [128]:
df_expert['hate_toxic'] = df_expert[['toxic', 'hateful']].any(axis=1)
print(df_expert.columns)
print(df_expert['hate_toxic'].value_counts())

Index(['text', 'timestamp', 'username', 'link', 'link_id', 'parent_id', 'id',
       'subreddit_id', 'moderation', 'toxic', 'hateful', 'BERT_toxic',
       'BERT_hateful', 'BERT_hate_toxic', 'hate_toxic', 'claude_hate',
       'claude_toxic', 'claude_hate_toxic', 'gpt_hateful', 'gpt_toxic',
       'gpt_hate_toxic', 'toxicity_score', 'severe_toxicity_score',
       'insult_score', 'identity_attack_score', 'predicted_toxic',
       'predicted_hate', 'predicted_hate_toxic'],
      dtype='object')
hate_toxic
False    156
True     144
Name: count, dtype: int64


# Using API-Based Models to label our data

## Claude 3 Haiku

### Code Rationale

This code is designed to classify text data based on two attributes: **toxicity** and **hatefulness**, using the Claude language model API. Specifically, the code assesses each text entry in `df_expert` according to culturally relevant definitions of toxicity and hatefulness in the Singaporean context. By incorporating **context prompting**, the model is guided to interpret language within the nuances of Singaporean culture, including slang and sensitive topics. The results are used to create three columns in the DataFrame:

1. **`claude_hate`**: Indicates if the text is classified as hateful.
2. **`claude_toxic`**: Indicates if the text is classified as toxic.
3. **`claude_hate_toxic`**: Combines the `claude_hate` and `claude_toxic` columns, indicating `True` if the text meets either classification, otherwise `False`.

### Step-by-Step Explanation

1. **API Initialization**:
   - The code initializes an API client for the Claude model by providing an API key stored in an environment variable (`CLAUDE_API_KEY`).

2. **Function Definition (`get_toxicity_evaluation_claude`)**:
   - The function `get_toxicity_evaluation_claude` takes a single `text` input and prompts the Claude API to evaluate it for both toxicity and hatefulness.
   - A custom prompt (`message_content`) is crafted to provide **context prompting** by specifying Singaporean cultural nuances and definitions of **toxic** and **hateful** language. This helps the model understand local slang, such as **Singlish** (e.g., terms like “ceca” or “377a”) and cultural sensitivities relevant to Singapore.

3. **API Response Handling**:
   - The Claude API response is parsed to determine if the text is toxic or hateful.
   - Using regular expressions (`re.search`), the function extracts `TRUE` or `FALSE` labels for each category from the API's response.
   - The function then returns `True` or `False` values for toxicity and hatefulness in a pandas Series.

4. **Applying the Function to Data**:
   - The `progress_apply` method is used to apply `get_toxicity_evaluation_claude` across all text entries in `df_expert`, with a progress bar displayed via `tqdm`.
   - The returned values are stored in two new columns: `claude_hate` and `claude_toxic`.

5. **Combining Toxicity and Hatefulness Classification**:
   - The code creates a new column, `claude_hate_toxic`, which is set to `True` if either `claude_hate` or `claude_toxic` is `True`.
   - This column provides a unified classification that flags text as either toxic or hateful, enhancing the analysis with a single indicator.

6. **Output**:
   - Finally, the code prints the first few rows of the DataFrame, showing `text`, `claude_hate`, `claude_toxic`, and `claude_hate_toxic`, to verify the classifications.

### How Context Prompting is Applied

The prompt leverages **context prompting** by setting expectations around cultural and linguistic context for the Claude model. This approach includes:

- **Guiding the model to understand Singaporean cultural nuances**: By explicitly mentioning Singlish and relevant slang, the prompt helps the model adapt its interpretation to a Singaporean audience, understanding unique terms and references that may be unfamiliar or misinterpreted otherwise.
- **Defining "toxic" and "hateful" language with culturally relevant examples**: The examples provided (e.g., “ceca” and “377a”) are specific to Singaporean culture, helping the model recognize potentially offensive language that is unique to this setting.
  
This context-driven approach improves classification accuracy by aligning the model’s interpretation with local linguistic and cultural nuances, making it highly effective for analyzing toxicity and hatefulness within Singaporean discourse. The resulting classifications can then be used for further analysis or to inform content moderation and policy interventions.


In [111]:
client = anthropic.Anthropic(api_key=os.getenv("CLAUDE_API_KEY"),)

In [112]:
import re
import anthropic
from tqdm import tqdm
import pandas as pd
import os

client = anthropic.Anthropic(
    api_key=os.getenv("CLAUDE_API_KEY")
)

def get_toxicity_evaluation_claude(text):
    message_content = (
        f"Please evaluate the following text for hatefulness and toxicity in the context of Singapore’s culture, "
        f"where Singlish and Singaporean slang may be present. Interpret the language based on these definitions and examples:\n\n"
        f"- **Toxic**: Language that is hostile, aggressive, or disrespectful, likely to cause offense or discomfort. "
        f"Examples include derogatory comments, vulgar language, or personal attacks, such as 'ceca' used in an offensive context, "
        f"or '377a' when implying derogatory meaning.\n"
        f"- **Hateful**: Language that shows prejudice or hostility toward a specific group, race, nationality, or identity. "
        f"This includes slurs, racist or xenophobic remarks, and targeted hate speech. Examples might include statements intended "
        f"to marginalize groups based on ethnicity, sexuality, or nationality.\n\n"
        f"Respond with TRUE or FALSE for each category based on the definitions above:\n"
        f"- **Toxic**: TRUE or FALSE\n"
        f"- **Hateful**: TRUE or FALSE\n\n"
        f"Text to evaluate: {text}"
    )

    response = client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=50,
        messages=[
            {"role": "user", "content": message_content}
        ]
    )
    response_text = response.content[0].text  
    toxic_match = re.search(r'Toxic:\s*(TRUE|FALSE)', response_text, re.IGNORECASE)
    hateful_match = re.search(r'Hateful:\s*(TRUE|FALSE)', response_text, re.IGNORECASE)
    toxic = toxic_match.group(1).upper() == "TRUE" if toxic_match else False
    hateful = hateful_match.group(1).upper() == "TRUE" if hateful_match else False
    
    return pd.Series([toxic, hateful])

tqdm.pandas()
df_expert[['claude_hate', 'claude_toxic']] = df_expert['text'].progress_apply(
    lambda x: pd.Series(get_toxicity_evaluation_claude(x))
)

# Create the claude_hate_toxic column based on claude_hate or claude_toxic being True
df_expert['claude_hate_toxic'] = df_expert[['claude_hate', 'claude_toxic']].any(axis=1)

print(df_expert[['text', 'claude_hate', 'claude_toxic', 'claude_hate_toxic']].head())


100%|██████████| 300/300 [05:19<00:00,  1.07s/it]

                                                text  claude_hate  \
0     Expensive eh now that Uglyfoods closed down :(        False   
1                How dare you.. wan go lim kopi ah??        False   
2  Yeah the governments can politick all they wan...        False   
3               Hijacks event, then complains. Wild.        False   
4  Hate to break it to you. But once someone accu...        False   

   claude_toxic  claude_hate_toxic  
0         False              False  
1         False              False  
2         False              False  
3         False              False  
4         False              False  





In [114]:
print(df_expert['claude_hate_toxic'].value_counts())

claude_hate_toxic
False    259
True      41
Name: count, dtype: int64


### Code Rationale

The code calculates the F1-score to evaluate the performance of the Claude model's classification of combined hatefulness and toxicity in text. The F1-score is a useful metric here, as it provides a balance between **precision** (how many of the predicted "hate_toxic" instances were actually correct) and **recall** (how many of the actual "hate_toxic" instances were detected by the model). This score helps assess the effectiveness of the model in identifying texts that meet either the hatefulness or toxicity criteria.

#### Explanation of Each Step

1. **Calculate the F1-Score**:
   - The `f1_score` function from `sklearn.metrics` compares the model's predictions (`claude_hate_toxic`) against the true labels (`hate_toxic`), which represents the ground truth.
   - `df_expert['hate_toxic']` is the column containing the true labels, indicating whether each text was deemed either hateful or toxic in the original dataset.
   - `df_expert['claude_hate_toxic']` is the predicted label from the Claude model, where `True` indicates the model classified the text as either hateful or toxic.

2. **Store the F1-Score in a Variable**:
   - `f1_score_claude_hate_toxic` stores the resulting F1-score, making it easy to access or print for further analysis.

3. **Output the F1-Score**:
   - `print(f1_score_claude_hate_toxic)` outputs the F1-score, allowing a quick assessment of the Claude model's performance in detecting hatefulness or toxicity.

### Purpose and Importance of the F1-Score

The F1-score provides insight into the model's effectiveness in a balanced manner, as it considers both:
- **Precision**: How accurately the model's "hate_toxic" predictions align with the true instances.
- **Recall**: How well the model captures all the true "hate_toxic" instances.

By calculating the F1-score, this code assesses how well the Claude model captures the intended classifications, helping gauge its suitability for use in content moderation, analysis, or policy interventions related to hatefulness and toxicity.


In [116]:
f1_score_claude_hate_toxic = f1_score(df_expert['hate_toxic'], df_expert['claude_hate_toxic'])
print(f1_score_claude_hate_toxic)

0.3567567567567568


## chatGPT 4o mini

### Code Rationale

This code uses OpenAI's GPT model to classify text data based on two attributes, **toxicity** and **hatefulness**, within the Singaporean cultural context. It evaluates each text entry in `df_expert`, leveraging the model to interpret language with Singaporean nuances, including **Singlish** and culturally specific expressions. The results are stored in three columns in the DataFrame:

1. **`gpt_hateful`**: Indicates if the text is classified as hateful by the GPT model.
2. **`gpt_toxic`**: Indicates if the text is classified as toxic by the GPT model.
3. **`gpt_hate_toxic`**: Combines `gpt_hateful` and `gpt_toxic`, showing `True` if the text meets either classification, otherwise `False`.

### Step-by-Step Explanation

1. **API Initialization**:
   - The OpenAI API key is set up using an environment variable (`OPENAI_API_KEY`), allowing secure access to the GPT model.

2. **Function Definition (`get_hate_toxic_evaluation`)**:
   - The function `get_hate_toxic_evaluation` takes a single `text` input and sends a request to the OpenAI GPT model to evaluate both toxicity and hatefulness.
   - A structured prompt is crafted to include:
     - **Context prompting**: It specifies the need to evaluate the text within Singapore’s socio-cultural context, including Singlish and local slang.
     - **Definitions**: Detailed definitions of "toxic" and "hateful" language are provided to guide the model's classification.
     - **Examples**: Culturally relevant examples (e.g., “ceca” and “377a”) help the model recognize locally specific offensive language.
     - **Response Format Request**: The prompt asks the model to respond with “TRUE” or “FALSE” for both toxicity and hatefulness, ensuring structured responses that can be parsed easily.

3. **Parsing the API Response**:
   - The function parses the model's response using regular expressions (`re.search`) to extract the "TRUE" or "FALSE" labels for each category (toxic and hateful).
   - If a match is found, the label is set to `True` or `False` accordingly; otherwise, it defaults to `False`.
   - The function returns the classification as a pandas Series containing the values for `hateful` and `toxic`.

4. **Applying the Function to Data**:
   - `progress_apply` is used to apply `get_hate_toxic_evaluation` across each text entry in `df_expert`, with `tqdm` providing a progress bar for real-time tracking.
   - The results are stored in two new columns, `gpt_hateful` and `gpt_toxic`.

5. **Combining Toxicity and Hatefulness Classification**:
   - The code then creates a new column, `gpt_hate_toxic`, which is `True` if either `gpt_hateful` or `gpt_toxic` is `True`. This provides a combined classification that flags any text deemed either hateful or toxic by the model.

6. **Output**:
   - Finally, the code prints the first few rows of the DataFrame, displaying `text`, `gpt_hateful`, `gpt_toxic`, and `gpt_hate_toxic` to verify the classifications.

### Purpose and Importance

This approach leverages OpenAI's GPT model to interpret nuanced language within a specific cultural framework. The classification results can be used for further analysis, content moderation, or policy-making related to identifying and managing toxicity and hatefulness within Singaporean online discourse. By combining `gpt_hateful` and `gpt_toxic` into `gpt_hate_toxic`, this method simplifies analysis, providing a single indicator for any harmful content detected.


In [119]:
import re
import openai
import pandas as pd
from tqdm import tqdm

openai.api_key = os.getenv("OPENAI_API_KEY")
def get_hate_toxic_evaluation(text):
    messages = [
        {
            "role": "user",
            "content": f"""
            Evaluate the following text for hatefulness and toxicity in the context of Singapore’s culture, where Singlish and Singaporean slang may be present. Interpret the language based on these definitions and examples:

            Toxic: Language that is hostile, aggressive, or disrespectful, likely to cause offense or discomfort. Examples include derogatory comments, vulgar language, or personal attacks, such as “ceca” used in an offensive context, or “377a” when implying derogatory meaning.
            Hateful: Language that shows prejudice or hostility toward a specific group, race, nationality, or identity. This includes slurs, racist or xenophobic remarks, and targeted hate speech. Examples might include statements intended to marginalize groups based on ethnicity, sexuality, or nationality.
            Respond with TRUE or FALSE for each category based on the definitions above:

            Toxic: TRUE or FALSE
            Hateful: TRUE or FALSE
            Text to evaluate: {text}
            """
        }
    ]
    completion = openai.chat.completions.create(
        model='gpt-4o-mini-2024-07-18',
        messages=messages,
        max_tokens=50  
    )
    response = completion.choices[0].message.content
    toxic_match = re.search(r'Toxic:\s*(TRUE|FALSE)', response, re.IGNORECASE)
    hateful_match = re.search(r'Hateful:\s*(TRUE|FALSE)', response, re.IGNORECASE)
    toxic = toxic_match.group(1).upper() == "TRUE" if toxic_match else False
    hateful = hateful_match.group(1).upper() == "TRUE" if hateful_match else False
    
    return hateful, toxic

tqdm.pandas()
df_expert[['gpt_hateful', 'gpt_toxic']] = df_expert['text'].progress_apply(
    lambda x: pd.Series(get_hate_toxic_evaluation(x))
)
df_expert['gpt_hate_toxic'] = df_expert[['gpt_hateful', 'gpt_toxic']].any(axis=1)

print(df_expert[['text', 'gpt_hateful', 'gpt_toxic', 'gpt_hate_toxic']].head())


100%|██████████| 300/300 [06:26<00:00,  1.29s/it]

                                                text  gpt_hateful  gpt_toxic  \
0     Expensive eh now that Uglyfoods closed down :(        False      False   
1                How dare you.. wan go lim kopi ah??        False       True   
2  Yeah the governments can politick all they wan...        False      False   
3               Hijacks event, then complains. Wild.        False       True   
4  Hate to break it to you. But once someone accu...        False       True   

   gpt_hate_toxic  
0           False  
1            True  
2           False  
3            True  
4            True  





In [120]:
print(df_expert['gpt_hate_toxic'].value_counts())

gpt_hate_toxic
True     167
False    133
Name: count, dtype: int64


### Code Explanation

This code calculates the F1-score to evaluate how well the GPT model identifies combined hatefulness and toxicity in the text data. The **F1-score** is a metric that balances **precision** (how accurately the model labels content as "hate_toxic") and **recall** (how well the model detects all true "hate_toxic" cases). This score is especially useful in assessing the model's effectiveness in identifying harmful content.

1. **Calculate the F1-Score**:
   - `f1_score` from `sklearn.metrics` compares the model’s predictions (`gpt_hate_toxic`) against the ground truth (`hate_toxic`).
   - `df_expert['hate_toxic']` represents the true labels, while `df_expert['gpt_hate_toxic']` is the model’s prediction based on the combined classifications of hatefulness and toxicity.

2. **Store and Print the F1-Score**:
   - The resulting F1-score is stored in `f1_score_gpt_hate_toxic`, which is then printed to show the model’s performance.

This F1-score provides a quantitative assessment of the GPT model’s reliability in detecting hatefulness or toxicity, helping users understand its accuracy and effectiveness.

In [122]:
f1_score_gpt_hate_toxic = f1_score(df_expert['hate_toxic'], df_expert['gpt_hate_toxic'])
print(f1_score_gpt_hate_toxic)

0.6945337620578779


## Experimenting with batch sizes

### Code Explanation

This code reads two large CSV files in chunks, combines them into a single DataFrame, and then samples 10,000 rows for further analysis. Here’s how each part functions:

1. **Chunk Size Definition**:
   - `chunk_size = 10000` specifies that each chunk will contain 10,000 rows, which helps reduce memory usage when reading large files.

2. **Reading and Concatenating CSV Files in Chunks**:
   - `df1_chunks = pd.read_csv('../data/Reddit-Threads_2020-2021.csv', chunksize=chunk_size)` reads the first CSV file in chunks of 10,000 rows each.
   - `df1 = pd.concat(df1_chunks, ignore_index=True)` concatenates all chunks of the first file into a single DataFrame, `df1`.
   - Similarly, `df2_chunks` and `df2` read and concatenate the second CSV file (`Reddit-Threads_2022-2023.csv`) in chunks.

3. **Combining Both DataFrames**:
   - `full_df = pd.concat([df1, df2])` combines the two DataFrames (`df1` and `df2`) from each CSV file into a single DataFrame, `full_df`.

4. **Sampling 10,000 Rows**:
   - `df_10k = full_df.sample(n=10000, random_state=42)` randomly samples 10,000 rows from `full_df` for batching analysis, with a fixed `random_state` for reproducibility.

5. **Output**:
   - `print(df_10k.shape)` displays the shape of `df_10k`, confirming that the sampled DataFrame contains 10,000 rows.

This approach efficiently handles large files by processing them in chunks and provides a manageable sample of 10,000 rows for further analysis.


In [None]:
chunk_size = 10000
df1_chunks = pd.read_csv('../data/Reddit-Threads_2020-2021.csv', chunksize=chunk_size)
df1 = pd.concat(df1_chunks, ignore_index=True)
df2_chunks = pd.read_csv('../data/Reddit-Threads_2022-2023.csv', chunksize=chunk_size)
df2 = pd.concat(df2_chunks, ignore_index=True)
full_df = pd.concat([df1, df2])

In [46]:
df_10k = full_df.sample(n=10000, random_state=42)
print(df_10k.shape)

(10000, 9)


## Feasibility Analysis for Processing Large Dataset Using OpenAI API with Batching

In this section, we attempt to evaluate approximately 4.5 million rows of text for toxicity and hatefulness using the OpenAI API with batching. However, token usage and API constraints make this approach challenging and potentially infeasible within a reasonable timeframe.

### Token Limit Constraints and Calculations

1. **Daily Token Limit**:
   - OpenAI enforces a **2 million tokens per day (TPD)** limit, meaning the total number of tokens used (prompt + response) across all requests in a 24-hour period must stay within this limit.

2. **Request Composition**:
   - **Prompt**: Contains instructions, definitions, and examples.
   - **Input Text**: The text sample from our dataset to be evaluated.
   - **Completion**: The model’s response, limited to a maximum of 10 tokens.

3. **Updated Token Usage Analysis per Request**:
   Given that we can process **7 batches of 500 rows each** per day, which totals **3,500 requests per day**, we calculate the average tokens available per request:

   - **Total Tokens per Day**: 2,000,000 tokens
   - **Total Requests per Day**: 3,500 requests

   Therefore, the tokens available per request are:

   **Total Tokens per Request = 2,000,000 / 3,500 ≈ 571 tokens**

   Breaking this down:
   - **Prompt Tokens (Fixed)**: 213 tokens for the instructions, definitions, and examples.
   - **Completion Tokens**: Limited to 10 tokens.
   - **Input Text Tokens**: With the updated batch size, the input text portion is approximately:

     **Input Text Tokens = 571 - 213 - 10 = 348 tokens**

4. **Daily Request Capacity**:
   With 7 batches of 500 rows, we can process **3,500 rows per day** within the 2 million token limit.

### Feasibility Issues with Batching

With a total dataset of 4.5 million rows, the batching approach becomes infeasible:

- **Insufficient Daily Capacity**:
  - At 3,500 rows per day, processing 4.5 million rows would take approximately:

    **4,500,000 rows / 3,500 rows per day ≈ 1,285 days (over 3.5 years)**

  - This processing time is beyond acceptable limits.

- **Token Consumption Per Batch**:
  - Each batch of 500 rows uses about 333,500 tokens, meaning that with 7 batches per day, the daily token limit of 2 million is fully utilized.

### Conclusion

Due to the high token consumption per request, this batch-processing method is **not feasible** for such a large dataset. Consider these alternatives:
1. **Data Sampling**: Process a representative subset of the data instead of the entire dataset.
2. **Local or Offline Processing**: Explore alternative models without token limitations.

## Code Rationale

This code processes a dataset (`df_10k`) in batches to evaluate text for hatefulness and toxicity using OpenAI’s API, specifically designed to handle Singaporean cultural nuances. It breaks down the dataset into manageable chunks to optimize processing and API usage, while persisting batch IDs for tracking and potential reusability.

### Code Breakdown

1. **Initialize and Load Batch IDs**:
   - The OpenAI API client is set up, and a `batch_size` of 500 rows is defined to process the data in chunks.
   - The code calculates the `total_batches` needed to cover the dataset.
   - If a file (`batch_ids.json`) exists with previously saved batch IDs, it loads these to avoid duplicating batch processing.

2. **Batch Processing**:
   - For each batch, a subset of the DataFrame (`batch_df`) is created based on the defined `batch_size`.
   - Each batch is saved as a `.jsonl` file (`batch_input_{batch_num}.jsonl`), where each row is converted to a JSON request format for the API. 
   - The prompt instructs the API to evaluate each text for toxicity and hatefulness, providing clear definitions and examples relevant to Singapore’s culture.

3. **Upload and Process Each Batch**:
   - After creating the `.jsonl` file, the batch is uploaded to OpenAI’s API, and the file ID is used to submit a batch job with a 24-hour processing window.
   - Metadata is included to label each batch job for easier tracking.
   - The batch ID is saved to `batch_ids.json` for persistence, allowing resumption or tracking of processing status if needed.

4. **Error Handling**:
   - The code includes error handling to manage issues during file creation, upload, and batch submission, allowing it to continue processing subsequent batches without interruption.

### Purpose

The code is designed to optimize the processing of large datasets by breaking down API requests into batches. This approach enhances efficiency by handling each batch sequentially, while the persistence of batch IDs enables monitoring and reuse. It also supports experimentation with batch sizes to balance processing time and API rate limits.


In [73]:
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

batch_size = 500
total_batches = (len(df_10k) + batch_size - 1) // batch_size
batch_ids = []
batch_ids_file = "batch_ids.json"

if os.path.exists(batch_ids_file):
    with open(batch_ids_file, "r") as f:
        batch_ids = json.load(f)

for batch_num in range(total_batches):
    batch_df = df_10k.iloc[batch_num * batch_size : (batch_num + 1) * batch_size]
    if batch_df.empty:
        print(f"Skipping empty batch {batch_num}")
        continue
    
    batch_file_path = f"batch_input_{batch_num}.jsonl"
    
    try:
        with open(batch_file_path, "w") as f:
            for idx, row in batch_df.iterrows():
                text = row['text']
                request = {
                    "custom_id": f"request-{idx}",
                    "method": "POST",
                    "url": "/v1/chat/completions",
                    "body": {
                        "model": "gpt-4o-mini-2024-07-18",
                        "temperature": 0.7,
                        "messages": [
                            {
                                "role": "user",
                                "content": (
                                    f"Evaluate the following text for hatefulness and toxicity in the context of Singapore’s culture, "
                                    f"where Singlish and Singaporean slang may be present. Interpret the language based on these definitions and examples:\n\n"
                                    f"Toxic: Language that is hostile, aggressive, or disrespectful, likely to cause offense or discomfort. "
                                    f"Examples include derogatory comments, vulgar language, or personal attacks, such as 'ceca' used in an offensive context, "
                                    f"or '377a' when implying derogatory meaning.\n\n"
                                    f"Hateful: Language that shows prejudice or hostility toward a specific group, race, nationality, or identity. "
                                    f"This includes slurs, racist or xenophobic remarks, and targeted hate speech. Examples might include statements intended "
                                    f"to marginalize groups based on ethnicity, sexuality, or nationality.\n\n"
                                    f"Respond with True or False only in string for each category based on the definitions above:\n\n"
                                    f"Toxic: True or False\n"
                                    f"Hateful: True or False\n\n"
                                    f"Text to evaluate: {text}"
                                )
                            }
                        ],
                        "max_tokens": 10
                    }
                }
                f.write(json.dumps(request) + "\n")
        
        print(f"Successfully created {batch_file_path}")
    
    except Exception as e:
        print(f"Failed to create {batch_file_path}: {e}")
        continue 
    if os.path.exists(batch_file_path):
        try:
            batch_input_file = client.files.create(
                file=open(batch_file_path, "rb"),
                purpose="batch"
            )
            print(f"Successfully uploaded {batch_file_path}")
        except Exception as e:
            print(f"Error uploading {batch_file_path}: {e}")
            continue

        try:
            batch = client.batches.create(
                input_file_id=batch_input_file.id,
                endpoint="/v1/chat/completions",
                completion_window="24h",
                metadata={
                    "description": f"hateful/toxic analysis job - Batch {batch_num}"
                }
            )
            print(f"Submitted Batch {batch_num}: {batch.id}")
            batch_ids.append(batch.id)
            with open(batch_ids_file, "w") as f:
                json.dump(batch_ids, f)
        
        except Exception as e:
            print(f"Error creating batch for {batch_file_path}: {e}")
    else:
        print(f"File {batch_file_path} does not exist, skipping batch {batch_num}")

Successfully created batch_input_0.jsonl
Successfully uploaded batch_input_0.jsonl
Submitted Batch 0: batch_67304dbc0f2c81908d784919826efd05
Successfully created batch_input_1.jsonl
Successfully uploaded batch_input_1.jsonl
Submitted Batch 1: batch_67304dbdf47c8190b4513924cc4277e7
Successfully created batch_input_2.jsonl
Successfully uploaded batch_input_2.jsonl
Submitted Batch 2: batch_67304dbf54a08190b5554a28282ba740
Successfully created batch_input_3.jsonl
Successfully uploaded batch_input_3.jsonl
Submitted Batch 3: batch_67304dc0c04c819080bd53fb12429594
Successfully created batch_input_4.jsonl
Successfully uploaded batch_input_4.jsonl
Submitted Batch 4: batch_67304dc1ed34819095ee2aedb5aaae8c
Successfully created batch_input_5.jsonl
Successfully uploaded batch_input_5.jsonl
Submitted Batch 5: batch_67304dc37c648190af7faf67845226b0
Successfully created batch_input_6.jsonl
Successfully uploaded batch_input_6.jsonl
Submitted Batch 6: batch_67304dc4b3748190a24850c8cf6833bf
Successfully 

## Code Rationale

This code processes a JSONL output file containing batch responses from an API. It extracts toxicity and hatefulness classifications from each response and organizes them into a pandas DataFrame for easy analysis.

### Code Breakdown

1. **Load Data from JSONL File**:
   - `output_file`: Specifies the name of the JSONL file containing the batch API responses.
   - The code opens the file, reads each line as JSON, and appends each JSON object (representing an API response) to the `data` list.
   - `df_results`: Creates a DataFrame from the `data` list, where each row corresponds to an individual API response.

2. **Define Function to Extract Toxicity and Hatefulness**:
   - `extract_toxic_hateful`: A function designed to parse each response for toxicity and hatefulness values.
   - Inside the function:
     - It first checks if the response contains the `body` and `choices` fields, ensuring the structure matches the expected format.
     - Extracts the `content` from `choices`, which holds the model’s output text.
     - Uses regular expressions (`re.search`) to find `Toxic: TRUE/FALSE` and `Hateful: TRUE/FALSE` in the `content`.
     - Prints the extracted content and matches for debugging purposes.
     - If matches are found, it converts the results to uppercase (`TRUE` or `FALSE`) and returns them as `gpt_toxic` and `gpt_hate`. If no matches are found, it returns `None` for each.
     - Logs warnings if the required fields are missing from the response, and handles any `KeyError` exceptions gracefully.

3. **Apply Function to Extract Labels**:
   - `df_results['gpt_toxic_batch'], df_results['gpt_hate_batch']`: The `apply` method is used to run `extract_toxic_hateful` on each `response` entry in `df_results`.
   - This populates two new columns, `gpt_toxic_batch` and `gpt_hate_batch`, with the extracted values for toxicity and hatefulness.

4. **Print Results**:
   - Displays the first few rows of `df_results`, showing the `gpt_toxic_batch` and `gpt_hate_batch` columns with the extracted toxicity and hatefulness classifications.

### Purpose

The purpose of this code is to automate the extraction of toxicity and hatefulness classifications from batch API responses. By organizing these classifications into a structured DataFrame, the code facilitates further analysis and evaluation of the model’s performance on these attributes.


In [74]:
def check_batch_status(batch_id):
    while True:
        batch_status = client.batches.retrieve(batch_id)
        print(f"Batch {batch_id} status: {batch_status.status}")
        if batch_status.status == "completed" or batch_status.status == "failed":
            return batch_status
        
with open(batch_ids_file, "r") as f:
    batch_ids = json.load(f)

for batch_id in batch_ids:
    batch_status = check_batch_status(batch_id)
    
    if batch_status.output_file_id:
        output_file_id = batch_status.output_file_id
        file_metadata = client.files.retrieve(output_file_id)
        output_content = client.files.content(output_file_id).read()
        with open(f"batch_output_{batch_id}.jsonl", "wb") as output_file:
            output_file.write(output_content)

print(f"Batch {batch_id} results successfully retrieved and saved to 'batch_output_{batch_id}.jsonl'.")


Batch batch_67304dbc0f2c81908d784919826efd05 status: in_progress
Batch batch_67304dbc0f2c81908d784919826efd05 status: in_progress
Batch batch_67304dbc0f2c81908d784919826efd05 status: in_progress
Batch batch_67304dbc0f2c81908d784919826efd05 status: in_progress
Batch batch_67304dbc0f2c81908d784919826efd05 status: in_progress
Batch batch_67304dbc0f2c81908d784919826efd05 status: in_progress
Batch batch_67304dbc0f2c81908d784919826efd05 status: in_progress
Batch batch_67304dbc0f2c81908d784919826efd05 status: in_progress
Batch batch_67304dbc0f2c81908d784919826efd05 status: in_progress
Batch batch_67304dbc0f2c81908d784919826efd05 status: in_progress
Batch batch_67304dbc0f2c81908d784919826efd05 status: in_progress
Batch batch_67304dbc0f2c81908d784919826efd05 status: in_progress
Batch batch_67304dbc0f2c81908d784919826efd05 status: in_progress
Batch batch_67304dbc0f2c81908d784919826efd05 status: in_progress
Batch batch_67304dbc0f2c81908d784919826efd05 status: in_progress
Batch batch_67304dbc0f2c8

In [82]:
output_files = [
    "batch_output_batch_67304dbc0f2c81908d784919826efd05.jsonl",
    "batch_output_batch_67304dbdf47c8190b4513924cc4277e7.jsonl",
    "batch_output_batch_67304dbf54a08190b5554a28282ba740.jsonl",
    "batch_output_batch_67304dc0c04c819080bd53fb12429594.jsonl",
    "batch_output_batch_67304dc1ed34819095ee2aedb5aaae8c.jsonl",
    "batch_output_batch_67304dc4b3748190a24850c8cf6833bf.jsonl",
    "batch_output_batch_67304dc37c648190af7faf67845226b0.jsonl"
]

def extract_toxic_hateful(response):
    try:
        if 'body' in response and 'choices' in response['body']:
            content = response['body']['choices'][0]['message']['content']
            print(f"Content: {content}")
            toxic_match = re.search(r'Toxic:\s*(TRUE|FALSE)', content, re.IGNORECASE)
            hateful_match = re.search(r'Hateful:\s*(TRUE|FALSE)', content, re.IGNORECASE)
            print(f"Toxic match: {toxic_match}")
            print(f"Hateful match: {hateful_match}")

            gpt_toxic = toxic_match.group(1).upper() if toxic_match else None
            gpt_hate = hateful_match.group(1).upper() if hateful_match else None

            return gpt_toxic, gpt_hate
        else:
            print("Warning: 'body' or 'choices' not found in response")
            return None, None
    except KeyError as e:
        print(f"KeyError: {e}")
        return None, None

# Initialize an empty list to store all results
all_data = []

# Loop through each output file, process it, and append the data to the list
for output_file in output_files:
    data = []
    with open(output_file, 'r') as file:
        for line in file:
            data.append(json.loads(line))
    
    # Convert the current batch data to a DataFrame
    df_results = pd.DataFrame(data)
    
    # Extract 'gpt_toxic_batch' and 'gpt_hate_batch' columns using the function
    df_results['gpt_toxic_batch'], df_results['gpt_hate_batch'] = zip(*df_results['response'].apply(
        lambda x: extract_toxic_hateful(x)
    ))
    
    # Append the current DataFrame to the list
    all_data.append(df_results)

# Concatenate all DataFrames into a single DataFrame
final_df = pd.concat(all_data, ignore_index=True)

# Display the results
print(final_df[['gpt_toxic_batch', 'gpt_hate_batch']].head())

Content: Toxic: False  
Hateful: False
Toxic match: <re.Match object; span=(0, 12), match='Toxic: False'>
Hateful match: <re.Match object; span=(15, 29), match='Hateful: False'>
Content: Toxic: False  
Hateful: False
Toxic match: <re.Match object; span=(0, 12), match='Toxic: False'>
Hateful match: <re.Match object; span=(15, 29), match='Hateful: False'>
Content: Toxic: False  
Hateful: False
Toxic match: <re.Match object; span=(0, 12), match='Toxic: False'>
Hateful match: <re.Match object; span=(15, 29), match='Hateful: False'>
Content: Toxic: False  
Hateful: False
Toxic match: <re.Match object; span=(0, 12), match='Toxic: False'>
Hateful match: <re.Match object; span=(15, 29), match='Hateful: False'>
Content: Toxic: False  
Hateful: False
Toxic match: <re.Match object; span=(0, 12), match='Toxic: False'>
Hateful match: <re.Match object; span=(15, 29), match='Hateful: False'>
Content: Toxic: False  
Hateful: False
Toxic match: <re.Match object; span=(0, 12), match='Toxic: False'>
Hate

In [86]:
# Find out how many TRUE/FALSE in gpt_toxic_batch and gpt_hate_batch in final_df
toxic_counts = final_df['gpt_toxic_batch'].value_counts()
hate_counts = final_df['gpt_hate_batch'].value_counts()
print(toxic_counts)
print(hate_counts)

gpt_toxic_batch
FALSE    2937
TRUE      550
Name: count, dtype: int64
gpt_hate_batch
FALSE    3406
TRUE       81
Name: count, dtype: int64


In [75]:
def check_batch_status(batch_id):
    batch_status = client.batches.retrieve(batch_id)
    print(f"Batch {batch_id} status: {batch_status.status}")
    
    if batch_status.status == "failed":
        if batch_status.errors:
            print(f"Batch {batch_id} failed with error: {batch_status.errors}")
        else:
            print(f"Batch {batch_id} failed, but no error details are available.")
        return batch_status
    if batch_status.status == "completed":
        return batch_status

    return None

with open(batch_ids_file, "r") as f:
    batch_ids = json.load(f)

for batch_id in batch_ids:
    batch_status = check_batch_status(batch_id)
    
    if batch_status and batch_status.output_file_id:
        output_file_id = batch_status.output_file_id
        output_content = client.files.content(output_file_id).read()
        with open(f"batch_output_{batch_id}.jsonl", "wb") as output_file:
            output_file.write(output_content)
        
        print(f"Batch {batch_id} results successfully retrieved and saved to 'batch_output_{batch_id}.jsonl'.")
    elif batch_status and batch_status.status == "failed":
        print(f"Batch {batch_id} has failed. Check the error details above.")
    else:
        print(f"Batch {batch_id} has not completed or no output file is available.")



Batch batch_67304dbc0f2c81908d784919826efd05 status: completed
Batch batch_67304dbc0f2c81908d784919826efd05 results successfully retrieved and saved to 'batch_output_batch_67304dbc0f2c81908d784919826efd05.jsonl'.
Batch batch_67304dbdf47c8190b4513924cc4277e7 status: completed
Batch batch_67304dbdf47c8190b4513924cc4277e7 results successfully retrieved and saved to 'batch_output_batch_67304dbdf47c8190b4513924cc4277e7.jsonl'.
Batch batch_67304dbf54a08190b5554a28282ba740 status: completed
Batch batch_67304dbf54a08190b5554a28282ba740 results successfully retrieved and saved to 'batch_output_batch_67304dbf54a08190b5554a28282ba740.jsonl'.
Batch batch_67304dc0c04c819080bd53fb12429594 status: completed
Batch batch_67304dc0c04c819080bd53fb12429594 results successfully retrieved and saved to 'batch_output_batch_67304dc0c04c819080bd53fb12429594.jsonl'.
Batch batch_67304dc1ed34819095ee2aedb5aaae8c status: completed
Batch batch_67304dc1ed34819095ee2aedb5aaae8c results successfully retrieved and saved

## Perspective API

### Code Rationale

This code uses Google’s Perspective API to evaluate text data based on four key attributes related to toxicity and hatefulness: **toxicity**, **severe toxicity**, **insult**, and **identity attack**. By gathering scores for these attributes and calculating optimal thresholds for classification, the code aims to accurately detect and classify texts that are potentially harmful. It performs the following steps:

### Step-by-Step Explanation

1. **API Setup**:
   - An API key (`PERSPECTIVE_API_KEY`) is loaded from environment variables, and a client is created to access the Perspective API using Google’s `discovery.build`.
   - The `GOOGLE_APPLICATION_CREDENTIALS` environment variable is set to authenticate the API.

2. **Collecting Scores from the API**:
   - Four empty lists are created to store scores for each attribute: `toxicity_scores`, `severe_toxicity_scores`, `insult_scores`, and `identity_attack_scores`.
   - The code iterates over each text entry in `df_expert`, constructing an `analyze_request` dictionary specifying the text and requesting scores for each attribute.
   - Each text is sent to the API in a single call to minimize API usage and avoid hitting rate limits.
   - The scores are appended to the respective lists, representing the model’s assessment of each attribute. An optional `time.sleep(1)` delay is added to further manage the API rate limits.

3. **Adding Scores to the DataFrame**:
   - After collecting scores for all texts, each list of scores is added to the `df_expert` DataFrame as new columns: `toxicity_score`, `severe_toxicity_score`, `insult_score`, and `identity_attack_score`.

4. **Evaluating F1-Scores Across Thresholds**:
   - To find the best threshold for classifying texts as toxic or hateful, the code tests thresholds from `0` to `1` in increments of `0.01`.
   - Three empty dictionaries (`toxicity_f1_scores`, `hate_f1_scores`, `hate_toxic_f1_scores`) are created to store F1-scores at each threshold.
   - For each threshold:
     - **Classification**: The code classifies texts as toxic if either `toxicity_score` or `severe_toxicity_score` exceeds the threshold, and as hateful if either `insult_score` or `identity_attack_score` exceeds the threshold.
     - **Combined Classification**: A new column, `predicted_hate_toxic`, is created to indicate if a text meets either the toxic or hateful criteria based on the threshold.
     - **Calculating F1-Scores**: F1-scores are calculated using `f1_score` for each classification (`predicted_toxic`, `predicted_hate`, and `predicted_hate_toxic`) compared to their respective ground truth labels (`toxic`, `hateful`, and `hate_toxic`).
     - The calculated F1-scores are stored in the dictionaries, with the threshold as the key.

5. **Identifying Optimal Thresholds**:
   - After testing all thresholds, the optimal threshold for each category (toxicity, hate, and hate_toxic) is determined by selecting the threshold with the highest F1-score from each dictionary.
   - These optimal thresholds are printed alongside their corresponding F1-scores.

### Purpose and Importance of This Approach

This code is designed to evaluate and optimize the detection of harmful language by:
- **Gathering Contextually Relevant Scores**: Using multiple attributes (toxicity, severe toxicity, insult, identity attack) provides a nuanced view of harmful language.
- **Testing Multiple Thresholds**: By evaluating F1-scores at various thresholds, the code ensures the optimal balance between **precision** and **recall**, leading to a robust classification system.
- **Combined Classification**: The `predicted_hate_toxic` column offers a unified classification, flagging texts that meet either the toxic or hateful criteria, simplifying analysis and enhancing interpretability.

This method is ideal for applications in content moderation, policy development, or any setting that requires reliable identification of toxic or hateful language in text.

### Limitations
Batch optimization for Perspective API is not feasible due to its rate limit of 60 requests per minute (rpm). This limitation prevents efficient large-scale dataset labeling, as it requires each request to be processed individually, significantly slowing down the workflow. Consequently, we decided not to proceed with using the Perspective API for large-scale dataset labeling.

In [124]:
# Set up the API key and client
API_KEY = os.getenv('PERSPECTIVE_API_KEY')
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "igneous-effort-438806-m2-f268f59661e1.json"
client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

# Step 1: Collect scores for each comment (only one API call per text)
toxicity_scores = []
severe_toxicity_scores = []
insult_scores = []
identity_attack_scores = []

for index, row in tqdm(df_expert.iterrows(), total=df_expert.shape[0], desc="Collecting API scores"):
    text = row['text']
    analyze_request = {
        'comment': {'text': text},
        'requestedAttributes': {
            'TOXICITY': {},
            'SEVERE_TOXICITY': {},
            'INSULT': {},
            'IDENTITY_ATTACK': {}
        },
        'languages': ['en']  # Specify language as English
    }

    response = client.comments().analyze(body=analyze_request).execute()

    # Store each score in separate lists
    toxicity_scores.append(response['attributeScores']['TOXICITY']['summaryScore']['value'])
    severe_toxicity_scores.append(response['attributeScores']['SEVERE_TOXICITY']['summaryScore']['value'])
    insult_scores.append(response['attributeScores']['INSULT']['summaryScore']['value'])
    identity_attack_scores.append(response['attributeScores']['IDENTITY_ATTACK']['summaryScore']['value'])

    time.sleep(1)  # Optional delay to avoid hitting API rate limits

# Add scores to the DataFrame
df_expert['toxicity_score'] = toxicity_scores
df_expert['severe_toxicity_score'] = severe_toxicity_scores
df_expert['insult_score'] = insult_scores
df_expert['identity_attack_score'] = identity_attack_scores

# Step 2: Evaluate F1 scores at thresholds from 0 to 1 in intervals of 0.01
thresholds = [round(i * 0.01, 2) for i in range(101)]
toxicity_f1_scores = {}
hate_f1_scores = {}
hate_toxic_f1_scores = {}

for threshold in thresholds:
    print(f"\nTesting threshold: {threshold}")
    
    # Apply threshold to classify toxicity and hatefulness
    df_expert['predicted_toxic'] = ((df_expert['toxicity_score'] >= threshold) | 
                                    (df_expert['severe_toxicity_score'] >= threshold))

    df_expert['predicted_hate'] = ((df_expert['insult_score'] >= threshold) | 
                                   (df_expert['identity_attack_score'] >= threshold))
    
    # Create predicted_hate_toxic as per your rule
    df_expert['predicted_hate_toxic'] = df_expert[['predicted_toxic', 'predicted_hate']].any(axis=1)
    
    # Calculate F1-scores for each category
    toxic_f1_score = f1_score(df_expert['toxic'], df_expert['predicted_toxic'])
    hate_f1_score = f1_score(df_expert['hateful'], df_expert['predicted_hate'])
    hate_toxic_f1_score = f1_score(df_expert['hate_toxic'], df_expert['predicted_hate_toxic'])

    # Store F1-scores for each threshold
    toxicity_f1_scores[threshold] = toxic_f1_score
    hate_f1_scores[threshold] = hate_f1_score
    hate_toxic_f1_scores[threshold] = hate_toxic_f1_score

    print(f"Threshold {threshold} - Toxic F1 Score: {toxic_f1_score:.2f}, Hate F1 Score: {hate_f1_score:.2f}, Hate_Toxic F1 Score: {hate_toxic_f1_score:.2f}")

# Find the optimal threshold based on the highest F1-score for each category
optimal_toxic_threshold = max(toxicity_f1_scores, key=toxicity_f1_scores.get)
optimal_hate_threshold = max(hate_f1_scores, key=hate_f1_scores.get)
optimal_hate_toxic_threshold = max(hate_toxic_f1_scores, key=hate_toxic_f1_scores.get)

print(f"\nOptimal toxicity threshold: {optimal_toxic_threshold} with F1-score: {toxicity_f1_scores[optimal_toxic_threshold]:.2f}")
print(f"Optimal hate threshold: {optimal_hate_threshold} with F1-score: {hate_f1_scores[optimal_hate_threshold]:.2f}")
print(f"Optimal hate_toxic threshold: {optimal_hate_toxic_threshold} with F1-score: {hate_toxic_f1_scores[optimal_hate_toxic_threshold]:.2f}")

Collecting API scores: 100%|██████████| 300/300 [06:28<00:00,  1.29s/it]



Testing threshold: 0.0
Threshold 0.0 - Toxic F1 Score: 0.54, Hate F1 Score: 0.40, Hate_Toxic F1 Score: 0.65

Testing threshold: 0.01
Threshold 0.01 - Toxic F1 Score: 0.54, Hate F1 Score: 0.41, Hate_Toxic F1 Score: 0.65

Testing threshold: 0.02
Threshold 0.02 - Toxic F1 Score: 0.55, Hate F1 Score: 0.45, Hate_Toxic F1 Score: 0.66

Testing threshold: 0.03
Threshold 0.03 - Toxic F1 Score: 0.55, Hate F1 Score: 0.47, Hate_Toxic F1 Score: 0.66

Testing threshold: 0.04
Threshold 0.04 - Toxic F1 Score: 0.55, Hate F1 Score: 0.49, Hate_Toxic F1 Score: 0.66

Testing threshold: 0.05
Threshold 0.05 - Toxic F1 Score: 0.54, Hate F1 Score: 0.50, Hate_Toxic F1 Score: 0.66

Testing threshold: 0.06
Threshold 0.06 - Toxic F1 Score: 0.55, Hate F1 Score: 0.51, Hate_Toxic F1 Score: 0.67

Testing threshold: 0.07
Threshold 0.07 - Toxic F1 Score: 0.55, Hate F1 Score: 0.54, Hate_Toxic F1 Score: 0.67

Testing threshold: 0.08
Threshold 0.08 - Toxic F1 Score: 0.55, Hate F1 Score: 0.54, Hate_Toxic F1 Score: 0.67

Te

In [125]:
# Apply optimal thresholds to classify toxicity and hatefulness
df_expert['predicted_toxic'] = ((df_expert['toxicity_score'] >= optimal_toxic_threshold) |
                                df_expert['severe_toxicity_score'] >= optimal_toxic_threshold)
df_expert['predicted_hate'] = ((df_expert['insult_score'] >= optimal_hate_threshold) |
                                 (df_expert['identity_attack_score'] >= optimal_hate_threshold))
df_expert['predicted_hate_toxic'] = df_expert[['predicted_toxic', 'predicted_hate']].any(axis=1)

# Display value counts for each prediction
print(df_expert[['predicted_toxic']].value_counts())
print(df_expert[['predicted_hate']].value_counts())
print(df_expert[['predicted_hate_toxic']].value_counts())

predicted_toxic
True               300
Name: count, dtype: int64
predicted_hate
False             211
True               89
Name: count, dtype: int64
predicted_hate_toxic
True                    300
Name: count, dtype: int64


### Code Explanation

This code calculates the **F1-score** to evaluate how accurately the Perspective API model detects combined **toxicity** and **hatefulness** in the text data.

- **F1-score Calculation**: `f1_score` from `sklearn.metrics` compares the model’s prediction (`predicted_hate_toxic`) against the ground truth (`hate_toxic`). The F1-score provides a balance between:
  - **Precision**: How many of the texts predicted as "hate_toxic" were correct.
  - **Recall**: How many of the actual "hate_toxic" texts were identified by the model.

- **Output**: The calculated F1-score is stored in `f1_score_perspective_toxic_hate` and then printed. This score gives a quantitative measure of how well the Perspective API model classifies harmful content based on the combined criteria of toxicity and hatefulness, aiding in evaluating the model’s effectiveness.



In [127]:
# Calculating F1-scores for Perspective API
f1_score_perspective_toxic_hate = f1_score(df_expert['hate_toxic'], df_expert['predicted_hate_toxic'])
print(f1_score_perspective_toxic_hate)

0.6486486486486487
