###  Excessive use of profanity/hate speech (Policy G)

Idea - non-toxic language should score high (close to 1), toxic language low (close to 0).

Tools to use -> __detoxify__

The Detoxify library in Python is an open-source tool designed for detecting and classifying various types of toxic language in text. It utilizes pre-trained machine learning models, primarily built on the Hugging Face Transformers library, to analyze text and provide scores indicating the likelihood of different toxicity categories.

__Limitations__:

If words associated with swearing, insults or profanity are present in a comment, it is likely to be classified as toxic regardless of the author's intent. This can bias the scores against already vulnerable minority groups.

How to get the score?

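- Get category probabilities: “toxicity”, “severe_toxicity”, “obscene”, “threat”, “insult”, “identity_attack” (older label names such as “severe_toxic” and “identity_hate” are handled as fallbacks in the code)
- Take the worst probability: P(max) = max(...)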
- Map to “good is 1”: S(toxicity) = 1-P(max)
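
The three steps above can be sketched without loading the model, assuming the category probabilities are already available as a plain dict. The helper name `toxicity_score` and the sample probabilities are hypothetical and used only for illustration; they are not real model output.

```python
from typing import Dict


def toxicity_score(probs: Dict[str, float]) -> float:
    """Map category probabilities to a 'good is 1' score: S = 1 - max(P)."""
    if not probs:
        raise ValueError("probs must contain at least one category")
    p_max = max(probs.values())  # worst (highest) category probability
    return 1.0 - p_max


# Hypothetical probabilities, chosen by hand for illustration
example_probs = {
    "toxicity": 0.90,
    "severe_toxicity": 0.05,
    "obscene": 0.75,
    "threat": 0.01,
    "insult": 0.40,
    "identity_attack": 0.02,
}

print(round(toxicity_score(example_probs), 2))
```

Because the score uses the maximum, a single high category is enough to pull S(toxicity) down, even when every other category is near zero.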



In [25]:
from detoxify import Detoxify
from typing import List, Dict, Optional

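# cache of loaded Detoxify models, keyed by model name
_MODELS: Dict[str, Detoxify] = {}

In [26]:
def get_detoxify_model(model_name: str = "unbiased") -> Detoxify:
    # three available models: "original", "unbiased", "multilingual"
    # load each model once and reuse it; caching per name means a request
    # for a different model is not silently served by the first one loaded
    if model_name not in _MODELS:
        _MODELS[model_name] = Detoxify(model_name)
    return _MODELS[model_name]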

In [27]:
# scoring function

def score_toxicity(text: str, model_name: str = "unbiased") -> dict:
    '''
    Returns:
    {
    "toxicity": ...,
    "severe_toxicity": ...,
    "obscene": ...,
    "threat": ...,
    "insult": ...,
    "identity_attack": ...,
    "P_max": ..., -> worst-case (maximum) category probability
    "S_toxicity": ... -> 1 - P_max, so non-toxic text scores close to 1
    }
    '''
    
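    # empty or whitespace-only text: skip the model, nothing to classify
    if not text or not text.strip():
        return {
            "toxicity": 0.0,
            "severe_toxicity": 0.0,
            "obscene": 0.0,
            "threat": 0.0,
            "insult": 0.0,
            "identity_attack": 0.0,
            "P_max": 0.0,
            "S_toxicity": 1.0  # keeps S_toxicity = 1 - P_max consistent
        }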
    
    model = get_detoxify_model(model_name)
    preds = model.predict(text)
    
    probs = {
        "toxicity": float(preds.get("toxicity", 0.0)),
        "severe_toxicity": float(preds.get("severe_toxicity", 0.0)),
        "obscene": float(preds.get("obscene", 0.0)),
        "threat": float(preds.get("threat", 0.0)),
        "insult": float(preds.get("insult", 0.0)),
        "identity_attack": float(
            preds.get("identity_attack", preds.get("identity_hate", 0.0))
        )
    }
    
    P_max = max(probs.values())
    S_toxicity = 1.0 - P_max
    
    probs["P_max"] = float(P_max)
    probs["S_toxicity"] = float(S_toxicity)
    return probs


    

In [None]:
def batch_score_toxicity(
    texts: List[str],
    model_name: str = "unbiased",  # same default as score_toxicity
    batch_size: int = 128,
) -> List[Dict[str, float]]:
    """
    Returns a list of dicts aligned to `texts`:
        {
          "toxicity": ...,
          "severe_toxicity": ...,
          "obscene": ...,
          "threat": ...,
          "insult": ...,
          "identity_attack": ...,
          "P_max": ...,
          "S_toxicity": ...
        }
    """
    if not texts:
        return []

    model = get_detoxify_model(model_name)
    results: List[Dict[str, float]] = []

    n = len(texts)
    for start in range(0, n, batch_size):
        chunk = texts[start:start + batch_size]

        # Detoxify returns a dict of per-label score sequences for batch input
        preds = model.predict(chunk)

        # resolve key variants (older label names as fallbacks)
        columns = {
            "toxicity": preds.get("toxicity"),
            "severe_toxicity": preds.get("severe_toxicity", preds.get("severe_toxic")),
            "obscene": preds.get("obscene"),
            "threat": preds.get("threat"),
            "insult": preds.get("insult"),
            "identity_attack": preds.get("identity_attack", preds.get("identity_hate")),
        }

        for i in range(len(chunk)):
            # convert each score to a plain Python float
            vals = {name: float(col[i]) for name, col in columns.items()}
            p_max = max(vals.values())
            vals["P_max"] = float(p_max)
            vals["S_toxicity"] = float(1.0 - p_max)
            results.append(vals)

    return results

In [29]:
examples = [
    "This place was amazing, staff were kind and helpful!",       # non-toxic
    "Damn this service sucked, worst experience ever.",           # weak swear
    "The experience at this place was shit. No more words need.",  # med swear
    "The staffs are fucking rude, they are braindead as fuck.",   # strong swear
    "The waiter was a stupid idiot, absolute shit service.",   # insult
    "I hate all people from NTU, they shouldn't exist.",    # hate speech
]

for txt in examples:
    scores = score_toxicity(txt)
    print(f"\nReview: {txt}")
    print(f"Scores: {scores}")
    # single quotes inside the f-string: nested double quotes are a syntax error before Python 3.12
    print(f"S(toxicity): {scores['S_toxicity']}")


Review: This place was amazing, staff were kind and helpful!
Scores: {'toxicity': 0.0004211723862681538, 'severe_toxicity': 1.2876740811407217e-06, 'obscene': 2.846930146915838e-05, 'threat': 1.8640135749592446e-05, 'insult': 0.00011658602306852117, 'identity_attack': 6.628548726439476e-05, 'P_max': 0.0004211723862681538, 'S_toxicity': 0.9995788276137318}
S(toxicity): 0.9995788276137318

Review: Damn this service sucked, worst experience ever.
Scores: {'toxicity': 0.9891746044158936, 'severe_toxicity': 0.041211824864149094, 'obscene': 0.9647092223167419, 'threat': 0.0013417662121355534, 'insult': 0.6300086379051208, 'identity_attack': 0.018575580790638924, 'P_max': 0.9891746044158936, 'S_toxicity': 0.010825395584106445}
S(toxicity): 0.010825395584106445

Review: The experience at this place was shit. No more words need.
Scores: {'toxicity': 0.9548090696334839, 'severe_toxicity': 0.007803658954799175, 'obscene': 0.921392023563385, 'threat': 0.0008290570112876594, 'insult': 0.2268422394

In [30]:
import pandas as pd
df = pd.read_csv("/Users/evan/Documents/Projects/TikTok-TechJam-2025/final_data_sampled.csv")
sample_df = df.sample(n=10, random_state=42).reset_index(drop=True).copy()
reviews = sample_df["text"].fillna("").tolist()

scores = batch_score_toxicity(reviews, model_name="unbiased", batch_size=16)
scores_df = pd.DataFrame(scores)
results_df = pd.concat([sample_df, scores_df], axis=1)
print(results_df)


   Unnamed: 0  rating                                               text  \
0      617643       5  Good fast Korean food. You can get an extra si...   
1      799373       4  Located in the small ranching town of Waimea o...   
2      188913       3                 It's all the way on the West Coast   
3      763245       4  Great selection of Tequilas and Jo the bartner...   
4      399817       4  Lots of stunning views.  Long hike left at 8:3...   
5      664326       5                            Good is very delicious!   
6      297606       5  Fun mini golf here. They also have a zipline b...   
7      154787       5  Awesome,  compassionate hospitality good food ...   
8     1047180       5  Georgia Peach and Durian,  a new first time fl...   
9      898727       5   Great bakery. Great selection of tasty products.   

                   business_name  \
0             Sam's Delicatessen   
1          Merriman's Big Island   
2         Ewa Pointe Marketplace   
3              Mi A