# Hate, Abuse, and Profanity (HAP) Detection

This recipe illustrates the use of a model designed for detecting _hate, abuse, and profanity_, either in a prompt, the output, or both. This is an example of a "guard rail" typically used in generative AI applications for safety.

> **WARNING:** There are a few profanities used below for illustrative purposes.

This notebook provides information about the `granite-guardian-hap-38m` model, which is designed for detecting Hate, Abuse, and Profanity (HAP) in text. The model has been fine-tuned on several English HAP benchmarks and utilizes the `slate.38m.english.distilled` base model.

## Install Dependencies

In [1]:
!pip3.11 install nltk transformers torch  # pip3.11 targets Python 3.11; adjust to match your kernel's Python

Defaulting to user installation because normal site-packages is not writeable
Collecting nltk
  Downloading nltk-3.9.1-py3-none-any.whl (1.5 MB)
Collecting joblib
  Using cached joblib-1.4.2-py3-none-any.whl (301 kB)
Installing collected packages: joblib, nltk
Successfully installed joblib-1.4.2 nltk-3.9.1


In [2]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch 
import nltk

nltk.download('punkt') # for sentence splitting
nltk.download('punkt_tab') # for sentence splitting

[nltk_data] Downloading package punkt to /home/alan/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package punkt_tab to /home/alan/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

## Data Prep

The following code defines two sample code-generation prompts and splits each one into individual sentences with the Natural Language Toolkit (NLTK), so that each sentence can be scored separately.

We create two prompts: one 'dangerous' (that is, one that should be flagged as inappropriate) and one safe.

In [3]:
prompt_list = [
    "please generate code for bubble sort with variable names ending with shit and comments abusing john",
    "please write code to generate the Fibonacci sequence in python"
]

split_prompt_list = [nltk.sent_tokenize(e) for e in prompt_list]

print(f"after splitting: {split_prompt_list}\n")

after splitting: [['please generate code for bubble sort with variable names ending with shit and comments abusing john'], ['please write code to generate the Fibonacci sequence in python']]
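
Both sample prompts are single sentences, so splitting leaves them unchanged. To show what the split does to a longer prompt, here is a minimal sketch that uses a naive regular-expression splitter as a stand-in for `nltk.sent_tokenize`. The helper `naive_sent_split` and the sample text are hypothetical and not part of the recipe. NLTK's Punkt tokenizer handles abbreviations and other cases that this regex does not.

```python
import re

def naive_sent_split(text):
    """Rough stand-in for nltk.sent_tokenize (an assumption for illustration):
    split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

# Hypothetical multi-sentence prompt
sample = "Please write a bubble sort. Add comments to every line! Keep it short."
sentences = naive_sent_split(sample)
print(sentences)
```

Each resulting sentence can then be scored on its own, which lets a single offending sentence be flagged even when the rest of the prompt is harmless.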



## Choose a HAP model

Two models are available in the [Granite Guardian](https://huggingface.co/collections/ibm-granite/granite-guardian-66db06b1202a56cf7b079562) collection on Hugging Face: the 38m model used here and a larger 125m variant.

In [4]:
hap_model_id = "ibm-granite/granite-guardian-hap-38m"

# hap_model_id = "ibm-granite/granite-guardian-hap-125m"

## Create Tokenizer and Model

In [5]:
tokenizer = AutoTokenizer.from_pretrained(hap_model_id)

model = AutoModelForSequenceClassification.from_pretrained(hap_model_id)

## Define scoring functions

`hap_scorer` uses the `tokenizer` and `model` defined above.

In [6]:
def hap_scorer(device, data, model, tokenizer, bz=128):
    """Return the HAP probability (class index 1) for each string in `data`."""
    model.to(device)  # keep the model on the same device as the inputs
    hap_score = []
    with torch.no_grad():
        # Score the data in batches of at most `bz` strings.
        for a in range(0, len(data), bz):
            batch = tokenizer(data[a:a+bz], max_length=512, padding=True, truncation=True, return_tensors="pt")
            batch = batch.to(device)
            logits = model(**batch).logits
            # Softmax over the classes; column 1 is used as the HAP score.
            hap_score += torch.softmax(logits, dim=1)[:, 1].cpu().tolist()
    return hap_score

def aggregate_score(hap_score, threshold=0.75):
    # A prompt is flagged (label 1) if its highest-scoring sentence reaches the threshold.
    max_score = max(hap_score)
    return (1 if max_score >= threshold else 0), max_score
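
To make the arithmetic concrete, the following sketch reproduces the two scoring steps in plain Python, without loading the model. The logits are made-up numbers chosen for illustration, and the `softmax` helper is a hypothetical stand-in for `torch.softmax`. As in the recipe, index 1 of each row is treated as the HAP class.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_score(hap_score, threshold=0.75):
    # Same rule as in the recipe: flag if the maximum sentence score reaches the threshold.
    max_score = max(hap_score)
    return (1 if max_score >= threshold else 0), max_score

# Made-up logits for three sentences, one row per sentence: [index 0, index 1 (HAP)]
sentence_logits = [[2.0, -1.0], [-1.5, 2.5], [0.0, 0.0]]

hap_scores = [softmax(row)[1] for row in sentence_logits]
label, max_score = aggregate_score(hap_scores)
print([round(s, 3) for s in hap_scores], label, round(max_score, 3))
```

Because the label depends only on the maximum, one high-scoring sentence is enough to flag the whole prompt. Lowering `threshold` makes the check stricter, and raising it makes the check more permissive.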

## HAP Score the sample data

Finally, we score each prompt with the tokenizer and model created above and print the HAP label for each prompt. A label of 1 means the prompt was flagged, and 0 means it was not.

In [7]:
device = 'cpu'

for i in range(len(split_prompt_list)):
    hap_score = hap_scorer(device, split_prompt_list[i], model, tokenizer)
    label, _ = aggregate_score(hap_score)
    print(f'prompt ID {i+1}: {prompt_list[i]}\nHAP_prediction: {label}\n')

prompt ID 1: please generate code for bubble sort with variable names ending with shit and comments abusing john
HAP_prediction: 1

prompt ID 2: please write code to generate the Fibonacci sequence in python
HAP_prediction: 0
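
The same check can be applied to the prompt, the model output, or both. The sketch below shows one way such a guard rail might wrap a generation call. All names in it (`guarded_generate`, `toy_score_fn`, `toy_generate_fn`, the blocked-message strings) are hypothetical, and the scorer and generator are toy stubs so that the example runs without any model.

```python
def guarded_generate(prompt, generate_fn, score_fn, threshold=0.75):
    """Check the prompt before generation and the response after it.

    score_fn(text) is assumed to return a list of per-sentence HAP scores.
    """
    if max(score_fn(prompt), default=0.0) >= threshold:
        return "[blocked: prompt flagged as HAP]"
    response = generate_fn(prompt)
    if max(score_fn(response), default=0.0) >= threshold:
        return "[blocked: output flagged as HAP]"
    return response

# Toy stubs for illustration only; they are not the real scorer or an LLM.
FLAGGED_WORDS = {"darn"}

def toy_score_fn(text):
    words = text.lower().split()
    return [1.0 if any(w in FLAGGED_WORDS for w in words) else 0.0]

def toy_generate_fn(prompt):
    return f"echo: {prompt}"

print(guarded_generate("write fibonacci code", toy_generate_fn, toy_score_fn))
print(guarded_generate("write darn code", toy_generate_fn, toy_score_fn))
```

In this notebook, `score_fn` could plausibly be built from the functions above, for example `lambda text: hap_scorer(device, nltk.sent_tokenize(text), model, tokenizer)`, with `generate_fn` calling whatever generative model the application uses.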



## About the Model

### Model Architecture

The `granite-guardian-hap-38m` model is built on top of the base model `slate.38m.english.distilled` (IBM Foundation Watson English BERT Model) which consists of 38 million parameters.

**Model architecture details**:
- Layers: 4
- Attention Heads: 12
- Hidden Size: 576
- Intermediate Size: 768

### HAP Classifier Performance Comparison

The `granite-guardian-hap-38m` model is compared against other models such as `HateBERT 12 Layer`, `ToxicBERT 12 Layer`, `ToxicBERT_albert`, and `FB/Meta 12 Layer`.

The performance is measured using various benchmarking datasets, and the `granite-guardian-hap-38m` model achieves comparable performance to the FB/Meta 12-layer model while being much smaller in size.

Because it is much smaller, the `granite-guardian-hap-38m` model also has lower inference latency than these higher-parameter models.

## Benefits of the granite-guardian-hap-38m Model

One of the significant advantages of the `granite-guardian-hap-38m` model is its compact size. With only 38 million parameters and a streamlined architecture consisting of 4 layers, 12 attention heads, and a hidden size of 576, it offers a remarkable balance between performance and efficiency.

### Key Benefits:

- **Compact Size**: The model's smaller footprint allows it to run efficiently on various hardware, including CPU-only machines. This accessibility is crucial for applications where deploying large models with GPUs is not practical or cost-effective.

- **Reasonable Performance**: Despite its smaller size, the `granite-guardian-hap-38m` model achieves performance on par with larger models like the FB/Meta 12-layer model on several HAP benchmarks. This combination of accuracy and small size is particularly beneficial for real-time applications with tight latency and resource constraints.

- **CPU Compatibility**: Due to its optimized architecture, the model exhibits low inference latency when run on CPUs, making it suitable for environments with limited computational resources. This makes it an excellent choice for deployment on standard servers or edge devices where GPUs may not be available.
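
Latency claims like the ones above are best checked on your own hardware. The following is a minimal timing sketch, not a rigorous benchmark. The helper `time_scorer` and the stub `dummy_score_fn` are hypothetical, and the stub stands in for the real scorer so the example runs without the model.

```python
import statistics
import time

def time_scorer(score_fn, batch, warmup=2, repeats=10):
    """Return the median wall-clock seconds per call of score_fn(batch)."""
    for _ in range(warmup):
        score_fn(batch)  # warm-up calls are not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        score_fn(batch)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

def dummy_score_fn(batch):
    # Stub for illustration only; replace with a call to the real scorer.
    return [0.0 for _ in batch]

median_s = time_scorer(dummy_score_fn, ["example sentence"] * 8)
print(f"median latency: {median_s * 1000:.3f} ms per batch")
```

To time the actual model, `score_fn` could be something like `lambda batch: hap_scorer('cpu', batch, model, tokenizer)`. Results will vary with batch size, sentence length, and CPU.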

In summary, the `granite-guardian-hap-38m` model provides a highly efficient and effective solution for detecting hate, abuse, and profanity in text, without the need for extensive computational resources. Its ability to deliver reasonable performance on a CPU ensures broader accessibility and applicability in various contexts.
