# Distributed Data Classification with NeMo Curator's `InstructionDataGuardClassifier`

This notebook demonstrates the use of NeMo Curator's `InstructionDataGuardClassifier`. The [Instruction Data Guard classifier](https://huggingface.co/nvidia/instruction-data-guard) is built on NVIDIA's [Aegis safety classifier](https://huggingface.co/nvidia/Aegis-AI-Content-Safety-LlamaGuard-Defensive-1.0) and is designed to detect LLM poisoning trigger attacks. For more information about the classifier, see its Hugging Face page: https://huggingface.co/nvidia/instruction-data-guard.

As with the `AegisClassifier`, you must first request access to Llama Guard on Hugging Face: https://huggingface.co/meta-llama/LlamaGuard-7b. Then set up a [user access token](https://huggingface.co/docs/hub/en/security-tokens) and pass it to the constructor of this classifier.

The Instruction Data Guard classifier is accelerated using [CrossFit](https://github.com/rapidsai/crossfit), a library that leverages intelligent batching and RAPIDS to accelerate offline inference on large datasets.

Before running this notebook, please see this [Getting Started](https://github.com/NVIDIA/NeMo-Curator?tab=readme-ov-file#get-started) page for instructions on how to install NeMo Curator.

In [1]:
# Silence Warnings (HuggingFace internal warnings)

%env PYTHONWARNINGS=ignore
import warnings
warnings.filterwarnings("ignore")



In [2]:
from nemo_curator import get_client
from nemo_curator.classifiers import InstructionDataGuardClassifier
from nemo_curator.datasets import DocumentDataset
import cudf
import dask_cudf

In [3]:
client = get_client(cluster_type="gpu")

cuDF Spilling is enabled


# Set Output File Path

Specify an empty directory below for storing the output results.

In [None]:
output_file_path = "./instruction_data_guard_results/"

# Prepare Text Data and Initialize Classifier

In [5]:
# For security reasons, we only give a benign example here
instruction = "Find a route between San Diego and Phoenix which passes through Nevada"
input_ = ""
response = "Drive to Las Vegas with highway 15 and from there drive to Phoenix with highway 93"
benign_sample_text = f"Instruction: {instruction}. Input: {input_}. Response: {response}."

# Create sample DataFrame
text = [benign_sample_text]
df = cudf.DataFrame({"text": text})
input_dataset = DocumentDataset(dask_cudf.from_cudf(df, npartitions=1))
write_to_filename = False

# Alternatively, read existing directory of JSONL files
# input_file_path="/input_data_dir/"
# input_dataset = DocumentDataset.read_json(
#     input_file_path, backend="cudf", add_filename=True
# )
# write_to_filename = True
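
The benign sample above follows an `Instruction: ... Input: ... Response: ...` template. When preparing more than one record, a small helper can keep that formatting consistent. The sketch below is illustrative only: `format_sample` and the example records are hypothetical and not part of NeMo Curator, and it assumes the template from the benign example is the one you want for every row.

```python
def format_sample(instruction: str, input_: str, response: str) -> str:
    """Format one record using the same template as the benign example above."""
    return f"Instruction: {instruction}. Input: {input_}. Response: {response}."


# Hypothetical records; replace with your own instruction-tuning data
records = [
    {"instruction": "Translate to French", "input_": "Hello", "response": "Bonjour"},
    {"instruction": "Name a prime number", "input_": "", "response": "7"},
]

# One formatted string per record, ready to become the "text" column
texts = [format_sample(**record) for record in records]
```

The resulting `texts` list could be used in place of `text` when building the `cudf.DataFrame` in the cell above.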

In [6]:
# Replace with your user access token
token = "hf_1234"

classifier = InstructionDataGuardClassifier(token=token)
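
A token written directly into a notebook is easy to leak when the notebook is shared. One option, sketched here, is to read it from an environment variable instead. The variable name `HF_TOKEN` and the helper `load_hf_token` are assumptions made for illustration; the classifier itself only needs the token string.

```python
import os


def load_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token stored in the given environment variable."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a clear message instead of at model download time
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Hugging Face access token"
        )
    return token
```

With this in place, the classifier could be constructed as `InstructionDataGuardClassifier(token=load_hf_token())`.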

# Run the Classifier

Dask operations are lazy, so the classifier will not run until we call an eager operation like `to_json`, `compute`, or `persist`.

In [7]:
%%time

result_dataset = classifier(dataset=input_dataset)
result_dataset.to_json(output_path=output_file_path, write_to_filename=write_to_filename)

Starting Instruction Data Guard classifier inference


Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00,  3.25it/s]
From v4.47 onwards, when a model cache is to be returned, `generate` will return a `Cache` instance instead by default (as opposed to the legacy tuple of tuples format). If you want to keep returning the legacy format, please set `return_legacy_cache=True`.


Writing to disk complete for 1 partition(s)
CPU times: user 2.51 s, sys: 1.7 s, total: 4.21 s
Wall time: 21.2 s


# Inspect the Output

In [8]:
output_dataset = DocumentDataset.read_json(output_file_path, backend="cudf", add_filename=write_to_filename)
output_dataset.head(1)

Reading 1 files


|   | instruction_data_guard_poisoning_score | is_poisoned | text |
|---|---|---|---|
| 0 | 0.011496 | False | Instruction: Find a route between San Diego an... |
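
Each output record carries an `instruction_data_guard_poisoning_score` and an `is_poisoned` flag alongside the original `text`. A likely next step is to keep only the rows that were not flagged. The sketch below uses only the Python standard library and assumes the output directory contains JSON Lines files with a `.jsonl` extension; `load_clean_records` is a hypothetical helper, not part of NeMo Curator.

```python
import json
from pathlib import Path


def load_clean_records(output_dir: str) -> list[dict]:
    """Read every .jsonl file in output_dir and keep records not flagged as poisoned."""
    clean = []
    for path in sorted(Path(output_dir).glob("*.jsonl")):
        with path.open(encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                record = json.loads(line)
                if not record["is_poisoned"]:
                    clean.append(record)
    return clean
```

For example, `load_clean_records(output_file_path)` would return the benign sample above, since its `is_poisoned` value is `False`. This approach loads everything into host memory, so it is only suitable for small outputs.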
