# Toxicity Detection Pipeline

The toxicity detection pipeline is designed to identify and classify toxic content in text data. 

+ The pipeline can be configured either with YAML files or programmatically, as shown below.
+ The pipeline can be run in different ways: you can call the `detect_toxicity` function directly, or set up a Gradio web application as described in the README.md.

This notebook demonstrates **how to set up and run the pipeline using a Hugging Face API key**. There are several possibilities to execute this notebook. You can, for instance,

1. execute this notebook on Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/debatelab/toxicity-detector/blob/master/toxicity_pipeline_intro.ipynb), or
2. execute this notebook locally, for instance in [JupyterLab](https://jupyter.org/). This option requires Python to be installed on your machine.


## Prerequisites: Using a Hugging Face API Key
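The pipeline needs a Hugging Face API key (an access token created in your Hugging Face account settings). Below is a minimal sketch for supplying it without typing it into a code cell, assuming you keep it in an environment variable. The variable name `HF_TOKEN` and the helper `ensure_api_key` are our own illustrative choices, not part of `toxicity_detector`.

```python
import os
from getpass import getpass


def ensure_api_key(var_name: str = "HF_TOKEN") -> str:
    """Return the API key stored in the environment variable `var_name`.

    If the variable is unset or empty, prompt for the key with getpass (so it
    is not echoed or saved in the notebook output) and store it in the
    environment for the rest of this session.
    """
    key = os.environ.get(var_name)
    if not key:
        key = getpass(f"Enter your Hugging Face API key ({var_name}): ")
        os.environ[var_name] = key
    return key
```

You could then write `'api_key': ensure_api_key()` in the model entry of the configuration instead of the literal placeholder.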

## Installing the required packages

In [None]:
%pip install toxicity-detector
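After `%pip install`, a quick check that the package is visible to the running kernel can save some confusion. This is a small sketch using the standard library's `importlib.metadata`; `installed_version` is our own helper name. If the check fails right after installing, you may need to restart the kernel.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str = "toxicity-detector"):
    """Return the installed version string of `distribution`, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Print the version, or a hint if the package cannot be found.
print(installed_version() or "toxicity-detector not found; rerun the install cell or restart the kernel.")
```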

## Configuration of the pipeline

In [None]:
from toxicity_detector import detect_toxicity
from toxicity_detector.config import PipelineConfig

pipeline_config = PipelineConfig(
    # We use the Llama-3.2-3B model via the Hugging Face API
    # This model is very small and only suitable for demonstration purposes.
    # For serious use cases, consider using a larger and newer model.
    used_chat_model='Llama-3.2-3B',
    local_base_path='.',
    result_data_path='result_data',
    log_path='logs',
    models={
        'Llama-3.2-3B': {
            'name': 'Llama-3.2-3B',
            'description': 'Llama-3.2-3B over together.ai',
            'base_url': 'https://router.huggingface.co/together/v1',
            'model': 'meta-llama/Llama-3.2-3B-Instruct-Turbo',
            # The API key is passed directly here for demonstration purposes.
            # In practice, store the key in an environment variable and pass
            # the name of that variable instead, e.g.,
            # 'api_key_name': '<YOUR_ENV_VARIABLE_NAME>',
            'api_key': '<YOUR_HUGGINGFACE_API_KEY>',
            'llm_chain': 'chat-chain',
            'model_kwargs': {
                'max_tokens': 1024,
                'temperature': 0.2,
            }
        },
    }
)
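The comments in the configuration cell recommend referencing an environment variable instead of embedding the key. Here is a sketch of that variant, assuming the pipeline resolves `api_key_name` by reading the named environment variable at run time. `HF_TOKEN`, `llama_entry`, and the helper `check_model_entry` are illustrative names, and the helper only checks the keys used in this notebook; the library may accept or require others.

```python
import os
from typing import List

# Hypothetical name of the environment variable that holds the key.
API_KEY_ENV_VAR = "HF_TOKEN"

# Same model entry as in the configuration cell, but with 'api_key_name'
# in place of 'api_key' (assumption: the pipeline looks the key up itself).
llama_entry = {
    'name': 'Llama-3.2-3B',
    'description': 'Llama-3.2-3B over together.ai',
    'base_url': 'https://router.huggingface.co/together/v1',
    'model': 'meta-llama/Llama-3.2-3B-Instruct-Turbo',
    'api_key_name': API_KEY_ENV_VAR,
    'llm_chain': 'chat-chain',
    'model_kwargs': {'max_tokens': 1024, 'temperature': 0.2},
}


def check_model_entry(entry: dict) -> List[str]:
    """Return a list of problems found in a model entry (empty if none)."""
    problems = []
    # Keys used in this notebook's example entry.
    for key in ('name', 'base_url', 'model', 'llm_chain'):
        if key not in entry:
            problems.append(f"missing key: {key}")
    # Our own convention: a literal key or a variable name, but not both.
    if ('api_key' in entry) == ('api_key_name' in entry):
        problems.append("set exactly one of 'api_key' or 'api_key_name'")
    # If a variable name is given, the variable should exist in this session.
    env_var = entry.get('api_key_name')
    if env_var is not None and env_var not in os.environ:
        problems.append(f"environment variable {env_var!r} is not set")
    return problems
```

If `check_model_entry(llama_entry)` returns an empty list, the entry could be passed to `PipelineConfig` as `models={'Llama-3.2-3B': llama_entry}`.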

## Running the pipeline via `detect_toxicity`

The function `detect_toxicity` runs the toxicity detection pipeline on the provided input text, returns the result, and stores it in the specified result data path. The stored YAML file contains all information needed to reproduce the run, as well as the full reasoning trace.
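Since each run is stored as a YAML file under the result data path, it can be handy to locate the most recent one. This is a sketch assuming the results are written as `.yaml` or `.yml` files somewhere below `<local_base_path>/<result_data_path>`; the exact directory layout and file naming are not described here, and `latest_result_file` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def latest_result_file(local_base_path: str = '.', result_data_path: str = 'result_data') -> Optional[Path]:
    """Return the most recently modified YAML file below the result data path, or None."""
    root = Path(local_base_path) / result_data_path
    if not root.is_dir():
        return None
    # Search recursively, since the layout below the result path is an assumption.
    candidates = [p for p in root.rglob('*') if p.is_file() and p.suffix in ('.yaml', '.yml')]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

With the configuration above, `path = latest_result_file('.', 'result_data')` followed by `print(path.read_text(encoding='utf-8'))` would show the stored record of the latest run, if one exists.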

In [None]:
from IPython.display import display, Markdown

input_text = 'Peter is dumb.'

result = detect_toxicity(
    input_text=input_text,
    user_input_source=None, 
    toxicity_type='personalized_toxicity',
    context_info=None,
    pipeline_config=pipeline_config,
    serialize_result=True,
)

# Display results as formatted markdown
analysis = result.answer['analysis_result']
contains_toxicity = result.answer['contains_toxicity']

markdown_output = f"""
## 🔍 Toxicity Analysis Results

### Input Text
> {input_text}

### Contains Toxicity
**{'⚠️ YES' if contains_toxicity else '✅ NO'}**

### Analysis Result
{analysis}
"""

display(Markdown(markdown_output))
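To analyze several texts, the call above can be wrapped in a loop. This is a minimal sketch that relies only on what the cell above uses (a result object whose `answer` dict holds `'contains_toxicity'`). `run_batch` is a hypothetical helper, and the detector is passed in as a callable so the loop does not depend on a particular configuration.

```python
from typing import Callable, Iterable, List, Tuple


def run_batch(texts: Iterable[str], detect: Callable[[str], object]) -> List[Tuple[str, bool]]:
    """Run `detect` on each text and collect (text, contains_toxicity) pairs.

    `detect` must return an object with an `.answer` dict containing the key
    'contains_toxicity', as the result of detect_toxicity does above.
    """
    rows = []
    for text in texts:
        result = detect(text)
        rows.append((text, bool(result.answer['contains_toxicity'])))
    return rows
```

For example: `run_batch(['First text.', 'Second text.'], lambda t: detect_toxicity(input_text=t, user_input_source=None, toxicity_type='personalized_toxicity', context_info=None, pipeline_config=pipeline_config, serialize_result=True))`. Each call queries the model separately, so runtime and API usage grow with the number of texts.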