# Evaluating Opik's Moderation Metric

*This cookbook was created from a Jupyter notebook, which can be found [here](TBD).*

In this guide we evaluate the Moderation metric included in the LLM Evaluation SDK. This showcases both how to use the `evaluation` functionality in the platform and the quality of the Moderation metric itself.

In [2]:
# Configure the Opik endpoint and the OpenAI API key
import os
import getpass

os.environ["COMET_URL_OVERRIDE"] = "http://localhost:5173/api"
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API key: ")

We will be using the [OpenAI Moderation API Release dataset](https://github.com/openai/moderation-api-release/tree/main/data). According to this [blog post](https://openai.com/index/using-gpt-4-for-content-moderation/), GPT-4o detects ~60% of moderation violations. The first step is to create a dataset in the platform so that we can keep track of the evaluation results.

In [3]:
# Create dataset
from opik import Opik, DatasetItem
import pandas as pd
import requests
from io import BytesIO

client = Opik()
try:
    # Create dataset
    dataset = client.create_dataset(name="OpenAIModerationDataset", description="OpenAI Moderation Dataset")

    # Insert items into dataset
    url = "https://github.com/openai/moderation-api-release/raw/main/data/samples-1680.jsonl.gz"
    response = requests.get(url)
    response.raise_for_status()  # Fail early if the download did not succeed
    df = pd.read_json(BytesIO(response.content), lines=True, compression='gzip')

    df = df.sample(n=500, random_state=42)
    
    dataset_records = []
    for x in df.to_dict(orient="records"):
        # Category columns of the dataset; an item is "moderated" if any of them is flagged with 1
        moderation_fields = ["S", "H", "V", "HR", "SH", "S3", "H2", "V2"]
        moderated_fields = [field for field in moderation_fields if x[field] == 1.0]
        expected_output = "moderated" if moderated_fields else "not_moderated"

        dataset_records.append(
            DatasetItem(
                input = {
                    "input": x["prompt"]
                },
                expected_output = {
                    "expected_output": expected_output,
                    "moderated_fields": moderated_fields
                }
            ))
    
    dataset.insert(dataset_records)

except Exception as e:
    print(e)

status_code: 409, body: {'errors': ['Dataset already exists']}
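
To make the labeling rule in the cell above concrete, here is a minimal standalone sketch of it. The sample records are hypothetical (not taken from the dataset), and it assumes that categories without a label show up as `NaN` after loading with pandas. Unlike the cell above, it uses `dict.get`, so a missing column is treated as "not flagged" instead of raising a `KeyError`.

```python
import math

# Category columns used in the cell above
MODERATION_FIELDS = ["S", "H", "V", "HR", "SH", "S3", "H2", "V2"]


def label_record(record: dict) -> dict:
    """Derive the expected label from the per-category flags of one record."""
    # A missing key (None) or NaN never equals 1.0, so unlabeled
    # categories are treated as not flagged.
    moderated_fields = [f for f in MODERATION_FIELDS if record.get(f) == 1.0]
    return {
        "expected_output": "moderated" if moderated_fields else "not_moderated",
        "moderated_fields": moderated_fields,
    }


# Hypothetical records for illustration only
flagged = {"prompt": "example text A", "S": 0.0, "H": 1.0, "V": math.nan, "H2": 1.0}
clean = {"prompt": "example text B", "S": 0.0, "H": 0.0, "V": math.nan}

print(label_record(flagged))
print(label_record(clean))
```

A single flagged category is enough to mark the whole item as `moderated`; the list of flagged categories is kept alongside the label so it can be inspected later.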


In [1]:
from opik.evaluation.metrics import Moderation
from opik.evaluation import evaluate
from opik.evaluation.metrics import base_metric, score_result
from opik import Opik, DatasetItem

client = Opik()

class CheckModerated(base_metric.BaseMetric):
    def __init__(self, name: str):
        self.name = name

    def score(self, moderation_score, moderation_reason, expected_moderation_score, **kwargs):
        # The task returns None when the Moderation metric call failed,
        # so check for that before comparing against the threshold.
        scoring_failed = moderation_score is None
        predicted_label = None
        if not scoring_failed:
            predicted_label = "moderated" if moderation_score > 0.5 else "not_moderated"

        return score_result.ScoreResult(
            value=None if scoring_failed else predicted_label == expected_moderation_score,
            name=self.name,
            reason=f"Got the moderation score of {predicted_label} and expected {expected_moderation_score}",
            scoring_failed=scoring_failed
        )

def evaluation_task(x: DatasetItem):
    metric = Moderation()
    try:
        metric_score = metric.score(
            input= x.input["input"]
        )
        moderation_score = metric_score.value
        moderation_reason = metric_score.reason
    except Exception as e:
        print(e)
        moderation_score = None
        moderation_reason = str(e)
    
    return {
        "moderation_score": moderation_score,
        "moderation_reason": moderation_reason,
        "expected_moderation_score": x.expected_output["expected_output"]
    }

dataset = client.get_dataset(name="OpenAIModerationDataset")

res = evaluate(
    experiment_name="Check Comet Metric",
    dataset=dataset,
    task=evaluation_task,
    scoring_metrics=[CheckModerated(name="Detected Moderation")]
)

Running tasks: 100%|██████████| 500/500 [00:34<00:00, 14.44it/s]
Scoring outputs: 100%|██████████| 500/500 [00:00<00:00, 379712.48it/s]


The Moderation metric detects ~85% of moderation violations. This could be improved further by providing some additional examples to the model.
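
The notebook does not show how the headline figure is aggregated. As written, `CheckModerated` compares labels on every item, so averaging its score gives overall agreement (accuracy), which is not the same as the share of actual violations that were caught (recall). The sketch below computes both from plain `(predicted, expected)` label pairs. The `summarize` helper and the sample pairs are hypothetical and are not part of the Opik SDK.

```python
from typing import Optional


def summarize(pairs: list[tuple[Optional[str], str]]) -> dict:
    """Compute accuracy and recall from (predicted, expected) label pairs.

    A predicted label of None marks an item whose scoring failed; it is skipped.
    """
    scored = [(p, e) for p, e in pairs if p is not None]
    correct = sum(p == e for p, e in scored)
    # Recall only looks at items that are actual violations
    violations = [(p, e) for p, e in scored if e == "moderated"]
    caught = sum(p == "moderated" for p, _ in violations)
    return {
        "scored": len(scored),
        "accuracy": correct / len(scored) if scored else None,
        "recall": caught / len(violations) if violations else None,
    }


# Hypothetical results for illustration only
pairs = [
    ("moderated", "moderated"),
    ("not_moderated", "moderated"),
    ("not_moderated", "not_moderated"),
    ("moderated", "not_moderated"),
    ("moderated", "moderated"),
    (None, "moderated"),  # scoring failed, excluded from both figures
]

print(summarize(pairs))
```

Reporting both numbers makes it clear whether misses come from violations slipping through or from harmless text being flagged.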