
Added NSFW text validator#83

Open
rkritika1508 wants to merge 15 commits into feat/toxicity-hub-validators from feat/toxicity-huggingface-model

Conversation


@rkritika1508 rkritika1508 commented Apr 2, 2026

Summary

Target issue is #84
This PR introduces a new validator: nsfw_text, powered by Guardrails AI, with support for custom HuggingFace models.

We configure it to use:
textdetox/xlmr-large-toxicity-classifier

This significantly improves our ability to detect multilingual, code-mixed, and context-aware unsafe content, addressing gaps in the current validator stack.

Our existing validator suite (profanity, slur detection, toxic language, etc.) primarily relies on:

  • rule-based approaches
  • keyword matching
  • lighter ML models

While effective for straightforward cases, they fail in more complex real-world scenarios, such as:

  • multilingual inputs (Hindi + English mix, etc.)
  • paraphrased toxicity (non-explicit abusive phrasing)
  • regional slang / dialects
  • implicit or contextual NSFW content

Existing Validators (Context)

To make this PR self-contained, here’s a quick overview of what we already use and where they fall short:

| Validator | What it does | Limitation |
|---|---|---|
| Profanity Free | Detects explicit swear words | Misses implicit or paraphrased toxicity |
| NSFW (existing basic) | Flags explicit content | Not robust for nuanced or mixed-language input |
| Llama Guard (policy) | Policy-based filtering | Not specialized for fine-grained NSFW detection |

We need a strong model-based classifier to complement these.

What This PR Adds

1. New Validator: nsfw_text

File: backend/app/core/validators/config/nsfw_text_safety_validator_config.py

2. Custom Model Integration

Instead of using default models, we configure: textdetox/xlmr-large-toxicity-classifier

This model:

  • is multilingual (based on XLM-R)
  • handles code-mixed inputs well
  • offers better contextual understanding than keyword-based systems
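The configuration described above might look roughly like the sketch below. It assumes, per this PR's description, that the Guardrails `NSFWText` validator accepts a `model_name` override; the `threshold`, `validation_method`, and `on_fail` values are illustrative defaults, not taken from the repository.

```python
# Hedged sketch of what backend/app/core/validators/config/
# nsfw_text_safety_validator_config.py could contain.
# Assumes NSFWText exposes a model_name parameter, as this PR describes.
from guardrails.hub import NSFWText

nsfw_text_validator = NSFWText(
    model_name="textdetox/xlmr-large-toxicity-classifier",  # multilingual XLM-R classifier
    threshold=0.8,                 # illustrative confidence cutoff
    validation_method="sentence",  # classify per sentence rather than whole input
    on_fail="exception",           # illustrative failure policy
)
```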

Design Decision

Previous Approach (Discarded)

  • Directly calling HuggingFace models separately
  • Added extra dependency + integration overhead

Current Approach

  • Use Guardrails NSFWText validator
  • Plug in custom HuggingFace model via model_name

Benefits

  • No additional abstraction layer
  • Fully compatible with existing Guardrails pipeline
  • Centralized validator management
  • Easy to swap models in future
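The "easy to swap models" benefit can be illustrated with a minimal, dependency-free sketch of centralized validator configuration. All names here are hypothetical, not taken from the repository:

```python
# Hypothetical sketch: keep the model choice in one config object so it can be
# swapped without touching validator wiring. Names are illustrative only.
from dataclasses import dataclass


@dataclass(frozen=True)
class NSFWTextConfig:
    model_name: str = "textdetox/xlmr-large-toxicity-classifier"
    threshold: float = 0.8


def build_validator_kwargs(config: NSFWTextConfig) -> dict:
    """Translate the config into keyword arguments for a Guardrails validator."""
    return {"model_name": config.model_name, "threshold": config.threshold}


# Default model from this PR, and a hypothetical swap to another classifier:
default_kwargs = build_validator_kwargs(NSFWTextConfig())
swapped_kwargs = build_validator_kwargs(
    NSFWTextConfig(model_name="some-org/other-model")
)
```

Because the validator only sees the keyword arguments, replacing the model is a one-line config change.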

Checklist

Before submitting a pull request, please ensure that you complete these tasks.

  • Ran fastapi run --reload app/main.py or docker compose up in the repository root and tested the changes.
  • If you've fixed a bug or added code, ensured it is covered by test cases.

Notes

Please add any other information the reviewer may need here.

@coderabbitai

coderabbitai bot commented Apr 2, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 020b5145-2c73-453c-8f34-df5382e3be12

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@rkritika1508 rkritika1508 self-assigned this Apr 7, 2026
@rkritika1508 rkritika1508 added enhancement New feature or request ready-for-review labels Apr 7, 2026
@rkritika1508 rkritika1508 linked an issue Apr 7, 2026 that may be closed by this pull request


Development

Successfully merging this pull request may close these issues.

Add toxicity validator for HuggingFace models

3 participants