Added NSFW text validator#83
Open
rkritika1508 wants to merge 15 commits into feat/toxicity-hub-validators from
Conversation
dennyabrain reviewed Apr 7, 2026
Summary
Target issue is #84
Explain the motivation for making this change. What existing problem does the pull request solve?
This PR introduces a new validator, nsfw_text, powered by Guardrails AI, with support for custom HuggingFace models. We configure it to use:
textdetox/xlmr-large-toxicity-classifier
This significantly improves our ability to detect multilingual, code-mixed, and context-aware unsafe content, addressing gaps in the current validator stack.
While effective for straightforward cases, our existing validator suite (profanity, slur detection, toxic language, etc.) fails in more complex real-world scenarios, such as multilingual, code-mixed, or context-dependent unsafe content.
Existing Validators (Context)
To make this PR self-contained, here’s a quick overview of what we already use and where they fall short:
We need a strong model-based classifier to complement these.
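To make the gap concrete, here is a minimal, hypothetical sketch of a lexicon-style check (the blocklist contents and function name are invented for illustration): exact-token matching catches only verbatim hits, which is precisely what obfuscated or code-mixed text evades and what a model-based classifier is meant to handle.

```python
# Hypothetical lexicon entry, for illustration only.
BLOCKLIST = {"badword"}

def lexicon_flag(text: str) -> bool:
    """Flag text only when a blocklisted token appears verbatim."""
    return any(tok in BLOCKLIST for tok in text.lower().split())

print(lexicon_flag("this contains badword"))  # True: exact token match
print(lexicon_flag("this contains b@dword"))  # False: obfuscation slips through
print(lexicon_flag("yeh to badwordhai"))      # False: code-mixed, no token boundary
```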
What This PR Adds
1. New Validator: nsfw_text
File: backend/app/core/validators/config/nsfw_text_safety_validator_config.py
2. Custom Model Integration
Instead of using the default models, we configure textdetox/xlmr-large-toxicity-classifier. This model is a multilingual toxicity classifier built on XLM-RoBERTa large.
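A hedged sketch of what the new config module might contain follows; the constant name and dictionary keys are assumptions for illustration, and only the file path and model name come from this PR.

```python
# Hypothetical contents of nsfw_text_safety_validator_config.py;
# key names are illustrative, not the actual config schema.
NSFW_TEXT_VALIDATOR_CONFIG = {
    "validator": "nsfw_text",
    # Model id taken from this PR's description.
    "model_name": "textdetox/xlmr-large-toxicity-classifier",
    # Threshold and failure policy are placeholder assumptions.
    "threshold": 0.8,
    "on_fail": "exception",
}

print(NSFW_TEXT_VALIDATOR_CONFIG["model_name"])
```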
Design Decision
Previous Approach (Discarded)
Current Approach
The NSFWText validator exposes a configurable model_name parameter.
Benefits
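The configurable-model design can be sketched as follows. This toy class stands in for the real Guardrails-backed validator; everything except the default model id is an assumption, and the point is only that swapping models requires no code change, just a different model_name.

```python
from dataclasses import dataclass

@dataclass
class NSFWTextValidator:
    # Default model id taken from this PR; any HuggingFace
    # text-classification model id could be swapped in.
    model_name: str = "textdetox/xlmr-large-toxicity-classifier"

    def describe(self) -> str:
        return f"nsfw_text backed by {self.model_name}"

default_validator = NSFWTextValidator()
custom_validator = NSFWTextValidator(model_name="some-org/other-classifier")
print(default_validator.describe())
print(custom_validator.describe())
```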
Checklist
Before submitting a pull request, please ensure that you complete these tasks.
Run fastapi run --reload app/main.py or docker compose up in the repository root and test.
Notes
Please add any other information the reviewer needs here.