Language-specific profanity filtering #28900
Conversation
```ruby
# have custom filtering that takes locale into account for this word.
program = generate_program('My Custom Profanity', 'fu')
innocent_program = generate_program('My Innocent Program', 'funny tofu')
```
😂
Very nice, Brad! Just to confirm my understanding: an Italian project containing `fu` would now be allowed?

Almost: we don't use a concept of an "Italian project." After this change, any project containing the word `fu` will be allowed when the viewer's language is Italian. To get the viewer's language, we use the locale that's already passed into the share-filtering call.

Note that a project containing `fu` will still be blocked for viewers in other languages, including English.
```ruby
      return word.to_s if r =~ text
    end
    WebPurify.find_potential_profanity(text, ['en', language_code])
  end
```
An alternative I briefly considered and am now thinking might be better: For each word, if we are using one of the allowed languages, strip the word out of the text before we send it to WebPurify. Continue letting WebPurify make the go-no-go call.
- Pro: We only configure words here, we don't also have to unblock them in WebPurify.
- Pro: Our configuration has a more targeted effect on specific languages, rather than necessarily blocking the word in question for all other languages.
- Con: It seems less "correct" to modify the text this way before sending it along - I don't know if this would impact WebPurify's effectiveness (if they're using n-grams, for example).
- Con: We aren't shortcutting any WebPurify API calls.
Thoughts?
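A minimal sketch of the strip-before-sending alternative, assuming a hypothetical `ALLOWED_WORDS_BY_LANGUAGE` map (the constant name, helper name, and example entry are illustrative, not the real configuration):

```ruby
# Hypothetical per-language allow map: word => languages that allow it.
# The name and contents are illustrative, not actual code-dot-org config.
ALLOWED_WORDS_BY_LANGUAGE = {
  'fu' => %w[it]
}.freeze

# Remove words that are explicitly allowed for this viewer's language;
# the caller would then send the remaining text to WebPurify unchanged.
def strip_allowed_words(text, language_code)
  ALLOWED_WORDS_BY_LANGUAGE.each do |word, languages|
    next unless languages.include?(language_code)
    text = text.gsub(/\b#{Regexp.escape(word)}\b/i, '')
  end
  text
end
```

The `\b` word boundaries keep `funny` and `tofu` intact while removing a standalone `fu`.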
Sounds interesting, but could get weird: WebPurify also looks for things like addresses and phone numbers, right? So if you have something like `555 fu 1212` and the `fu` gets stripped, WebPurify might complain about the now-PII-like number. Unsure whether this would happen in practice, but modifying what we send is making me worry.
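To illustrate the worry with a toy example (a plain `gsub`, not the actual filtering code):

```ruby
# Stripping an allowed word out of the middle of a string can leave
# behind something that looks like a phone number.
text = '555 fu 1212'
stripped = text.gsub(/\bfu\b/i, '')
# stripped is now "555  1212", which a phone-number regex could
# plausibly flag even though the original text would not have matched.
puts stripped
```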
We've got WebPurify's PII filtering turned off right now, because we check for PII ourselves before we contact WebPurify:
`code-dot-org/lib/cdo/share_filtering.rb`, lines 32 to 42 in 4ba0d0a:

```ruby
email = RegexpUtils.find_potential_email(program_tags_removed)
return ShareFailure.new(FailureType::EMAIL, email) if email

street_address = Geocoder.find_potential_street_address(program_tags_removed)
return ShareFailure.new(FailureType::ADDRESS, street_address) if street_address

phone_number = RegexpUtils.find_potential_phone_number(program_tags_removed)
return ShareFailure.new(FailureType::PHONE, phone_number) if phone_number

expletive = WebPurify.find_potential_profanity(program_tags_removed, ['en', locale])
return ShareFailure.new(FailureType::PROFANITY, expletive) if expletive
```
But I share your concern.
IMHO the number of problematic words has been fairly small in practice, so I think requiring the manual step of adding exceptions via WebPurify is a perfectly reasonable and scrappy solution.
LP-401 Provide a wrapper around WebPurify that allows us to maintain a small language-specific custom profanity list.
How we currently filter profanity
When a user asks to view a project, we perform our own PII checks and then ask WebPurify to check for profanity in the viewer's selected language and English.
What we can't do right now
WebPurify already has support for multiple languages in its default profanity check. In the past we've used the built-in allowlist/blocklist feature to handle edge-case words we've found. Unfortunately, WebPurify does not support language-specific custom allowlists or blocklists. We've run into a few specific cases where a word we'd like to continue blocking in some languages (in particular for English viewers) should clearly be unblocked in others, for example `fu`, which is innocuous in Italian.

These words are coming back blocked, probably because we're checking them in both English and the viewer's language. We want to continue using that extra-careful strategy, but be able to add exceptions as they arise.
This solution
I've added a small wrapper around our WebPurify call that checks our own language-aware blocklist first, giving us more fine-grained control over edge cases. Now, we can choose to block a word for all languages except a specified few.
The procedure for adding a new word with a language-specific allow rule is:

1. Add the word to the `LANGUAGE_SPECIFIC_ALLOWLIST` configuration at the top of profanity_filter.rb, along with the set of ISO 639-1 codes for the languages that should allow it.
2. Unblock the word in the WebPurify dashboard, so their default check doesn't flag it for the allowed languages (see the discussion above).

A possible concern is the maintenance cost of a custom word list. To give a sense of the expected size of that list: since we started using WebPurify in October 2014, we've added two blocklist words and six allowlist words in their dashboard. This PR adds two more. I suspect we'll have fewer than twenty words on this custom list for a long, long time.
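Putting the pieces together, the wrapper described above might look roughly like this. This is a hedged sketch, not the actual profanity_filter.rb implementation: the example allowlist entry is illustrative, and the WebPurify call is stubbed so the snippet is self-contained.

```ruby
# word => ISO 639-1 codes for the languages that should allow it.
# The example entry is illustrative, not the real configuration.
LANGUAGE_SPECIFIC_ALLOWLIST = {
  'fu' => %w[it]
}.freeze

# Block any configured word unless the viewer's language allows it,
# then fall through to WebPurify for everything else.
def find_potential_profanity(text, language_code)
  LANGUAGE_SPECIFIC_ALLOWLIST.each do |word, allowed_languages|
    next if allowed_languages.include?(language_code)
    return word if /\b#{Regexp.escape(word)}\b/i =~ text
  end
  webpurify_check(text, ['en', language_code])
end

# Stand-in for WebPurify.find_potential_profanity, which makes the
# real API call in the actual codebase.
def webpurify_check(_text, _language_codes)
  nil
end
```

With this shape, an English viewer's text containing `fu` is blocked by our own list before any API call, while an Italian viewer's text falls through to WebPurify, shortcutting some API calls along the way.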
We may want to follow this work with a review of the words currently on our WebPurify allow/block lists and see if we want to introduce finer rules for any of them.