Language-specific profanity filtering #28900
Conversation
```ruby
# have custom filtering that takes locale into account for this word.
program = generate_program('My Custom Profanity', 'fu')
innocent_program = generate_program('My Innocent Program', 'funny tofu')
```
😂
Very nice, Brad! Just to confirm my understanding: an Italian project containing `fu` would now be allowed?

Almost: we don't use a concept of an "Italian project." After this change, any project containing the word `fu` will be allowed when the viewer's language is Italian. To get the viewer's language, we use the locale that's already passed into the share-filtering call.

Note that a project containing `fu` will still be blocked for viewers in other languages, including English.
```ruby
      return word.to_s if r =~ text
    end
    WebPurify.find_potential_profanity(text, ['en', language_code])
  end
```
An alternative I briefly considered and am now thinking might be better: For each word, if we are using one of the allowed languages, strip the word out of the text before we send it to WebPurify. Continue letting WebPurify make the go-no-go call.
- Pro: We only configure words here, we don't also have to unblock them in WebPurify.
- Pro: Our configuration has a more targeted effect on specific languages, rather than necessarily blocking the word in question for all other languages.
- Con: It seems less "correct" to modify the text this way before sending it along - I don't know if this would impact WebPurify's effectiveness (if they're using n-grams, for example).
- Con: We aren't shortcutting any WebPurify API calls.
Thoughts?
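A minimal sketch of the strip-before-sending alternative, assuming a hypothetical `ALLOWED_WORDS_BY_LANGUAGE` map (the constant name, helper name, and example entry are illustrative, not the real configuration):

```ruby
# Hypothetical per-language allow map: word => languages that allow it.
# The name and contents are illustrative, not actual code-dot-org config.
ALLOWED_WORDS_BY_LANGUAGE = {
  'fu' => %w[it]
}.freeze

# Remove words that are explicitly allowed for this viewer's language;
# the caller would then send the remaining text to WebPurify unchanged.
def strip_allowed_words(text, language_code)
  ALLOWED_WORDS_BY_LANGUAGE.each do |word, languages|
    next unless languages.include?(language_code)
    text = text.gsub(/\b#{Regexp.escape(word)}\b/i, '')
  end
  text
end
```

The `\b` word boundaries keep `funny` and `tofu` intact while removing a standalone `fu`.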
Sounds interesting, but could get weird: WebPurify also looks for things like addresses and phone numbers, right? So if you have something like `555 fu 1212` and the `fu` gets stripped, WebPurify might complain about the now-PII-like number. Unsure whether this would happen in practice, but modifying what we send is making me worry.
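To illustrate the worry with a toy example (a plain `gsub`, not the actual filtering code):

```ruby
# Stripping an allowed word out of the middle of a string can leave
# behind something that looks like a phone number.
text = '555 fu 1212'
stripped = text.gsub(/\bfu\b/i, '')
# stripped is now "555  1212", which a phone-number regex could
# plausibly flag even though the original text would not have matched.
puts stripped
```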
We've got WebPurify's PII filtering turned off right now, because we check for PII ourselves before we contact WebPurify:
`code-dot-org/lib/cdo/share_filtering.rb`, lines 32 to 42 in 4ba0d0a:

```ruby
email = RegexpUtils.find_potential_email(program_tags_removed)
return ShareFailure.new(FailureType::EMAIL, email) if email

street_address = Geocoder.find_potential_street_address(program_tags_removed)
return ShareFailure.new(FailureType::ADDRESS, street_address) if street_address

phone_number = RegexpUtils.find_potential_phone_number(program_tags_removed)
return ShareFailure.new(FailureType::PHONE, phone_number) if phone_number

expletive = WebPurify.find_potential_profanity(program_tags_removed, ['en', locale])
return ShareFailure.new(FailureType::PROFANITY, expletive) if expletive
```
But I share your concern.
IMHO the number of problematic words has been fairly small in practice, so I think requiring the manual step of adding exceptions via WebPurify is a perfectly reasonable and scrappy solution.
LP-401 Provide a wrapper around WebPurify that allows us to maintain a small language-specific custom profanity list.
How we currently filter profanity
When a user asks to view a project, we perform our own PII checks and then ask WebPurify to check for profanity in the viewer's selected language and English.
What we can't do right now
WebPurify already has support for multiple languages in its default profanity check. In the past we've used the built-in allowlist/blocklist feature to handle edge-case words we've found. Unfortunately, WebPurify does not support language-specific custom allowlists or blocklists. We've run into a few specific cases where a word we'd like to continue blocking in some languages (in particular for English viewers) should clearly be unblocked in others, for example `fu`, which is innocuous in Italian.

These words are coming back blocked, probably because we're checking them in both English and the viewer's language. We want to continue using that extra-careful strategy, but be able to add exceptions as they arise.
This solution
I've added a small wrapper around our WebPurify call that checks our own language-aware blocklist first, giving us more fine-grained control over edge cases. Now, we can choose to block a word for all languages except a specified few.
The procedure for adding a new word with a language-specific allow rule is:

1. Add the word to the `LANGUAGE_SPECIFIC_ALLOWLIST` configuration at the top of profanity_filter.rb, along with the set of ISO 639-1 codes for the languages that should allow it.
2. Unblock the word in the WebPurify dashboard, so their default check doesn't flag it for the allowed languages (see the discussion above).

A possible concern is the maintenance cost of a custom word list. To give a sense of the expected size of that list: since we started using WebPurify in October 2014, we've added two blocklist words and six allowlist words in their dashboard. This PR adds two more. I suspect we'll have fewer than twenty words on this custom list for a long, long time.
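Putting the pieces together, the wrapper described above might look roughly like this. This is a hedged sketch, not the actual profanity_filter.rb implementation: the example allowlist entry is illustrative, and the WebPurify call is stubbed so the snippet is self-contained.

```ruby
# word => ISO 639-1 codes for the languages that should allow it.
# The example entry is illustrative, not the real configuration.
LANGUAGE_SPECIFIC_ALLOWLIST = {
  'fu' => %w[it]
}.freeze

# Block any configured word unless the viewer's language allows it,
# then fall through to WebPurify for everything else.
def find_potential_profanity(text, language_code)
  LANGUAGE_SPECIFIC_ALLOWLIST.each do |word, allowed_languages|
    next if allowed_languages.include?(language_code)
    return word if /\b#{Regexp.escape(word)}\b/i =~ text
  end
  webpurify_check(text, ['en', language_code])
end

# Stand-in for WebPurify.find_potential_profanity, which makes the
# real API call in the actual codebase.
def webpurify_check(_text, _language_codes)
  nil
end
```

With this shape, an English viewer's text containing `fu` is blocked by our own list before any API call, while an Italian viewer's text falls through to WebPurify, shortcutting some API calls along the way.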
We may want to follow this work with a review of the words currently on our WebPurify allow/block lists and see if we want to introduce finer rules for any of them.