Testing framework needed to assist slur filtering efforts #1337
Comments
I think that's an excellent example of a problem that a programmer doesn't necessarily have the right background to solve. A rule for handles is probably not going to work universally, e.g., as a general content filter: someone who posts exclusively in French probably shouldn't get dinged for an English slur.
As another example, the word "Nazi" probably doesn't need to be in anyone's handle, even if something like NaziPunksFuckOff is collateral damage. But at the same time, "Nazi" probably should be allowed in a block list, e.g., "List of Nazis and White Supremacist accounts".
This is why Bluesky needs to engage with disabled people and POC. We are so tirrrrrrred of the word games and linguistic gymnastics.
I think that's a really good explanation of why there need to be dedicated employees for this task. It's a lot of work, and the people who are affected most by it don't really have a lot of time and energy to spend on it, unless they're getting paid to do it as their job.
I found it!
A lot of people here are new to GitHub, so here's a bit of background. In this "pull request" the developer @dholms added a list of slurs to filter out. Shortly after, a second PR was made by @bnewbold that put things in alphabetical order and removed a few that the developer disagreed with. You can click on "Files changed" to see the differences.
Looks like there's a start by @dholms
Yup, I'm deploying these changes now: #1336. We made a more sophisticated flagging system that looks for misspellings and slurs within words. We outright ban explicit slurs, and we flag words that may contain slurs for review by human moderators, who can weigh the context. We opted to keep the slur list that we use for flagging private: keeping it public would let bad actors find loopholes. For the same reason, we won't be doing a public testing framework. Of course, in-app reports can help us keep our list up to date.
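For readers following along, here is a minimal sketch of what a tiered system like the one described might look like. Everything below is a hypothetical placeholder ("badword" stands in for real entries, and the normalization rules are illustrative), since the actual list and matching logic are private.

```ts
// Hypothetical sketch of a tiered matcher, not Bluesky's actual code.

type Verdict = 'ban' | 'flag-for-review' | 'allow'

const EXPLICIT_SLURS = ['badword']          // placeholder entries
const AMBIGUOUS_TERMS = ['queer', 'niger']  // innocent in many contexts

// Undo common obfuscations (leetspeak digits, separators) so that
// "b4d.w0rd" and "badword" normalize to the same string.
function normalize(handle: string): string {
  const subs: Record<string, string> = {
    '0': 'o', '1': 'i', '3': 'e', '4': 'a', '@': 'a', '$': 's', '5': 's',
  }
  return handle
    .toLowerCase()
    .split('')
    .map((ch) => subs[ch] ?? ch)
    .join('')
    .replace(/[^a-z]/g, '') // drop dots, dashes, underscores, etc.
}

function classify(handle: string): Verdict {
  const text = normalize(handle)
  if (EXPLICIT_SLURS.some((w) => text.includes(w))) return 'ban'
  if (AMBIGUOUS_TERMS.some((w) => text.includes(w))) return 'flag-for-review'
  return 'allow'
}

classify('b4d.w0rd')       // 'ban'
classify('queer-history')  // 'flag-for-review' -> a human moderator decides
```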
They need to implement this with internationalization out of the box. English users are obviously the loudest, and because the developers speak English and are presumably from the US, they hear the concerns of English speakers first and loudest. But there are tons of other languages being spoken on Bluesky, whose users deserve to have a safe experience too.
I am not trotting anything out here; that is a genuine concern. It's not just French: in Catalan it also means delay. "El meu tren va amb retard" translates as "My train is running late". I get that "retard" is a slur in English, but it turns out English is not the only language spoken on Earth, and non-English speakers also deserve to exist... If you guys were a decent platform you'd enable federation and leave the moderation to actual human beings, who can decide for themselves whether someone is being ableist or just French. ActivityPub works perfectly fine, trust me. But alternatively, have you looked into Large Language Models? They'd probably do a better job of deciding whether a word is a slur or not than a "dumb" regex...
There is currently a lot of interest in creating filters to prevent users from using slurs in handles, and these could presumably be extended to other features like lists, display names, etc.
Just in the last few days:
#1326
#1324
#1323
#1322
#1321
#1332
#1319
#1318
This may seem like a straightforward feature to implement, but it rapidly becomes more complex as more languages are considered, and especially if regex filters are used to try to prevent variations and workarounds.
Numerous examples of this kind of filtering causing problems can be found in the Wikipedia article on the Scunthorpe problem.
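To make that complexity concrete, here is a small illustrative sketch in TypeScript, with "badword" standing in for a real slur; this is not any actual filter, just what a variation-tolerant pattern for a single word tends to look like:

```ts
// One word, with character classes for leetspeak and optional
// separators between letters. Already hard to read, and it has to be
// written (and maintained) for every entry on the list.
const variant = /b[\W_]*[a4@][\W_]*d[\W_]*w[\W_]*[o0][\W_]*r[\W_]*d/i

variant.test('b.a.d.w.o.r.d') // true -- catches the separator trick
variant.test('B4dw0rd')       // true -- catches leetspeak
// But because it matches substrings, the same pattern fires on any
// longer, innocent word that happens to contain the sequence, which
// is exactly the Scunthorpe problem described above.
```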
When multiple languages are considered, it's important to make sure that overly aggressive filtering does not exclude normal, innocent words. I know simple vulgarity is not being considered for filtering, but as an example, the word "slut" in Swedish means "end" and can easily trigger a vulgarity filter. Similarly, words like "Niger" or "Queer" are likely to show up in pre-made lists of slurs, and it's not obvious whether or not they should be filtered.
I think as a starting point, it might make sense to make two lists (or categories of lists): one that contains a variety of slurs in various forms, including forms obfuscated with alternative characters, and one of inoffensive words that should not be flagged.
A good filter should flag everything on the naughty list, and nothing on the nice list.
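As a sketch of how those two lists could drive a test suite, assuming the filter under test is simply a function from a handle to a flagged/not-flagged verdict (the names and list entries below are illustrative, not a real dataset):

```ts
// Sketch of the proposed two-list testing framework. The filter under
// test can be any implementation with this shape.
type Filter = (handle: string) => boolean // true = flagged

// Slurs in various forms, including obfuscated variants ("badword"
// stands in for entries that safety experts would supply).
const naughtyList = ['badword', 'b4dw0rd', 'b.a.d.w.o.r.d']

// Innocent words that must never be flagged, drawn from the examples
// above: Swedish "slut" ("end"), the river Niger, Scunthorpe, Dykes on Bikes.
const niceList = ['slutstation', 'nigerdelta', 'scunthorpe', 'dykes-on-bikes']

function runTests(filter: Filter): string[] {
  const failures: string[] = []
  for (const bad of naughtyList) {
    if (!filter(bad)) failures.push(`missed slur variant: ${bad}`)
  }
  for (const good of niceList) {
    if (filter(good)) failures.push(`false positive: ${good}`)
  }
  return failures
}

// Usage: a regex expert plugs in their candidate filter and iterates
// until runTests(candidate) returns an empty array, without having to
// make the judgment calls already encoded in the lists themselves.
```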
The overall goal here is that if someone is an absolute genius at regular expressions, but not comfortable deciding whether or not "Dykes-on-Bikes" is an acceptable handle for a motorcycle club, they can lean on the testing framework. At the same time, someone who is not a skilled coder but does have safety expertise can contribute to the test cases and let programmers finish the job.
I'm very interested in feedback from people with a background or expertise in safety.