Testing framework needed to assist slur filtering efforts #1337

Closed
WilliamRoyNelson opened this issue Jul 13, 2023 · 12 comments

Comments

@WilliamRoyNelson

There is currently a lot of interest in creating filters to prevent users from using slurs in handles, and such filters could presumably be extended to other features like lists, display names, etc.

Just in the last few days:
#1326
#1324
#1323
#1322
#1321
#1332
#1319
#1318

This may seem like a straightforward feature to implement, but it can rapidly become more complex as more languages are considered and if regex filters are used to try to catch variations and workarounds.

Numerous examples of this kind of filtering causing problems can be found in the Wikipedia article on the Scunthorpe problem.

When multiple languages are considered, it's probably a good idea to make sure that overly aggressive filtering does not exclude normal innocent words. I know simple vulgarity is not being considered for filtering, but as an example, the word "Slut" in Swedish means "End," and can easily trigger a vulgarity filter. Similarly, words like "Niger" or "Queer" are likely to show up in pre-made lists of slurs, and it's not obvious whether or not they should be filtered.
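To make the failure mode concrete, here is a minimal sketch (not code from this repository) comparing a naive substring check with a whole-token check, using the "slut" example from above. The function names, the term list, and the `example.com` handles are all made up for illustration.

```typescript
// Hypothetical term list, using the example word from the paragraph above.
const blockedTerms: string[] = ["slut"];

// Naive approach: flag a handle if a blocked term appears anywhere inside it.
function naiveSubstringFilter(handle: string): boolean {
  const lower = handle.toLowerCase();
  return blockedTerms.some((term) => lower.indexOf(term) !== -1);
}

// Stricter approach: flag only when a blocked term is a whole token
// between separators such as "." or "-".
function wholeTokenFilter(handle: string): boolean {
  const tokens = handle
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
  return tokens.some((token) => blockedTerms.indexOf(token) !== -1);
}

// "slutet" is Swedish for "the end".
console.log(naiveSubstringFilter("slutet.example.com")); // true (false positive)
console.log(wholeTokenFilter("slutet.example.com")); // false
console.log(wholeTokenFilter("slut.example.com")); // true
```

The last call shows that token matching only narrows the problem: a Swedish speaker using the bare word would still be flagged, which is the kind of decision that needs human judgment.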

I think as a starting point, it might make sense to make two lists (or categories of lists): one that contains a variety of slurs in various forms, including forms obfuscated with alternative characters, and one for inoffensive words that should not be flagged.

A good filter should flag everything on the naughty list, and nothing on the nice list.
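As a rough sketch of what such a framework could look like (an assumption on my part, not an existing API in this repo), a harness can take any filter function plus the two lists and report what was missed and what was wrongly flagged. `runSuite`, `SuiteResult`, and the placeholder entries are hypothetical; `badword` stands in for a real slur.

```typescript
// A filter returns true when the text should be flagged.
type Filter = (text: string) => boolean;

interface SuiteResult {
  missed: string[]; // naughty-list entries the filter failed to flag
  falsePositives: string[]; // nice-list entries the filter wrongly flagged
}

// Run one filter against both lists and collect every failure.
function runSuite(filter: Filter, naughty: string[], nice: string[]): SuiteResult {
  return {
    missed: naughty.filter((entry) => !filter(entry)),
    falsePositives: nice.filter((entry) => filter(entry)),
  };
}

// Placeholder test data; real lists would be contributed by safety experts.
const naughtyList = ["badword", "b4dw0rd", "bad.word"];
const niceList = ["scunthorpe", "crossword"];

// A deliberately weak filter: exact match only.
const exactMatchFilter: Filter = (text) => text.toLowerCase() === "badword";

const result = runSuite(exactMatchFilter, naughtyList, niceList);
console.log(result.missed); // [ 'b4dw0rd', 'bad.word' ]
console.log(result.falsePositives); // []
```

A filter passes only when both arrays come back empty, so the test data and the filter logic can be contributed by different people.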

The overall goal here is that if someone is an absolute genius at regular expressions, but not comfortable deciding whether or not "Dykes-on-Bikes" is an acceptable handle for a motorcycle club, they can lean on the testing framework. At the same time, someone who is not a skilled coder but does have safety expertise can contribute to the test cases and let programmers finish the job.

I'm very interested in feedback from people with a background or expertise in safety.

@ghost

ghost commented Jul 13, 2023

Just popping in with my two bob's worth.
Can we please not have the etymology discourse around the words "retard" or "retarded". They are straight up slurs. If you're trotting out the "But in French it means delay", then you're actually much closer to understanding why that translates as a slur. Also, attached are 4 accounts that should be dealt with. Thanks, Susie
Screenshot_2023-07-13-16-42-05-93_40deb401b9ffe8e1df2f1cc5ba480b12
Screenshot_2023-07-13-16-42-23-61_40deb401b9ffe8e1df2f1cc5ba480b12
Screenshot_2023-07-13-16-42-55-52_40deb401b9ffe8e1df2f1cc5ba480b12
Screenshot_2023-07-13-16-43-33-76_40deb401b9ffe8e1df2f1cc5ba480b12

@WilliamRoyNelson
Author

I think that's an excellent example of a problem that a programmer doesn't necessarily have the right background to solve.

A rule for handles is probably not going to work universally, e.g., as a content filter in general. Someone who posts exclusively in French probably shouldn't get dinged for an English slur.

@WilliamRoyNelson
Author

As another example, the word "Nazi" probably doesn't need to be in anyone's handle, even if something like NaziPunksFuckOff is collateral damage. But at the same time, Nazi probably should be allowed in a block list, e.g. "List of Nazis and White Supremacist accounts"
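One way to express that kind of context-dependent rule is a per-term policy table. This is only a sketch under my own assumptions: the `Surface` type, the `blockedIn` table, and `isBlocked` are hypothetical names, not anything that exists in the codebase.

```typescript
// Places where user-supplied text can appear.
type Surface = "handle" | "listName" | "displayName";

// Hypothetical policy table: each term maps to the surfaces where it is blocked.
const blockedIn: Record<string, Surface[]> = {
  nazi: ["handle"],
};

// A text is blocked only if it contains a term that is blocked on this surface.
function isBlocked(text: string, surface: Surface): boolean {
  const lower = text.toLowerCase();
  return Object.keys(blockedIn).some(
    (term) => blockedIn[term].indexOf(surface) !== -1 && lower.indexOf(term) !== -1,
  );
}

console.log(isBlocked("NaziPunksFuckOff", "handle")); // true (collateral damage)
console.log(isBlocked("List of Nazis and White Supremacist accounts", "listName")); // false
```

Keeping the policy in data rather than in the matching code means the "where is this term acceptable" question can be answered by someone who never touches the regexes.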

@ghost

ghost commented Jul 13, 2023

This is why Bluesky needs to engage with disabled people and POC. We are so tirrrrrrred of the word games and linguistic gymnastics.

@WilliamRoyNelson
Author

> This is why Bluesky needs to engage with disabled people and POC. We are so tirrrrrrred of the word games and linguistic gymnastics.

I think that's a really good explanation for why there need to be dedicated employees for this task. It's a lot of work, and the people who are affected most by it don't really have a lot of time and energy to spend on it, unless they're getting paid to do it as their job.

@ghost

ghost commented Jul 13, 2023

William can you help me find the link to this? I took some screenshots and now can't navigate back to it (GitHub Newbie!).
It shows the changes, but things are very nebulous and I would like to check in on it again. Many thanks, Susie
Screenshot_2023-07-14-06-09-13-34_40deb401b9ffe8e1df2f1cc5ba480b12

@ghost

ghost commented Jul 13, 2023

I found it!

@WilliamRoyNelson
Author

There are a lot of people here who are new to GitHub, so here's a bit of background.

In this "pull request" the developer @dholms added a list of slurs to filter out:
#1318

Shortly after, a second PR was made by @bnewbold that put things in alphabetical order and removed a few that the developer disagreed with:
#1319

You can click on "Files changed" to see the differences.

@WilliamRoyNelson
Author

Looks like there's a start by @dholms.
Obviously I'm asking for something more robust, but this is moving in the right direction:
https://github.com/bluesky-social/atproto/blob/6cf949f0f39006d1b396d693636c9f828ad2df3d/packages/pds/tests/handle-validation.test.ts

@dholms
Collaborator

dholms commented Jul 14, 2023

Yup, I'm deploying these changes now: #1336.

We made a more sophisticated flagging system that looks for misspellings and slurs within words. We outright ban explicit slurs and flag words that may contain slurs and require more context from human moderators.
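For readers following along, the general shape of a two-tier approach like this can be sketched as follows. This is an illustrative guess, not the deployed implementation: the `LOOKALIKES` map, `normalize`, `classify`, and the rule "exact match is banned, embedded match is flagged" are all assumptions, and `badword` is a placeholder term.

```typescript
type Verdict = "ban" | "flag" | "allow";

// Assumed character substitutions commonly used to dodge filters.
const LOOKALIKES: Record<string, string> = {
  "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t",
};

// Lowercase, undo lookalike substitutions, and drop everything but a-z.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .split("")
    .map((ch) => LOOKALIKES[ch] || ch)
    .filter((ch) => ch >= "a" && ch <= "z")
    .join("");
}

// Exact match after normalization is banned outright; a term embedded in a
// longer string is only flagged so a human moderator can look at the context.
function classify(text: string, terms: string[]): Verdict {
  const cleaned = normalize(text);
  if (terms.indexOf(cleaned) !== -1) return "ban";
  if (terms.some((term) => cleaned.indexOf(term) !== -1)) return "flag";
  return "allow";
}

const placeholderTerms = ["badword"];
console.log(classify("B4dW0rd", placeholderTerms)); // ban
console.log(classify("notabadwordhere", placeholderTerms)); // flag
console.log(classify("hello", placeholderTerms)); // allow
```

The "flag" tier is what keeps Scunthorpe-style cases out of the automatic ban path.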

We opted to keep the slur list that we use for flagging private. Making it public would allow bad actors to find loopholes. Because of this, we won't be doing a public testing framework. Of course, in-app reports can help us keep our list up to date.

@dholms closed this as completed Jul 14, 2023

@ZASMan

ZASMan commented Jul 15, 2023

They need to implement this with internationalization out of the box. English users are obviously the loudest, and because the developers speak English and are presumably from the US, they hear the concerns of English speakers first and loudest. But there are tons of other languages being spoken on Bluesky, by users who deserve to have a safe experience too.

@codingneko

> If you're trotting out the "But in French it means delay", then you're actually much closer to understanding why that translates as a slur.

I am not trotting anything out here, but that is a genuine concern. It's not just French; in Catalan it also means delay.

"El meu tren va amb retard" translates as "My train is running late". I get that "retard" is a slur in English, but it turns out English is not the only language spoken on Earth, and non-English speakers also deserve to exist...

If you guys were a decent platform you'd enable federation and leave the moderation to actual human beings who can decide for themselves if someone is being ableist or just French. ActivityPub works perfectly fine, trust me.

But alternatively, have you looked into Large Language Models? They'd probably do a better job at deciding whether a word is a slur or not than a "dumb" regex...
