Testing framework needed to assist slur filtering efforts #1337
Comments
I think that's an excellent example of a problem that a programmer doesn't necessarily have the right background to solve. A rule for handles is probably not going to work universally, e.g., as a general content filter: someone who posts exclusively in French probably shouldn't get dinged for an English slur.
As another example, the word "Nazi" probably doesn't need to be in anyone's handle, even if something like NaziPunksFuckOff is collateral damage. But at the same time, "Nazi" probably should be allowed in a block list, e.g., "List of Nazis and White Supremacist accounts".
This is why Bluesky needs to engage with disabled people and POC. We are so tirrrrrrred of the word games and linguistic gymnastics.
I think that's a really good explanation of why there need to be dedicated employees for this task. It's a lot of work, and the people who are affected most by it don't really have a lot of time and energy to spend on it, unless they're getting paid to do it as their job.
I found it!
A lot of people here are new to GitHub, so here's a bit of background. In this "pull request" the developer @dholms added a list of slurs to filter out. Shortly after, a second PR was made by @bnewbold that put things in alphabetical order and removed a few that the developer disagreed with. You can click on "Files changed" to see the differences.
Looks like there's a start by @dholms
Yup, I'm deploying these changes now: #1336. We made a more sophisticated flagging system that looks for misspellings and slurs within words. We outright ban explicit slurs, and we flag words that may contain slurs for review by human moderators, who can weigh the context. We opted to keep the slur list that we use for flagging private: keeping it public would let bad actors find loopholes. For the same reason, we won't be doing a public testing framework. Of course, in-app reports can help us keep our list up to date.
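For readers following along, here is a minimal sketch of what a tiered system like the one described might look like. Everything below is a hypothetical placeholder ("badword" stands in for real entries, and the normalization rules are illustrative), since the actual list and matching logic are private.

```ts
// Hypothetical sketch of a tiered matcher, not Bluesky's actual code.

type Verdict = 'ban' | 'flag-for-review' | 'allow'

const EXPLICIT_SLURS = ['badword']          // placeholder entries
const AMBIGUOUS_TERMS = ['queer', 'niger']  // innocent in many contexts

// Undo common obfuscations (leetspeak digits, separators) so that
// "b4d.w0rd" and "badword" normalize to the same string.
function normalize(handle: string): string {
  const subs: Record<string, string> = {
    '0': 'o', '1': 'i', '3': 'e', '4': 'a', '@': 'a', '$': 's', '5': 's',
  }
  return handle
    .toLowerCase()
    .split('')
    .map((ch) => subs[ch] ?? ch)
    .join('')
    .replace(/[^a-z]/g, '') // drop dots, dashes, underscores, etc.
}

function classify(handle: string): Verdict {
  const text = normalize(handle)
  if (EXPLICIT_SLURS.some((w) => text.includes(w))) return 'ban'
  if (AMBIGUOUS_TERMS.some((w) => text.includes(w))) return 'flag-for-review'
  return 'allow'
}

classify('b4d.w0rd')       // 'ban'
classify('queer-history')  // 'flag-for-review' -> a human moderator decides
```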
They need to implement this with internationalization out of the box. English users are obviously the loudest, and because the developers speak English and are presumably from the US, they hear the concerns of English speakers first and loudest. But there are tons of other languages being spoken on Bluesky, whose users deserve to have a safe experience too.
I am not trotting anything out here; that is a genuine concern. It's not just French: in Catalan it also means delay. "El meu tren va amb retard" translates as "My train is running late". I get that "retard" is a slur in English, but it turns out English is not the only language spoken on Earth, and non-English speakers also deserve to exist... If you guys were a decent platform you'd enable federation and leave the moderation to actual human beings, who can decide for themselves whether someone is being ableist or just French. ActivityPub works perfectly fine, trust me. But alternatively, have you looked into Large Language Models? They'd probably do a better job of deciding whether a word is a slur or not than a "dumb" regex...
There is currently a lot of interest in creating filters to prevent users from using slurs in handles, and these could presumably be extended to other features like lists, display names, etc.
Just in the last few days:
#1326
#1324
#1323
#1322
#1321
#1332
#1319
#1318
This may seem like a straightforward feature to implement, but it rapidly becomes more complex as more languages are considered, and especially if regex filters are used to try to prevent variations and workarounds.
Numerous examples of this kind of filtering causing problems can be found in the Wikipedia article on the Scunthorpe problem.
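To make that complexity concrete, here is a small illustrative sketch in TypeScript, with "badword" standing in for a real slur; this is not any actual filter, just what a variation-tolerant pattern for a single word tends to look like:

```ts
// One word, with character classes for leetspeak and optional
// separators between letters. Already hard to read, and it has to be
// written (and maintained) for every entry on the list.
const variant = /b[\W_]*[a4@][\W_]*d[\W_]*w[\W_]*[o0][\W_]*r[\W_]*d/i

variant.test('b.a.d.w.o.r.d') // true -- catches the separator trick
variant.test('B4dw0rd')       // true -- catches leetspeak
// But because it matches substrings, the same pattern fires on any
// longer, innocent word that happens to contain the sequence, which
// is exactly the Scunthorpe problem described above.
```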
When multiple languages are considered, it's important to make sure that overly aggressive filtering does not exclude normal, innocent words. I know simple vulgarity is not being considered for filtering, but as an example, the word "slut" in Swedish means "end" and can easily trigger a vulgarity filter. Similarly, words like "Niger" or "Queer" are likely to show up in pre-made lists of slurs, and it's not obvious whether or not they should be filtered.
I think as a starting point, it might make sense to make two lists (or categories of lists): one that contains a variety of slurs in various forms, including forms obfuscated with alternative characters, and one of inoffensive words that should not be flagged.
A good filter should flag everything on the naughty list, and nothing on the nice list.
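As a sketch of how those two lists could drive a test suite, assuming the filter under test is simply a function from a handle to a flagged/not-flagged verdict (the names and list entries below are illustrative, not a real dataset):

```ts
// Sketch of the proposed two-list testing framework. The filter under
// test can be any implementation with this shape.
type Filter = (handle: string) => boolean // true = flagged

// Slurs in various forms, including obfuscated variants ("badword"
// stands in for entries that safety experts would supply).
const naughtyList = ['badword', 'b4dw0rd', 'b.a.d.w.o.r.d']

// Innocent words that must never be flagged, drawn from the examples
// above: Swedish "slut" ("end"), the river Niger, Scunthorpe, Dykes on Bikes.
const niceList = ['slutstation', 'nigerdelta', 'scunthorpe', 'dykes-on-bikes']

function runTests(filter: Filter): string[] {
  const failures: string[] = []
  for (const bad of naughtyList) {
    if (!filter(bad)) failures.push(`missed slur variant: ${bad}`)
  }
  for (const good of niceList) {
    if (filter(good)) failures.push(`false positive: ${good}`)
  }
  return failures
}

// Usage: a regex expert plugs in their candidate filter and iterates
// until runTests(candidate) returns an empty array, without having to
// make the judgment calls already encoded in the lists themselves.
```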
The overall goal here is that if someone is an absolute genius at regular expressions, but not comfortable deciding whether or not "Dykes-on-Bikes" is an acceptable handle for a motorcycle club, they can lean on the testing framework. At the same time, someone who is not a skilled coder but does have safety expertise can contribute to the test cases and let programmers finish the job.
I'm very interested in feedback from people with a background or expertise in safety.