Software Techniques to Detect Unacceptable Handles #1329

bnewbold · 2023-07-13T16:29:15Z

bnewbold
Jul 13, 2023
Maintainer

The current TypeScript PDS implementation uses a static list of reserved and slur handle prefixes, which are disallowed on registration.

This is a discussion topic for various algorithms, approaches, and techniques for detecting additional unwanted handles. For example, Levenshtein edit distance, phonetic detection, visual detection, sub-string matching, the Scunthorpe problem, processing exceptions, etc.
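As an illustration of the edit-distance idea, a minimal TypeScript sketch might look like the following. The function names, the `reserved` list, and the `maxDistance` threshold are assumptions made for this example and are not taken from the PDS code.

```typescript
// Hypothetical helper, not part of the PDS codebase.
// Classic dynamic-programming Levenshtein distance using two rolling rows.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Flags a candidate if it is within `maxDistance` edits of any reserved word.
// `maxDistance` is an assumed tuning knob; the default of 1 is arbitrary.
function isNearReserved(
  candidate: string,
  reserved: string[],
  maxDistance = 1,
): boolean {
  const c = candidate.toLowerCase();
  return reserved.some(
    (word) => levenshtein(c, word.toLowerCase()) <= maxDistance,
  );
}
```

With short words, even a distance of 1 matches many unrelated strings, so a threshold like this would likely need to scale with word length.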

HarryGogonis · 2023-07-13T16:46:23Z

HarryGogonis
Jul 13, 2023

Ideas

prioritize writing unit tests, so the filter can be updated as new cases to ban are found
check other fields, not just handles (see "Block slurs in feed names" #1328)
perhaps scope this as a new library, or in a way that Bluesky members can easily use in their own projects
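One possible shape for the unit-test idea is a table of cases that grows as new handles are reported. This is only a sketch: `HandleCheck`, `HandleCase`, `findRegressions`, and the placeholder handles are all hypothetical, and a real suite would plug into whatever test runner the repo already uses.

```typescript
// Hypothetical signature for whatever check the server ends up using.
type HandleCheck = (handle: string) => boolean;

interface HandleCase {
  handle: string;
  shouldBlock: boolean;
  note: string; // where the case came from, e.g. a report or an issue number
}

// Placeholder entries only; real cases would be appended as they are found.
const CASES: HandleCase[] = [
  { handle: "admin.example.com", shouldBlock: true, note: "reserved prefix" },
  { handle: "adm1n.example.com", shouldBlock: true, note: "digit substitution" },
  { handle: "alice.example.com", shouldBlock: false, note: "ordinary handle" },
];

// Returns a description of every case where the check disagrees with the table.
function findRegressions(check: HandleCheck, cases: HandleCase[]): string[] {
  return cases
    .filter((c) => check(c.handle) !== c.shouldBlock)
    .map((c) => `${c.handle}: expected shouldBlock=${c.shouldBlock} (${c.note})`);
}

// A deliberately naive prefix check, to show a failing case being reported.
const naivePrefixCheck: HandleCheck = (handle) =>
  handle.toLowerCase().startsWith("admin");
```

Running `findRegressions(naivePrefixCheck, CASES)` reports the `adm1n` entry, which is the kind of gap a newly added case is meant to expose.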

0 replies

simonblack · 2023-07-13T16:50:19Z

simonblack
Jul 13, 2023

I think this would be solved most easily (in the long term) by using a machine learning model as opposed to solving something like the Scunthorpe problem

In the meantime, permutations of a template list of slurs (built by the community and internally), which are then matched by regex, would cover most of the cases in a democratic manner.

Given the word "hello", the implementation of #1326 would add [hello, h3llo, hell0, h3ll0, hellos, h3llos, hell0s, h3ll0s, hello5, h3llo5, hell05, h3ll05] to the list. If we then add regex, we could account for repeated characters as well, for example:

h+3+l+l+0+5+

This would give a fair balance of implementation speed/complexity and effectiveness while something more robust, like machine learning, is implemented (which I'm assuming will need to be done regardless at scale, hence the suggestion).
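A rough sketch of how the permutation list and the repeated-character regex could be folded into a single generated pattern is shown below. It is not the code from #1326: the substitution table, `escapeRegExp`, and `buildPattern` are assumptions for illustration. Note that the literal pattern `h+3+l+l+0+5+` only matches the fully substituted spelling, whereas character classes cover every mix of original and look-alike characters.

```typescript
// Hypothetical substitution table; a real one would be community-maintained.
const SUBSTITUTIONS = new Map<string, string>([
  ["e", "e3"],
  ["o", "o0"],
  ["s", "s5"],
]);

// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Builds an unanchored pattern in which every character may be replaced by a
// listed look-alike and may repeat one or more times.
function buildPattern(word: string): RegExp {
  const body = word
    .toLowerCase()
    .split("")
    .map((ch) => `[${escapeRegExp(SUBSTITUTIONS.get(ch) ?? ch)}]+`)
    .join("");
  return new RegExp(body);
}

// Usage: the pattern for "hello" is equivalent to h+[e3]+l+l+[o0]+
const helloPattern = buildPattern("hello");
```

Because the pattern is unanchored, suffixed forms such as `hellos` or `hell05` match without being listed, but so does a word like `othello`, which is the false-positive side of the same trade-off.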

0 replies

intrnl · 2023-07-13T16:52:20Z

intrnl
Jul 13, 2023

I have to be honest for a bit and say that working on a slur filter is not a very effective use of time and effort.

There's only so much you can do until nature produces a slightly more clever racist, and then it gets to the point of deciding whether this ambiguous set of letters constitutes a slur.

The best way to handle this would be a reactive mod team, which I'm sure we do have; this isn't the first case of a user having a slur in their handle. Just two weeks ago we banned a user with a slur domain like this (you've been warned)

13 replies

simonblack Jul 13, 2023

yep this is what I'm thinking. it was working fine until a few clever trolls changed the game. we just need to make some slight adjustments and keep up for now, that's all

syke99 Jul 13, 2023

This shouldn't be approached as an either/or situation, in my honest opinion.

To begin with, there should specifically be code fixes. No, you can't catch everything with code. But completely disregarding that tool in favor of, as you put it, a reactive mod team would be, I'd wager, even less effective in the long term.

Combining both of these techniques would be ideal long term. Code can both catch a large swath of cases and give the mod team a tool they can keep improving. If your mod team isn't using some sort of T&S tool that can be improved, then exponentially more cases of the same slurs (and other restricted words) will get through. And thus, your mod team will have to rely on users to report these accounts with the same slurs over and over again and get bogged down.

Implement code to filter these slurs before they even get through, and hire a T&S/mod team to address new ones and submit tickets for updates to the filter (or do it themselves if they're code-savvy), imho.

intrnl Jul 13, 2023

If I'd have to add something other than my mouthpiece, it would probably be LLMs,

but I really do not trust LLMs to take any immediate action whatsoever, and I do not trust people to make the right decisions on how to balance flagging against immediate action.

syke99 Jul 13, 2023

sure, an LLM would also help in the long run, too. but i agree, it wouldn't be effective enough, soon enough, because it would have to be trained effectively (and we pretty much know that building one internally is well out of the question right now, and would add a LOT to that timeframe as well). So it should be an incremental process attacked from multiple angles to be most effective, and more importantly, effective quickly

intrnl Jul 17, 2023

something I should've noted as I originally wrote this:

the unfortunate part about having a slur filter is that so long as this social media is heavily catered towards the Anglosphere, speakers of other languages would be disincentivized from participating, especially in languages with a common word that is written the same way as, or similarly to, an English slur.

I can think of two languages I know where an extended filter can get very problematic:

non-formal Indonesian speech involves "ngga" for "no"
Japanese users will type in Romaji if IME is not readily available, so 逃げろ "nigero" for "run"

there is no sane way to handle this, this is what has been unconsciously ticking me off since the initial discussions of a slur filter.

now sure, will anyone use them as a handle? that's a valid question, but I certainly don't see why non-English speakers shouldn't be allowed to use the words they want, especially if it's not a slur.

another thing to be kept in mind is that people will suggest the slur filter be used for outside of handles as well, so this is still an important point to consider.

robotblake · 2023-07-13T16:58:34Z

robotblake
Jul 13, 2023

Yoel Roth pointed out this paper regarding open T&S tools. It's a good read, and, relevantly, there is a list (with links) of tooling that already exists on the final (Appendix) page.

https://www.atlanticcouncil.org/wp-content/uploads/2023/06/scaling-trust-on-the-web_annex2.pdf

0 replies

SlickDomique · 2023-07-13T17:06:28Z

SlickDomique
Jul 13, 2023

It is still possible to change to a custom domain handle and include a slur in every possible form. I address this and add better protection in #1320.

I agree that this would be a band-aid temporary solution that should be replaced with a proper library, but the fix for custom domains will still be useful

3 replies

simonblack Jul 13, 2023

I think this + regex + permutations might provide the best balance for now

SlickDomique Jul 13, 2023

100% agree. I wanted to do something similar to what you did a few hours ago, but was too busy

simonblack Jul 13, 2023

also unrelated to specifically handles, but as @HarryGogonis mentioned, this would probably be a good addition to the stack:

#1328

however, if we go one level deeper and filter things like posts and bios, I say we replace those words with **** instead of imposing outright restrictions
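A minimal sketch of that masking approach for free-text fields might look like this. The `maskTerms` name and the whole-word matching rule are assumptions for the example; the term used in the usage note is a mild placeholder.

```typescript
// Escapes characters that have special meaning in a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replaces each whole-word, case-insensitive occurrence of a listed term with
// asterisks of the same length. `\b` only understands ASCII word characters,
// so this sketch would need a different boundary rule for other scripts.
function maskTerms(text: string, terms: string[]): string {
  if (terms.length === 0) return text;
  const alternation = terms.map(escapeRegExp).join("|");
  const pattern = new RegExp(`\\b(?:${alternation})\\b`, "gi");
  return text.replace(pattern, (match) => "*".repeat(match.length));
}
```

For example, `maskTerms("What the heck, Heck!", ["heck"])` yields `What the ****, ****!`, while `checkers` is left alone because the term is not a whole word there.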

Scotchester · 2023-07-13T18:43:33Z

Scotchester
Jul 13, 2023

I've got a technique for you: Don't remove known slurs from the current list without explanation or review.

0 replies

ixtli · 2023-07-13T19:03:46Z

ixtli
Jul 13, 2023

i don't want to derail this, as it's more of a specific discussion on algorithms reasonable to implement in the tsx, but it may be meaningful to note that you could use more complex / expensive sentiment analysis tools on content creation / modification from users known to be within a certain minimum "distance"** of those who have, for any reason, had action taken against them for violating hate speech rules.

regex/combinatorics/disallow lists have been shown to only be able to go so far, and our shared experience with FB and Twitter shows us that when there are fixed sets of disallowed phrases, this teaches bad actors how to communicate hate / violence without repercussion.

** distance could be distance in the invite tree, social graph network distance, etc.
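To make the "distance" idea concrete, here is a hedged sketch that computes hop counts from previously actioned accounts with a multi-source breadth-first search. The `Graph` shape, the function names, and the `maxDistance` cutoff are all hypothetical; the edges could be invites or follows, and are assumed to be listed in both directions.

```typescript
// Adjacency list: user id -> directly connected user ids (assumed symmetric).
type Graph = Map<string, string[]>;

// Multi-source BFS: hops from every reachable user to the nearest actioned
// account. Unreachable users are absent from the returned map.
function distanceToActioned(graph: Graph, actioned: string[]): Map<string, number> {
  const dist = new Map<string, number>();
  const queue: string[] = [];
  for (const id of actioned) {
    if (!dist.has(id)) {
      dist.set(id, 0);
      queue.push(id);
    }
  }
  for (let head = 0; head < queue.length; head++) {
    const current = queue[head];
    const d = dist.get(current) ?? 0;
    for (const next of graph.get(current) ?? []) {
      if (!dist.has(next)) {
        dist.set(next, d + 1);
        queue.push(next);
      }
    }
  }
  return dist;
}

// Decides whether a user's new content should go through the more expensive
// analysis. `maxDistance` is an assumed policy knob.
function needsDeeperReview(
  userId: string,
  dist: Map<string, number>,
  maxDistance = 2,
): boolean {
  const d = dist.get(userId);
  return d !== undefined && d <= maxDistance;
}
```

The point of the design is cost control: the cheap graph lookup gates the expensive analysis, so only a small neighbourhood of accounts pays for it.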

0 replies

mletterle · 2023-07-13T21:26:33Z

mletterle
Jul 13, 2023

I know it's more fun to talk about the technical challenges, but until #1325 starts being addressed it's largely moot.

3 replies

simonblack Jul 13, 2023

Fair point

However Github is strictly a place to discuss code, if you'd like to discuss this I advise you try your luck contacting the moderation team directly via email.

mletterle Jul 14, 2023

This is incorrect, Github is adamantly not strictly a place to discuss code. Thinking that software, especially software like Bluesky, is only about code is part of how we have ended up here.

Also there is no moderation team, that's also part of the problem.

simonblack Jul 14, 2023

Agree to disagree

Scotchester · 2023-07-14T12:29:42Z

Scotchester
Jul 14, 2023

For folks that might have missed it: #1336

Still waiting for an apology for the original issue, and an explanation of the words removed in #1319.

1 reply

Scotchester Jul 23, 2023

Acknowledging that several flavors of explanation and apology have now been given, and I take them at their word, though I still wish they had come much quicker.

Sominemo · 2023-07-16T15:12:29Z

Sominemo
Jul 16, 2023

I'll duplicate what I said under the original PR here:

Dictionary filtering is a very basic but also very flawed technique:

It has a lot of false positives: filtered words inside of legitimate words; words from other languages; junctions between two words; and combinations of these. Even English has a word which contains the hard R inside itself. Try to consider all the other languages and the junctions, and it becomes unmanageable.
It's easy to bypass: no matter how sophisticated your anti-bypass filters are, humans always find ways to put something offensive in all sorts of places, like usernames.

I expect it to hurt legitimate users more than preventing abuse.
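Both failure modes are easy to reproduce with a naive substring check. The sketch below uses a mild placeholder term and made-up handles; `containsBlockedSubstring` is a hypothetical name, not the PDS function.

```typescript
// Naive dictionary filter: blocks a handle if any listed term appears anywhere
// inside it, ignoring case.
function containsBlockedSubstring(handle: string, blocked: string[]): boolean {
  const h = handle.toLowerCase();
  return blocked.some((term) => h.includes(term.toLowerCase()));
}

const blocked = ["hell"]; // mild placeholder term for the example

// False positive: the term sits inside a legitimate word.
const insideWord = containsBlockedSubstring("shell.example.com", blocked);

// False positive: the term appears at the junction of "bushel" and "labs".
const atJunction = containsBlockedSubstring("bushellabs.example.com", blocked);

// Bypass: a simple digit substitution slips through.
const bypassed = containsBlockedSubstring("he11.example.com", blocked);
```

Here the two legitimate handles are blocked and the obfuscated one is not, which is the imbalance described above.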

This approach is "good enough" for the current state of the project, when moderation is not a priority and is unaffordable, but I think it shouldn't be cemented into the foundation of the software with the expectation that it will be improved.

In my opinion, it's better to consider this solution temporary instead of thinking about how to improve it.

For the longer term, I'd suggest relying on human moderation and heuristics instead. For example, a warning system that alerts moderators when an account signs up with a handle in which a dictionary word is detected, or when a few reports are sent about such a handle.
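A small sketch of such a warning heuristic, where a dictionary hit queues the account for human review rather than rejecting it, could look like this. The `Signup` shape, the `triage` function, and the default `reportThreshold` of 3 are assumptions for illustration.

```typescript
interface Signup {
  handle: string;
  reportCount: number; // reports received about this handle so far
}

type Decision = "allow" | "flag-for-review";

// Never blocks outright: a dictionary hit or enough reports only raises a
// warning for moderators. `reportThreshold` is an assumed policy value.
function triage(
  signup: Signup,
  watchlist: string[],
  reportThreshold = 3,
): Decision {
  const h = signup.handle.toLowerCase();
  const dictionaryHit = watchlist.some((term) => h.includes(term.toLowerCase()));
  const enoughReports = signup.reportCount >= reportThreshold;
  return dictionaryHit || enoughReports ? "flag-for-review" : "allow";
}
```

Since a false positive here costs a moderator a glance rather than costing a user their handle, the watchlist can afford to be broader than a hard block list.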

0 replies
