Software Techniques to Detect Unacceptable Handles #1329
Replies: 10 comments 20 replies
-
Ideas
-
I think this would be solved most easily (in the long term) by using a machine learning model, as opposed to trying to solve something like the Scunthorpe problem. In the meantime, permutations of a community-built (and internally built) template list of slurs, which are then matched by regex, would cover most of the cases in a democratic manner. Given a word "hello", the implementation of #1326 would add [hello,h3llo,hell0,h3ll0,hellos,h3llos,hell0s,h3ll0s,hello5,h3llo5,hell05,h3ll05] to the list, and if we add regex, we could account for repeated characters as well.
This would give a fair balance of implementation speed/complexity and effectiveness while something more robust like machine learning is implemented (which I'm assuming is something that will need to be done regardless with scale, hence the suggestion).
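To make the permutation idea concrete, here is a rough sketch. The substitution table, the suffix list, and the function names are my own assumptions chosen to reproduce the "hello" list above; the actual implementation in #1326 may differ. It assumes the input words only contain characters that are safe inside a regex character class (plain letters and digits).

```typescript
// Hypothetical substitution table and suffixes, not taken from #1326.
const SUBSTITUTIONS: Record<string, string[]> = { e: ["3"], o: ["0"] };
const SUFFIXES: string[] = ["", "s", "5"];

// Expand one base word into every substitution/suffix combination.
function expandVariants(word: string): string[] {
  let stems: string[] = [""];
  for (const ch of word.split("")) {
    const options = [ch].concat(SUBSTITUTIONS[ch] ?? []);
    const next: string[] = [];
    for (const stem of stems) {
      for (const opt of options) {
        next.push(stem + opt);
      }
    }
    stems = next;
  }
  const variants: string[] = [];
  for (const suffix of SUFFIXES) {
    for (const stem of stems) {
      variants.push(stem + suffix);
    }
  }
  return variants;
}

// Build one regex per base word that also tolerates repeated characters,
// e.g. "heeelll0" for "hello". Each character becomes a class of itself
// plus its substitutions, repeated one or more times.
function toRepeatTolerantRegex(word: string): RegExp {
  const pattern = word
    .split("")
    .map((ch) => "[" + [ch].concat(SUBSTITUTIONS[ch] ?? []).join("") + "]+")
    .join("");
  return new RegExp(pattern, "i");
}
```

Note that the regex form is a substring match, so it inherits the Scunthorpe problem: any handle that happens to contain the pattern is flagged.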
-
I have to be honest and say that working on a slur filter is not a very effective use of time and effort. There's only so much you can do before nature produces a slightly more clever racist, and then it comes down to deciding whether some ambiguous set of letters constitutes a slur. The best way to handle this would be a reactive mod team, which I'm sure we do have; this isn't the first case of a user having a slur in their handle. Just two weeks ago we banned a user with a slur domain like this (you've been warned).
-
Yoel Roth pointed out this paper regarding open T&S tools. It's a good read and, relevantly, the final (Appendix) page has a list, with links, of tooling that already exists. https://www.atlanticcouncil.org/wp-content/uploads/2023/06/scaling-trust-on-the-web_annex2.pdf
-
It is still possible to change to a custom domain handle and have a slur in every possible form. I address this and add better protection in #1320. I agree that this would be a temporary band-aid that should be replaced with a proper library, but the fix for custom domains will still be useful.
-
I've got a technique for you: Don't remove known slurs from the current list without explanation or review.
-
I don't want to derail this, as it's more of a specific discussion on algorithms that are reasonable to implement in the tsx, but it may be worth noting that you could use more complex / expensive sentiment analysis tools on content creation / modification from users known to have a certain minimum "distance"** from those who have, for any reason, had action taken against them for violating hate speech rules. Regex / combinatorics / disallow lists have been shown to only go so far, and our shared experience with FB and Twitter shows us that fixed sets of disallowed phrases teach bad actors how to communicate hate / violence without repercussion. ** Distance could be distance in the invite tree, social graph network distance, etc.
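For what it's worth, the "distance" part could be sketched roughly as a breadth-first search outward from accounts that have had action taken against them. Everything here is an assumption for illustration: the `Graph` shape, the function names, the `maxHops` cutoff, and reading "distance" as "within N hops". The graph is treated as undirected, so each edge has to be listed in both directions.

```typescript
// Account id -> neighbouring account ids (invite links, follows, etc.).
// Hypothetical shape; a real system would read this from a datastore.
type Graph = Map<string, string[]>;

// Multi-source breadth-first search starting from all actioned accounts.
// Returns the hop count from every reachable account to its nearest
// actioned account; unreachable accounts are absent from the map.
function distanceToActioned(graph: Graph, actioned: string[]): Map<string, number> {
  const dist = new Map<string, number>();
  const queue: string[] = [];
  for (const id of actioned) {
    if (!dist.has(id)) {
      dist.set(id, 0);
      queue.push(id);
    }
  }
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === undefined) break;
    const d = dist.get(current) ?? 0;
    for (const next of graph.get(current) ?? []) {
      if (!dist.has(next)) {
        dist.set(next, d + 1);
        queue.push(next);
      }
    }
  }
  return dist;
}

// Decide whether an account's content should go through the expensive check.
function needsDeepReview(dist: Map<string, number>, id: string, maxHops: number): boolean {
  const d = dist.get(id);
  return d !== undefined && d <= maxHops;
}
```

The point is only that the expensive analysis runs on a small subset of accounts, so the cost stays bounded.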
-
I know it's more fun to talk about the technical challenges, but until #1325 starts being addressed it's largely moot.
-
For folks that might have missed it: #1336. Still waiting for an apology for the original issue, and an explanation of the words removed in #1319.
-
I'll duplicate what I said under the original PR here: Dictionary filtering is a very basic but also very flawed technique:
I expect it to hurt legitimate users more than it prevents abuse. This approach is "good enough" for the current state of the project, when moderation is not a priority and is unaffordable, but I don't think it should be cemented into the foundation of the software with the expectation that it will be improved. In my opinion, it's better to treat this solution as temporary instead of thinking about how to improve it. In the long term, I'd suggest relying on human moderation and heuristics instead. For example, a warning system that alerts moderators when an account signs up with a handle in which a dictionary word is detected, or when a few reports are sent about such a handle.
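A minimal sketch of what such a warning system might look like, where a dictionary hit flags the handle for a human instead of blocking it. The names (`reviewHandle`, `ReviewDecision`) and the report threshold are hypothetical, not anything that exists in the PDS today.

```typescript
type ReviewDecision = { flag: boolean; reasons: string[] };

// Hypothetical threshold: number of reports that triggers a moderator warning.
const REPORT_THRESHOLD = 3;

// Never rejects a handle; it only collects reasons for a moderator to look at.
function reviewHandle(handle: string, dictionary: string[], reportCount: number): ReviewDecision {
  const reasons: string[] = [];
  const lowered = handle.toLowerCase();
  for (const word of dictionary) {
    if (lowered.indexOf(word.toLowerCase()) !== -1) {
      reasons.push(`contains dictionary word "${word}"`);
    }
  }
  if (reportCount >= REPORT_THRESHOLD) {
    reasons.push(`${reportCount} reports received`);
  }
  return { flag: reasons.length > 0, reasons };
}
```

Because the result is a flag and not a rejection, a false positive costs a moderator a few seconds instead of locking a legitimate user out.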
-
The current TypeScript PDS implementation uses a static list of reserved and slur handle prefixes, which are disallowed on registration.
This is a discussion topic for various algorithms, approaches, and techniques for detecting additional unwanted handles. For example, Levenshtein edit distance, phonetic detection, visual detection, sub-string matching, the Scunthorpe problem, processing exceptions, etc.
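As a starting point for the edit-distance idea, here is a small sketch of Levenshtein distance applied to the first label of a handle. The helper `isNearListed` and the `maxDistance` parameter are hypothetical and not part of the current PDS code; the distance function itself is the standard dynamic-programming formulation.

```typescript
// Classic dynamic-programming Levenshtein distance, keeping only two rows.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: true if the first label of the handle is within
// maxDistance edits of any listed word.
function isNearListed(handle: string, listed: string[], maxDistance: number): boolean {
  const label = handle.toLowerCase().split(".")[0];
  for (const word of listed) {
    if (levenshtein(label, word.toLowerCase()) <= maxDistance) return true;
  }
  return false;
}
```

With short words and a distance of even 1, this will catch a lot of innocent handles, so it is probably better suited to flagging for review than to hard rejection.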