Make the slur filter editable from the site itself #622
Comments
I'll have to think about this. Hard-coding it means I don't have to do a database migration every time someone comes up with a new slur. And putting it in a DB table means someone could very easily remove it by deleting every row of that table, which isn't good. I want to make it very difficult for racist trolls to use the most updated version of Lemmy. As far as adding / removing slurs, I'd rather these be done by community consensus and basic standards of decency. So far we haven't gotten any requests to add or remove specific slurs, so a debate / process hasn't emerged, but I imagine as Lemmy grows, this will happen naturally. A multi-language slur filter would require things from #440 and #391, mainly to detect the incoming language and then apply the correct language's slur filter. This is so far in the future it's not worth thinking about right now. |
Is this not clear enough? Slurs are against our code of conduct and the goals of this project. Go to voat or gab if you want to use racist or sexist slurs; we don't allow them here. |
Not that I want to use slurs, or that anyone else should; just that you have hard-coded word restrictions. That sounds exactly like the kind of thing that should be entirely configurable by the admin, as they are the one in charge of moderation on the instance. |
They have a point: if this is supposed to be an open-source alternative to Reddit that other users can use, shouldn't the site creator or creators be able to edit all aspects of the software? |
If you don't like it, fork it. Stop bothering us about it; we will never fully remove the slur filter. |
Racism, sexism, transphobia et al are against the code of conduct and goals of this project. Re-arranging how slurs are filtered, where they're stored, or what should be included is up for discussion, but the existence of a slur filter is not. Again, your "uncensored" reddit experience already exists: go to voat or gab if you want to use slurs. |
Having the ability to edit the words without having to recompile is a legitimate issue. It's already very easy to remove that regex; it's not going to stop anyone. Maintenance of the word list isn't easy, especially when it can grow at any moment and be in any language. Please don't turn this into an us-vs-them issue; it's irrelevant. Don't sacrifice flexibility just to make a point. |
Whether updating the slurs comes from a DB migration or another line in a source code file, it still requires an update of the software, but it doesn't require anyone running it to recompile on their own. In 99% of cases, it'll just be changing a line in docker-compose.yml. Unless you're talking about having all instances rely on a slurs file hosted somewhere, I don't see how it could get any more "flexible". Again, it should not be easy for an instance to remove the slur filter and use the most updated version of Lemmy. |
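For illustration, the "changing a line in docker-compose.yml" described above usually means bumping the image tag. This is a hypothetical fragment, not an official upgrade guide; the tag values are placeholders:

```yaml
# docker-compose.yml (fragment) -- upgrading an instance by bumping the image tag
services:
  lemmy:
    image: dessalines/lemmy:0.7.0   # change this tag to the new release to update
```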
Would it make sense to have an "additional slurs" field to add to the hard-coded list? I agree with having a hard-coded list but I can see there being instances that may want to add additional words as needed. |
Ya I could definitely see that being useful. |
Unfortunately, this is extremely easy to bypass, to the point where I don't think the protection of a hard-coded or otherwise difficult-to-edit slur filter is nearly worth the disadvantages to good instance maintainers. Someone could easily fork the project with the slur filter removed and just pull new code while always ignoring that file or method. Anyone with even a basic understanding of programming will be able to locate and disable the slur filter. It's also extremely easy to bypass by using alternate UTF-8 characters such as stylized letters or letter-based mathematical symbols.

The disadvantages of a hard-coded slur filter, however, are many. The biggest is that it massively over-blocks, because it has no concept of context. Words in other languages sometimes get misidentified (which will be a problem for non-English instances), as do parts of benign English words ("smartwatch" gets blocked because of the word tw*t). It'll be extremely awkward when someone's own name gets blocked, and it will make them feel unwelcome.

Each of these cases would require a refinement to the filter, which in the hard-coded model means editing the source files and recompiling the entire backend. At best, that is annoying for the instance maintainer (especially if an update will reset the change). Worse, the maintainer may not know how to properly change the filter (regular expressions are difficult to write and even more difficult to write well) and can break it. At worst, if they really don't know what they're doing, they can break the entire backend or create a security vulnerability, since they would need to edit one of the Rust source files.

Another example would be a niche instance dedicated to literature, where people analysing older texts may post exact quotes containing words that are offensive by today's standards. In general, context matters when it comes to supposedly "offensive" words. |
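The over-blocking described above (the "Scunthorpe problem") is easy to demonstrate. This is a toy sketch, not Lemmy's actual filter code, and the block list is hypothetical, using the mild example word from the comment:

```rust
// Toy illustration of the Scunthorpe problem with naive substring filtering.
// The block list here is hypothetical, not Lemmy's real list.
fn is_blocked(text: &str, blocklist: &[&str]) -> bool {
    let lower = text.to_lowercase();
    blocklist.iter().any(|w| lower.contains(w))
}

fn main() {
    let blocklist = ["twat"]; // the example word mentioned in the comment
    // A benign word is flagged because a blocked word is a substring of it:
    assert!(is_blocked("smartwatch", &blocklist));
    // An innocent sentence passes:
    assert!(!is_blocked("a perfectly fine sentence", &blocklist));
    println!("'smartwatch' blocked: {}", is_blocked("smartwatch", &blocklist));
}
```

A word-boundary-aware match (e.g. a `\b`-anchored regex) would avoid this particular false positive, though not the context problem in general.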
Localization makes static lists hard.
I agree that having a hard coded list is a bad idea. Context is a huge part of language, and a hardcoded filter completely ignores it. See the Scunthorpe problem as one example. This thread on Tildes also brings up a lot of good points. I'm all for blocking Nazis, but there's got to be a better way. |
I hope this could be reconsidered and looked at more maturely, from both views, and not just "word bad and offends, I don't like, so we block." Filtering words goes far beyond this. At first I assumed the filter only applied if the instance you're on enables it for its communities, and that it would carry over to other instances hosting those communities that opted in. But apparently that's not the case: as you mention here, it is hardcoded. You made it seem that way in the past, but I thought you had just been a little confused. This is absolutely terrible; please read the rest of this to hear my point.

In a theoretical state where it is not hardcoded into Lemmy but set per instance, I honestly still fail to see why this should be instance-controlled rather than a user choice. If an instance wants it on, that's understandable. But if users want to turn it off client-side and uncensor everything, I fail to see why that would be an issue. It gives users the freedom and assurance that nothing is being censored if they so wish; not everyone gets upset over the same things, and not everyone finds the same things offensive, so why not let them uncensor everything locally? Everything would still be filtered for people who keep the filter enabled. This is also an issue when it comes to politics: who's to say an instance with a major political bias couldn't just start censoring a politician's name? I think the language filter should be customizable by the instance host, and also possible to turn off client-side, with that toggle applying to every instance that implements it, so users can cross communities on different instances and know that what they are seeing is exactly what the user on the other end typed. Users who don't want that can simply keep the filter on.

I know this has been a controversial subject, not just with me but with many others, and I think this would be the best way of doing it: give users the freedom to see exactly what they want, while still allowing instances to filter words; it's up to each user whether they see it filtered or unfiltered. This is extremely important. For all we know, someone with a major political bias against Trump, for example, could just throw the word "Trump" onto the hard-coded filter list, and that's it. That would anger a lot of users. Filter lists are terrible tools for censorship. This has absolutely nothing to do with "wanting to be racist, sexist, etc." It goes a lot further than that: it is the core concept of not handing control to a small group of people. There is nothing you can do to stop someone from making an instance hosting extremely controversial material; the language filter means nothing to them, as they can just use different words, and you'd be playing cat and mouse with those words just to filter them out, adding thousands of meaningless words to your filter and harming users on all other instances. Instead, you can let them keep their instance up and simply not federate with them. Nobody on your instance will see their posts or communities or anything. I don't understand why it cannot be left at that. I have mentioned this in the past, and you deleted my post, called me racist and sexist, and told me to go back to Gab, even though I had made it extremely clear that my post was not standing up for racism or sexism. I was speaking purely theoretically, and as I said, it goes far beyond being racist. You can be the nicest person ever, completely against racism and sexism, and still be against the idea that people get censored and can't say what they want to say.

On top of this, there are more languages than just English; you'd have to filter everything you consider bad in every language as well, and words are used differently in different regions. The language filter won't be perfect. You're going to play cat and mouse with it, even if it's compiled by the community, for no real benefit and a massive loss in flexibility, when you could just leave it to the instances. I like Lemmy and I want to see it grow; this is why I am speaking out about this issue. The Fediverse should give control to the users. That is all this is about: let the users decide. From the number of posts being made about the language filter, it is clear I am not the only one with concerns. Please be more open-minded. It is not fair to users to hardcode a language filter into Lemmy that applies to all instances; it does nothing but hurt them. Let instances choose whom to federate with and apply language filters to those instances, and better still, also let users turn off the language filter locally across all instances.

I also completely agree with the point above. It may work today, but long-term it is not a great idea: words change, meanings change. I had a few more thoughts, one of which relates to this and asks about a blockchain-like / P2P system for long-term archival; I typed it up but decided to post only the language-filter part here and go more in depth. You can read my comments in #647. Please look at this from our point of view and understand how this looks. We want Lemmy to grow and do well. That is why we speak out about these concerns, and why we use the Fediverse: because we care. If we didn't care, we wouldn't be sitting here typing long paragraphs of our concerns trying to help you understand. It's not because we are racist or sexist. Please stop throwing that at us; it is frustrating. Please consider being more open-minded about this. |
Honestly, this is the only relevant point. Hardcoding anything is antithetical to the point of federation. Mastodon doesn't have a word filter; it doesn't need one. Instances handle the issue of bigots by blocking them from federation, and there is no reason Lemmy would be any different. Regardless, the word filter isn't going to stop anyone or achieve your stated goal: forking and automating it out on code updates is trivial. I don't quite get what you're trying to achieve. The chapo.chat fork has already stated their intention to make this editable at runtime. |
Can you at least use bare-minimum techniques like stemming and other normalization? Letter normalization would be needed to handle things like "slur" being bypassed with "slür", for example. There was someone who tried to talk about Stardew Valley and only managed to talk about Sremovedew Valley. At least focus on a competent implementation before accusing people who ask for some flexibility of being voat concern trolls. The existing hard-coded list is also centered on American and European English-speaking countries; English slurs used in countries like South Africa are completely ignored. Is this "federated" software only intended for those predominantly white populations? |
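The letter normalization being asked for can be sketched as follows. This is a hypothetical helper, not Lemmy's code; a real implementation would use Unicode NFKC normalization plus a confusables table rather than a hand-written mapping:

```rust
// Toy normalization pass: lowercase, then fold a handful of lookalike
// characters to ASCII so "slür" matches a filter entry for "slur".
// A real implementation would use NFKC plus a Unicode confusables table.
fn normalize(text: &str) -> String {
    text.to_lowercase()
        .chars()
        .map(|c| match c {
            'ü' | 'ù' | 'ú' | 'û' => 'u',
            'é' | 'è' | 'ê' | 'ë' => 'e',
            'à' | 'á' | 'â' | 'ä' => 'a',
            '0' => 'o', // digit lookalikes are a common bypass too
            '1' => 'l',
            _ => c,
        })
        .collect()
}

fn main() {
    assert_eq!(normalize("slür"), "slur");
    assert_eq!(normalize("SLÚR"), "slur");
    println!("{}", normalize("n0rmal1ze"));
}
```

Running the filter on `normalize(text)` instead of the raw text closes the simplest diacritic and digit-substitution bypasses, at the cost of more false positives.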
There are a lot of valid concerns about improving the implementation and allowing extension of the filter (especially for non-English-speaking communities). But I want to make the case for keeping it hardcoded, and push back on the notion that "anyone will be able to fork it anyway." Yes, anyone will ultimately be able to get around it, but I think the goal is to add friction to the process, and that's a worthwhile goal that will materially impact hateful speech online. The Stratechery newsletter -- which I very often disagree with -- makes this case very well. From a newsletter that was talking about the NSA revelations in 2013:

David Simon, of The Wire fame, wasn't that impressed with the Verizon revelations:

Having labored as a police reporter in the days before the Patriot Act, I can assure all there has always been a stage before the wiretap, a preliminary process involving the capture, retention and analysis of raw data. It has been so for decades now in this country. The only thing new here, from a legal standpoint, is the scale on which the FBI and NSA are apparently attempting to cull anti-terrorism leads from that data. But the legal and moral principles? Same old stuff. Allow for a comparable example, dating to the early 1980s in a place called Baltimore, Maryland.

The example involves pay phones and pagers, and the collection of metadata surrounding calls, but not the calls themselves. To requote Simon: "The only thing new here, from a legal standpoint, is the scale on which the FBI and NSA are apparently attempting to cull anti-terrorism leads from that data." Let's say Simon is right, and was universally acknowledged as such; I bet the outrage would persist. The problem is the lack of friction. In Baltimore, those detectives had to identify the relevant pay phones, install a dialed-number recorder on each pay phone, clone the pagers, and even then they often didn't know who the drug dealers were. Things are much easier today; global communications is largely routed through a few key backbone and service providers, many of which are located in the US. It's arguably easier to collect the call records of everyone on the planet -- and identify them -- than it was to collect the records and identities of those Baltimore drug dealers. One could argue that friction was the foundation of our privacy, and now friction is gone.

And here's a snippet from another, more recent newsletter, pondering the tradeoffs of end-to-end encryption on platforms like Facebook, where such encryption might hinder law enforcement:

The fact of the matter, as I noted above, is that encryption is a real thing that exists, and it is not going anywhere. Evil folks will always be able to figure out the most efficient way to be evil. The question, though, is how much friction do we want to introduce into the process? Do we want to make it the default that the most user-friendly way to discover your "community", particularly if that community entails the sexual abuse of children, is by default encrypted? Or is it better that at least some modicum of effort -- and thus some chance that perpetrators will either screw up or give up -- be necessary? To take this full circle, I find those 12 million Facebook reports to be something worth celebrating, and preserving. But, if Zuckerberg follows through with his "Privacy-Focused Vision for Social Networking", the opposite will occur. I do remain a fierce defender of encryption, and opponent of backdoors, but at the same time, we do as a society at some point have to grapple with the downside of the removal of friction.

"Evil folks will always be able to figure out the most efficient way to be evil." I wholeheartedly agree with this, and I think that's why I'm not convinced by arguments of the form "people will get around it anyway." In this light, maybe what we should think about are the things we want to have the most friction, and the things we want to have the least. There should be minimal friction in:

There should be maximum friction in:

To me, this means keeping the filter somewhat centralized and opinionated, yet at the same time frequently updated. It sounds like the repo's maintainers want to take on the responsibility of doing that, and that this work aligns with the goals of the project. |
When the "maximum friction" can be subverted with trivial automated scripts and uploaded for anyone to use, I question the value of maintaining a clunky static filter. |
yeah i'm pretty sure your wow-so-hard-to-bypass filter is like one line of regex lmaooo |
^ Maybe so for experienced users, but users who aren't as computer savvy still have to do some work to figure it out. If it's that easy, then maybe the goal should be to make it harder. |
Users who aren't as computer savvy aren't hosting instances. How, out of curiosity, do you suggest making it harder in an open source project? |
if you think the venn diagram of "people who can set up this federated social network service and deploy it for a userbase" and "people who can grep the codebase for EXAMPLE_SLUR and edit it out" isn't a circle then i don't know what to tell you lmao |
I stand corrected. I thought you were suggesting some kind of client-side workaround and your point is well-taken!
I don't know. Got any ideas? |
I think this is beside the point though, no? Of course they'll do it if they want to. But the friction in this case would be: I'm shopping around for something to deploy for my forum, I check out Lemmy, I find that it has some things I don't want for my community, like a slur filter, so I decide to deploy something else. If the goal is for as many people as possible to use Lemmy, then a hardcoded slur filter makes no sense; you'd want to support as many use cases as you can and bend over backwards for people. But I'm not sure that's the goal of this project (the maintainers can feel free to chime in if I'm wrong). |
So instead of using Lemmy, the hypothetical server host would use Forked-and-Auto-Patched-Not-Lemmy? Again, it just seems like a royal waste of effort. |
Exactly right -- compelling people to make the extra effort is the point! |
Uhm, I meant extra effort on the part of the maintainers of Lemmy. |
Gotcha. Well, it's on them to decide if it's worth it. Reading between the lines of their comments here, my sense is they find intrinsic value just in having a strong stance on these words in the main branch, regardless of whether it can be circumvented or not. I think that's valuable in and of itself. But I won't speak for them -- I think I've made my point & don't want to take up all the air here |
I suppose you're right. It is, in fact, their prerogative to make decisions that screw themselves over. No reason to keep bickering over it. |
If you want to report any issues with chapo.chat (like this one about the word "bastard" being filtered), please report it on their issue tracker, as the instance is heavily modified: https://gitlab.com/chapo-sandbox/production |
Did you even read the reason I mentioned them? |
Here's simple example of the problems with languages and forbidden words: https://languagelog.ldc.upenn.edu/nll/?p=48302 |
A Danish Lemmy user is concerned about the slur filter because a slur means something else in Danish.
As context can be a significant factor in whether a word is offensive or not, would some kind of sentiment analysis (either used in tandem or as a replacement) be a good idea? |
No IMO, not only would something like that be impossible to do because of the complexities of language, but it would also be counter-productive: in public spaces there's no appropriate context in which bigoted slurs should be used. |
That's true, but context can tell you that the matched string is actually part of a normal word or name, or is a word in some other language. (Sentiment analysis doesn't help with that though, so I take back what I said above.) |
There's a reason why email spam filtering evolved from simple word-based blocking to scoring, and further to trained Bayesian filtering. |
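The difference between hard word-blocking and scoring can be shown with a toy sketch (not a real spam filter; the weights and threshold are made up): each token contributes evidence, and a message is flagged only when the total crosses a threshold, rather than on any single word.

```rust
use std::collections::HashMap;

// Toy scoring filter: each token contributes a weight, and the message is
// flagged only if the total crosses a threshold. Weights here are invented
// for illustration; a Bayesian filter would learn them from training data.
fn spam_score(text: &str, weights: &HashMap<&str, f64>) -> f64 {
    text.split_whitespace()
        .map(|tok| {
            let t = tok.to_lowercase();
            weights.get(t.as_str()).copied().unwrap_or(0.0)
        })
        .sum()
}

fn main() {
    let mut weights = HashMap::new();
    weights.insert("viagra", 3.0);
    weights.insert("free", 1.0);
    weights.insert("winner", 1.5);

    let threshold = 2.5;
    // One weak signal alone is not enough to flag a message...
    assert!(spam_score("free software", &weights) < threshold);
    // ...but several signals together are.
    assert!(spam_score("free viagra winner", &weights) >= threshold);
}
```

This is the first step on the road the comment describes: scoring tolerates individual ambiguous words, which a binary block list cannot.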
@w3bb We have made our policy clear on this topic, and we are not going to change it. So there is no point arguing about it, especially not in unrelated issues. If you don't like it, you can fork Lemmy or simply not use it. |
This is done now with #1481 |
Why not just have the slur filter be opt-out in settings, with some kind of TOS that has to be agreed to if the server decides to opt out, releasing Lemmy from any responsibility if they choose to do so? |
This issue was completed a long time ago; the slur filter is entirely optional, and you can add one using the config: https://join-lemmy.org/docs/en/administration/configuration.html |
It's generally not a good idea to hard-code something like the slur filter, because the needs of every instance are different. Instances in another language would need their own versions, and cases where the slur filter over-blocks need to be addressed by the admins.
A better approach would be to store the slur filter in the database, initialize it with a default when spinning up an instance, and make it editable by admins without changing source files.
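A minimal sketch of that proposal (hypothetical types, not Lemmy's actual schema): the word list lives in shared mutable state seeded with defaults, so admins can edit it at runtime without recompiling; loading from and persisting to the database is left out.

```rust
use std::sync::RwLock;

// Sketch of an admin-editable filter: the word list is runtime state seeded
// with defaults instead of being compiled in. A real instance would load the
// list from, and persist changes to, a database table.
struct SlurFilter {
    words: RwLock<Vec<String>>,
}

impl SlurFilter {
    fn with_defaults(defaults: &[&str]) -> Self {
        SlurFilter {
            words: RwLock::new(defaults.iter().map(|w| w.to_lowercase()).collect()),
        }
    }

    // Admin actions: no recompile, no restart.
    fn add(&self, word: &str) {
        self.words.write().unwrap().push(word.to_lowercase());
    }

    fn remove(&self, word: &str) {
        let lw = word.to_lowercase();
        self.words.write().unwrap().retain(|w| *w != lw);
    }

    fn censor(&self, text: &str) -> String {
        let words = self.words.read().unwrap();
        let mut out = text.to_string();
        for w in words.iter() {
            // Naive case-insensitive, first-occurrence replacement; a real
            // filter would match on word boundaries (see the over-blocking
            // discussion above).
            let lower = out.to_lowercase();
            if let Some(pos) = lower.find(w.as_str()) {
                out.replace_range(pos..pos + w.len(), "*removed*");
            }
        }
        out
    }
}

fn main() {
    let filter = SlurFilter::with_defaults(&["badword"]);
    assert_eq!(filter.censor("a badword here"), "a *removed* here");
    filter.add("worseword");
    assert_eq!(filter.censor("worseword"), "*removed*");
    filter.remove("badword");
    assert_eq!(filter.censor("a badword here"), "a badword here");
}
```

Wrapping the list in `RwLock` keeps edits safe under concurrent request handling; the default seed preserves the "hard to accidentally ship without a filter" property while still letting admins refine it.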