identifier: tweaks and additions to slur list #1319

Merged Jul 13, 2023 (1 commit)

bnewbold
Collaborator

No description provided.

@bnewbold bnewbold merged commit 0bac4e7 into main Jul 13, 2023
@bnewbold bnewbold deleted the tweak-slur-list branch July 13, 2023 05:37
@Scotchester

Some highly questionable removals from the list in here. Care to defend them, or nah?

@douglaschu

Could we get some more information on why there were removals from the list?

@undefinedopcode

Highly questionable changes. And merged without review. Is this how folks roll in this repo?

@Aguyuno

Aguyuno commented Jul 13, 2023

Still looking for an explanation on those removals fam

@siobhandougall

The next time I hear someone say code can’t be racist or ableist, I’ll just point them to this commit. This is a 🤌🏻 illustration.

@bnewbold anything to say about it? Or are you just gonna lock comments so you all can pretend there are no consequences?

@acgh213

acgh213 commented Jul 13, 2023

Some words that should've stayed put got removed, and that's not cool. It seems we're opening the door to potential harm or offense.

Maybe we could amp up our feedback methods, make them more transparent, and really get everyone involved? Just throwing some ideas out there.

I understand this is a quick patch onto a community commit, but i really think there needs to be a proper response and discussion opened about this, not later, today.

@mletterle

It's not a good look.

Yeah, some explanation would be welcome.

@mletterle

As an aside, I wonder about the wisdom of all these being hardcoded, seems like it would be better to have separate wordlist files (and include defaults). This would make updates easier (just edit the wordlist in production) and allow future federated servers to tweak the lists to their standards.
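
A minimal sketch of the separate-wordlist idea, assuming a plain-text file with one term per line and `#` comments. The names `parseWordlist` and `loadWordlist`, the file path, and the injected `readFile` callback are all hypothetical and not taken from the repository; the terms are placeholders.

```typescript
// Hypothetical sketch: load a denylist from an external file, with built-in defaults.
// Assumed file format: one term per line, '#' starts a comment.

/** Parse wordlist text into a trimmed, lowercased, de-duplicated list. */
function parseWordlist(text: string): string[] {
  const seen: Record<string, true> = {};
  const terms: string[] = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const hash = rawLine.indexOf("#");
    const body = hash === -1 ? rawLine : rawLine.slice(0, hash);
    const term = body.trim().toLowerCase();
    if (term.length > 0 && !Object.prototype.hasOwnProperty.call(seen, term)) {
      seen[term] = true;
      terms.push(term);
    }
  }
  return terms;
}

/** Reader callback: returns the file contents, or undefined if the file is missing. */
type ReadFile = (path: string) => string | undefined;

/** Use the operator's file when present, otherwise fall back to the defaults. */
function loadWordlist(path: string, readFile: ReadFile, defaults: string[]): string[] {
  const text = readFile(path);
  return text === undefined ? defaults.slice() : parseWordlist(text);
}

// Usage with an in-memory stand-in for the filesystem.
const fakeFiles: Record<string, string> = {
  "/etc/pds/denylist.txt": "# operator overrides\nBadWord\nworseword  \n\nbadword\n",
};
const readFake: ReadFile = (p) =>
  Object.prototype.hasOwnProperty.call(fakeFiles, p) ? fakeFiles[p] : undefined;
const DEFAULTS = ["placeholder"];

console.log(loadWordlist("/etc/pds/denylist.txt", readFake, DEFAULTS)); // badword, worseword
console.log(loadWordlist("/missing.txt", readFake, DEFAULTS)); // placeholder
```

Passing the reader in as a callback keeps the sketch independent of any particular runtime; a real server would presumably wrap its own file API here.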

@ShaneWWatson

I'm wondering why this only applies to handles newly created and not to changed handles? That's how we ended up with the debacle we had this morning. Also, what's up with the removal of certain words in the list? What was the logic behind that?
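
One way to close the gap described here is to route every code path that sets a handle through the same check. The sketch below is only an illustration: `isHandleAllowed`, `createAccount`, `updateHandle`, and the in-memory store are hypothetical names, the denied term is a placeholder, and nothing here reflects how the actual server is structured.

```typescript
// Hypothetical sketch: a single validator shared by account creation and handle changes.
const DENIED_TERMS: readonly string[] = ["badword"]; // placeholder, not a real list

/** Returns false when the handle contains any denied term (case-insensitive). */
function isHandleAllowed(handle: string): boolean {
  const normalized = handle.toLowerCase();
  for (const term of DENIED_TERMS) {
    if (normalized.indexOf(term) !== -1) return false;
  }
  return true;
}

// In-memory stand-in for the account store: account id -> handle.
const handleStore: Record<string, string> = {};

function createAccount(id: string, handle: string): boolean {
  if (!isHandleAllowed(handle)) return false;
  handleStore[id] = handle;
  return true;
}

function updateHandle(id: string, newHandle: string): boolean {
  // Same check as creation, so a rename cannot bypass the list.
  if (!isHandleAllowed(newHandle)) return false;
  handleStore[id] = newHandle;
  return true;
}

console.log(createAccount("a1", "alice.example")); // true
console.log(updateHandle("a1", "BadWord.example")); // false, handle stays unchanged
```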

@scarletcs

Yes so, why did several real slurs get removed? They're still slurs.

@JamesH33

Could potentially use the work that was already done here:

https://github.com/Blank-Cheque/Slurs

@ftdftd

ftdftd commented Jul 13, 2023

Just adding to the chorus of voices demanding an explanation on this.

@ixtli

ixtli commented Jul 13, 2023

I made a discussion about this in case maintainers don't check conversation on merged prs #1325

@robotblake

My guess is that this got merged without any thorough review, which is... not great.

@ferranrego

ferranrego commented Jul 13, 2023

I endorse the political background of the concerns raised and how important and sensitive the topic is. I hope not to offend anyone.

That said, I support some of the changes the devs made to the current list (and I would like to know more about the rationale behind some others; they may do it better or worse, but looking at the blacklist it seems to me that they care about the topic).

I understand that these words can be harmful, but both negro and retard are Latin words commonly used in languages (such as Spanish, the 4th most widely spoken language) in regular day-to-day conversations and are not necessarily pejorative. As an example, negro in Spanish is a main color. Blacklisting "negro" is like blacklisting "black" in an English conversation.

Let's give a practical example: by prohibiting the use of the word "negro" a Spanish restaurant called "El Gato Negro" (The Black Cat in English) or a Catalan bot-account about delays in trains on the R15 line called "@RetardsR15" (delaysR15 in English) would not be able to create accounts.

That's why censoring these two specific words through blacklisting doesn't make any sense to me, and it's a quite anglocentric approach.

Blacklisting is not easy nor the ultimate solution. For edge cases we should find other complementary ways to moderate and make sure that these "edge case words" are not used for hate.

@ali-reeser

ali-reeser commented Jul 13, 2023

It's still disgusting to have this PR with NO explanation or accountability for who approved and merged. There is no community involvement here. Good points but it's still ridiculous.

@Jesstradiol

I second the recommendation of using the Blank Check Code.

> Could potentially use the work that was already done here:
>
> https://github.com/Blank-Cheque/Slurs

While things like negro are debatable and not easy to blacklist, many many of the variations removed from this push ARE.

Also, why on earth is this limited to account creation?
Why not when people change names?
Why not keep these out of bios, out of posts, and out of lists? I just saw a user who had a list title using some of these words, and was using them in posts.

An apology is owed to the users, especially the black ones. That's base level PR you guys.
For you Devs, no hard coding isn't the best solution for all words, but please use the regex provided. And know this is the work at hand. This doesn't get solved, might as well not work on anything else.

@JamesH33

Even if they don't use the Blank Cheque list, I'm sure that others exist. This work has been done before, the wheel doesn't need to be reinvented.

@IAmJSD

IAmJSD commented Jul 13, 2023

I have made my take clear on bluesky that I think word lists are a bad solution for moderation and T&S and flagging is the actual one. HOWEVER if you are going to do them, instead of hardcoding each possible combination of a bad word, it makes sense to turn them into regex where each character is substituted for a series of characters it could be. Else this is trivial to bypass (I mean it is anyway, but more so).

@IAmJSD

IAmJSD commented Jul 13, 2023

That doesn't mean make them a long regex string to be clear, but actually compile it at runtime once and then store it.
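
The approach described in these two comments could look roughly like the sketch below: each letter of a term is expanded into a character class of look-alikes, and the combined pattern is compiled once and reused. The substitution table, function names, and terms are illustrative assumptions and are far from exhaustive.

```typescript
// Hypothetical sketch: build one case-insensitive regex from a list of terms,
// expanding each letter into characters it is commonly swapped for.
const LOOKALIKES: Record<string, string> = {
  a: "a4@",
  e: "e3",
  i: "i1!",
  o: "o0",
  s: "s5$",
};

/** Escape characters that are special inside a regex character class. */
function escapeForClass(ch: string): string {
  return ch.replace(/[\\\]\[^-]/g, "\\$&");
}

/** Escape regex metacharacters outside a character class. */
function escapeLiteral(ch: string): string {
  return ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

/** Turn one term into a pattern, e.g. "bad" -> "b[a4@]d". */
function termToPattern(term: string): string {
  return term
    .toLowerCase()
    .split("")
    .map((ch) => {
      if (!Object.prototype.hasOwnProperty.call(LOOKALIKES, ch)) return escapeLiteral(ch);
      const alts = LOOKALIKES[ch].split("").map((c) => escapeForClass(c)).join("");
      return "[" + alts + "]";
    })
    .join("");
}

/** Combine all terms into a single alternation. An empty list matches nothing. */
function compileDenylist(terms: string[]): RegExp {
  if (terms.length === 0) return /(?!)/;
  const alternatives = terms.map((t) => termToPattern(t)).join("|");
  return new RegExp("(?:" + alternatives + ")", "i");
}

// Compiled once at module load, then reused for every check.
const DENYLIST_RE = compileDenylist(["badword", "sillyterm"]); // placeholder terms

function containsDeniedTerm(handle: string): boolean {
  return DENYLIST_RE.test(handle); // no "g" flag, so test() keeps no state between calls
}

console.log(containsDeniedTerm("b4dw0rd.example")); // true
console.log(containsDeniedTerm("alice.example")); // false
```

As the comment above notes, this only raises the cost of evasion; substring matching like this can also produce false positives on innocent handles, which is the trade-off debated elsewhere in this thread.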

@jmbohn807

Just a small selection of the slurs you removed from the list (and, by extension, deemed acceptable to use). Care to tell us why?

'nigs',
'nig-nog',
'nig-nogs',

'retard',
'retards',

'shemale',
'shemales',

@londontim

Really disappointed to see some of the words that have been removed; if this is going to be a solution, even in the interim, a whole lot more thought and transparency needs to go into it.

@bnb

bnb commented Jul 13, 2023

It’s exceptionally disturbing to see a number of slurs were dropped.

Please listen to users here who are making valid suggestions for how to do better.

'pickaninnie',
'pickaninnies',
'pickaninny',
'pickaninnys',
'raghead',
'ragheads',
'retard',

Remind me why this needs to be removed from the list? Isn't this ableist language?

[Image: Screen Shot 2023-07-13 at 18 38 43]

@ferranrego ferranrego Jul 13, 2023

As @mackuba is saying, and as a speaker of 4 Latin languages, I have to agree that it doesn't make sense to just blacklist some of these words, because they are in common (non-pejorative) use, and we should find more sophisticated ways to moderate and avoid hate.

Further explanation here:

#1319 (comment)

I disagree strongly. Having the ability to use this word in usernames is not worth the tradeoff of making others feel unwelcome. I’d like this to be a community that prioritizes psychological safety, especially when there are other ways to say “delay” in multiple languages.

So, that totally explains why some of the other words are being removed too... /s I'm sure there are other uses of the word, but its best-known use is as an ableist slur. Remind me again why you're making this a hill to die on, fighting so hard for the use of a word to mean delay or slow down? Who's going to be hurt by the removal of this word? (No one). Who will be hurt by the USE of this word? (Lots of folks)

nod I can ALMOST guarantee that no one that works in this space ever feels like it's a solved problem. Ever. At least those that choose to work in the space rather than being voluntold to. :)

And I recognize, particularly the edge case negro for example, brings with it complications around common usage in Spanish. The decision points I'm seeing here are likely: 1) What's the moderation cost to handle the edge case, 2) How many users will be upset by not being able to use "Black" in Spanish in their username, 3) How many people will be offended by seeing "negro" pop up in their feeds?

They could likely do some user studies to test this theory. I will say, from my perspective the BEST way to test this theory while also creating a safe environment for as many users as they can, is to do what they're doing, and ban far and wide. Be as inclusive of any words you can find in as broad a swath as possible across all languages. Go how far you think you should go, then go farther.

When folks start chiming in "WTF?! Why did you ban xxxx?!" you can start to have some decent data around the impact because it's super explicit. It's much harder to get data the other way (not impossible certainly). But like, a user that leaves the platform (or just doesn't engage but doesn't leave) because they were tired of seeing slurs in usernames is harder to capture (unless you ask specifically in some kind of exit poll, but even then it's not likely you'll get large engagement because the person is already done with the platform).

Paramount is creating a safe space. In ALL languages. When you've gone too far and are at panic room level lockdown, you can ALWAYS back off and consider those edge cases at the other end once you've established some data around just how folks feel. But if your space isn't safe, you'll never find out cause folks aren't engaged with the platform.

These are very interesting points and arguments to consider @GabeWeiss. I will think about them.

Thank you very much for taking your time to explain and sharing your takes!

@ferranrego of course! Right back at ya. Glad it stayed civil. :D

I'm not arguing that you can't use that word in your own language. I'll sideline "negro" for a bit, because I realize that it means "black" in Spanish. "Retard" on the other hand is very offensive and has very little use outside of some edge cases that you mentioned. Again, how many businesses are hurt because they can't use "retard" in their username? I'd argue very few. Yet, because it is allowed, we have "Capt Retard" as a username now. I find it very telling that your argument to have the freedom to use your language as you wish negates my need to not be called a "retard". I'm not sitting over here telling you that English is superior or that we should consider English swear words above other use cases. What you bring to the table, specifically for "retard" is a rare case, where it's easily abused. Beyond that, there are other words I'd rather not type here that have been removed that have ZERO use case outside of being a slur. Outside of the word "negro", I don't see any reason why any of the words that have been removed (especially "retard") have any clear and common use case in any language. You're taking a one-off situation and holding it up as the standard, claiming that I'm being English-superior, which I am not. I'm just saying that your use of certain words as an edge case does not negate my need for not being called a version of stupid that implies that people with intellectual disabilities should be killed. I know you don't like this being a good/bad case, but I'd rather not have people imply certain things in their usernames, which I think is more important that someone's "freedom" to use a certain word.

I understand the linguistic perspectives in this discussion. It's always been the case that a slur in one language may be a colloquial use in another.

But at the end of the day:

  1. Only the general population actually affected by a slur should get a say in its inclusion or exclusion.

To put it simply: you don't know, and you can't know, the full harmful effects caused by the word. You are not an expert, no matter how much you try to justify that you are. You cannot possibly make an informed decision.

  2. If someone can't deal with the inability to use a few 5 character words in usernames, then they aren't capable of good faith reasonable discourse and are joining a social network with the intent of acting in bad faith and showing that they have no desire to adhere to further community rules. When invites are over, they'll just spam new accounts continuously with these words.

@dd-toronto

Slurs and hateful language on Bluesky are not a trivial problem, and our concerns are not trivial either. If you want to be TruthSocial just say so and we will all go elsewhere.

@fauxpearls

fauxpearls commented Jul 13, 2023

We have a lot of comments here focusing on what you removed, but I’m also concerned by some of what was added. You changed a perfectly reasonable heading about this being an incomplete list, and instead labeled it “naive”

the list is explicitly labeled as a list of slurs, but you have just added Hitler and KKK, neither of which are slurs.

Explain?

To be clear, I understand this is likely about preventing usernames with those words in them. I agree with that. They still aren’t slurs.

@Aguyuno

Aguyuno commented Jul 13, 2023

> We have a lot of comments here focusing on what you removed, but I’m also concerned by some of what was added. You changed a perfectly reasonable heading about this being an incomplete list, and instead labeled it “naive”
>
> the list is explicitly labeled as a list of slurs, but you have just added Hitler and KKK, neither of which are slurs.
>
> Explain?

So real quick I wanted to cover this because I misunderstood at first. This list is specifically for usernames vs a list of banned words. So like yeah no one cares if you just say the word hitler but also they don’t want a hitler@bluesky.social running around.

The additions are fine honestly. But the removals are all bad.

@ghost

ghost commented Jul 13, 2023

#1319 (comment)
Thank you for responding and escalating this. Much appreciated. Cheers, Susie

@contolini

> I endorse the political background of the concerns raised and how important and sensitive the topic is. I hope not to offend anyone.
>
> That said, I support some of the changes the devs made to the current list (and I would like to know more about the rationale behind some others; they may do it better or worse, but looking at the blacklist it seems to me that they care about the topic).
>
> I understand that these words can be harmful, but both negro and retard are Latin words commonly used in languages (such as Spanish, the 4th most widely spoken language) in regular day-to-day conversations and are not necessarily pejorative. As an example, negro in Spanish is a main color. Blacklisting "negro" is like blacklisting "black" in an English conversation.
>
> Let's give a practical example: by prohibiting the use of the word "negro" a Spanish restaurant called "El Gato Negro" (The Black Cat in English) or a Catalan bot-account about delays in trains on the R15 line called "@RetardsR15" (delaysR15 in English) would not be able to create accounts.
>
> That's why censoring these two specific words through blacklisting doesn't make any sense to me, and it's a quite anglocentric approach.
>
> Blacklisting is not easy nor the ultimate solution. For edge cases we should find other complementary ways to moderate and make sure that these "edge case words" are not used for hate.

I appreciate the dialogue. FWIW your use of the word "blacklist" indicates you might want to read about inclusive language: https://www.vice.com/en/article/v7dd3d/we-need-to-stop-saying-blacklist-and-whitelist

'n1gga',
'n1gger',
'nazi ',
'negorid',

This looks like a typo. Shouldn't it be negroid?

@ixtli

ixtli commented Jul 13, 2023

> I appreciate the dialogue. FWIW your use of the word "blacklist" indicates you might want to read about inclusive language: https://www.vice.com/en/article/v7dd3d/we-need-to-stop-saying-blacklist-and-whitelist

correct. ive been a professional engineer for 20 years and i prefer using the words allow/denylist because black and white were intended originally as segregationist: it's not an accident of history.

@quoll

quoll commented Jul 13, 2023

Like many people above, I'm astounded at some of the removals. I appreciate that #1333 reverted some of these, but there are still some nasty ones that are still gone. What was @bnewbold thinking?

@ShaneWWatson

Thanks to the recent changes we now have the "Captian Retard" and "retarded.bsky.social" accounts. Any of y'all both-siders want to explain how these could possibly mean "Captain Delay" or something like that?

@Scotchester

> Like many people above, I'm astounded at some of the removals. I appreciate that #1333 reverted some of these, but there are still some nasty ones that are still gone. What was @bnewbold thinking?

Save your appreciation for if that PR actually gets merged.

@ferranrego

ferranrego commented Jul 13, 2023

Yes, I can explain it to you really easy, @ShaneWWatson.

Retard and Negro are basic and non-pejorative words from many Latin languages in the world. Just Spanish itself is the 4th language spoken worldwide.

Just because you don't want to take into account the languages of more than a billion people who use these words regularly (because they are really basic non-pejorative words) doesn't mean that we should just go straight and blacklist them. We can do better.

Imho the Bluesky Team should find a better way to tackle this issue with words that are of regular use in other languages. Like tracking these words and reviewing them manually or automatically to see if they infringe something when users use it.

Blacklisting is not a magic wand. It works for a lot of words but not for all of them, because you risk unnecessarily censoring other people as well.

And, imho, this way of discussing without considering other people's perspectives and realities is not beneficial to anyone.

I would advocate for a discussion that is constructive and seeks solutions, and that maintains these minimums of antiracism, antixenophobia, antifascism and antiharassment, on which I think most of us agree.

@vscabral

I feel like we're all aware that there are better solutions than a blacklist, but in the meantime when people are mostly seeming to use these words specifically as slurs on the platform, a blacklist is a useful tool until they can get a better solution in place that understands intent.

@ghost

ghost commented Jul 13, 2023

Hard disagree on this. Listen to the people who are saying that these words are used in English as slurs. They cause more real harm to disabled people, and POC, than the mild inconvenience of having to find an alternative word. You're playing word games, meanwhile, people are being vilified.

@mletterle

mletterle commented Jul 13, 2023

When people are actively being harmed by being called the r-word and a hypothetical non-English train delay notification account having to pick another name (or appeal for an exception) does not cause harm then it makes sense to prioritize the former above the latter. This is basic, simple stuff. Any attempt to discuss otherwise is simply ignorance or if it continues trolling.

@ferranrego

ferranrego commented Jul 13, 2023

I completely understand your concern @DrSusieWalsh and I want the best outcome, don't get me wrong. I myself have been diagnosed with Asperger's and I have a degree in social education; I'm familiar with diversity care.

But, sorry, I sense a certain linguistic supremacism in these arguments (English is not a first-tier language above the others; all are at the same level).

So my idea is that there are better ways to handle this. Because no one wants these words being misused around. Starting by actively analyzing in real time the use of these (edge case) words like negro or retard that can be used as slurs and acting fast to ban and delete accounts.

This is a very important political decision and cannot be taken lightly in my opinion. Because it justifies cultural and linguistic supremacism and it could be the door to the justification of other "unnecessary side effects".

I'm pretty pretty sure we can do better.

@ghost

ghost commented Jul 13, 2023

The discussion started because of the English usage of "retard", and right at this moment, on this site, Autistic people are beginning to leave because it is unsafe.
I'm not going to go any further in this conversation with you.

@ferranrego

ferranrego commented Jul 13, 2023

Yes, and Bluesky is not only for people from English speaking countries.

But thanks for considering other perspectives and alternative solutions that would fit everyone I guess 🥲

@smenor

smenor commented Jul 13, 2023

I just don't believe that this is a serious argument @ferranrego

You're weaponising the idea of cultural / linguistic supremacy but you're also totally cool with the term「 blacklist 」even though someone above pointed out that it's problematic which makes me think that you're kinda full of it

Yeah it's true 「 negro 」 means 「 black 」 and 「 retardo 」 means 「 delay 」 but [ as far as I know ] 「 Black 」 isn't a slur in any language - and if it was then we could disallow it.

「 shabby 」 sounds like「 傻屄 (shǎ bī) 」 in Chinese [ meaning: stupid c- ]

It's not「 cultural supremacy 」by any reasonable definition to say that we can also disallow that and other words that may be racial and ableist slurs in other languages too.

@ferranrego

ferranrego commented Jul 13, 2023

Thanks for pointing this out @smenor, I didn't know about the blacklist word controversy, but after reading about it I'll stop using it, as I did with many other words in my personal life.

Regarding the other things you mention, if you read my other comments you'll see that I'm just trying to debate about a better way to handle some edge case words. Because I believe there is a better way to do it than just banning, by default, common words used by more than a billion people.

I'm referring specifically to common words like "Negro" that are not only used on a daily basis by billions but are also used in the names of thousands and thousands of towns, cities, neighborhoods, companies, brands, associations, NGOs, cooperatives, magazines, lakes, rivers, animals, people's surnames, etc.

Don't get me wrong, I'm totally in favor of actively forbidding hate speech, racism, xenophobia, fascism and harmful language.

What I'm wondering is if we can have a better way to prevent and moderate it for certain cases, because I believe we have the technology and expertise that could allow that. And this could be just complementary to the list of banned words. Different solutions don't need to be mutually exclusive.

I want to finish my comment saying that if anyone felt offended by my arguments, I really apologize. I am aware and admit that many of us start from different situations and privileges.

I don't want to underestimate anyone's sensitivity or the suffering that is experienced with certain situations. That is not my intention, on the contrary.

I don't necessarily have to be right, but I think it's good to debate on the subject (always within the minimum of anti-racism, anti-fascism and anti-xenophobia). I really believe that we can come with better ideas and solutions mixing tech and human work that not only pass by a banned words list. Or at least explore them.

@gay-frogs

is there any particular reason why "chinaman" was removed?

@nobember11

nobember11 commented Jul 13, 2023

@ferranrego
I see your point but your argument does not justify the whole PR since not all of the words removed in this PR have non-pejorative meanings in other languages, and the words added can hold non-pejorative meaning in some other languages. While your argument is somewhat valid, it doesn't have much relevance to this PR.

Can you provide a logical explanation as to why some slurs against Asians were removed BUT some words pertaining to racists were added?

@nobember11

nobember11 commented Jul 14, 2023

@ferranrego
In summary, your argument, at best, justifies the removal of certain words. Our main focus here should be on how to deal with the PR, which arbitrarily added and removed words without any explanation. I think the only option is reverting it, because we should not condone such destructive behavior.

Perhaps you could create a separate issue or PR where we can discuss your proposal for a better solution.

'muzzie',
'n1gga',
'n1gger',
'nazi ',

Trailing space here appears incorrect.
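
If the trailing space is indeed unintended, a small lint pass over the list could flag entries like this one before they are merged. The sketch below is an assumption-laden illustration: `lintWordlist` and `WordlistIssue` are hypothetical names, and the sample entries are placeholders rather than terms from the diff.

```typescript
// Hypothetical sketch: flag wordlist entries with stray whitespace, empty entries,
// and case-insensitive duplicates.
interface WordlistIssue {
  index: number;
  entry: string;
  problem: string;
}

function lintWordlist(entries: string[]): WordlistIssue[] {
  const issues: WordlistIssue[] = [];
  const firstSeen: Record<string, number> = {};
  entries.forEach((entry, index) => {
    const trimmed = entry.trim();
    if (entry !== trimmed) {
      issues.push({ index, entry, problem: "leading or trailing whitespace" });
    }
    if (trimmed.length === 0) {
      issues.push({ index, entry, problem: "empty entry" });
      return;
    }
    const key = trimmed.toLowerCase();
    if (Object.prototype.hasOwnProperty.call(firstSeen, key)) {
      issues.push({ index, entry, problem: "duplicate of entry " + firstSeen[key] });
    } else {
      firstSeen[key] = index;
    }
  });
  return issues;
}

// Placeholder entries: index 1 has a trailing space, index 2 repeats index 0.
const sampleIssues = lintWordlist(["alpha", "beta ", "Alpha"]);
for (const issue of sampleIssues) {
  console.log(issue.index + ": " + issue.problem);
}
```

A check like this catches formatting slips only; a misspelled entry such as the one flagged earlier in this review would still need human review.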

@Scotchester

With the release of #1336, I don't expect we'll ever get any clarity on why the removals in this PR were made, or what the true list is going forward. Not exactly confidence-inspiring.

@nobember11

nobember11 commented Jul 14, 2023

They've completely removed the word list from the codebase and hidden it within their environmental variables, making it impossible for anyone to correct it.

https://github.com/bluesky-social/atproto/pull/1336/files#diff-c0d7d43fbf517bf9e2ea1a5897a308541e82bdada433d83ee1472546513aca8bR168-R170
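
For readers unfamiliar with the pattern being criticized, reading a list from an environment variable generally looks something like the sketch below. This is not the code from #1336: the variable name `HANDLE_DENYLIST`, the comma-separated format, and the function name are all assumptions made for illustration.

```typescript
// Hypothetical sketch: read a comma-separated denylist from an environment-like object.
type Env = Record<string, string | undefined>;

function denylistFromEnv(env: Env, varName: string = "HANDLE_DENYLIST"): string[] {
  const raw = env[varName];
  if (raw === undefined) return []; // variable not set: no terms configured
  return raw
    .split(",")
    .map((term) => term.trim().toLowerCase())
    .filter((term) => term.length > 0);
}

// Stand-in for the process environment, with placeholder terms.
const exampleEnv: Env = { HANDLE_DENYLIST: "badword, WorseWord,,  " };
console.log(denylistFromEnv(exampleEnv)); // badword, worseword
console.log(denylistFromEnv({})); // empty list
```

In a Node.js server the same function could be called with `process.env`. The list then lives in deployment configuration rather than in the repository, which is exactly why it can no longer be reviewed or corrected through pull requests.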

@smenor

smenor commented Jul 14, 2023

> With the release of #1336, I don't expect we'll ever get any clarity on why the removals in this PR were made..

Oh I think they're being plenty clear about it

@nobember11

nobember11 commented Jul 15, 2023

@smenor
Oh is it clear to you now? Then could you explain the reasons why some words like j*ps and c*****men were removed and why some words like nazi, kkk, and hitler were added?

@smenor

smenor commented Jul 15, 2023

@nobember11 of course it’s completely clear - they DGAF and are doing the absolute bare minimum they can get away with and hiding it in environment variables so there is no oversight

I don’t think they give the slightest fuck about the n-word either but the outcry and appearance was bad enough they wanted to try to put a fig leaf on it

@bnewbold
Collaborator Author

The removal of some terms in this commit diff clearly implies that we don't care about harassment of those groups. I pushed this commit, and I am sorry for this situation, both to those groups, and to everybody looking on.

This diff is the result of replacing the publicly contributed list of slurs with an emergency list that we had assembled in parallel. Our list was based on identifying the most urgently reported and burning concern in the moment, which was slurs targeting Black folks, as well as some frequently reported Nazi and antisemitic terms. There are obvious problems with that emergency list, including typos and omissions. This temporary technical measure (a list of exact string matches) has since been replaced with a more complex mitigation that attempts to catch a broader set of intolerant handles. The new system will also surely need to be revised over time.

We do care about intolerance and targeted harassment. That includes both slurs in handles, and in all the other forms that people use for harassment, both foreseen and novel. We have policies and guidelines against intolerance and harassment, and the moderation team takes action every day based on those policies. That does not excuse tone-deaf commit diffs like this one.

I am, as predicted, going to lock this conversation now. Feel free to re-open a Github Discussion if you want to continue talking about this.

@bluesky-social bluesky-social locked as too heated and limited conversation to collaborators Jul 15, 2023
@dholms
Collaborator

dholms commented Jul 22, 2023

Wanted to add a brief follow up here as we've had several questions about this PR.

This diff was the result of merging two incomplete word lists, and the final list is more comprehensive. Our current production list includes all slurs from both of the lists in question as well as additional slurs that have come up in other issues/PRs/in-app reports. As is common in abuse management systems, this list has been moved to our private server config to avoid being gamed by bad actors.

We'll continue to add new words over time as they come to our attention.
