NonInclusiveLanguage sniff #59

Open
jrfnl opened this issue Jun 9, 2020 · 49 comments

Collaborator

@jrfnl jrfnl commented Jun 9, 2020

A sniff to examine code and comments for the use of non-inclusive language and throw a warning when found.

Specifically, the sniff should look for sexist, racist, ableist or ethnocentric language, which can contribute to a hostile work environment.

Initial word list

| Search for | Alternatives to suggest | Notes |
|---|---|---|
| whitelist, blacklist | allowlist/safelist/acceptlist, denylist/blocklist/rejectlist | |
| master, slave | primary/main, secondary/replica | |
| he, she, him, her, his, himself, herself | they, them, their, themself | may need to limit this search to comments |
| crazy | peculiar, baffling | |
| dummy | placeholder | |

Input is requested and very welcome!

Particularly on:

  • additional words/terms to look for;
  • what should be the preferred alternatives.

What to examine:

Search for these in:

  • Comments and docblocks.
  • Variable names
  • Constant names
  • Namespace names
  • Class names
  • Function names

For constructs, report on these only when the construct is declared, not when used, as usage cannot be changed until the declaration has been changed.
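
As a rough illustration of the "declarations only" rule above, here is a minimal Python sketch (not a PHPCS sniff) that pulls declared names out of PHP source text. The patterns and the names DECLARATION_PATTERNS and find_declared_names are assumptions made for this example. A real sniff would work on the PHPCS token stream rather than on raw text, and plain variables are left out because PHP has no separate declaration syntax for them.

```python
import re

# Hypothetical, simplified patterns for PHP declarations. A real sniff
# would walk the PHPCS token stream instead of using regexes on raw text.
DECLARATION_PATTERNS = {
    "namespace": re.compile(r"^\s*namespace\s+([A-Za-z_\\][\w\\]*)", re.MULTILINE),
    "class": re.compile(r"^\s*(?:abstract\s+|final\s+)*class\s+([A-Za-z_]\w*)", re.MULTILINE),
    "function": re.compile(r"\bfunction\s+&?\s*([A-Za-z_]\w*)\s*\("),
    "constant": re.compile(r"\bconst\s+([A-Za-z_]\w*)\s*="),
}


def find_declared_names(php_source):
    """Return (kind, name) pairs for constructs declared in the source."""
    found = []
    for kind, pattern in DECLARATION_PATTERNS.items():
        for match in pattern.finditer(php_source):
            found.append((kind, match.group(1)))
    return found


SAMPLE = """<?php
namespace Acme\\Tools;

class MasterList {
    const DUMMY_VALUE = 1;
    function loadBlacklist() { return other_blacklist(); }
}
"""

for kind, name in find_declared_names(SAMPLE):
    print(kind, name)
```

The call to other_blacklist() is a usage, so it is not returned. Only the declared names would then be passed on to the word check.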

Additional notes:

  • The sniff should be aware of variants of words, e.g. $white_list, whitelisting etc.
  • The regexes used should be careful not to match too much, e.g. the pattern for she should not match sheer, and the pattern for master should not match mastering.
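
One possible way to meet both notes, sketched in Python rather than as an actual sniff: split identifiers and prose into word parts first, then compare whole parts (and adjacent pairs) against an explicit list of variants. This swaps the single-regex approach for tokenising plus lookup. The word list, TERMS, WORD_PART, JOINERS and find_terms are all hypothetical names for this example only.

```python
import re

# Hypothetical word list: each flagged term maps to a suggested alternative.
# Variants (plural, -ed, -ing) are listed explicitly so that "whitelisting"
# is caught while "mastering" and "sheer" are not.
TERMS = {
    "whitelist": "allowlist",
    "whitelists": "allowlists",
    "whitelisted": "allowlisted",
    "whitelisting": "allowlisting",
    "blacklist": "denylist",
    "master": "primary/main",
    "slave": "secondary/replica",
    "she": "they",
}

# Splits identifiers and prose into word parts: "masterList" -> master, List;
# "$white_list" -> white, list; "HTMLParser" -> HTML, Parser.
WORD_PART = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")

# Separators allowed between the two parts of one compound term.
JOINERS = {"", "_", "-", " "}


def find_terms(text):
    """Return (matched_text, suggestion) pairs found in `text`."""
    parts = list(WORD_PART.finditer(text))
    results = []
    i = 0
    while i < len(parts):
        current = parts[i]
        # Try the two-part compound first, e.g. "white" + "_" + "list".
        if i + 1 < len(parts):
            nxt = parts[i + 1]
            gap = text[current.end():nxt.start()]
            joined = (current.group() + nxt.group()).lower()
            if gap in JOINERS and joined in TERMS:
                results.append((text[current.start():nxt.end()], TERMS[joined]))
                i += 2
                continue
        single = current.group().lower()
        if single in TERMS:
            results.append((current.group(), TERMS[single]))
        i += 1
    return results


print(find_terms("$white_list = whitelisting($sheer, $masterList); // mastering"))
```

Because only whole word parts are compared, sheer and mastering pass untouched, while $white_list, whitelisting and the master in $masterList are reported.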

External references:

@joemcgill joemcgill commented Jun 9, 2020

This is a great initiative. Thank you for taking it up! I've found this list of disability terms with negative connotations a helpful resource.

Collaborator Author

@jrfnl jrfnl commented Jun 9, 2020

@joemcgill Thanks, though the credit should also go to @jdevalk.

Thanks for the link. I've had a look through the list, and there are only a few words there which I can imagine people would ever use in a code context, but maybe I'm wrong? Please tell me if I am!

The only ones which sprang out at me from that list (other than those already listed above) were:

  • "invalid" - the noun is the offensive word though, while in code, this is usually used as an adjective and I would not be able to distinguish between the two.
  • "disabled" - similar situation.
  • "dim" - similar situation, think "dimming the screen" which is quite different from calling a person "dim".

And possibly

  • "blind" - as in "double-blind testing"

Specific words to search for with suggestions for alternatives are most helpful to get this off the ground.

@vavroom would you care to comment ?

@vavroom vavroom commented Jun 9, 2020

Very happy y'all are looking at this kind of thing :)

On the term disabled, I wouldn't be too worried. While for a very long time there's been a push to use "person with a disability", instead of "disabled", there's also been a massive push for using just disabled, by disabled folks. If you look on twitter for the #SayTheWord hashtag, you'll get a feel for it.

Also, the idea of ableist language is using a medical/disability related word in a negative context. A disabled button is pushing that envelope a bit. I'd not be too worried about it. Then again, I would hesitate to use disabled buttons but that's a story for another day :D

@jrfnl points out correctly that dimming the screen is very different from calling someone dim. Again, I wouldn't worry about it.

I'd be curious to know what blind folks think of "double-blind testing". I personally don't view it as objectionable, but then I'm not the target market of that kind of potentially ableist language.

Collaborator Author

@jrfnl jrfnl commented Jun 9, 2020

@vavroom Thanks for taking the time to give feedback. Much appreciated.

@maccath maccath commented Jun 9, 2020

This is a great initiative!

I was also wondering about the terms disabled/enabled.

It feels unnecessary when there are terms like inactive/off/deactivated/restricted that do the job just as well... But I'm not disabled, so I don't think I can speak with any authority. Thanks for your input @vavroom

@tomjn tomjn commented Jun 9, 2020

I think there should be room to add words that don't have suggested replacements: brazenly, outright inappropriate words, such as the N word, or other derogatory terms, such as calling people with Down syndrome the M word, or the P word.

@ChrisWiegman ChrisWiegman commented Jun 9, 2020

Really glad to see this. Thank you!

@benlk benlk commented Jun 9, 2020

This is a good idea!

  • disabled is an HTML attribute, CSS selector, and property for elements in the DOM
  • invalid is a CSS selector

In code there is no alternative to that string, much like when spellcheck trips over referer. Even limiting the scope of the sniff for these terms to comments will probably cause a tiring number of false positives.

Collaborator Author

@jrfnl jrfnl commented Jun 9, 2020

@tomjn Good idea and those words which really shouldn't be used, should probably be an error.
I'd be very surprised to ever come across those in code in the first place, but you're right: may as well check for them.

Is it OK if I approach you privately to verify that I interpret the letters you mention correctly? Or ping you to review the sniff to make sure I have added the right ones?

@tomjn tomjn commented Jun 9, 2020

Sure, but it's hardly an exhaustive list, and the P/M words might be more used in the UK than internationally. Happy to review

@benlk benlk commented Jun 9, 2020

If the sniffer is going to sniff for a list of naughty words, https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words aims to have comprehensive lists.

@maccath maccath commented Jun 10, 2020

I think there's a difference between 'naughty' words and exclusionary language. For example, I can see a bunch of anatomical and sex related words on those lists which aren't necessarily used in a demeaning and derogatory way; it's less clear cut - and could end up being exclusionary in and of itself.

@jdevalk jdevalk commented Jun 10, 2020

Honestly we should probably make that sniff a separate issue and just go for the main goal here.

@Jurigag Jurigag commented Jun 10, 2020

I will add my honest opinion, which will get a lot of downvotes I guess: people and their behaviors are racist, ethnocentric, ableist etc., not the words themselves when used in a totally different context.

But yeah, if it won't be enabled by default when using this package, or ever included in core, then why not? If someone wants to use it, feel free I guess. Otherwise the idea is pretty cool, because I understand that whitelist/blacklist is not really great naming and allowlist/denylist is much more self-explanatory. But creating it as NonInclusiveLanguage and justifying it with racism is just wrong, because those words are not racist themselves; people using them in the wrong context are.

You could combine the word black with many other nouns which, if said in the wrong context, can be racist and offensive as well, not only blacklist. If we are really going this way we should ban the word black in combination with anything else, just to make sure it's bulletproof for the future.

At this moment i am offended by this issue and description of it because of this:

Specifically, the sniff should look for sexist, racist, ableist or ethnocentric language, which can contribute to a hostile work environment.

So you are telling me that if I currently use blacklist/whitelist I am racist, and you suggest that I may have a hostile work environment? The idea for this sniff is great, but the explanation of why it's needed is wrong. This description should be changed, honestly.

@vavroom vavroom commented Jun 10, 2020

@Jurigag The thing with racist words, or ableist words, is that it's about the people who are on the receiving end of those words. For example, I run https://ableist.is, a site to help make people aware of their own ableist language. I sometimes point people to that site. I am regularly told things like "I didn't mean crazy in a bad way, I'm not ableist". And it doesn't matter at all what they meant. What matters is that there are a lot of people for whom that word evokes really bad stuff.

The fact that you are offended that the language used in projects could create hostile work environments indicates that you are not likely part of one of the groups that are routinely discriminated against.

The words you use, the actions you take, do not mean that you are racist (or sexist, or ableist). But it's not about you. It's about people that these words hurt. With all due respect, your intentions mean very little. I did not intend to drop a glass but I did and my wife stepped on broken glass bare foot and got hurt. My intentions there mean very little. End result is my wife got hurt. It's a similar thing with racist words, or ableist words, or other words.

And when words like that are used in projects, people may feel, consciously or not, that it creates a hostile environment. Every time I hear people use words like "cripple", "lame", "crazy", it feels like yet another micro-aggression. I deal with these things several times a day, every day. Each instance isn't particularly bad. Just like one paper cut isn't particularly bad. But if you add them up at the end of the day, the week, the month, the year, it takes its toll.

Check your privilege.

@Jurigag Jurigag commented Jun 10, 2020

How is blacklist hurting anyone? The origins of this word are not related to skin color or race; we as people recently added these racist implications to it.

Then why not ban the word black entirely? We can figure out many combinations with other nouns which can make black people feel offended.

@maccath maccath commented Jun 10, 2020

At this moment i am offended by this issue and description of it

Nobody said you were racist/sexist/ableist; we said the language is.

You've been informed and you have a choice.

Make of that what you will.

@Jurigag Jurigag commented Jun 10, 2020

At this moment i am offended by this issue and description of it

Nobody said you were racist/sexist/ableist; we said the language is.

You've been informed and you have a choice.

Make of that what you will.

Yea you said that language is, but also my work environment can be hostile due to those words and that's why i feel offended.

@maccath maccath commented Jun 10, 2020

my work environment can be hostile due to those words and that's why i feel offended.

So don't use them?

@tomjn tomjn commented Jun 10, 2020

@Jurigag Jurigag commented Jun 10, 2020

my work environment can be hostile due to those words and that's why i feel offended.

So don't use them?

Why? Those words are not racist for me in the context i use and i will keep using them. Words are not racist, people using them and their behavior to offend other people is.

This issue is concerned with the implementation and details, it isn’t the venue to air personal political opinions, let’s keep the issue focused, constructive, and move forward.

I agree, so then change the description where it mentions things like a hostile work environment or says that those words are racist without any context. The first post here already airs personal political opinions, and that's why I am concerned about this. I agree with the idea, but I feel this is yet again some kind of attack on other people, like: hey, you are racist, or you may have a hostile work environment, if you currently use those words.

@vavroom vavroom commented Jun 10, 2020

Why? Those words are not racist for me in the context i use and i will keep using them. Words are not racist, people using them and their behavior to offend other people is.

I can only repeat what I said earlier. It's not about you. It's not about your intentions. It's about how people can react to these words.

Using racist or ableist words doesn't necessarily make you racist or ableist. You may not intend to create a hostile work environment. Nobody is saying you are racist. But. It's not about you.

Check your privilege.

@Jurigag Jurigag commented Jun 10, 2020

Why? Those words are not racist for me in the context i use and i will keep using them. Words are not racist, people using them and their behavior to offend other people is.

I can only repeat what I said earlier. It's not about you. It's not about your intentions. It's about how people can react to these words.

Using racist or ableist words doesn't necessarily make you racist or ableist. You may not intend to create a hostile work environment. Nobody is saying you are racist. But. It's not about you.

Check your privilege.

And I just disagree with this, because this way we will just end up banning the word black entirely, simple as that. And that's what I also propose if we want to eliminate any racist implications in our code.

This issue and feature would be great, but without the racist implications and without suggesting that someone has a hostile work environment because they use these words. There are many other ways to explain why we should not use blacklist/whitelist in programming, such as that they are not self-explanatory, while allowlist/denylist are much more so.

@vavroom vavroom commented Jun 10, 2020

Check. Your. Privilege.

'nuff said.

@Jurigag Jurigag commented Jun 10, 2020

And I checked: currently there is freedom of speech, and I can use any words I want. And anyone has the privilege to do so. Ideas like this try to remove some words from use and to reduce freedom of speech, because someone suggests that they have racist implications. You still didn't answer why not ban the word black.

You are currently saying that no matter what words I use and what I mean, if someone of another skin color feels offended by it, I am racist. This logic is just wrong.

Collaborator Author

@jrfnl jrfnl commented Jun 10, 2020

@Jurigag I'm going to ask you kindly to remove yourself from this discussion.

  1. Like @tomjn said, your comments are not adding anything relevant to the issue at hand and can be interpreted as hostile and destructive to the discussion.
  2. Words like blacklist and whitelist are coming from a racist history. They are metaphors where "white" was associated with "good" and "black" with "bad". The fact that you don't intend them to be perceived as racist, doesn't mean they are not. Please do a simple internet search and educate yourself before commenting on these kind of issues again.
    P.S.: and that is something completely different from using the word "black" purely as a colour, which is the literal meaning and if used as such, not a problem.
  3. Even if you don't see it, because frankly that's irrelevant, non-inclusive language is part of the problem and causes micro-aggression on a daily basis, as @vavroom explained far more eloquently.
  4. As has been said before: Check Your Privilege. You say "there is freedom of speech", well that may be the case in your country. You can disagree with this issue, again, it is a privilege that you have the freedom to do so. Please do a simple internet search on privilege and educate yourself.
  5. Nobody is forcing you to use this sniff once it is created.

Please regard this as a formal warning.

@benlk benlk commented Jun 10, 2020

Is this just for American English, or will there be region-specific sniffs for other countries' dialects of English?

Is there a significant non-English-speaking PHP community that would justify creating a set of sniffs for non-English languages?

The reason I ask these questions is because separating sniffs by region or language may be easiest to implement from the beginning, rather than adding afterwards once people have integrated the first-contributed sniff into their workflow.

@benlk benlk commented Jun 10, 2020

A downside of region-based or language-based sniffs is that it would result in code duplication across sniffs where different cultures share some noninclusive words or phrasings. To reduce code duplication, would it instead make sense to have separate sniffs for each separate sort of noninclusive language, allowing sniff-runners to choose which noninclusive language sniffs apply to their situation?

As an example, having a sniff for disabled might cause problems for codebases that deal with <input> elements, whereas a codebase that doesn't deal with <input>s might prefer to include that sniff.

Collaborator Author

@jrfnl jrfnl commented Jun 10, 2020

@benlk Thanks, that's useful input and actually something I have been thinking about, though I haven't taken a decision yet.

My current thoughts are along the following lines:

  • Start with the NonInclusiveLanguage sniff and set it up to allow for multiple languages, though initially it will only contain English.
    As this particular sniff will use regexes, for additional languages I really would need trusted input from people very familiar with those languages to make sure I get those right.
  • By default all languages would be checked, but the sniff would have a configuration option (public property which can be set from the ruleset) to allow for making a selection of which languages apply to a codebase.
  • For edge-cases, people can use the default PHPCS inline ignore comments, i.e. // phpcs:ignore.
  • Once that sniff is up and running, start collecting input for other related sniffs, like a NaughtyWordsSniff based on the link you provided earlier and for instance a sniff which checks only documentation for certain language/phrases which throw up fences for less experienced people, such as mentioned in this tweet: https://twitter.com/derickr/status/1270510702143430663

Code duplication won't be much of an issue as that can be prevented by using an abstract sniff and/or traits for the shared code.
It's one of the reasons this sniff library is built on top of PHPCSUtils which offers a lot of that kind of tooling to make my life easier ;-)
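
To make the outline above a bit more concrete, here is a minimal Python sketch of the same shape: an abstract base class holding the shared scanning code, one small subclass per language, a languages option that defaults to "all", and lines carrying a phpcs:ignore marker being skipped. It is not PHPCS code. Every class and function name is hypothetical, the "nl" word list is a pure placeholder rather than a real entry, and the ignore handling (a marker on its own comment line also covers the next line) is an assumption about how the inline comments behave.

```python
import re
from abc import ABC, abstractmethod

IGNORE_MARKER = "phpcs:ignore"


class BaseWordCheck(ABC):
    """Shared scanning logic; subclasses only supply a language and words."""

    language = ""

    @abstractmethod
    def words(self):
        """Return a dict mapping flagged words to suggested alternatives."""

    def scan(self, line):
        hits = []
        for word, suggestion in self.words().items():
            if re.search(r"\b" + re.escape(word) + r"\b", line, re.IGNORECASE):
                hits.append((word, suggestion))
        return hits


class EnglishCheck(BaseWordCheck):
    language = "en"

    def words(self):
        return {"master": "primary/main", "slave": "secondary/replica",
                "dummy": "placeholder"}


class PlaceholderDutchCheck(BaseWordCheck):
    language = "nl"

    def words(self):
        # Placeholder entry only, to show a second language plugging in.
        return {"voorbeeldterm": "alternatief"}


def check_source(source, checks, languages=None):
    """Scan `source`; languages=None means every available language."""
    active = [c for c in checks if languages is None or c.language in languages]
    warnings = []
    skip_next = False
    for number, line in enumerate(source.splitlines(), start=1):
        has_marker = IGNORE_MARKER in line
        if skip_next or has_marker:
            # Assumed behaviour: a marker on its own comment line also
            # covers the following line; a trailing marker covers one line.
            standalone = line.strip().startswith(("//", "#", "/*"))
            skip_next = has_marker and standalone
            continue
        for check in active:
            for word, suggestion in check.scan(line):
                warnings.append((number, check.language, word, suggestion))
    return warnings


SOURCE = """$master = connect();
// phpcs:ignore
$slave = connect();
$dummy = 1; // phpcs:ignore
// voorbeeldterm
"""

ALL_CHECKS = [EnglishCheck(), PlaceholderDutchCheck()]
print(check_source(SOURCE, ALL_CHECKS))
print(check_source(SOURCE, ALL_CHECKS, languages={"en"}))
```

Adding a language then means adding one small subclass, which is the point about avoiding code duplication.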

@benlk benlk commented Jun 10, 2020

You've reminded me of the ignore comments, and while I agree that those work in some situations, I'm not sure that they're the right option for inclusiveness sniffs. Having one monolithic NonInclusiveLanguage sniff implies that there is One True Way™ to do Inclusiveness™.

Including by default a sniff for gendered language would be anti-useful to an organization whose practice of inclusiveness involves gendering people by their desired gender. (Anecdote: As they/them is increasingly used for a third gender role in English-language discourse, the more people I see who reject its indiscriminate application to everyone as a form of mass misgendering, each instead desiring for themself he/him or she/her.) If such an explicitly-gendered organization wanted to add inclusiveness sniffs, requiring them to add comments around all their gender-someone-correctly code would be an obstacle to incorporating all of the other inclusiveness sniffs.

I've already said my piece on cases where a sniff for disabled might or might not be wanted, but the barrier for adopting inclusiveness sniffs is higher for an organization that would need to sprinkle their codebase with comments in order to adopt a monolithic NonInclusiveLanguage sniff.

Separating the inclusiveness sniffs into separate sniffs cuts the Gordian Knot of competing access needs by allowing each sniffer to use the sniffs that suit their community's needs, without requiring defensive commenting against the sniffs that satisfy other communities' needs.

There's precedent for splitting out sniffs: this repo has contradicting sniffs for the PHP short list syntax; it doesn't package a monolithic ListSyntax sniff.

@tomjn tomjn commented Jun 11, 2020

I can foresee that adding a choice of pronouns on a user profile would be caught by the sniff as originally proposed. Is there a mitigation that can be applied? Such as restricting to code comments, or testing for conditional structures or pronoun selections in surrounding code?

Collaborator Author

@jrfnl jrfnl commented Jun 11, 2020

I can foresee that adding a choice of pronouns on a user profile would be caught by the sniff

AFAICS, the sniff as proposed would not be triggered on that as those would be contained in text strings and the proposal does not cover those.

@Ayesh Ayesh commented Jun 11, 2020

  • I would very much prefer primary/secondary terminology because I don't think replica is widely understood in a code/tech context for people who learned English as a second language. It conveys a rather dysfunctional meaning that is probably close to "fake".

  • Word suggestion: "guys". I'm trying to avoid using the word "guys". When translated, it's very much not gender-neutral in languages like Sinhalese and Indonesian. I would suggest "folks", or "people" there.

  • I don't know if it's possible to do this in a sniff, but can we look for ['male', 'female'] or m, f in an array and guess that the array probably should contain more options for those who don't identify themselves with either?

  • For white/black list terms, I think allow/deny is the one that comes very close in terms of meaning, translations, and it is used in other software already, such as Apache and the recent Ruby changes.
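
On the ['male', 'female'] idea: a heuristic along these lines looks feasible, at least for simple literals. The Python sketch below (not a sniff) flags non-nested PHP array literals whose string items are exactly male/female or m/f. All names are hypothetical, and it deliberately ignores key => value arrays and nested arrays, so it should be read as a guess at the approach rather than a worked-out rule.

```python
import re

# Matches a short-syntax or long-syntax PHP array literal without nesting.
ARRAY_LITERAL = re.compile(r"\[([^\[\]]*)\]|\barray\s*\(([^()]*)\)", re.IGNORECASE)

# Matches one single- or double-quoted string item inside the literal.
STRING_ITEM = re.compile(r"""(['"])(.*?)\1""")

# Value sets that suggest a binary-only gender choice.
BINARY_SETS = [{"male", "female"}, {"m", "f"}]


def find_binary_gender_arrays(php_source):
    """Return the array literals whose string items form a binary set."""
    findings = []
    for match in ARRAY_LITERAL.finditer(php_source):
        body = match.group(1) if match.group(1) is not None else match.group(2)
        values = {item.group(2).strip().lower() for item in STRING_ITEM.finditer(body)}
        if values in BINARY_SETS:
            findings.append(match.group(0))
    return findings


print(find_binary_gender_arrays("$genders = ['male', 'female'];"))
print(find_binary_gender_arrays("$genders = ['male', 'female', 'other'];"))
```

A literal that already offers more options is not reported, and an index access such as $x[0] contains no string items, so it is skipped as well.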

@theresenabl theresenabl commented Jun 12, 2020

@jrfnl:

Words like blacklist and whitelist are coming from a racist history. They are metaphors where "white" was associated with "good" and "black" with "bad". The fact that you don't intend them to be perceived as racist, doesn't mean they are not. Please do a simple internet search and educate yourself before commenting on these kind of issues again.

That is simply a lie. Black-and-white dualism has a loooong history of being used as a metaphor. It's traceable to 4th century BC and is connected with the day cycle when after the day comes the dark night. It goes back to "Table of Opposites" by Pythagoras (yes, the same Pythagoras). Like you said, please, next time, educate yourself and don't create your own theories.

As for blacklist/whitelist, it is overused in the tech world, and sometimes there are better alternative names that tell other programmers more. Is it the role of CodeSniffer to check for that? I am not sure. A lot is lost without context, and I think that the sniffer won't ever work properly.

@vavroom vavroom commented Jun 12, 2020

That is simply a lie

I have a problem with such accusations. A lie generally implies intent to deceive.

Perhaps the information @jrfnl found isn't accurate - but it is "common knowledge" that is floating around a LOT and I wouldn't blame anyone for accepting it as making sense.

@benlk benlk commented Jun 12, 2020

It's probably beyond the scope of this sniff to suggest specific alternatives to blacklist/whitelist and master/slave, because what the functions labeled with those words actually do varies by project.

However, a sniff for those words could link to lists of suggested alternatives, such as https://tools.ietf.org/html/draft-knodel-terminology-01#section-1.1.1 for master-slave and https://tools.ietf.org/html/draft-knodel-terminology-01#section-1.2.1 for blacklist-whitelist. Those two links are from a draft RFC that never made it past the draft stage, but I'm sure there are similar yet authoritative lists from standards bodies that can be referenced in the sniffs' messages.

@tomjn tomjn commented Jun 12, 2020

Choosing a specific perfect replacement for master/slave or whitelist blacklist shouldn't be necessary, there are lots of alternatives that fit various contexts while remaining inclusive.

@Foxar Foxar commented Jun 14, 2020

We should just straight up use 1984's Newspeak to be 100% sure no-one is offended by anything.

@Foxar Foxar commented Jun 14, 2020

I thought I had lost all faith in humanity, but if the above poster missed my sarcasm, then I have lost that one bit of faith in humanity I didn't know I still had.

You people are so paranoid about racism, x-phobias and 'oppression' that even if we were all identical Doctor Who Cybermen you'd still complain. You stop disenfranchisement of minorities by no longer seeing them as fragile minorities in need of help but as your equals. You stop racism by ignoring race, not by making everything about race. And censoring half of the damn language to try and make offensive language impossible is, ironically, impossible itself.

TL;DR Chill the *** out everyone.

@tomjn tomjn commented Jun 14, 2020

@Foxar Foxar commented Jun 14, 2020

Is disagreeing with the change altogether constructive and respectful?

Collaborator Author

@jrfnl jrfnl commented Jun 14, 2020

Is disagreeing with the change altogether constructive and respectful?

@Foxar No, that's not helpful and outside of the scope of this issue.

Nobody is forcing you or anyone else to use this sniff once it is created, so if you disagree with the principle of it, just don't use it. It's as simple as that.
You don't need to tell us, you don't need to add a comment to this discussion. Just don't use it.

This issue is open to allow people who are interested in using this sniff to voice their opinion about the proposed implementation, nothing else.

@jdevalk jdevalk commented Jun 14, 2020

Saw passlist / stoplist as suggestions here:

https://twitter.com/dan_abramov/status/1272242325029257223?s=21

@JapanYoshi JapanYoshi commented Jun 16, 2020

Alternative for "crazy": "weird"?

@vavroom vavroom commented Jun 16, 2020

Alternative for "crazy": "weird"?

@JapanYoshi Yes, weird could be a good alternative. Depending on context, wild could also be used. There are several possibilities.

@zlodes zlodes commented Jun 22, 2020

I want to believe that this is a joke... ☹️

@Big-Shark Big-Shark commented Jun 22, 2020

@Jurigag I'm going to ask you kindly to remove yourself from this discussion.

Democracy and freedom of speech are not welcome here.

@lsmith77 lsmith77 commented Jun 24, 2020

For word lists, have a look at https://alexjs.com/

@lsmith77 lsmith77 commented Jun 24, 2020

Also, just removing belittling words: https://github.com/OskarStark/doctor-rst/blob/master/src/Rule/BeKindToNewcomers.php
