Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename Blacklist and Whitelist to be self explanatory #2778

Merged
merged 7 commits into from
Jun 11, 2020

Conversation

remcomokveld
Copy link
Contributor

@remcomokveld remcomokveld commented Jun 6, 2020

I originally started this exercise because I believe the terms blacklist and whitelist have racial connotations that are inappropriate to be used. But while trying to find new names I found that SuppressedFalsePositive and TemporarySuppressedIssues actually describe much more accurately than the previous values.

This PR would currently be a breaking change because existing baseline files would no longer work. If desired I think it would be possible to build in backward compatibility. However, I think the fact that this is a breaking change is a good thing because it will raise more awareness about the fact that blacklist and whitelist are really terms that we should all stop using.

Edit:
Updated the PR to not be a breaking change anymore. Existing baseline files with <Blacklst> and <Whitelist> tags will still be supported, but newly generated baselines will use SuppressedFalsePositive and TemporarySuppressedIssues respectively.

@codecov
Copy link

codecov bot commented Jun 6, 2020

Codecov Report

Merging #2778 into master will decrease coverage by 0.00%.
The diff coverage is 82.35%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #2778      +/-   ##
============================================
- Coverage     80.53%   80.53%   -0.01%     
- Complexity     2321     2324       +3     
============================================
  Files           386      386              
  Lines          6952     6955       +3     
  Branches       1260     1261       +1     
============================================
+ Hits           5599     5601       +2     
  Misses          725      725              
- Partials        628      629       +1     
Impacted Files Coverage Δ Complexity Δ
...h/detekt/rules/style/DataClassContainsFunctions.kt 94.11% <ø> (ø) 8.00 <0.00> (ø)
...ab/arturbosch/detekt/rules/style/WildcardImport.kt 86.66% <ø> (ø) 5.00 <0.00> (ø)
...arturbosch/detekt/core/baseline/BaselineHandler.kt 72.22% <62.50%> (-4.25%) 15.00 <1.00> (+1.00) ⬇️
...gitlab/arturbosch/detekt/core/baseline/Baseline.kt 75.00% <100.00%> (+8.33%) 6.00 <6.00> (+2.00)
.../arturbosch/detekt/core/baseline/BaselineFacade.kt 77.77% <100.00%> (ø) 6.00 <0.00> (ø)
.../arturbosch/detekt/core/baseline/BaselineFormat.kt 83.33% <100.00%> (ø) 4.00 <0.00> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d207bd3...6280a4f. Read the comment docs.

Copy link
Member

@schalkms schalkms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking the time to participate in an open source project!
I'm afraid, but I have to reject this PR for the following reasons.

With these changes, the user would need to migrate existing baseline files. This requires effort and hence costs time. You would need to explain this to the user. I see the newly created issues already coming.
Since detekt is also no small project anymore and the baseline feature is used quite often, this change affects a bigger group of users.

@ZacSweers
Copy link
Contributor

I would encourage you to reconsider!

There is wider precedent for this, ietf has a great write-up here: https://tools.ietf.org/id/draft-knodel-terminology-00.html#rfc.section.1.2

Also:
1 - blacklist/whitelist can have racial connotations with "white good black bad" semantics
2 - they have no meaning to non-native speakers, as they are english-only colloquialisms

@schalkms
Copy link
Member

schalkms commented Jun 6, 2020

I understand the potential connotations of these words. I explicitly didn't negate that.

I said that changes like this have an impact on users, clients and 3rd party contributors with custom domain-specific detekt extensions. I think that compatibility and long term support are very important for static code analyzers in order to foster user acceptance. Don't forget the users in this process.

@remcomokveld
Copy link
Contributor Author

Thanks for the feedback. As mentioned in the OP it is quite easy to add backward compatibility. I just pushed a change that does that.
Existing baseline files that contain the XML tags Whitelist and Blacklist are still supported and will be parsed properly while the new values are also supported and the tasks to write the baseline uses the new tags.

@remcomokveld
Copy link
Contributor Author

In the docs it says this

The intention of a whitelist is that only new code smells are printed on further analysis. The blacklist can be used to write down false positive detections (instead of suppressing them and pollute your code base).

I agree that TemporarySuppressedIssues might not be the right term, but the goal of those is to temporarily suppress detekt issues that are found when first adding detekt. Those issues should be considered tech debt and eventually cleaned up, hence temporary.

This actually show that blacklist and whitelist might were bad names to begin with. a whitelist and a blacklist are opposites, one is traditionally used to include things and the other to exclude. In Detekt they are functionally doing the same thing.

@schalkms
Copy link
Member

schalkms commented Jun 6, 2020

@remcomokveld thanks for pointing that out. What do you think of allow and deny list? Do you have other terms/words in your mind?

@arturbosch
Copy link
Member

arturbosch commented Jun 6, 2020

Hi,

thanks for the discussion here.
My first thought was "good change for a 2.0 release" because of breaking change
but implementing this in a backwards compatible way should be safe for a minor release.

@schalkms @remcomokveld @ZacSweers the NCSC uses the terms allow and deny lists as @schalkms mentioned - https://inews.co.uk/news/government-cyber-experts-blacklist-whitelist-racism-fears-2840970

I like @remcomokveld names as they are describtive and capture my initial intent of the baseline sections.

Open question:
Should we automatically migrate the baselines to the new names or if the user used the old names it should stay old until 2.0 so this does not cause an additional git commit for them?

@remcomokveld
Copy link
Contributor Author

They would both be allow because in both lists, if an issue is added there it will allow detekt to succeed even though the issue is there. The only difference between the two lists is communicating reason to developers why the issues are ignored.

The only place where these lists are read is here

fun contains(id: FindingId): Boolean = whitelist.contains(id) || blacklist.contains(id)

@schalkms
Copy link
Member

schalkms commented Jun 6, 2020

Should we automatically migrate the baselines to the new names or if the user used the old names it should stay old until 2.0 so this does not cause an additional git commit for them?

I vote for option 2 here.

@arturbosch arturbosch added this to the 1.10.0 milestone Jun 6, 2020
@arturbosch arturbosch added the migration Marker to add a migration step in the changelog label Jun 6, 2020
@remcomokveld
Copy link
Contributor Author

I agree with option 2 as well. The only thing that might be nice to add to this is a warning message when deprecated <Whitelist> and <Blacklist> tags are being used. That would give users more of a heads up that a breaking change will happen in 2.0 and already start migrating.

@cortinico
Copy link
Member

cortinico commented Jun 6, 2020

Thanks for the change @remcomokveld

I also think that this is the opportunity to remove some of the confusion around the current baseline, and be more inclusive. As you already mentioned, <Whitelist> and <Blacklist> are "opposite" terms while they are both used to let detekt ignore a specific issue report.

Given that when running detektBaseline the former <Whitelist> is populated, maybe we can rename them to something like:

  • Whitelist -> CurrentIssues or DiscoveredIssue to convey the meaning that those issue have been discovered by Detekt and stored inside the baseline. This will suggest the user that they should slowly clean the CurrentIssue list.

  • Blacklist -> ManuallySuppressedIssues to convey the meaning that those issues have been intentionally suppressed by the user.

I would argue against using Temporary in any of the two as having time related terminology here is not a great fit (also there is nothing more permanent than a temporary solution).

As for the SuppressedFalsePositive (aka the Whitelist), technically you could use it to suppress a failure that is not a detekt false positive. The rule might just behave correctly but your specific code imposes you to suppress it for some reason, and I don't consider that a false positive.

@arturbosch
Copy link
Member

Thanks for the change @remcomokveld

I also think that this is the opportunity to remove some of the confusion around the current baseline, and be more inclusive. As you already mentioned, <Whitelist> and <Blacklist> are "opposite" terms while they are both used to let detekt ignore a specific issue report.

Given that when running detektBaseline the former <Whitelist> is populated, maybe we can rename them to something like:

  • Whitelist -> CurrentIssues or DiscoveredIssue to convey the meaning that those issue have been discovered by Detekt and stored inside the baseline. This will suggest the user that they should slowly clean the CurrentIssue list.
  • Blacklist -> ManuallySuppressedIssues to convey the meaning that those issues have been intentionally suppressed by the user.

I would argue against using Temporary in any of the two as having time related terminology here is not a great fit (also there is nothing more permanent than a temporary solution).

As for the SuppressedFalsePositive (aka the Whitelist), technically you could use it to suppress a failure that is not a detekt false positive. The rule might just behave correctly but your specific code imposes you to suppress it for some reason, and I don't consider that a false positive.

I like all three names. Lets go with CurrentIssues and ManuallySuppressedIssues?
And when we are at renaming maybe change SmellBaseline to just Baseline (ofc can be another PR)?

@BraisGabin
Copy link
Member

BraisGabin commented Jun 6, 2020

Should we automatically migrate the baselines to the new names or if the user used the old names it should stay old until 2.0 so this does not cause an additional git commit for them?

I vote for option 2 here.

I vote option 1. We will generate a bit of noise in git but all those users will have their baselines migrated so once they move to 2.0 it will just work.

And thanks for the PR!

@arturbosch
Copy link
Member

Should we automatically migrate the baselines to the new names or if the user used the old names it should stay old until 2.0 so this does not cause an additional git commit for them?

I vote for option 2 here.

I vote option 1. We will generate a bit of noise in git but all those users will have their baselines migrated so once they move to 2.0 it will just work.

And thanks for the PR!

Hm, well to be honest. If the user updates the detekt version he already has to create one commit.
Including baseline tag renamings should be fine.
As I now realize this will only happen for gradle detektBaseline not gradle detekt so it is okay.

@remcomokveld remcomokveld force-pushed the rename-whitelist-blacklist branch 2 times, most recently from 24e59fb to 3e6d66a Compare June 6, 2020 19:28
Copy link
Member

@cortinico cortinico left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good on my end, thanks again for doing this ✊🏾

@remcomokveld
Copy link
Contributor Author

@arturbosch any clue what is causing the checks to fail now? I'm not seeing what exactly went wrong

</Blacklist>
<Whitelist>
</ManuallySuppressedIssues>
<CurrentIssues>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear what means "CurrentIssues". Could you please find another name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. The context is ambiguous here. I might have some other "current" issues, which are not in the baseline.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CurrentIssues without any context would indeed be confusing. In the context of a baseline file, however, I think that the name captures exactly what is intended. If you have a better name I'm open to suggestions though

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm too late to the party, but I also think CurrentIssues is a very strange name, there was a previous comment to "avoid the concept of time", but "current" is all about time.

I think the relationship should be reinstated between the two type of ignore, that way you can infer the meaning of one by understanding the other.

Perhaps these'll be changed in the future or perhaps never again, here's my input anyway:

BaselineSuppressedIssues
ManuallySuppressedIssues

- blacklist code smells which are false positives
- generate a baseline whitelist of code smells if your project is already old and has many smells, so only new
- suppress code smells which are false positives
- generate a baseline of code smells if your project is already old and has many smells, so only new

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed world. baseline of what?

Copy link

@rooksoto rooksoto Jun 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider:
- generate a baseline list of code smells if your project is already old and has many smells, so only new

docs/pages/documentation/style.md Outdated Show resolved Hide resolved
@dimsuz
Copy link
Contributor

dimsuz commented Jun 8, 2020

I said that changes like this have an impact on users, clients and 3rd party contributors with custom domain-specific detekt extensions. I think that compatibility and long term support are very important for static code analyzers in order to foster user acceptance. Don't forget the users in this process.

I second this. If tomorrow will bring another set of words considered inappropriate in the current momentum, would this mean all project should ignore their users and just blindly go on with whatever is demanded?

Here's a good answer from an opensource project maintainer on a similar issue.

@ZacSweers
Copy link
Contributor

ZacSweers commented Jun 8, 2020

@dimsuz I recognize the point you're trying to make regarding churn and making invasive changes for what you feel may be trivial reasons, and I appreciate your effort to engage in this honestly and from a position of pragmatism. If this were a significant breaking change, it would be valid to suggest waiting until a major release to change. If these came frequently, it could be valid to suggest being tactical about what changes are meaningful. In this specific PR however, there are no breaking changes. It is a free swing at doing something that is meaningful to a nontrivial number of users that do face endemic terminology like this every day.

If social progress were as slippery of a slope as you're suggesting, we'd be in a much better place today than we are. That is not the case though, not yet. In light of that, we should aspire to embrace opportunities for progress in all its forms, whenever and if ever they arrive on our doorstep.

@bendb-instacart
Copy link

bendb-instacart commented Jun 8, 2020

Pretty understandable from professional point of view.

I think it's been demonstrated in this thread that the meanings of blacklist and whitelist are not clear in detekt, and that the proposed names of "ManuallySuppressedIssues" and "CurrentIssues" capture the meanings more clearly.

Copy link

@JohannesPtaszyk JohannesPtaszyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fully supporting this change! Code looks good from my side.

@dimsuz
Copy link
Contributor

dimsuz commented Jun 8, 2020

If everybody feels that sensitive to words they write once and forget and it's not a major breaking change for users then at least we need to find a good replacements. As I've already noted "CurrentIssues" is not a suitable one imo. I'm currently failing to come up with a good one though, maybe someone would?

By the way, are you sure that these are non-breaking changes? If I have a baseline.xml and update to a newer detekt version, it seems like old tag names won't be picked up anymore.

@remcomokveld
Copy link
Contributor Author

In the parsing logic the old and the new tags are treated as synonyms.

// Blacklist and Whitelist were previous XML tags. They have been replaced by more appropriate names
// To not break anything this will still parse those values
MANUALLY_SUPPRESSED_ISSUES, "Blacklist" -> {

There was also a test added which verifies that the old baseline format still parses correctly here
it("supports deprecated baseline values") {
val path = resourceAsPath("/baseline_feature/deprecated-baseline.xml")
val (manuallySuppressedIssues, currentIssues) = BaselineFormat().read(path)
assertThat(manuallySuppressedIssues).hasSize(2)
assertThat(manuallySuppressedIssues).anySatisfy { it.startsWith("LongParameterList") }
assertThat(manuallySuppressedIssues).anySatisfy { it.startsWith("LongMethod") }
assertThat(currentIssues).hasSize(1)
assertThat(currentIssues).anySatisfy { it.startsWith("FeatureEnvy") }
}

@jeksor
Copy link

jeksor commented Jun 9, 2020

What about blockList, allowList?

@BraisGabin
Copy link
Member

What about blockList, allowList?

Both lists are "allowList". The only difference between them is that the values in CurrentIssues are discarted and generated again each time you run --create-baseline and the ones in ManuallySuppressedIssues are keeped.

@schalkms
Copy link
Member

schalkms commented Jun 9, 2020

Based on the reactions this seems to be a controversial change.
What can we do about that? Which improvements do people suggest? What does the community think?

Copy link
Member

@arturbosch arturbosch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see my two comments and sorry for requesting changes again.

@arturbosch
Copy link
Member

arturbosch commented Jun 9, 2020

Based on the reactions this seems to be a controversial change.
What can we do about that? Which improvements do people suggest? What does the community think?

Judging by the amount of comments and emojis I see a clear trend towards changing these tags.
Imo we now have much better descriptive names as before (also when looking back on many questions I got about the difference between what to write where in the baseline file) while being backward compatible.

I want to prepare a 1.10.0-RC1 release tonight. So if most are happy with CurrentIssues, we can ship this PR too.

@detekt detekt deleted a comment Jun 9, 2020
@arturbosch
Copy link
Member

@schalkms anything else you see missing in this PR to get it merged?

@schalkms
Copy link
Member

There's a merge conflict. Documentation has been updated as far as I can see.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
migration Marker to add a migration step in the changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.