Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Analyzer] Recommend string.ToLower{Invariant}().Contains/IndexOf/StartsWith("lowercasestring") be replaced by case-insensitive comparison #78606

Closed
stephentoub opened this issue Nov 20, 2022 · 14 comments · Fixed by dotnet/roslyn-analyzers#6662
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Globalization code-analyzer Marks an issue that suggests a Roslyn analyzer code-fixer Marks an issue that suggests a Roslyn code fixer in-pr There is an active PR which will close this issue when it is merged partner-impact This issue impacts a partner who needs to be kept updated
Milestone

Comments

@stephentoub
Copy link
Member

stephentoub commented Nov 20, 2022

There's a significant amount of code that's apparently trying to perform case-insensitive comparisons by first lower-casing and then comparing against a lowercased string, e.g.
https://grep.app/search?q=.ToLower%28%29.Contains&filter[lang][0]=C%23
https://grep.app/search?q=.ToLower%28%29.IndexOf&filter[lang][0]=C%23
https://grep.app/search?q=.ToLower%28%29.StartsWith&filter[lang][0]=C%23
Less but still many for ToUpper with upper-case, e.g.
https://grep.app/search?q=.ToUpper%28%29.Contains&filter[lang][0]=C%23
https://grep.app/search?q=.ToUpper%28%29.IndexOf&filter[lang][0]=C%23
https://grep.app/search?q=.ToUpper%28%29.StartsWith&filter[lang][0]=C%23
and with the Invariant variations, e.g.
https://grep.app/search?q=.ToLowerInvariant%28%29.Contains&filter[lang][0]=C%23

We should write an analyzer that flags these patterns and recommends comparisons using a StringComparison instead.

Such analyzers will help avoid the allocations caused by ToLower/ToUpper calls.

Performance rules Category
Severity = suggestion

@stephentoub stephentoub added code-analyzer Marks an issue that suggests a Roslyn analyzer code-fixer Marks an issue that suggests a Roslyn code fixer labels Nov 20, 2022
@stephentoub stephentoub added this to the 8.0.0 milestone Nov 20, 2022
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@stephentoub stephentoub changed the title [Analyzer] Recommend string.ToLower{Invariant}().Contains("lowercasestring") be replaced by case-insensitive comparison [Analyzer] Recommend string.ToLower{Invariant}().Contains/IndexOf/StartsWith("lowercasestring") be replaced by case-insensitive comparison Nov 20, 2022
@MichalPetryka
Copy link
Contributor

MichalPetryka commented Nov 20, 2022

Should the invariant variants be also changed? Ignore this, wanted to post it in #78607.

@ghost
Copy link

ghost commented Nov 20, 2022

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

There's a significant amount of code that's apparently trying to perform case-insensitive comparisons by first lower-casing and then comparing against a lowercased string, e.g.
https://grep.app/search?q=.ToLower%28%29.Contains&filter[lang][0]=C%23
https://grep.app/search?q=.ToLower%28%29.IndexOf&filter[lang][0]=C%23
https://grep.app/search?q=.ToLower%28%29.StartsWith&filter[lang][0]=C%23
Less but still many for ToUpper with upper-case, e.g.
https://grep.app/search?q=.ToUpper%28%29.Contains&filter[lang][0]=C%23
https://grep.app/search?q=.ToUpper%28%29.IndexOf&filter[lang][0]=C%23
https://grep.app/search?q=.ToUpper%28%29.StartsWith&filter[lang][0]=C%23
and with the Invariant variations, e.g.
https://grep.app/search?q=.ToLowerInvariant%28%29.Contains&filter[lang][0]=C%23

We should write an analyzer that flags these patterns and recommends comparisons using a StringComparer instead.

Author: stephentoub
Assignees: -
Labels:

area-System.Runtime, code-analyzer, code-fixer

Milestone: 8.0.0

@stephentoub stephentoub added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Nov 21, 2022
@tarekgh tarekgh added area-System.Globalization and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Nov 21, 2022
@ghost
Copy link

ghost commented Nov 21, 2022

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

Issue Details

There's a significant amount of code that's apparently trying to perform case-insensitive comparisons by first lower-casing and then comparing against a lowercased string, e.g.
https://grep.app/search?q=.ToLower%28%29.Contains&filter[lang][0]=C%23
https://grep.app/search?q=.ToLower%28%29.IndexOf&filter[lang][0]=C%23
https://grep.app/search?q=.ToLower%28%29.StartsWith&filter[lang][0]=C%23
Less but still many for ToUpper with upper-case, e.g.
https://grep.app/search?q=.ToUpper%28%29.Contains&filter[lang][0]=C%23
https://grep.app/search?q=.ToUpper%28%29.IndexOf&filter[lang][0]=C%23
https://grep.app/search?q=.ToUpper%28%29.StartsWith&filter[lang][0]=C%23
and with the Invariant variations, e.g.
https://grep.app/search?q=.ToLowerInvariant%28%29.Contains&filter[lang][0]=C%23

We should write an analyzer that flags these patterns and recommends comparisons using a StringComparer instead.

Such analyzers will help avoid the allocations caused by ToLower/ToUpper calls.

Performance rules Category
Severity = suggestion

Author: stephentoub
Assignees: -
Labels:

area-System.Globalization, area-System.Runtime, code-analyzer, code-fixer

Milestone: 8.0.0

@tarekgh tarekgh added api-suggestion Early API idea and discussion, it is NOT ready for implementation and removed area-System.Runtime labels Nov 21, 2022
@tarekgh tarekgh added blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels Nov 21, 2022
@tarekgh
Copy link
Member

tarekgh commented Nov 21, 2022

CC @Youssef1313

@jeffhandley jeffhandley added the partner-impact This issue impacts a partner who needs to be kept updated label Nov 21, 2022
@bartonjs
Copy link
Member

bartonjs commented Nov 22, 2022

Looks good as proposed. All of these cases should ideally use the same diagnostic ID, but if the implementation of the fixer necessitates multiple then that is what it is.

In addition to a.ToLower().Contains(...), it should also identify a.Contains(b.ToLower()) (et al)

It further includes a.CompareTo(b.ToLower{Invariant}()) becoming StringComparer.{Invariant}IgnoreCase.Compare(a, b).

Category:Performance
Severity:Suggestion

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels Nov 22, 2022
@buyaa-n buyaa-n added the help wanted [up-for-grabs] Good issue for external contributors label May 4, 2023
@carlossanlop carlossanlop self-assigned this May 26, 2023
@carlossanlop
Copy link
Member

carlossanlop commented May 26, 2023

I'll work on this.

Question: Should we always recommend the same StringComparison value (for example, InvariantCultureIngoreCase), or would it be variable depending on the context of the user's environment? If the answer is the latter, where can I read about the different conditions that would change the suggested StringComparison value? Does it depend on using an "*Invariant" method? @tarekgh @stephentoub

Adding a quick table for myself, for easier reference:

Actual Suggestion
str.ToLower().Contains(substr)
str.ToUpper().Contains(substr)
str.Contains(substr.ToLower())
str.Contains(substr.ToLower())
str.Contains(substr, StringComparison.CurrentCultureIgnoreCase)
str.ToLowerInvariant().Contains(substr)
str.ToUpperInvariant().Contains(substr)
str.Contains(substr.ToLowerInvariant())
str.Contains(substr.ToUpperInvariant())
str.Contains(substr, StringComparison.InvariantCultureIgnoreCase)
str.ToLower().IndexOf(substr)
str.ToUpper().IndexOf(substr)
str.IndexOf(substr.ToLower())
str.IndexOf(substr.ToUpper())
str.IndexOf(substr, StringComparison.CurrentCultureIgnoreCase)
str.ToLowerInvariant().IndexOf(substr)
str.ToUpperInvariant().IndexOf(substr)
str.IndexOf(substr.ToLowerInvariant())
str.IndexOf(substr.ToUpperInvariant())
str.IndexOf(substr, StringComparison.InvariantCultureIgnoreCase)
str.ToLower().StartsWith(substr)
str.ToUpper().StartsWith(substr)
str.StartsWith(substr.ToLower())
str.StartsWith(substr.ToUpper())
str.StartsWith(substr, StringComparison.CurrentCultureIgnoreCase)
str.ToLowerInvariant().StartsWith(substr)
str.ToUpperInvariant().StartsWith(substr)
str.StartsWith(substr.ToLowerInvariant())
str.StartsWith(substr.ToUpperInvariant())
str.StartsWith(substr, StringComparison.InvariantCultureIgnoreCase)
a.CompareTo(b.ToLower())
a.ToLower().CompareTo(b)
a.CompareTo(b.ToUpper())
a.ToLower().CompareTo(b)
StringComparer.CurrentCultureIgnoreCase.Compare(a, b);
a.CompareTo(b.ToLowerInvariant())
a.ToLowerInvariant().CompareTo(b)
a.CompareTo(b.ToUpperInvariant())
a.ToLowerInvariant().CompareTo(b)
StringComparer.InvariantCultureIgnoreCase.Compare(a, b);

@tarekgh
Copy link
Member

tarekgh commented May 26, 2023

@carlossanlop ToLower()/ToUpper() you need to use StringComparison.CurrentCultureIgnoreCase. For ToLowerInvariant()/ToUpperInvariant() need to use StringComparison.InvariantCultureIgnoreCase`

@carlossanlop
Copy link
Member

Thank @tarekgh.

Another question: For the ToLower/ToUpper methods that take a CultureInfo argument, how does the passed culture translate to a StringComparison? Or should I ignore it and keep using the rules you mentioned in your previous reply, like this?:

str.ToLower(System.Globalization.CultureInfo.CurrentCulture).StartsWith(substr); => str.StartsWith(substr, StringComparison.CurrentCultureIgnoreCase)

str.ToLowerInvariant(System.Globalization.CultureInfo.CurrentCulture).Contains(substr); => str.Contains(substr, StringComparison.InvariantCultureIgnoreCase)

The Contains/StartsWith/IndexOf methods don't seem to have an overload that takes a CultureInfo object, so I'm not sure if those should be flagged.

@tarekgh
Copy link
Member

tarekgh commented May 27, 2023

This analyzer approval did not include the ToUpper/ToLower overloads that takes culture info. But it is possible we can include these cases too. Can use CultureInfo.CompareInfo.IsPrefix/IndexOf and pass the option CompareOptions.IgnoreCase.

On a separate note, there is a comment at #78606 (comment) containing instructions that we should carefully consider, in case you haven't already reviewed it.

CC @stephentoub

@carlossanlop
Copy link
Member

carlossanlop commented May 27, 2023

there is a comment at #78606 (comment) containing instructions that we should carefully consider

Yep. I included them in my table.

And you're right, they were not mentioned in the approval. By default I will ignore the CultureInfo overloads, considering they would add much more complexity to the analyzer. I'll only address them if told to do so.

@buyaa-n buyaa-n removed the help wanted [up-for-grabs] Good issue for external contributors label Jun 1, 2023
@carlossanlop
Copy link
Member

@tarekgh The a.ToLower{Invariant}()CompareTo(b) method returns a different value than StringComparer.XCultureIgnoreCase.Compare(a, b):

using System;

string a = "a";
string b = "A";

Console.WriteLine(a.ToLower().CompareTo(b)); // -1
Console.WriteLine(StringComparer.CurrentCultureIgnoreCase.Compare(a, b)); // 0

Console.WriteLine(a.ToLowerInvariant().CompareTo(b)); // -1
Console.WriteLine(StringComparer.InvariantCultureIgnoreCase.Compare(a, b)); // 0

Do we need to change the way the fix behaves for this case? Or should we skip this diagnose altogether?

@tarekgh
Copy link
Member

tarekgh commented Jun 1, 2023

Usually, users who are doing something like a.ToLower().CompareTo(b) they always have b in the lowercase form to compare against. But I see this is not guaranteed. I am still seeing the diagnostics is useful, but we may not provide a fixer for this case.

@stephentoub what do you think about this case?

@mavasani
Copy link

Should we consider also detecting simple patterns across statements, such as below?

var loweredStr = str.ToLower();
if (loweredStr.Contains(anotherStr))

We would need to check that we have exactly one reference to the intermediate temporary variable.

If yes, we already added another analyzer which implements a simple heuristic for detecting above patterns: https://github.com/dotnet/roslyn-analyzers/blob/main/src/NetAnalyzers/Core/Microsoft.NetCore.Analyzers/Runtime/PreferStringContainsOverIndexOfAnalyzer.cs. If not for v1, but feel it may be a worthwhile enhancement, you may also consider filing a tracking issue for this enhancement.

@carlossanlop carlossanlop added the in-pr There is an active PR which will close this issue when it is merged label Jun 17, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jul 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Globalization code-analyzer Marks an issue that suggests a Roslyn analyzer code-fixer Marks an issue that suggests a Roslyn code fixer in-pr There is an active PR which will close this issue when it is merged partner-impact This issue impacts a partner who needs to be kept updated
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants