Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Analyzer] Recommend string.StartsWith(c) instead of string.StartsWith("c") #78392

Closed
stephentoub opened this issue Nov 15, 2022 · 22 comments · Fixed by dotnet/roslyn-analyzers#6799
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Globalization code-analyzer Marks an issue that suggests a Roslyn analyzer code-fixer Marks an issue that suggests a Roslyn code fixer help wanted [up-for-grabs] Good issue for external contributors partner-impact This issue impacts a partner who needs to be kept updated
Milestone

Comments

@stephentoub
Copy link
Member

stephentoub commented Nov 15, 2022

We should consider adding an analyzer that recommends using the char-based StartsWith(char) instead of StartsWith(string, ...) when the string is a constant single character. The former is more optimized.

However, this can have observable behavioral differences. string.StartsWith(char) is ordinal, whereas string.StartsWith(string) uses the current culture. This, for example, prints True and then False, highlighting that there can be semantic differences due to culture:

using System.Globalization;

CultureInfo.CurrentCulture = new CultureInfo("en-US");
Console.WriteLine("\u0061\u030a".StartsWith("\u00e5"));
Console.WriteLine("\u0061\u030a".StartsWith('\u00e5'));

It would be safe to replace string.StartsWith("c", StringComparison.Ordinal) with string.StartsWith('c') for arbitrary characters. We could do that with one diagnostic ID, and have a separate diagnostic ID (potentially off by default) for the potentially problematic case.

Same for EndsWith, IndexOf, and LastIndexOf.

Proposal to have 2 analyzers with Performance rules Category

For the case when passing StringComparison.Ordinal

Severity = suggestion

For the case when not passing StringComparison parameter

Severity = none (off by default)

@stephentoub stephentoub added code-analyzer Marks an issue that suggests a Roslyn analyzer code-fixer Marks an issue that suggests a Roslyn code fixer labels Nov 15, 2022
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Nov 15, 2022
@ghost
Copy link

ghost commented Nov 15, 2022

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

Issue Details

We should consider adding an analyzer that recommends using the char-based StartsWith(char) instead of StartsWith(string, ...) when the string is a constant single character. The former is more optimized.

However, this can have observable behavioral differences. string.StartsWith(char) is ordinal, whereas string.StartsWith(string) uses the current culture. This, for example, prints True and then False, highlighting that there can be semantic differences due to culture:

using System.Globalization;

CultureInfo.CurrentCulture = new CultureInfo("en-US");
Console.WriteLine("\u0061\u030a".StartsWith("\u00e5"));
Console.WriteLine("\u0061\u030a".StartsWith('\u00e5'));

It would be safe to replace string.StartsWith("c", StringComparer.Ordinal) with string.StartsWith('c') for arbitrary characters. We could do that with one diagnostic ID, and have a separate diagnostic ID (potentially off by default) for the potentially problematic case.

Same for EndsWith.

Author: stephentoub
Assignees: -
Labels:

area-System.Globalization, code-analyzer, code-fixer

Milestone: -

@tarekgh tarekgh added this to the Future milestone Nov 15, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Nov 15, 2022
@tarekgh
Copy link
Member

tarekgh commented Nov 15, 2022

I support doing the safer version when using Ordinal option. For other cases, I think we have/should have another analyzer checked if using the API without intentionally passing any StringComparer option.

@tarekgh
Copy link
Member

tarekgh commented Nov 15, 2022

@Youssef1313 are you interested to look at this one?

@Youssef1313
Copy link
Member

@tarekgh Sure. Is it ready to implement or does it go through design review first?

For reference, there is already a similar analyzer: https://learn.microsoft.com/en-us/dotnet/fundamentals/code-analysis/quality-rules/ca1847

@tarekgh
Copy link
Member

tarekgh commented Nov 15, 2022

@Youssef1313 yes we need to take it to the design review. we need to decide first if we need to do that with the Ordinal option only. Also, I think this analyzer should cover Last/IndexOf too.

@tarekgh
Copy link
Member

tarekgh commented Nov 15, 2022

@stephentoub I suggest the following for this analyzer rule. If you agree I can add this information to the description, and we can mark the issue ready for review.

Globalization rules Category
Severity = suggestion

@stephentoub
Copy link
Member Author

I'd have thought the category would be performance rather than globalization, since it's primarily about reducing overheads rather than fixing globalization-related correctness issues. But, I don't have a strong opinion.

And, yeah, suggestion / info.

Thanks.

@tarekgh
Copy link
Member

tarekgh commented Nov 15, 2022

Performance sounds good to me. I'll edit the description and mark the issue ready for review.

@tarekgh
Copy link
Member

tarekgh commented Nov 15, 2022

I forgot to mention this rule should include Last/IndexOf APIs too.

@tarekgh tarekgh added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Nov 15, 2022
@tarekgh tarekgh modified the milestones: Future, 8.0.0 Nov 15, 2022
@Youssef1313
Copy link
Member

Youssef1313 commented Nov 15, 2022

One thing to note here: globalization analyzers, by default, skip the analysis if InvariantGlobalization property is set to true.

https://github.com/dotnet/roslyn-analyzers/blob/e26a04c7d8005b147db9e2d7169f039c60de3542/src/NetAnalyzers/Core/AbstractGlobalizationDiagnosticAnalyzer.cs#L18-L22

@tarekgh
Copy link
Member

tarekgh commented Nov 15, 2022

I don't have a strong preference here too. I am seeing this rule can still be useful to fire with Invariant mode too. I mean with the case when passing Ordinal option. But I don't mind skipping this rule for Invariant mode too.

@tarekgh tarekgh added the blocking Marks issues that we want to fast track in order to unblock other important work label Nov 17, 2022
@jeffhandley jeffhandley added the partner-impact This issue impacts a partner who needs to be kept updated label Nov 21, 2022
@bartonjs
Copy link
Member

Looks good as proposed:

Target methods:

  • StartsWith
  • EndsWith
  • IndexOf
  • LastIndexOf

Three different diagnostic IDs all in the Performance category

  1. Safe transformations
  • Conditions:
    • When passing StringComparison.Ordinal
    • When passing InvariantCulture with a US-ASCII character
  • Severity = suggestion
  1. No specified comparison
  • Severity = suggestion
  1. Any other specified comparison
  • Severity = none

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels Nov 22, 2022
@tarekgh
Copy link
Member

tarekgh commented Jan 13, 2023

@Youssef1313 are you interested in helping with this one?

@Youssef1313
Copy link
Member

Sure, but that won't be soon, unfortunately. It will at least be after I finish exams.

@tarekgh
Copy link
Member

tarekgh commented Jan 13, 2023

@Youssef1313 thanks!

Focus on the exams and forget anything else :-) nothing urgent. good luck!

@buyaa-n buyaa-n added the help wanted [up-for-grabs] Good issue for external contributors label May 30, 2023
@mrahhal
Copy link

mrahhal commented Jul 21, 2023

Hi. Would like to work on this if no one's on it.

@Youssef1313
Copy link
Member

@mrahhal I haven't yet looked into it. You can go for it!

@danmoseley
Copy link
Member

@mrahhal I have assigned to you.

@mrahhal
Copy link

mrahhal commented Jul 21, 2023

Confirming a few things:

  1. There's already UseStringContainsCharOverloadWithSingleCharactersAnalyzer covering string.Contains, which would have been just another target method here. Wondering if we can update this existing analyzer and reuse the same diagnostic ID and extend it to cover the other methods, or if we have to create a new analyzer and a new diagnostic ID.
    • Related, UseStringContainsCharOverloadWithSingleCharactersAnalyzer does not look at the comparison at all and will always suggest the change. Shouldn't this be updated anyway to match the comparison rules/conditions described in this issue? Otherwise we will have a seemingly arbitrary difference in suggested diags between Contains and StartsWith/etc.
  2. I'm assuming I could implement the different methods and cases in a single analyzer (similar to UseExceptionThrowHelpers for example). Asking because the original comment says "Proposal to have 2 analyzers ..."

@tarekgh
Copy link
Member

tarekgh commented Jul 23, 2023

Wondering if we can update this existing analyzer and reuse the same diagnostic ID and extend it to cover the other methods, or if we have to create a new analyzer and a new diagnostic ID.

Please use new diagnostics IDs for all new cases we are covering here.

Related, UseStringContainsCharOverloadWithSingleCharactersAnalyzer does not look at the comparison at all and will always suggest the change. Shouldn't this be updated anyway to match the comparison rules/conditions described in this issue? Otherwise we will have a seemingly arbitrary difference in suggested diags between Contains and StartsWith/etc.

Let's look at the UseStringContainsCharOverloadWithSingleCharactersAnalyzer separately and we can fix it as needed. Supporting more cases with existing one may need to be done with different diagnsotic IDs.

'm assuming I could implement the different methods and cases in a single analyzer (similar to UseExceptionThrowHelpers for example). Asking because the original comment says "Proposal to have 2 analyzers ..."

That is fine to implement these in one analyzer as long as getting the results as described in #78392 (comment)

@sharwell
Copy link
Member

@stephentoub @danmoseley side note that it may help the review process for these analyzers if, after the design review is complete, the accepted design is added to a new section at the end of the first post of the issue. This would allow review of the implementation relative to just that design without needing to read the entire thread.

@mrahhal
Copy link

mrahhal commented Jul 25, 2023

Agreed. It's a bit taxing going back and forth between the original comment and the other comments, especially if the original comment has outdated information, as is the case in this issue.

@dotnet dotnet locked as resolved and limited conversation to collaborators Aug 26, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Globalization code-analyzer Marks an issue that suggests a Roslyn analyzer code-fixer Marks an issue that suggests a Roslyn code fixer help wanted [up-for-grabs] Good issue for external contributors partner-impact This issue impacts a partner who needs to be kept updated
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants