
Update AnalyzerConfig.cs s_propertyMatcherPattern regex key#78459

Open
GweltazEFi wants to merge 2 commits into dotnet:main from GweltazEFi:main

Conversation

@GweltazEFi

@GweltazEFi GweltazEFi commented May 6, 2025

Closes #55431
Allows any character in the key name, including spaces but excluding the : and = characters, as per the specification.
Changed \s* to [^\S\r\n]* to account for the multiline context.

Regex testings: https://regex101.com/r/l0FeUH/9

PR related to issue #55431 (comment)

Notes:

  • To NOT break the current parsing implementation, inline comments are still supported. Removing the trailing ([#;].*)? part would solve the issue.
  • Multiline parsing is enabled only to make regex testing easier, even though the implementation evaluates a single line at a time.

@dotnet-policy-service agree

@GweltazEFi GweltazEFi requested a review from a team as a code owner May 6, 2025 11:02
@ghost ghost added Area-Analyzers untriaged Issues and PRs which have not yet been triaged by a lead labels May 6, 2025
@dotnet-policy-service dotnet-policy-service bot added the Community The pull request was submitted by a contributor who is not a Microsoft employee. label May 6, 2025
@jaredpar
Member

jaredpar commented May 7, 2025

Regex testings: https://regex101.com/r/l0FeUH/6

Going to need unit tests for this change in our code base.

Member

@jaredpar jaredpar left a comment


Waiting for tests

@GweltazEFi GweltazEFi requested a review from jaredpar May 7, 2025 14:08
@CyrusNajmabadi
Contributor

My general concern is that regexes are complex, and i do not have any innate sense of confidence that this isn't breaking some case we're not considering. Seems risky.

TBH, i'm somewhat surprised we're using a regex here (unless it's a regex advised by the spec), as otherwise, the idea of splitting a string on =, trimming text, and splitting off a # comment seems like something much easier written out in code form in a way that is much easier to validate.
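
A minimal sketch of the non-regex approach described above, in Python rather than C# for brevity. The helper name `split_property` is hypothetical and not part of Roslyn. Treating `:` as a separator and `;` as a comment marker is an assumption borrowed from the existing regex, not something this comment asks for.

```python
def split_property(line):
    """Return (key, value), or None if the line is not a property.

    Hypothetical helper for illustration only; not Roslyn code.
    """
    # Find the leftmost key/value separator ('=' or ':').
    positions = [i for i in (line.find("="), line.find(":")) if i != -1]
    if not positions:
        return None
    sep = min(positions)

    # The key is everything before the separator, trimmed of spaces and tabs.
    key = line[:sep].strip(" \t")
    if not key:
        return None

    # Cut an inline comment off the value at the first '#' or ';'.
    rest = line[sep + 1:]
    for i, ch in enumerate(rest):
        if ch in "#;":
            rest = rest[:i]
            break
    return key, rest.strip(" \t")
```

Each step (find separator, trim key, cut comment, trim value) can be checked on its own, which is the validation benefit argued for here.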

@GweltazEFi
Author

(Reduced quote)

My general concern is that regexes are complex, and [..] something much easier written out in code form in a way that is much easier to validate.

I do agree. My goal was not to tear the subject apart, only to make it work for #55431.
I can explain in detail what the changes do, though, if that would help the acceptance process (the differences from the current regex follow afterwards):
^[^\S\r\n]*((?:[^=:\s]+[^\S\r\n]*?)+?)[^\S\r\n]*[=:][^\S\r\n]*(.*?)[^\S\r\n]*([#;].*)?$

  • ^[^\S\r\n]* trims any leading whitespace except line ends (which \s would include, hence the "NOT non-whitespace, excluding line feed and carriage return" class)
  • ( starts capturing group 1; that is the Key part
  • (?: starts a non-capturing group, allowing for multi-part keys
  • [^=:\s]+ matches any non-whitespace characters, excluding the end-of-key markers
  • [^\S\r\n]* matches any whitespace
  • ?)+ makes that whitespace match lazy and ends the non-capturing group, requiring the group to match at least once
  • ?)[^\S\r\n]* makes that repetition lazy and ends capturing group 1, then trims trailing whitespace; because the non-capturing group is lazy, this trailing trim takes precedence over group 1 content
  • [=:][^\S\r\n]* separates key from value with either an equals sign or a colon, then trims the whitespace that follows
  • (.*?)[^\S\r\n]* captures anything (lazily) as group 2, then trims trailing whitespace; that is the Value part
  • ([#;].*)?$ captures everything from the first # or semicolon (if either is present) until the end of the line as group 3; because group 2 is lazy, group 3 takes precedence over group 2; if present, that is the (discarded) Comment part
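
To make the three groups concrete, here is a small sketch that runs the pattern quoted above against a few sample lines. It uses Python's `re` module as a stand-in for .NET's `Regex`, which is an assumption of this sketch. The `parse` helper and the sample keys are hypothetical.

```python
import re

# Pattern copied verbatim from the comment above.
# Assumption: Python's `re` treats these constructs the same way as .NET.
PROPERTY = re.compile(
    r"^[^\S\r\n]*((?:[^=:\s]+[^\S\r\n]*?)+?)[^\S\r\n]*[=:]"
    r"[^\S\r\n]*(.*?)[^\S\r\n]*([#;].*)?$"
)


def parse(line):
    """Return (key, value, comment) for a property line, else None."""
    m = PROPERTY.match(line)
    if m is None:
        return None
    # Group 1 = Key, group 2 = Value, group 3 = Comment (None if absent).
    return m.groups()


if __name__ == "__main__":
    for sample in ("  my key = some value # note", "key = a = b", "[*.cs]"):
        print(repr(sample), "->", parse(sample))
```

Under these assumptions, a key with an inner space is captured without its surrounding whitespace, and the key ends at the leftmost `=` or `:`.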

General differences:

  • [^\S\r\n] instead of \s, so that line feeds and carriage returns are not matched
  • (?: and ?) form an inner lazy non-capturing group for the key, allowing for multi-part keys
  • [^=:\s] as valid key characters, with as few whitespace characters between them as possible
  • Against the specification, inline comments are kept as-is, because removing them would be a breaking change I'd rather not introduce
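
The first difference only matters when several lines are tested at once, as on regex101. A minimal sketch of the effect, using Python's `re` with `re.MULTILINE` and two deliberately simplified patterns (`LOOSE` and `STRICT` are hypothetical and are not the patterns in this PR):

```python
import re

# Two lines: the first property has an empty value.
TEXT = "a =\nb = 1"

# \s also matches "\n", so whitespace trimming can cross the line end.
LOOSE = re.compile(r"^\s*(\w+)\s*=\s*(.*?)\s*$", re.MULTILINE)

# [^\S\r\n] matches whitespace except CR and LF, so trimming stays on one line.
STRICT = re.compile(
    r"^[^\S\r\n]*(\w+)[^\S\r\n]*=[^\S\r\n]*(.*?)[^\S\r\n]*$", re.MULTILINE
)

loose_matches = LOOSE.findall(TEXT)
strict_matches = STRICT.findall(TEXT)

if __name__ == "__main__":
    print("loose: ", loose_matches)
    print("strict:", strict_matches)
```

With `\s`, the empty value of `a` swallows the following line; with `[^\S\r\n]`, each line yields its own key/value pair.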

@CyrusNajmabadi: If this should become a non-regex change, could it be filed as a separate refactoring topic, so that the 3+ year old issue can be closed in the meantime?
I'd gladly help with the refactoring as well.

@GweltazEFi
Author

Ready for review.

@GweltazEFi
Author

Waiting for tests

Hello @jaredpar, are the tests provided here enough, or is there anything more you'd want to see tested?
I tried to be as complete and diverse as possible, so some tests are near-duplicates: for example, testing for double inline comments, and for both end-of-key markers when one would be enough.
Please let me know if something needs refinement.

@GweltazEFi
Author

Waiting for tests

@jaredpar: Are you still waiting on tests?

In the meantime, I asked Copilot for a review:

Can you review https://github.com/dotnet/roslyn/pull/78459/files (solely based on modifications), please?

Sure! Here's a review of the modifications in the pull request #78459 on the dotnet/roslyn repository:

  1. AnalyzerConfig.cs:

    • Change: Updated the s_propertyMatcherPattern regex to allow any character, including spaces, except : and =.
    • Details: The new regex pattern is ^[^\S\r\n]*((?:[^=:\s]+[^\S\r\n]*?)+?)[^\S\r\n]*[=:][^\S\r\n]*(.*?)[^\S\r\n]*([#;].*)?$.
  2. EditorConfigParser.cs:

    • Change: Updated the s_propertyMatcher regex to match the changes in AnalyzerConfig.cs.
    • Details: The new regex pattern is the same as above.
  3. AnalyzerConfigTests.cs:

    • Additions: Added multiple unit tests to verify the new regex patterns and edge cases.
    • Details: Tests include cases for properties without values, properties with simple values, properties with values impersonating keys, and various edge cases.

These changes aim to enhance the flexibility and accuracy of property matching in analyzer configuration files.

If you need more details or have any specific questions, feel free to ask!

In case the previous message isn't detailed enough, here is Copilot breaking down the new regex:

Can you expand on the regex?

Certainly! Let's break down the updated regex pattern from the AnalyzerConfig.cs file:

^[^\S\r\n]*((?:[^=:\s]+[^\S\r\n]*?)+?)[^\S\r\n]*[=:][^\S\r\n]*(.*?)[^\S\r\n]*([#;].*)?$

Here's a detailed explanation of each part:

  1. ^[^\S\r\n]*:

    • ^: Asserts the position at the start of a line.
    • [^\S\r\n]*: Matches any whitespace character except for newlines (\r and \n), zero or more times. This allows for leading spaces or tabs.
  2. ((?:[^=:\s]+[^\S\r\n]*?)+?):

    • (?:...): Non-capturing group to group parts of the pattern without creating a backreference.
    • [^=:\s]+: Matches one or more characters that are not =, :, or whitespace.
    • [^\S\r\n]*?: Matches any whitespace character except for newlines, zero or more times, non-greedily.
    • +?: Matches the previous pattern one or more times, non-greedily.
    • (...): Capturing group to capture the property name.
  3. [^\S\r\n]*[=:][^\S\r\n]*:

    • [^\S\r\n]*: Matches any whitespace character except for newlines, zero or more times.
    • [=:]: Matches either = or :.
    • [^\S\r\n]*: Matches any whitespace character except for newlines, zero or more times. This allows for spaces around the = or :.
  4. (.*?)[^\S\r\n]*:

    • (.*?): Capturing group to capture the property value. Matches any character (except for newlines) zero or more times, non-greedily.
    • [^\S\r\n]*: Matches any whitespace character except for newlines, zero or more times. This allows for trailing spaces or tabs.
  5. ([#;].*)?$:

    • ([#;].*)?: Optional capturing group to capture comments. Matches # or ; followed by any character (except for newlines) zero or more times.
    • $: Asserts the position at the end of a line.

This regex pattern is designed to match and capture property names and values in a configuration file, allowing for flexible spacing and optional comments.

If you have any specific questions about parts of the regex or need further clarification, feel free to ask!

To rephrase Copilot: if you have anything in mind, please let me know.

Removed most of the redundant tests from the explicit test coverage.
Keys containing spaces were already tested: previously disallowed, now allowed.
Added tests for values that look like keys, to ensure the regex ends the key at the leftmost of those break points.
Added tests for values that look like config options without being ones.
@GweltazEFi
Author

My account will be suspended or deleted in the near future; this code is free to use.

Daynvheur pushed a commit to Daynvheur/roslyn that referenced this pull request Mar 2, 2026
Closes dotnet#55431
Allows space characters in the key name as per [specification](https://spec.editorconfig.org/#file-format).

Regex testings: https://regex101.com/r/l0FeUH/9 (Changed `\s` to `[^\S\r\n]` there only to account for multiline context, see notes)

PR related to issue [dotnet#55431 (comment)](dotnet#55431 (comment))
Closes PR [!78459](dotnet#78459)

Notes:

* To NOT break [current](https://github.com/dotnet/roslyn/blob/1e14d8a2f9eb04b0c9b4076fdc8a7f02d5d53ab1/src/Compilers/Core/Portable/CommandLine/AnalyzerConfig.cs#L25) implementation parsing, inline comments are still available. Removing the trailing `([#;].*)?` part would solve the issue.
* The implementation evaluates a single line at a time; I changed `\s` to `[^\S\r\n]` only in the multiline testing context
* `[\w\.\-_]` ["a word character, or a dot, or a hyphen, or an underscore"] is much slower than (the previously proposed) `[^=:\s]` ["any character other than whitespace, an equals sign, or a colon"], and can't be parsed by multiline online tools with the evaluation list linked above, so I kept the regex testing link as it was initially

@dotnet-policy-service agree
Daynvheur pushed a commit to Daynvheur/roslyn that referenced this pull request Mar 5, 2026
Closes dotnet#55431
Allows space characters in the key name as per [specification](https://spec.editorconfig.org/#file-format).

Regex testings: https://regex101.com/r/l0FeUH/9 (Changed `\s` to `[^\S\r\n]` there only to account for multiline context, see notes)

PR related to issue [dotnet#55431 (comment)](dotnet#55431 (comment))
Closes PR [!78459](dotnet#78459)

Notes:

* To NOT break [current](https://github.com/dotnet/roslyn/blob/1e14d8a2f9eb04b0c9b4076fdc8a7f02d5d53ab1/src/Compilers/Core/Portable/CommandLine/AnalyzerConfig.cs#L25) implementation parsing, inline comments are still available. Removing the trailing `([#;].*)?` part would solve the issue.
* The implementation evaluates a single line at a time; I changed `\s` to `[^\S\r\n]` only in the multiline testing context
* `[\w\.\-_]` ["a word character, or a dot, or a dash, or an underscore"] is much slower than (the previously proposed) `[^=:\s]` ["any character other than whitespace, an equals sign, or a colon"], and can't be parsed by multiline online tools with the evaluation list linked above, so I kept the regex testing link as it was initially

Labels

Area-Analyzers Community The pull request was submitted by a contributor who is not a Microsoft employee. untriaged Issues and PRs which have not yet been triaged by a lead VSCode


Development

Successfully merging this pull request may close these issues.

.editorconfig for diagnostic category with spaces

4 participants