Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(useNamingConvention): options for custom conventions #2770

Merged
merged 3 commits into from
May 12, 2024

Conversation

Conaclos
Copy link
Member

@Conaclos Conaclos commented May 8, 2024

Summary

Fix #1900
Fix #2314

The feature is basically implemented.
I had some hard time to find a balance between simplicity and expressiveness.

I first implemented the solution I described in the related discussion.
Unfortunately this was not enough expressive to cover some usages.

I switched to a solution that is closer to TypeScript ESLint naming-convention, but hopefully simpler.

A user can basically provide an array of (custom) conventions.
A convention has two parts:

  • a selector that determines which declaration are concerned;
  • some requirements to check.

A selector selects a declaration based on several criteria:

  • kind: the kind of the declaration
    Some examples: variableLike, const, enum, typeLike, enumMember, classMember, ...
    In contrast to ESLint, enumMember is a TypeLike.
    Also, we provide const/let/var/using kinds, while in the ESLint rule they are modifiers.
    I also tried to reduce the number of kinds and to make them more understandable.
  • modifiers: an array of modifiers among abstract, private, protected, readonly, static
  • scope: the scope where the declaration is. It takes a value among global and any (default value)
  • usage (Not implemented yet): how the declaration is used.
    I am thinking about three values: unused, new (used in a NewExpression), component (used as a component).

An empty selector selects any declaration.

A convention must specify at least one requirement among:

  • formats: the allowed string cases for the selected names;
  • match: a regex that the name must match.

In the case where the regex matches and returns a capture,
then the capture is checked against formats instead of checking the entire name against formats.
If formats is unspecified, then the capture is the name that must follow the next conventions in the array.
Only the first capture is considered. Other capturing groups are considered as non-capturing groups.

The algorithm follows several steps:

  1. First determine the characteristics of a declaration (node_selector in the code)
  2. Traverse the array of conventions searching for conventions that apply to the declaration
    based on its characteristics.
  3. Apply the requirements and stop if:
    • match doesn't match or formats is not followed
    • match matches and doesn't capture or capture an empty string
    • formats is respected

If the traversal is not stopped until the end, then we apply the default convention.
The default convention strips leading and trailing underscores and dollar signs only if the name was not trimmed by a match.

Let's take an example:

  • We want to require an interface to start with I except for interface names that end with Error.
  • We want to require every private class properties to start with an underscore and to be in camelCase.
  • We want to require that every constant in the global scope be in CONSTANT_CASE.

These conventions can be achieved with the following configuration:

{
  "linter": {
    "rules": {
      "style": {
        "useNamingConvention": {
          "level": "error",
          "options": {
            "conventions": [
              {
                "selector": {
                  "kind": "interface"
                },
                "match": "I(.+)|(.*)Error"
              },
              {
                "selector": {
                  "kind": "classMember",
                  "modifiers": ["private"]
                },
                "match": "_(.+)",
                "formats": ["camelCase"]
              },
              {
                "selector": {
                  "kind": "const",
                  "scope": "global"
                },
                "formats": ["CONSTANT_CASE"]
              }
            ]
          }
        }
      }
    }
  }
}

When an interface name is checked against the conventions, we apply the convention in-order.
The selector of the first convention selects the interface.
Then we apply the requirement match.

  • In the case of an interface named IArguemts, the match captures Arguemts. The name part Arguemts is then checked against the following conventions.
    No other convention applies, except the default one which checks that Arguemts is in PascalCase.
    All conventions are thus respected.
  • In the case of an interface Error, the match captures an empty string.
    Thus, the convention is respected and we stop there.
  • In the case of an interface named Other, the first convention is not fullfilled.
    We stop there and report the mismatch.

Note that because we forward the captures, the previous configuration could also be expressed as follows:

{
  "linter": {
    "rules": {
      "style": {
        "useNamingConvention": {
          "level": "error",
          "options": {
            "conventions": [
              {
                "selector": {
                  "kind": "interface"
                },
                "match": "I(.+)|(.*)Error"
              },
              {
                "selector": {
                  "kind": "classMember",
                  "modifiers": ["private"]
                },
                "match": "_(.+)"
              },
              {
                "selector": {
                  "kind": "classMember",
                  "modifiers": ["private"]
                },
                "formats": ["camelCase"]
              },
              {
                "selector": {
                  "kind": "const",
                  "scope": "global"
                },
                "formats": ["CONSTANT_CASE"]
              }
            ]
          }
        }
      }
    }
  }
}

NOTE: We could simplify the system by not forwarding a capture, and thus requiring fromats when there is a capture. This could ensure that at most one regex is executed for every member avoiding abuses of regexes by users. I am open to discussions about this simplification.

I decided to use regexes because it allows to condensate several features elegantly: We don't have to provide specific fields like prefix, suffix, exceptions. All these features are provided by a regex.
I was uncomfortable with the idea of exposing to user the extended syntax of the regex crate.
Thus, I implemented a wrapper that ensures that the regex follows a restricted syntax.
I also wrap the regex by default between anchors ^ and $ making the regex more readable and strict.

This PR also improves diagnostics by reporting the range of the name that must respect a format or a regex.

The PR improves biome_string_case crate by using bitsets to check Case compatibility and to combine several cases into a bitset Cases.
Unfortunately, enumflags2 doesn't allow enum variants that set multiple bits.
Thus, I had to provide my own implementation.

I am using enumflags2 for other bitsets, in particular a bitset for members' Modifier.

Default naming convention changes

I changed some defaults to accommodate with known conventions.

Remaining tasks

  • polish the documentation
  • Decide if the kind selector has enough kinds or too much?
  • Add changelog entry
  • Add a validation step to reject invalid association of kind/modifiers/scope
  • implement the usage selector (in the future - this PR is already too big)
  • Implement a migration with migrate eslint (in another PR - this PR is already too big)

Test Plan

I added new tests and updated previous snapshots

@github-actions github-actions bot added A-CLI Area: CLI A-Project Area: project A-Linter Area: linter A-Parser Area: parser A-Formatter Area: formatter L-JavaScript Language: JavaScript and super languages labels May 8, 2024
Copy link
Contributor

github-actions bot commented May 8, 2024

Parser conformance results on

js/262

Test result main count This PR count Difference
Total 49701 49701 0
Passed 48721 48721 0
Failed 980 980 0
Panics 0 0 0
Coverage 98.03% 98.03% 0.00%

jsx/babel

Test result main count This PR count Difference
Total 40 40 0
Passed 37 37 0
Failed 3 3 0
Panics 0 0 0
Coverage 92.50% 92.50% 0.00%

symbols/microsoft

Test result main count This PR count Difference
Total 6322 6322 0
Passed 2036 2036 0
Failed 4286 4286 0
Panics 0 0 0
Coverage 32.20% 32.20% 0.00%

ts/babel

Test result main count This PR count Difference
Total 662 662 0
Passed 593 593 0
Failed 69 69 0
Panics 0 0 0
Coverage 89.58% 89.58% 0.00%

ts/microsoft

Test result main count This PR count Difference
Total 17646 17646 0
Passed 13439 13439 0
Failed 4205 4205 0
Panics 2 2 0
Coverage 76.16% 76.16% 0.00%

@Conaclos Conaclos force-pushed the conaclos/useNamingConvention/custom-conventions branch from 5a3a996 to dcb4e5d Compare May 8, 2024 17:34
@Conaclos Conaclos changed the title feat(useNamingConvention): add options for cutsom conventions feat(useNamingConvention): options for custom conventions May 8, 2024
@Conaclos Conaclos force-pushed the conaclos/useNamingConvention/custom-conventions branch 2 times, most recently from 6f9d67e to b486fe0 Compare May 8, 2024 17:59
Copy link

codspeed-hq bot commented May 8, 2024

CodSpeed Performance Report

Merging #2770 will not alter performance

Comparing conaclos/useNamingConvention/custom-conventions (d2ca98b) with main (06a587e)

Summary

✅ 97 untouched benchmarks

@Conaclos
Copy link
Member Author

Conaclos commented May 8, 2024

I will try to reduce the perf impact. Maybe it is caused by the use of a regex by default.

@Conaclos Conaclos force-pushed the conaclos/useNamingConvention/custom-conventions branch 12 times, most recently from 66bfdd2 to e00d9ea Compare May 8, 2024 22:50
@Conaclos Conaclos force-pushed the conaclos/useNamingConvention/custom-conventions branch 2 times, most recently from eab38aa to 6d40b9f Compare May 9, 2024 10:51
@Conaclos
Copy link
Member Author

Conaclos commented May 9, 2024

I will try to reduce the perf impact. Maybe it is caused by the use of a regex by default.

I was right, the perf regression was coming from running the regex for each identifier. I removed this default regex and use custom code (as before) to trim underscores and dollar signs before checking the default convention.

Unfortunately the benchmark is timing out on a CSS file. Thus, the benchmark results cannot be updated for now.

@Conaclos Conaclos force-pushed the conaclos/useNamingConvention/custom-conventions branch 4 times, most recently from 59ae55f to dd5896b Compare May 10, 2024 13:07
@github-actions github-actions bot added the A-Changelog Area: changelog label May 10, 2024
@Conaclos Conaclos requested review from a team May 10, 2024 13:10
@Conaclos Conaclos force-pushed the conaclos/useNamingConvention/custom-conventions branch 2 times, most recently from 9b1cc4c to 5641657 Compare May 10, 2024 13:15
@ematipico
Copy link
Member

ematipico commented May 10, 2024

Why don't we call the option "conventions" instead of "custom"? That's how you called it in the changelog: "custom conventions" 😄

"custom" doesn't say anything to me, but "conventions" is waaay more descriptive!

CHANGELOG.md Outdated Show resolved Hide resolved
crates/biome_js_analyze/src/utils/regex.rs Outdated Show resolved Hide resolved
Comment on lines +449 to +462
/// ### Regular expression syntax
///
/// The `match` option takes a regular expression that supports the following syntaxes:
///
/// - Greedy quantifiers `*`, `?`, `+`, `{n}`, `{n,m}`, `{n,}`, `{m}`
/// - Non-greedy quantifiers `*?`, `??`, `+?`, `{n}?`, `{n,m}?`, `{n,}?`, `{m}?`
/// - Any character matcher `.`
/// - Character classes `[a-z]`, `[xyz]`, `[^a-z]`
/// - Alternations `|`
/// - Capturing groups `()`
/// - Non-capturing groups `(?:)`
/// - A limited set of escaped characters including all special characters
/// and regular string escape characters `\f`, `\n`, `\r`, `\t`, `\v`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should move this part inside our website, maybe in the analyzer or another section of the website, mostly because this explanation feels reductive. I think we should be more thorough and provide examples for each bullet point where we explain the patterns

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. For now, only this rule uses regexes, however this could change in the future.

I will open a PR on the website repository.
I am unsure where this section should be.
Thus, I will keep this overview for now and will remove it in another PR with a proper link to the future regex section on the website.

@Conaclos Conaclos force-pushed the conaclos/useNamingConvention/custom-conventions branch from 5641657 to 8abae3f Compare May 10, 2024 14:20
@Conaclos
Copy link
Member Author

Why don't we call the option "conventions" instead of "custom"? That's how you called it in the changelog: "custom conventions" 😄

"custom" doesn't say anything to me, but "conventions" is waaay more descriptive!

I used custom because we have "defaults conventions".
However, I agree that it could make more sense of using conventions.

@Conaclos Conaclos force-pushed the conaclos/useNamingConvention/custom-conventions branch 4 times, most recently from cdd20b1 to 477a5f2 Compare May 10, 2024 22:27
@Conaclos Conaclos force-pushed the conaclos/useNamingConvention/custom-conventions branch from 477a5f2 to a5a0a3a Compare May 10, 2024 22:29
@Conaclos Conaclos merged commit 06a587e into main May 12, 2024
15 checks passed
@Conaclos Conaclos deleted the conaclos/useNamingConvention/custom-conventions branch May 12, 2024 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Changelog Area: changelog A-CLI Area: CLI A-Formatter Area: formatter A-Linter Area: linter A-Parser Area: parser A-Project Area: project L-JavaScript Language: JavaScript and super languages
Projects
None yet
2 participants