Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the "needed-extra-regex" option #246

Closed
wants to merge 7 commits into from
Closed

Add the "needed-extra-regex" option #246

wants to merge 7 commits into from

Conversation

leofeyer
Copy link

@leofeyer leofeyer commented Apr 1, 2020

This is the follow-up PR to #244, which adds the needed-extra-regex option to the quoted-strings rule.

quoted-strings:
  required: only-when-needed
  needed-extra-regex: ^%.*$

This will suppress the "string value is redundantly quoted" warning for strings starting with a % character.

@coveralls
Copy link

coveralls commented Apr 1, 2020

Coverage Status

Coverage decreased (-0.4%) to 97.434% when pulling 91cb70b on leofeyer:feature/needed-extra-regex into 1a13837 on adrienverge:master.

@leofeyer
Copy link
Author

leofeyer commented Apr 1, 2020

Neither the failing tests nor the coverage decrease seems related to my changes. 🤷‍♂

Copy link
Owner

@adrienverge adrienverge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @leofeyer! I haven't tested but it looks functional.

I'm sorry about Travis not reporting tests, I'll look into that.

However for coverage, it is indeed related to your change: you add more code than you add tests (relatively). It's important to cover all cases (e.g. needed-extra-regex: ^%.*$, needed-extra-regex: ^%, needed-extra-regex: ;$, needed-extra-regex: ' ', etc.) including corner cases like multi-line regexes on:

        self.check('---\n'
                   'multiline string 1: |\n'
                   '  line 1\n'
                   '  line 2\n'
		   ...

Also, you need to update documentation if you don't want this to be a hidden feature 😉
Since there are multiple types of regex syntaxes, I think it's worth mentionning that this regex format is PCRE, more info on https://docs.python.org/3/library/re.html#regular-expression-syntax.

yamllint/rules/quoted_strings.py Show resolved Hide resolved
Copy link
Owner

@adrienverge adrienverge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @leofeyer, it looks good! New documentation is short and clear 👍 (I just have a suggestion)

Also in tests, can you make sure quotes are required when the regex matches? For example:

        self.check('---\n'
                   'string1: \'$foo$\'\n'                    # fails
                   'string2: \'%foo%\'\n'
                   'string3: $foo$\n'
                   'string4: %foo%\n',                       # fails
                   conf, problem1=(2, 10), problem2=(5, 10))

(by the way, does the code in yamllint/rules/quoted_strings.py correctly ensure that?)

yamllint/rules/quoted_strings.py Outdated Show resolved Hide resolved
yamllint/rules/quoted_strings.py Show resolved Hide resolved
@leofeyer
Copy link
Author

leofeyer commented Apr 8, 2020

Also in tests, can you make sure quotes are required when the regex matches? For example:

        self.check('---\n'
                   'string1: \'$foo$\'\n'                    # fails
                   'string2: \'%foo%\'\n'
                   'string3: $foo$\n'
                   'string4: %foo%\n',                       # fails
                   conf, problem1=(2, 10), problem2=(5, 10))

No, this cannot be tested, as it throws a syntax error: found character '%' that cannot start any token

@adrienverge
Copy link
Owner

No, this cannot be tested, as it throws a syntax error: found character '%' that cannot start any token

Sure, but you get the idea, right? The goal is to test that the new option does its job, for example using another regex like ^-.*-$ with string4: -foo-.

By the way, if % cannot start any token (and creates a syntax error), then checking needed-extra-regex: ^%.*$ is a bit useless because yamllit would already report this starting % as an error? Or maybe I'm missing something?

@leofeyer
Copy link
Author

leofeyer commented Apr 8, 2020

Sure, but you get the idea, right? The goal is to test that the new option does its job, for example using another regex like ^-.*-$ with string4: -foo-.

I can try.

By the way, if % cannot start any token (and creates a syntax error), then checking needed-extra-regex: ^%.*$ is a bit useless because yamllit would already report this starting % as an error? Or maybe I'm missing something?

Maybe % should be in START_TOKENS then? At least the specs list % as a directive indicator, which can never start an unquoted string:

@adrienverge
Copy link
Owner

Maybe % should be in START_TOKENS then? At least the specs list % as a directive indicator, which can never start an unquoted string:

Interesting! Since % is only for directive indicators (at the very beginning of the document), I think adding it to START_TOKENS would be valid, but wouldn't have any positive effect.

Anyway, since % can never start an unquoted string, maybe it's better to change tests to use another regex? For example ^-.*-$ would be an easy change.

@leofeyer
Copy link
Author

leofeyer commented Apr 8, 2020

For example ^-.*-$ would be an easy change.

Isn't - in START_TOKENS and will be quoted anyway?

@leofeyer
Copy link
Author

leofeyer commented Apr 8, 2020

Also, if % can never start an unquoted string, isn't it a bug that yamllint allows it and even says "is incorrectly quoted" if a quoted string starts with %? IMHO, this is a bug that should be fixed independently from this PR (which adds a new feature).

@adrienverge
Copy link
Owner

Isn't - in START_TOKENS and will be quoted anyway?

Actually - can start a string, I think it's valid to write key: -val (where "-val" is a string). So, START_TOKENS is a simplification that doesn't work perfectly. Thanks for pointing this out.
@ruipinge any thoughts? To summarize: with quoted-strings: {required: only-when-needed}, yamllint shouldn't report key: -val as a problem, because "-val" is a valid string that doesn't need quotes.

Also, if % can never start an unquoted string, isn't it a bug that yamllint allows it and even says "is incorrectly quoted" if a quoted string starts with %? IMHO, this is a bug that should be fixed independently from this PR (which adds a new feature).

I'm not sure whether it's strictly forbidden to have something like key: %val, it would need to read the spec carefully. Anyway I agree with you: let's not mess with % in this PR. Let's use something else in tests, for example ^http://?

@leofeyer
Copy link
Author

leofeyer commented Apr 9, 2020

I have adjusted the tests in eb3c857 to use ^http:// instead of ^%.

@adrienverge
Copy link
Owner

Great, thanks! 👍

What about my comment #246 (review)?

@leofeyer
Copy link
Author

leofeyer commented Apr 10, 2020

What about my comment #246 (review)?

This would be a new feature that has nothing to do with my PR, because right now required: only-when-needed does not enforce quotes, either.

You can test this yourself by changing test_quoted_strings.py line 181 'string1: foo\n' to 'string1: *foo\n' and running the tests. They will pass just fine and there will be no string value is not quoted with %s quotes error.

This is a code snippet that would implement the feature:

    elif tag == DEFAULT_SCALAR_TAG and token.value:

        # Quotes are required because the token begins with a special character
        # or matches the extra regex
        if token.value[0] in START_TOKENS or (extra_regex != ''
                and re.search(extra_regex, token.value)):
            msg = "string value is not quoted with %s quotes" % (quote_type)

But as I said, it has nothing to do with this PR, therefore I suggest to merge the PR as is and to create a new PR to adjust the checks to enforce quotes if a string matches START_TOKENS or the new extra regex.

adrienverge added a commit that referenced this pull request Apr 10, 2020
Add ability to:
- require strings to be quoted if they match a pattern (PCRE regex)
- allow quoted strings if they match a pattern, while `require:
  only-when-needed` is enforced.

Co-Authored-By: Leo Feyer (#246)
@adrienverge
Copy link
Owner

adrienverge commented Apr 10, 2020

OK so this feature has 2 parts:

  1. Allow quoted strings if they match regex, while require: only-when-needed is enforced.
  2. Require strings to be quoted if they match regex.

In my opinion part 2 has "something to do with your PR", as users who read require and extra-regex may expect strings that match to be quoted (that was my case, for example).
Moreover, I prefer adding the whole feature at once than just half of it (merging this PR now), because it helps choosing adapted keywords (e.g. needed-extra-regex may not be ideal given the 2 points above, maybe extra-allowed and extra-required would be better?) and detect problems early.

Other things I don't agree with:

  • Your example string1: *foo isn't valid because *foo is a YAML alias (see https://yaml.org/spec/1.2/spec.html#id2786196), not a string.

  • Tests you added don't check needed-extra-regex with other "regular" value types (see the rest of test_quoted_strings.py), and it's a problem because it could create unexpected behaviors (it's frequent).

  • Linting don't pass on your PR (check out CI results).

  • A user could need to give multiple regexes, e.g. ^http:// and ^ftp://: maybe a list of regexes would be better?

For these reasons (and to save time) I will implement this in another PR.

EDIT: See #251 and #252.

adrienverge added a commit that referenced this pull request Apr 10, 2020
Add ability to:
- require strings to be quoted if they match a pattern (PCRE regex)
- allow quoted strings if they match a pattern, while `require:
  only-when-needed` is enforced.

Co-Authored-By: Leo Feyer (#246)
@ruipinge
Copy link
Contributor

Actually - can start a string, I think it's valid to write key: -val (where "-val" is a string). So, START_TOKENS is a simplification that doesn't work perfectly. Thanks for pointing this out.
@ruipinge any thoughts? To summarize: with quoted-strings: {required: only-when-needed}, yamllint shouldn't report key: -val as a problem, because "-val" is a valid string that doesn't need quotes.

My hunch goes with adding any possible indicator character to the list of START_TOKENS. Actually, I missed not only %, but also | , ', ", and >. All of these serve, in their own way, as control characters, so we would make the yaml files safer by "imposing" any string starting with it be quoted.

@ruipinge
Copy link
Contributor

I know that development is already ongoing, so I'm sorry to be late for the party.

Although, after reading the two threads and noticed that what motivated this PR and #246 was actually the fact that

Symfony requires placeholders such as %kernel.project_dir% or %env(DATABASE_URL)% to be quoted

I wanted to give my two cents. If we came to agree that the % was missing as a START_TOKEN (and actually Symfony parser was doing good), would this feature continue to make sense? Wouldn't it become a feature without a use case?

@adrienverge
Copy link
Owner

@ruipinge thanks for your feedback.

I'm still OK to add % to START_TOKENS, I'd accept a PR for that.

But START_TOKENS has proven to be partially broken: looking at the first character of a string is not enough. For example:

  • ----- doesn't need quotes, but redundant quotes would not be detected,
  • :char doesn't need quotes, but redundant quotes would not be detected,
  • "'foo' == 'bar'" needs quotes, but only-when-needed complains if it has,
  • "hello: world" needs quotes, but only-when-needed complains if it has.

By the way this is already a problem for users: an issue was open at #242. Let's continue this topic there, but we may have to drop START_TOKENS.

I wanted to give my two cents. If we came to agree that the % was missing as a START_TOKEN (and actually Symfony parser was doing good), would this feature continue to make sense? Wouldn't it become a feature without a use case?

Good point. You're probably right today, but since the feature is already developed and the use case may arrive in the future (some users may want some types of strings to be quoted, for visual comfort?), I'd be for adding it. What do you think?
If we do, I want these new options to be future-proof and correctly labeled, that's why your feedback would be very welcome :)

adrienverge added a commit that referenced this pull request Apr 13, 2020
Add ability to:
- require strings to be quoted if they match a pattern (PCRE regex)
- allow quoted strings if they match a pattern, while `require:
  only-when-needed` is enforced.

Co-Authored-By: Leo Feyer (#246)
adrienverge added a commit that referenced this pull request Apr 13, 2020
Add ability to:
- require strings to be quoted if they match a pattern (PCRE regex)
- allow quoted strings if they match a pattern, while `require:
  only-when-needed` is enforced.

Co-Authored-By: Leo Feyer (#246)
adrienverge added a commit that referenced this pull request Apr 13, 2020
Change implementation of `required: only-when-needed`, because
maintaining a list of `START_TOKENS` and just looking at the first
character of string values has proven to be partially broken.

Cf. discussion at
#246 (comment).

Fixes #242.
adrienverge added a commit that referenced this pull request Apr 13, 2020
Change implementation of `required: only-when-needed`, because
maintaining a list of `START_TOKENS` and just looking at the first
character of string values has proven to be partially broken.

Cf. discussion at
#246 (comment).

Fixes #242 and
#244.
adrienverge added a commit that referenced this pull request Apr 13, 2020
Change implementation of `required: only-when-needed`, because
maintaining a list of `START_TOKENS` and just looking at the first
character of string values has proven to be partially broken.

Cf. discussion at
#246 (comment).

Fixes #242 and
#244.
adrienverge added a commit that referenced this pull request Apr 15, 2020
Change implementation of `required: only-when-needed`, because
maintaining a list of `START_TOKENS` and just looking at the first
character of string values has proven to be partially broken.

Cf. discussion at
#246 (comment).

Fixes #242 and
#244.
adrienverge added a commit that referenced this pull request Apr 15, 2020
Add ability to:
- require strings to be quoted if they match a pattern (PCRE regex)
- allow quoted strings if they match a pattern, while `require:
  only-when-needed` is enforced.

Co-Authored-By: Leo Feyer (#246)
adrienverge added a commit that referenced this pull request Apr 17, 2020
Add ability to:
- require strings to be quoted if they match a pattern (PCRE regex)
- allow quoted strings if they match a pattern, while `require:
  only-when-needed` is enforced.

Co-Authored-By: Leo Feyer (#246)
@adrienverge
Copy link
Owner

Closing because #252 and #254 fixed the original problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants