Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for missing or duplicated anchors #420

Closed

Conversation

newrushbolt
Copy link

Basic anchor\aliases checks, also fix for #395.

@coveralls
Copy link

coveralls commented Nov 19, 2021

Coverage Status

Coverage: 99.199% (+0.01%) from 99.184% when pulling 542e05f on newrushbolt:395-detect-missing-anchors into b9e1fd1 on adrienverge:master.

Copy link
Owner

@adrienverge adrienverge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @newrushbolt, thanks for contributing!

The change is great and I'm favorable to merging it, but before I have a few remarks and concerns.

  1. According to the YAML specification:

    • It is an error for an alias node to use an anchor that does not previously occur in the document -- https://yaml.org/spec/1.2/spec.html#id2785586

      Your commit implements that 👍
      But it waits for the end of the document to report it, whereas it's not needed, and can output errors in a wrong order. Since anchors must be declared before aliases, you should report the problem as soon as the unknown alias is found.

    • The alias refers to the most recent preceding node having the same anchor → it's allowed to override anchor -- https://yaml.org/spec/1.2-old/spec.html#id2786448

      Your commit implements and documents "Duplicate anchors are not allowed", which disagrees with the spec.

  2. To solve the previous point, and also to prepare the future (if we want to allow the rule to be extended later), I propose to make the rule configurable:

    rules:
      anchors:
        forbid-unknown-aliases: true  # true by default, because YAML spec fordids it
        forbid-duplicated-anchors: false  # false by default, because YAML spec allows it

    The Python file would look like:

    CONF = {'forbid-unknown-aliases': bool,
            'forbid-duplicated-anchors': bool}
    DEFAULT = {'forbid-unknown-aliases': True,
               'forbid-duplicated-anchors': False}
    
    if conf['forbid-unknown-aliases']:
  3. I'm not sure whether this new rule (at least forbid-unknown-aliases) should be enabled in the default set of rules: it might break things for many users. On the other hand, broken anchors would mean broken YAML and shouldn't exist...

    An option could be to enable it, release a new yamllint version, and if this causes problems for users, you would quickly create a pull request to disable it by default.

    What do you think?

tests/rules/test_anchors.py Outdated Show resolved Hide resolved
tests/rules/test_anchors.py Outdated Show resolved Hide resolved
yamllint/rules/anchors.py Outdated Show resolved Hide resolved
yamllint/rules/anchors.py Outdated Show resolved Hide resolved
yamllint/rules/anchors.py Outdated Show resolved Hide resolved
yamllint/rules/anchors.py Outdated Show resolved Hide resolved
yamllint/rules/anchors.py Outdated Show resolved Hide resolved
yamllint/rules/anchors.py Outdated Show resolved Hide resolved
@newrushbolt
Copy link
Author

Hello @adrienverge, thanks for the review. Sorry for the delay.
Read all the remarks, and realized that this PR had an infinite number of ways to be improved 🙈.

First of all, I decided to split this into two separate rules.
I did some rework, feel free to continue review.

  1. Absolutely
  2. Sure
  3. As well as unknown-aliases is correlates with YAML spec I think it should be enabled by default, and we should add a couple of words in the release note about it.
    And I don't think we should revert this behavior even if the sh#t hits the fan after release.
    Talking about anchor-duplicates, it should probably stay disabled.

Of course l'll squash all the commits after final approve.

@newrushbolt newrushbolt force-pushed the 395-detect-missing-anchors branch 3 times, most recently from cad43ac to 3ad948c Compare January 7, 2022 15:09
yamllint/rules/unknown_aliases.py Outdated Show resolved Hide resolved
yamllint/rules/anchors.py Outdated Show resolved Hide resolved
Copy link
Owner

@adrienverge adrienverge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Sergei,

Thanks for the good work!

It looks better but I see a problem now: you split the feature into 2 rules, whereas:

  • they're both related to anchors,
  • it feels strange to have a rule anchor-duplicates containing an option
    with the same name anchor-duplicates. Same for unknown-aliases: {unknown-aliases: true}.

I insist on the following conf, inside a united anchors rule:

CONF = {'forbid-unknown-aliases': bool,
        'forbid-duplicated-anchors': bool}
DEFAULT = {'forbid-unknown-aliases': True,
           'forbid-duplicated-anchors': False}

if conf['forbid-unknown-aliases'] and AliasToken and token.value not in context['anchors']:
    # anchor "{token.value}" is used before assignment
if conf['forbid-duplicated-anchors'] and AnchorToken and token.value in context['anchors']:
    # duplicated anchor "{token.value}"

which is more in the spirit of yamllint. Moreover it would allow extending the rule with more options later.

You can see rules empty-values and octal-values for comparison.

Other than that, please address all my other remarks above and below, so we can merge this 👍

@newrushbolt
Copy link
Author

@adrienverge is everything ok now?
If so can you merge this request?

Copy link
Contributor

@amimas amimas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for this PR. Looking forward seeing this merged and released.

In #395, a third option for this rule was also discussed (forbid-unused-anchors). It's similar to the duplicated anchors option. Can that option be added into this PR. I feel that will make this whole.

Thanks again.

if token.value not in context['anchors']:
yield LintProblem(
token.start_mark.line + 1, token.start_mark.column + 1,
f'anchor "{token.value}" is used before assignment')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this message also indicate a potentially reference to a non-existent anchor?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yamllint parses files sequencially, so it does not know whether the anchor is defined later in the file or is totally missing from the document.

(Maybe I didn't get your point, in this case please give example of what you mean :))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message as written right now specifically says something like this:

anchor "foo-bar" is used before assignment

But, as you said already

it does not know whether the anchor is defined later in the file or is totally missing from the document.

That's why I was suggesting that the error message probably could be less specific because it's possible that the anchor hasn't been defined in the entire file. It could be something like:

anchor "foo-bar" is used before assignment/defined or does not exist

Hope that makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrienverge - Trying to learn the code, as I'm new to this. Would like to contribute if I can.

Yamllint parses files sequencially, so it does not know whether the anchor is defined later in the file or is totally missing from the document.

Right now problems are reported from each line as tokens are parsed. Is it possible to save them in a global variable instead of reporting right away? Then if the "end of document token" is reached, report the problems? I didn't see any of the existing rule doing it. Also, can errors from multiple locations be reported? In this example: if an anchor is duplicated 3 times, report the 2nd and 3rd occurrences as problem. Not sure if this is valid:

def check(conf, token, prev, next, nextnext, context):
    # Skipping codes here to keep this snippet short
    yield LintProblem(....)
    yield LintProblem(....)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hope that makes sense.

Yes it does, thanks, I see what you mean now 🙏
What I like in the sentence anchor "foo-bar" is used before assignment is that it's simpler (it counts) and isn't incompatible with "this anchor will actually never be assigned later in the file".
I see this sentence as a good trade-off between simplicity and understandability (compared to anchor "foo-bar" is used before assignment/defined or does not exist)

Right now problems are reported from each line as tokens are parsed. Is it possible to save them in a global variable instead of reporting right away? Then if the "end of document token" is reached, report the problems? I didn't see any of the existing rule doing it.

That looks like a solution, but yamllint is designed to output errors right away, which allows:
- having them sorted naturally,
- being performant and avoid consuming too much RAM,
- not losing past errors if the script crashes at some point.
Changing this would need a major refactor, and I'm not sure this potentially adaptative error message (... used before assignment / ... never assigned) is worth it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what you're saying and it makes sense. I was just asking because I wanted to look at the forbid-unused-anchors option. Without checking the entire file, not sure how to verify that an anchor isn't being used. Let me know if you have any suggestions.

Maybe we should discuss it in the feature request issue.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Owner

@adrienverge adrienverge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@newrushbolt not everything seems to be OK, it looks like the Python linter reports a dozen errors. Can you check it? You can run it locally on your machine with python -m flake8 ..
Apart from that, I haven't re-reviewed the PR thoroughly yet since last year, but if all my remarks were handled it should be good to go.

EDIT: I see you did not apply my request for token.value in context['anchors'], replacing it with context['anchors'].count(token.value), which creates a bug if there are ≥ 3 anchors. Maybe there are other problems. Please check all my remarks and follow contributing guidelines so it doesn't waste your time and mine too much. Thanks!

In #395, a third option for this rule was also discussed (forbid-unused-anchors). It's similar to the duplicated anchors option. Can that option be added into this PR. I feel that will make this whole.

@amimas this development has been open for more than a year, I feel it would be better to just finish and merde it. Then, you can implement this new forbid-unused-anchors option if you'd like too.

Comment on lines +32 to +47
clients:
jack: &jack_client
billing_id: 1234
bill: &bill_client
billing_id: 5678
target_client: *jack_client

the following code snippet would **FAIL**:
::

clients:
jack: &jack_client
billing_id: 1234
bill: &jack_client
billing_id: 5678
target_client: *jack_client
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All other examples use 2-space indentation: let's be consistent!

if token.value not in context['anchors']:
yield LintProblem(
token.start_mark.line + 1, token.start_mark.column + 1,
f'anchor "{token.value}" is used before assignment')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yamllint parses files sequencially, so it does not know whether the anchor is defined later in the file or is totally missing from the document.

(Maybe I didn't get your point, in this case please give example of what you mean :))

Comment on lines +29 to +33
DUPLICATED_ANCHOR = '''---
key1: &keyanchor a
key2: &keyanchor b
otherkey: *keyanchor
'''
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a test for anchors duplicated 3 times. Latest version of the PR (commit 542e05f) seems to have this bug!

@amimas
Copy link
Contributor

amimas commented Feb 9, 2023

@newrushbolt - are you able to look into the remaining feedbacks?

@adrienverge
Copy link
Owner

@newrushbolt we need feedback from you, because @amimas is implementing a new option for your new rule in #537.

I know that yamllint requires high-quality contributions, so it's OK if you don't want to spend time on it. Someone else (maybe me) can implement this new rule in a new pull request. Please tell us what you prefer.

adrienverge added a commit that referenced this pull request Mar 19, 2023
According to the YAML specification [^1]:

- > It is an error for an alias node to use an anchor that does not
  > previously occur in the document.

  The `forbid-undeclared-aliases` option checks that aliases do have a
  matching anchor declared previously in the document. Since this is
  required by the YAML spec, this option is enabled by default.

- > The alias refers to the most recent preceding node having the same
  > anchor.

  This means that having a same anchor repeated in a document is
  allowed. However users could want to avoid this, so the new option
  `forbid-duplicated-anchors` allows that. It's disabled by default.

- > It is not an error to specify an anchor that is not used by any
  > alias node.

  This means that it's OK to declare anchors but don't have any alias
  referencing them. However users could want to avoid this, so a new
  option (e.g. `forbid-unused-anchors`) could be implemented in the
  future. See #537.

Fixes #395
Closes #420

[^1]: https://yaml.org/spec/1.2.2/#71-alias-nodes
@adrienverge
Copy link
Owner

Due to lack of activity here, and the need of @amimas in #537, I've implemented this new rule in #550.

The code is simpler, should be more performant (use of set() and save anchors only when needed), the implementation more robust (it handles anchors duplicated 2 or 3+ times), and the doc now includes the default values. I've also renamed forbid-unknown-aliases to forbid-undeclared-aliases.

@newrushbolt @amimas could you test it and report whether it works for you?

adrienverge added a commit that referenced this pull request Mar 22, 2023
According to the YAML specification [^1]:

- > It is an error for an alias node to use an anchor that does not
  > previously occur in the document.

  The `forbid-undeclared-aliases` option checks that aliases do have a
  matching anchor declared previously in the document. Since this is
  required by the YAML spec, this option is enabled by default.

- > The alias refers to the most recent preceding node having the same
  > anchor.

  This means that having a same anchor repeated in a document is
  allowed. However users could want to avoid this, so the new option
  `forbid-duplicated-anchors` allows that. It's disabled by default.

- > It is not an error to specify an anchor that is not used by any
  > alias node.

  This means that it's OK to declare anchors but don't have any alias
  referencing them. However users could want to avoid this, so a new
  option (e.g. `forbid-unused-anchors`) could be implemented in the
  future. See #537.

Fixes #395
Closes #420

[^1]: https://yaml.org/spec/1.2.2/#71-alias-nodes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants