Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

anchors: Add new rule to detect undeclared or duplicated anchors #550

Merged
merged 1 commit into from Mar 22, 2023

Conversation

adrienverge
Copy link
Owner

According to the YAML specification 1:

  • It is an error for an alias node to use an anchor that does not
    previously occur in the document.

    The forbid-undeclared-aliases option checks that aliases do have a matching anchor declared previously in the document. Since this is required by the YAML spec, this option is enabled by default.

  • The alias refers to the most recent preceding node having the same
    anchor.

    This means that having a same anchor repeated in a document is allowed. However users could want to avoid this, so the new option forbid-duplicated-anchors allows that. It's disabled by default.

  • It is not an error to specify an anchor that is not used by any
    alias node.

    This means that it's OK to declare anchors but don't have any alias referencing them. However users could want to avoid this, so a new option (e.g. forbid-unused-anchors) could be implemented in the future. See feat: add rule for unused anchors #537.

Fixes #395
Closes #420

Footnotes

  1. https://yaml.org/spec/1.2.2/#71-alias-nodes

According to the YAML specification [^1]:

- > It is an error for an alias node to use an anchor that does not
  > previously occur in the document.

  The `forbid-undeclared-aliases` option checks that aliases do have a
  matching anchor declared previously in the document. Since this is
  required by the YAML spec, this option is enabled by default.

- > The alias refers to the most recent preceding node having the same
  > anchor.

  This means that having a same anchor repeated in a document is
  allowed. However users could want to avoid this, so the new option
  `forbid-duplicated-anchors` allows that. It's disabled by default.

- > It is not an error to specify an anchor that is not used by any
  > alias node.

  This means that it's OK to declare anchors but don't have any alias
  referencing them. However users could want to avoid this, so a new
  option (e.g. `forbid-unused-anchors`) could be implemented in the
  future. See #537.

Fixes #395
Closes #420

[^1]: https://yaml.org/spec/1.2.2/#71-alias-nodes
@coveralls
Copy link

Coverage Status

Coverage: 99.196% (+0.01%) from 99.184% when pulling 9e67d02 on feat/rule-anchors into 15f8204 on master.

Copy link
Contributor

@amimas amimas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had a chance to try it out with actual yaml files. I think it looks good so far though. The unit test currently tests following scenarios:

  • rule disabled
  • rule enabled with option forbid-undeclared-aliases but disabled option forbid-duplicated-anchors
  • rule enabled with option forbid-duplicated-anchors but disabled option forbid-undeclared-aliases

Should there be another test with both options enabled? Just to cover all the scenarios.

Also, like the yaml samples; they cover all the different types of contents for each tests I think.

Comment on lines +24 to +50
self.check('---\n'
'- &b true\n'
'- &i 42\n'
'- &s hello\n'
'- &f_m {k: v}\n'
'- &f_s [1, 2]\n'
'- *b\n'
'- *i\n'
'- *s\n'
'- *f_m\n'
'- *f_s\n'
'---\n' # redeclare anchors in a new document
'- &b true\n'
'- &i 42\n'
'- &s hello\n'
'- *b\n'
'- *i\n'
'- *s\n'
'---\n'
'block mapping: &b_m\n'
' key: value\n'
'extended:\n'
' <<: *b_m\n'
' foo: bar\n'
'---\n'
'{a: 1, &x b: 2, c: &y 3, *x: 4, e: *y}\n'
'...\n', conf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the yaml sample to be hard to read, mostly because of single-quotes and new-line character (\n) on every line. Some samples are are JSON instead of yaml-way. It also makes the self.check(...) a bit hard to read in my opinion, but it's probably a personal preference.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. This style is common to all other rules tests, so I guess it's more a general question. In my case, I got used to this notation and can read it easily.

Comment on lines +108 to +110
'---\n'
'{a: 1, &x b: 2, c: &y 3, *x: 4, e: *y}\n'
'...\n', conf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must be missing something. Is it a valid yaml? I think *x: 4 is invalid, is it? I tried the yaml-to-json converting tool and also this online yamllint. Both tools are unable to process it because of *x: 4.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is valid YAML 🙂
Anchor x equals string "b", so *x: 4 means b: 4.

You can try by yourself:

$ echo '{a: 1, &x b: 2, c: &y 3, *x: 4, e: *y}' | python -c 'import sys, yaml, json; json.dump(yaml.safe_load(sys.stdin), sys.stdout, indent=4)'
{
    "a": 1,
    "b": 4,
    "c": 3,
    "e": 3
}

PS: Don't trust online YAML linters...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh... Got it. Thanks for explaining. I have not used anchors that way yet. Learned something new today.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*x: something
is not valid YAML.
It should be
*x : something.
It is required to put a space after the alias.
It's just that libyaml and PyYAML didn't implement this correctly.
The reason is that a colon can be part of an anchor name, but libyaml and PyYAML don't allow this, so they also don't require a space between the alias and the colon.

To be compatible with all YAML processors, add a space.
A linter where the underlying parser does not error on this should probably error or warn on this itself.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@perlpunk thanks for clarifying this 👍
I just added the missing spaces + relaxed the colons rule in #558.

Comment on lines +144 to +145
problem6=(24, 7),
problem7=(27, 36))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the referenced problem lines be commented on for consistency like the other problems?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, but I don't feel the need for that: I only put comments on problem lines that are tricky to understand/maintain. If you look at other tests files, comments are rare, and placed only if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.

Copy link
Owner Author

@adrienverge adrienverge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your feedback @amimas 👍
I would be best if you could try it, because once merged and released, I won't be able spend much more time on it.

Should there be another test with both options enabled? Just to cover all the scenarios.

I asked myself the same question while writing them, but concluded no, otherwise it would create a lot of repetition especially when you add your third option forbid-unused-anchors. And each option should work individually so I think we're OK. What do you think?

Comment on lines +24 to +50
self.check('---\n'
'- &b true\n'
'- &i 42\n'
'- &s hello\n'
'- &f_m {k: v}\n'
'- &f_s [1, 2]\n'
'- *b\n'
'- *i\n'
'- *s\n'
'- *f_m\n'
'- *f_s\n'
'---\n' # redeclare anchors in a new document
'- &b true\n'
'- &i 42\n'
'- &s hello\n'
'- *b\n'
'- *i\n'
'- *s\n'
'---\n'
'block mapping: &b_m\n'
' key: value\n'
'extended:\n'
' <<: *b_m\n'
' foo: bar\n'
'---\n'
'{a: 1, &x b: 2, c: &y 3, *x: 4, e: *y}\n'
'...\n', conf)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. This style is common to all other rules tests, so I guess it's more a general question. In my case, I got used to this notation and can read it easily.

Comment on lines +108 to +110
'---\n'
'{a: 1, &x b: 2, c: &y 3, *x: 4, e: *y}\n'
'...\n', conf)
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is valid YAML 🙂
Anchor x equals string "b", so *x: 4 means b: 4.

You can try by yourself:

$ echo '{a: 1, &x b: 2, c: &y 3, *x: 4, e: *y}' | python -c 'import sys, yaml, json; json.dump(yaml.safe_load(sys.stdin), sys.stdout, indent=4)'
{
    "a": 1,
    "b": 4,
    "c": 3,
    "e": 3
}

PS: Don't trust online YAML linters...

Comment on lines +144 to +145
problem6=(24, 7),
problem7=(27, 36))
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, but I don't feel the need for that: I only put comments on problem lines that are tricky to understand/maintain. If you look at other tests files, comments are rare, and placed only if needed.

@adrienverge
Copy link
Owner Author

I'll merge this now so it can go in the same release as #548 (comment).

@adrienverge adrienverge merged commit ebd6b90 into master Mar 22, 2023
12 checks passed
@adrienverge adrienverge deleted the feat/rule-anchors branch March 22, 2023 09:34
@amimas
Copy link
Contributor

amimas commented Apr 1, 2023

Thanks for merging it. Apologies for the delay here. Have been busy lately and didn't get a chance to look into it further.

adrienverge added a commit that referenced this pull request Apr 4, 2023
Although accepted by PyYAML, `{*x: 4}` is not valid YAML: it should be
noted `{*x : 4}`. The reason is that a colon can be part of an anchor
name. See this comment from Tina Müller for more details:
#550 (comment)

Even if it's not a problem for yamllint, let's fix our tests to include
valid YAML snippets.
adrienverge added a commit that referenced this pull request Apr 4, 2023
Although accepted by PyYAML, `{*x: 4}` is not valid YAML: it should be
noted `{*x : 4}`. The reason is that a colon can be part of an anchor
name. See this comment from Tina Müller for more details:
#550 (comment)

Even if it's not a problem for yamllint, let's fix our tests to include
valid YAML snippets.
adrienverge added a commit that referenced this pull request Apr 4, 2023
Although accepted by PyYAML, `{*x: 4}` is not valid YAML: it should be
noted `{*x : 4}`. The reason is that a colon can be part of an anchor
name. See this comment from Tina Müller for more details:
#550 (comment)

Even if it's not a problem for yamllint, let's fix our tests to include
valid YAML snippets.
adrienverge added a commit that referenced this pull request Apr 4, 2023
In the rare case when the key before `:` is an alias (e.g. `{*x : 4}`),
the space before `:` is required (although this requirement is not
enforced by PyYAML), the reason being that a colon can be part of an
anchor name. Consequently, this commit adapts the `colons` rule to avoid
failures when this happens.

See this comment from Tina Müller for more details:
#550 (comment)
adrienverge added a commit that referenced this pull request Apr 5, 2023
Although accepted by PyYAML, `{*x: 4}` is not valid YAML: it should be
noted `{*x : 4}`. The reason is that a colon can be part of an anchor
name. See this comment from Tina Müller for more details:
#550 (comment)

Even if it's not a problem for yamllint, let's fix our tests to include
valid YAML snippets.
adrienverge added a commit that referenced this pull request Apr 5, 2023
In the rare case when the key before `:` is an alias (e.g. `{*x : 4}`),
the space before `:` is required (although this requirement is not
enforced by PyYAML), the reason being that a colon can be part of an
anchor name. Consequently, this commit adapts the `colons` rule to avoid
failures when this happens.

See this comment from Tina Müller for more details:
#550 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

yamllint allows referencing a non-existant anchor
4 participants