Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

anchors: Add new rule to detect undeclared or duplicated anchors #550

Merged
merged 1 commit into from Mar 22, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
5 changes: 5 additions & 0 deletions docs/rules.rst
Expand Up @@ -14,6 +14,11 @@ This page describes the rules and their options.
:local:
:depth: 1

anchors
-------

.. automodule:: yamllint.rules.anchors

braces
------

Expand Down
209 changes: 209 additions & 0 deletions tests/rules/test_anchors.py
@@ -0,0 +1,209 @@
# Copyright (C) 2023 Adrien Vergé
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

from tests.common import RuleTestCase


class AnchorsTestCase(RuleTestCase):
rule_id = 'anchors'

def test_disabled(self):
conf = 'anchors: disable'
self.check('---\n'
'- &b true\n'
'- &i 42\n'
'- &s hello\n'
'- &f_m {k: v}\n'
'- &f_s [1, 2]\n'
'- *b\n'
'- *i\n'
'- *s\n'
'- *f_m\n'
'- *f_s\n'
'---\n' # redeclare anchors in a new document
'- &b true\n'
'- &i 42\n'
'- &s hello\n'
'- *b\n'
'- *i\n'
'- *s\n'
'---\n'
'block mapping: &b_m\n'
' key: value\n'
'extended:\n'
' <<: *b_m\n'
' foo: bar\n'
'---\n'
'{a: 1, &x b: 2, c: &y 3, *x: 4, e: *y}\n'
'...\n', conf)
Comment on lines +24 to +50
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the yaml sample to be hard to read, mostly because of single-quotes and new-line character (\n) on every line. Some samples are are JSON instead of yaml-way. It also makes the self.check(...) a bit hard to read in my opinion, but it's probably a personal preference.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand. This style is common to all other rules tests, so I guess it's more a general question. In my case, I got used to this notation and can read it easily.

self.check('---\n'
'- &i 42\n'
'---\n'
'- &b true\n'
'- &b true\n'
'- &b true\n'
'- &s hello\n'
'- *b\n'
'- *i\n' # declared in a previous document
'- *f_m\n' # never declared
'- *f_m\n'
'- *f_m\n'
'- *f_s\n' # declared after
'- &f_s [1, 2]\n'
'---\n'
'block mapping: &b_m\n'
' key: value\n'
'---\n'
'block mapping 1: &b_m_bis\n'
' key: value\n'
'block mapping 2: &b_m_bis\n'
' key: value\n'
'extended:\n'
' <<: *b_m\n'
' foo: bar\n'
'---\n'
'{a: 1, &x b: 2, c: &x 3, *x: 4, e: *y}\n'
'...\n', conf)

def test_forbid_undeclared_aliases(self):
conf = ('anchors:\n'
' forbid-undeclared-aliases: true\n'
' forbid-duplicated-anchors: false\n')
self.check('---\n'
'- &b true\n'
'- &i 42\n'
'- &s hello\n'
'- &f_m {k: v}\n'
'- &f_s [1, 2]\n'
'- *b\n'
'- *i\n'
'- *s\n'
'- *f_m\n'
'- *f_s\n'
'---\n' # redeclare anchors in a new document
'- &b true\n'
'- &i 42\n'
'- &s hello\n'
'- *b\n'
'- *i\n'
'- *s\n'
'---\n'
'block mapping: &b_m\n'
' key: value\n'
'extended:\n'
' <<: *b_m\n'
' foo: bar\n'
'---\n'
'{a: 1, &x b: 2, c: &y 3, *x: 4, e: *y}\n'
'...\n', conf)
Comment on lines +108 to +110
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must be missing something. Is it a valid yaml? I think *x: 4 is invalid, is it? I tried the yaml-to-json converting tool and also this online yamllint. Both tools are unable to process it because of *x: 4.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is valid YAML 🙂
Anchor x equals string "b", so *x: 4 means b: 4.

You can try by yourself:

$ echo '{a: 1, &x b: 2, c: &y 3, *x: 4, e: *y}' | python -c 'import sys, yaml, json; json.dump(yaml.safe_load(sys.stdin), sys.stdout, indent=4)'
{
    "a": 1,
    "b": 4,
    "c": 3,
    "e": 3
}

PS: Don't trust online YAML linters...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh... Got it. Thanks for explaining. I have not used anchors that way yet. Learned something new today.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*x: something
is not valid YAML.
It should be
*x : something.
It is required to put a space after the alias.
It's just that libyaml and PyYAML didn't implement this correctly.
The reason is that a colon can be part of an anchor name, but libyaml and PyYAML don't allow this, so they also don't require a space between the alias and the colon.

To be compatible with all YAML processors, add a space.
A linter where the underlying parser does not error on this should probably error or warn on this itself.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@perlpunk thanks for clarifying this 👍
I just added the missing spaces + relaxed the colons rule in #558.

self.check('---\n'
'- &i 42\n'
'---\n'
'- &b true\n'
'- &b true\n'
'- &b true\n'
'- &s hello\n'
'- *b\n'
'- *i\n' # declared in a previous document
'- *f_m\n' # never declared
'- *f_m\n'
'- *f_m\n'
'- *f_s\n' # declared after
'- &f_s [1, 2]\n'
'---\n'
'block mapping: &b_m\n'
' key: value\n'
'---\n'
'block mapping 1: &b_m_bis\n'
' key: value\n'
'block mapping 2: &b_m_bis\n'
' key: value\n'
'extended:\n'
' <<: *b_m\n'
' foo: bar\n'
'---\n'
'{a: 1, &x b: 2, c: &x 3, *x: 4, e: *y}\n'
'...\n', conf,
problem1=(9, 3),
problem2=(10, 3),
problem3=(11, 3),
problem4=(12, 3),
problem5=(13, 3),
problem6=(24, 7),
problem7=(27, 36))
Comment on lines +144 to +145
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the referenced problem lines be commented on for consistency like the other problems?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, but I don't feel the need for that: I only put comments on problem lines that are tricky to understand/maintain. If you look at other tests files, comments are rare, and placed only if needed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good.


def test_forbid_duplicated_anchors(self):
conf = ('anchors:\n'
' forbid-undeclared-aliases: false\n'
' forbid-duplicated-anchors: true\n')
self.check('---\n'
'- &b true\n'
'- &i 42\n'
'- &s hello\n'
'- &f_m {k: v}\n'
'- &f_s [1, 2]\n'
'- *b\n'
'- *i\n'
'- *s\n'
'- *f_m\n'
'- *f_s\n'
'---\n' # redeclare anchors in a new document
'- &b true\n'
'- &i 42\n'
'- &s hello\n'
'- *b\n'
'- *i\n'
'- *s\n'
'---\n'
'block mapping: &b_m\n'
' key: value\n'
'extended:\n'
' <<: *b_m\n'
' foo: bar\n'
'---\n'
'{a: 1, &x b: 2, c: &y 3, *x: 4, e: *y}\n'
'...\n', conf)
self.check('---\n'
'- &i 42\n'
'---\n'
'- &b true\n'
'- &b true\n'
'- &b true\n'
'- &s hello\n'
'- *b\n'
'- *i\n' # declared in a previous document
'- *f_m\n' # never declared
'- *f_m\n'
'- *f_m\n'
'- *f_s\n' # declared after
'- &f_s [1, 2]\n'
'---\n'
'block mapping: &b_m\n'
' key: value\n'
'---\n'
'block mapping 1: &b_m_bis\n'
' key: value\n'
'block mapping 2: &b_m_bis\n'
' key: value\n'
'extended:\n'
' <<: *b_m\n'
' foo: bar\n'
'---\n'
'{a: 1, &x b: 2, c: &x 3, *x: 4, e: *y}\n'
'...\n', conf,
problem1=(5, 3),
problem2=(6, 3),
problem3=(21, 18),
problem4=(27, 20))
1 change: 1 addition & 0 deletions yamllint/conf/default.yaml
Expand Up @@ -6,6 +6,7 @@ yaml-files:
- '.yamllint'

rules:
anchors: enable
braces: enable
brackets: enable
colons: enable
Expand Down
2 changes: 2 additions & 0 deletions yamllint/rules/__init__.py
Expand Up @@ -14,6 +14,7 @@
# along with this program. If not, see <http://www.gnu.org/licenses/>.

from yamllint.rules import (
anchors,
braces,
brackets,
colons,
Expand All @@ -39,6 +40,7 @@
)

_RULES = {
anchors.ID: anchors,
braces.ID: braces,
brackets.ID: brackets,
colons.ID: colons,
Expand Down
118 changes: 118 additions & 0 deletions yamllint/rules/anchors.py
@@ -0,0 +1,118 @@
# Copyright (C) 2023 Adrien Vergé
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

"""
Use this rule to report duplicated anchors and aliases referencing undeclared
anchors.

.. rubric:: Options

* Set ``forbid-undeclared-aliases`` to ``true`` to avoid aliases that reference
an anchor that hasn't been declared (either not declared at all, or declared
later in the document).
* Set ``forbid-duplicated-anchors`` to ``true`` to avoid duplications of a same
anchor.

.. rubric:: Default values (when enabled)

.. code-block:: yaml

rules:
anchors:
forbid-undeclared-aliases: true
forbid-duplicated-anchors: false

.. rubric:: Examples

#. With ``anchors: {forbid-undeclared-aliases: true}``

the following code snippet would **PASS**:
::

---
- &anchor
foo: bar
- *anchor

the following code snippet would **FAIL**:
::

---
- &anchor
foo: bar
- *unknown

the following code snippet would **FAIL**:
::

---
- &anchor
foo: bar
- <<: *unknown
extra: value

#. With ``anchors: {forbid-duplicated-anchors: true}``

the following code snippet would **PASS**:
::

---
- &anchor1 Foo Bar
- &anchor2 [item 1, item 2]

the following code snippet would **FAIL**:
::

---
- &anchor Foo Bar
- &anchor [item 1, item 2]
"""


import yaml

from yamllint.linter import LintProblem


ID = 'anchors'
TYPE = 'token'
CONF = {'forbid-undeclared-aliases': bool,
'forbid-duplicated-anchors': bool}
DEFAULT = {'forbid-undeclared-aliases': True,
'forbid-duplicated-anchors': False}


def check(conf, token, prev, next, nextnext, context):
if conf['forbid-undeclared-aliases'] or conf['forbid-duplicated-anchors']:
if isinstance(token, (yaml.StreamStartToken, yaml.DocumentStartToken)):
context['anchors'] = set()

if (conf['forbid-undeclared-aliases'] and
isinstance(token, yaml.AliasToken) and
token.value not in context['anchors']):
yield LintProblem(
token.start_mark.line + 1, token.start_mark.column + 1,
f'found undeclared alias "{token.value}"')

if (conf['forbid-duplicated-anchors'] and
isinstance(token, yaml.AnchorToken) and
token.value in context['anchors']):
yield LintProblem(
token.start_mark.line + 1, token.start_mark.column + 1,
f'found duplicated anchor "{token.value}"')

if conf['forbid-undeclared-aliases'] or conf['forbid-duplicated-anchors']:
if isinstance(token, yaml.AnchorToken):
context['anchors'].add(token.value)