-
Notifications
You must be signed in to change notification settings - Fork 265
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add rule for unused anchors #537
feat: add rule for unused anchors #537
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @amimas, thanks for contributing!
Please don't commit any editor-specific conf (remove the first commit) and more generally anything that is not related to this new option. Also squash everything into one single commit, with a meaningful commit message. You need to run linters before pushing code (I saw a lot of style errors), also you can have a look at the CONTRIBUTING file for hints.
Then I'll test and review your pull request 👍
tests/rules/test_anchors.py
Outdated
|
||
from tests.common import RuleTestCase | ||
|
||
NORMAL_ANCHOR = '''--- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't use these NORMAL_ANCHOR
, MISS_ANCHOR_POINTER
(I guess you've tried to reproduce the other PR), just inline the YAML code examples inside tests. All yamllint tests do this, the goal is to have tested YAML code close to the list of expected errors. Easier to read, easier to debug!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I was following the other PR (#420 ). I agree having the yaml codes close to the expected error will probably be easier to read/debug.
Thanks for taking a peak at this PR. I will make sure to remove that commit next time I work on this. I think this PR should be worked on after #420 . There will be a lot of conflict. I will wait for that to be finished. Hopefully it'll be soon. That's also why I didn't run the linters; I know there's more work remaining.
Are you not able to squash merge when the PR is accepted or ready to merge? Do you prefer a single squashed commit for every push/update to the PR? |
According to the YAML specification [^1]: - > It is an error for an alias node to use an anchor that does not > previously occur in the document. The `forbid-undeclared-aliases` option checks that aliases do have a matching anchor declared previously in the document. Since this is required by the YAML spec, this option is enabled by default. - > The alias refers to the most recent preceding node having the same > anchor. This means that having a same anchor repeated in a document is allowed. However users could want to avoid this, so the new option `forbid-duplicated-anchors` allows that. It's disabled by default. - > It is not an error to specify an anchor that is not used by any > alias node. This means that it's OK to declare anchors but don't have any alias referencing them. However users could want to avoid this, so a new option (e.g. `forbid-unused-anchors`) could be implemented in the future. See #537. Fixes #395 Closes #420 [^1]: https://yaml.org/spec/1.2.2/#71-alias-nodes
#420 was open 2 years ago and I'm not sure it will be finished soon. For this reason, I just implemented the new rule myself (see #550) so you can rebase on it and work on your new option more comfortably. In the test cases I wrote, I added some examples of unused anchors, so you could simply reuse the same YAML snippets for your new tests.
Yes, please 🙏, I prefer one single commit ready to merge, as per the contribution guidelines. |
According to the YAML specification [^1]: - > It is an error for an alias node to use an anchor that does not > previously occur in the document. The `forbid-undeclared-aliases` option checks that aliases do have a matching anchor declared previously in the document. Since this is required by the YAML spec, this option is enabled by default. - > The alias refers to the most recent preceding node having the same > anchor. This means that having a same anchor repeated in a document is allowed. However users could want to avoid this, so the new option `forbid-duplicated-anchors` allows that. It's disabled by default. - > It is not an error to specify an anchor that is not used by any > alias node. This means that it's OK to declare anchors but don't have any alias referencing them. However users could want to avoid this, so a new option (e.g. `forbid-unused-anchors`) could be implemented in the future. See #537. Fixes #395 Closes #420 [^1]: https://yaml.org/spec/1.2.2/#71-alias-nodes
Hi @adrienverge - Got a little time and thought I'd take a look at this PR again, now that you've implemented couple of other options for the I see the latest code is using Do you have any suggestions/reference on how to implement |
Hi @amimas, I'm glad you liked #550! |
Thanks @adrienverge for the suggestion. This is what you suggested in #395 (comment) for the implementation:
I think that means reporting "unused anchors" doesn't need to refer to the exact line where the anchor is defined. If that's the case, then I believe the implementation is probably simple as per my earlier comment. But from usability/UX perspective, it'd be helpful if the linter could refer to the correct line. I'll look into it a bit more and see if correct lines of the anchors can be easily captured. Otherwise, that might be the next feature update. |
df299e3
to
721f4bd
Compare
@adrienverge - When you get a chance, please review. I managed to implement it so that unused anchor issues can report the correct line. I believe that will be helpful for end users. I didn't need a big refactor of existing implementation; just a minor change that doesn't use I didn't squash the commits yet in case more change is needed after your review. I try to keep my commits as organized/atomic as possible. You can do squash & merge if everything looks okay or I can push a final squashed commits if you insist. btw - should all the options in the rule be enabled in strict mode? Not sure yet where that is configured. |
721f4bd
to
32b02cc
Compare
Hi @adrienverge 👋 . what do you think about the latest update? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @amimas,
I didn't squash the commits yet
I have limited time and reviewing takes much, so I do insist, please use a ready-to-merge, unified commit message on every push, with a good description that is helpful for future git show
usage (what the rule does and why, an example of error report, etc.) Check out git log
on this project to see what is expected.
I managed to implement it so that unused anchor issues can report the correct line. I believe that will be helpful for end users.
This is smart!
But, I suspect it could cause problems to some users because the reported problems can now be interleaved / unordered, which has never been the case with yamllint. For example:
---
- &a unused
- k: v
will report errors in the wrong order:
3:5: too many spaces after hyphen (hyphens)
3:12: too many spaces after colon (colons)
3:14: trailing spaces (trailing-spaces)
2:3: found unused anchor a (anchors)
I propose to keep your implementation (which is cool), but be ready to change it if there are legitimate complaints (the other implementation being to print the line+column in the message string instead).
What do you think?
just a minor change that doesn't use
set
.
Using an array (list
) is particularly inefficient and could cause performance issues: if you can't use a set
, please use a mapping (dict
): context['anchors'] = {}
with the anchor name as the key.
It will be much faster to find/access an element within it.
Also, it will reduce the diff of this PR because you can keep using if token.value in context['anchors']
.
btw - should all the options in the rule be enabled in strict mode? Not sure yet where that is configured.
I'm not sure what you mean: are you refering to strict mode (returning a non-zero exit code upon warnings) or the default
set of rules? The latter is configured in the DEFAULT
variable. In my opinion (see #550 (comment)): yes for forbid-undeclared-aliases
, no for the others.
About tests cases: in #550 I tried to create one unified and reusable YAML snipped that would contain all types of errors, so it could be used for forbid-undeclared-aliases
, forbid-duplicated-anchors
and forbid-unused-anchors
. I see that you modified it, so the last one is different from the 2 first ones: removed parts, addition of b1
, i1
, etc. that are not anchors nor aliases... Unless there's a good reason, let's keep a unified YAML snippet. For info, this initial snippet would report:
2:3: found unused anchor i (anchors)
7:3: found unused anchor s (anchors)
14:3: found unused anchor f_s (anchors)
16:16: found unused anchor b_m (anchors)
19:18: found unused anchor b_m_bis (anchors)
Also it looks like the added tests don't reflect the recent commit e90e0a0, please check that.
yamllint/rules/anchors.py
Outdated
@@ -24,6 +24,8 @@ | |||
later in the document). | |||
* Set ``forbid-duplicated-anchors`` to ``true`` to avoid duplications of a same | |||
anchor. | |||
* Set ``forbid-unused-anchors`` to ``true`` to avoid anchors being declared but | |||
not used anywhere via alias. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest not used anywhere
→ not used anywhere in the YAML document
, so it's clear to users that:
---
key: &anchor "value"
---
use it in a new document: *anchor
would report an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Updated the document as suggested.
yamllint/rules/anchors.py
Outdated
"line": token.start_mark.line + 1, | ||
"column": token.start_mark.column + 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency with other LintProblem
s and to avoid confusion/bugs, I would prefer keeping line → line
and column → column
, and add the +1
when using LintProblem()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Updated to follow the project convention.
yamllint/rules/anchors.py
Outdated
"value": token.value, | ||
"line": token.start_mark.line + 1, | ||
"column": token.start_mark.column + 1, | ||
"aliasCount": 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In Python, snake_case is preferred 🙂
"aliasCount": 0 | |
"alias_count": 0, |
Since we don't need to know the exact number of usage, what about something simpler and more performant?
"aliasCount": 0 | |
"used": False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. Updated as suggested.
yamllint/rules/anchors.py
Outdated
# is an alias, increment the count of alias for associated anchor. | ||
# This can be used to check when end of document is reached and | ||
# evaluate if an anchor is unused. | ||
elif (isinstance(token, yaml.AliasToken)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove extra parentheses here, not needed in Python if the elif
fits one single line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true. Updated the code.
yamllint/rules/anchors.py
Outdated
if (isinstance(next, (yaml.StreamEndToken, | ||
yaml.DocumentStartToken, | ||
yaml.DocumentEndToken)) and | ||
len(context['anchors']) > 0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need this line, the for
below will have the same effect if the list is empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh... good catch! Thanks for the suggestion. Updated the code.
yamllint/rules/anchors.py
Outdated
for anchor in context['anchors']: | ||
if anchor['value'] == token.value: | ||
anchor['aliasCount'] += 1 | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With my proposal to use a dict
, you can simply:
for anchor in context['anchors']: | |
if anchor['value'] == token.value: | |
anchor['aliasCount'] += 1 | |
break | |
context['anchors'].get(token.value, {})['used'] = True |
(and remove the whole comment above, because this code is self-explaining)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Using dict
wasn't as difficult as I had initially thought. Updated the code.
32b02cc
to
c6800b6
Compare
@adrienverge - Thanks a lot for taking the time to review and all the feedbacks and suggestions. I believe I've addressed all the concerns and adopted your suggestions. When you get a chance, please review one more time.
Sure. I've squashed all my commits and followed the style of previous commit message used for implementing other options of the anchors rule. Hope it is following your expectation now.
Sure. Even in that case the error message for
Thank you for the suggestion and some examples. I managed to get it working with
I was referring to the "strict mode", but I mistakenly thought that mode enables every option. I thought it's an additional "configuration" similar to the default and
I hadn't realized that. Thanks for setting up the ground work. I've updated the test case for the new option based on your setup.
This was a newer change that was merged after I submitted the last change. I've pulled in the latest change now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed PR feedbacks and suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @amimas, thanks for this update! The commit message is great and the code looks good. I still have comments (mostly style and optimizations, nothing blocking) so the next one should be the merge one.
It might look odd to have line and column numbers reported in a different format than the rest. That might not be good reporting.
Yes, I agree. My main fear is that some scripts somewhere on the internet rely on yamllint's output ordering, and would break with this change. We'll see!
yamllint/rules/anchors.py
Outdated
yaml.StreamStartToken, | ||
yaml.DocumentStartToken, | ||
yaml.DocumentEndToken)): | ||
context['anchors'] = dict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's more usual to use the simple form {}
rather than dict()
(both in this project and in the Python world). Can you update this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. Updated.
yamllint/rules/anchors.py
Outdated
@@ -113,6 +143,36 @@ def check(conf, token, prev, next, nextnext, context): | |||
token.start_mark.line + 1, token.start_mark.column + 1, | |||
f'found duplicated anchor "{token.value}"') | |||
|
|||
if conf['forbid-undeclared-aliases'] or conf['forbid-duplicated-anchors']: | |||
if (conf['forbid-unused-anchors']): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too, you can remove parentheses (and same for line 155).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh... Sorry I missed it in the last update. They're fixed now in latest commit.
yamllint/rules/anchors.py
Outdated
if (isinstance(next, (yaml.StreamEndToken, | ||
yaml.DocumentStartToken, | ||
yaml.DocumentEndToken))): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For style consistency, I propose to drop this empty line.
yamllint/rules/anchors.py
Outdated
for anchor in context['anchors']: | ||
if context['anchors'][anchor]['used'] is False: | ||
yield LintProblem( | ||
context['anchors'][anchor]['line'] + 1, | ||
context['anchors'][anchor]['column'] + 1, | ||
f"found unused anchor {anchor}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than iterating on keys, you could iterate on keys + values to simplify the code:
for anchor in context['anchors']: | |
if context['anchors'][anchor]['used'] is False: | |
yield LintProblem( | |
context['anchors'][anchor]['line'] + 1, | |
context['anchors'][anchor]['column'] + 1, | |
f"found unused anchor {anchor}" | |
) | |
for anchor, info in context['anchors'].items(): | |
if not info['used']: | |
yield LintProblem(info['line'] + 1, | |
info['column'] + 1, | |
f"found unused anchor {anchor}") |
(and change is False
to not
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh... I was trying to do something like this initially but wasn't sure how. Still learning Python 😊 . Thanks for the suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙂
yamllint/rules/anchors.py
Outdated
anchor_details = { | ||
"line": token.start_mark.line, | ||
"column": token.start_mark.column, | ||
"used": False | ||
} | ||
context['anchors'][token.value] = anchor_details |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anchor_details = { | |
"line": token.start_mark.line, | |
"column": token.start_mark.column, | |
"used": False | |
} | |
context['anchors'][token.value] = anchor_details | |
context['anchors'][token.value] = { | |
"line": token.start_mark.line, | |
"column": token.start_mark.column, | |
"used": False, | |
} |
According to the YAML specification [^1]: - > An anchored node need not be referenced by any alias nodes This means that it's OK to declare anchors but don't have any alias referencing them. However users could want to avoid this, so a new option (e.g. `forbid-unused-anchors`) is implemented in this change. It is disabled by default. [^1]: https://yaml.org/spec/1.2.2/#692-node-anchors
c6800b6
to
4095b8e
Compare
I see what you mean. Let's see! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great. Thanks for contributing to yamllint @amimas!
Ah, I noticed (too late) that the error message missed quotes around the anchor name (other messages do that):
I just opened #570 to fix this. |
Sorry I missed it. Thanks for taking care of it. |
@adrienverge - Just checking if this feature is available for release. Please let me know if I missed. Thanks. |
@amimas this is now released in version 1.32.0 👍 |
Thanks for the update and release. |
This change implements one of the options discussed in #395 . A new option for
anchors
rule namedforbid-unused-anchors
. If the option is enabled, yamllint will report about anchors that has been declared but is not used. The option is disabled by default as yaml specs doesn't require anchors to be used.Rest of the options for this rule is being implemented by #420.