Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add rule for unused anchors #537

Merged
merged 1 commit into from
May 10, 2023

Conversation

amimas
Copy link
Contributor

@amimas amimas commented Feb 12, 2023

This change implements one of the options discussed in #395 . A new option for anchors rule named forbid-unused-anchors. If the option is enabled, yamllint will report about anchors that has been declared but is not used. The option is disabled by default as yaml specs doesn't require anchors to be used.

Rest of the options for this rule is being implemented by #420.

@coveralls
Copy link

coveralls commented Feb 25, 2023

Coverage Status

Coverage: 99.386% (+0.002%) from 99.384% when pulling 4095b8e on amimas:issue-395-unused-anchors into 98f2281 on adrienverge:master.

Copy link
Owner

@adrienverge adrienverge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @amimas, thanks for contributing!

Please don't commit any editor-specific conf (remove the first commit) and more generally anything that is not related to this new option. Also squash everything into one single commit, with a meaningful commit message. You need to run linters before pushing code (I saw a lot of style errors), also you can have a look at the CONTRIBUTING file for hints.

Then I'll test and review your pull request 👍


from tests.common import RuleTestCase

NORMAL_ANCHOR = '''---
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use these NORMAL_ANCHOR, MISS_ANCHOR_POINTER (I guess you've tried to reproduce the other PR), just inline the YAML code examples inside tests. All yamllint tests do this, the goal is to have tested YAML code close to the list of expected errors. Easier to read, easier to debug!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I was following the other PR (#420 ). I agree having the yaml codes close to the expected error will probably be easier to read/debug.

@amimas
Copy link
Contributor Author

amimas commented Mar 19, 2023

Please don't commit any editor-specific conf (remove the first commit) and more generally anything that is not related to this new option.

You need to run linters before pushing code (I saw a lot of style errors), also you can have a look at the CONTRIBUTING file for hints.

Thanks for taking a peak at this PR. I will make sure to remove that commit next time I work on this. I think this PR should be worked on after #420 . There will be a lot of conflict. I will wait for that to be finished. Hopefully it'll be soon. That's also why I didn't run the linters; I know there's more work remaining.

Also squash everything into one single commit, with a meaningful commit message.

Are you not able to squash merge when the PR is accepted or ready to merge? Do you prefer a single squashed commit for every push/update to the PR?

adrienverge added a commit that referenced this pull request Mar 19, 2023
According to the YAML specification [^1]:

- > It is an error for an alias node to use an anchor that does not
  > previously occur in the document.

  The `forbid-undeclared-aliases` option checks that aliases do have a
  matching anchor declared previously in the document. Since this is
  required by the YAML spec, this option is enabled by default.

- > The alias refers to the most recent preceding node having the same
  > anchor.

  This means that having a same anchor repeated in a document is
  allowed. However users could want to avoid this, so the new option
  `forbid-duplicated-anchors` allows that. It's disabled by default.

- > It is not an error to specify an anchor that is not used by any
  > alias node.

  This means that it's OK to declare anchors but don't have any alias
  referencing them. However users could want to avoid this, so a new
  option (e.g. `forbid-unused-anchors`) could be implemented in the
  future. See #537.

Fixes #395
Closes #420

[^1]: https://yaml.org/spec/1.2.2/#71-alias-nodes
@adrienverge
Copy link
Owner

I think this PR should be worked on after #420 . There will be a lot of conflict. I will wait for that to be finished. Hopefully it'll be soon.

#420 was open 2 years ago and I'm not sure it will be finished soon. For this reason, I just implemented the new rule myself (see #550) so you can rebase on it and work on your new option more comfortably.

In the test cases I wrote, I added some examples of unused anchors, so you could simply reuse the same YAML snippets for your new tests.

Are you not able to squash merge when the PR is accepted or ready to merge? Do you prefer a single squashed commit for every push/update to the PR?

Yes, please 🙏, I prefer one single commit ready to merge, as per the contribution guidelines.

adrienverge added a commit that referenced this pull request Mar 22, 2023
According to the YAML specification [^1]:

- > It is an error for an alias node to use an anchor that does not
  > previously occur in the document.

  The `forbid-undeclared-aliases` option checks that aliases do have a
  matching anchor declared previously in the document. Since this is
  required by the YAML spec, this option is enabled by default.

- > The alias refers to the most recent preceding node having the same
  > anchor.

  This means that having a same anchor repeated in a document is
  allowed. However users could want to avoid this, so the new option
  `forbid-duplicated-anchors` allows that. It's disabled by default.

- > It is not an error to specify an anchor that is not used by any
  > alias node.

  This means that it's OK to declare anchors but don't have any alias
  referencing them. However users could want to avoid this, so a new
  option (e.g. `forbid-unused-anchors`) could be implemented in the
  future. See #537.

Fixes #395
Closes #420

[^1]: https://yaml.org/spec/1.2.2/#71-alias-nodes
@amimas
Copy link
Contributor Author

amimas commented Apr 3, 2023

Hi @adrienverge - Got a little time and thought I'd take a look at this PR again, now that you've implemented couple of other options for the anchors rule via #550 . I could use some advice as I'm new to python.

I see the latest code is using set for detecting forbid-undeclared-aliases and forbid-duplicated-anchors. As you may recall, forbid-unused-anchors requires going through entire document and only then we can evaluate whether any of the anchors are unused or not. I thought about creating a separate set for all the aliases. That way the difference between anchors and aliases sets will give us the anchors that are "unused". The problem is, how to report those anchors as problem with their line and column numbers. I believe I can't use a dict/list in a set (i.e. can't store additional info about anchors in the set). And if I go with other data types, then existing options will need to be refactored.

Do you have any suggestions/reference on how to implement forbid-unused-anchors option?

@adrienverge
Copy link
Owner

Hi @amimas, I'm glad you liked #550!
Sorry I don't have much time myself so I can't help you with Python, but you can inspire from other source files (especially rules/*.py) to create a near-perfect and ready-to-merge pull request. About the implementation, what about following the suggestion I made in our discussion at #395 (comment)?

@amimas
Copy link
Contributor Author

amimas commented Apr 5, 2023

Thanks @adrienverge for the suggestion. This is what you suggested in #395 (comment) for the implementation:

So, maybe a (not perfect) solution would be to output forbid-unused-anchors errors at the end of the YAML stream (StreamEndToken -- note that there can be multiple YAML documents inside the same file). The error report anchor "foo" defined but unused wouldn't match the line where &foo is defined, but would be positioned at the end of the YAML stream.

I think that means reporting "unused anchors" doesn't need to refer to the exact line where the anchor is defined. If that's the case, then I believe the implementation is probably simple as per my earlier comment. But from usability/UX perspective, it'd be helpful if the linter could refer to the correct line.

I'll look into it a bit more and see if correct lines of the anchors can be easily captured. Otherwise, that might be the next feature update.

@amimas amimas marked this pull request as ready for review April 7, 2023 22:18
@amimas
Copy link
Contributor Author

amimas commented Apr 7, 2023

@adrienverge - When you get a chance, please review. I managed to implement it so that unused anchor issues can report the correct line. I believe that will be helpful for end users. I didn't need a big refactor of existing implementation; just a minor change that doesn't use set.

I didn't squash the commits yet in case more change is needed after your review. I try to keep my commits as organized/atomic as possible. You can do squash & merge if everything looks okay or I can push a final squashed commits if you insist.

btw - should all the options in the rule be enabled in strict mode? Not sure yet where that is configured.

@amimas
Copy link
Contributor Author

amimas commented May 6, 2023

Hi @adrienverge 👋 . what do you think about the latest update?

Copy link
Owner

@adrienverge adrienverge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @amimas,

I didn't squash the commits yet

I have limited time and reviewing takes much, so I do insist, please use a ready-to-merge, unified commit message on every push, with a good description that is helpful for future git show usage (what the rule does and why, an example of error report, etc.) Check out git log on this project to see what is expected.

I managed to implement it so that unused anchor issues can report the correct line. I believe that will be helpful for end users.

This is smart!
But, I suspect it could cause problems to some users because the reported problems can now be interleaved / unordered, which has never been the case with yamllint. For example:

---
- &a unused
-    k:     v  

will report errors in the wrong order:

3:5: too many spaces after hyphen (hyphens)
3:12: too many spaces after colon (colons)
3:14: trailing spaces (trailing-spaces)
2:3: found unused anchor a (anchors)

I propose to keep your implementation (which is cool), but be ready to change it if there are legitimate complaints (the other implementation being to print the line+column in the message string instead).
What do you think?

just a minor change that doesn't use set.

Using an array (list) is particularly inefficient and could cause performance issues: if you can't use a set, please use a mapping (dict): context['anchors'] = {} with the anchor name as the key.

It will be much faster to find/access an element within it.
Also, it will reduce the diff of this PR because you can keep using if token.value in context['anchors'].

btw - should all the options in the rule be enabled in strict mode? Not sure yet where that is configured.

I'm not sure what you mean: are you refering to strict mode (returning a non-zero exit code upon warnings) or the default set of rules? The latter is configured in the DEFAULT variable. In my opinion (see #550 (comment)): yes for forbid-undeclared-aliases, no for the others.

About tests cases: in #550 I tried to create one unified and reusable YAML snipped that would contain all types of errors, so it could be used for forbid-undeclared-aliases, forbid-duplicated-anchors and forbid-unused-anchors. I see that you modified it, so the last one is different from the 2 first ones: removed parts, addition of b1, i1, etc. that are not anchors nor aliases... Unless there's a good reason, let's keep a unified YAML snippet. For info, this initial snippet would report:

2:3: found unused anchor i (anchors)
7:3: found unused anchor s (anchors)
14:3: found unused anchor f_s (anchors)
16:16: found unused anchor b_m (anchors)
19:18: found unused anchor b_m_bis (anchors)

Also it looks like the added tests don't reflect the recent commit e90e0a0, please check that.

@@ -24,6 +24,8 @@
later in the document).
* Set ``forbid-duplicated-anchors`` to ``true`` to avoid duplications of a same
anchor.
* Set ``forbid-unused-anchors`` to ``true`` to avoid anchors being declared but
not used anywhere via alias.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest not used anywherenot used anywhere in the YAML document, so it's clear to users that:

---
key: &anchor "value"
---
use it in a new document: *anchor

would report an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Updated the document as suggested.

yamllint/rules/anchors.py Show resolved Hide resolved
Comment on lines 186 to 187
"line": token.start_mark.line + 1,
"column": token.start_mark.column + 1,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with other LintProblems and to avoid confusion/bugs, I would prefer keeping line → line and column → column, and add the +1 when using LintProblem().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Updated to follow the project convention.

"value": token.value,
"line": token.start_mark.line + 1,
"column": token.start_mark.column + 1,
"aliasCount": 0
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Python, snake_case is preferred 🙂

Suggested change
"aliasCount": 0
"alias_count": 0,

Since we don't need to know the exact number of usage, what about something simpler and more performant?

Suggested change
"aliasCount": 0
"used": False,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion. Updated as suggested.

# is an alias, increment the count of alias for associated anchor.
# This can be used to check when end of document is reached and
# evaluate if an anchor is unused.
elif (isinstance(token, yaml.AliasToken)):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can remove extra parentheses here, not needed in Python if the elif fits one single line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true. Updated the code.

if (isinstance(next, (yaml.StreamEndToken,
yaml.DocumentStartToken,
yaml.DocumentEndToken)) and
len(context['anchors']) > 0):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this line, the for below will have the same effect if the list is empty.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh... good catch! Thanks for the suggestion. Updated the code.

Comment on lines 173 to 176
for anchor in context['anchors']:
if anchor['value'] == token.value:
anchor['aliasCount'] += 1
break
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With my proposal to use a dict, you can simply:

Suggested change
for anchor in context['anchors']:
if anchor['value'] == token.value:
anchor['aliasCount'] += 1
break
context['anchors'].get(token.value, {})['used'] = True

(and remove the whole comment above, because this code is self-explaining)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Using dict wasn't as difficult as I had initially thought. Updated the code.

@amimas amimas force-pushed the issue-395-unused-anchors branch from 32b02cc to c6800b6 Compare May 8, 2023 00:07
@amimas
Copy link
Contributor Author

amimas commented May 8, 2023

@adrienverge - Thanks a lot for taking the time to review and all the feedbacks and suggestions. I believe I've addressed all the concerns and adopted your suggestions. When you get a chance, please review one more time.

I do insist, please use a ready-to-merge, unified commit message on every push, with a good description that is helpful for future git show usage (what the rule does and why, an example of error report, etc.) Check out git log on this project to see what is expected.

Sure. I've squashed all my commits and followed the style of previous commit message used for implementing other options of the anchors rule. Hope it is following your expectation now.

I propose to keep your implementation (which is cool), but be ready to change it if there are legitimate complaints (the other implementation being to print the line+column in the message string instead).
What do you think?

Sure. Even in that case the error message for forbid-unused-anchors option can only be reported at the end of the document. It might look odd to have line and column numbers reported in a different format than the rest. That might not be good reporting.

Using an array (list) is particularly inefficient and could cause performance issues: if you can't use a set, please use a mapping (dict): context['anchors'] = {} with the anchor name as the key.

Thank you for the suggestion and some examples. I managed to get it working with dict and I agree that it's a lot better.

btw - should all the options in the rule be enabled in strict mode? Not sure yet where that is configured.

I'm not sure what you mean: are you refering to strict mode (returning a non-zero exit code upon warnings) or the default set of rules? The latter is configured in the DEFAULT variable. In my opinion (see #550 (comment)): yes for forbid-undeclared-aliases, no for the others.

I was referring to the "strict mode", but I mistakenly thought that mode enables every option. I thought it's an additional "configuration" similar to the default and relaxed. Let's ignore that 😊 .

About tests cases: in #550 I tried to create one unified and reusable YAML snipped that would contain all types of errors, so it could be used for forbid-undeclared-aliases, forbid-duplicated-anchors and forbid-unused-anchors. I see that you modified it, so the last one is different from the 2 first ones: removed parts, addition of b1, i1, etc. that are not anchors nor aliases... Unless there's a good reason, let's keep a unified YAML snippet.

I hadn't realized that. Thanks for setting up the ground work. I've updated the test case for the new option based on your setup.

Also it looks like the added tests don't reflect the recent commit e90e0a0, please check that.

This was a newer change that was merged after I submitted the last change. I've pulled in the latest change now.

Copy link
Contributor Author

@amimas amimas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed PR feedbacks and suggestions.

@amimas amimas requested a review from adrienverge May 8, 2023 00:30
Copy link
Owner

@adrienverge adrienverge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @amimas, thanks for this update! The commit message is great and the code looks good. I still have comments (mostly style and optimizations, nothing blocking) so the next one should be the merge one.

It might look odd to have line and column numbers reported in a different format than the rest. That might not be good reporting.

Yes, I agree. My main fear is that some scripts somewhere on the internet rely on yamllint's output ordering, and would break with this change. We'll see!

yaml.StreamStartToken,
yaml.DocumentStartToken,
yaml.DocumentEndToken)):
context['anchors'] = dict()
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more usual to use the simple form {} rather than dict() (both in this project and in the Python world). Can you update this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. Updated.

@@ -113,6 +143,36 @@ def check(conf, token, prev, next, nextnext, context):
token.start_mark.line + 1, token.start_mark.column + 1,
f'found duplicated anchor "{token.value}"')

if conf['forbid-undeclared-aliases'] or conf['forbid-duplicated-anchors']:
if (conf['forbid-unused-anchors']):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here too, you can remove parentheses (and same for line 155).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh... Sorry I missed it in the last update. They're fixed now in latest commit.

if (isinstance(next, (yaml.StreamEndToken,
yaml.DocumentStartToken,
yaml.DocumentEndToken))):

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For style consistency, I propose to drop this empty line.

Comment on lines 159 to 165
for anchor in context['anchors']:
if context['anchors'][anchor]['used'] is False:
yield LintProblem(
context['anchors'][anchor]['line'] + 1,
context['anchors'][anchor]['column'] + 1,
f"found unused anchor {anchor}"
)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than iterating on keys, you could iterate on keys + values to simplify the code:

Suggested change
for anchor in context['anchors']:
if context['anchors'][anchor]['used'] is False:
yield LintProblem(
context['anchors'][anchor]['line'] + 1,
context['anchors'][anchor]['column'] + 1,
f"found unused anchor {anchor}"
)
for anchor, info in context['anchors'].items():
if not info['used']:
yield LintProblem(info['line'] + 1,
info['column'] + 1,
f"found unused anchor {anchor}")

(and change is False to not)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh... I was trying to do something like this initially but wasn't sure how. Still learning Python 😊 . Thanks for the suggestion.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙂

Comment on lines 173 to 178
anchor_details = {
"line": token.start_mark.line,
"column": token.start_mark.column,
"used": False
}
context['anchors'][token.value] = anchor_details
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
anchor_details = {
"line": token.start_mark.line,
"column": token.start_mark.column,
"used": False
}
context['anchors'][token.value] = anchor_details
context['anchors'][token.value] = {
"line": token.start_mark.line,
"column": token.start_mark.column,
"used": False,
}

According to the YAML specification [^1]:

- > An anchored node need not be referenced by any alias nodes

This means that it's OK to declare anchors but don't have any alias
referencing them. However users could want to avoid this, so a new
option (e.g. `forbid-unused-anchors`) is implemented in this change.
It is disabled by default.

[^1]: https://yaml.org/spec/1.2.2/#692-node-anchors
@amimas
Copy link
Contributor Author

amimas commented May 10, 2023

It might look odd to have line and column numbers reported in a different format than the rest. That might not be good reporting.

Yes, I agree. My main fear is that some scripts somewhere on the internet rely on yamllint's output ordering, and would break with this change. We'll see!

I see what you mean. Let's see!

@amimas amimas requested a review from adrienverge May 10, 2023 02:17
Copy link
Owner

@adrienverge adrienverge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great. Thanks for contributing to yamllint @amimas!

@adrienverge adrienverge merged commit f874b66 into adrienverge:master May 10, 2023
6 checks passed
@adrienverge
Copy link
Owner

Ah, I noticed (too late) that the error message missed quotes around the anchor name (other messages do that):

5:3       error    found unused anchor name not used  (anchors)

I just opened #570 to fix this.

@amimas
Copy link
Contributor Author

amimas commented May 10, 2023

Ah, I noticed (too late) that the error message missed quotes around the anchor name (other messages do that):

5:3       error    found unused anchor name not used  (anchors)

I just opened #570 to fix this.

Sorry I missed it. Thanks for taking care of it.

@amimas amimas deleted the issue-395-unused-anchors branch May 10, 2023 12:08
@amimas
Copy link
Contributor Author

amimas commented May 14, 2023

@adrienverge - Just checking if this feature is available for release. Please let me know if I missed. Thanks.

@adrienverge
Copy link
Owner

Hello @amimas, I would like to wait for #572 before creating the next release. Hopefully it will be integrated soon!

@adrienverge
Copy link
Owner

@amimas this is now released in version 1.32.0 👍

@amimas
Copy link
Contributor Author

amimas commented May 22, 2023

Thanks for the update and release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants