New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check for missing or duplicated anchors #420
Check for missing or duplicated anchors #420
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @newrushbolt, thanks for contributing!
The change is great and I'm favorable to merging it, but before I have a few remarks and concerns.
-
According to the YAML specification:
-
It is an error for an alias node to use an anchor that does not previously occur in the document -- https://yaml.org/spec/1.2/spec.html#id2785586
Your commit implements that 👍
But it waits for the end of the document to report it, whereas it's not needed, and can output errors in a wrong order. Since anchors must be declared before aliases, you should report the problem as soon as the unknown alias is found. -
The alias refers to the most recent preceding node having the same anchor → it's allowed to override anchor -- https://yaml.org/spec/1.2-old/spec.html#id2786448
Your commit implements and documents "Duplicate anchors are not allowed", which disagrees with the spec.
-
-
To solve the previous point, and also to prepare the future (if we want to allow the rule to be extended later), I propose to make the rule configurable:
rules: anchors: forbid-unknown-aliases: true # true by default, because YAML spec fordids it forbid-duplicated-anchors: false # false by default, because YAML spec allows it
The Python file would look like:
CONF = {'forbid-unknown-aliases': bool, 'forbid-duplicated-anchors': bool} DEFAULT = {'forbid-unknown-aliases': True, 'forbid-duplicated-anchors': False} if conf['forbid-unknown-aliases']:
-
I'm not sure whether this new rule (at least
forbid-unknown-aliases
) should be enabled in thedefault
set of rules: it might break things for many users. On the other hand, broken anchors would mean broken YAML and shouldn't exist...An option could be to enable it, release a new yamllint version, and if this causes problems for users, you would quickly create a pull request to disable it by default.
What do you think?
b51b44e
to
b437665
Compare
Hello @adrienverge, thanks for the review. Sorry for the delay. First of all, I decided to split this into two separate rules.
Of course l'll squash all the commits after final approve. |
cad43ac
to
3ad948c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello Sergei,
Thanks for the good work!
It looks better but I see a problem now: you split the feature into 2 rules, whereas:
- they're both related to anchors,
- it feels strange to have a rule
anchor-duplicates
containing an option
with the same nameanchor-duplicates
. Same forunknown-aliases: {unknown-aliases: true}
.
I insist on the following conf, inside a united anchors
rule:
CONF = {'forbid-unknown-aliases': bool,
'forbid-duplicated-anchors': bool}
DEFAULT = {'forbid-unknown-aliases': True,
'forbid-duplicated-anchors': False}
if conf['forbid-unknown-aliases'] and AliasToken and token.value not in context['anchors']:
# anchor "{token.value}" is used before assignment
if conf['forbid-duplicated-anchors'] and AnchorToken and token.value in context['anchors']:
# duplicated anchor "{token.value}"
which is more in the spirit of yamllint. Moreover it would allow extending the rule with more options later.
You can see rules empty-values
and octal-values
for comparison.
Other than that, please address all my other remarks above and below, so we can merge this 👍
a52d0eb
to
aad8c2b
Compare
aad8c2b
to
542e05f
Compare
@adrienverge is everything ok now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for this PR. Looking forward seeing this merged and released.
In #395, a third option for this rule was also discussed (forbid-unused-anchors
). It's similar to the duplicated anchors option. Can that option be added into this PR. I feel that will make this whole.
Thanks again.
if token.value not in context['anchors']: | ||
yield LintProblem( | ||
token.start_mark.line + 1, token.start_mark.column + 1, | ||
f'anchor "{token.value}" is used before assignment') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this message also indicate a potentially reference to a non-existent anchor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yamllint parses files sequencially, so it does not know whether the anchor is defined later in the file or is totally missing from the document.
(Maybe I didn't get your point, in this case please give example of what you mean :))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message as written right now specifically says something like this:
anchor "foo-bar" is used before assignment
But, as you said already
it does not know whether the anchor is defined later in the file or is totally missing from the document.
That's why I was suggesting that the error message probably could be less specific because it's possible that the anchor hasn't been defined in the entire file. It could be something like:
anchor "foo-bar" is used before assignment/defined or does not exist
Hope that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adrienverge - Trying to learn the code, as I'm new to this. Would like to contribute if I can.
Yamllint parses files sequencially, so it does not know whether the anchor is defined later in the file or is totally missing from the document.
Right now problems are reported from each line as tokens are parsed. Is it possible to save them in a global variable instead of reporting right away? Then if the "end of document token" is reached, report the problems? I didn't see any of the existing rule doing it. Also, can errors from multiple locations be reported? In this example: if an anchor is duplicated 3 times, report the 2nd and 3rd occurrences as problem. Not sure if this is valid:
def check(conf, token, prev, next, nextnext, context):
# Skipping codes here to keep this snippet short
yield LintProblem(....)
yield LintProblem(....)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hope that makes sense.
Yes it does, thanks, I see what you mean now 🙏
What I like in the sentence anchor "foo-bar" is used before assignment
is that it's simpler (it counts) and isn't incompatible with "this anchor will actually never be assigned later in the file".
I see this sentence as a good trade-off between simplicity and understandability (compared to anchor "foo-bar" is used before assignment/defined or does not exist
)
Right now problems are reported from each line as tokens are parsed. Is it possible to save them in a global variable instead of reporting right away? Then if the "end of document token" is reached, report the problems? I didn't see any of the existing rule doing it.
That looks like a solution, but yamllint is designed to output errors right away, which allows:
- having them sorted naturally,
- being performant and avoid consuming too much RAM,
- not losing past errors if the script crashes at some point.
Changing this would need a major refactor, and I'm not sure this potentially adaptative error message (... used before assignment
/ ... never assigned
) is worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand what you're saying and it makes sense. I was just asking because I wanted to look at the forbid-unused-anchors
option. Without checking the entire file, not sure how to verify that an anchor isn't being used. Let me know if you have any suggestions.
Maybe we should discuss it in the feature request issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #395 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@newrushbolt not everything seems to be OK, it looks like the Python linter reports a dozen errors. Can you check it? You can run it locally on your machine with python -m flake8 .
.
Apart from that, I haven't re-reviewed the PR thoroughly yet since last year, but if all my remarks were handled it should be good to go.
EDIT: I see you did not apply my request for token.value in context['anchors']
, replacing it with context['anchors'].count(token.value)
, which creates a bug if there are ≥ 3 anchors. Maybe there are other problems. Please check all my remarks and follow contributing guidelines so it doesn't waste your time and mine too much. Thanks!
In #395, a third option for this rule was also discussed (forbid-unused-anchors). It's similar to the duplicated anchors option. Can that option be added into this PR. I feel that will make this whole.
@amimas this development has been open for more than a year, I feel it would be better to just finish and merde it. Then, you can implement this new forbid-unused-anchors
option if you'd like too.
clients: | ||
jack: &jack_client | ||
billing_id: 1234 | ||
bill: &bill_client | ||
billing_id: 5678 | ||
target_client: *jack_client | ||
|
||
the following code snippet would **FAIL**: | ||
:: | ||
|
||
clients: | ||
jack: &jack_client | ||
billing_id: 1234 | ||
bill: &jack_client | ||
billing_id: 5678 | ||
target_client: *jack_client |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All other examples use 2-space indentation: let's be consistent!
if token.value not in context['anchors']: | ||
yield LintProblem( | ||
token.start_mark.line + 1, token.start_mark.column + 1, | ||
f'anchor "{token.value}" is used before assignment') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yamllint parses files sequencially, so it does not know whether the anchor is defined later in the file or is totally missing from the document.
(Maybe I didn't get your point, in this case please give example of what you mean :))
DUPLICATED_ANCHOR = '''--- | ||
key1: &keyanchor a | ||
key2: &keyanchor b | ||
otherkey: *keyanchor | ||
''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a test for anchors duplicated 3 times. Latest version of the PR (commit 542e05f) seems to have this bug!
@newrushbolt - are you able to look into the remaining feedbacks? |
@newrushbolt we need feedback from you, because @amimas is implementing a new option for your new rule in #537. I know that yamllint requires high-quality contributions, so it's OK if you don't want to spend time on it. Someone else (maybe me) can implement this new rule in a new pull request. Please tell us what you prefer. |
According to the YAML specification [^1]: - > It is an error for an alias node to use an anchor that does not > previously occur in the document. The `forbid-undeclared-aliases` option checks that aliases do have a matching anchor declared previously in the document. Since this is required by the YAML spec, this option is enabled by default. - > The alias refers to the most recent preceding node having the same > anchor. This means that having a same anchor repeated in a document is allowed. However users could want to avoid this, so the new option `forbid-duplicated-anchors` allows that. It's disabled by default. - > It is not an error to specify an anchor that is not used by any > alias node. This means that it's OK to declare anchors but don't have any alias referencing them. However users could want to avoid this, so a new option (e.g. `forbid-unused-anchors`) could be implemented in the future. See #537. Fixes #395 Closes #420 [^1]: https://yaml.org/spec/1.2.2/#71-alias-nodes
Due to lack of activity here, and the need of @amimas in #537, I've implemented this new rule in #550. The code is simpler, should be more performant (use of @newrushbolt @amimas could you test it and report whether it works for you? |
According to the YAML specification [^1]: - > It is an error for an alias node to use an anchor that does not > previously occur in the document. The `forbid-undeclared-aliases` option checks that aliases do have a matching anchor declared previously in the document. Since this is required by the YAML spec, this option is enabled by default. - > The alias refers to the most recent preceding node having the same > anchor. This means that having a same anchor repeated in a document is allowed. However users could want to avoid this, so the new option `forbid-duplicated-anchors` allows that. It's disabled by default. - > It is not an error to specify an anchor that is not used by any > alias node. This means that it's OK to declare anchors but don't have any alias referencing them. However users could want to avoid this, so a new option (e.g. `forbid-unused-anchors`) could be implemented in the future. See #537. Fixes #395 Closes #420 [^1]: https://yaml.org/spec/1.2.2/#71-alias-nodes
Basic anchor\aliases checks, also fix for #395.