Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cloud init schema failures #1954

Merged

Conversation

Chris-Peterson444
Copy link
Contributor

Users attempting to do autoinstall may incorrectly send autoinstall directives as cloud-config, which will result in cloud-init schema validation errors. When loading autoinstall from cloud-config, we now check to see if there are any cloud-init schema validation errors and warn the user. Additionally, if the source of the error is from a known autoinstall error, we inform the user and halt the installation with a nonreportable AutoinstallError.

Requires #1945 and #1947.

Copy link
Collaborator

@dbungert dbungert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment for now, more review tomorrow.

subiquity/cloudinit.py Outdated Show resolved Hide resolved
subiquity/cloudinit.py Outdated Show resolved Hide resolved
@Chris-Peterson444
Copy link
Contributor Author

@blackboxsw could you please take a look at this as well?

Copy link
Collaborator

@dbungert dbungert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chris, I'm really glad to see this work done. I think it's going to help reduce user confusion. I have some items to follow-up on.

subiquity/cloudinit.py Outdated Show resolved Hide resolved
subiquity/server/server.py Outdated Show resolved Hide resolved
@Chris-Peterson444
Copy link
Contributor Author

Thanks for the review @dbungert. I've gone ahead and implemented your suggestions. I added an extra test case to capture the scenario in which autoinstall keys are present in the cloud-config and the autoinstall key itself is not.

Also, by removing the dependence on the recoverable errors I was able to test this successfully on a 20.04.6 image which has cloud-init version 22.4.2-0ubuntu0~20.04.2

Copy link
Member

@ogayot ogayot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! A few minor comments but nothing blocking

subiquity/cloudinit.py Show resolved Hide resolved
subiquity/cloudinit.py Outdated Show resolved Hide resolved
subiquity/cloudinit.py Outdated Show resolved Hide resolved
subiquity/cloudinit.py Show resolved Hide resolved
subiquity/server/tests/test_server.py Outdated Show resolved Hide resolved
subiquity/cloudinit.py Outdated Show resolved Hide resolved
@Chris-Peterson444
Copy link
Contributor Author

Thanks @ogayot I've implemented all of your suggestions

Copy link
Collaborator

@dbungert dbungert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One tweak please then LGTM.

subiquity/cloudinit.py Outdated Show resolved Hide resolved
@Chris-Peterson444
Copy link
Contributor Author

Thanks! Since I changed CloudInitSchemaValidationError to a NonReportableException, I also added some lines to the client code so the error overlay works for those error types. Could you give that a look over too @dbungert ?

Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ping on this for our awareness.

While I think consuming the CLI and reading stderr is potentially fragile, I get that subiquity may not want to tightly couple to internal cloudinit python libraries and functions to process and iterate over SchemaValidationErrors
raised by cloudinit.config.schema.validate_cloudconfig_schema or cloudnit.config.schema.validate_cloudconfig_file.

To use these functions, subiquity would have to likely have to do something like the following:

from cloudinit.config.schema import validate_cloudconfig_file, get_schema, SchemaValidationError
cloudconfig_schema = get_schema()
try:
    validate_cloudconfig_file('/var/lib/cloud/instance/user-data.txt', schema=cloudconfig_schema)
except SchemaValidationError as e:
    return [schema_problem.path for schema_problem in e.schema_errors if "unexpected" in schema_problem.message]

Note that even the structured SchemaValidationError will not set a 'path' attribute if multiple unexpected keys exist on the object. So, you'd still need your pattern matching to schema_problem.message to extract the separate keys so it's nearly the same logic you have for parsing the stderr of cloud-init schema --system.

The fragility in CLI approach in this PR will be due to cloud-init relying on jsonschema for that error string as cloud-init just presents that error message directly without modification. If jsonschema module changes their error messaging format this could break subiquity parsing.

That said, I don't see this error output format from jsonschema being any different between jammy and noble and I think cloud-init would like to work on a machine-readable representation of cloud-init schema --system --format=yaml per your feature request canonical/cloud-init#5100 that will make this easier to process in the future.

Minor changes requested that you can take or leave as you see fit

  • better regex pattern match, splitting of the parsed jsonschema error messages
  • dropping unused functions/tests

We'll make sure we keep you informed when we start tackling canonical/cloud-init#5100. So this code can consume more friendly structured content when available.

subiquity/cloudinit.py Outdated Show resolved Hide resolved
subiquity/cloudinit.py Outdated Show resolved Hide resolved
subiquity/cloudinit.py Outdated Show resolved Hide resolved
subiquity/cloudinit.py Outdated Show resolved Hide resolved
# Matches:
# ('some-key' was unexpected)
# ('some-key', 'another-key' were unexpected)
pattern = r"\((?P<args>'[\w\-]+'(,\s'[\w\-]+')*)+ (?:was|were) unexpected\)"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of thoughts on this regex:

  1. Pattern should probably be more flexible as these keys can really have any characters in them. and could include underscores, periods etc. So, let's match each offensive key on [^']+
  2. Also we can push the leading and trailing single-quotes outside the P? match so we don't have to strip them later
  3. We can drop the trailing + outside the (P?<args>...)+ as your greedy matching and * should take care of all listed unexpected key matches.
Suggested change
pattern = r"\((?P<args>'[\w\-]+'(,\s'[\w\-]+')*)+ (?:was|were) unexpected\)"
pattern = r"\('(?P<args>[^']+(,\s'[^']+)*)' (?:was|were) unexpected\)"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review on this especially (Regex is hard!). I think we can definitely (1) replace [\w\-] with [^'] to be more flexible and (3) remove the trailing +, but (2) isolating the key names without the quotes is difficult. The suggested regex doesn't quite work and my attempts thus far have been insufficient.

I think I'm okay with another round of processing to strip the quotes and this is still an improvement on the parsing.

Comment on lines +135 to +114
args_list: list[str] = search_result.group("args").split(", ")
no_quotes: list[str] = [arg.strip("'") for arg in args_list]

return no_quotes
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the above pattern suggestion is acceptable. Then you can avoid having to strip single quotes and adapt your split instead.

Suggested change
args_list: list[str] = search_result.group("args").split(", ")
no_quotes: list[str] = [arg.strip("'") for arg in args_list]
return no_quotes
args_list: list[str] = search_result.group("args").split("', '")
return args_list

subiquity/cloudinit.py Outdated Show resolved Hide resolved
subiquity/tests/test_cloudinit.py Outdated Show resolved Hide resolved
async def get_schema_failure_sources() -> list[str]:
"""Retrieve the keys causing schema failure."""

cmd: list[str] = ["cloud-init", "schema", "--system"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that providing --system does report schema errors with any network-config or vendor-data as well as user-data provided to the instance. This is probably ok, and you probably want subiquity to raise any errors with any network, user-data or vendor-data provided to the instance. What you will not have with this approach is visibility to the file that generated the schema errors because you are processing stderr which only emits the specific error message from jsonschema for a given key, the file name (user-data.txt or network-config.txt) is represented only in the stdout at the moment. I don't think subiquity currently provides network-config to cloud-init, but users could provide this config via kernel commandline params ds=nocloud-net;http://some_url/

What you may want to do is specifically perform schema validation of only the user-data if it exists.

Suggested change
cmd: list[str] = ["cloud-init", "schema", "--system"]
if not os.path.exists("/var/lib/cloud/instance/user-data.txt"):
log.debug("No processed cloud-init user-data present")
return []
cmd: list[str] = ["cloud-init", "schema", "-c", "/var/lib/cloud/instance/user-data.txt"]

or network-config:

     if not os.path.exists("/var/lib/cloud/instance/network-config.json"):
        log.debug("No processed cloud-init network-config present")
        return []  
    cmd: list[str] = ["cloud-init", "schema", "-c", "/var/lib/cloud/instance/network-config.json", "--schema-type", "network-config"]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in general we want to capture errors from all sources since we read the combined config when extracting the autoinstall config (e.g. is it possible someone sends autoinstall config in their network-config?), but capturing the file source would improve our error messaging for sure. What do you think about leaving this to future improvements?

Users attempting to do autoinstall may incorrectly send autoinstall
directives as cloud-config, which will result in cloud-init
schema validation errors. When loading autoinstall from cloud-config,
we now check to see if there are any cloud-init schema validation errors
and warn the user. Additionally, if the source of the error is from
a known autoinstall error, we inform the user and halt the installation
with a nonreportable AutoinstallError.
@Chris-Peterson444
Copy link
Contributor Author

@blackboxsw Thanks a lot for the extensive review! I went ahead and implemented most of your suggested changes (all but two, I left my reasoning in the comments and kept them unresolved). Based on your analysis I think we're okay with the potential fragility in the error parsing until we move to the structured output when it's available. I look forward to seeing more on canonical/cloud-init#5100!

@Chris-Peterson444 Chris-Peterson444 merged commit d77bfbe into canonical:main Apr 3, 2024
9 of 10 checks passed
@dbungert
Copy link
Collaborator

dbungert commented Apr 3, 2024

OK to merge, despite Noble test failures (unrelated archive problems)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants