Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(schema): no network validation on netplan systems without API #5254

Conversation

blackboxsw
Copy link
Collaborator

Proposed Commit Message

fix(schema): no network validation on netplan systems without API

Do not validation network config against cloud-init's network v2 schema on netplan systems because netplan schema supports keys not present in cloud-init's v2 schema which can result in schema warnings from cloud-init which are perfectly acceptable netplan config keys.

Given that cloud-init performs a clean passthrough of network version 2 directly to netplan without trying to process the network configuration, there is little value in providing such schema warnings unless cloud-init's network v2 schema is aligned with the specific netplan schema supported for each release.

On mantic and later, cloud-init will call netplan's python API to validate schema with netplan itself, but prior to Ubuntu Mantic no python API exists, so we cannot validate network v2 against netplan.

Update skip messaging and integration tests.

Additional Context

Jenkins jammy integration test failures which should "skip" network schema validation https://jenkins.canonical.com/server-team/view/cloud-init/job/cloud-init-integration-jammy-lxd_container/440/testReport/

Test Steps

for series in focal jammy mantic; do

 CLOUD_INIT_CLOUD_INIT_SOURCE=IN_PLACE CLOUD_INIT_OS_IMAGE=$series tox -e integration-tests -- tests/integration_tests/test_networking.py::test_invalid_network_v2_netplan tests/integration_tests/cmd/test_schema.py::TestSchemaDeprecations::test_network_config_schema_validation

done

Checklist

Merge type

  • Squash merge using "Proposed Commit Message"
  • Rebase and merge unique commits. Requires commit messages per-commit each referencing the pull request number (#<PR_NUM>)

Do not validation network config against cloud-init's network v2
schema on netplan systems because netplan schema supports keys not
present in cloud-init's v2 schema which can result in schema warnings
from cloud-init which are perfectly acceptable netplan config keys.

Given that cloud-init performs a clean passthrough of network
version 2 directly to netplan without trying to process the network
configuration, there is little value in providing such schema warnings
unless cloud-init's network v2 schema is aligned with the specific
netplan schema supported for each release.

On mantic and later, cloud-init will call netplan's python API
to validate schema with netplan itself, but prior to Ubuntu Mantic
no python API exists, so we cannot validate network v2 against netplan.

Update skip messaging and integration tests.
Copy link
Contributor

@aciba90 aciba90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changeset solves the issue. I left some inline comments which are probably not meant to be solved as part of this PR, that could improve this module/functionality from a developer POV.

In addition to them, I would like to add that, at least, validate_cloudconfig_schema, validate_cloudconfig_file and cloud.config (within those functions) symbols do not refer anymore to only cloud-configs as they can also act upon network-configs, I'd like to see them renamed to reflect this fact. Problably a s/cloudconfig/config/ on them? Again, this is not blocking this PR, but it is something that I think developers would benefit from.

# This may result in schema warnings for valid netplan config
# which would be successfully rendered by netplan but doesn't
# adhere to cloud-init's network v2.
return False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change adds a new False exit point with a new meaning, invalidating the print statement in

" Jsonschema dependency missing."
.

At a minimum, could we extend that print message? This is not needed because validate_cloudconfig_file manually calls netplan_validate_network_schema and repeat this plumbing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, but this function's documentation is not immediately clear. If I read the documentation, without reading the code, I do not understand that the validation is materialized as log warnings and that the boolean return value represents if the validation was performed or not.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change adds a new False exit point with a new meaning, invalidating the print statement in

" Jsonschema dependency missing."

.

We don't actually get to 1177 when on netplan and no netplan API is present because of the new segment that I introduced 10 lines above that

elif netplan_available():
print(
"Skipping network-config schema validation for version: 2."
" No netplan API available."
)
return False

At a minimum, could we extend that print message? This is not needed because validate_cloudconfig_file manually calls netplan_validate_network_schema and repeat this plumbing.

I do agree though that validate_cloudconfig_schema & _file functions need a significant refactor to avoid this complexity of schema handling. output formatting differences and duplication of logic. Maybe we take this as a separate issue to resolve at a later time?

@@ -1140,6 +1154,12 @@ def validate_cloudconfig_file(
network_config=cloudconfig, strict=True, annotate=annotate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the code stands, validate_cloudconfig_file partially reimplements what validate_cloudconfig_schema does, with no apparent reason but printing message instead of logging them (because validate_cloudconfig_file is only called to handle the cloud-init schema command). Unless I am missing another reason, and if so it should be documented, why don't we simplify this like:

diff --git a/cloudinit/config/schema.py b/cloudinit/config/schema.py
index 407cade0d..7468fb27b 100644
--- a/cloudinit/config/schema.py
+++ b/cloudinit/config/schema.py
@@ -1150,16 +1150,6 @@ def validate_cloudconfig_file(
         network_version = network_schema_version(cloudconfig)
         if network_version == 2:
             schema_type = SchemaType.NETWORK_CONFIG_V2
-            if netplan_validate_network_schema(
-                network_config=cloudconfig, strict=True, annotate=annotate
-            ):
-                return True  # schema validation performed by netplan
-            elif netplan_available():
-                print(
-                    "Skipping network-config schema validation for version: 2."
-                    " No netplan API available."
-                )
-                return False
         elif network_version == 1:
             schema_type = SchemaType.NETWORK_CONFIG_V1
             # refresh schema since NETWORK_CONFIG defaults to V2

and handle the printing-message-problem by returning an enum in validate_cloudconfig_schema?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either returning an enum from validate_cloudconfig_schema, or maybe raising specialized errors from validate_cloudconfig_schema which would allow us to differentiate no JSON schema importable from no netplan API importable. I think this warrants inclusion in a general refactor of these two functions that we should carry separate from this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it makes more sense to refactor with different return codes for: netplan validation (good or bad), networkv2 validation (good or bad), and skip-because-no-netplan-api-lack-of-validation

@blackboxsw
Copy link
Collaborator Author

In addition to them, I would like to add that, at least, validate_cloudconfig_schema, validate_cloudconfig_file and cloud.config (within those functions) symbols do not refer anymore to only cloud-configs as they can also act upon network-configs, I'd like to see them renamed to reflect this fact. Problably a s/cloudconfig/config/ on them? Again, this is not blocking this PR, but it is something that I think developers would benefit from.

I agree wit hthe renames suggested and documentation improvements in code. Since it touches almost all config module test files, I'll followup this PR with a non-functional rename/doc PR that is separate to ensure we get fresh eyes on the medium-length changeset needed to rename.

@aciba90
Copy link
Contributor

aciba90 commented May 2, 2024

In addition to them, I would like to add that, at least, validate_cloudconfig_schema, validate_cloudconfig_file and cloud.config (within those functions) symbols do not refer anymore to only cloud-configs as they can also act upon network-configs, I'd like to see them renamed to reflect this fact. Problably a s/cloudconfig/config/ on them? Again, this is not blocking this PR, but it is something that I think developers would benefit from.

I agree wit hthe renames suggested and documentation improvements in code. Since it touches almost all config module test files, I'll followup this PR with a non-functional rename/doc PR that is separate to ensure we get fresh eyes on the medium-length changeset needed to rename.

Many thanks, could we create a ticket representing this effort to not forget about it?

Copy link
Collaborator

@catmsred catmsred left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no problem with the code as is, but I am curious when we use features vs in-code release checks as we're doing here?

@@ -1140,6 +1154,12 @@ def validate_cloudconfig_file(
network_config=cloudconfig, strict=True, annotate=annotate
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it makes more sense to refactor with different return codes for: netplan validation (good or bad), networkv2 validation (good or bad), and skip-because-no-netplan-api-lack-of-validation

@blackboxsw
Copy link
Collaborator Author

blackboxsw commented May 2, 2024

I have no problem with the code as is, but I am curious when we use features vs in-code release checks as we're doing here?

Thanks @catmsred, I'm misunderstanding the question a bit I think, but I'll take a stab at it. Cloud-init's features.py module is typically reserved for hard-coding behavior of a downstream build of cloud-init to behave a certain way regardless of the environment around it when the behavior may differ from upstream's defaults

In this case, our in-code validation is intending to make the same decision in all environments without exception:

  • If netplan is installed, as indicated by cloudinit.net.netplan.available, we always want to avoid using cloud-init's network v2 schema because our static schema is not a complete/correct representation of what the netplan version installed in that environment supports. It will have schema drift depending on what version of netplan is installed.

The reason I didn't opt for the cloudinit.features approach here is because this is not specifically a behavior we want to hard-code or limit on just on Ubuntu Focal and Jammy downstreams. Instead, we want to avoid this behavior in any environment or distribution which has netplan installed, including some Debian images. This allows us to decouple cloud-init's behavior in schema validation from any features that netplan may introduce into a stable release on behalf of netplan as well as not force a strict behavior via a build feature-flag.

To summarize:

  • features: used to provide a behavior toggle at package build time when we expect downstreams to desire a build-time behavior that will always differ from upstream behavior for a given image. This is frequently used when we know we are SRUing backward incompatible changes back to stable releases that we want to avoid altering on that stable release
  • in-line environmental checks: when we expect this behavior is common to all environments and determined based on some external feature or condition outside of cloud-init's control.

Maybe I totally misunderstood your 'in-code release checks` comment though, please correct me if I've missed the question intent.

@blackboxsw blackboxsw merged commit 9454e2f into canonical:main May 2, 2024
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants