Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schema: do not manipulate draft4 metaschema on jsonschema 2.6.0 #2098

Merged

Conversation

blackboxsw
Copy link
Collaborator

@blackboxsw blackboxsw commented Mar 27, 2023

Proposed Commit Message

    schema: do not manipulate draft4 metaschema for jsonschema 2.6.0
    
    Only set additionalProperties = False in unittest
    draft4 schema definition because cloud-init globally
    registers its draft4 extensions as the primary validator
    for any draft4 schemas in the python process.
    
    This affects solutions such as subiquity and
    ubuntu-desktop-installer which invoke jsonschema.validate
    in the same process at runtime just after calling
    cloudinit.schema.get_jsonschema_validator.
    
    The resulting Tracebacks are seen as something like:
     jsonschema.exceptions.SchemaError:
     {'$ref': '#/definitions/ref_id'} is not valid under any of the
     given schema
    
    Background:
    cloud-init needs to extend draft4 schema to better
    validate and warn 'deprecated' properties in draft4-based
    cloud-init schema definitions. Our unittests also attempt
    to strictly validate any meta schema definitions for the
    cc_* config modules.
    
    To accomplish strict meta schema validation cloud-init makes
    a copy of the draft4 meta schema and adds an
    'additionalProperties' = True to that schema to raise specific
    errors and catch typos in cc_ module schema definitions.
    
    Given that cloud-init at runtime extends and registers
    a draft4 schema validator, any external consumers
    of jsonschema.validate with draft4-base schemas are
    exposed to cloud-init's validator extentions so let's
    limit our risk exposure.
    
    For python 2.6.0, we cannot specify make draft4 schema
    strict because any "$ref" keys are not yet resolved
    to their actual #/defintions/<id> values so the traceback above
    will always be generated in 'strict' mode for complex schemas.
    
    This does not affect jsonschema 3.0+ which appears to resolve
    schema $refs values before schema validation.

Additional Context

python3-jsonschema 2.6.0 in ubuntu-desktop-installer causing probs for cloud-init/curtin canonical/ubuntu-desktop-installer#1714

Test Steps

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unit tests need to be updated to say strict_metaschema=True, correct?

@blackboxsw blackboxsw force-pushed the json-meta-schema-strict-in-unittests branch from 0a33741 to 2f9e958 Compare March 27, 2023 19:26
@blackboxsw
Copy link
Collaborator Author

Unit tests need to be updated to say strict_metaschema=True, correct?

Nope the unittests already provided that strict_metaschema = True here I just plumbed that param into get_jsonschema_validator to finally determine whether to set additionalProperties = False

@blackboxsw
Copy link
Collaborator Author

rebased for mypy/lint fixes

Copy link
Member

@holmanb holmanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@blackboxsw Thanks for this fix.

+1 to get this fix in quick, I guess, but long term I think we need to remove the metaschema manipulation or fix it - it no longer works as designed.

The whole point of the metaschema changes (as implemented in bedac77), was to prevent invalid schema entries, such as the various useless required: []'s that were in the schema at the point of implementation.

Today, the metaschema no longer catches invalid entries. So what value does it add? Does it actually catch anything anymore?

With the following, test_schema.py passes, which it shouldn't.

diff --git a/cloudinit/config/schemas/schema-cloud-config-v1.json b/cloudinit/config/schemas/schema-cloud-config-v1.json
index 2a2d8631c..929f196e9 100644
--- a/cloudinit/config/schemas/schema-cloud-config-v1.json
+++ b/cloudinit/config/schemas/schema-cloud-config-v1.json
@@ -696,6 +696,7 @@
         "apk_repos": {
           "type": "object",
           "minProperties": 1,
+          "required": [],
           "additionalProperties": false,
           "properties": {
             "preserve_repositories": {

I can't find a way that it helps, and it adds a lot of complexity now, so I think this is a net negative at this point.

@holmanb
Copy link
Member

holmanb commented Mar 27, 2023

I can't find a way that it helps, and it adds a lot of complexity now, so I think this is a net negative at this point.

I haven't bothered to bisect to find what broke the current metaschema implementation, but much has changed in the codebase wrt schema since that was implemented.

@TheRealFalcon
Copy link
Member

I was assuming the unit tests would still be catching our metaschema changes...but that's not true anymore?

@blackboxsw blackboxsw force-pushed the json-meta-schema-strict-in-unittests branch 2 times, most recently from 345a881 to 77f41ce Compare March 27, 2023 22:56
Only set additionalProperties = False in unittest
draft4 schema definition because cloud-init globally
registers its draft4 extensions as the primary validator
for any draft4 schemas in the python process.

This affects solutions such as subiquity and
ubuntu-desktop-installer which invoke jsonschema.validate
in the same process at runtime just after calling
cloudinit.schema.get_jsonschema_validator.

The resulting Tracebacks are seen as something like:
 jsonschema.exceptions.SchemaError:
 {'$ref': '#/definitions/ref_id'} is not valid under any of the
 given schema

Background:
cloud-init needs to extend draft4 schema to better
validate and warn 'deprecated' properties in draft4-based
cloud-init schema definitions. Our unittests also attempt
to strictly validate any meta schema definitions for the
cc_* config modules.

To accomplish strict meta schema validation cloud-init makes
a copy of the draft4 meta schema and adds an
'additionalProperties' = True to that schema to raise specific
errors and catch typos in cc_ module schema definitions.

Given that cloud-init at runtime extends and registers
a draft4 schema validator, any external consumers
of jsonschema.validate with draft4-base schemas are
exposed to cloud-init's validator extentions so let's
limit our risk exposure.

For python 2.6.0, we cannot specify make draft4 schema
strict because any "$ref" keys are not yet resolved
to their actual #/defintions/<id> values so the traceback above
will always be generated in 'strict' mode for complex schemas.

This does not affect jsonschema 3.0+ which appears to resolve
schema $refs values before schema validation.
@blackboxsw blackboxsw force-pushed the json-meta-schema-strict-in-unittests branch from 77f41ce to edbf0b8 Compare March 28, 2023 00:28
@blackboxsw blackboxsw changed the title schema: do not manipulate draft4 metaschema for unit tests schema: do not manipulate draft4 metaschema on jsonschema 2.6.0 Mar 28, 2023
Copy link
Contributor

@aciba90 aciba90 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are sure that the problem only happens with jsonschema==2.6.0, I think we should leave a note explaining it to make it easy to revert/refactor this workaround.

Additionally, a minimized reproducible example would be a nice breadcrumb for future maintainers (it could be just a comment on the PR).

An idea that I think it is be worth investigating is:

If we do not want our schema to conflict with the global draft4 validator, as we explicitly create and consume our custom validator

validator = cloudinitValidator(schema, format_checker=FormatChecker())

and we do not need it to be global, then maybe a possible better solution could be to register it for a custom version in:

version="draft4",

something like:

    cloudinitValidator = create(
        meta_schema=strict_metaschema,
        validators=validators,
        version="cloud-init-draft4",
        **validator_kwargs,
    )

could potentially solve the global state issue for jsonschema==2.6.0 but I did not manage to reproduce it.

Other jsonschema things that we modify globally are:

type_checker = Draft4Validator.TYPE_CHECKER.redefine(

types = Draft4Validator.DEFAULT_TYPES # pylint: disable=E1101

validators = dict(Draft4Validator.VALIDATORS)

@blackboxsw
Copy link
Collaborator Author

@aciba90 thanks for the thoughts. here's a reproducer. I like the idea of changing the registered global schema validation from draft4 to something else, but we have public references to cloud-init schema direct from github in json schema store and that gets pulled in directly into schema validators plugins such as VSCode so I don't think we can publicly diverge. That said, our packaging or cloud-init runtime could manipulate this schema validation registration to something private so as to avoid trampling on other jsonschema consumers locally on the system. I still think we want to take that action as a followup as we have other open issues to sort properly. What do you think?

cat > schema-failure.yaml <<EOF
#cloud-config
packages: [curtin]



## reproducer on bionic attempt to validate curtin.STORAGE_CONFIG_SCHEMA
# before and after setting up cloud-init's jsonschema validator

# base64 decoded content below:
# from cloudinit.config.schema import get_jsonschema_validator
# import jsonschema
# from curtin import storage_config
# 
# def main():
#     schema = storage_config.STORAGE_CONFIG_SCHEMA
#     jsonschema.validate({"version": 1, "config": []}, schema)
#     val = get_jsonschema_validator()
#     jsonschema.validate({"version": 1, "config": []}, schema)
# 
# 
# main()


write_files:
 - path: /root/repro.py
   encoding: b64
   content: |
    ZnJvbSBjbG91ZGluaXQuY29uZmlnLnNjaGVtYSBpbXBvcnQgZ2V0X2pzb25zY2hlbWFfdmFsaWRh
    dG9yCmltcG9ydCBqc29uc2NoZW1hCmZyb20gY3VydGluIGltcG9ydCBzdG9yYWdlX2NvbmZpZwoK
    ZGVmIG1haW4oKToKICAgIHNjaGVtYSA9IHN0b3JhZ2VfY29uZmlnLlNUT1JBR0VfQ09ORklHX1ND
    SEVNQQogICAganNvbnNjaGVtYS52YWxpZGF0ZSh7InZlcnNpb24iOiAxLCAiY29uZmlnIjogW119
    LCBzY2hlbWEpCiAgICB2YWwgPSBnZXRfanNvbnNjaGVtYV92YWxpZGF0b3IoKQogICAganNvbnNj
    aGVtYS52YWxpZGF0ZSh7InZlcnNpb24iOiAxLCAiY29uZmlnIjogW119LCBzY2hlbWEpCgoKbWFp
    bigpCg==

runcmd:
   - python3 /root/repro.py
EOF
lxc launch ubuntu-daily:bionic -c user.user-data="$(cat schema-failure.yaml)" reproduce-curtin-schema-failure
lxc exec reproduce-curtin-schema-failure -- cloud-init status --wait --long
lxc exec reproduce-curtin-schema-failure -- cat /var/log/cloud-init-output.log

@aciba90
Copy link
Contributor

aciba90 commented Mar 29, 2023

Thanks @blackboxsw, I do agree, it sounds good to me to try that approach as a follow-up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants