How to use jsonschema package to implement filtering of data based on schema #707

Closed
mauvilsa opened this issue Jul 20, 2020 · 4 comments


@mauvilsa

There are a few use cases that require filtering out parts of some JSON data based on what is valid according to a JSON schema. See https://github.com/uber/json-schema-filter , https://stackoverflow.com/questions/40226596/how-to-filter-json-via-jsonschema-in-python , https://stackoverflow.com/questions/57378110/filter-json-data-against-json-schema-in-python . From what I have seen, no Python package provides this.

To implement this filtering, I figure one would need a JSON schema validator to identify which parts are invalid and then remove them. So I am asking it as a question here: is there an easy way to use the jsonschema package just to identify which parts of a JSON object do not validate against the schema?

@willson-chen
Contributor

willson-chen commented Jul 21, 2020

@mauvilsa Maybe the following code gets you what you want.

import jsonschema
import re
schema = {
  "type": "object",
  "required": [],
  "properties": {
    "system": {
        "id": "system",
        "required": ["id"],
        "type": "object",
        "properties": {
            "state": {
                "id": "state",
                "required": ["id"],
                "type": "string"
            },
            "id": {
                "id": "id",
                "required": ["id"],
                "type": "number"
            }
        },
        "additionalProperties": False
    }
  }
}

doc = {
    "system": {
        "state": "enabled",
        "id": 5,
        "keys": [
            {"key_id": 12, "key": "filename.key" }
        ],
        "others": [
            {"xxx_id": 20, "key": "filename.key" }
        ]
    }
}

if __name__ == '__main__':

    try:
        jsonschema.validate(doc, schema)

    except jsonschema.exceptions.ValidationError as e:
        print("relative_path : ", e.relative_path)
        print("path          : ", e.path)
        print("relative_schema_path : ", e.relative_schema_path)
        print("schema_path   : ", e.schema_path)
        print("message       : ", e.message)

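        # The message reads "('a' was unexpected)" for a single extra key and
        # "('a', 'b' were unexpected)" for several, so accept both verbs.
        matchObj = re.match(r"(Additional properties are not allowed \()(.*)( (?:was|were) unexpected\))", e.message)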
        if matchObj:
            print("matchObj.group() : ", matchObj.group())
            print("matchObj.group(1) : ", matchObj.group(1))
            print("matchObj.group(2) : ", matchObj.group(2))
            print("matchObj.group(3) : ", matchObj.group(3))

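            # Recover the key names from the quoted, comma-separated list.
            keys = matchObj.group(2).replace("'", "").split(", ")
            print("keys: ", keys)
            # Walk down e.path to the object that holds the unexpected keys.
            # Note: newdoc aliases doc, so the deletions below modify doc in place.
            newdoc = doc
            for key in e.path:
                newdoc = newdoc[key]

            for key in keys:
                del newdoc[key]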

        else:
            print("No match!!")

        print("doc: ", doc)  # {'system': {'state': 'enabled', 'id': 5}}
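
Parsing the error message is fragile, since key names containing quotes or ", " would break the split. A minimal sketch of the same idea that reads the structured error attributes instead is shown here. It assumes the schema lists allowed keys only under `properties` (no `patternProperties`), uses a simplified version of the schema above, and `drop_additional_properties` is a hypothetical helper name, not part of jsonschema.

```python
import copy

import jsonschema


def drop_additional_properties(instance, schema):
    """Hypothetical helper: return a copy of instance without unexpected keys.

    Assumes allowed keys are declared only under "properties".
    """
    filtered = copy.deepcopy(instance)
    validator = jsonschema.Draft7Validator(schema)
    # Collect the errors first so nothing is mutated while validating.
    errors = list(validator.iter_errors(filtered))
    for error in errors:
        if error.validator != "additionalProperties":
            continue
        allowed = error.schema.get("properties", {})
        # error.instance is the offending object inside `filtered` itself,
        # so deleting from it edits the copy in place.
        for key in [k for k in error.instance if k not in allowed]:
            del error.instance[key]
    return filtered


schema = {
    "type": "object",
    "properties": {
        "system": {
            "type": "object",
            "properties": {
                "state": {"type": "string"},
                "id": {"type": "number"},
            },
            "additionalProperties": False,
        }
    },
}

doc = {
    "system": {
        "state": "enabled",
        "id": 5,
        "keys": [{"key_id": 12, "key": "filename.key"}],
        "others": [{"xxx_id": 20, "key": "filename.key"}],
    }
}

filtered_doc = drop_additional_properties(doc, schema)
print(filtered_doc)
```

Working on a deep copy leaves the original `doc` untouched, which differs from the snippet above, where the deletions happen in place.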

@Julian
Member

Julian commented Jul 26, 2020

Hi. I think you're either looking for the documentation on what information is available for each error, or asking about something like Seep, which was a POC for doing data transformation.
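
As a rough illustration of the per-error information mentioned here, the following sketch walks every error with `iter_errors` and records where it occurred and which keyword failed. The schema and document are made up for the example, and `Draft7Validator` is chosen as an assumption about which draft is wanted.

```python
from jsonschema import Draft7Validator

schema = {
    "type": "object",
    "properties": {
        "system": {
            "type": "object",
            "properties": {
                "state": {"type": "string"},
                "id": {"type": "number"},
            },
            "additionalProperties": False,
        }
    },
}

# Two problems: "id" has the wrong type and "keys" is not an allowed property.
doc = {"system": {"state": "enabled", "id": "five", "keys": []}}

validator = Draft7Validator(schema)
report = []
for error in validator.iter_errors(doc):
    # absolute_path: location of the failing value inside the document.
    # validator: the schema keyword that failed (e.g. "type").
    location = tuple(error.absolute_path)
    report.append((location, error.validator))
    print(location, error.validator, error.message)
```

Unlike `jsonschema.validate`, which raises a single error, `iter_errors` yields every failure, so a filter can see all invalid parts at once.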

@Julian closed this as completed Jul 26, 2020
@mauvilsa
Author

@willson-chen @Julian thank you both for the responses. With them I can see how the filtering could be implemented, even in a more general way than the code snippet shows. Regarding https://github.com/Julian/Seep , from what I can see it is a mostly empty project that was planned to be something much more general than just filtering, but was never worked on. Is this the case? If so, anyone who wants the filtering would need to implement it anyway.
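
One possible shape for such a more general filter, sketched under several assumptions: it deletes whatever value an error points at, treats `additionalProperties` errors as pointing at the unexpected keys, and re-validates until nothing more can be removed. `prune_invalid` and `max_rounds` are hypothetical names, `patternProperties` is not handled, and errors at the document root (for example a missing required key) cannot be fixed by deletion, so the result should be validated again before use.

```python
import copy

from jsonschema import Draft7Validator


def prune_invalid(instance, schema, max_rounds=10):
    """Hypothetical helper: return a copy of instance with failing parts removed."""
    result = copy.deepcopy(instance)
    validator = Draft7Validator(schema)
    for _ in range(max_rounds):
        targets = set()
        for error in validator.iter_errors(result):
            path = tuple(error.absolute_path)
            if error.validator == "additionalProperties":
                # The error sits on the parent object; target the extra keys.
                allowed = error.schema.get("properties", {})
                targets.update(path + (k,) for k in error.instance if k not in allowed)
            elif path:
                targets.add(path)
        if not targets:
            break
        # Reverse order removes children before parents and high list
        # indices before low ones, so the remaining paths stay valid.
        for path in sorted(targets, reverse=True):
            parent = result
            for key in path[:-1]:
                parent = parent[key]
            del parent[path[-1]]
    return result


schema = {
    "type": "object",
    "properties": {
        "note": {"type": "string"},
        "system": {
            "type": "object",
            "properties": {
                "state": {"type": "string"},
                "id": {"type": "number"},
                "tags": {"type": "array", "items": {"type": "integer"}},
            },
            "additionalProperties": False,
        },
    },
}

doc = {
    "note": 7,
    "system": {
        "state": "enabled",
        "id": 5,
        "tags": [1, "two", 3, "four"],
        "keys": [{"key_id": 12}],
    },
}

pruned = prune_invalid(doc, schema)
print(pruned)
```

Deleting a value can introduce new errors, such as a `required` failure on the parent, which is why the sketch loops instead of making a single pass.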

@Julian
Member

Julian commented Jul 29, 2020

Yes, Seep was mostly demonstrative for how such things could be done!
