Generate JSON schema from trafarets #118

Open
ErikOrjehag opened this issue Jun 29, 2021 · 4 comments

@ErikOrjehag

Hi,

Would it be possible to generate a JSON schema (https://json-schema.org/) from the trafaret definition? This would be really useful because the schema can be used in the IDE (VS Code in my case) to autocomplete the YAML file as you type (https://github.com/redhat-developer/yaml-language-server). Any thoughts on this?

@Deepwalker
Owner

Hello. As I see it, this is impossible. Trafaret is a function signature and a collection of combinators, nothing more. It does not represent a schema; it is a data transformer, not a validator.
But you can use the trafaret-schema package to combine an actual JSON schema with trafaret's transformation abilities. So in my opinion one should write the JSON schema first and use trafaret as a convenient tool for working with the described schema. Or maybe use some other JSON schema tool for Python, if it matches your requirements better.

@ErikOrjehag
Author

Thanks for the reply, and for the awesome package. I use trafaret together with trafaret_config and click (CLI framework). I was hoping it would be possible to go the other way around, from trafaret to JSON schema. Maybe not because it's the best way to do it, but because I already have a pretty large trafaret setup (parsing a special-purpose database query language that I invented and that uses YAML as its syntax) and I really don't want to spend too much time defining the whole schema again in JSON schema. You say trafaret is not a schema, but it kind of represents a schema, doesn't it? Would I be able to traverse my big trafaret from the root and produce a JSON schema from it?

I do have some recursion in my trafarets, to parse things like {"not": {"not": {"not": true}}} -> Not(Not(Not(true))), idk if that complicates things.
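
For reference, a recursive structure like the nested "not" above can be expressed in JSON Schema with a definition that references itself through $ref. This is only a minimal sketch, assuming an expression is either a boolean or an object with a single "not" key. The names NOT_SCHEMA, expr and matches_expr are made up for illustration, and the checker is a hand-written stand-in that mirrors this one schema, not a real JSON Schema validator.

```python
import json

# Hypothetical schema: an expression is a boolean, or {"not": <expression>}.
NOT_SCHEMA = {
    "$ref": "#/$defs/expr",
    "$defs": {
        "expr": {
            "oneOf": [
                {"type": "boolean"},
                {
                    "type": "object",
                    "additionalProperties": False,
                    # The definition refers back to itself, which is how
                    # JSON Schema expresses unbounded nesting.
                    "properties": {"not": {"$ref": "#/$defs/expr"}},
                    "required": ["not"],
                },
            ]
        }
    },
}


def matches_expr(value) -> bool:
    """Hand-written check mirroring NOT_SCHEMA (not a general validator)."""
    if isinstance(value, bool):
        return True
    if isinstance(value, dict) and set(value) == {"not"}:
        return matches_expr(value["not"])
    return False


if __name__ == "__main__":
    print(json.dumps(NOT_SCHEMA, indent=2))
    print(matches_expr({"not": {"not": {"not": True}}}))
```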

@Deepwalker
Owner

It is possible, of course: you can traverse trafarets like List and Dict, or custom ones that you know about. But it is not the most straightforward task. I can help if you have questions down the road.

@ErikOrjehag
Author

Turns out trafaret itself was the perfect tool for the job. I was able to create a trafaret that takes a trafaret as its input and transforms it into a JSON schema. It also handles recursive trafarets by putting $defs in the JSON schema:

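import typing as tp
import trafaret

def ref(defs: dict, trafaret_instance: trafaret.Trafaret, schema_fn: tp.Callable[[], dict]) -> dict:
    # Register the schema for this trafaret instance once and return a $ref pointing to it.
    trafaret_id = id(trafaret_instance)
    if trafaret_id not in defs:
        defs[trafaret_id] = 'lazy'  # placeholder: a recursive visit finds the id and returns a $ref
        defs[trafaret_id] = schema_fn()
    return {'$ref': f'#/$defs/{trafaret_id}'}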

def to_json_schema(traf: trafaret.Trafaret) -> dict:
    defs = {}  # Dictionary of JSON schema definitions ($defs), keyed by trafaret id

    json_schema = trafaret.Or()

    # Simple Float, String, Regexp, ToDateTime and Null schemas are small so we put the schemas
    # inline without creating definitions that we then reference.
    float_schema = trafaret.Type(trafaret.Float) >> (lambda x: {
        'type': 'number'
    })
    string_schema = trafaret.Type(trafaret.String) >> (lambda x: {
        'type': 'string'
    })
    regexp_schema = trafaret.Type(trafaret.Regexp) >> (lambda x: {
        'type': 'string',
        'pattern': x.regexp.pattern
    })
    to_datetime_schema = trafaret.Type(trafaret.ToDateTime) >> (lambda x: {
        'type': 'string'
    })
    null_schema = trafaret.Type(trafaret.Null) >> (lambda x: {'type': 'null'})

    # Special case And(t, Callable) is used when a function is put after the trafaret using
    # the >> operator, in this case we should evaluate only the trafaret and ignore the callable.
    and_schema = trafaret.Type(trafaret.And) >> (lambda x: json_schema.check(x.trafaret))

    # Enums can be large so we want to create definitions for them and reference the definitions.
    # VSCode auto completion works better for 'oneOf': 'const' instead of using 'enum' directly...
    enum_schema = trafaret.Type(trafaret.Enum) >> (lambda x: ref(defs, x, lambda: {
        'oneOf': [{'const': name} for name in x.variants]
    }))

    # Lists, Or and Dictionaries can contain themselves recursively, so we create definitions
    # for them that we can reference in order to prevent infinite recursion.
    list_schema = trafaret.Type(trafaret.List) >> (lambda x: ref(defs, x, lambda: {
        'type': 'array',
        'items': json_schema.check(x.trafaret),
        'minItems': x.min_length,
        # max_length is None for unbounded lists; JSON Schema needs an integer, so omit it then.
        **({} if x.max_length is None else {'maxItems': x.max_length}),
    }))

    or_schema = trafaret.Type(trafaret.Or) >> (lambda x: ref(defs, x, lambda: {
        'oneOf': [json_schema.check(t) for t in x.trafarets]
    }))

    dict_schema = trafaret.Type(trafaret.Dict) >> (lambda x: ref(defs, x, lambda: {
        'type': 'object',
        'additionalProperties': False,
        'properties': {
            k.name: json_schema.check(k.trafaret) for k in x.keys
        },
        'required': [k.name for k in x.keys if not k.optional]
    }))

    json_schema.trafarets = [
        float_schema, string_schema, regexp_schema, to_datetime_schema, null_schema,
        and_schema, enum_schema, list_schema, or_schema, dict_schema,
    ]

    res = json_schema.check(traf)

    schema = {
        **res,
        '$defs': defs,
    }

    return schema
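
To use the result in an editor, the schema can be serialized and written to a file. The following is a rough sketch using only the standard library. The helper name dump_schema, the file name query.schema.json and the example dict (a stand-in for the output of to_json_schema, with a made-up id of 1234) are all assumptions for illustration.

```python
import json


def dump_schema(schema: dict) -> str:
    """Serialize a schema dict to JSON text.

    json.dumps converts the integer keys under '$defs' (object ids) to
    strings, so they line up with the '#/$defs/<id>' references.
    """
    return json.dumps(schema, indent=2)


# Stand-in for the output of to_json_schema(); the id 1234 is made up.
example = {
    '$ref': '#/$defs/1234',
    '$defs': {
        1234: {'type': 'array', 'items': {'type': 'number'}, 'minItems': 0},
    },
}

if __name__ == '__main__':
    # Hypothetical output path; any location the editor can read works.
    with open('query.schema.json', 'w', encoding='utf-8') as f:
        f.write(dump_schema(example))
```

With yaml-language-server, a YAML file can then point at the schema through a modeline comment on its first line, for example `# yaml-language-server: $schema=./query.schema.json` (the path here is the hypothetical one from the sketch).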
