# JSON Schema Parser

Working notes for the JSON Schema / Traitlets package

**Goal**: write a function that, given a JSON Schema, will generate code for traitlets objects which provide equivalent validation.

## Links

- [JSON Schema Validation Information](http://json-schema.org/latest/json-schema-validation.html)
- [``jsonschema`` Python package](https://pypi.python.org/pypi/jsonschema)
- [Altair 1.0 parsing code](https://github.com/altair-viz/altair/blob/master/tools/generate_schema_interface.py)

## By-Hand Example

First confirm that we're doing things correctly with the ``jsonschema`` package:

In [1]:
import jsonschema

simple_schema = {
    "type": "object",
    "properties": {
        "foo": {"type": "string"},
        "bar": {"type": "number"}
    }
}

In [2]:
good_instance = {
    "foo": "hello world",
    "bar": 3.141592653,
}

In [3]:
bad_instance = {
    "foo" : 42,
    "bar" : "string"
}

In [4]:
# Should succeed
jsonschema.validate(good_instance, simple_schema)

In [5]:
# Should fail
try:
    jsonschema.validate(bad_instance, simple_schema)
except jsonschema.ValidationError as err:
    print(err)

42 is not of type 'string'

Failed validating 'type' in schema['properties']['foo']:
    {'type': 'string'}

On instance['foo']:
    42
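The message above reports only one of the two problems in ``bad_instance``, because ``jsonschema.validate`` raises a single error. As a minimal sketch, a validator object can list every failure via ``iter_errors``. The choice of ``Draft4Validator`` is an assumption here, since the schema does not declare a ``$schema`` version.

```python
import jsonschema

schema = {
    "type": "object",
    "properties": {
        "foo": {"type": "string"},
        "bar": {"type": "number"},
    },
}
instance = {"foo": 42, "bar": "string"}

# Draft4Validator is an assumed draft; any validator class exposes iter_errors
validator = jsonschema.Draft4Validator(schema)

# Map each failing property path to its error message
messages = {
    ".".join(str(part) for part in error.path): error.message
    for error in validator.iter_errors(instance)
}

for path in sorted(messages):
    print(f"{path}: {messages[path]}")
```

Both ``foo`` and ``bar`` should show up, which is the behavior the generated traitlets classes would ideally mirror one trait at a time.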


OK, now let's write a traitlets class that does the same thing:

In [6]:
import traitlets as T

class SimpleInstance(T.HasTraits):
    foo = T.Unicode()
    bar = T.Float()

In [7]:
# Should succeed

SimpleInstance(**good_instance)

<__main__.SimpleInstance at 0x106c9c048>

In [8]:
# Should fail

try:
    SimpleInstance(**bad_instance)
except T.TraitError as err:
    print(err)

The 'foo' trait of a SimpleInstance instance must be a unicode string, but a value of 42 <class 'int'> was specified.


## Roadmap

1. Start by recognizing all simple JSON types in the schema ("string", "number", "integer", "boolean", "null")

2. Next recognize objects containing simple types

3. Next recognize compound simple types (i.e. where ``type`` is a list of simple types)

4. Next recognize arrays & enums

5. Next recognize "$ref" definitions

6. Next recognize "anyOf", "oneOf", and "allOf" definitions. "anyOf" is essentially a traitlets Union; "oneOf" is a Union where exactly one option must match; "allOf" is essentially a composite object (not sure if traitlets has that...)

7. Catalog all validation keywords, then implement custom traitlets that support the validation keywords for each type.

8. Use [hypothesis](https://hypothesis.readthedocs.io) for testing?
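To make the middle of the roadmap concrete, here is a by-hand sketch of what the generated traits might look like for compound types, arrays, enums, and nullable values. The class name ``RoadmapExample``, its trait names, and the ``rejects`` helper are hypothetical; the mapping from schema to trait is one plausible choice rather than a settled design.

```python
import traitlets as T


class RoadmapExample(T.HasTraits):
    # {"type": ["string", "number"]} -> union of the two simple traits
    value = T.Union([T.Unicode(), T.Float()])
    # {"type": "array", "items": {"type": "string"}}
    tags = T.List(T.Unicode())
    # {"enum": ["a", "b"]}
    kind = T.Enum(["a", "b"])
    # {"type": ["string", "null"]} -> a simple trait that also allows None
    label = T.Unicode(allow_none=True)


def rejects(**kwargs):
    """Return True if constructing RoadmapExample raises a TraitError."""
    try:
        RoadmapExample(**kwargs)
    except T.TraitError:
        return True
    return False


ok = RoadmapExample(value=1.5, tags=["x", "y"], kind="a", label=None)
print(rejects(value=[1]), rejects(tags=["x", 1]), rejects(kind="c"))
```

"oneOf" and "allOf" are left out on purpose: they have no obvious built-in traitlets equivalent and would likely need custom trait types.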

### Differences

- By default, JSON Schema accepts keys/properties that the schema does not define (unless ``additionalProperties`` is false). Traitlets warns about undefined keys/properties, and in the future will raise an error.
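The jsonschema side of this difference can be checked directly. The sketch below validates one instance containing an undeclared key against two schemas that differ only in ``additionalProperties``; the names ``open_schema``, ``closed_schema``, and ``is_valid`` are made up for illustration.

```python
import jsonschema

open_schema = {
    "type": "object",
    "properties": {"foo": {"type": "string"}},
}
# Same schema, but undeclared properties are forbidden
closed_schema = dict(open_schema, additionalProperties=False)

instance = {"foo": "hello", "undeclared": 1}


def is_valid(instance, schema):
    """Return True if the instance validates against the schema."""
    try:
        jsonschema.validate(instance, schema)
    except jsonschema.ValidationError:
        return False
    return True


print(is_valid(instance, open_schema))
print(is_valid(instance, closed_schema))
```

A code generator may therefore need to look at ``additionalProperties`` when deciding how strict the generated class should be.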

### Interface

- root schema and all definitions should become their own ``T.HasTraits`` class
- Objects defined inline should also have their own class with a generated anonymous name
- Use Jinja templating; allow output to one file or multiple files with relative imports
- The root object *must* have ``"type": "object"``; this differs from jsonschema, which accepts a root schema of any type
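Giving each definition its own class means "$ref" strings have to be resolved against the root schema. A minimal sketch, assuming only local references of the form ``#/definitions/Name``, is shown below. ``resolve_ref`` is a hypothetical helper, and using the last path segment as the class name is just one possible convention.

```python
def resolve_ref(ref, context):
    """Return (name, subschema) for a local reference like '#/definitions/Foo'.

    ``context`` is the root schema that the reference is resolved against.
    """
    if not ref.startswith("#/"):
        raise NotImplementedError("only local '#/...' references are handled")
    # Undo JSON Pointer escaping: '~1' means '/', '~0' means '~'
    parts = [part.replace("~1", "/").replace("~0", "~")
             for part in ref[2:].split("/")]
    node = context
    for part in parts:
        node = node[part]
    return parts[-1], node


root = {
    "type": "object",
    "definitions": {
        "Point": {
            "type": "object",
            "properties": {"x": {"type": "number"}, "y": {"type": "number"}},
        }
    },
    "properties": {"origin": {"$ref": "#/definitions/Point"}},
}

name, subschema = resolve_ref(root["properties"]["origin"]["$ref"], root)
print(name, sorted(subschema["properties"]))
```

Remote references (URLs or other files) are deliberately rejected here; supporting them would be a separate design decision.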

### Testing

- Test cases should be an increasingly complicated set of JSON Schema objects, each paired with instances that should pass and instances that should fail. Perhaps store these in a JSON structure? (With a schema?)
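One possible shape for such a JSON structure is a list of cases, each holding a schema plus lists of valid and invalid instances. The key names ``schema``, ``valid``, and ``invalid`` are assumptions, as is the use of ``Draft4Validator``. The sketch only checks the cases against jsonschema itself, which would be a sanity check to run before comparing against generated traitlets code.

```python
import json

import jsonschema

# Hypothetical on-disk format, inlined here as a string
TEST_CASES = json.loads("""
[
  {
    "schema": {"type": "string"},
    "valid": ["hello", ""],
    "invalid": [42, null]
  },
  {
    "schema": {
      "type": "object",
      "properties": {
        "foo": {"type": "string"},
        "bar": {"type": "number"}
      }
    },
    "valid": [{"foo": "hello world", "bar": 3.14}],
    "invalid": [{"foo": 42, "bar": "string"}]
  }
]
""")


def check_case(case):
    """Return True if jsonschema agrees with the case's expectations."""
    validator = jsonschema.Draft4Validator(case["schema"])
    return (all(validator.is_valid(inst) for inst in case["valid"])
            and not any(validator.is_valid(inst) for inst in case["invalid"]))


for case in TEST_CASES:
    print(case["schema"].get("type"), check_case(case))
```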

## Some basic code

Let's try generating some traitlets classes for simple cases

In [16]:
import jinja2


OBJECT_TEMPLATE = """
{%- for import in cls.imports %}
{{ import }}
{%- endfor %}

class {{ cls.classname }}({{ cls.baseclass }}):
    {%- for (name, prop) in cls.wrapped_properties().items() %}
    {{ name }} = {{ prop.trait_code }}
    {%- endfor %}
"""

class JSONSchema(object):
    """A class to wrap JSON Schema objects and reason about their contents"""
    
    simple_types = ['string', 'number', 'integer', 'boolean', 'null']
    valid_types = simple_types + ['object', 'array']
    object_template = OBJECT_TEMPLATE
    
    def __init__(self, schema, context=None, parent=None, name=None):
        self.schema = schema
        self.context = context or schema
        self.parent = parent
        self.name = name
        
    def make_child(self, schema, name=None):
        """Wrap a sub-schema as a child object that shares this object's context"""
        return self.__class__(schema, context=self.context, parent=self, name=name)
        
    @property
    def type(self):
        # TODO: should the default type be considered object?
        return self.schema.get('type', 'object')
        
    def is_simple_type(self):
        return self.type in self.simple_types
    
    @property
    def properties(self):
        # TODO: raise an error if not type='object'?
        return self.schema.get('properties', {})
    
    @property
    def trait_code(self):
        type_dict = {'string': 'T.Unicode()',
                     'number': 'T.Float()',
                     'integer': 'T.Integer()',
                     'boolean': 'T.Bool()'}
        if self.type not in type_dict:
            raise NotImplementedError("No trait mapping for type {!r}".format(self.type))
        return type_dict[self.type]
    
    @property
    def classname(self):
        if self.name:
            return self.name
        elif self.context is self.schema:
            return "RootInstance"
        else:
            raise NotImplementedError("Anonymous class name")
            
    @property
    def baseclass(self):
        return "T.HasTraits"
    
    @property
    def imports(self):
        return ["import traitlets as T"]
    
    def wrapped_properties(self):
        """Return property dictionary wrapped as JSONSchema objects"""
        return {key: self.make_child(val)
                for key, val in self.properties.items()}
    
    def object_code(self):
        return jinja2.Template(self.object_template).render(cls=self)

### Trying it out...

In [17]:
code = JSONSchema(simple_schema).object_code()
print(code)


import traitlets as T

class RootInstance(T.HasTraits):
    foo = T.Unicode()
    bar = T.Float()


### Testing the result

In [18]:
exec(code)
RootInstance(**good_instance)

<__main__.RootInstance at 0x106c8bfd0>

In [29]:
# Context manager to make sure an error is raised

from contextlib import contextmanager

@contextmanager
def assert_raises(*errorclasses):
    the_error = None
    try:
        yield
    except errorclasses as err:
        the_error = err
    if not isinstance(the_error, errorclasses):
        names = {e.__name__ for e in errorclasses}
        raise AssertionError(f"Expression did not raise one of {names}")

In [30]:
exec(code)

In [31]:
# Good instance should validate correctly
RootInstance(**good_instance)

<__main__.RootInstance at 0x106fe8128>

In [32]:
# Bad instance should raise a TraitError
with assert_raises(T.TraitError):
    RootInstance(**bad_instance)

Seems to work 😀

We'll start with something like this in the package, and then build from there.
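As one possible next step, the type lookup in ``trait_code`` could grow to cover enums, arrays, and compound types. The standalone function below is a sketch of that idea, not the package's implementation: ``trait_code_for`` and ``SIMPLE_TRAITS`` are hypothetical names, and it only produces code strings, so it runs without traitlets installed.

```python
SIMPLE_TRAITS = {
    "string": "T.Unicode",
    "number": "T.Float",
    "integer": "T.Integer",
    "boolean": "T.Bool",
}


def trait_code_for(schema):
    """Return traitlets source code for a (sub)schema, as a string."""
    if "enum" in schema:
        return "T.Enum({!r})".format(list(schema["enum"]))
    typ = schema.get("type")
    if isinstance(typ, list):
        # Compound type: "null" becomes allow_none, the rest form a Union
        members = [t for t in typ if t != "null"]
        if not members:
            raise NotImplementedError("type list with no non-null members")
        inner = ", ".join(trait_code_for({"type": t}) for t in members)
        none = ", allow_none=True" if "null" in typ else ""
        return "T.Union([{}]{})".format(inner, none)
    if typ == "array":
        items = schema.get("items")
        # Only a single item schema is handled; tuple-style items are ignored
        inner = trait_code_for(items) if isinstance(items, dict) else ""
        return "T.List({})".format(inner)
    if typ in SIMPLE_TRAITS:
        return SIMPLE_TRAITS[typ] + "()"
    raise NotImplementedError("No trait mapping for type {!r}".format(typ))


print(trait_code_for({"type": ["string", "number"]}))
print(trait_code_for({"type": "array", "items": {"type": "boolean"}}))
```

Validation keywords such as ``minimum`` or ``maxLength`` are ignored here; per the roadmap, those would probably need custom traits.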