### Project 1

In this project our goal is to validate one dictionary structure against a template dictionary.

A typical example is working with JSON input in an API: you want to validate the received JSON against a template to make sure it conforms to that template, i.e. that the keys and nesting structure are identical and that each value has the expected data type. The values themselves do not matter, only the structure and the value types.

To keep things simple we'll assume that values can be either single values (like an integer, string, etc), or a dictionary, itself only containing single values or other dictionaries, recursively. In other words, we're not going to deal with lists as possible values. Also, to keep things simple, we'll assume that all keys are **required**, and that no extra keys are permitted.

In practice we would not make these simplifying assumptions, and although we could certainly write the validation ourselves, many third-party libraries already do this (such as `jsonschema`, `marshmallow`, and many more, some of which I'll cover lightly in later videos).

For example you might have this template:

In [27]:
template = {
    'user_id': int,
    'name': {
        'first': str,
        'last': str
    },
    'bio': {
        'dob': {
            'year': int,
            'month': int,
            'day': int
        },
        'birthplace': {
            'country': str,
            'city': str
        }
    }
}

So, a JSON document such as this would match the template:

In [28]:
john = {
    'user_id': 100,
    'name': {
        'first': 'John',
        'last': 'Cleese'
    },
    'bio': {
        'dob': {
            'year': 1939,
            'month': 11,
            'day': 27
        },
        'birthplace': {
            'country': 'United Kingdom',
            'city': 'Weston-super-Mare'
        }
    }
}

But this one would **not** match the template (missing key):

In [6]:
eric = {
    'user_id': 101,
    'name': {
        'first': 'Eric',
        'last': 'Idle'
    },
    'bio': {
        'dob': {
            'year': 1943,
            'month': 3,
            'day': 29
        },
        'birthplace': {
            'country': 'United Kingdom'
        }
    }
}

And neither would this one (wrong data type):

In [29]:
michael = {
    'user_id': 102,
    'name': {
        'first': 'Michael',
        'last': 'Palin'
    },
    'bio': {
        'dob': {
            'year': 1943,
            'month': 'May',
            'day': 5
        },
        'birthplace': {
            'country': 'United Kingdom',
            'city': 'Sheffield'
        }
    }
}

Write a function such as this:

In [5]:
def validate(data, template):
    # implement
    # and return True/False
    # in the case of False, return a string describing 
    # the first error encountered
    # in the case of True, string can be empty
    return state, error

That should return this:
* `validate(john, template) --> True, ''`
* `validate(eric, template) --> False, 'mismatched keys: bio.birthplace.city'`
* `validate(michael, template) --> False, 'bad type: bio.dob.month'`

Better yet, use exceptions instead of return codes and strings!

In [15]:
def validate(data, template):
    
    return state, error

def json_walk(data, key=None):
    if key is None:
        key = ''
    for k, v in data.items():
        if isinstance(v, dict):
            json_walk(v, key= key + k)
        else:
            print(key+k, v)
            data.pop(k)

json_walk(john)

user_id 100


RuntimeError: dictionary changed size during iteration
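
The `RuntimeError` above comes from calling `data.pop(k)` while the loop is still iterating over `data.items()`. A minimal sketch of a walk that reads the dictionary without mutating it is shown here; the name `walk_leaves` and the dotted path format are assumptions for illustration, not part of the original notebook.

```python
def walk_leaves(data, path=''):
    """Yield (dotted_path, value) for every non-dict value, without mutating data."""
    for key, value in data.items():
        full_path = f'{path}.{key}' if path else str(key)
        if isinstance(value, dict):
            # Recurse into nested dictionaries, extending the path
            yield from walk_leaves(value, full_path)
        else:
            yield full_path, value


sample = {'user_id': 100, 'name': {'first': 'John', 'last': 'Cleese'}}
for leaf_path, leaf_value in walk_leaves(sample):
    print(leaf_path, leaf_value)
# user_id 100
# name.first John
# name.last Cleese
```

Because nothing is removed during iteration, the input dictionary is left intact and can be validated again afterwards.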

In [39]:
def match_keys(data, valid, path):
    # Key views support set operations directly
    data_keys = data.keys()
    valid_keys = valid.keys()

    extra_keys = data_keys - valid_keys
    missing_keys = valid_keys - data_keys

    if missing_keys or extra_keys:
        missing_msg = (
            'missing_keys: ' + ', '.join(path + '/' + str(key) for key in missing_keys)
            if missing_keys else ''
        )
        extras_msg = (
            'extra_keys: ' + ', '.join(path + '/' + str(key) for key in extra_keys)
            if extra_keys else ''
        )
        return False, ' '.join((missing_msg, extras_msg))
    else:
        return True, None

In [40]:
is_ok, err_msg = match_keys(michael, template, '')
print(is_ok, err_msg)

True None
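
`match_keys` relies on the fact that dictionary key views behave like sets, so extra and missing keys can be computed with plain set difference. The small sketch below uses made-up dictionaries (`data` and `valid` here are illustrative only) to show the operations in isolation.

```python
data = {'a': 1, 'b': 2, 'x': 3}
valid = {'a': int, 'b': int, 'c': int}

# Key views support set operations directly; no need to call set() first
extra_keys = data.keys() - valid.keys()
missing_keys = valid.keys() - data.keys()
common_keys = data.keys() & valid.keys()

print(extra_keys)            # {'x'}
print(missing_keys)          # {'c'}
print(sorted(common_keys))   # ['a', 'b']
```

The results are ordinary sets, so their iteration order is arbitrary. If a stable order in the error message matters, wrapping the keys in `sorted()` before joining them is one option.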


In [41]:
d = dict(zip('abcd', range(1, 5)))
is_ok, err_msg = match_keys(d, template, '')
print(is_ok, err_msg)

False missing_keys: /bio, /name, /user_id extra_keys: /d, /b, /c, /a


In [54]:
def match_types(data, template, path):
    for key, value in template.items():
        # A nested template means the data value must itself be a dictionary
        if isinstance(value, dict):
            template_type = dict
        else:
            template_type = value
        # A missing key falls back to a bare object(), reported as 'found object'
        data_value = data.get(key, object())
        if not isinstance(data_value, template_type):
            err_msg = ('incorrect_type: ' + path + '/' + str(key) +
                       ' -> expected ' + template_type.__name__ +
                       ', found ' + type(data_value).__name__)
            return False, err_msg
    return True, None

In [55]:
is_ok, err_msg = match_types(john, template, '')
print(is_ok, err_msg)

True None


In [62]:
t = dict(a=int, b=str, c=dict(d=int))
d = dict(a=100, d=object(), c=dict(f=124))

is_ok, err_msg = match_types(d, t, '')
print(is_ok, err_msg)

False incorrect_type: /b -> expected str, found object


In [63]:
t = dict(a=int, b=str, c=dict(d=int))
d = dict(a=100, b='abc', c=object())

is_ok, err_msg = match_types(d, t, '/path')
print(is_ok, err_msg)

False incorrect_type: /path/c -> expected dict, found object
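
One caveat of checking types with `isinstance` is that it also accepts subclasses, and in Python `bool` is a subclass of `int`. A value of `True` would therefore pass for a template entry of `int`. Whether that is acceptable is a design choice; the sketch below shows the behavior and a stricter alternative, where `is_exact_type` is a hypothetical helper and not part of the solution above.

```python
print(isinstance(True, int))   # True: bool is a subclass of int


def is_exact_type(value, expected_type):
    # Hypothetical stricter check: the value's type must be exactly expected_type
    return type(value) is expected_type


print(is_exact_type(True, int))   # False
print(is_exact_type(5, int))      # True
print(is_exact_type(True, bool))  # True
```

The stricter check would also reject legitimate subclasses (for example an `OrderedDict` where `dict` is expected), so it trades flexibility for precision.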


In [64]:
def recurse_validate(data, template, path=''):
    # Check keys first so the type check can assume every key is present
    is_ok, err_msg = match_keys(data, template, path)
    if not is_ok:
        return False, err_msg

    is_ok, err_msg = match_types(data, template, path)
    if not is_ok:
        return False, err_msg

    # Keys whose template value is a nested dictionary need a recursive check
    dictionary_type_keys = {key for key, value in template.items()
                            if isinstance(value, dict)}

    for key in dictionary_type_keys:
        sub_path = path + '/' + str(key)
        sub_template = template[key]
        sub_data = data[key]
        is_ok, err_msg = recurse_validate(sub_data, sub_template, sub_path)
        if not is_ok:
            return False, err_msg

    return True, None

In [67]:
is_ok, err_msg = recurse_validate(eric, template, '')
print(is_ok, err_msg)

False missing_keys: /bio/birthplace/city 


In [68]:
is_ok, err_msg = recurse_validate(michael, template, '')
print(is_ok, err_msg)

False incorrect_type: /bio/dob/month -> expected int, found str


In [69]:
def validate(data, template):
    return recurse_validate(data, template, '')

In [72]:
persons = ((john, 'John'), (eric, 'Eric'), (michael, 'Michael'))


In [79]:
for person, name in persons:
    is_ok, err_msg = validate(person, template)
    print(f'{name}: valid={is_ok} : {err_msg}')

John: valid=True : None
Eric: valid=False : missing_keys: /bio/birthplace/city 
Michael: valid=False : incorrect_type: /bio/dob/month -> expected int, found str


In [80]:
class SchemaError(Exception):
    pass

In [81]:
def validate(data, template):
    is_ok, err_msg = recurse_validate(data, template, '')
    if not is_ok:
        raise SchemaError(err_msg)

In [84]:
for person, name in persons:
    validate(person, template)

SchemaError: missing_keys: /bio/birthplace/city 

In [86]:
for person, name in persons:
    try:
        validate(person, template)
    except SchemaError as ex:
        print('Validation failed', str(ex))

Validation failed missing_keys: /bio/birthplace/city 
Validation failed incorrect_type: /bio/dob/month -> expected int, found str


In [87]:
class SchemaError(Exception):
    pass

class SchemaKeyMismatch(SchemaError):
    pass

class SchemaTypeMismatch(SchemaError, TypeError):
    pass

In [88]:
def match_keys(data, valid, path):
    # Key views support set operations directly
    data_keys = data.keys()
    valid_keys = valid.keys()

    extra_keys = data_keys - valid_keys
    missing_keys = valid_keys - data_keys

    if missing_keys or extra_keys:
        missing_msg = (
            'missing_keys: ' + ', '.join(path + '/' + str(key) for key in missing_keys)
            if missing_keys else ''
        )
        extras_msg = (
            'extra_keys: ' + ', '.join(path + '/' + str(key) for key in extra_keys)
            if extra_keys else ''
        )
        raise SchemaKeyMismatch(' '.join((missing_msg, extras_msg)))
    

In [89]:
def match_types(data, template, path):
    for key, value in template.items():
        # A nested template means the data value must itself be a dictionary
        if isinstance(value, dict):
            template_type = dict
        else:
            template_type = value
        # A missing key falls back to a bare object(), reported as 'found object'
        data_value = data.get(key, object())
        if not isinstance(data_value, template_type):
            err_msg = ('incorrect_type: ' + path + '/' + str(key) +
                       ' -> expected ' + template_type.__name__ +
                       ', found ' + type(data_value).__name__)
            raise SchemaTypeMismatch(err_msg)

In [91]:
def recurse_validate(data, template, path=''):
    # Either call raises on the first problem found, so no return codes are needed
    match_keys(data, template, path)
    match_types(data, template, path)

    # Keys whose template value is a nested dictionary need a recursive check
    dictionary_type_keys = {key for key, value in template.items()
                            if isinstance(value, dict)}

    for key in dictionary_type_keys:
        sub_path = path + '/' + str(key)
        sub_template = template[key]
        sub_data = data[key]
        recurse_validate(sub_data, sub_template, sub_path)

In [92]:
def validate(data, template):
    recurse_validate(data, template, '')

In [93]:
validate(john, template)

In [94]:
validate(eric, template)

SchemaKeyMismatch: missing_keys: /bio/birthplace/city 

In [95]:
validate(michael, template)

SchemaTypeMismatch: incorrect_type: /bio/dob/month -> expected int, found str

In [96]:
try:
    validate(eric, template)
except SchemaError as ex:
    print(ex)

missing_keys: /bio/birthplace/city 


In [98]:
try:
    validate(michael, template)
except TypeError as ex:
    print(ex)

incorrect_type: /bio/dob/month -> expected int, found str


In [99]:
try:
    validate(michael, template)
except SchemaError as ex:
    print(ex)

incorrect_type: /bio/dob/month -> expected int, found str


In [100]:
try:
    validate(michael, template)
except SchemaTypeMismatch as ex:
    print(ex)

incorrect_type: /bio/dob/month -> expected int, found str


In [103]:
try:
    validate(michael, template)
except SchemaKeyMismatch as ex:
    print('handling a key mismatch exception', ex)
except SchemaTypeMismatch as ex:
    print('handling a type mismatch exception', ex)
except SchemaError as ex:
    print('handling some general schema exception', ex)
except TypeError as ex:
    print('handling some general type exception', ex)

handling a type mismatch exception incorrect_type: /bio/dob/month -> expected int, found str
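
The order of the `except` clauses matters here: Python runs the first clause whose class matches the raised exception, so the most specific classes have to come first. The sketch below redefines two of the exception classes so it runs on its own, and uses a hypothetical `classify` helper to show what happens when the general clause is listed first.

```python
class SchemaError(Exception):
    pass


class SchemaTypeMismatch(SchemaError, TypeError):
    pass


def classify(exc):
    """Return a label for the first except clause that handles exc."""
    try:
        raise exc
    except SchemaError:
        # Listed first, so it also catches every SchemaError subclass
        return 'general schema error'
    except SchemaTypeMismatch:
        # Never reached: SchemaError above already matches
        return 'type mismatch'
    except TypeError:
        return 'general type error'


print(classify(SchemaTypeMismatch('bad type')))  # general schema error
print(classify(TypeError('plain')))              # general type error
print([cls.__name__ for cls in SchemaTypeMismatch.__mro__])
# ['SchemaTypeMismatch', 'SchemaError', 'TypeError', 'Exception', 'BaseException', 'object']
```

The method resolution order printed at the end shows why a `SchemaTypeMismatch` can be caught as either a `SchemaError` or a `TypeError`: both appear among its base classes.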
