### Sets Project 1

In this project our goal is to validate one dictionary structure against a template dictionary.

A typical example of this might be working with JSON data inputs in an API. You are trying to validate the received JSON against some kind of template, to make sure it conforms to that template, i.e. all the keys and the structure are identical. The value types matter, but the values themselves do not: only the structure and the data type of each value are checked.

To keep things simple, we'll assume that values can be either single values (like an integer, string, etc.) or a dictionary, itself only containing single values or other dictionaries, recursively. In other words, we're not going to deal with lists as possible values. Also, to keep things simple, we'll assume that all keys are required, and that no extra keys are permitted.
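For context, received JSON is usually decoded with the standard library's `json` module before any validation happens. The short sketch below uses a made-up JSON payload to show that JSON objects decode to `dict` and scalars to Python types such as `int` and `str`, which is the shape our templates describe. JSON arrays decode to `list`, which this project deliberately leaves out.

```python
import json

# A made-up JSON payload, as it might arrive in an API request body
raw = '{"user_id": 100, "name": {"first": "John", "last": "Cleese"}, "tags": ["a"]}'

data = json.loads(raw)  # JSON object -> dict, number -> int/float, string -> str

print(type(data).__name__)                   # dict
print(type(data['user_id']).__name__)        # int
print(type(data['name']).__name__)           # dict
print(type(data['name']['first']).__name__)  # str
print(type(data['tags']).__name__)           # list (out of scope for this project)
```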

In practice we would not have these simplifying assumptions, and although we could definitely write this ourselves, there are many 3rd party libraries that already exist to do this (such as **jsonschema, marshmallow** and many more, some of which are covered in later notes).

For example, you might have this template:

In [1]:
template = {
    'user_id': int,
    'name': {
        'first': str,
        'last': str
    },
    'bio': {
        'dob': {
            'year': int,
            'month': int,
            'day': int
        },
        'birthplace': {
            'country': str,
            'city': str
        }
    }
}
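To make the nested structure concrete, here is a small sketch using a hypothetical `leaf_paths` generator (not something the project asks for). It walks a template like the one above and lists each leaf value's dotted path together with its expected type.

```python
def leaf_paths(template, prefix=''):
    """Yield (dotted_path, expected_type) for every non-dict value in template."""
    for key, value in template.items():
        path = f'{prefix}.{key}' if prefix else str(key)
        if isinstance(value, dict):
            # nested template: recurse, extending the dotted path
            yield from leaf_paths(value, path)
        else:
            # leaf: the value is the expected type itself (e.g. int, str)
            yield path, value


# Same structure as the template above
template = {
    'user_id': int,
    'name': {'first': str, 'last': str},
    'bio': {
        'dob': {'year': int, 'month': int, 'day': int},
        'birthplace': {'country': str, 'city': str},
    },
}

for path, expected in leaf_paths(template):
    print(f'{path}: {expected.__name__}')
```

Each printed path (for example `bio.dob.month: int`) corresponds to one leaf that a matching document must contain with a value of that type.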

So, a JSON document such as this would match the template:

In [2]:
john = {
    'user_id': 100,
    'name': {
        'first': 'John',
        'last': 'Cleese'
    },
    'bio': {
        'dob': {
            'year': 1939,
            'month': 11,
            'day': 27
        },
        'birthplace': {
            'country': 'United Kingdom',
            'city': 'Weston-super-Mare'
        }
    }
}

But this one would **not** match the template (missing key):

In [3]:
eric = {
    'user_id': 101,
    'name': {
        'first': 'Eric',
        'last': 'Idle'
    },
    'bio': {
        'dob': {
            'year': 1943,
            'month': 3,
            'day': 29
        },
        'birthplace': {
            'country': 'United Kingdom',
        }
    }
}

And neither would this one (wrong data type):

In [4]:
michael = {
    'user_id': 102,
    'name': {
        'first': 'Michael',
        'last': 'Palin'
    },
    'bio': {
        'dob': {
            'year': 1943,
            'month': 'May',
            'day': 5
        },
        'birthplace': {
            'country': 'United Kingdom',
            'city': 'Sheffield'
        }
    }
}

Write a function like this:

In [7]:
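def validate(data, template):
    # implement the validation, then return a (state, error) tuple:
    #   state: True if data matches template, False otherwise
    #   error: when False, a string describing the first error encountered;
    #          when True, the string can be empty
    state, error = True, ''  # placeholder values until implemented
    return state, error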

That should return this:
- validate(john, template) --> True, ''
- validate(eric, template) --> False, 'mismatched keys: bio.birthplace.city'
- validate(michael, template) --> False, 'bad type: bio.dob.month'

Better yet, use exceptions instead of return codes and strings!

#### Solution

In [42]:
def match_keys(data, valid, path):
    # dict.keys() returns set-like views, so set difference works directly
    data_keys = data.keys()
    valid_keys = valid.keys()
    
    extra_keys = data_keys - valid_keys    # in data, but not in the template
    missing_keys = valid_keys - data_keys  # in the template, but not in data
    
    if missing_keys or extra_keys:
        # sorted() gives a deterministic order in the message (sets are unordered)
        missing_msg = ('missing keys: ' +
                       ', '.join(sorted(path + '.' + str(key)
                                        for key in missing_keys))
                      ) if missing_keys else ''
        extras_msg = ('extra keys: ' +
                      ', '.join(sorted(path + '.' + str(key)
                                       for key in extra_keys))
                     ) if extra_keys else ''
        return False, ' '.join((missing_msg, extras_msg))
    else:
        return True, None
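The key comparison relies on `dict.keys()` returning a view object that supports set operations, which is what makes this a "sets" project. The small standalone sketch below (with made-up dictionaries) shows the operators involved. Note that the symmetric difference `^` returns all mismatched keys, missing or extra, in one step. That is one possible way to build the single 'mismatched keys' message from the problem statement.

```python
data = {'a': 1, 'b': 2, 'e': 5}
valid = {'a': int, 'b': int, 'c': int}

# Each operation on two key views returns an ordinary set
extra = data.keys() - valid.keys()       # in data but not in the template
missing = valid.keys() - data.keys()     # in the template but not in data
common = data.keys() & valid.keys()      # present in both
mismatched = data.keys() ^ valid.keys()  # symmetric difference: extra or missing

print(sorted(extra))       # ['e']
print(sorted(missing))     # ['c']
print(sorted(common))      # ['a', 'b']
print(sorted(mismatched))  # ['c', 'e']
```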

In [43]:
t = {'a': int, 'b': int, 'c': int, 'd': {}}
d = {'a': 'wrong type', 'b': 100, 'c':200, 'd': {'wrong', 'type'}}
is_ok, err_msg = match_keys(d, t, 'some.path')
print(is_ok, err_msg)

True None


In [44]:
d = {'a': None, 'b': None, 'c': None}
is_ok, err_msg = match_keys(d, t, 'some.path')
print(is_ok, err_msg)

False missing keys: some.path.d 


In [45]:
d = {'a': None, 'b': None, 'c': None, 'd': None, 'e': None}
is_ok, err_msg = match_keys(d, t, 'some.path')
print(is_ok, err_msg)

False  extra keys: some.path.e


In [46]:
d = {'a': None, 'b': None, 'c': None, 'e': None}
is_ok, err_msg = match_keys(d, t, 'some.path')
print(is_ok, err_msg)

False missing keys: some.path.d extra keys: some.path.e


This looks like it is working properly.

In [47]:
d = {'a': None, 'b': None, 'e': None, 'f': None}
is_ok, err_msg = match_keys(d, t, 'some.path')
print(is_ok, err_msg)

False missing keys: some.path.c, some.path.d extra keys: some.path.e, some.path.f


In [50]:
def match_types(data, template, path):
    for key, value in template.items():
        if isinstance(value, dict):
            template_type = dict
        else:
            template_type = value
        data_value = data.get(key, object())
        if not isinstance(data_value, template_type):
            err_msg = ('incorrect type: ' + path + '.' + key + 
                       ' -> expected ' + template_type.__name__ +
                       ', found ' + type(data_value).__name__)
            return False, err_msg
    return True, None

In [51]:
t = {'a': int, 'b': str, 'c': {'d': int}}
d = {'a': 100, 'b': 'test', 'c': {'some': 'value'}}
match_types(d, t, 'some.path')

(True, None)

In [52]:
d = {'a': 100, 'b': 'test', 'c': 'unexpected'}
match_types(d, t, 'some.path')

(False, 'incorrect type: some.path.c -> expected dict, found str')

In [53]:
d = {'a': 100, 'b': 200, 'c': {}}
match_types(d, t, 'some.path')

(False, 'incorrect type: some.path.b -> expected str, found int')

In [54]:
d = {'a': '100', 'b': 200, 'c': {}}
match_types(d, t, 'some.path')

(False, 'incorrect type: some.path.a -> expected int, found str')

In [57]:
def recurse_validate(data, template, path):
    is_ok, err_msg = match_keys(data, template, path)
    if not is_ok:
        return False, err_msg
    
    is_ok, err_msg = match_types(data, template, path)
    if not is_ok:
        return False, err_msg
    
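    # keys whose template value is a nested dict; a list keeps the template's
    # key order, so the "first error" reported is deterministic
    dictionary_type_keys = [key for key, value in template.items()
                            if isinstance(value, dict)]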
    
    for key in dictionary_type_keys:
        sub_path = path + '.' + str(key)
        sub_template = template[key]
        sub_data = data[key]
        is_ok, err_msg = recurse_validate(sub_data, sub_template, sub_path)
        if not is_ok:
            return False, err_msg
        
    return True, None
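`recurse_validate` stops at the first problem, which matches the "first error encountered" requirement. If you instead wanted to report every problem at once, one possible variant is the hypothetical `collect_errors` function sketched below. It assumes string keys, so that they can be sorted, and accumulates messages in a list while walking the whole structure.

```python
def collect_errors(data, template, path=''):
    """Return a list of every mismatch between data and template (not just the first)."""
    errors = []
    # Key problems at this level (assumes string keys, so sorting works)
    for key in sorted(template.keys() - data.keys()):
        errors.append(f'missing key: {path}.{key}')
    for key in sorted(data.keys() - template.keys()):
        errors.append(f'extra key: {path}.{key}')
    # Type problems, recursing into nested templates
    for key, expected in template.items():
        if key not in data:
            continue  # already reported as missing
        value = data[key]
        sub_path = f'{path}.{key}'
        if isinstance(expected, dict):
            if isinstance(value, dict):
                errors.extend(collect_errors(value, expected, sub_path))
            else:
                errors.append(f'incorrect type: {sub_path} -> expected dict, '
                              f'found {type(value).__name__}')
        elif not isinstance(value, expected):
            errors.append(f'incorrect type: {sub_path} -> expected '
                          f'{expected.__name__}, found {type(value).__name__}')
    return errors


user_template = {'id': int, 'name': {'first': str, 'last': str}}
sample = {'id': '7', 'name': {'first': 'Eric'}, 'extra': None}
for err in collect_errors(sample, user_template):
    print(err)
```

For the made-up `sample` above, this reports the extra key, the wrong type of `id` and the missing `name.last` in a single pass.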

In [72]:
is_ok, err_msg = recurse_validate(john, template, 'root')
print(is_ok, err_msg)

True None


In [60]:
is_ok, err_msg = recurse_validate(eric, template, 'root')
print(is_ok, err_msg)

False missing keys: root.bio.birthplace.city 


In [61]:
is_ok, err_msg = recurse_validate(michael, template, 'root')
print(is_ok, err_msg)

False incorrect type: root.bio.dob.month -> expected int, found str


We are technically done now!

In [69]:
def validate(data, template):
    return recurse_validate(data, template, '')

In [70]:
persons = ((john, 'John'), (eric, 'Eric'), (michael, 'Michael'))

In [73]:
for person, name in persons:
    is_ok, err_msg = validate(person, template)
    print(f'{name}: valid={is_ok}: {err_msg}')

John: valid=True: None
Eric: valid=False: missing keys: .bio.birthplace.city 
Michael: valid=False: incorrect type: .bio.dob.month -> expected int, found str
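Because `validate` starts the recursion with an empty path, every message above begins with a leading dot (`.bio.birthplace.city`), whereas the target output in the problem statement has none (`bio.birthplace.city`). One way to close that gap is the hypothetical `join_path` helper sketched below, which adds the separator only when the path is non-empty. The three helper functions would then call `join_path(path, key)` wherever they currently concatenate `path + '.'` with the key.

```python
def join_path(path, key):
    """Append key to a dotted path, adding the dot only below the root."""
    return f'{path}.{key}' if path else str(key)


print(join_path('', 'bio'))                 # bio
print(join_path('bio', 'birthplace'))       # bio.birthplace
print(join_path('bio.birthplace', 'city'))  # bio.birthplace.city
```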


In [74]:
class SchemaError(Exception):
    pass

In [76]:
def validate(data, template):
    is_ok, err_msg = recurse_validate(data, template, '')
    if not is_ok:
        raise SchemaError(err_msg)
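This version still uses return codes internally and only converts them to an exception at the top level. A fully exception-based alternative is the hypothetical `validate_strict` sketched below, which redefines `SchemaError` so the snippet runs on its own. It raises at the point where the problem is found and lets the exception propagate up through the recursion. Unlike `match_keys`, it reports missing and extra keys in separate exceptions.

```python
class SchemaError(Exception):
    """Raised when data does not match the template."""


def validate_strict(data, template, path=''):
    """Raise SchemaError at the first mismatch; return None if data is valid."""
    missing = sorted(str(k) for k in template.keys() - data.keys())
    extra = sorted(str(k) for k in data.keys() - template.keys())
    if missing:
        raise SchemaError('missing keys: ' + ', '.join(f'{path}.{k}' for k in missing))
    if extra:
        raise SchemaError('extra keys: ' + ', '.join(f'{path}.{k}' for k in extra))
    for key, expected in template.items():
        value, sub_path = data[key], f'{path}.{key}'
        expected_type = dict if isinstance(expected, dict) else expected
        if not isinstance(value, expected_type):
            raise SchemaError(f'incorrect type: {sub_path} -> expected '
                              f'{expected_type.__name__}, found {type(value).__name__}')
        if isinstance(expected, dict):
            # nested template: any SchemaError raised below propagates to the caller
            validate_strict(value, expected, sub_path)


user_template = {'id': int, 'name': {'first': str, 'last': str}}
try:
    validate_strict({'id': 1, 'name': {'first': 'Eric', 'last': 3}}, user_template)
except SchemaError as ex:
    print('Validation failed:', ex)
```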

In [79]:
validate(john, template)

In [80]:
validate(eric, template)

SchemaError: missing keys: .bio.birthplace.city 

In [81]:
validate(michael, template)

SchemaError: incorrect type: .bio.dob.month -> expected int, found str

In [83]:
try:
    for person, name in persons:
        validate(person, template)
except SchemaError as ex:
    print('Validation failed', str(ex)) 

Validation failed missing keys: .bio.birthplace.city 
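Note that in the cell above, the loop stops at the first failure: once Eric's record raises, Michael's record is never checked. If every record should be checked, the `try` can move inside the loop. The self-contained sketch below does this. It uses a hypothetical stand-in `validate_keys_only` (top-level keys only, written purely so the snippet runs alone) and made-up records.

```python
class SchemaError(Exception):
    pass


def validate_keys_only(data, template):
    # Hypothetical stand-in for validate(): checks top-level keys only,
    # just so this snippet runs on its own
    missing = template.keys() - data.keys()
    if missing:
        raise SchemaError('missing keys: ' + ', '.join(sorted(missing)))


ab_template = {'a': int, 'b': int}
records = (({'a': 1, 'b': 2}, 'first'), ({'a': 1}, 'second'), ({}, 'third'))

failures = {}
for record, name in records:
    try:
        validate_keys_only(record, ab_template)
    except SchemaError as ex:
        failures[name] = str(ex)  # remember the failure, then keep going
    else:
        print(f'{name}: valid')

for name, msg in failures.items():
    print(f'{name}: validation failed: {msg}')
```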
