### Custom JSON Decoding

#### Loading JSON

We have seen how to serialize Python objects to JSON

Now we need to look at deserializing JSON to Python objects -> load / loads

We can handle standard JSON data types "out of the box", but we should go beyond that to handle new types too!

For instance, if we had:

In [1]:
j = '{"createdAt:": "2020-12-31T23:59:59"}'

And we wanted to deserialize this and get the (obvious) datetime object, we would only get a string!

In [2]:
import json
d = json.loads(j)
print(d)

{'createdAt:': '2020-12-31T23:59:59'}


#### One Approach

Use some custom encoding scheme to define both the value and the type of some entry in the JSON file

For example, when encoding a timestamp we could di it as follows:

In [3]:
j = '''
    { "createdAt:":
        {
            "objecttype": "isodatetime",
            "value" : "2020-12-31T23:59:59"
        }
    }
'''

Now, we can loat the JSON string into a Python dictionary
- and iteratre through dictionary (recursively), to find object with an *objecttype == isodatetime*
  - then replace createdAt with the converted timestamp

But this is very tedious, we need to load JSON, iterate recursively through the dictionary, and convert as needed

#### A Slight Improvement

load and loads have an argument namged object_hook
- loads(j_string, object_hook=my_func)
 - my_func is called for every object in the JSON data

For example:

In [10]:
j = '''
    { 
        "a": 1.0,
        "b": {
            "sub1": [1, 2, 3],
            "sub2": {
                "x": 100,
                "y": 200
            }
        }
    }
'''

loads first parses JSON into a dictionary

object_hook will call for every dictionary (object) in the dictionary
- b dictionary
- sub2 dictionary
- root dictionary (called last)

That dictionary is then replaced by return value of my_func (it basically handles the recursion for us)

#### Schemas
Deserializing custom JSON types and object is difficult
- in general, we need to know the structure of the JSON data in order to custom deserialize
- this is referred to as the schema
 - a pre-defined agreement on how the JSON is going to be structured or serialized
- If JSON has a pre-determined schema, then we can handle custom deserialization
- schema might be for the entire JSON, or for a sub-component only

For example:

In [5]:
{ "createdAt:":
    {
        "objecttype": "isodatetime",
        "value" : "2020-12-31T23:59:59"
    }
}

{'createdAt:': {'objecttype': 'isodatetime', 'value': '2020-12-31T23:59:59'}}

#### Overriding Basic Type Serializations

Notice that **object_hook** only allows us to customize deserialization of objects

What about numbers? -> by default floats for real numbers and ints for whole numbers

What if we want Decimal instead of float, or binary representation for integers?

- We can override the way these data types are handled by using some extra arguments in load/loads
 - parse_float
 - parse_int
 - parse_constant

The above arguments provide a custom callable which has a single argument. The argument will be the original string in the JSON. It then returns the parsed value

Note that there are no overrides for strings!

Example

In [11]:
from decimal import Decimal

def make_decimal(arg):
    return Decimal(arg)

print(json.loads(j, parse_float=make_decimal))

{'a': Decimal('1.0'), 'b': {'sub1': [1, 2, 3], 'sub2': {'x': 100, 'y': 200}}}


Notice that a is now a Decimal type

#### Another argument - object_pairs_hook

- is related to object_hook
- we cannot use both at the same time (if both are specified, then object_hook is ignored)


object_hook passes the deserialized dictionary to the callable
- there is no guarantee of the order of elements in the dictionary

What is order of elements in JSON is important? -> lists perserve order
- instead of callable receiving a dictionary it receives a list of the key/value pairs
- key/value pairs are provided as a tuple with two elements

- object_hook -> {"a': 1, "b": 2}
- object_pairs_hook -> [("a", 1), ("b",2)]

#### Mixing Basic Type Overrides and Object Hooks

We can specify both parse_int, parse_float, ... etc and object_hook 

Remember that object_hook (and object_pairs_hook) callables receive a parsed object

This means that the parse_...(if specified) is used first, before we receive the parsed object in the hooks

#### Code Examples

In [12]:
import json

In [14]:
j = '''
{
    "name": "Python",
    "age": 27,
    "versions": ["2.x", "3.x"]
}
'''

In [15]:
json.loads(j)

{'name': 'Python', 'age': 27, 'versions': ['2.x', '3.x']}

In [16]:
p = '''
{
    "time": "2018-10-21T09:14:00",
    "message": "created this json string"
}
'''

In [17]:
json.loads(p)

{'time': '2018-10-21T09:14:00', 'message': 'created this json string'}

In [18]:
p = '''
{
    "time": {
        "objecttype": "datetime",
        "value": "2018-10-21T09:14:00"
        },
    "message": "created this json string"
}
'''

In [19]:
d = json.loads(p)

In [21]:
from pprint import pprint
pprint(d)

{'message': 'created this json string',
 'time': {'objecttype': 'datetime', 'value': '2018-10-21T09:14:00'}}


In [22]:
from datetime import datetime

In [24]:
for key, value in d.items():
    if(isinstance(value, dict) and
      "objecttype" in value and
      value["objecttype"] == "datetime"
      ):
        d[key] = datetime.strptime(value['value'], '%Y-%m-%dT%H:%M:%S')

In [25]:
d

{'time': datetime.datetime(2018, 10, 21, 9, 14),
 'message': 'created this json string'}

In [26]:
j = '''
{
    "cake": "yummy chocolate cake",
    "myShare" :{
    "objecttype": "fraction",
    "numerator": 1,
    "denominator": 8
    }
}
'''

In [27]:
d = json.loads(j)

In [28]:
d

{'cake': 'yummy chocolate cake',
 'myShare': {'objecttype': 'fraction', 'numerator': 1, 'denominator': 8}}

In [29]:
from fractions import Fraction

for key, value in d.items():
    if(isinstance(value, dict) and
      "objecttype" in value and
      value["objecttype"] == "fraction"
      ):
        numerator = value['numerator']
        denominator = value['denominator']
        d[key] = Fraction(numerator, denominator)

In [30]:
d

{'cake': 'yummy chocolate cake', 'myShare': Fraction(1, 8)}

In [31]:
def custom_decoder(arg):
    print('decoding: ', arg)
    return arg

In [33]:
j = '''
{
    "a": 1,
    "b": 2,
    "c": {
        "c.1": 1,
        "c.2": 2,
        "c.3": {
            "c3.1": 1,
            "c3.2": 2
        }
    }
        
}
'''

In [34]:
d = json.loads(j, object_hook=custom_decoder)

decoding:  {'c3.1': 1, 'c3.2': 2}
decoding:  {'c.1': 1, 'c.2': 2, 'c.3': {'c3.1': 1, 'c3.2': 2}}
decoding:  {'a': 1, 'b': 2, 'c': {'c.1': 1, 'c.2': 2, 'c.3': {'c3.1': 1, 'c3.2': 2}}}


In [35]:
j = '''
{
    "time": {
        "objecttype": "datetime",
        "value": "2018-10-21T09:14:00"
        },
    "message": "created this json string"
}
'''

In [54]:
def custom_decoder(arg):
    if "objecttype" in arg and arg['objecttype'] == 'datetime':
        return datetime.strptime(arg['value'], '%Y-%m-%dT%H:%M:%S')
    else:
        return arg

In [55]:
custom_decoder(dict(objecttype="datetime", value="2018-10-01T13:30:45"))

datetime.datetime(2018, 10, 1, 13, 30, 45)

In [56]:
custom_decoder({'a': 1})

{'a': 1}

In [57]:
json.loads(j, object_hook=custom_decoder)

{'time': datetime.datetime(2018, 10, 21, 9, 14),
 'message': 'created this json string'}

In [58]:
j = '''
{
    "times": {
        "created": {
            "objecttype": "datetime",
            "value": "2018-10-21T09:14:15"
            },
        "updated": {
            "objecttype": "datetime",
            "value": "2018-10-22T10:00:05"
            }
        },
        "message": "log message here..."
}
'''

In [59]:
json.loads(j, object_hook=custom_decoder)

{'times': {'created': datetime.datetime(2018, 10, 21, 9, 14, 15),
  'updated': datetime.datetime(2018, 10, 22, 10, 0, 5)},
 'message': 'log message here...'}

In [60]:
def custom_decoder(arg):
    ret_value = arg
    if 'objecttype' in arg:
        if arg['objecttype'] == 'datetime':
            ret_value = datetime.strptime(arg['value'], '%Y-%m-%dT%H:%M:%S')
        elif arg['objecttype'] == 'fraction':
            ret_value = Fraction(arg['numerator'], arg['denominator'])
    return ret_value

In [61]:
def custom_decoder(arg):
    if 'objecttype' in arg:
        if arg['objecttype'] == 'datetime':
            return datetime.strptime(arg['value'], '%Y-%m-%dT%H:%M:%S')
        elif arg['objecttype'] == 'fraction':
            return Fraction(arg['numerator'], arg['denominator'])
        return arg
    return arg

In [62]:
j = '''
{
    "cake": "yummy chocolate cake",
    "myShare" :{
    "objecttype": "fraction",
    "numerator": 1,
    "denominator": 8
    },
    "eaten": {
        "at": {
            "objecttype": "datetime",
            "value": "2018-10-21T21:30:00"
            },
        "time_taken": "30 seconds"
    }
}
'''

In [63]:
json.loads(j, object_hook=custom_decoder)

{'cake': 'yummy chocolate cake',
 'myShare': Fraction(1, 8),
 'eaten': {'at': datetime.datetime(2018, 10, 21, 21, 30),
  'time_taken': '30 seconds'}}

In [64]:
class Person:
    def __init__(self, name, ssn):
        self.name = name
        self.ssn = ssn
        
    def __repr__(self):
        return f'Person(name={self.name}, ssn={self.ssn})'

In [65]:
j = '''
{
    "accountHolder": {
        "objecttype": "person",
        "name": "Eric Idle",
        "ssn": 100
    },
    "created": {
        "objecttype": "datetime",
        "value": "2018-10-21T03:00:00"
    }
}
'''

In [66]:
def custom_decoder(arg):
    if 'objecttype' in arg:
        if arg['objecttype'] == 'datetime':
            return datetime.strptime(arg['value'], '%Y-%m-%dT%H:%M:%S')
        elif arg['objecttype'] == 'fraction':
            return Fraction(arg['numerator'], arg['denominator'])
        elif arg['objecttype'] == 'person':
            return Person(arg['name'], arg['ssn'])
        return arg
    return arg

In [67]:
d = json.loads(j, object_hook=custom_decoder)

In [68]:
d

{'accountHolder': Person(name=Eric Idle, ssn=100),
 'created': datetime.datetime(2018, 10, 21, 3, 0)}

In [69]:
class Person:
    def __init__(self, name, ssn):
        self.name = name
        self.ssn = ssn
        
    def __repr__(self):
        return f'Person(name={self.name}, ssn={self.ssn})'
    
    def toJSON(self):
        return dict(objecttype='person', name=self.name, ssn=self.ssn)

In [78]:
def custom_decoder(arg):
    if 'objecttype' in arg:
        if arg['objecttype'] == 'datetime':
            return datetime.strptime(arg['value'], '%Y-%m-%dT%H:%M:%S')
        elif arg['objecttype'] == 'fraction':
            return Fraction(arg['numerator'], arg['denominator'])
        elif arg['objecttype'] == 'person':
            return Person.toJSON
        return arg
    return arg

In [79]:
d = json.loads(j, object_hook=custom_decoder)

In [80]:
d

{'accountHolder': <function __main__.Person.toJSON(self)>,
 'created': datetime.datetime(2018, 10, 21, 3, 0)}

In [81]:
from decimal import Decimal

In [99]:
def make_decimal(arg):
    print('Float received: ', type(arg), arg)
    return Decimal(arg)

In [100]:
j = '''
{
    "a": 100,
    "b": 0.2,
    "c": 0.5
}
'''

In [101]:
d = json.loads(j, parse_float=make_decimal)

Float received:  <class 'str'> 0.2
Float received:  <class 'str'> 0.5


In [102]:
d

{'a': 100, 'b': Decimal('0.2'), 'c': Decimal('0.5')}

In [103]:
d = json.loads(j, parse_int=make_decimal)

Float received:  <class 'str'> 100


In [104]:
d

{'a': Decimal('100'), 'b': 0.2, 'c': 0.5}

In [105]:
def make_int_binary(arg):
    print('Int received: ', type(arg), arg)
    return bin(int(arg))

In [106]:
d = json.loads(j, parse_int=make_int_binary, parse_float=make_decimal)

Int received:  <class 'str'> 100
Float received:  <class 'str'> 0.2
Float received:  <class 'str'> 0.5


In [107]:
d

{'a': '0b1100100', 'b': Decimal('0.2'), 'c': Decimal('0.5')}

In [108]:
def make_constant_none(arg):
    print('Constant received: ', type(arg), arg)
    return None

In [109]:
j = '''
{
    "a": Infinity,
    "b": true,
    "c": null
}
'''

In [111]:
d = json.loads(j, parse_float=make_decimal, parse_constant=make_constant_none)

Constant received:  <class 'str'> Infinity


In [112]:
d

{'a': None, 'b': True, 'c': None}

In [113]:
help(json.loads)

Help on function loads in module json:

loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
    Deserialize ``s`` (a ``str``, ``bytes`` or ``bytearray`` instance
    containing a JSON document) to a Python object.
    
    ``object_hook`` is an optional function that will be called with the
    result of any object literal decode (a ``dict``). The return value of
    ``object_hook`` will be used instead of the ``dict``. This feature
    can be used to implement custom decoders (e.g. JSON-RPC class hinting).
    
    ``object_pairs_hook`` is an optional function that will be called with the
    result of any object literal decoded with an ordered list of pairs.  The
    return value of ``object_pairs_hook`` will be used instead of the ``dict``.
    This feature can be used to implement custom decoders.  If ``object_hook``
    is also defined, the ``object_pairs_hook`` takes priority.
    
    ``parse

In [115]:
def custom_decoder(arg):
    print('decoding: ', arg)
    return arg

In [116]:
j = '''
{
    "a": 1,
    "b": 2,
    "c": {
        "c.1": 1,
        "c.2": 2,
        "c.3": {
            "c3.1": 1,
            "c3.2": 2
        }
    }
        
}
'''

In [118]:
json.loads(j, object_hook=custom_decoder)

decoding:  {'c3.1': 1, 'c3.2': 2}
decoding:  {'c.1': 1, 'c.2': 2, 'c.3': {'c3.1': 1, 'c3.2': 2}}
decoding:  {'a': 1, 'b': 2, 'c': {'c.1': 1, 'c.2': 2, 'c.3': {'c3.1': 1, 'c3.2': 2}}}


{'a': 1, 'b': 2, 'c': {'c.1': 1, 'c.2': 2, 'c.3': {'c3.1': 1, 'c3.2': 2}}}

In [119]:
def custom_pairs_decoder(arg):
    print('decoding: ', arg)
    return arg

In [121]:
json.loads(j, object_pairs_hook=custom_pairs_decoder)

decoding:  [('c3.1', 1), ('c3.2', 2)]
decoding:  [('c.1', 1), ('c.2', 2), ('c.3', [('c3.1', 1), ('c3.2', 2)])]
decoding:  [('a', 1), ('b', 2), ('c', [('c.1', 1), ('c.2', 2), ('c.3', [('c3.1', 1), ('c3.2', 2)])])]


[('a', 1),
 ('b', 2),
 ('c', [('c.1', 1), ('c.2', 2), ('c.3', [('c3.1', 1), ('c3.2', 2)])])]

In [124]:
def custom_pairs_decoder(arg):
    print('decoding: ', arg)
    return {k: v for k, v in arg}

In [125]:
json.loads(j, object_pairs_hook=custom_pairs_decoder)

decoding:  [('c3.1', 1), ('c3.2', 2)]
decoding:  [('c.1', 1), ('c.2', 2), ('c.3', {'c3.1': 1, 'c3.2': 2})]
decoding:  [('a', 1), ('b', 2), ('c', {'c.1': 1, 'c.2': 2, 'c.3': {'c3.1': 1, 'c3.2': 2}})]


{'a': 1, 'b': 2, 'c': {'c.1': 1, 'c.2': 2, 'c.3': {'c3.1': 1, 'c3.2': 2}}}

In [126]:
j = '''
{
    "a": [1, 2, 3, 4, 5],
    "b": 100,
    "c": 10.5,
    "d": NaN,
    "e": null,
    "f": "python"
}
'''

In [127]:
def float_handler(arg):
    print('float handler', type(arg), arg)
    return float(arg)

In [135]:
def int_handler(arg):
    print('int handler', type(arg), arg)
    return int(arg)

In [136]:
def const_handler(arg):
    print('const handler', type(arg), arg)
    return None

In [137]:
def obj_hook(arg):
    print('obj hook ', arg)
    return arg

In [138]:
json.loads(j)

{'a': [1, 2, 3, 4, 5], 'b': 100, 'c': 10.5, 'd': nan, 'e': None, 'f': 'python'}

In [139]:
json.loads(j, 
           object_hook=obj_hook,
           parse_float=float_handler,
           parse_int=int_handler,
           parse_constant=const_handler
          )

int handler <class 'str'> 1
int handler <class 'str'> 2
int handler <class 'str'> 3
int handler <class 'str'> 4
int handler <class 'str'> 5
int handler <class 'str'> 100
float handler <class 'str'> 10.5
const handler <class 'str'> NaN
obj hook  {'a': [1, 2, 3, 4, 5], 'b': 100, 'c': 10.5, 'd': None, 'e': None, 'f': 'python'}


{'a': [1, 2, 3, 4, 5],
 'b': 100,
 'c': 10.5,
 'd': None,
 'e': None,
 'f': 'python'}

In [140]:
j = '''
{
    "a": [1, 2],
    "b": {
        "c": 10.5,
        "d": NaN
        }
}
'''

In [141]:
json.loads(j, 
           object_hook=obj_hook,
           parse_float=float_handler,
           parse_int=int_handler,
           parse_constant=const_handler
          )

int handler <class 'str'> 1
int handler <class 'str'> 2
float handler <class 'str'> 10.5
const handler <class 'str'> NaN
obj hook  {'c': 10.5, 'd': None}
obj hook  {'a': [1, 2], 'b': {'c': 10.5, 'd': None}}


{'a': [1, 2], 'b': {'c': 10.5, 'd': None}}