### JSON serialization

The concept of serializing/deserializing is also often called **marshalling**.

JSON supports only these types:

* **Strings**: delimited by double quotes
* **Booleans**: `true` / `false`
* **Numbers**: integers and floats (including exponential notation, e.g. `1.3E2`); JSON itself has a single number type and does not distinguish between them
* **Arrays**: an **ordered** collection of zero or more items of any valid JSON type
* **Objects**: an **unordered** collection of `key:value` pairs, keys must be strings
* **NULL**: a null object, denoted by `null` and equivalent to `None` in Python

Serializing a dictionary to JSON is done by functions `dump` and `dumps`: `dumps` serializes to a string, while `dump` writes the serialization to a file (*more accurately, a stream*).

Deserialization is done analogously by the functions `load` and `loads`: `loads` parses a string, while `load` reads from a file (stream).
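
As a minimal sketch of the stream-oriented pair, the snippet below uses an in-memory `io.StringIO` buffer as a stand-in for a real file. The buffer and the sample data are assumptions made only to keep the example self-contained.

```python
import io
import json

data = {'a': 100, 'b': [1, 2, 3]}

# StringIO behaves like a text file opened for reading and writing
buffer = io.StringIO()
json.dump(data, buffer)        # write the JSON text to the stream

buffer.seek(0)                 # rewind before reading it back
restored = json.load(buffer)   # parse the JSON text from the stream

print(restored == data)
```

With a real file, the same calls would typically sit inside `with open(...) as f:` blocks.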

In [1]:
import json
from decimal import Decimal
from fractions import Fraction
from datetime import datetime

In [2]:
d1 = {'a': 100, 'b': 200}
d1_json = json.dumps(d1)

print(type(d1_json))
d1_json

<class 'str'>


'{"a": 100, "b": 200}'

We can obtain a better looking JSON string by specifying an indent for the `dump` or `dumps` functions:

In [3]:
print(json.dumps(d1, indent=2))

{
  "a": 100,
  "b": 200
}


<br>

Deserialize the JSON string to a dictionary:

In [4]:
d2 = json.loads(d1_json)

print(type(d2))
d2

<class 'dict'>


{'a': 100, 'b': 200}

In [5]:
# equal in value, but not the same object
print(d1 == d2)
print(d1 is d2)

True
False


**Caveat!** Remember that JSON keys are always strings, so non-string Python keys come back as strings.<br>
Python tuples are serialized into JSON arrays and are deserialized into Python lists.<br>
So it is not always true that `d == loads(dumps(d))`.
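
A small sketch of this caveat, using a made-up dictionary with an integer key and a tuple value:

```python
import json

d = {1: 'one', 'pair': (1, 2)}
round_tripped = json.loads(json.dumps(d))

print(round_tripped)        # the key became '1', the tuple became a list
print(d == round_tripped)   # the round trip did not preserve the original
```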

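**Caveat!** The order of keys is not guaranteed: a JSON object is an unordered collection. Python's `json` module keeps keys in the order it encounters them, but other producers and consumers of JSON may not.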
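
When a deterministic string is needed (for example, to compare two serializations), one option is the `sort_keys` argument of `dumps`. A minimal sketch with two hypothetical dictionaries that differ only in insertion order:

```python
import json

first = {'b': 2, 'a': 1}
second = {'a': 1, 'b': 2}

# Without sorting, the strings follow each dictionary's insertion order
plain_equal = json.dumps(first) == json.dumps(second)

# With sort_keys=True both dictionaries produce the same string
sorted_equal = (json.dumps(first, sort_keys=True)
                == json.dumps(second, sort_keys=True))

print(plain_equal, sorted_equal)
```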

<br>

Some specific data types are not serialized into JSON by default.

In [6]:
try:
    json.dumps({'a': Decimal('0.1')})
except TypeError as ex:
    print(ex)

Object of type Decimal is not JSON serializable


In [7]:
try:
    json.dumps({"a": 1+1j})
except TypeError as ex:
    print(ex)

Object of type complex is not JSON serializable


Sometimes it is possible to get around the problem using the string representation of the object:

In [8]:
json.dumps({"a": str(Decimal(0.5))})

'{"a": "0.5"}'

But note that we get back a string, not a number. The result is also only as accurate as the `Decimal` itself: `Decimal(0.5)` is exact, but a `Decimal` built from a float such as `0.1` carries the float's binary rounding error:

In [9]:
json.dumps({"a": str(Decimal(0.1))})

'{"a": "0.1000000000000000055511151231257827021181583404541015625"}'
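
One way to avoid the long expansion is to build the `Decimal` from a string rather than a float. In the other direction, the `parse_float` argument of `loads` can turn JSON numbers into `Decimal` objects. This is only a sketch; the variable names are illustrative.

```python
import json
from decimal import Decimal

price = Decimal('0.1')                   # built from a string, so exactly 0.1
encoded = json.dumps({'a': str(price)})

# parse_float receives the textual form of every JSON float
decoded = json.loads('{"a": 0.1}', parse_float=Decimal)

print(encoded)
print(decoded)
```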

<br>

Custom objects are not serializable by default either.<br>
One approach is to write a custom JSON serializer method in the class itself.

In [10]:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    
    def __repr__(self):
        return f'Person(name={self.name}, age={self.age})'
    
    def toJSON(self):
        return dict(name=self.name, age=self.age)

In [11]:
p = Person('John', 130)

print(p.toJSON())

{'name': 'John', 'age': 130}


Now we can serialize the object as follows:

In [12]:
print(json.dumps({"john": p.toJSON()}, indent=2))

{
  "john": {
    "name": "John",
    "age": 130
  }
}


<br>
Often we can make our life a little easier by using the `vars` function (or the `__dict__` attribute) to return a dictionary of our object attributes:

In [13]:
vars(p)

{'name': 'John', 'age': 130}

In [14]:
p.__dict__

{'name': 'John', 'age': 130}

In [15]:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    
    def __repr__(self):
        return f'Person(name={self.name}, age={self.age})'
    
    def toJSON(self):
        return vars(self)   # self.__dict__
    

p = Person('John', 130)  # re-create p so it is an instance of the redefined class
print(json.dumps({"john": p.toJSON()}, indent=2))

{
  "john": {
    "name": "John",
    "age": 130
  }
}


This works, but the approach is cumbersome: every object has to be converted by hand before it is passed to `dumps`.

<br>

<br>

### Custom JSON serialization

For datetimes the most common format is the **ISO 8601** format - you can read up about it at https://en.wikipedia.org/wiki/ISO_8601.<br>
Basically the format is:

*YYYY-MM-DD* **T** *HH:MM:SS*

In [16]:
current = datetime.utcnow()

current

datetime.datetime(2020, 11, 15, 5, 14, 52, 610100)

In [17]:
str(current)

'2020-11-15 05:14:52.610100'

This is not quite ISO 8601 (a space instead of `T` separates the date and the time). We could write a custom formatter ourselves:

In [18]:
def format_iso(dt):
    return dt.strftime('%Y-%m-%dT%H:%M:%S')

(more info at https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior)

In [19]:
format_iso(current)

'2020-11-15T05:14:52'

For this particular case, though, there is an even simpler approach:

In [20]:
current.isoformat(timespec='seconds')

'2020-11-15T05:14:52'
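
The reverse conversion is available as `datetime.fromisoformat`, which parses strings produced by `isoformat`. A short sketch with a fixed, made-up timestamp:

```python
from datetime import datetime

moment = datetime(2020, 11, 15, 5, 14, 52)

text = moment.isoformat(timespec='seconds')   # datetime -> ISO 8601 string
parsed = datetime.fromisoformat(text)         # ISO 8601 string -> datetime

print(text)
print(parsed == moment)
```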

Now we can serialize our `datetime` object to JSON (*by first converting it to a string*):

In [21]:
log_record = {'time': current.isoformat(),
              'message': 'testing'}

json.dumps(log_record)

'{"time": "2020-11-15T05:14:52.610100", "message": "testing"}'

But that means we must convert all such data to strings beforehand.<br>
If we deal with many dictionaries and arbitrary structures, this can quickly become tedious and unmanageable.

The simplest way is to pass, via the `default` argument, a function that `dump`/`dumps` will call when it encounters something it cannot serialize:

In [22]:
def format_iso(dt):
    return dt.isoformat()

json.dumps(log_record, default=format_iso)

'{"time": "2020-11-15T05:14:52.610100", "message": "testing"}'

<br>

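But this function has a limitation: it assumes that every unsupported object is a `datetime`.<br>
A more general way is to dispatch on the type of the argument. The formatter below does this with an `isinstance` chain; `functools.singledispatch` is a more scalable alternative, see https://docs.python.org/3/library/functools.html#functools.singledispatch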

In [23]:
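def custom_json_formatter(arg):
    if isinstance(arg, datetime):
        return arg.isoformat()
    elif isinstance(arg, set):
        return list(arg)
    else:
        # without this, unknown types would silently be encoded as null
        raise TypeError(f'Object of type {type(arg).__name__} is not JSON serializable')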

In [24]:
log_record = {
    'time': datetime.utcnow(),
    'message': 'Testing...',
    'other': {'a', 'b', 'c'}
}

In [25]:
json.dumps(log_record, default=custom_json_formatter)

'{"time": "2020-11-15T05:14:53.031563", "message": "Testing...", "other": ["a", "c", "b"]}'
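
For comparison, here is a sketch of the same formatter written with `functools.singledispatch`. The function name `to_json`, the fixed timestamp, and the use of `sorted` (chosen only to make the output deterministic) are assumptions of this example, not part of the original code.

```python
import json
from datetime import datetime
from functools import singledispatch


@singledispatch
def to_json(arg):
    # Fallback for any type without a registered handler
    raise TypeError(f'Object of type {type(arg).__name__} is not JSON serializable')


@to_json.register(datetime)
def _(arg):
    return arg.isoformat()


@to_json.register(set)
def _(arg):
    return sorted(arg)   # sets are unordered; sort for a stable result


record = {'time': datetime(2020, 11, 15, 5, 14, 53), 'other': {'a', 'b', 'c'}}
result = json.dumps(record, default=to_json)
print(result)
```

Supporting a new type then means registering one more small function instead of extending an `if`/`elif` chain.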

<br>

A more complicated example:

In [26]:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
    def __repr__(self):
        return f'Point(x={self.x}, y={self.y})'


pt1 = Point(10, 10)
vars(pt1)

{'x': 10, 'y': 10}

In [27]:
log_record = dict(time=datetime.utcnow(),
                  message='Created new point',
                  point=pt1,
                  created_by=p)

In [28]:
def custom_json_formatter(arg):
    if isinstance(arg, datetime):
        return arg.isoformat()
    elif isinstance(arg, set):
        return list(arg)
    else:
        try:
            return arg.toJSON()
        except AttributeError:
            try:
                return vars(arg)
            except TypeError:
                return str(arg)

In [29]:
print(json.dumps(log_record, default=custom_json_formatter, indent=2))

{
  "time": "2020-11-15T05:14:53.164268",
  "message": "Created new point",
  "point": {
    "x": 10,
    "y": 10
  },
  "created_by": {
    "name": "John",
    "age": 130
  }
}


<br>

<br>

### Custom JSON encoding using JSONEncoder

Python knows how to encode the "standard" types (`str`, `int`, `float`, `list`, `dict`, *etc*) using a special class - `JSONEncoder`.<br>
This class supports the following encodings:

|Python |JSON  |
|:----|:---|
| `dict` | object `{...}`|
| `list`, `tuple` | array `[...]` |
| `str`  | string `"..."`|
| `int`, `float` | number |
| `int`- and `float`-derived Enums | number |
| `bool` | `true` or `false` |
| `None` | `null` |

(see https://docs.python.org/3/library/json.html#json.JSONEncoder)

Anything beyond those Python types raises a `TypeError` exception.
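
The Enum row of the table can be illustrated with a small sketch. The two enum classes below are hypothetical: an `IntEnum` member is encoded as a number, while a plain `Enum` member is not supported.

```python
import json
from enum import Enum, IntEnum


class Priority(IntEnum):   # int-derived enum
    LOW = 1
    HIGH = 2


class Color(Enum):         # plain enum, not derived from int or float
    RED = 'red'


encoded = json.dumps({'p': Priority.HIGH})
print(encoded)

error = None
try:
    json.dumps({'c': Color.RED})
except TypeError as ex:
    error = str(ex)
print(error)
```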

We can see how this class encodes objects by creating an instance and calling its `encode` method:

In [30]:
default_encoder = json.JSONEncoder()
default_encoder.encode((1, 2, 3))

'[1, 2, 3]'

In [31]:
# for non-supported objects:
# default_encoder.encode({1, 2, 3})  #> TypeError: Object of type set is not JSON serializable
# default_encoder.encode(1+1j)       #> TypeError: Object of type complex is not JSON serializable

We can extend this `JSONEncoder` class and override the `default` method. 

In [32]:
class CustomJSONEncoder(json.JSONEncoder):
    def default(self, arg):
        if isinstance(arg, datetime):
            return arg.isoformat()
        else:
            return super().default(arg)


custom_encoder = CustomJSONEncoder()

In [33]:
custom_encoder.encode(True)

'true'

In [34]:
custom_encoder.encode(datetime.utcnow())

'"2020-11-15T05:14:53.387720"'

And we can now use this custom encoder by specifying it when we use `dump`/`dumps`:

In [35]:
json.dumps(dict(name='test', time=datetime.utcnow()), cls=CustomJSONEncoder)

'{"name": "test", "time": "2020-11-15T05:14:53.422429"}'

One thing to note: with both the `default` approach and the `cls` approach, our function / method is called only for objects that Python cannot serialize on its own. The natively supported types (strings, integers, lists, *etc*) never reach it.
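
A small sketch makes this visible by recording every call to a hypothetical `default` function (`tracing_default` and the sample record are made up for this illustration):

```python
import json
from datetime import datetime

seen = []   # type names of the objects passed to the default function


def tracing_default(arg):
    seen.append(type(arg).__name__)
    return str(arg)


record = {'n': 1, 's': 'text', 'items': [1.5, None], 'when': datetime(2020, 1, 1)}
json.dumps(record, default=tracing_default)

print(seen)   # only the datetime reached the function
```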

<br>

<br>

### Custom JSON decoding

A simple case:

In [36]:
j1 = '''
    {
        "name": "Python",
        "appeared": 1991,
        "versions": ["2.x", "3.x"]
    }
'''

json.loads(j1)

{'name': 'Python', 'appeared': 1991, 'versions': ['2.x', '3.x']}

<br>

A more complicated case. Suppose we have a JSON document in which any object that contains the key/value pair `"objecttype": "datetime"` is guaranteed to contain another key called `"value"` holding a date and time in the format `%Y-%m-%dT%H:%M:%S`.

In [37]:
j2 = '''
{
    "time": {
        "objecttype": "datetime",
        "value": "2020-12-31T23:59:59"
    },
    "message": "created this json string"
}
'''

Here the easiest approach is to run through the dictionary and convert the datetime structures (schema) into actual datetime objects.

In [38]:
d = json.loads(j2)
d

{'time': {'objecttype': 'datetime', 'value': '2020-12-31T23:59:59'},
 'message': 'created this json string'}

In [39]:
for key, value in d.items():
    if (isinstance(value, dict) and 
        'objecttype' in value and 
        value['objecttype'] == 'datetime'):
        d[key] = datetime.strptime(value['value'], '%Y-%m-%dT%H:%M:%S')
        
d

{'time': datetime.datetime(2020, 12, 31, 23, 59, 59),
 'message': 'created this json string'}

But this approach has limitations: the loop only inspects top-level values, so nested datetime structures are missed.<br>
It is better to use the optional `object_hook` argument of `load`/`loads`, which takes a callable. This is very similar to the `default` argument we saw in the `dump`/`dumps` functions, but it works for decoding instead of encoding. That callable, if specified, is called with every JSON object in the document (including the root object) after it has been decoded to a dictionary. The dictionary is then replaced by whatever the callable returns.

In [40]:
def custom_decoder(arg):
    if 'objecttype' in arg and arg['objecttype'] == 'datetime':
        return datetime.strptime(arg['value'], '%Y-%m-%dT%H:%M:%S')
    else:
        return arg  # important, otherwise we lose anything that's not a date!


d = json.loads(j2, object_hook=custom_decoder)
d

{'time': datetime.datetime(2020, 12, 31, 23, 59, 59),
 'message': 'created this json string'}
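
To see when the hook runs, here is a sketch with a hypothetical `tracing_hook` that records each dictionary it receives. Nested objects are passed to the hook before the objects that contain them.

```python
import json

calls = []   # dictionaries in the order the hook received them


def tracing_hook(obj):
    calls.append(obj)
    return obj   # return the dictionary unchanged


json.loads('{"outer": {"inner": {"x": 1}}, "n": 2}', object_hook=tracing_hook)

for received in calls:
    print(list(received))   # keys of each dictionary, innermost first
```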

We can extend this custom decoder to handle other structures (schemas), for example fractions:

In [41]:
def custom_decoder(arg):
    r = arg
    if 'objecttype' in arg:
        if arg['objecttype'] == 'datetime':
            r = datetime.strptime(arg['value'], '%Y-%m-%dT%H:%M:%S')
        elif arg['objecttype'] == 'fraction':
            r = Fraction(arg['numerator'], arg['denominator'])
    return r

In [42]:
j3 = '''
    {
        "cake": "yummy chocolate cake",
        "myShare": {
            "objecttype": "fraction",
            "numerator": 1,
            "denominator": 8
        },
        "eaten": {
            "at": {
                "objecttype": "datetime",
                "value": "2018-10-21T21:30:00"
                },
            "time_taken": "30 seconds"
        }
    }
'''


d = json.loads(j3, object_hook=custom_decoder)
d

{'cake': 'yummy chocolate cake',
 'myShare': Fraction(1, 8),
 'eaten': {'at': datetime.datetime(2018, 10, 21, 21, 30),
  'time_taken': '30 seconds'}}

<br>

The same approach works for a custom class:

In [43]:
class Person:
    def __init__(self, name, ssn):
        self.name = name
        self.ssn = ssn
        
    def __repr__(self):
        return f'Person(name={self.name}, ssn={self.ssn})'

In [44]:
j = '''
    {
        "accountHolder": {
            "objecttype": "person",
            "name": "Eric Idle",
            "ssn": 100
        },
        "created": {
            "objecttype": "datetime",
            "value": "2018-10-21T03:00:00"
        }
    }
'''

In [45]:
def custom_decoder(arg):
    ret_value = arg
    if 'objecttype' in arg:
        if arg['objecttype'] == 'datetime':
            ret_value = datetime.strptime(arg['value'], '%Y-%m-%dT%H:%M:%S')
        elif arg['objecttype'] == 'fraction':
            ret_value = Fraction(arg['numerator'], arg['denominator'])
        elif arg['objecttype'] == 'person':
            ret_value = Person(arg['name'], arg['ssn'])
    return ret_value

In [46]:
d = json.loads(j, object_hook=custom_decoder)

d

{'accountHolder': Person(name=Eric Idle, ssn=100),
 'created': datetime.datetime(2018, 10, 21, 3, 0)}
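
The `objecttype` convention can also be produced on the encoding side, which gives a full round trip. The sketch below pairs a `default` function with an `object_hook`. The names `encode_tagged`, `decode_tagged`, and `FMT` are assumptions of this example.

```python
import json
from datetime import datetime
from fractions import Fraction

FMT = '%Y-%m-%dT%H:%M:%S'


def encode_tagged(arg):
    # Called by dumps only for objects it cannot serialize itself
    if isinstance(arg, datetime):
        return {'objecttype': 'datetime', 'value': arg.strftime(FMT)}
    if isinstance(arg, Fraction):
        return {'objecttype': 'fraction',
                'numerator': arg.numerator,
                'denominator': arg.denominator}
    raise TypeError(f'Object of type {type(arg).__name__} is not JSON serializable')


def decode_tagged(arg):
    # Called by loads for every decoded JSON object
    kind = arg.get('objecttype')
    if kind == 'datetime':
        return datetime.strptime(arg['value'], FMT)
    if kind == 'fraction':
        return Fraction(arg['numerator'], arg['denominator'])
    return arg


original = {'share': Fraction(1, 8), 'at': datetime(2018, 10, 21, 21, 30)}
text = json.dumps(original, default=encode_tagged)
restored = json.loads(text, object_hook=decode_tagged)

print(restored == original)
```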

<br>

<br>

### Using JSONDecoder

Just as we can use a subclass of `JSONEncoder` to customize our JSON encoding, we can use a subclass of the default `JSONDecoder` class to customize the decoding of our JSON strings.

It works quite differently from the `JSONEncoder` subclassing though.

When we subclass `JSONEncoder` we override the `default` method which then allows us to intercept encoding of specific types of objects, and delegate back to the parent class what we don't want to handle specifically.

With the `JSONDecoder` class we override the `decode` method, which receives the **entire** JSON document as a **string**, and we have to return whatever Python object we want. There is no delegating of individual values back to the parent class, unless we want to skip customizing the output completely.

In [47]:
j = '''
    {
        "a": 100,
        "b": [1, 2, 3],
        "c": "python",
        "d": {
            "e": 4,
            "f": 5.5
        }
    }
'''

In [49]:
class CustomDecoder(json.JSONDecoder):
    def decode(self, arg):
        print("decode:", type(arg), arg)
        return "a simple string object"

    
json.loads(j, cls=CustomDecoder)

decode: <class 'str'> 
    {
        "a": 100,
        "b": [1, 2, 3],
        "c": "python",
        "d": {
            "e": 4,
            "f": 5.5
        }
    }



'a simple string object'

<br>

In [50]:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
    def __repr__(self):
        return f'Point(x={self.x}, y={self.y})'

In [51]:
j_points = '''
{
    "points": [
        [10, 20],
        [-1, -2],
        [0.5, 0.5]
    ]
}
'''

j_other = '''
{
    "a": 1,
    "b": 2
}
'''

In [None]:
class CustomDecoder(json.JSONDecoder):
    def decode(self, arg):
        obj = json.loads(arg)
        if 'points' in obj:  # top level
            obj['points'] = [Point(x, y) 
                             for x, y in obj['points']]
        return obj

In [55]:
json.loads(j_points, cls=CustomDecoder)

{'points': [Point(x=10, y=20), Point(x=-1, y=-2), Point(x=0.5, y=0.5)]}

In [56]:
json.loads(j_other, cls=CustomDecoder)

{'a': 1, 'b': 2}
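
As a variation, the raw parsing can be left to the parent class by calling `super().decode` instead of `json.loads`, so that the subclass only post-processes the result. This is a sketch under the same assumptions as above (a top-level `"points"` key holding `[x, y]` pairs); the class name `PointsDecoder` is made up.

```python
import json


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Point(x={self.x}, y={self.y})'


class PointsDecoder(json.JSONDecoder):
    def decode(self, arg):
        obj = super().decode(arg)   # the parent class parses the string
        if isinstance(obj, dict) and 'points' in obj:
            obj['points'] = [Point(x, y) for x, y in obj['points']]
        return obj


with_points = json.loads('{"points": [[10, 20], [0.5, 0.5]]}', cls=PointsDecoder)
without_points = json.loads('{"a": 1, "b": 2}', cls=PointsDecoder)

print(with_points)
print(without_points)
```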

Decoding JSON into custom objects is not an easy task, which is why several third-party libraries exist for serializing and deserializing JSON objects that follow a certain schema.