
# JSON Serialization
As we saw in the lecture, JSON is an extremely popular format for data interchange. Unlike pickling, it is safe to load from untrusted sources, because JSON data is just text and deserializing it does not execute any code. It's human readable too, which is a plus.

There are other formats too, such as XML - but XML does not translate directly to Python dictionaries like JSON does. JSON is a far more natural fit with Python - in fact, when we view the contents of a Python dictionary it reminds us of JSON.

In [1]:
d = {
    "name": {
        "first": "...",
        "last": "..."
    },
    "contact": {
        "phone": [
            {"type": "...", "number": "..."},
            {"type": "...", "number": "..."},
            {"type": "...", "number": "..."},
        ],
        "email": ["...", "...", "..."]
    },
    "address": {
        "line1": "...",
        "line2": "...",
        "city": "...",
        "country": "..."
    }
}

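This is a standard Python dictionary, but if you look at the format, it is almost identical to JSON. The only thing here that strict JSON would reject is the trailing comma after the last phone entry.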

A JSON object contains key/value pairs, nested objects and arrays - just like a Python dictionary.

The big difference is that JSON is basically just one big string, while a Python dictionary is an object containing other objects.

So the big question when we want to "convert" (serialize) a Python object to JSON is how to represent Python objects as strings.

Conversely, if we want to load a JSON object into a Python dictionary, how do we "convert" (deserialize) the JSON string values into Python objects?

By the way, this concept of serializing/deserializing is also often called marshalling.

JSON has just a few data types it supports:

- Strings: must be delimited by double quotes
- Booleans: the values true and false
- Numbers: can be integers or floats (including exponential notation, 1.3E2 for example); the JSON standard defines a single number type and does not distinguish between the two
- Arrays: an ordered collection of zero or more items of any valid JSON type
- Objects: an unordered collection of key:value pairs - the keys must be strings (so delimited by double quotes), and the values can be any valid JSON type
- Null: a null value, denoted by null and equivalent to None in Python

This means that the data types supported by JSON are relatively limited - but it turns out, as we'll see later, that it's not really a limitation.
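
To make the mapping between these JSON types and Python types concrete, here is a minimal sketch using the standard library json module. The dictionary name `record` and its keys are made up for illustration.

```python
import json

# A Python dictionary using only types that map directly onto JSON types.
record = {
    "text": "hello",       # str   -> JSON string
    "flag": True,          # bool  -> JSON true / false
    "count": 3,            # int   -> JSON number
    "ratio": 1.3e2,        # float -> JSON number
    "items": [1, "two"],   # list  -> JSON array
    "nested": {"k": "v"},  # dict  -> JSON object
    "missing": None,       # None  -> JSON null
}

encoded = json.dumps(record)   # Python object -> JSON text
print(type(encoded))           # <class 'str'>
print(encoded)

decoded = json.loads(encoded)  # JSON text -> Python object
print(decoded == record)       # True
```

Note that the round trip only gives back an equal dictionary because every value here already has a direct JSON counterpart.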

Any object can be serialized into a string (think of the __repr__ method we've used often throughout this course) - in fact, any piece of information in your computer is a series of bits, as are characters - so theoretically any piece of information can be represented using characters. We'll come back to this in a later video. For now, we're going to stick with the basic data types supported by JSON and see what Python provides us for marshalling JSON.

We are going to use the json module:

In [2]:
import json

In [3]:
d1 = {"a": 100, "b": 200}
d1_json = json.dumps(d1)

In [4]:
d_json = '''
{
    "name": "John Cleese",
    "age": 82,
    "height": 1.96,
    "walksFunny": true,
    "sketches": [
        {
        "title": "Dead Parrot",
        "costars": ["Michael Palin"]
        },
        {
        "title": "Ministry of Silly Walks",
        "costars": ["Michael Palin", "Terry Jones"]
        }
    ],
    "boring": null    
}
'''

In [7]:
d = json.loads(d_json)
d

{'name': 'John Cleese',
 'age': 82,
 'height': 1.96,
 'walksFunny': True,
 'sketches': [{'title': 'Dead Parrot', 'costars': ['Michael Palin']},
  {'title': 'Ministry of Silly Walks',
   'costars': ['Michael Palin', 'Terry Jones']}],
 'boring': None}

In [8]:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    
    def __repr__(self):
        return f'Person(name={self.name}, age={self.age})'
    
    def toJSON(self):
        return dict(name=self.name, age=self.age)

In [9]:
p = Person('John', 82)

In [12]:
p.toJSON()

{'name': 'John', 'age': 82}

In [13]:
print(json.dumps({"john": p.toJSON()}, indent=2))

{
  "john": {
    "name": "John",
    "age": 82
  }
}


In fact, often we can make our life a little easier by using the vars function (or the __dict__ attribute) to return a dictionary of our object attributes:

In [14]:
p.__dict__

{'name': 'John', 'age': 82}

In [17]:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    
    def __repr__(self):
        return f'Person(name={self.name}, age={self.age})'
    
    def toJSON(self):
        return vars(self)

In [18]:
json.dumps(dict(john=p.toJSON()))

'{"john": {"name": "John", "age": 82}}'

# Custom JSON Serialization

As we saw in the previous video, certain data types cannot be serialized to JSON using Python's defaults. A datetime object is a simple example of this.

If we try to pass one to json.dumps, Python raises a TypeError exception, stating that datetime objects are not JSON serializable.
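
A minimal sketch of that failure, using a fixed timestamp chosen purely for illustration (the contents of `log_record` here are an assumption, not taken from the original cell):

```python
import json
from datetime import datetime

# A dictionary holding a datetime, which json.dumps cannot encode by default.
log_record = {"time": datetime(2023, 5, 17, 5, 39, 2), "message": "testing"}

error_message = None
try:
    json.dumps(log_record)
except TypeError as ex:
    # The exception text names the type that could not be serialized.
    error_message = str(ex)
    print("TypeError:", error_message)
```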

So, we'll need to come up with our own serialization format.

For datetimes, the most common format is the ISO 8601 format - you can read up more about it here (https://en.wikipedia.org/wiki/ISO_8601), but basically the format is:

YYYY-MM-DDTHH:MM:SS (the T is a literal separator between the date and the time)

There are some variations for encoding timezones, but to keep things simple I am going to use timezone naive timestamps, and just use UTC everywhere.
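
As a small sketch of what this format looks like in practice, the standard datetime class can produce and parse it directly. The timestamp below is fixed and illustrative, and treating it as UTC is only a convention here, since the object itself is timezone naive.

```python
from datetime import datetime

# Naive timestamp; by convention we treat it as UTC.
dt = datetime(2023, 5, 17, 5, 39, 2)

iso_text = dt.isoformat()
print(iso_text)        # 2023-05-17T05:39:02

# fromisoformat parses the string back into an equal datetime.
restored = datetime.fromisoformat(iso_text)
print(restored == dt)  # True
```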

We could build this string ourselves with strftime, or use the datetime's built-in isoformat method:

In [19]:
from datetime import datetime
current = datetime.utcnow()

def format_iso(dt):
    return dt.strftime('%Y-%m-%dT%H:%M:%S')

format_iso(current)

'2023-05-17T05:39:02'

In [20]:
log_record = {'time': datetime.utcnow().isoformat(), 'message': 'testing'}

In [21]:
json.dumps(log_record)

'{"time": "2023-05-17T05:39:27.289337", "message": "testing"}'

In [22]:
log_record = {'time': datetime.utcnow(), 'message': 'testing'}

The problem is that log_record now contains a datetime object, so it is no longer JSON serializable!

What we have to do is write custom code to replace non-JSON serializable objects in our dictionary with custom representations. This can quickly become tedious and unmanageable if we deal with many dictionaries, and arbitrary structures.
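
To show why this manual approach gets tedious, here is a rough sketch of a helper that walks a structure and swaps every datetime for its ISO string before calling dumps. The function name `replace_datetimes` and the sample `record` are hypothetical, and the helper only covers dictionaries, lists and tuples.

```python
import json
from datetime import datetime

def replace_datetimes(obj):
    """Return a copy of obj with every datetime replaced by its ISO 8601 string."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, dict):
        return {key: replace_datetimes(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [replace_datetimes(item) for item in obj]
    # Anything else is passed through unchanged.
    return obj

record = {
    "time": datetime(2023, 5, 17, 5, 39, 27),
    "message": "testing",
    "history": [datetime(2023, 5, 16, 12, 0, 0)],
}

encoded = json.dumps(replace_datetimes(record))
print(encoded)
# {"time": "2023-05-17T05:39:27", "message": "testing", "history": ["2023-05-16T12:00:00"]}
```

Every new container or unsupported type would need another branch in this helper, which is the maintenance burden described above.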

Fortunately, Python's dump and dumps functions give us ways to define general serializations for objects that JSON does not support natively.

The simplest way is to specify a function that dump/dumps will call when it encounters something it cannot serialize:

In [23]:
def format_iso(dt):
    return dt.isoformat()

In [24]:
json.dumps(log_record, default=format_iso)

'{"time": "2023-05-17T05:41:40.754724", "message": "testing"}'

In [25]:
log_record = {
    'time1': datetime.utcnow(),
    'time2': datetime.utcnow(),
    'message': 'Testing...'
}

In [26]:
json.dumps(log_record, default=format_iso)

'{"time1": "2023-05-17T05:44:48.465930", "time2": "2023-05-17T05:44:48.465932", "message": "Testing..."}'

In [27]:
log_record = {
    'time': datetime.utcnow(),
    'message': 'Testing...',
    'other': {'a', 'b', 'c'}
}
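
This record mixes a datetime with a set, and format_iso cannot handle the set, since sets have no isoformat method. One possible way to cope is sketched below, under assumptions: a hypothetical `custom_default` function that dispatches on the object's type. Converting sets to sorted lists is an arbitrary choice made for illustration, and the fixed timestamp is illustrative too.

```python
import json
from datetime import datetime

def custom_default(obj):
    """Called by json.dumps for any object it cannot serialize on its own."""
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, set):
        # Assumes the set's elements can be compared with each other.
        return sorted(obj)
    # Signal clearly that this type is still unsupported.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

log_record = {
    "time": datetime(2023, 5, 17, 5, 44, 48),
    "message": "Testing...",
    "other": {"a", "b", "c"},
}

encoded = json.dumps(log_record, default=custom_default)
print(encoded)
# {"time": "2023-05-17T05:44:48", "message": "Testing...", "other": ["a", "b", "c"]}
```

Raising TypeError for unknown types keeps the behavior consistent with what dumps does by default, so unsupported objects are not silently dropped.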