# Serialization and Deserialization
Notes from [Deep Dive 3](https://www.udemy.com/course/python-3-deep-dive-part-3/) section 7. Topics covered:

1\. [Pickling](#pickling)

2\. [JSON Serialization](#json-serialization)

3\. [Custom JSON Encoding](#custom-json-encoding)

* [Specify default function to handle a non-serializable type](#handle-non-serializable-data)
* [Encoding classes with toJSON](#encoding-classes)
* [Generalize default function](#generalize-default-function)
* [singledispatch decorator](#singledispatch)

4\. [JSONEncoder class](#jsonencoder-class)

5\. [Custom JSON Decoding](#custom-json-decoding)

* [Decoding sequence of nested dicts](#decoding-sequence-of-nested-dicts)
* [objecttype to identify encoded objects](#objecttype)
* [Decoding using specific parse parameters](#parse-parameters)
* [Overriding Basic Type Serializations](#overriding-basic-type-serializations)

6\. [JSONDecoder class](#jsondecoder-class)

7\. [JSON Schema](#json-schema)

8\. [Marshmallow](#marshmallow)


<hr>

<a id='pickling'></a>
## 1. Pickling
Serialize and deserialize Python objects using the `pickle` module.

Pickle is a `binary` serialization format (by default).

These notes focus on dictionaries, but pickling works for other object types too.

Unpickling (deserialization) **can execute code**, hence unpickle **only trusted data**.
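
To make the warning concrete, here is a minimal sketch (not from the course notes; the `Sneaky` class is a hypothetical name) of how a pickled payload can run a callable at load time through `__reduce__`. It uses the built-in `eval` on a harmless arithmetic string, but a malicious payload could call anything.

```python
import pickle


class Sneaky:
    """Hypothetical class: tells pickle to rebuild it by calling eval()."""

    def __reduce__(self):
        # pickle stores (callable, args); loads() calls callable(*args)
        return (eval, ("6 * 7",))


payload = pickle.dumps(Sneaky())

# Unpickling does not return a Sneaky instance: it returns whatever
# the stored callable returned, i.e. code ran during loads().
result = pickle.loads(payload)
print(result)
```

The point of the sketch is only that `loads` calls whatever callable the payload names, which is why untrusted pickles should never be loaded.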

**Usage**:
* `dump` - pickle to file
* `load` - unpickle from file
* `dumps` - returns the pickled representation as a `bytes` object
* `loads` - unpickle from supplied argument
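
The four functions can be sketched side by side as follows. This is an illustrative example rather than part of the original notes; the file name `data.pickle` and the temporary directory are assumptions.

```python
import os
import pickle
import tempfile

data = {'a': 100, 'b': (1, 2, 3), 'c': [1, 2, 3]}

# dump/load work with a file object opened in binary mode
path = os.path.join(tempfile.mkdtemp(), 'data.pickle')
with open(path, 'wb') as f:
    pickle.dump(data, f)

with open(path, 'rb') as f:
    from_file = pickle.load(f)

# dumps/loads work with an in-memory bytes object
raw = pickle.dumps(data)
from_bytes = pickle.loads(raw)

print(type(raw))
print(from_file == data, from_bytes == data)
```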

<hr>

**Example 1**:

Serialize and deserialize a dictionary with different value types (int, tuple, list, datetime).<br>
Compare id and equality between the original and deserialized objects.

In [1]:
import pickle
from datetime import datetime

d = {
    'a': 100,
    'b': (1, 2, 3),
    'c': [1, 2, 3],
    'd': {'x': 1+1j, 'y': datetime.utcnow()}
}

In [2]:
d

{'a': 100,
 'b': (1, 2, 3),
 'c': [1, 2, 3],
 'd': {'x': (1+1j), 'y': datetime.datetime(2021, 7, 10, 22, 19, 26, 893150)}}

In [3]:
ser = pickle.dumps(d)
ser

b'\x80\x04\x95\x8b\x00\x00\x00\x00\x00\x00\x00}\x94(\x8c\x01a\x94Kd\x8c\x01b\x94K\x01K\x02K\x03\x87\x94\x8c\x01c\x94]\x94(K\x01K\x02K\x03e\x8c\x01d\x94}\x94(\x8c\x01x\x94\x8c\x08builtins\x94\x8c\x07complex\x94\x93\x94G?\xf0\x00\x00\x00\x00\x00\x00G?\xf0\x00\x00\x00\x00\x00\x00\x86\x94R\x94\x8c\x01y\x94\x8c\x08datetime\x94\x8c\x08datetime\x94\x93\x94C\n\x07\xe5\x07\n\x16\x13\x1a\r\xa0\xde\x94\x85\x94R\x94uu.'

In [4]:
deser = pickle.loads(ser)
deser

{'a': 100,
 'b': (1, 2, 3),
 'c': [1, 2, 3],
 'd': {'x': (1+1j), 'y': datetime.datetime(2021, 7, 10, 22, 19, 26, 893150)}}

<hr>

**Equality and identity**:

In [5]:
print(d)
print(deser)

{'a': 100, 'b': (1, 2, 3), 'c': [1, 2, 3], 'd': {'x': (1+1j), 'y': datetime.datetime(2021, 7, 10, 22, 19, 26, 893150)}}
{'a': 100, 'b': (1, 2, 3), 'c': [1, 2, 3], 'd': {'x': (1+1j), 'y': datetime.datetime(2021, 7, 10, 22, 19, 26, 893150)}}


In [6]:
print(id(d))
print(id(deser))

2085078577664
2085078516352


Both objects are equal but they are different objects:

In [7]:
d == deser

True

In [8]:
deser is d

False

<hr>

**Example 2**:

When Python serializes a dictionary and deserializes it again, the result behaves very much like a deep copy of the dictionary. <br>
The same thing happens with other collection types such as lists, sets, and tuples.

In [9]:
d1 = {'a': 10, 'b': 20}
d2 = {'x': 100, 'y': d1, 'z': d1}

In [10]:
ser = pickle.dumps(d2)
d3 = pickle.loads(ser)

In [11]:
print(d2)
print(d3)

{'x': 100, 'y': {'a': 10, 'b': 20}, 'z': {'a': 10, 'b': 20}}
{'x': 100, 'y': {'a': 10, 'b': 20}, 'z': {'a': 10, 'b': 20}}


In [12]:
print(d2 == d3)
print(d2['y'] == d3['y'])
print(d2['y'] == d2['z'])
print(d3['y'] == d3['z'])
print(d2['y'] == d1)
print(d3['y'] == d1)

True
True
True
True
True
True


<hr>

Updating `d1` does not affect the deserialized object:

In [13]:
d1['c'] = 3

In [14]:
print(d2)
print(d3)

{'x': 100, 'y': {'a': 10, 'b': 20, 'c': 3}, 'z': {'a': 10, 'b': 20, 'c': 3}}
{'x': 100, 'y': {'a': 10, 'b': 20}, 'z': {'a': 10, 'b': 20}}


In [15]:
print(d2 == d3)
print(d2['y'] == d3['y'])
print(d2['y'] == d2['z'])
print(d3['y'] == d3['z'])
print(d2['y'] == d1)
print(d3['y'] == d1)

False
False
True
True
True
False


<hr>

**Example 3**:

Check the relationship of an object appearing multiple times within a dictionary after deserialization. 

In [16]:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    
    def __eq__(self, other):
        ''' 
        Tests equality between two objects of the Person class.
        Based only on name and age.
        It should also check that other is a Person instance; skipped for simplicity
        '''
        return self.name == other.name and self.age == other.age
    
    def __repr__(self):
        return f'Person(name={self.name}, age={self.age})'

In [17]:
john = Person('John Cleese', 79)
eric = Person('Eric Idle', 75)
michael = Person('Michael Palin', 75)

In [18]:
parrot_sketch = {
    'title': 'Parrot Sketch',
    'actors': [john, michael]
}

ministry_sketch = {
    'title':  'The Ministry of Silly Walks',
    'actors': [john, michael]
}

joke_sketch = {
    'title': 'Funniest Joke in the World',
    'actors': [eric, michael]
}

In [19]:
fan_favorites = {
    'user_1': [parrot_sketch, joke_sketch],
    'user_2': [parrot_sketch, ministry_sketch]
}

In [20]:
from pprint import pprint
pprint(fan_favorites)

{'user_1': [{'actors': [Person(name=John Cleese, age=79),
                        Person(name=Michael Palin, age=75)],
             'title': 'Parrot Sketch'},
            {'actors': [Person(name=Eric Idle, age=75),
                        Person(name=Michael Palin, age=75)],
             'title': 'Funniest Joke in the World'}],
 'user_2': [{'actors': [Person(name=John Cleese, age=79),
                        Person(name=Michael Palin, age=75)],
             'title': 'Parrot Sketch'},
            {'actors': [Person(name=John Cleese, age=79),
                        Person(name=Michael Palin, age=75)],
             'title': 'The Ministry of Silly Walks'}]}


In [21]:
ser = pickle.dumps(fan_favorites)
new_fan_favorites = pickle.loads(ser)
pprint(new_fan_favorites)

{'user_1': [{'actors': [Person(name=John Cleese, age=79),
                        Person(name=Michael Palin, age=75)],
             'title': 'Parrot Sketch'},
            {'actors': [Person(name=Eric Idle, age=75),
                        Person(name=Michael Palin, age=75)],
             'title': 'Funniest Joke in the World'}],
 'user_2': [{'actors': [Person(name=John Cleese, age=79),
                        Person(name=Michael Palin, age=75)],
             'title': 'Parrot Sketch'},
            {'actors': [Person(name=John Cleese, age=79),
                        Person(name=Michael Palin, age=75)],
             'title': 'The Ministry of Silly Walks'}]}


In [22]:
fan_favorites == new_fan_favorites

True

__Verify relative pointers in both dictionaries:__

Both user_1 and user_2 point to the same `parrot_sketch` object.<br>
Serialization and deserialization preserve this relative reference.<br>
In the deserialized dictionary both entries are again one shared object, but a different object from the original.

In [23]:
id(fan_favorites['user_1'][0]), id(fan_favorites['user_2'][0])

(2085078517184, 2085078517184)

In [24]:
id(new_fan_favorites['user_1'][0]), id(new_fan_favorites['user_2'][0])

(2085078548608, 2085078548608)
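
The same check can be written with `is` instead of comparing ids, and compared against `copy.deepcopy`. The sketch below uses a simplified stand-in for the `fan_favorites` data (plain dictionaries only, names are illustrative) and suggests that, for this kind of structure, a pickle round trip and a deep copy behave alike.

```python
import copy
import pickle

shared = {'title': 'Parrot Sketch'}
favorites = {'user_1': [shared], 'user_2': [shared]}

via_pickle = pickle.loads(pickle.dumps(favorites))
via_deepcopy = copy.deepcopy(favorites)

for clone in (via_pickle, via_deepcopy):
    # Equal to the original, but a different object
    print(clone == favorites, clone is favorites)
    # The shared dictionary is still shared inside the clone...
    print(clone['user_1'][0] is clone['user_2'][0])
    # ...and it is not the original shared dictionary
    print(clone['user_1'][0] is shared)
```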

<hr>

<a id='json-serialization'></a>
## 2. JSON Serialization

`J`ava<br>`S`cript<br>`O`bject<br>`N`otation

JSON is an extremely popular format for data interchange. <br>
Unlike pickling it is safe, because JSON data is basically just text. <br>
It's human readable too.

JSON has just a few data types it supports:

* **Strings**: must be delimited by double quotes
* **Booleans**: the values `true` and `false`
* **Numbers**: can be integers or floats (including exponential notation, `1.3E2` for example); the JSON standard itself defines a single number type and does not distinguish between the two
* **Arrays**: an **ordered** collection of zero or more items of any valid JSON type
* **Objects**: an **unordered** collection of `key:value` pairs - the keys must be strings (so delimited by double quotes), and the values can be any valid JSON type.
* **NULL**: a null object, denoted by `null` and equivalent to `None` in Python.

**JSON limitations**:
* JSON keys must be strings (Python dictionary keys just need to be hashable)
    * non-string keys of basic types (e.g. `int`) get converted to strings, so the deserialized object will no longer match the original
* JSON doesn't recognise tuples and turns them into lists
* JSON doesn't recognise sets, decimals, dates
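
A short sketch of the tuple and set limitations, using made-up sample data:

```python
import json

# Tuples are written as JSON arrays and come back as lists
original = {'point': (1, 2)}
restored = json.loads(json.dumps(original))
print(restored)
print(original == restored)

# Sets (like dates and decimals) are rejected outright
try:
    json.dumps({'tags': {1, 2, 3}})
except TypeError as ex:
    error = str(ex)
    print(f'TypeError: {ex}')
```

The round trip silently changes the tuple into a list, so equality with the original fails, while the set raises a `TypeError` at serialization time.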

<hr> 

**Example 4**:

Serialize and deserialize a simple dictionary.

In [25]:
import json

In [26]:
d1 = {"a": 100, "b": 200}
d1_json = json.dumps(d1)
d1_json

'{"a": 100, "b": 200}'

In [27]:
d2 = json.loads(d1_json)
d1 == d2

True

<hr>

JSON keys must be strings or will get converted into strings:

In [28]:
d1 = {1: 100, 2: 200}
d1_json = json.dumps(d1)
d1_json

'{"1": 100, "2": 200}'

In [29]:
d2 = json.loads(d1_json)
print(d1)
print(d2)

{1: 100, 2: 200}
{'1': 100, '2': 200}


<hr>

<a id='custom-json-encoding'></a>
## 3. Custom JSON Encoding

Certain data types cannot be serialized to JSON using Python's defaults.

It is possible to serialize them by passing a custom serialization function to the `default` argument of `dump`/`dumps`.

When provided, Python will call `default` whenever it encounters a type it **cannot serialize**.

The only requirement is that the argument provided to `default` is `callable`. 

<hr>

<a id='handle-non-serializable-data'></a>
### 3.1 Specify default function to handle a non-serializable type

**Example 5**:

Encoding datetime which by default is not serializable.

In [30]:
import json
from datetime import datetime

current = datetime.utcnow()
current

datetime.datetime(2021, 7, 10, 22, 19, 27, 362893)

In [31]:
try:
    json.dumps(current)
except TypeError as ex:
    print(f'TypeError: {ex}')

TypeError: Object of type datetime is not JSON serializable


<hr>

*Solution*: <br>
`dump`/`dumps` accept a `default` argument that specifies how to handle unknown objects.

The callable below returns a string containing the date and time in ISO 8601 format: "YYYY-MM-DD**T**hh:mm:ss"

In [32]:
def format_iso(dt):
    """ Transforms datetime to string in standardized ISO format """
    return dt.strftime('%Y-%m-%dT%H:%M:%S')

format_iso(current)

'2021-07-10T22:19:27'

In [33]:
# A similar result can be achieved with the built-in datetime method (if fractional seconds are acceptable)
current.isoformat()

'2021-07-10T22:19:27.362893'

In [34]:
json.dumps(current, default=format_iso)

'"2021-07-10T22:19:27"'

<hr>

Serialize a dictionary with datetime value:

In [35]:
log_record = {'time': current,
              'message': 'testing'}

In [36]:
json.dumps(log_record, default=format_iso)

'{"time": "2021-07-10T22:19:27", "message": "testing"}'

<hr>

**Example 6**:

Encoding multiple non-serializable objects (datetime and set).

The function passed to the `default` argument of `dumps` must handle both types.

In [37]:
log_record = {'time': datetime.utcnow(),
              'message': 'testing',
              'args': {1, 2, 3}}

In [38]:
def custom_json_formatter(arg):
    """ If argument is datetime convert to str, if set then convert to list """
    if isinstance(arg, datetime):
        return arg.isoformat()
    elif isinstance(arg, set):
        return list(arg)

In [39]:
json.dumps(log_record, default=custom_json_formatter)

'{"time": "2021-07-10T22:19:27.474595", "message": "testing", "args": [1, 2, 3]}'

<hr>

<a id='encoding-classes'></a>
### 3.2 Encoding classes with toJSON

**Example 7**:

Encoding class objects:<br>
A custom class object is not JSON serializable by default.

*Solution*: a `toJSON()` method returns a JSON-serializable representation of the object, i.e. it converts the custom Python object to a __dictionary of basic types__.<br>
The `default` function then calls this method whenever it encounters such an object.

In [40]:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        self.create_df = datetime.utcnow()
    
    def __repr__(self):
        return f'Person(name={self.name}, age={self.age})'
    
    def toJSON(self):
        return {
            'name': self.name,
            'age': self.age, 
            'create_df': self.create_df  # datetime is handled by the default function
        }

In [41]:
p = Person('Monty', 100)

In [42]:
log_record = dict(time=datetime.utcnow(),
                  message='Created new person record',
                  person=p)

In [43]:
print(json.dumps(log_record, default=custom_json_formatter, indent=4))

{
    "time": "2021-07-10T22:19:27.554383",
    "message": "Created new person record",
    "person": null
}


<hr>

The `default` function does not yet specify how to handle a `Person` object. <br>
It falls through every branch and implicitly returns `None`, which JSON serialization stores as `null`.

Updated custom default function:

In [44]:
def custom_json_formatter(arg):
    """ If argument is datetime convert to str, if set then convert to list, if Person use toJSON """
    if isinstance(arg, datetime):
        return arg.isoformat()
    elif isinstance(arg, set):
        return list(arg)
    elif isinstance(arg, Person):
        return arg.toJSON()

In [45]:
json.dumps(log_record, default=custom_json_formatter)

'{"time": "2021-07-10T22:19:27.554383", "message": "Created new person record", "person": {"name": "Monty", "age": 100, "create_df": "2021-07-10T22:19:27.538425"}}'
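
One way to avoid the silent `null` seen earlier is to end the `default` function by raising `TypeError` for anything it does not recognise, which is what the `json` module itself does for unsupported types. The function name `strict_json_formatter` and the sample record below are assumptions made for this sketch.

```python
import json
from datetime import datetime


def strict_json_formatter(arg):
    """Hypothetical stricter variant: fail loudly on unsupported types."""
    if isinstance(arg, datetime):
        return arg.isoformat()
    if isinstance(arg, set):
        return list(arg)
    # Without this line the function would return None, encoded as null
    raise TypeError(f'Object of type {type(arg).__name__} is not JSON serializable')


ok = json.dumps({'time': datetime(2021, 7, 10, 22, 19, 27)},
                default=strict_json_formatter)
print(ok)

try:
    json.dumps({'value': 1 + 1j}, default=strict_json_formatter)
except TypeError as ex:
    message = str(ex)
    print(f'TypeError: {ex}')
```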

<hr>

The `toJSON` method returns a dictionary with all attributes of the object. <br> 
The same can be achieved with the built-in `vars` function:

In [46]:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
        self.create_df = datetime.utcnow()
    
    def __repr__(self):
        return f'Person(name={self.name}, age={self.age})'
    
    def toJSON(self):
        return vars(self)

In [47]:
p = Person('Python', 45)

In [48]:
log_record = dict(time=datetime.utcnow(),
                  message='Created new person record',
                  person=p
                 )
log_record

{'time': datetime.datetime(2021, 7, 10, 22, 19, 27, 651123),
 'message': 'Created new person record',
 'person': Person(name=Python, age=45)}

In [49]:
print(json.dumps(log_record, default=custom_json_formatter, indent=4))

{
    "time": "2021-07-10T22:19:27.651123",
    "message": "Created new person record",
    "person": {
        "name": "Python",
        "age": 45,
        "create_df": "2021-07-10T22:19:27.634168"
    }
}


<hr>

<a id='generalize-default-function'></a>
### 3.3 Generalize default function

**Example 8**:

Using `vars` as in the previous example, we can generalize the default function to handle custom class objects that do not define a `toJSON` method.

As a last resort, any other unknown object found during serialization is converted to a string.

In [50]:
def custom_json_formatter(arg):
    """ If argument is datetime convert to str, if set then convert to list, otherwise try toJSON, vars, str """
    if isinstance(arg, datetime):
        return arg.isoformat()
    elif isinstance(arg, set):
        return list(arg)
    else:
        ''' Try to execute object toJSON method '''
        try:
            return arg.toJSON()
        except AttributeError:
            ''' Try to print object attributes '''
            try:
                return vars(arg)
            except TypeError:
                ''' Convert object to string '''
                return str(arg)

In [51]:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return f'Point(x={self.x}, y={self.y})'

In [52]:
from decimal import Decimal

pt1 = Point(1, 2)
pt2 = Point(Decimal(10.5), Decimal('100.5'))  # Decimal is not JSON serializable; custom_json_formatter will convert it to str
p = Person('John', 18)

log_record = dict(time=datetime.utcnow(),
                  message='Create new point',
                  point=pt1, 
                  point_2=pt2,
                  created_by=p
                 )

In [53]:
print(json.dumps(log_record, default=custom_json_formatter, indent=4))

{
    "time": "2021-07-10T22:19:27.729911",
    "message": "Create new point",
    "point": {
        "x": 1,
        "y": 2
    },
    "point_2": {
        "x": "10.5",
        "y": "100.5"
    },
    "created_by": {
        "name": "John",
        "age": 18,
        "create_df": "2021-07-10T22:19:27.729911"
    }
}


<hr>

<a id='singledispatch'></a>
### 3.4 singledispatch decorator 

Use the `singledispatch` generic function decorator from the `functools` module.

In [54]:
from functools import singledispatch

**Example 9**:

Our default approach is going to:
* first try to use `toJSON`; if that fails,
* try to use `vars`; and if that still fails,
* use the string representation, whatever that happens to be

In [55]:
@singledispatch
def json_format(arg):
    """ Default function for handling arbitrary objects """
    print(arg)
    try:
        print('\ttrying to use toJSON...')
        return arg.toJSON()
    except AttributeError:
        try:
            print('\tfailed - trying to use vars...')
            return vars(arg)
        except TypeError:
            print('\tfailed - using string repr...')
            return str(arg)

'Register' handlers for other data types:

In [56]:
@json_format.register(datetime)
def _(arg):
    return arg.isoformat()

In [57]:
@json_format.register(set)
def _(arg):
    return list(arg)

In [58]:
@json_format.register(Decimal)
def _(arg):
    return f'Decimal({str(arg)})'

In [59]:
from fractions import Fraction

d = dict(a=1+1j,
         b=Decimal('0.5'),
         c=Fraction(1,3),
         p=Person('Python', 27),
         pt=Point(0,3),
         time=datetime.utcnow()
        )

In [60]:
d

{'a': (1+1j),
 'b': Decimal('0.5'),
 'c': Fraction(1, 3),
 'p': Person(name=Python, age=27),
 'pt': Point(x=0, y=3),
 'time': datetime.datetime(2021, 7, 10, 22, 19, 27, 844605)}

In [61]:
print(f'\nResult:\n{json.dumps(d, default=json_format, indent=4)}')

(1+1j)
	trying to use toJSON...
	failed - trying to use vars...
	failed - using string repr...
1/3
	trying to use toJSON...
	failed - trying to use vars...
	failed - using string repr...
Person(name=Python, age=27)
	trying to use toJSON...
Point(x=0, y=3)
	trying to use toJSON...
	failed - trying to use vars...

Result:
{
    "a": "(1+1j)",
    "b": "Decimal(0.5)",
    "c": "1/3",
    "p": {
        "name": "Python",
        "age": 27,
        "create_df": "2021-07-10T22:19:27.844605"
    },
    "pt": {
        "x": 0,
        "y": 3
    },
    "time": "2021-07-10T22:19:27.844605"
}
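
Adding support for another type then only needs one more registration, without touching the fallback function. The sketch below is self-contained and uses a hypothetical dispatcher named `to_json` with a simplified fallback; the choice to encode `Fraction` as a dictionary is an assumption for illustration.

```python
import json
from fractions import Fraction
from functools import singledispatch


@singledispatch
def to_json(arg):
    """Fallback for types with no registered handler."""
    return str(arg)


@to_json.register(Fraction)
def _(arg):
    # Keep numerator and denominator so the value can be rebuilt later
    return {'numerator': arg.numerator, 'denominator': arg.denominator}


@to_json.register(set)
@to_json.register(frozenset)
def _(arg):
    # sorted() gives a stable order; assumes the items are comparable
    return sorted(arg)


result = json.dumps({'c': Fraction(1, 3), 's': {3, 1, 2}, 'z': 1 + 1j},
                    default=to_json)
print(result)
```

Stacking `register` decorators, as done for `set` and `frozenset`, lets several types share one handler.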


<hr>

<a id='jsonencoder-class'></a>
## 4. JSONEncoder class

Define a custom class which inherits from the existing JSON encoder class (instead of writing a custom function).

Additional `dump`/`dumps` arguments that control serialization:
* `skipkeys` (bool: False) - if `True`, dictionary keys that are not of a basic type are skipped instead of raising a `TypeError`
* `indent` (int or str: None) - pretty-prints the output for human readability
* `separators` (tuple: `(', ', ': ')`, or `(',', ': ')` when `indent` is set) - customizes the item and key separators
* `sort_keys` (bool: False) - if `True`, dictionary keys will be sorted

**How to create a custom `JSONEncoder`**:
* subclass `JSONEncoder`
* customize init (optional)
* override the `default` method
    * handle what we want to handle ourselves
    * otherwise delegate back to the parent class

<hr>

**Example 10**:

Try out the `dumps` arguments:

In [62]:
d = {'name': 'Python', 'age': 27, 'created_by': 'Guido van Rossum', 'list': [1, 2, 3], 10: "int", 10.5: "float", 
     1+1j: "complex"}

print(json.dumps(d, skipkeys=True,  indent='---', separators=(' $', ' = ')))

{
---"name" = "Python" $
---"age" = 27 $
---"created_by" = "Guido van Rossum" $
---"list" = [
------1 $
------2 $
------3
---] $
---"10" = "int" $
---"10.5" = "float"
}
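
The example above does not exercise `sort_keys`. A small sketch with made-up data shows it together with compact separators:

```python
import json

d = {'name': 'Python', 'age': 27, 'list': [1, 2, 3]}

# sort_keys orders the keys alphabetically; handy for diffs and tests
sorted_json = json.dumps(d, sort_keys=True)
print(sorted_json)

# separators without spaces produce the most compact output
compact_json = json.dumps(d, separators=(',', ':'))
print(compact_json)
```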


<hr>

**Example 11**:

Define custom encoder class

If `skipkeys` is true (default: False), then dict keys that are not of a basic type (str, int, float, bool, None) will be skipped instead of raising a `TypeError`.

In [63]:
class CustomEncoder(json.JSONEncoder):
    def __init__(self, *args, **kwargs):
        # Options are hard-coded here; arguments forwarded by json.dumps are ignored
        super().__init__(skipkeys=True, allow_nan=False, indent='---', separators=(',', ':'))
    
    def default(self, arg):
        if isinstance(arg, datetime):
            return arg.isoformat()
        else:
            """ If default doesn't recognise the type then let the parent class handle it """
            return super().default(arg)

In [64]:
d = {
    'time': datetime.utcnow(),
    1+1j: "complex",
    'name': 'Python'
}

In [65]:
print(json.dumps(d, cls=CustomEncoder))

{
---"time":"2021-07-10T22:19:27.932372",
---"name":"Python"
}


<hr> 

**Example 12**:

Define a custom encoder class that serializes datetime as a detailed dictionary

In [66]:
class CustomEncoder(json.JSONEncoder):
    def default(self, arg):
        if isinstance(arg, datetime):
            obj = dict(
                datatype="datetime",
                iso=arg.isoformat(),
                date=arg.date().isoformat(),
                time=arg.time().isoformat(),
                year=arg.year,
                month=arg.month,
                day=arg.day,
                hour=arg.hour,
                minute=arg.minute,
                second=arg.second
            )
            return obj
        else:
            return super().default(arg)

In [67]:
d = {
    'time': datetime.utcnow(),
    'message': 'Testing...'
}
d

{'time': datetime.datetime(2021, 7, 10, 22, 19, 27, 981240),
 'message': 'Testing...'}

In [68]:
print(json.dumps(d, cls=CustomEncoder, indent=2))

{
  "time": {
    "datatype": "datetime",
    "iso": "2021-07-10T22:19:27.981240",
    "date": "2021-07-10",
    "time": "22:19:27.981240",
    "year": 2021,
    "month": 7,
    "day": 10,
    "hour": 22,
    "minute": 19,
    "second": 27
  },
  "message": "Testing..."
}


<hr>

<a id='custom-json-decoding'></a>
## 5. Custom JSON Decoding
As with encoding, JSON decoding does not recognise certain data types and requires customization.

<a id='decoding-sequence-of-nested-dicts'></a>
### 5.1 Decoding sequence of nested dicts
* `load` and `loads` have an **object_hook** argument, the decoding counterpart of the `default` argument used for encoding
* if specified, `object_hook` is called for every JSON object (dictionary) that is decoded, starting from the most deeply nested one and finishing at the root

**Example 13**:

Check decoding sequence

In [69]:
def custom_decoder(arg):
    print("decoding: ", arg, type(arg))
    return arg

In [70]:
j = '''
{
    "a": 1,
    "b": 2,
    "c": {
        "c.1": 1,
        "c.2": 2,
        "c.3": {
            "c.3.1": 1,
            "c.3.2": 2
        }
    }
}
'''

In [71]:
d = json.loads(j, object_hook=custom_decoder)

decoding:  {'c.3.1': 1, 'c.3.2': 2} <class 'dict'>
decoding:  {'c.1': 1, 'c.2': 2, 'c.3': {'c.3.1': 1, 'c.3.2': 2}} <class 'dict'>
decoding:  {'a': 1, 'b': 2, 'c': {'c.1': 1, 'c.2': 2, 'c.3': {'c.3.1': 1, 'c.3.2': 2}}} <class 'dict'>


<hr>

<a id='objecttype'></a>
### 5.2 `objecttype` to identify encoded objects 

Requires an `objecttype` key in the encoded JSON and a dedicated set of keys for each object type.

**Example 14**:

Decode dictionary using custom decoder and identify objects with `objecttype`

In [72]:
class Person:
    def __init__(self, name, ssn):
        self.name = name
        self.ssn = ssn
    
    def __repr__(self):
        return f'Person(name={self.name}, ssn={self.ssn})'
    
    def toJSON(self):
        return dict(objecttype='person', name=self.name, ssn=self.ssn)

In [73]:
j = """
    {
        "accountHolder": {
            "objecttype": "person",
            "name": "Eric Idle",
            "ssn": 100
        },
        "created": {
            "objecttype": "datetime",
            "value": "2020-12-31T23:59:59"
        },
        "message": "created this json string"
    }

"""

In [74]:
def custom_decoder(arg):
    if 'objecttype' in arg:
        if arg['objecttype']=="datetime":
            return datetime.strptime(arg['value'], '%Y-%m-%dT%H:%M:%S')
        elif arg['objecttype']=="fraction":
            return Fraction(arg['numerator'], arg['denominator'])
        elif arg['objecttype']=="person":
            return Person(arg['name'], arg['ssn'])
        return arg
    return arg

In [75]:
d = json.loads(j, object_hook=custom_decoder)

In [76]:
pprint(d)

{'accountHolder': Person(name=Eric Idle, ssn=100),
 'created': datetime.datetime(2020, 12, 31, 23, 59, 59),
 'message': 'created this json string'}
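
The decoder above also has a `fraction` branch that the sample JSON does not exercise. A full round trip for that case might look like the sketch below; the helper names `encode_fraction` and `decode_fraction` are hypothetical, and the key layout simply mirrors the `objecttype` convention used in this section.

```python
import json
from fractions import Fraction


def encode_fraction(arg):
    """Hypothetical `default` function that tags Fraction objects."""
    if isinstance(arg, Fraction):
        return dict(objecttype='fraction',
                    numerator=arg.numerator,
                    denominator=arg.denominator)
    raise TypeError(f'Object of type {type(arg).__name__} is not JSON serializable')


def decode_fraction(arg):
    """Matching `object_hook`: rebuild tagged dictionaries, pass others through."""
    if arg.get('objecttype') == 'fraction':
        return Fraction(arg['numerator'], arg['denominator'])
    return arg


original = {'ratio': Fraction(1, 3), 'label': 'one third'}
encoded = json.dumps(original, default=encode_fraction)
decoded = json.loads(encoded, object_hook=decode_fraction)

print(encoded)
print(decoded == original)
```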


<hr>

<a id='parse-parameters'></a>
### 5.3 Decoding using specific parse parameters

**Example 15**:

In [77]:
def make_decimal(arg):
    print('Float Received: ', type(arg), arg)
    return Decimal(arg)

def make_int_binary(arg):
    print('Int Received: ', type(arg), arg)
    return bin(int(arg))

In [78]:
j = '''
{
    "a": 100,
    "b": 0.2,
    "c": 0.5
}
'''

In [79]:
d = json.loads(j, parse_int=make_int_binary, parse_float=make_decimal)

Int Received:  <class 'str'> 100
Float Received:  <class 'str'> 0.2
Float Received:  <class 'str'> 0.5


In [80]:
d

{'a': '0b1100100', 'b': Decimal('0.2'), 'c': Decimal('0.5')}
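
One practical reason to pass `parse_float=Decimal` is exact decimal arithmetic. Because the parser hands over the original text (a `str`, as the printed types above show), no binary rounding takes place. A small sketch with made-up values:

```python
import json
from decimal import Decimal

j = '{"b": 0.2, "c": 0.1}'

as_float = json.loads(j)
as_decimal = json.loads(j, parse_float=Decimal)

# Binary floats cannot represent 0.1 and 0.2 exactly
print(as_float['b'] + as_float['c'] == 0.3)
# Decimal is built from the original text, so the sum is exact
print(as_decimal['b'] + as_decimal['c'] == Decimal('0.3'))
```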

<hr>

In [81]:
def make_constant_none(arg):
    print('Constant Received:', type(arg), arg)
    return None

In [82]:
j = '''
{
    "a": Infinity,
    "b": true,
    "c": null
}
'''

In [83]:
d = json.loads(j, parse_constant=make_constant_none)

Constant Received: <class 'str'> Infinity


In [84]:
d

{'a': None, 'b': True, 'c': None}

<hr>

<a id='overriding-basic-type-serializations'></a>
### 5.4 Overriding Basic Type Serializations

`object_hook` only allows us to customize deserialization of JSON objects (dictionaries).

`load`/`loads` arguments that **override** the handling of basic data types:
* `parse_float`
* `parse_int`
* `parse_constant`

There is no override for strings.

In [85]:
j = '''
{
    "a": [1, 2, 3, 4, 5],
    "b": 100,
    "c": 10.5,
    "d": NaN,
    "e": null,
    "f": "python"
}
'''

In [86]:
def float_handler(arg):
    print('float handler', type(arg), arg)
    return float(arg)

def int_handler(arg):
    print('int handler', type(arg), arg)
    return int(arg)

def const_handler(arg):
    print('const handler', type(arg), arg)
    return None

def obj_hook(arg):
    print('obj hook', type(arg), arg)
    return arg

In [87]:
json.loads(j)

{'a': [1, 2, 3, 4, 5], 'b': 100, 'c': 10.5, 'd': nan, 'e': None, 'f': 'python'}

In [88]:
json.loads(j, 
           object_hook=obj_hook, 
           parse_float=float_handler, 
           parse_int=int_handler
          )

int handler <class 'str'> 1
int handler <class 'str'> 2
int handler <class 'str'> 3
int handler <class 'str'> 4
int handler <class 'str'> 5
int handler <class 'str'> 100
float handler <class 'str'> 10.5
obj hook <class 'dict'> {'a': [1, 2, 3, 4, 5], 'b': 100, 'c': 10.5, 'd': nan, 'e': None, 'f': 'python'}


{'a': [1, 2, 3, 4, 5], 'b': 100, 'c': 10.5, 'd': nan, 'e': None, 'f': 'python'}
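
`parse_constant` was not passed in the call above, so `NaN` was decoded as `nan`. Since `NaN` and `Infinity` are not part of the JSON standard, one possible use of `parse_constant` is to reject them. The handler name `reject_constant` is an assumption for this sketch.

```python
import json


def reject_constant(arg):
    """Hypothetical strict handler: refuse NaN, Infinity and -Infinity."""
    raise ValueError(f'Non-standard JSON constant: {arg}')


# null and true are not passed to parse_constant, so this still works
ok = json.loads('{"e": null, "t": true}', parse_constant=reject_constant)
print(ok)

try:
    json.loads('{"d": NaN}', parse_constant=reject_constant)
except ValueError as ex:
    message = str(ex)
    print(f'ValueError: {ex}')
```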

<hr>

<a id='jsondecoder-class'></a>
## 6. JSONDecoder class

Create a custom `JSONDecoder` class and specify it with the `cls` argument.
* inherit from `JSONDecoder`
* **override** the `decode` function
* `decode` function receives **entire** JSON **string**
* we have to **fully parse** and return whatever object we want


**Example 16**:

Custom decoder, assuming that points will be a top level node in the JSON object:

In [89]:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
    def __repr__(self):
        return f'Point(x={self.x}, y={self.y})'

In [90]:
j_points = '''
{
    "points": [
        [10, 20],
        [-1, -2],
        [0.5, 0.5]
    ]
}
'''

In [91]:
class CustomDecoder(json.JSONDecoder):
    def decode(self, arg):
        obj = json.loads(arg)
        if 'points' in obj:  # top level
            obj['points'] = [Point(x, y) for x, y in obj['points']]
        return obj

In [92]:
json.loads(j_points, cls=CustomDecoder)

{'points': [Point(x=10, y=20), Point(x=-1, y=-2), Point(x=0.5, y=0.5)]}

<hr>

In [93]:
j_other = '''
{
    "a": 1,
    "b": 2
}
'''

In [94]:
json.loads(j_other, cls=CustomDecoder)

{'a': 1, 'b': 2}

**Example 17**:

Use regular expressions to detect whether the JSON string contains a specific object type.

Schema used for this example: `{"_type": "point", "x": x-coord, "y": y-coord}`

In [95]:
j = '''
{
    "a": 100,
    "b": 0.5,
    "rectangle": {
        "corners": {
            "b_left": {"_type": "point", "x": -1, "y": -1},
            "b_right": {"_type": "point", "x": 1, "y": -1},
            "t_left": {"_type": "point", "x": -1, "y": 1},
            "t_right": {"_type": "point", "x": 1, "y": 1}
        },
        "rotate": {"_type" : "point", "x": 0, "y": 0},
        "interior_pts": [
            {"_type": "point", "x": 0, "y": 0},
            {"_type": "point", "x": 0.5, "y": 0.5}
        ]
    }
}
'''

In [96]:
import re

class CustomDecoder(json.JSONDecoder):
    def decode(self, arg):
        obj = json.loads(arg)
        pattern = r'"_type"\s*:\s*"point"'
        if re.search(pattern, arg):
            obj = self.make_pts(obj)
        return obj
    
    def make_pts(self, obj):
        if isinstance(obj, dict):
            if obj.get('_type', None) == 'point':
                obj = Point(obj['x'], obj['y'])
            else:
                for key, value in obj.items():
                    obj[key] = self.make_pts(value)
        elif isinstance(obj, list):
            for index, item in enumerate(obj):
                obj[index] = self.make_pts(item)
        return obj        

In [97]:
json.loads(j)

{'a': 100,
 'b': 0.5,
 'rectangle': {'corners': {'b_left': {'_type': 'point', 'x': -1, 'y': -1},
   'b_right': {'_type': 'point', 'x': 1, 'y': -1},
   't_left': {'_type': 'point', 'x': -1, 'y': 1},
   't_right': {'_type': 'point', 'x': 1, 'y': 1}},
  'rotate': {'_type': 'point', 'x': 0, 'y': 0},
  'interior_pts': [{'_type': 'point', 'x': 0, 'y': 0},
   {'_type': 'point', 'x': 0.5, 'y': 0.5}]}}

In [98]:
pprint(json.loads(j, cls=CustomDecoder))

{'a': 100,
 'b': 0.5,
 'rectangle': {'corners': {'b_left': Point(x=-1, y=-1),
                           'b_right': Point(x=1, y=-1),
                           't_left': Point(x=-1, y=1),
                           't_right': Point(x=1, y=1)},
               'interior_pts': [Point(x=0, y=0), Point(x=0.5, y=0.5)],
               'rotate': Point(x=0, y=0)}}
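
A possible alternative to overriding `decode` and walking the parsed structure by hand is to install an `object_hook` in the subclass constructor and let the parser call it for every object. This is a sketch rather than the approach taken in these notes; the class name `PointDecoder` and the shortened sample JSON are assumptions.

```python
import json


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f'Point(x={self.x}, y={self.y})'


class PointDecoder(json.JSONDecoder):
    """Hypothetical alternative: let the parser call a hook per object."""

    def __init__(self, *args, **kwargs):
        # Install the hook unless the caller supplied their own
        kwargs.setdefault('object_hook', self.make_point)
        super().__init__(*args, **kwargs)

    @staticmethod
    def make_point(obj):
        if obj.get('_type') == 'point':
            return Point(obj['x'], obj['y'])
        return obj


j_sample = '''
{
    "a": 100,
    "rotate": {"_type": "point", "x": 0, "y": 0},
    "interior_pts": [{"_type": "point", "x": 0.5, "y": 0.5}]
}
'''
decoded = json.loads(j_sample, cls=PointDecoder)
print(decoded)
```

Because the hook runs from the innermost object outwards, nested points are converted without any explicit recursion.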


<hr>

An alternative way to use the `JSONDecoder` class is to instantiate it directly with custom parse arguments (here `parse_float`) and call its `decode` method

In [99]:
CustomDecoder = json.JSONDecoder(parse_float=Decimal)

In [100]:
pprint(CustomDecoder.decode(j))

{'a': 100,
 'b': Decimal('0.5'),
 'rectangle': {'corners': {'b_left': {'_type': 'point', 'x': -1, 'y': -1},
                           'b_right': {'_type': 'point', 'x': 1, 'y': -1},
                           't_left': {'_type': 'point', 'x': -1, 'y': 1},
                           't_right': {'_type': 'point', 'x': 1, 'y': 1}},
               'interior_pts': [{'_type': 'point', 'x': 0, 'y': 0},
                                {'_type': 'point',
                                 'x': Decimal('0.5'),
                                 'y': Decimal('0.5')}],
               'rotate': {'_type': 'point', 'x': 0, 'y': 0}}}


<hr>

<a id='json-schema'></a>
## 7. JSON Schema

JSON input and output often have to conform to a precise specification of how the data is structured.

One standard for describing such a specification is JSON Schema:
https://json-schema.org/

Given a schema, we can determine whether a JSON document is valid.

**Example 18**:

Sample schema:
* the object requires `"firstName"` and `"lastName"` to be provided
* each field has a specified type
* the name fields have length constraints, and `"age"` has a minimum value
* the field `"eyeColor"` must contain (if provided) one of a few specific values: `amber`, `blue`, `brown`, `gray`, `green`, `hazel`, `red`, or `violet`.

In [101]:
person_schema = {
    "type": "object",
    "properties": {
        "firstName": {
            "type": "string",
            "minLength": 1
        },
        "middleInitial": {
            "type": "string",
            "minLength": 1,
            "maxLength": 1
        },
        "lastName": {
            "type": "string",
            "minLength": 1
        },
        "age": {
            "type": "integer", 
            "minimum": 0
        },
        "eyeColor": {
            "type": "string",
            "enum": ["amber", "blue", "brown", "gray", 
                     "green", "hazel", "red", "violet"]
        }
    },
    "required": ["firstName", "lastName"]
}

<hr>

Validation requires the `jsonschema` library (usually installed with `pip install jsonschema`).

In [102]:
from jsonschema import validate
from jsonschema.exceptions import ValidationError
from json import loads, dumps, JSONDecodeError

In [103]:
p1 = '''
    {
        "firstName": "John",
        "middleInitial": "M",
        "lastName": "Cleese",
        "age": 79
    }
'''

In [104]:
try:
    validate(loads(p1), person_schema)
except JSONDecodeError as ex:
    print(f'Invalid JSON: {ex}')
except ValidationError as ex:
    print(f'Validation error: {ex}')
else:
    print('JSON is valid')

JSON is valid
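
The `try`/`except` pattern above is repeated for every sample, and the `JSONDecodeError` branch is never triggered by the samples in this section. A minimal sketch that wraps the pattern in a helper and exercises all three outcomes; `check_json` and the reduced `schema` are hypothetical names used only for this illustration:

```python
from json import loads, JSONDecodeError

from jsonschema import validate
from jsonschema.exceptions import ValidationError


def check_json(json_str, schema):
    """Hypothetical helper: return a short status string for json_str."""
    try:
        validate(loads(json_str), schema)
    except JSONDecodeError as ex:
        # Raised by loads() when the text is not well-formed JSON
        return f'Invalid JSON: {ex.msg}'
    except ValidationError as ex:
        # Raised by validate() when the data violates the schema
        return f'Validation error: {ex.message}'
    return 'JSON is valid'


# Reduced schema used only for this sketch
schema = {
    "type": "object",
    "properties": {"firstName": {"type": "string", "minLength": 1}},
    "required": ["firstName"],
}

print(check_json('{"firstName": "John"}', schema))    # valid
print(check_json('{"firstName": 100}', schema))       # wrong type
print(check_json('{"firstName": "John",}', schema))   # trailing comma
```

Using `ex.message` instead of printing the whole exception keeps the report to a single line per problem.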


In [105]:
p2 = '''
    {
        "firstName": "John",
        "middleInitial": 100,
        "lastName": "Cleese",
        "age": "Unknown"
    }
'''

In [106]:
try:
    validate(loads(p2), person_schema)
except JSONDecodeError as ex:
    print(f'Invalid JSON: {ex}')
except ValidationError as ex:
    print(f'Validation error: {ex}')
else:
    print('JSON is valid')

Validation error: 100 is not of type 'string'

Failed validating 'type' in schema['properties']['middleInitial']:
    {'maxLength': 1, 'minLength': 1, 'type': 'string'}

On instance['middleInitial']:
    100


In [107]:
p3 = '''
    {
        "firstName": "John",
        "age": -10.5
    }
'''

In [108]:
try:
    validate(loads(p3), person_schema)
except JSONDecodeError as ex:
    print(f'Invalid JSON: {ex}')
except ValidationError as ex:
    print(f'Validation error: {ex}')
else:
    print('JSON is valid')

Validation error: 'lastName' is a required property

Failed validating 'required' in schema:
    {'properties': {'age': {'minimum': 0, 'type': 'integer'},
                    'eyeColor': {'enum': ['amber',
                                          'blue',
                                          'brown',
                                          'gray',
                                          'green',
                                          'hazel',
                                          'red',
                                          'violet'],
                                 'type': 'string'},
                    'firstName': {'minLength': 1, 'type': 'string'},
                    'lastName': {'minLength': 1, 'type': 'string'},
                    'middleInitial': {'maxLength': 1,
                                      'minLength': 1,
                                      'type': 'string'}},
     'required': ['firstName', 'lastName'],
     'type': 'object'}

On instance:
    {'age': -10.5, 'firstName': 'John'}

<hr>

**Example 19**:

`validate` raises a single `ValidationError`, even when the data violates the schema in several places. <br>
To run the entire validation and collect all the errors (if any), create a validator object and iterate over `iter_errors`.

In [109]:
from jsonschema import Draft4Validator

validator = Draft4Validator(person_schema)

In [110]:
for error in validator.iter_errors(loads(p2)):
    print(error, end='\n-----------\n')

100 is not of type 'string'

Failed validating 'type' in schema['properties']['middleInitial']:
    {'maxLength': 1, 'minLength': 1, 'type': 'string'}

On instance['middleInitial']:
    100
-----------
'Unknown' is not of type 'integer'

Failed validating 'type' in schema['properties']['age']:
    {'minimum': 0, 'type': 'integer'}

On instance['age']:
    'Unknown'
-----------


In [111]:
p4 = '''
    {
        "firstName": "John",
        "middleInitial": null,
        "lastName": "Cleese",
        "eyeColor": "blue-gray"
    }
'''

In [112]:
for error in validator.iter_errors(loads(p4)):
    print(error, end='\n-----------\n')    

None is not of type 'string'

Failed validating 'type' in schema['properties']['middleInitial']:
    {'maxLength': 1, 'minLength': 1, 'type': 'string'}

On instance['middleInitial']:
    None
-----------
'blue-gray' is not one of ['amber', 'blue', 'brown', 'gray', 'green', 'hazel', 'red', 'violet']

Failed validating 'enum' in schema['properties']['eyeColor']:
    {'enum': ['amber',
              'blue',
              'brown',
              'gray',
              'green',
              'hazel',
              'red',
              'violet'],
     'type': 'string'}

On instance['eyeColor']:
    'blue-gray'
-----------
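
The full error text is verbose. Each error object also exposes the failing keyword (`validator`), the path into the instance (`path`) and a one-line `message`, which is enough for a compact report. A minimal sketch, assuming a reduced copy of the person schema and a sample that is missing `lastName` and has a negative, non-integer `age`:

```python
from json import loads

from jsonschema import Draft4Validator

# Reduced version of the person schema, for this sketch only
schema = {
    "type": "object",
    "properties": {
        "firstName": {"type": "string", "minLength": 1},
        "lastName": {"type": "string", "minLength": 1},
        "age": {"type": "integer", "minimum": 0},
    },
    "required": ["firstName", "lastName"],
}

instance = loads('{"firstName": "John", "age": -10.5}')

validator = Draft4Validator(schema)

# Collect (keyword, path, message) for every error, not just the first
summary = [
    (error.validator, list(error.path), error.message)
    for error in validator.iter_errors(instance)
]

for keyword, path, message in summary:
    print(f'{keyword:10} {path!s:10} {message}')
```

Here `validate` would report only one of these problems, while `iter_errors` yields the missing required property as well as both violations on `age`.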


<hr>

<a id='marshmallow'></a>
## 8. Marshmallow

Marshmallow is a library for converting complex data types, such as objects, to and from native Python data types.<br>
It can also validate input data and is very customizable.

[https://marshmallow.readthedocs.io/en/3.0/](https://marshmallow.readthedocs.io/en/3.0/)

**Example 20**:

Person serialization with custom object and namedtuple.

Serialize objects by passing them to your schema’s `dump` method, which returns the formatted result.<br>
You can also serialize to a JSON-encoded string using `dumps`.

In [113]:
!pip install marshmallow



In [114]:
from marshmallow import Schema, fields

In [115]:
class Person:
    def __init__(self, first_name, last_name, dob, height):
        self.first_name = first_name
        self.last_name = last_name
        self.dob = dob
        self.height = height
    
    def __repr__(self):
        return f'Person({self.first_name}, {self.last_name}, {self.dob}, {self.height})'

In [116]:
from datetime import date

p1 = Person('John', 'Cleese', date(1939, 10, 27), 182)
p1

Person(John, Cleese, 1939-10-27, 182)

In [117]:
class PersonSchema(Schema):
    first_name = fields.Str()
    last_name = fields.Str()
    dob = fields.Date()
    height = fields.Int()

In [118]:
person_schema = PersonSchema()

In [119]:
person_schema.dump(p1)

{'dob': '1939-10-27',
 'last_name': 'Cleese',
 'height': 182,
 'first_name': 'John'}

<hr>

In [120]:
from collections import namedtuple

PT = namedtuple('PT', 'first_name, last_name, dob, height')
p2 = PT('Eric', 'Idle', date(1943, 3, 29), 178)
p2

PT(first_name='Eric', last_name='Idle', dob=datetime.date(1943, 3, 29), height=178)

In [121]:
person_schema.dump(p2)

{'dob': '1943-03-29', 'last_name': 'Idle', 'height': 178, 'first_name': 'Eric'}

<hr>

Use `only` or `exclude` to select which attributes are serialized:

In [122]:

person_partial = PersonSchema(only=('first_name', 'last_name'))

person_partial.dumps(p2)

'{"last_name": "Idle", "first_name": "Eric"}'

In [123]:
person_partial = PersonSchema(exclude=['dob'])

person_partial.dumps(p2)

'{"last_name": "Idle", "height": 178, "first_name": "Eric"}'

<hr>

**Example 21**:

Movie serialization with a nested list of persons, using `fields.Nested`.

In [124]:
class Movie:
    def __init__(self, title, year, actors):
        self.title = title
        self.year = year
        self.actors = actors

In [125]:
class MovieSchema(Schema):
    title = fields.Str()
    year = fields.Int()
    actors = fields.Nested(PersonSchema, many=True)

In [126]:
p1, p2

(Person(John, Cleese, 1939-10-27, 182),
 PT(first_name='Eric', last_name='Idle', dob=datetime.date(1943, 3, 29), height=178))

In [127]:
parrot = Movie('Parrot Sketch', 1989, [p1, PT('Michael', 'Palin', date(1943, 5, 5), 177)])

In [128]:
MovieSchema().dumps(parrot)

'{"title": "Parrot Sketch", "actors": [{"dob": "1939-10-27", "last_name": "Cleese", "height": 182, "first_name": "John"}, {"dob": "1943-05-05", "last_name": "Palin", "height": 177, "first_name": "Michael"}], "year": 1989}'

<hr>

**Example 22**:

Person deserialization with the schema's `load` method.

In [129]:
person_schema = PersonSchema()

In [130]:
person_schema.load(dict(first_name='John',
                        last_name='Cleese',
                        dob='1939-10-27',
                        height=182
                       ))

{'dob': datetime.date(1939, 10, 27),
 'last_name': 'Cleese',
 'height': 182,
 'first_name': 'John'}

To make the deserialization methods return a `Person` object instead of a dictionary, define a method in the schema class and decorate it with marshmallow's `post_load`.

In [131]:
from marshmallow import post_load

In [132]:
class PersonSchema(Schema):
    first_name = fields.Str()
    last_name = fields.Str()
    dob = fields.Date()
    height = fields.Int()
    
    @post_load
    def make_person(self, data, **kwargs):
        return Person(**data)

In [133]:
person_schema = PersonSchema()
person_schema.load(dict(first_name='John',
                        last_name='Cleese',
                        dob='1939-10-27',
                        height=182))

Person(John, Cleese, 1939-10-27, 182)

<hr>

In [134]:
class MovieSchema(Schema):
    title = fields.Str()
    year = fields.Integer()
    actors = fields.Nested(PersonSchema, many=True)
    
    @post_load
    def make_movie(self, data, **kwargs):
        return Movie(**data)

    
movie_schema = MovieSchema()

In [135]:
json_data = '''
{"actors": [
    {"first_name": "John", "last_name": "Cleese", "dob": "1939-10-27", "height": 182}, 
    {"first_name": "Michael", "last_name": "Palin", "dob": "1943-05-05", "height": 177}], 
"title": "Parrot Sketch", 
"year": 1989}
'''

In [136]:
movie = movie_schema.loads(json_data)
movie

<__main__.Movie at 0x1e578893130>

In [137]:
movie.title, movie.year

('Parrot Sketch', 1989)

In [138]:
movie.actors

[Person(John, Cleese, 1939-10-27, 182),
 Person(Michael, Palin, 1943-05-05, 177)]