# Serializing Python Objects

## Diving In

On the surface, the concept of serialization is simple. You have a data structure in memory that you want to save, reuse, or send to someone else. The pickle module is ideal when the data is only meant to be used by the same program that created it: never sent over a network, and never read by anything else. It is part of the Python standard library, and the bulk of it is written in C.

What can pickle store?
* all native Python datatypes
* combinations of native Python datatypes (lists, tuples, dictionaries, and sets containing them)
* functions, classes, and instances of classes (with caveats)
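
The list above can be illustrated with a minimal sketch. The sample values are made up for this example, and `collections.OrderedDict` and `datetime.date` simply stand in for "a class" and "an instance of a class". The caveat it tries to show is that functions and classes are stored by reference (module and name), not by value.

```python
import collections
import datetime
import pickle

# Native datatypes, alone or nested inside each other, round-trip unchanged.
data = {
    'numbers': [1, 2.5, 3 + 4j],
    'flags': (True, None),
    'raw': b'\x00\x01',
    'names': {'a', 'b'},
}
restored = pickle.loads(pickle.dumps(data))
print(restored == data)

# Instances of importable classes round-trip too.
day = pickle.loads(pickle.dumps(datetime.date(2009, 3, 27)))
print(day)

# Caveat: a class (or function) is pickled as a reference to its module
# and name, so unpickling hands back the very same object.
same_class = pickle.loads(pickle.dumps(collections.OrderedDict))
print(same_class is collections.OrderedDict)

# Something without an importable name, such as a lambda, cannot be pickled.
try:
    pickle.dumps(lambda x: x)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False
print(lambda_pickled)
```

Because only a reference is stored, the program that loads the pickle must be able to import the same class or function under the same name.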

## Saving Data to a Pickle File

In [1]:
# Creating a dictionary to pickle
import time

entry = {}
entry['title'] = 'Dive into history, 2009 edition'
entry['article_link'] = 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'
entry['comments_link'] = None
entry['internal_id'] = b'\xDE\xD5\xB4\xF8'
entry['tags'] = ('diveintopython', 'docbook', 'html')
entry['published'] = True
entry['published_date'] = time.strptime('Fri Mar 27 22:20:42 2009')

print(entry['published_date'])

time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86, tm_isdst=-1)


In [3]:
# pickle our dictionary
# pickle is a Python-specific format

import pickle

with open('33_serializing_python_objects_demos/entry.pickle', 'wb') as f:
    pickle.dump(entry, f)

## Loading Data from a Pickle File

In [4]:
with open('33_serializing_python_objects_demos/entry.pickle', 'rb') as f:
    entry = pickle.load(f)
    
print(entry)

{'title': 'Dive into history, 2009 edition', 'article_link': 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition', 'comments_link': None, 'internal_id': b'\xde\xd5\xb4\xf8', 'tags': ('diveintopython', 'docbook', 'html'), 'published': True, 'published_date': time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86, tm_isdst=-1)}


## Pickling without a File

In [6]:
# pickling to a bytes object in memory instead of a file

# Load our pickled object
with open('33_serializing_python_objects_demos/entry.pickle', 'rb') as f:
    entry = pickle.load(f)

# pickle our loaded object to a bytes object
b = pickle.dumps(entry)

# type check
print(type(b))

# unpickle the bytes and confirm the result matches our loaded object
repickled_entry = pickle.loads(b)
print(repickled_entry == entry)

<class 'bytes'>
True


## Bytes and Strings Rear Their Ugly Heads Again

Python 3.0 introduced a new pickle protocol (version 3) with explicit support for bytes objects. It is a binary format. Python 3 can read pickles written by Python 2, but not vice versa: Python 2 cannot read pickles written with protocol 3 or later.
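
A minimal sketch of choosing the protocol explicitly, assuming a small made-up `payload` dictionary. `pickle.dumps` accepts a `protocol` argument, and pickles written with protocol 2 or later begin with the `PROTO` opcode (`0x80`) followed by the protocol number.

```python
import pickle

# Hypothetical sample data containing a bytes value.
payload = {'internal_id': b'\xDE\xD5\xB4\xF8'}

sizes = {}
for protocol in (2, 3, pickle.HIGHEST_PROTOCOL):
    blob = pickle.dumps(payload, protocol=protocol)
    # The first two bytes are the PROTO opcode and the protocol number.
    assert blob[0] == 0x80 and blob[1] == protocol
    # Python 3 reads every one of these protocols back to the same object.
    assert pickle.loads(blob) == payload
    sizes[protocol] = len(blob)

print(sizes)
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)
```

If a pickle ever has to be read by Python 2, writing it with `protocol=2` is the usual choice.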

## Debugging Pickle Files

In [9]:
# printing pickle files from cmd can be messy

!cat 33_serializing_python_objects_demos/entry.pickle

��J      }�(�title��Dive into history, 2009 edition��article_link��Jhttp://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition��comments_link�N�internal_id�C�մ���tags��diveintopython��docbook��html����	published���published_date��time��struct_time���(M�KKKKK*KKVJ����t�}�(�tm_zone�N�	tm_gmtoff�Nu��R�u.

In [10]:
# printing our pickled object with a cleaner format
import pickletools

with open('33_serializing_python_objects_demos/entry.pickle', 'rb') as f:
    pickletools.dis(f)

    0: \x80 PROTO      4
    2: \x95 FRAME      330
   11: }    EMPTY_DICT
   12: \x94 MEMOIZE    (as 0)
   13: (    MARK
   14: \x8c     SHORT_BINUNICODE 'title'
   21: \x94     MEMOIZE    (as 1)
   22: \x8c     SHORT_BINUNICODE 'Dive into history, 2009 edition'
   55: \x94     MEMOIZE    (as 2)
   56: \x8c     SHORT_BINUNICODE 'article_link'
   70: \x94     MEMOIZE    (as 3)
   71: \x8c     SHORT_BINUNICODE 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'
  147: \x94     MEMOIZE    (as 4)
  148: \x8c     SHORT_BINUNICODE 'comments_link'
  163: \x94     MEMOIZE    (as 5)
  164: N        NONE
  165: \x8c     SHORT_BINUNICODE 'internal_id'
  178: \x94     MEMOIZE    (as 6)
  179: C        SHORT_BINBYTES b'\xde\xd5\xb4\xf8'
  185: \x94     MEMOIZE    (as 7)
  186: \x8c     SHORT_BINUNICODE 'tags'
  192: \x94     MEMOIZE    (as 8)
  193: \x8c     SHORT_BINUNICODE 'diveintopython'
  209: \x94     MEMOIZE    (as 9)
  210: \x8c     SHORT_BINUNICODE 'docbook'
  219
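
The same disassembly also works on an in-memory pickle. The sketch below assumes a tiny made-up dictionary and captures the listing in a string through the `out` parameter of `pickletools.dis` instead of printing it directly.

```python
import io
import pickle
import pickletools

# Hypothetical one-key dictionary, pickled to bytes with protocol 4.
blob = pickle.dumps({'published': True}, protocol=4)

# pickletools.dis accepts a bytes object as well as an open file.
out = io.StringIO()
pickletools.dis(blob, out=out)
listing = out.getvalue()

print(listing)
```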

## Serializing Python Objects to be Read by Other Languages

Pickle is Python-specific, so to save objects in a format that other programming languages can read we need something else. JSON is a great candidate! JSON stands for "JavaScript Object Notation", and Python 3 comes with a json module in the standard library for working with it.

JSON is case-sensitive and ignores whitespace between values, so you can "pretty-print" your JSON without changing its meaning. JSON must also be stored in a Unicode encoding (UTF-32, UTF-16, or the default, UTF-8).
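
These three properties can be checked with a short sketch. The JSON strings are made up for the example, and passing `bytes` to `json.loads` assumes Python 3.6 or later, where the encoding is detected automatically.

```python
import json

# Whitespace between values is ignored: both texts decode to the same object.
compact = '{"id":256,"tags":["diveintopython","docbook"]}'
pretty = '''
{
  "id": 256,
  "tags": [
    "diveintopython",
    "docbook"
  ]
}
'''
same = json.loads(compact) == json.loads(pretty)
print(same)

# Keys are case-sensitive: "Title" and "title" are two different keys.
mixed = json.loads('{"Title": 1, "title": 2}')
print(len(mixed))

# JSON text stored as UTF-8, UTF-16, or UTF-32 bytes decodes to the same data.
text = '{"title": "caf\u00e9"}'
decoded = [json.loads(text.encode(enc)) for enc in ('utf-8', 'utf-16', 'utf-32')]
print(decoded)
```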

## Saving Data to a JSON File

* Because JSON is based on JavaScript syntax, JavaScript's eval() function can "decode" JSON-serialized data (this is unsafe with untrusted input). In Python, use the json module instead.

In [47]:
import json

# Create dictionary object
basic_entry = {}
basic_entry['id'] = 256
basic_entry['title'] = 'Dive into history, 2009 edition'
basic_entry['tags'] = ('diveintopython', 'docbook', 'html')
basic_entry['published'] = True
basic_entry['comments_link'] = None

# write dictionary to json file
with open('33_serializing_python_objects_demos/basic.json', mode='w', encoding='utf-8') as f:
    json.dump(basic_entry, f)    

In [48]:
# printing our json object via cmd
! cat 33_serializing_python_objects_demos/basic.json

{"id": 256, "title": "Dive into history, 2009 edition", "tags": ["diveintopython", "docbook", "html"], "published": true, "comments_link": null}

In [49]:
# re-save our JSON in pretty-printed form
# This is more readable, but also produces a larger file

with open('33_serializing_python_objects_demos/basic-pretty.json', mode='w', encoding='utf-8') as f:
    json.dump(basic_entry, f, indent=2)

In [50]:
# printing our pretty-print json object via cmd
! cat 33_serializing_python_objects_demos/basic-pretty.json

{
  "id": 256,
  "title": "Dive into history, 2009 edition",
  "tags": [
    "diveintopython",
    "docbook",
    "html"
  ],
  "published": true,
  "comments_link": null
}

## Mapping of Python Datatypes to JSON

Python datatypes map fairly cleanly to JSON, but two important datatypes are missing from the JSON format:

* tuples (mapped to array objects in json, essentially a list object)
* bytes (no support)
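
Both gaps are easy to observe. This is a minimal sketch with made-up values: a tuple is written as a JSON array and comes back as a list, while a bytes object is rejected with a `TypeError`.

```python
import json

# A tuple is serialized as a JSON array...
tags = ('diveintopython', 'docbook', 'html')
text = json.dumps(tags)
print(text)

# ...so it comes back as a list, not a tuple.
restored = json.loads(text)
print(type(restored))

# bytes have no JSON equivalent, so json.dumps refuses them.
try:
    json.dumps(b'\xDE\xD5\xB4\xF8')
    bytes_error = None
except TypeError as exc:
    bytes_error = exc
print(bytes_error)
```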

## Serializing Datatypes Unsupported by JSON

In [55]:
import time

# Function to serialize time and bytes objects to JSON.
# json only calls it for objects it cannot serialize by itself.
def to_json(python_object):
    # Note: time.struct_time is a tuple subclass, so json writes it as an
    # array on its own and never passes it to this function
    if isinstance(python_object, time.struct_time):
        return {'__class__': 'time.asctime',
                '__value__': time.asctime(python_object)}
    if isinstance(python_object, bytes):
        return {'__class__': 'bytes',
                '__value__': list(python_object)}
    raise TypeError(repr(python_object) + ' is not JSON serializable')

In [56]:
# write our entry, including the unsupported datatypes, to a JSON file
with open('33_serializing_python_objects_demos/entry.json', 'w', encoding='utf-8') as f:
    json.dump(entry, f, indent=2, default=to_json)

In [57]:
!cat 33_serializing_python_objects_demos/entry.json

{
  "title": "Dive into history, 2009 edition",
  "article_link": "http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition",
  "comments_link": null,
  "internal_id": {
    "__class__": "bytes",
    "__value__": [
      222,
      213,
      180,
      248
    ]
  },
  "tags": [
    "diveintopython",
    "docbook",
    "html"
  ],
  "published": true,
  "published_date": [
    2009,
    3,
    27,
    22,
    20,
    42,
    4,
    86,
    -1
  ]
}

## Loading Data from a JSON file

In [58]:
with open('33_serializing_python_objects_demos/entry.json', mode='r', encoding='utf-8') as f:
    entry = json.load(f)
    
print(entry)

{'title': 'Dive into history, 2009 edition', 'article_link': 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition', 'comments_link': None, 'internal_id': {'__class__': 'bytes', '__value__': [222, 213, 180, 248]}, 'tags': ['diveintopython', 'docbook', 'html'], 'published': True, 'published_date': [2009, 3, 27, 22, 20, 42, 4, 86, -1]}


The file loaded, but 'internal_id' came back as a plain dictionary and 'published_date' as a plain list: json.load has no way of knowing that they should be converted back into bytes and time objects.

In [59]:
# Function to convert our non-JSON-compatible fields back into their correct Python types
def from_json(json_object):
    if '__class__' in json_object:
        if json_object['__class__'] == 'time.asctime':
            return time.strptime(json_object['__value__'])
        if json_object['__class__'] == 'bytes':
            return bytes(json_object['__value__'])
    return json_object

In [60]:
# Load the JSON file, converting the tagged objects back with from_json
with open('33_serializing_python_objects_demos/entry.json', 'r', encoding='utf-8') as f:
    entry = json.load(f, object_hook=from_json)
        
print(entry)

{'title': 'Dive into history, 2009 edition', 'article_link': 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition', 'comments_link': None, 'internal_id': b'\xde\xd5\xb4\xf8', 'tags': ['diveintopython', 'docbook', 'html'], 'published': True, 'published_date': [2009, 3, 27, 22, 20, 42, 4, 86, -1]}


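If you take a look, you will notice that our 'tags' field is now a list instead of a tuple. In most cases you can ignore the difference, but it is something to keep in mind when converting objects between Python and JSON. The same applies to 'published_date': time.struct_time is a tuple subclass, so json.dump wrote it as an array without ever calling to_json, and it comes back as a list.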
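
A minimal sketch that checks how json treats `time.struct_time` and shows one possible workaround: tagging the value before dumping. The `tag_times` helper and the `record_call` function are hypothetical names introduced for this example, not part of the json module, and the sample entry is trimmed to two keys.

```python
import json
import time

published = time.strptime('Fri Mar 27 22:20:42 2009')

# struct_time is a tuple subclass, so json encodes it as an array directly
# and the default function is never consulted.
calls = []

def record_call(python_object):
    calls.append(python_object)
    raise TypeError(repr(python_object) + ' is not JSON serializable')

as_array = json.loads(json.dumps(published, default=record_call))
print(as_array, calls)

def tag_times(mapping):
    """Hypothetical helper: copy a dict, tagging its struct_time values."""
    return {
        key: ({'__class__': 'time.asctime', '__value__': time.asctime(value)}
              if isinstance(value, time.struct_time) else value)
        for key, value in mapping.items()
    }

def from_json(json_object):
    # Turn tagged dictionaries back into struct_time objects.
    if json_object.get('__class__') == 'time.asctime':
        return time.strptime(json_object['__value__'])
    return json_object

entry = {'title': 'Dive into history, 2009 edition', 'published_date': published}
text = json.dumps(tag_times(entry))
restored = json.loads(text, object_hook=from_json)
print(restored['published_date'] == published)
```

This sketch only tags top-level values; nested structures would need a recursive version of the helper.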