# Data Serialisation

Data serialisation is the process of flattening complex data structures into a format that can be easily stored, transferred, or shared with another program. Python provides several different ways to do this.

## Serialisation with JSON
The built-in *json* module provides an interface for converting Python objects into JavaScript Object Notation (JSON) format. 

In [1]:
import json

First, a simple example of serialising a dictionary to disk as JSON:

In [2]:
capitals = {"Ireland":"Dublin", "France":"Paris", "Spain":"Madrid", "Italy": "Rome"} 

We open a file for text writing:

In [3]:
fout = open("example1.json", "w")

We can use the *json.dump()* function to write directly to a file:

In [4]:
json.dump(capitals, fout)
fout.close()
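The open, dump and close steps can also be written with a *with* statement, which closes the file automatically, even if an error occurs while writing. The following is a minimal sketch rather than the only way to do it; the file name `example1_with.json` and the temporary directory are assumptions chosen so that no existing file is overwritten.

```python
import json
import os
import tempfile

capitals = {"Ireland": "Dublin", "France": "Paris", "Spain": "Madrid", "Italy": "Rome"}

# Assumed location: a fresh temporary directory, so nothing is overwritten
out_path = os.path.join(tempfile.mkdtemp(), "example1_with.json")

# The with statement closes the file when the block ends, even on error
with open(out_path, "w", encoding="utf-8") as fout:
    # indent=2 pretty-prints the JSON; omit it for the compact form
    json.dump(capitals, fout, indent=2)

# The file object is already closed at this point
print(fout.closed)
```

The optional `indent` argument only changes the layout of the text in the file, not the data that is stored.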

To deserialise, we open the file for text reading:

In [5]:
fin = open("example1.json","r")

Then use the *json.load()* function to reconstruct a copy of the original dictionary:

In [6]:
values = json.load(fin)
fin.close()

The dictionary has the same keys and values as the one we originally created. Since Python 3.7, dictionaries preserve insertion order, and *json.load()* adds keys in the order they appear in the file, so the pairs also appear in the same order.

In [7]:
values

{'Ireland': 'Dublin', 'France': 'Paris', 'Spain': 'Madrid', 'Italy': 'Rome'}
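The round trip is exact here because the dictionary contains only strings, which JSON represents directly. JSON has fewer types than Python, so some values come back in a different form. As a small illustration (the `original` dictionary below is made up for this example), an integer key becomes a string and a tuple becomes a list:

```python
import json

# Hypothetical data with an integer key and a tuple value
original = {1: "one", "point": (3, 4)}

# Round trip through a JSON string held in memory
restored = json.loads(json.dumps(original))

print(restored)
# {'1': 'one', 'point': [3, 4]}
```

For data that uses only strings, numbers, booleans, lists and string-keyed dictionaries, this difference does not arise.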

## Serialisation with Pickle
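The built-in *pickle* module is Python's native mechanism for serialising objects. Unlike JSON, it writes a Python-specific binary format, so the file must be opened in binary mode. Only load pickle files from a trusted source, because unpickling can execute arbitrary code.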

In [8]:
import pickle

Again, we start with a simple example of serialising a dictionary to disk, this time with Pickle.

In [9]:
capitals = {"Ireland":"Dublin", "France":"Paris", "Spain":"Madrid", "Italy": "Rome"} 
capitals

{'Ireland': 'Dublin', 'France': 'Paris', 'Spain': 'Madrid', 'Italy': 'Rome'}

We open a file for binary writing ("wb"):

In [10]:
fout = open("example1.pkl", "wb")

The simplest usage of Pickle involves calling the *pickle.dump()* function, which writes the serialised object to the open file.

In [11]:
pickle.dump(capitals, fout)
fout.close()

To deserialise, we first open the file for binary reading ("rb"):

In [12]:
fin = open("example1.pkl", "rb")

We then reconstruct the original dictionary using *pickle.load()*:

In [13]:
values = pickle.load(fin)
fin.close()

We can see that Pickle has reconstructed the original dictionary:

In [14]:
values

{'Ireland': 'Dublin', 'France': 'Paris', 'Spain': 'Madrid', 'Italy': 'Rome'}
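The same round trip can be written more compactly with *with* statements, and Pickle can also serialise to a bytes object in memory using *pickle.dumps()* and *pickle.loads()*. This is a sketch under assumptions: the file name `example1_with.pkl` and the temporary directory are chosen only to avoid overwriting the file above.

```python
import os
import pickle
import tempfile

capitals = {"Ireland": "Dublin", "France": "Paris", "Spain": "Madrid", "Italy": "Rome"}

# Assumed location: a fresh temporary directory
pkl_path = os.path.join(tempfile.mkdtemp(), "example1_with.pkl")

# Write in binary mode; the file is closed when the block ends
with open(pkl_path, "wb") as fout:
    pickle.dump(capitals, fout)

# Read it back in binary mode
with open(pkl_path, "rb") as fin:
    values = pickle.load(fin)

# In-memory variant: dumps() returns bytes, loads() rebuilds the object
payload = pickle.dumps(capitals)
copy = pickle.loads(payload)

print(values == capitals, type(payload).__name__)
```

The in-memory form is convenient when the serialised data is to be sent somewhere other than a file, for example over a network connection.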

We can apply the same process for more complex, nested data structures.

In [15]:
cities = ["London", "Paris", "Madrid"]
teams = ["Chelsea", "PSG", "Real Madrid"]
data = [cities, teams]
fout = open("example2.pkl", "wb")
pickle.dump(data, fout)
fout.close()

In [16]:
import pickle
fin = open("example2.pkl", "rb")
backup = pickle.load(fin)
fin.close()

In [17]:
backup

[['London', 'Paris', 'Madrid'], ['Chelsea', 'PSG', 'Real Madrid']]

In [18]:
backup[0]

['London', 'Paris', 'Madrid']
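Pickle also keeps Python-specific container types intact, which is one reason to prefer it over JSON for purely Python workflows. The sketch below uses a made-up `data` dictionary holding a tuple and a set; Pickle restores both types, while *json.dumps()* raises a `TypeError` for the set.

```python
import json
import pickle

# Hypothetical nested data using a tuple and a set
data = {"cities": ("London", "Paris", "Madrid"), "teams": {"Chelsea", "PSG"}}

# Pickle round trip in memory: the tuple and the set are preserved
backup = pickle.loads(pickle.dumps(data))

# JSON has no set type, so serialising this structure fails
try:
    json.dumps(data)
    json_ok = True
except TypeError:
    json_ok = False

print(type(backup["cities"]).__name__, type(backup["teams"]).__name__, json_ok)
# tuple set False
```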

A further example uses a list of dictionaries whose values mix strings, lists, integers and floats.

In [19]:
book1 = {
    "id" : "978-1933988177",
    "category" : ["book","paperback"],
    "name" : "Lucene in Action",
    "author" : "Michael McCandless",
    "genre" : "technology",
    "price" : 30.50,
    "pages" : 475 }

book2 = {
    "id" : "978-1857995879",
    "category" : ["book","paperback"],
    "name" : "Sophie's World",
    "author" : "Jostein Gaarder",
    "sequence_i" : 1,
    "genre" : "fiction",
    "price" : 3.07,
    "pages" : 64 }

In [20]:
books = [ book1, book2 ]
books

[{'id': '978-1933988177',
  'category': ['book', 'paperback'],
  'name': 'Lucene in Action',
  'author': 'Michael McCandless',
  'genre': 'technology',
  'price': 30.5,
  'pages': 475},
 {'id': '978-1857995879',
  'category': ['book', 'paperback'],
  'name': "Sophie's World",
  'author': 'Jostein Gaarder',
  'sequence_i': 1,
  'genre': 'fiction',
  'price': 3.07,
  'pages': 64}]

In [21]:
# Use Pickle to write this out to disk
fout = open("example3.pkl", "wb")
pickle.dump(books, fout)
fout.close()

Again, Pickle can reconstruct the original data structure from the serialised version on disk:

In [22]:
fin = open("example3.pkl", "rb")
backup = pickle.load(fin)
fin.close()

In [23]:
backup

[{'id': '978-1933988177',
  'category': ['book', 'paperback'],
  'name': 'Lucene in Action',
  'author': 'Michael McCandless',
  'genre': 'technology',
  'price': 30.5,
  'pages': 475},
 {'id': '978-1857995879',
  'category': ['book', 'paperback'],
  'name': "Sophie's World",
  'author': 'Jostein Gaarder',
  'sequence_i': 1,
  'genre': 'fiction',
  'price': 3.07,
  'pages': 64}]

In [24]:
backup[0]

{'id': '978-1933988177',
 'category': ['book', 'paperback'],
 'name': 'Lucene in Action',
 'author': 'Michael McCandless',
 'genre': 'technology',
 'price': 30.5,
 'pages': 475}
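Because this structure contains only lists, dictionaries, strings and numbers, it could equally be stored as JSON, which is human-readable and can be read from other languages. A brief sketch, using an abbreviated copy of the two book records (some fields are left out to keep it short):

```python
import json

# Abbreviated copies of the book records, for illustration only
books = [
    {"id": "978-1933988177", "category": ["book", "paperback"],
     "name": "Lucene in Action", "price": 30.50, "pages": 475},
    {"id": "978-1857995879", "category": ["book", "paperback"],
     "name": "Sophie's World", "price": 3.07, "pages": 64},
]

# Serialise to a pretty-printed JSON string, then parse it back
text = json.dumps(books, indent=2)
restored = json.loads(text)

print(restored == books)
# True
```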