# Serialization (in Python)

Graeme Watt, IPPP Computing Lunch Club, Monday 25th June 2018

[`https://github.com/GraemeWatt/IPPPComputingLunchClub`](https://github.com/GraemeWatt/IPPPComputingLunchClub)

Interact with this notebook via [Binder](https://mybinder.org/v2/gh/GraemeWatt/IPPPComputingLunchClub/master?filepath=notebooks).

## Introduction

**Aim:** translate data structures into a format that can be *stored* or *transmitted* and *reconstructed* later.

Wikipedia: [Comparison of data serialization formats](https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats)

| Name | Standardized? | Binary? | Human-readable? |
|------|---------------|---------|-----------------|
| [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) | Partial | No | Yes |
| [XML](https://www.w3.org/TR/REC-xml/) | Yes | Partial | Yes |
| [JSON](http://json.org/) | Yes | No | Yes |
| [Pickle (Python)](https://docs.python.org/3/library/pickle.html) | Yes | Yes | No |
| [YAML](http://yaml.org/) | Yes | No | Yes |

Represent data structures with three basic primitives: *mappings* (hashes/dictionaries), *sequences* (arrays/lists) and *scalars* (strings/numbers).

In [1]:
from __future__ import print_function 

## Sample data

Define some sample data taken from the [YUM Burgers menu](https://www.yumfood.org.uk/resources/yum/locations/palatine/218262_PalatineDigitalFri_WK1.png) served at the Palatine Café at Durham University.

In [2]:
meals = [['Main + all sides', 6.95], ['Main + 1 side', 4.95], ['Vegetarian Main + all sides', 6.45], ['Vegetarian Main + 1 side', 4.75]]

In [3]:
burgers = [{'type': 'Angus Beef', 'vegetarian': False}, {'type': 'Angus Cheese', 'vegetarian': False}, {'type': 'Lamb & Feta', 'vegetarian': False}, {'type': 'Cajun Chicken', 'vegetarian': False}, {'type': 'Falafel', 'vegetarian': True}]

In [4]:
sides = ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob']

In [5]:
sauces = ['Ketchup', 'Mustard', 'Mayo']

## Comma-separated values (CSV)

In [6]:
import csv

Write the `meals` list to a CSV file, with a header as the first line.

In [7]:
with open('meals.csv', 'wt') as fout:
    csvout = csv.writer(fout)
    csvout.writerow(['Meal', 'Price [GBP]'])
    csvout.writerows(meals)

Read the CSV file and print out each row.

In [8]:
with open('meals.csv', 'rt') as fin:
    csvin = csv.reader(fin)
    for row in csvin:
        print(row)

['Meal', 'Price [GBP]']
['Main + all sides', '6.95']
['Main + 1 side', '4.95']
['Vegetarian Main + all sides', '6.45']
['Vegetarian Main + 1 side', '4.75']


Read each row of the CSV file into a dict whose keys are given by the header.

In [9]:
with open('meals.csv', 'rt') as fin:
    csvin = csv.DictReader(fin)
    for row in csvin:
        print(row['Meal'], 'costs', row['Price [GBP]'])

Main + all sides costs 6.95
Main + 1 side costs 4.95
Vegetarian Main + all sides costs 6.45
Vegetarian Main + 1 side costs 4.75
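
Note that the `csv` module returns every field as a string (see the quoted prices in the earlier output), so numeric columns have to be converted explicitly. A minimal sketch, using an in-memory copy of the first rows rather than `meals.csv` so that it runs on its own (the variable names `text` and `prices` are only for illustration):

```python
import csv
from io import StringIO

# In-memory stand-in for meals.csv (same header, first two rows only).
text = 'Meal,Price [GBP]\r\nMain + all sides,6.95\r\nMain + 1 side,4.95\r\n'

with StringIO(text, newline='') as fin:
    # Convert the price column from str to float while reading.
    prices = {row['Meal']: float(row['Price [GBP]']) for row in csv.DictReader(fin)}

print(prices)
```

The same conversion would apply to a file opened with `open('meals.csv', 'rt', newline='')`.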


The field separator need not be a comma.  For example, [YODA](http://yoda.hepforge.org) uses a tab-separated values (TSV) format.

In [10]:
import requests
yodafile = requests.get('https://hepdata.net/record/ins1632756?table=Table1&version=1&format=yoda').text

In [11]:
print(yodafile)

BEGIN YODA_SCATTER2D_V2 /REF/ATLAS_2017_I1632756/d01-x01-y01
Variations: [""]
ErrorBreakdown: {0: {stat: {dn: -80, up: 70}, sys: {dn: -140, up: 140}}, 1: {stat: {dn: -60, up: 60}, sys: {dn: -80, up: 80}}}

IsRef: 1
Path: /REF/ATLAS_2017_I1632756/d01-x01-y01
Title: doi:10.17182/hepdata.79163.v1/t1
Type: Scatter2D
---
# xval	 xerr-	 xerr+	 yval	 yerr-	 yerr+	
6.850000e-01	6.850000e-01	6.850000e-01	7.700000e+02	1.612452e+02	1.565248e+02
1.965000e+00	4.050000e-01	4.050000e-01	2.200000e+02	1.000000e+02	1.000000e+02
END YODA_SCATTER2D_V2
BEGIN YODA_SCATTER2D_V2 /REF/ATLAS_2017_I1632756/d01-x01-y02
Variations: [""]
ErrorBreakdown: {0: {stat: {dn: -300, up: 300}, sys: {dn: -800, up: 800}}, 1: {stat: {dn: -300, up: 200}, sys: {dn: -400, up: 400}}}

IsRef: 1
Path: /REF/ATLAS_2017_I1632756/d01-x01-y02
Title: doi:10.17182/hepdata.79163.v1/t1
Type: Scatter2D
---
# xval	 xerr-	 xerr+	 yval	 yerr-	 yerr+	
6.850000e-01	6.850000e-01	6.850000e-01	2.500000e+03	8.544004e+02	8.544004e+02
1.965000e+00	4.050

In [12]:
from io import StringIO
reader = csv.reader(StringIO(yodafile), delimiter='\t')
for row in reader:
    if len(row) < 2:
        continue
    elif row[0].startswith('END'):
        break
    else:
        print(row)

['# xval', ' xerr-', ' xerr+', ' yval', ' yerr-', ' yerr+', '']
['6.850000e-01', '6.850000e-01', '6.850000e-01', '7.700000e+02', '1.612452e+02', '1.565248e+02']
['1.965000e+00', '4.050000e-01', '4.050000e-01', '2.200000e+02', '1.000000e+02', '1.000000e+02']
['# xval', ' xerr-', ' xerr+', ' yval', ' yerr-', ' yerr+', '']
['6.850000e-01', '6.850000e-01', '6.850000e-01', '2.500000e+03', '8.544004e+02', '8.544004e+02']
['1.965000e+00', '4.050000e-01', '4.050000e-01', '1.200000e+03', '5.000000e+02', '4.472136e+02']


## XML (eXtensible Markup Language)

XML defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.

In [13]:
import xml.etree.ElementTree as ET

Each element has a *tag*, a number of *attributes*, a *text* string, and a number of *child* elements.

Write an XML file.

In [14]:
root = ET.Element('menu')

for meal in meals:
    ET.SubElement(root, 'meal', price=str(meal[1])).text = meal[0]

for burger in burgers:
    ET.SubElement(root, 'burger', vegetarian=str(burger['vegetarian'])).text = burger['type']
    
for side in sides:
    ET.SubElement(root, 'side').text = side
    
for sauce in sauces:
    ET.SubElement(root, 'sauce').text = sauce
    
tree = ET.ElementTree(root)
tree.write('menu.xml')
ET.dump(tree)

<menu><meal price="6.95">Main + all sides</meal><meal price="4.95">Main + 1 side</meal><meal price="6.45">Vegetarian Main + all sides</meal><meal price="4.75">Vegetarian Main + 1 side</meal><burger vegetarian="False">Angus Beef</burger><burger vegetarian="False">Angus Cheese</burger><burger vegetarian="False">Lamb &amp; Feta</burger><burger vegetarian="False">Cajun Chicken</burger><burger vegetarian="True">Falafel</burger><side>Homemade Coleslaw</side><side>French Fries</side><side>Side Salad</side><side>Corn on the Cob</side><sauce>Ketchup</sauce><sauce>Mustard</sauce><sauce>Mayo</sauce></menu>
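
`tree.write` and `ET.dump` emit the whole document on a single line. For a more readable layout, one option (a sketch, not part of the original talk) is to re-parse the serialized bytes with the standard-library `xml.dom.minidom` and call `toprettyxml`. The small tree below is a cut-down stand-in for the full menu.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

root = ET.Element('menu')
ET.SubElement(root, 'meal', price='4.95').text = 'Main + 1 side'
ET.SubElement(root, 'side').text = 'French Fries'

# ET.tostring returns bytes; minidom re-parses them and indents the result.
pretty = minidom.parseString(ET.tostring(root)).toprettyxml(indent='  ')
print(pretty)
```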


Read an XML file.

In [15]:
tree = ET.parse('menu.xml')
root = tree.getroot()

In [16]:
root.tag

'menu'

In [17]:
for child in root:
    print()
    print('{} = {}'.format(child.tag, child.text))
    for attrib in child.attrib:
        print('{} = {}'.format(attrib, child.attrib[attrib]))


meal = Main + all sides
price = 6.95

meal = Main + 1 side
price = 4.95

meal = Vegetarian Main + all sides
price = 6.45

meal = Vegetarian Main + 1 side
price = 4.75

burger = Angus Beef
vegetarian = False

burger = Angus Cheese
vegetarian = False

burger = Lamb & Feta
vegetarian = False

burger = Cajun Chicken
vegetarian = False

burger = Falafel
vegetarian = True

side = Homemade Coleslaw

side = French Fries

side = Side Salad

side = Corn on the Cob

sauce = Ketchup

sauce = Mustard

sauce = Mayo
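
Attribute values and element text always come back as strings, so an attribute such as `vegetarian="True"` has to be compared or converted by hand. The sketch below parses a shortened copy of the menu from a string with `ET.fromstring` and uses `findall` to select child elements by tag; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

xml_text = ('<menu>'
            '<meal price="6.95">Main + all sides</meal>'
            '<burger vegetarian="False">Angus Beef</burger>'
            '<burger vegetarian="True">Falafel</burger>'
            '</menu>')
root = ET.fromstring(xml_text)

# findall matches direct children with the given tag.
veggie = [b.text for b in root.findall('burger') if b.get('vegetarian') == 'True']
prices = [float(m.get('price')) for m in root.findall('meal')]

print(veggie)
print(prices)
```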


## JSON (JavaScript Object Notation)

JSON has gained popularity over XML in recent years.

In [18]:
import json

Note the similarity of the Python and JSON representations.

In [19]:
menu = {'meals': meals, 'burgers': burgers, 'sides': sides, 'sauces': sauces}
print('Python:', menu)
print()
print('JSON:', json.dumps(menu))

Python: {'meals': [['Main + all sides', 6.95], ['Main + 1 side', 4.95], ['Vegetarian Main + all sides', 6.45], ['Vegetarian Main + 1 side', 4.75]], 'burgers': [{'type': 'Angus Beef', 'vegetarian': False}, {'type': 'Angus Cheese', 'vegetarian': False}, {'type': 'Lamb & Feta', 'vegetarian': False}, {'type': 'Cajun Chicken', 'vegetarian': False}, {'type': 'Falafel', 'vegetarian': True}], 'sides': ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob'], 'sauces': ['Ketchup', 'Mustard', 'Mayo']}

JSON: {"meals": [["Main + all sides", 6.95], ["Main + 1 side", 4.95], ["Vegetarian Main + all sides", 6.45], ["Vegetarian Main + 1 side", 4.75]], "burgers": [{"type": "Angus Beef", "vegetarian": false}, {"type": "Angus Cheese", "vegetarian": false}, {"type": "Lamb & Feta", "vegetarian": false}, {"type": "Cajun Chicken", "vegetarian": false}, {"type": "Falafel", "vegetarian": true}], "sides": ["Homemade Coleslaw", "French Fries", "Side Salad", "Corn on the Cob"], "sauces": ["Ketchup",

Write a JSON file.

In [20]:
with open('menu.json', 'wt') as fout:
    json.dump(menu, fout)

Read a JSON file.

In [21]:
with open('menu.json', 'rt') as fin:
    menu1 = json.load(fin)

In [22]:
print(menu1 == menu)

True


In [23]:
print('meals = {}'.format(menu1['meals']))
print('burgers = {}'.format(menu1['burgers']))
print('sides = {}'.format(menu1['sides']))
print('sauces = {}'.format(menu1['sauces']))

meals = [['Main + all sides', 6.95], ['Main + 1 side', 4.95], ['Vegetarian Main + all sides', 6.45], ['Vegetarian Main + 1 side', 4.75]]
burgers = [{'type': 'Angus Beef', 'vegetarian': False}, {'type': 'Angus Cheese', 'vegetarian': False}, {'type': 'Lamb & Feta', 'vegetarian': False}, {'type': 'Cajun Chicken', 'vegetarian': False}, {'type': 'Falafel', 'vegetarian': True}]
sides = ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob']
sauces = ['Ketchup', 'Mustard', 'Mayo']
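
`json.dumps` and `json.dump` accept `indent` and `sort_keys` arguments, which make the output easier to read and to compare at the cost of a larger file. A short sketch with a reduced menu (the name `small_menu` is made up for this example):

```python
import json

small_menu = {'sides': ['French Fries', 'Side Salad'], 'meals': [['Main + 1 side', 4.95]]}

# indent adds line breaks and indentation; sort_keys orders the keys alphabetically.
pretty = json.dumps(small_menu, indent=2, sort_keys=True)
print(pretty)

# The layout does not affect the data: it round-trips unchanged.
roundtrip = json.loads(pretty)
```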


### Aside: JSON Schema

[JSON Schema](http://json-schema.org/) is a vocabulary that allows you to annotate and validate JSON documents (example: [hepdata-validator schemas](https://github.com/HEPData/hepdata-validator/tree/master/hepdata_validator/schemas)).

Here is a schema that defines the `menu` data structure.

In [24]:
schema = {
    "properties": {
        "meals": {
            "type": "array",
            "description": "List of meals",
            "items": {
                "type": "array",
                "items": [
                    {"type": "string", "description": "Meal"},
                    {"type": "number", "description": "Price [GBP]", "exclusiveMinimum": 0}
                ],
                "additionalItems": False
            }
        },
        "burgers": {
            "type": "array",
            "description": "List of burgers",
            "items": {
                "properties": {
                    "type": {
                        "type": "string",
                        "description": "Type of burger"
                    },
                    "vegetarian": {
                        "type": "boolean",
                        "description": "Suitable for vegetarians"
                    }
                },
                "required": ["type", "vegetarian"]
            }
        },
        "sides": {
            "type": "array",
            "description": "List of sides",
            "items": {"type": "string"}
        },
        "sauces": {
            "type": "array",
            "description": "List of sauces",
            "items": {"type": "string"}
        }
    },
    "required": ["meals", "burgers", "sides"]
}

In [25]:
with open('schema.json', 'wt') as fout:
    json.dump(schema, fout)

In [26]:
with open('schema.json', 'rt') as fin:
    schema1 = json.load(fin)

In [27]:
print(schema1)

{'properties': {'meals': {'type': 'array', 'description': 'List of meals', 'items': {'type': 'array', 'items': [{'type': 'string', 'description': 'Meal'}, {'type': 'number', 'description': 'Price [GBP]', 'exclusiveMinimum': 0}], 'additionalItems': False}}, 'burgers': {'type': 'array', 'description': 'List of burgers', 'items': {'properties': {'type': {'type': 'string', 'description': 'Type of burger'}, 'vegetarian': {'type': 'boolean', 'description': 'Suitable for vegetarians'}}, 'required': ['type', 'vegetarian']}}, 'sides': {'type': 'array', 'description': 'List of sides', 'items': {'type': 'string'}}, 'sauces': {'type': 'array', 'description': 'List of sauces', 'items': {'type': 'string'}}}, 'required': ['meals', 'burgers', 'sides']}


In [28]:
from jsonschema import validate

In [29]:
validate(menu1, schema)

In [30]:
print(menu1)

{'meals': [['Main + all sides', 6.95], ['Main + 1 side', 4.95], ['Vegetarian Main + all sides', 6.45], ['Vegetarian Main + 1 side', 4.75]], 'burgers': [{'type': 'Angus Beef', 'vegetarian': False}, {'type': 'Angus Cheese', 'vegetarian': False}, {'type': 'Lamb & Feta', 'vegetarian': False}, {'type': 'Cajun Chicken', 'vegetarian': False}, {'type': 'Falafel', 'vegetarian': True}], 'sides': ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob'], 'sauces': ['Ketchup', 'Mustard', 'Mayo']}


In [31]:
menu1['meals'][0][1] = 0
print(menu1)
#validate(menu1, schema) # gives a ValidationError

{'meals': [['Main + all sides', 0], ['Main + 1 side', 4.95], ['Vegetarian Main + all sides', 6.45], ['Vegetarian Main + 1 side', 4.75]], 'burgers': [{'type': 'Angus Beef', 'vegetarian': False}, {'type': 'Angus Cheese', 'vegetarian': False}, {'type': 'Lamb & Feta', 'vegetarian': False}, {'type': 'Cajun Chicken', 'vegetarian': False}, {'type': 'Falafel', 'vegetarian': True}], 'sides': ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob'], 'sauces': ['Ketchup', 'Mustard', 'Mayo']}


In [32]:
menu1['meals'][0][1] = 6.95
print(menu1)
validate(menu1, schema)

{'meals': [['Main + all sides', 6.95], ['Main + 1 side', 4.95], ['Vegetarian Main + all sides', 6.45], ['Vegetarian Main + 1 side', 4.75]], 'burgers': [{'type': 'Angus Beef', 'vegetarian': False}, {'type': 'Angus Cheese', 'vegetarian': False}, {'type': 'Lamb & Feta', 'vegetarian': False}, {'type': 'Cajun Chicken', 'vegetarian': False}, {'type': 'Falafel', 'vegetarian': True}], 'sides': ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob'], 'sauces': ['Ketchup', 'Mustard', 'Mayo']}


## Pickle (Python)

The pickle module implements binary protocols for serializing and de-serializing a Python object structure.

JSON, by default, can only represent a subset of the Python built-in types, and no custom classes.

In [33]:
import datetime
now = datetime.datetime.utcnow()
now

datetime.datetime(2019, 2, 26, 13, 48, 56, 743523)

In [34]:
#json.dumps({'time': now}) # gives a TypeError
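
One common way around this `TypeError` is to pass a `default` function to `json.dumps`, which is called for any object the encoder cannot handle. The sketch below converts `datetime` objects to ISO 8601 strings; the helper name `encode_extra` is made up for this example, and the reader has to convert the string back by hand because JSON has no date type.

```python
import datetime
import json

def encode_extra(obj):
    """Called by json.dumps for objects it cannot serialize natively."""
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()
    raise TypeError('Cannot serialize {!r}'.format(type(obj)))

stamp = datetime.datetime(2019, 2, 26, 13, 48, 56, 743523)
text = json.dumps({'time': stamp}, default=encode_extra)
print(text)

# Decoding gives back a plain string, which is parsed explicitly here.
restored = datetime.datetime.strptime(json.loads(text)['time'], '%Y-%m-%dT%H:%M:%S.%f')
```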

In [35]:
class Meal():
    def __init__(self, burger, sides, sauces):
        self.burger = burger
        self.sides = sides
        self.sauces = sauces
    @property
    def vegetarian(self):
        for burger in burgers:
            if burger['type'] == self.burger:
                return burger['vegetarian']
    @property
    def price(self):
        if self.vegetarian:
            return 4.75 if len(self.sides) <= 1 else 6.45
        else:
            return 4.95 if len(self.sides) <= 1 else 6.95

In [36]:
meal1 = Meal('Angus Beef', ['French Fries'], ['Ketchup', 'Mustard'])
meal2 = Meal('Falafel', ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob'], ['Mayo'])

In [37]:
print(meal1.burger, meal1.sides, meal1.sauces, meal1.vegetarian, meal1.price)
print(meal2.burger, meal2.sides, meal2.sauces, meal2.vegetarian, meal2.price)

Angus Beef ['French Fries'] ['Ketchup', 'Mustard'] False 4.95
Falafel ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob'] ['Mayo'] True 6.45


In [38]:
#json.dumps({'meal1': meal1, 'meal2': meal2}) # gives a TypeError

In [39]:
import pickle

In [40]:
print(menu)

{'meals': [['Main + all sides', 6.95], ['Main + 1 side', 4.95], ['Vegetarian Main + all sides', 6.45], ['Vegetarian Main + 1 side', 4.75]], 'burgers': [{'type': 'Angus Beef', 'vegetarian': False}, {'type': 'Angus Cheese', 'vegetarian': False}, {'type': 'Lamb & Feta', 'vegetarian': False}, {'type': 'Cajun Chicken', 'vegetarian': False}, {'type': 'Falafel', 'vegetarian': True}], 'sides': ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob'], 'sauces': ['Ketchup', 'Mustard', 'Mayo']}


In [41]:
from copy import deepcopy
menu_plus = deepcopy(menu)
menu_plus.update({'now': now, 'meal1': meal1, 'meal2': meal2})
print('Python:', menu_plus)

Python: {'meals': [['Main + all sides', 6.95], ['Main + 1 side', 4.95], ['Vegetarian Main + all sides', 6.45], ['Vegetarian Main + 1 side', 4.75]], 'burgers': [{'type': 'Angus Beef', 'vegetarian': False}, {'type': 'Angus Cheese', 'vegetarian': False}, {'type': 'Lamb & Feta', 'vegetarian': False}, {'type': 'Cajun Chicken', 'vegetarian': False}, {'type': 'Falafel', 'vegetarian': True}], 'sides': ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob'], 'sauces': ['Ketchup', 'Mustard', 'Mayo'], 'now': datetime.datetime(2019, 2, 26, 13, 48, 56, 743523), 'meal1': <__main__.Meal object at 0x7f00b6fa5390>, 'meal2': <__main__.Meal object at 0x7f00b6c82080>}


In [42]:
pickled_menu_plus = pickle.dumps(menu_plus)
print('Pickle:', pickled_menu_plus)

Pickle: b'\x80\x03}q\x00(X\x05\x00\x00\x00mealsq\x01]q\x02(]q\x03(X\x10\x00\x00\x00Main + all sidesq\x04G@\x1b\xcc\xcc\xcc\xcc\xcc\xcde]q\x05(X\r\x00\x00\x00Main + 1 sideq\x06G@\x13\xcc\xcc\xcc\xcc\xcc\xcde]q\x07(X\x1b\x00\x00\x00Vegetarian Main + all sidesq\x08G@\x19\xcc\xcc\xcc\xcc\xcc\xcde]q\t(X\x18\x00\x00\x00Vegetarian Main + 1 sideq\nG@\x13\x00\x00\x00\x00\x00\x00eeX\x07\x00\x00\x00burgersq\x0b]q\x0c(}q\r(X\x04\x00\x00\x00typeq\x0eX\n\x00\x00\x00Angus Beefq\x0fX\n\x00\x00\x00vegetarianq\x10\x89u}q\x11(h\x0eX\x0c\x00\x00\x00Angus Cheeseq\x12h\x10\x89u}q\x13(h\x0eX\x0b\x00\x00\x00Lamb & Fetaq\x14h\x10\x89u}q\x15(h\x0eX\r\x00\x00\x00Cajun Chickenq\x16h\x10\x89u}q\x17(h\x0eX\x07\x00\x00\x00Falafelq\x18h\x10\x88ueX\x05\x00\x00\x00sidesq\x19]q\x1a(X\x11\x00\x00\x00Homemade Coleslawq\x1bX\x0c\x00\x00\x00French Friesq\x1cX\n\x00\x00\x00Side Saladq\x1dX\x0f\x00\x00\x00Corn on the Cobq\x1eeX\x06\x00\x00\x00saucesq\x1f]q (X\x07\x00\x00\x00Ketchupq!X\x07\x00\x00\x00Mustardq"X\x04\x00\x00\x00

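Write the pickled `menu_plus` to a file.  (Note that Pickle stores only the instances, together with a reference to the `Meal` class by name, and not the class definition itself, so `Meal` must be defined or importable when the file is read back.)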

In [43]:
with open('menu_plus.pickle', 'wb') as fout:
    pickle.dump(menu_plus, fout)

Read the pickled `menu_plus` from a file.

In [44]:
with open('menu_plus.pickle', 'rb') as fin:
    menu_plus1 = pickle.load(fin)

In [45]:
print(menu_plus1)

{'meals': [['Main + all sides', 6.95], ['Main + 1 side', 4.95], ['Vegetarian Main + all sides', 6.45], ['Vegetarian Main + 1 side', 4.75]], 'burgers': [{'type': 'Angus Beef', 'vegetarian': False}, {'type': 'Angus Cheese', 'vegetarian': False}, {'type': 'Lamb & Feta', 'vegetarian': False}, {'type': 'Cajun Chicken', 'vegetarian': False}, {'type': 'Falafel', 'vegetarian': True}], 'sides': ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob'], 'sauces': ['Ketchup', 'Mustard', 'Mayo'], 'now': datetime.datetime(2019, 2, 26, 13, 48, 56, 743523), 'meal1': <__main__.Meal object at 0x7f00b6c764a8>, 'meal2': <__main__.Meal object at 0x7f00b6c76828>}


In [46]:
print(menu_plus1['meal1'].burger, menu_plus1['meal1'].sides, menu_plus1['meal1'].sauces, menu_plus1['meal1'].vegetarian, menu_plus1['meal1'].price)
print(menu_plus1['meal2'].burger, menu_plus1['meal2'].sides, menu_plus1['meal2'].sauces, menu_plus1['meal2'].vegetarian, menu_plus1['meal2'].price)

Angus Beef ['French Fries'] ['Ketchup', 'Mustard'] False 4.95
Falafel ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob'] ['Mayo'] True 6.45


Another example, taken from the Python documentation (https://docs.python.org/3/library/pickle.html#examples).

In [47]:
# An arbitrary collection of objects supported by pickle.
data = {
    'a': [1, 2.0, 3, 4+6j],
    'b': ("character string", b"byte string"),
    'c': {None, True, False}
}
print('Python:', data)
print()

pickled_data = pickle.dumps(data)
print('Pickle:', pickled_data)

Python: {'a': [1, 2.0, 3, (4+6j)], 'b': ('character string', b'byte string'), 'c': {False, None, True}}

Pickle: b'\x80\x03}q\x00(X\x01\x00\x00\x00aq\x01]q\x02(K\x01G@\x00\x00\x00\x00\x00\x00\x00K\x03cbuiltins\ncomplex\nq\x03G@\x10\x00\x00\x00\x00\x00\x00G@\x18\x00\x00\x00\x00\x00\x00\x86q\x04Rq\x05eX\x01\x00\x00\x00bq\x06X\x10\x00\x00\x00character stringq\x07C\x0bbyte stringq\x08\x86q\tX\x01\x00\x00\x00cq\ncbuiltins\nset\nq\x0b]q\x0c(\x89N\x88e\x85q\rRq\x0eu.'


In [48]:
with open('data.pickle', 'wb') as fout:
    pickle.dump(data, fout)

In [49]:
with open('data.pickle', 'rb') as fin:
    data1 = pickle.load(fin)

In [50]:
data1

{'a': [1, 2.0, 3, (4+6j)],
 'b': ('character string', b'byte string'),
 'c': {False, None, True}}
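
Unlike JSON, Pickle preserves Python-specific types. As an illustration (a sketch with a small `record` dictionary invented for the purpose), a tuple survives a Pickle round trip but comes back from JSON as a list, and a set cannot be written to JSON at all:

```python
import json
import pickle

record = {'pair': ('character string', 1), 'flags': {True, None}}

# Pickle round trip: tuple and set come back with their original types.
unpickled = pickle.loads(pickle.dumps(record))
print(unpickled == record, type(unpickled['pair']).__name__)

# JSON turns the tuple into a list ...
from_json = json.loads(json.dumps({'pair': record['pair']}))
print(type(from_json['pair']).__name__)

# ... and refuses the set outright.
try:
    json.dumps({'flags': record['flags']})
    set_ok = True
except TypeError:
    set_ok = False
print(set_ok)
```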

### shelve

https://docs.python.org/3/library/shelve.html

A "shelf" is a persistent, dictionary-like object.

In [51]:
import shelve

In [52]:
shelf = shelve.open('shelf')
shelf['data'] = data
shelf['menu_plus'] = menu_plus
shelf.close()

In [53]:
shelf = shelve.open('shelf')
print(list(shelf.keys()))
data1 = shelf['data']
menu_plus1 = shelf['menu_plus']
shelf.close()

['data', 'menu_plus']


In [54]:
print(data1)
print()
print(menu_plus1)

{'a': [1, 2.0, 3, (4+6j)], 'b': ('character string', b'byte string'), 'c': {False, None, True}}

{'meals': [['Main + all sides', 6.95], ['Main + 1 side', 4.95], ['Vegetarian Main + all sides', 6.45], ['Vegetarian Main + 1 side', 4.75]], 'burgers': [{'type': 'Angus Beef', 'vegetarian': False}, {'type': 'Angus Cheese', 'vegetarian': False}, {'type': 'Lamb & Feta', 'vegetarian': False}, {'type': 'Cajun Chicken', 'vegetarian': False}, {'type': 'Falafel', 'vegetarian': True}], 'sides': ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob'], 'sauces': ['Ketchup', 'Mustard', 'Mayo'], 'now': datetime.datetime(2019, 2, 26, 13, 48, 56, 743523), 'meal1': <__main__.Meal object at 0x7f00b6c1bd68>, 'meal2': <__main__.Meal object at 0x7f00b6c1bdd8>}
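
A shelf can also be used as a context manager (Python 3.4 and later), which closes it automatically. The sketch below writes to a temporary directory so that it leaves no files behind; the key names are arbitrary.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'shelf')

    # Values are pickled behind the scenes; keys must be strings.
    with shelve.open(path) as shelf:
        shelf['sauces'] = ['Ketchup', 'Mustard', 'Mayo']
        shelf['price'] = 4.95

    # Reopen the shelf and copy its contents into an ordinary dict.
    with shelve.open(path) as shelf:
        contents = dict(shelf)

print(sorted(contents))
```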


## YAML (YAML Ain't Markup Language)

JSON is a subset of [YAML](http://yaml.org/) (v1.2).

YAML was designed to be more human-readable than JSON and to provide support for serializing arbitrary native data structures.  

The Python standard library does not include a YAML module, so you need to install a third-party library ([PyYAML](https://pyyaml.org/)).

In [55]:
!pip install PyYAML



In [56]:
import yaml

Dump the YAML representation of `menu_plus`.  Note that, unlike JSON, YAML can represent the `datetime` object and the instances of the `Meal` class.

In [57]:
print(yaml.dump(menu_plus))

burgers:
- {type: Angus Beef, vegetarian: false}
- {type: Angus Cheese, vegetarian: false}
- {type: Lamb & Feta, vegetarian: false}
- {type: Cajun Chicken, vegetarian: false}
- {type: Falafel, vegetarian: true}
meal1: !!python/object:__main__.Meal
  burger: Angus Beef
  sauces: [Ketchup, Mustard]
  sides: [French Fries]
meal2: !!python/object:__main__.Meal
  burger: Falafel
  sauces: [Mayo]
  sides: [Homemade Coleslaw, French Fries, Side Salad, Corn on the Cob]
meals:
- [Main + all sides, 6.95]
- [Main + 1 side, 4.95]
- [Vegetarian Main + all sides, 6.45]
- [Vegetarian Main + 1 side, 4.75]
now: 2019-02-26 13:48:56.743523
sauces: [Ketchup, Mustard, Mayo]
sides: [Homemade Coleslaw, French Fries, Side Salad, Corn on the Cob]



In [58]:
print(data)

{'a': [1, 2.0, 3, (4+6j)], 'b': ('character string', b'byte string'), 'c': {False, None, True}}


Dump YAML representation of `data`.

In [59]:
print(yaml.dump(data))

a: [1, 2.0, 3, !!python/complex '4.0+6.0j']
b: !!python/tuple
- character string
- !!binary |
  Ynl0ZSBzdHJpbmc=
c: !!set {false: null, null: null, true: null}



In [60]:
print(yaml.load(yaml.dump(data), Loader=yaml.Loader))  # Loader is mandatory from PyYAML 6.0

{'a': [1, 2.0, 3, (4+6j)], 'b': ('character string', b'byte string'), 'c': {False, None, True}}


Write a YAML file.

In [61]:
with open('menu_plus.yaml', 'wt') as fout:
    yaml.dump(menu_plus, fout)

Read a YAML file.

In [62]:
with open('menu_plus.yaml', 'rt') as fin:
    menu_plus1 = yaml.load(fin, Loader=yaml.Loader)  # Loader is mandatory from PyYAML 6.0

In [63]:
print(menu_plus1)

{'burgers': [{'type': 'Angus Beef', 'vegetarian': False}, {'type': 'Angus Cheese', 'vegetarian': False}, {'type': 'Lamb & Feta', 'vegetarian': False}, {'type': 'Cajun Chicken', 'vegetarian': False}, {'type': 'Falafel', 'vegetarian': True}], 'meal1': <__main__.Meal object at 0x7f00b6326dd8>, 'meal2': <__main__.Meal object at 0x7f00b6326cf8>, 'meals': [['Main + all sides', 6.95], ['Main + 1 side', 4.95], ['Vegetarian Main + all sides', 6.45], ['Vegetarian Main + 1 side', 4.75]], 'now': datetime.datetime(2019, 2, 26, 13, 48, 56, 743523), 'sauces': ['Ketchup', 'Mustard', 'Mayo'], 'sides': ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob']}


In [64]:
print('meals = {}'.format(menu_plus1['meals']))
print('burgers = {}'.format(menu_plus1['burgers']))
print('sides = {}'.format(menu_plus1['sides']))
print('sauces = {}'.format(menu_plus1['sauces']))

meals = [['Main + all sides', 6.95], ['Main + 1 side', 4.95], ['Vegetarian Main + all sides', 6.45], ['Vegetarian Main + 1 side', 4.75]]
burgers = [{'type': 'Angus Beef', 'vegetarian': False}, {'type': 'Angus Cheese', 'vegetarian': False}, {'type': 'Lamb & Feta', 'vegetarian': False}, {'type': 'Cajun Chicken', 'vegetarian': False}, {'type': 'Falafel', 'vegetarian': True}]
sides = ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob']
sauces = ['Ketchup', 'Mustard', 'Mayo']


As with Pickle, note that YAML stores only the instances, not the `Meal` class definition.

In [65]:
print(menu_plus1['meal1'].burger, menu_plus1['meal1'].sides, menu_plus1['meal1'].sauces, menu_plus1['meal1'].vegetarian, menu_plus1['meal1'].price)
print(menu_plus1['meal2'].burger, menu_plus1['meal2'].sides, menu_plus1['meal2'].sauces, menu_plus1['meal2'].vegetarian, menu_plus1['meal2'].price)

Angus Beef ['French Fries'] ['Ketchup', 'Mustard'] False 4.95
Falafel ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob'] ['Mayo'] True 6.45


YAML allows multiple documents separated by "`---`" and comments starting with "`#`".

In [66]:
documents = '# First document.\n' + yaml.dump(menu) + '---\n' + '# Second document.\n' + yaml.dump(data)
print(documents)

# First document.
burgers:
- {type: Angus Beef, vegetarian: false}
- {type: Angus Cheese, vegetarian: false}
- {type: Lamb & Feta, vegetarian: false}
- {type: Cajun Chicken, vegetarian: false}
- {type: Falafel, vegetarian: true}
meals:
- [Main + all sides, 6.95]
- [Main + 1 side, 4.95]
- [Vegetarian Main + all sides, 6.45]
- [Vegetarian Main + 1 side, 4.75]
sauces: [Ketchup, Mustard, Mayo]
sides: [Homemade Coleslaw, French Fries, Side Salad, Corn on the Cob]
---
# Second document.
a: [1, 2.0, 3, !!python/complex '4.0+6.0j']
b: !!python/tuple
- character string
- !!binary |
  Ynl0ZSBzdHJpbmc=
c: !!set {false: null, null: null, true: null}



Write multiple YAML documents with `dump_all` and read them with `load_all`.

In [67]:
with open('documents.yaml', 'wt') as fout:
    yaml.dump_all([menu, data], fout)

In [68]:
with open('documents.yaml', 'rt') as fin:
    documents = yaml.load_all(fin, Loader=yaml.Loader)  # Loader is mandatory from PyYAML 6.0
    for document in documents:
        print(document)
        print()

{'burgers': [{'type': 'Angus Beef', 'vegetarian': False}, {'type': 'Angus Cheese', 'vegetarian': False}, {'type': 'Lamb & Feta', 'vegetarian': False}, {'type': 'Cajun Chicken', 'vegetarian': False}, {'type': 'Falafel', 'vegetarian': True}], 'meals': [['Main + all sides', 6.95], ['Main + 1 side', 4.95], ['Vegetarian Main + all sides', 6.45], ['Vegetarian Main + 1 side', 4.75]], 'sauces': ['Ketchup', 'Mustard', 'Mayo'], 'sides': ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob']}

{'a': [1, 2.0, 3, (4+6j)], 'b': ('character string', b'byte string'), 'c': {False, None, True}}



[LibYAML](https://pyyaml.org/wiki/LibYAML) is a C library for parsing and emitting YAML.  The LibYAML bindings are faster than the pure Python version, but Pickle is even faster.

In [69]:
print('Loading using YAML Loader:')
%timeit with open('menu_plus.yaml', 'rt') as fin: menu_plus1 = yaml.load(fin, Loader=yaml.Loader)
print()
print('Loading using YAML CLoader:')
%timeit with open('menu_plus.yaml', 'rt') as fin: menu_plus1 = yaml.load(fin, Loader=yaml.CLoader)
print()
print('Loading using Pickle:')
%timeit with open('menu_plus.pickle', 'rb') as fin: menu_plus1 = pickle.load(fin)

Loading using YAML Loader:
9.3 ms ± 316 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Loading using YAML CLoader:
959 µs ± 79.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Loading using Pickle:
32.4 µs ± 3.1 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


Same timing tests for dumping.

In [70]:
print('Dumping using YAML Dumper:')
%timeit with open('menu_plus.yaml', 'wt') as fout: yaml.dump(menu_plus1, fout)
print()
print('Dumping using YAML CDumper:')
%timeit with open('menu_plus.yaml', 'wt') as fout: yaml.dump(menu_plus1, fout, Dumper=yaml.CDumper)
print()
print('Dumping using Pickle:')
%timeit with open('menu_plus.pickle', 'wb') as fout: pickle.dump(menu_plus1, fout)

Dumping using YAML Dumper:
6.19 ms ± 660 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Dumping using YAML CDumper:
1.54 ms ± 55.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Dumping using Pickle:
The slowest run took 4.13 times longer than the fastest. This could mean that an intermediate result is being cached.
1.04 ms ± 742 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Import YAML `CLoader` and `CDumper` if available, otherwise use pure Python.

In [71]:
try:
    from yaml import CLoader as Loader, CDumper as Dumper
    print('Using LibYAML based parser and emitter.')
except ImportError:
    from yaml import Loader, Dumper
    print('Using pure Python parser and emitter.')

Using LibYAML based parser and emitter.


In [72]:
with open('menu.yaml', 'wt') as fout:
    yaml.dump(menu1, fout, Dumper=Dumper)

In [73]:
with open('menu.yaml', 'rt') as fin:
    menu1 = yaml.load(fin, Loader=Loader)

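**Warning:** It is not safe to call `yaml.load` with the full `Loader` (or `pickle.load`) on data received from an untrusted source, because both can construct arbitrary Python objects and can therefore be made to run arbitrary code!  The function `yaml.safe_load` restricts loading to simple Python objects such as integers, strings, lists and dicts.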
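
For Pickle, the Python documentation describes restricting which globals may be loaded by overriding `Unpickler.find_class`. The sketch below takes the strictest form and refuses every global, so only plain containers and scalars can be loaded. The names `Sneaky`, `NoGlobalsUnpickler` and `restricted_loads` are hypothetical, and this is an illustration of the idea rather than a complete defence.

```python
import io
import pickle


class Sneaky:
    """Hypothetical object whose pickle asks the loader to call print()."""
    def __reduce__(self):
        return (print, ('this ran during unpickling',))


class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuse to resolve any global name (class or function)."""
    def find_class(self, module, name):
        raise pickle.UnpicklingError('global {}.{} is forbidden'.format(module, name))


def restricted_loads(payload):
    """Like pickle.loads, but only for data that needs no globals."""
    return NoGlobalsUnpickler(io.BytesIO(payload)).load()


# Plain dicts, lists, strings and floats need no globals, so they load fine.
plain = restricted_loads(pickle.dumps({'sides': ['French Fries'], 'price': 4.95}))

# The Sneaky payload refers to builtins.print and is rejected before it is called.
try:
    restricted_loads(pickle.dumps(Sneaky()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True

print(plain, blocked)
```

With an unrestricted `pickle.loads`, the same `Sneaky` payload would call `print` while loading; a hostile payload could name a more dangerous function instead.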

In [74]:
#with open('menu_plus.yaml', 'rt') as fin:
#    menu_plus1 = yaml.safe_load(fin) # gives a ConstructorError

In [75]:
#with open('menu_plus_safe.yaml', 'wt') as fout:
#    yaml.safe_dump(menu_plus, fout) # gives a RepresenterError

Import YAML `CSafeLoader` and `CSafeDumper` if available, otherwise use pure Python.

In [76]:
try:
    from yaml import CSafeLoader as Loader, CSafeDumper as Dumper
    print('Using LibYAML based (Safe) parser and emitter.')
except ImportError:
    from yaml import SafeLoader as Loader, SafeDumper as Dumper
    print('Using pure Python (Safe) parser and emitter.')

Using LibYAML based (Safe) parser and emitter.


Note the use of `yaml.load` (not `yaml.safe_load`) and `yaml.dump` (not `yaml.safe_dump`) when passing the "Safe" variants of the Loader and Dumper explicitly.

In [77]:
#with open('menu_plus.yaml', 'rt') as fin:
#    menu_plus1 = yaml.load(fin, Loader=Loader) # gives a ConstructorError

In [78]:
#with open('menu_plus_safe.yaml', 'wt') as fout:
#    yaml.dump(menu_plus1, fout, Dumper=Dumper) # gives a RepresenterError

In [79]:
with open('menu.yaml', 'wt') as fout:
    yaml.dump(menu, fout, Dumper=Dumper)

In [80]:
with open('menu.yaml', 'rt') as fin:
    menu1 = yaml.load(fin, Loader=Loader)
print(menu1)

{'burgers': [{'type': 'Angus Beef', 'vegetarian': False}, {'type': 'Angus Cheese', 'vegetarian': False}, {'type': 'Lamb & Feta', 'vegetarian': False}, {'type': 'Cajun Chicken', 'vegetarian': False}, {'type': 'Falafel', 'vegetarian': True}], 'meals': [['Main + all sides', 6.95], ['Main + 1 side', 4.95], ['Vegetarian Main + all sides', 6.45], ['Vegetarian Main + 1 side', 4.75]], 'sauces': ['Ketchup', 'Mustard', 'Mayo'], 'sides': ['Homemade Coleslaw', 'French Fries', 'Side Salad', 'Corn on the Cob']}


Repeat the timing tests, limiting them to simple Python objects, and add JSON to the comparison.  JSON is faster than YAML (with LibYAML bindings), but still slightly slower than Pickle.

In [81]:
print('Dumping using YAML SafeDumper:')
%timeit with open('menu.yaml', 'wt') as fout: yaml.dump(menu, fout, Dumper=yaml.SafeDumper)
print()
print('Dumping using YAML CSafeDumper:')
%timeit with open('menu.yaml', 'wt') as fout: yaml.dump(menu, fout, Dumper=yaml.CSafeDumper)
print()
print('Dumping using JSON:')
%timeit with open('menu.json', 'wt') as fout: json.dump(menu, fout)
print()
print('Dumping using Pickle:')
%timeit with open('menu.pickle', 'wb') as fout: pickle.dump(menu, fout)
print()

Dumping using YAML SafeDumper:
3.71 ms ± 667 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Dumping using YAML CSafeDumper:
1.51 ms ± 315 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Dumping using JSON:
1.2 ms ± 371 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Dumping using Pickle:
789 µs ± 103 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)



In [82]:
print('Loading using YAML SafeLoader:')
%timeit with open('menu.yaml', 'rt') as fin: menu1 = yaml.load(fin, Loader=yaml.SafeLoader)
print()
print('Loading using YAML CSafeLoader:')
%timeit with open('menu.yaml', 'rt') as fin: menu1 = yaml.load(fin, Loader=yaml.CSafeLoader)
print()
print('Loading using JSON:')
%timeit with open('menu.json', 'rt') as fin: menu1 = json.load(fin)
print()
print('Loading using Pickle:')
%timeit with open('menu.pickle', 'rb') as fin: menu1 = pickle.load(fin)

Loading using YAML SafeLoader:
5.3 ms ± 446 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Loading using YAML CSafeLoader:
555 µs ± 22.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Loading using JSON:
43.1 µs ± 2.81 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Loading using Pickle:
21.5 µs ± 1.28 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


## Summary

Simple data format comprising rows and columns. $\Rightarrow$ Use CSV.

Nested data formats comprising key/value pairs and lists. $\Rightarrow$ Use JSON or YAML.

More complicated Python objects and binary data. $\Rightarrow$ Use Pickle or YAML.