# Lecture 11


## Data Serialization

Data serialization is the process of converting data (usually in memory) that may have a complex structure (e.g. a tree) into a linear sequence that can be used to reconstitute the original data structure. Such a sequence can be stored in a file or transmitted over a network.

For example, consider the following "simple" data structure:

In [3]:
# Simple Data Type

data_dict = { "A": 1, 
              "B": "Foo"}

### Python `repr`

The Python `repr` of built-ins and of classes you implement can be used as a means of serialization. Take any Python built-in and you can see its string representation, which is essentially a string of Python code that evaluates to the object:

In [4]:
repr(data_dict)

"{'A': 1, 'B': 'Foo'}"

This representation can be easily written to a file:

In [5]:
with open('file.py',"w") as f: 
    f.write(repr(data_dict))

In [6]:
!cat file.py

{'A': 1, 'B': 'Foo'}

And reconstituted by evaluating the contents of the file:

In [7]:
with open('file.py', 'r') as f: 
    data_dict_reloaded = eval(f.read())

data_dict_reloaded

{'A': 1, 'B': 'Foo'}

Note that `eval` uses the python interpreter to execute python expressions stored in strings:

In [8]:
eval("print('Hello World')")

Hello World


In [9]:
x=eval("1+1")
x

2
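Because `eval` runs whatever expression the string contains, it is best reserved for files you wrote yourself. For plain literal data such as `data_dict`, one alternative is the standard library's `ast.literal_eval`, which only accepts Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, `None`). A minimal sketch, where `text`, `reloaded` and `rejected` are illustrative names:

```python
import ast

data_dict = {"A": 1, "B": "Foo"}
text = repr(data_dict)

# literal_eval parses literals only; it never calls functions.
reloaded = ast.literal_eval(text)
print(reloaded == data_dict)

# A string containing a function call is rejected instead of executed.
try:
    ast.literal_eval("print('Hello World')")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The trade-off is that `literal_eval` cannot rebuild instances of your own classes, since constructing them requires a call.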

### YAML

There are other standard formats for storing simple data types. For example, YAML (read and written here with the third-party PyYAML package, imported as `yaml`):

In [10]:
import yaml
yaml.dump(data_dict)

'A: 1\nB: Foo\n'

In [11]:
with open('file.yaml',"w") as f: 
    f.write(yaml.dump(data_dict))

In [12]:
!cat file.yaml

A: 1
B: Foo


In [13]:
with open('file.yaml', 'r') as f: 
    data_dict_reloaded = yaml.safe_load(f.read())

data_dict_reloaded

{'A': 1, 'B': 'Foo'}

### JSON

[JSON](https://www.json.org/json-en.html) is commonly used to transmit data on the web:

In [14]:
import json
json.dumps(data_dict)

'{"A": 1, "B": "Foo"}'

In [15]:
with open('file.json',"w") as f: 
    json.dump(data_dict,f)

In [16]:
!cat file.json

{"A": 1, "B": "Foo"}

In [17]:
with open('file.json', 'r') as f: 
    data_dict_reloaded = json.load(f)

data_dict_reloaded

{'A': 1, 'B': 'Foo'}
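The round trip above is exact because `data_dict` only uses types that JSON has. That is not true of every Python built-in: JSON object keys are always strings and JSON has no tuple type. A small sketch of what changes, using a made-up dictionary `original`:

```python
import json

# Integer key and tuple value: neither exists as such in JSON.
original = {1: "one", "point": (1.5, 2.5)}

text = json.dumps(original)
reloaded = json.loads(text)

print(text)
print(reloaded)
print(reloaded == original)
```

After reloading, the key `1` comes back as the string `"1"` and the tuple comes back as a list, so the reloaded dictionary no longer compares equal to the original.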

### XML

XML is another format commonly used for storing data. It allows a bit more structure and there are python tools for creating XML representations of data, but it's a bit more complicated than the example above, so we'll skip it for now.
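For a flat dictionary like `data_dict`, a rough idea of what this involves can be sketched with the standard library's `xml.etree.ElementTree`. The element and attribute names (`data`, `item`, `name`, `type`) and the `converters` table are made up for this illustration; they are not a standard schema:

```python
import xml.etree.ElementTree as ET

data_dict = {"A": 1, "B": "Foo"}

# Build one <item> element per key; XML text is untyped, so record the type.
root = ET.Element("data")
for key, value in data_dict.items():
    item = ET.SubElement(root, "item", name=key, type=type(value).__name__)
    item.text = str(value)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Reading back: convert each text value using the recorded type name.
converters = {"int": int, "str": str}
reloaded = {
    item.get("name"): converters[item.get("type")](item.text)
    for item in ET.fromstring(xml_text)
}
print(reloaded)
```

Unlike `json.dump`, nothing here decides the layout for you: how keys, values and types map onto elements and attributes is a design choice you have to make and document.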

### pickle

[pickle](https://docs.python.org/3/library/pickle.html) is Python's method of serializing objects. Some advantages are that it is a binary format, so it is often more compact, and that it can store full Python objects, not just simple built-ins. Let's look at the [pickle documentation](https://docs.python.org/3/library/pickle.html) first.

Here is an example:

In [18]:
import pickle
pickle.dumps(data_dict,protocol=2)

b'\x80\x02}q\x00(X\x01\x00\x00\x00Aq\x01K\x01X\x01\x00\x00\x00Bq\x02X\x03\x00\x00\x00Fooq\x03u.'

In [19]:
with open('file.pickle',"wb") as f: 
    pickle.dump(data_dict,f)

In [20]:
!cat file.pickle

��       }�(�A�K�B��Foo�u.

In [21]:
with open('file.pickle', 'rb') as f: 
    data_dict_reloaded = pickle.load(f)

data_dict_reloaded

{'A': 1, 'B': 'Foo'}
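How compact pickle is depends on the data. The following sketch compares the in-memory sizes of JSON and pickle output for a list of 1000 random floats and for the small `data_dict`-style dictionary; `values` and `small` are illustrative names, and the exact byte counts will vary with the Python version and pickle protocol:

```python
import json
import pickle
import random

# Numeric data: pickle stores each float in binary form.
values = [random.random() for _ in range(1000)]
as_json = json.dumps(values).encode("utf-8")
as_pickle = pickle.dumps(values)
print("floats: JSON", len(as_json), "bytes, pickle", len(as_pickle), "bytes")

# Tiny dictionary: pickle's fixed overhead outweighs the savings.
small = {"A": 1, "B": "Foo"}
small_json = json.dumps(small).encode("utf-8")
small_pickle = pickle.dumps(small)
print("dict:   JSON", len(small_json), "bytes, pickle", len(small_pickle), "bytes")
```

For the float list the pickle is noticeably smaller, while for the two-entry dictionary the JSON text is the shorter of the two.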

## Python classes

Imagine you have data stored in a python object:

In [27]:
# Instance of a python class with data

class data_class:
    def __init__(self):
        self._data = dict()
    
    def add(self,key,value):
        self._data[key]=value
        
    def get(self,key):
        return self._data[key]
    
    def __repr__(self):
        return self._data.__repr__()

data_class_instance = data_class()
data_class_instance.add("A",1)
data_class_instance.add("B","Foo")

print("Value of A:", data_class_instance.get("A"))
print("Value of B:", data_class_instance.get("B"))

Value of A: 1
Value of B: Foo


Since we implemented `__repr__`, I should be able to store the data using `repr`:

In [28]:
with open('file.py',"w") as f: 
    f.write(repr(data_class_instance))

In [29]:
with open('file.py', 'r') as f: 
    data_class_instance_reloaded = eval(f.read())

data_class_instance_reloaded

{'A': 1, 'B': 'Foo'}

But what I get back is not the original object reconstituted, but a dictionary holding the data:

In [30]:
type(data_class_instance_reloaded)

dict

In [31]:
data_class_instance_reloaded.add("C",2)

AttributeError: 'dict' object has no attribute 'add'

In [32]:
data_class_instance_reloaded

{'A': 1, 'B': 'Foo'}
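The round trip fails because this `__repr__` returns the representation of the inner dictionary, so `eval` rebuilds a `dict`. One way around it is to make `__repr__` return a constructor call. The class below, `DataClassEvalRepr`, is a hypothetical variant of `data_class` written for this sketch; it assumes the constructor may take an optional initial dictionary:

```python
class DataClassEvalRepr:
    """Hypothetical variant of data_class whose repr is a constructor call."""

    def __init__(self, data=None):
        self._data = dict(data) if data is not None else {}

    def add(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]

    def __repr__(self):
        # Produces e.g. DataClassEvalRepr({'A': 1, 'B': 'Foo'})
        return f"{type(self).__name__}({self._data!r})"


original = DataClassEvalRepr()
original.add("A", 1)
original.add("B", "Foo")

text = repr(original)
print(text)

# eval now calls the constructor, so we get an instance back, not a dict.
reloaded = eval(text)
reloaded.add("C", 2)
print(type(reloaded).__name__, reloaded.get("C"))
```

This only works if the class is defined in the scope where `eval` runs and if everything stored in `_data` also has an evaluable `repr`.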

### pickle

Pickle allows me to store the object:

In [33]:
with open('file.pickle',"wb") as f: 
    pickle.dump(data_class_instance,f)

In [34]:
with open('file.pickle', 'rb') as f: 
    data_class_instance_reloaded = pickle.load(f)

data_class_instance_reloaded

{'A': 1, 'B': 'Foo'}

In [35]:
type(data_class_instance_reloaded)

__main__.data_class

In [36]:
data_class_instance_reloaded.add("C",2)
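Note that pickle stores the instance's data together with a reference to its class (the module and class name), not the code of the class. The class definition therefore has to be available wherever the file is loaded. A small sketch that makes this visible, using a stand-in class `DataClassSketch` defined here so the example runs on its own:

```python
import pickle


class DataClassSketch:
    """Stand-in for data_class, redefined so the example is self-contained."""

    def __init__(self):
        self._data = {}

    def add(self, key, value):
        self._data[key] = value


instance = DataClassSketch()
instance.add("A", 1)

payload = pickle.dumps(instance)

# The class name appears in the pickle as text; the method bodies do not.
print(b"DataClassSketch" in payload)

# Loading looks the class up by name and restores the stored attributes.
reloaded = pickle.loads(payload)
print(type(reloaded) is DataClassSketch, reloaded._data)
```

If the class were renamed, moved to another module, or missing when `pickle.load` runs, loading would fail even though the file itself is intact.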

## Storing Multiple Objects into Pickle

To store several objects in a single pickle file, put them in a dictionary and pickle the dictionary:

In [37]:
data_class_instance_2 = data_class()
data_class_instance_2.add("C",2)
data_class_instance_2.add("D","Bar")

In [38]:
with open('file.pickle',"wb") as f: 
    pickle.dump({"my_class":data_class_instance,
                 "my_class_2":data_class_instance_2},
                f)

In [39]:
with open('file.pickle', 'rb') as f: 
    loaded_data = pickle.load(f)

data_class_instance_reloaded = loaded_data["my_class"]
data_class_instance_reloaded_2 = loaded_data["my_class_2"]
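A dictionary is not the only option. Pickle files can also hold several pickles one after another: each `pickle.dump` call appends one object, and each `pickle.load` call reads the next one in the same order. A minimal sketch with two plain dictionaries and a temporary file (the names `first`, `second` and `two_objects.pickle` are made up for the example):

```python
import os
import pickle
import tempfile

first = {"A": 1, "B": "Foo"}
second = {"C": 2, "D": "Bar"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "two_objects.pickle")

    # Write two independent pickles back to back into one file.
    with open(path, "wb") as f:
        pickle.dump(first, f)
        pickle.dump(second, f)

    # Read them back in the order they were written.
    with open(path, "rb") as f:
        first_reloaded = pickle.load(f)
        second_reloaded = pickle.load(f)

print(first_reloaded, second_reloaded)
```

The dictionary approach has the advantage that objects are retrieved by name, whereas sequential dumps require the reader to know how many objects were written and in what order.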

## Pickling Data

In [40]:
import numpy as np
M = np.random.random((1000,1000))

In [41]:
with open('M.pickle',"wb") as f: 
    pickle.dump(M, f)

In [44]:
np.save("M.npy",M)

In [45]:
!ls -lh

total 32088
-rw-r--r--@ 1 afarbin  staff    15K Oct  6 12:35 Lecture.11.ipynb
-rw-r--r--@ 1 afarbin  staff   7.6M Oct  6 12:36 M.npy
-rw-r--r--@ 1 afarbin  staff   7.6M Oct  6 12:35 M.pickle
-rw-r--r--@ 1 afarbin  staff    20B Oct  6 12:33 file.json
-rw-r--r--@ 1 afarbin  staff   132B Oct  6 12:35 file.pickle
-rw-r--r--@ 1 afarbin  staff    20B Oct  6 12:34 file.py
-rw-r--r--@ 1 afarbin  staff    12B Oct  6 12:33 file.yaml


In [46]:
M_list=M.tolist()

In [47]:
with open('M_list.pickle',"wb") as f: 
    pickle.dump(M_list, f)

In [50]:
!ls -lh

total 16440
-rw-r--r--@ 1 afarbin  staff    17K Oct  6 12:37 Lecture.11.ipynb
-rw-r--r--@ 1 afarbin  staff   7.6M Oct  6 12:36 M.npy
-rw-r--r--@ 1 afarbin  staff    20B Oct  6 12:34 file.py
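The 7.6M sizes reported for `M.npy` and `M.pickle` are close to the raw data size: 1000 × 1000 values × 8 bytes per `float64` is 8,000,000 bytes. The sizes can also be compared in memory without writing files. The sketch below uses a smaller 100 × 100 array, `M_small`, to keep it quick, and assumes NumPy is installed; exact byte counts depend on the NumPy and Python versions:

```python
import io
import pickle

import numpy as np

M_small = np.random.random((100, 100))

# Pickle of the array: raw buffer plus a small header.
array_pickle = pickle.dumps(M_small)

# Pickle of the nested list: every float is pickled individually.
list_pickle = pickle.dumps(M_small.tolist())

# np.save also accepts a file-like object instead of a filename.
buffer = io.BytesIO()
np.save(buffer, M_small)
npy_bytes = buffer.getvalue()

print("raw data:    ", M_small.nbytes)
print("array pickle:", len(array_pickle))
print("list pickle: ", len(list_pickle))
print(".npy format: ", len(npy_bytes))

# Round trip through the .npy format.
buffer.seek(0)
M_back = np.load(buffer)
print(np.array_equal(M_back, M_small))
```

The array pickle and the `.npy` data are each only slightly larger than the raw buffer, while the pickled list is larger because each element carries its own type marker.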


In [53]:
!rm *.pickle *.yaml *.json file.py

rm: *.pickle: No such file or directory
rm: *.yaml: No such file or directory
rm: *.json: No such file or directory
