![NASA](http://www.nasa.gov/sites/all/themes/custom/nasatwo/images/nasa-logo.svg)

<center>
<h1><font size="+3">GSFC Python Bootcamp</font></h1>
</center>

---
<center>
<H1 style="color:red">
Introduction to pickle
</H1>
</center>


In [None]:
from __future__ import print_function

## <font color="red"> Serialization and Deserialization</font>
* **Serialization** is the process of transforming objects or data structures into byte streams or strings.
* These byte streams can then be stored or transferred easily.
* This lets developers save, for example, configuration data or a user's progress, and then store it (on disk or in a database) or send it to another location.
* The reverse process is known as **deserialization**.

## <font color="red"> What is pickle?</font>

* The `pickle` module is used for serializing and deserializing a Python object structure.
* Most Python objects can be pickled so that they can be saved on disk; exceptions include lambdas and objects tied to external resources, such as open files.
* `pickle` serializes the object first, before writing it to a file.
* Pickling (serialization) converts a Python object (list, dict, etc.) into a byte stream that contains all the information necessary to reconstruct the object in another Python script.

The following types can be serialized and deserialized using the `pickle` module:
* Built-in data types (booleans, None, integers, floats, complex numbers, strings, bytes, byte arrays)
* Dictionaries, sets, lists, and tuples, as long as they contain only pickleable objects
* Functions (pickled by name reference, not by value) and classes that are defined at the top level of a module
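
As a rough illustration of the list above, the sketch below round-trips a container of built-in values, shows that a function is restored by looking up its name, and shows that a lambda (which has no importable name) is rejected. `math.sqrt` is used only as a convenient example, and the variable names are illustrative. Python 3 is assumed.

```python
import math
import pickle

# A container holding only pickleable built-in values.
sample = {"flag": True, "nothing": None, "numbers": [1, 2.5, 3 + 4j], "raw": b"\x00\x01"}
restored = pickle.loads(pickle.dumps(sample))
print("Round trip equal:", restored == sample)

# Functions are pickled by name reference, so unpickling returns
# the very same function object found under that name.
restored_sqrt = pickle.loads(pickle.dumps(math.sqrt))
print("Same function object:", restored_sqrt is math.sqrt)

# A lambda cannot be looked up by name, so pickling it fails.
try:
    pickle.dumps(lambda x: 2 * x)
    lambda_error = None
except (pickle.PicklingError, AttributeError) as exc:
    lambda_error = exc
print("Pickling a lambda failed:", lambda_error is not None)
```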


## <font color="red">Applications of Pickling</font>

* Saving a program's state to disk so that it can carry on where it left off when restarted (persistence)
* Sending Python data over a TCP connection in a multi-core or distributed system (marshalling)
* Storing Python objects in a database
* Converting an arbitrary Python object to a byte string so that it can be used as a dictionary key (e.g. for caching and memoization)
* Machine learning (saving a <a href="https://pythonprogramming.net/pickle-classifier-save-nltk-tutorial/">trained ML algorithm</a>)
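
The caching use case can be sketched as follows. Lists are not hashable, so they cannot be dictionary keys directly, but their pickled bytes can. The helper `cached_total` and the `cache` dictionary are hypothetical names for this illustration. One caveat: equal objects are not guaranteed to produce byte-identical pickles in every case, so this pattern is best suited to simple, ordered data.

```python
import pickle

def cached_total(values, cache):
    """Return sum(values), reusing a stored result for an identical argument."""
    key = pickle.dumps(values)      # bytes are hashable, unlike the list itself
    if key not in cache:
        cache[key] = sum(values)    # stand-in for an expensive computation
    return cache[key]

cache = {}
first = cached_total([1, 2, 3], cache)
second = cached_total([1, 2, 3], cache)   # served from the cache
print(first, second, len(cache))
```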

## <font color="red">How to Use pickle</font>

In [None]:
import pickle

### Python Object Serialization

The `pickle` module turns a Python object into a series of bytes. This process is also called serialization. It is useful for:
* Storing data
* Inter-process communication

In [None]:
data_org = { 'a':'A', 'b':2, 'c':3.0 } 
print('DATA:', data_org)

In [None]:
# Use pickle.dumps() to create a bytes representation of the object (a str under Python 2).
data_string = pickle.dumps(data_org)
print('PICKLE:', data_string)

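With protocol 0, the pickle contains only ASCII characters; this was the default in Python 2. Python 3 defaults to a more compact binary protocol, so the output above is a `bytes` object that may contain non-printable bytes.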
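
Whether a pickle is ASCII-only depends on the protocol used to create it. The sketch below, which assumes Python 3, pickles the same dictionary with protocol 0 and with the interpreter's default protocol so the two outputs can be compared.

```python
import pickle

data = {'a': 'A', 'b': 2, 'c': 3.0}

ascii_pickle = pickle.dumps(data, protocol=0)   # oldest, ASCII-only protocol
default_pickle = pickle.dumps(data)             # whatever this interpreter defaults to

print('Default protocol:', pickle.DEFAULT_PROTOCOL)
print('Protocol 0 is ASCII:', all(byte < 128 for byte in ascii_pickle))
print('Protocol 0 pickle:  ', ascii_pickle)
print('Default pickle:     ', default_pickle)

# pickle.loads() detects the protocol itself, so both decode the same way.
print('Same data:', pickle.loads(ascii_pickle) == pickle.loads(default_pickle) == data)
```

Higher protocols are generally more compact, while protocol 0 remains readable by the oldest Python versions.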

### Python Object De-Serialization

* Once the data is serialized, you can write it to a file, socket, pipe, etc. 
* Then later you can read the file and unpickle the data to construct a new object with the same values.

**Get the data back from the serialized object**

In [None]:
print('BEFORE: ', data_org)

In [None]:
data2 = pickle.loads(data_string)
print('AFTER:  ',data2)

In [None]:
print('EQUAL?:', (data_org == data2))
print('SAME ?:', (data_org is data2))

**Write pickled data to a file and Read the data back**

In [None]:
# Write data into a file
with open('pickled_data_file.pkl', 'wb') as fid:
    pickle.dump(data_org, fid)

In [None]:
# Read the data from the file
with open('pickled_data_file.pkl', 'rb') as fid:
    data3 = pickle.load(fid)

In [None]:
print('Data Before Write:', data_org)
print('Data After  Read :', data3)
print('EQUAL?:', (data_org == data3))
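
`pickle.dump()` and `pickle.load()` work with any binary file-like object, and several objects can be written to the same stream one after another. The following sketch uses `io.BytesIO` as an in-memory stand-in for a file; the `records` list is made up for the example.

```python
import io
import pickle

records = [{'a': 'A'}, [1, 2, 3], ('x', 4.5)]

# io.BytesIO stands in for a file opened in binary mode.
stream = io.BytesIO()
for record in records:
    pickle.dump(record, stream)    # each call appends one complete pickle

stream.seek(0)                     # rewind before reading
loaded = []
while True:
    try:
        loaded.append(pickle.load(stream))   # reads exactly one pickle per call
    except EOFError:                         # raised when no pickles remain
        break

print(loaded == records)
```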

### Pickling and Unpickling Custom Objects

In [None]:
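class Planets():
    """Simple container for a planet's name and size."""
    def __init__(self):
        self.size = 0.0
        self.name = ''
    def set_size(self, num):
        # Store the planet's size (a number).
        self.size = num
    def set_name(self, name):
        # Store the planet's name (a string).
        self.name = name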

In [None]:
mercury = Planets()
mercury.set_name('Mercury')
mercury.set_size(1516.0)

In [None]:
with open('test_pickle.pkl', 'wb') as pickle_out:
    pickle.dump(mercury, pickle_out)

In [None]:
with open('test_pickle.pkl', 'rb') as pickle_in:
    unpickled_mercury = pickle.load(pickle_in)

In [None]:
print("Name: ", unpickled_mercury.name)
print("Size: ", unpickled_mercury.size)
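
For a custom object, `pickle` does not store the class definition itself. It records where the class lives (its module and name) together with the instance's attributes, so the class must be defined or importable wherever the data is unpickled. The sketch below illustrates this with a hypothetical `Moon` class that counts how often `__init__` runs.

```python
import pickle

class Moon:
    """Hypothetical class used only for this sketch."""
    init_calls = 0

    def __init__(self, name):
        Moon.init_calls += 1
        self.name = name

titan = Moon('Titan')
payload = pickle.dumps(titan)

# The pickle stores the class name and the attribute values, not the class code.
print(b'Moon' in payload, b'Titan' in payload)

restored = pickle.loads(payload)
print(restored.name, isinstance(restored, Moon))

# Unpickling restores the attributes directly; __init__ is not called again.
print(Moon.init_calls)
```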

## <font color="red">Conclusions</font>

**Advantages**

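1. Handles complicated data, such as nested containers and custom class instances.
2. Easy to use: saving or loading an object takes only a line or two of code.
3. The saved data is compact but not human-readable. This is obscurity only, not data security: anyone with Python can unpickle the file.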

**Disadvantages**

1. Non-Python programs may not be able to reconstruct pickled Python objects.
2. Unpickling data from an untrusted source is a security risk, because loading a pickle can execute arbitrary code.
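
One way to reduce (not eliminate) the unpickling risk is to restrict which globals a pickle may load by overriding `Unpickler.find_class()`, a pattern described in the Python documentation. The sketch below is a minimal version of that idea; the allow list `SAFE_BUILTINS` and the helper `restricted_loads` are illustrative choices, not a complete defense. The safest policy remains to unpickle only data you trust.

```python
import builtins
import io
import pickle

# Hypothetical allow list: the only globals this loader will resolve.
SAFE_BUILTINS = {'range', 'complex', 'set', 'frozenset', 'slice'}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle asks for a global (class or function).
        if module == 'builtins' and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    """Like pickle.loads(), but refuses globals outside the allow list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain data and allowed globals load normally.
print(restricted_loads(pickle.dumps([1, 2, range(15)])))

# A pickle that references any other global is rejected before it is resolved.
try:
    restricted_loads(pickle.dumps(print))   # 'builtins.print' is not on the list
    rejected = False
except pickle.UnpicklingError as exc:
    rejected = True
    print('Rejected:', exc)
```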