<a href="https://colab.research.google.com/github/ahmadaking/Comparisons/blob/master/useful_modules/introduction_pickle.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![NASA](http://www.nasa.gov/sites/all/themes/custom/nasatwo/images/nasa-logo.svg)

<center>
<h1><font size="+3">GSFC Python Bootcamp</font></h1>
</center>

---
<center>
<H1 style="color:red">
Serialization and Deserialization with pickle
</H1>
</center>


In [None]:
from __future__ import print_function

## <font color="red"> Serialization and Deserialization</font>
* **Serialization** is the process of transforming objects or data structures into byte streams or strings.
* These byte streams can then be stored or transferred easily.
* This lets developers save, for example, configuration data or a user's progress, and then store it (on disk or in a database) or send it to another location.
* The reverse process is known as **deserialization**.

## <font color="red"> What is pickle?</font>

* The `pickle` module is used for serializing and deserializing a Python object structure.
* Most Python objects can be pickled so that they can be saved to disk. Exceptions include open file handles, generators, and lambda functions.
* `pickle` serializes the object before writing it to a file.
* Pickling (serialization) converts a Python object (list, dict, etc.) into a byte stream that contains all the information necessary to reconstruct the object in another Python script.

The following types can be serialized and deserialized using the `pickle` module:
* All native datatypes supported by Python (booleans, None, integers, floats, complex numbers, strings, bytes, byte arrays)
* Dictionaries, sets, lists, and tuples - as long as they contain pickleable objects
* Functions (pickled by their name references, and not by their value) and classes that are defined at the top level of a module
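
One way to find out whether a particular object falls into these categories is simply to try pickling it. The sketch below does that. The helper name `is_picklable` is hypothetical and not part of the `pickle` module, and the set of exceptions it catches is an assumption that covers the failures commonly seen in Python 3.

```python
import pickle

def is_picklable(obj):
    """Return True if pickle.dumps() accepts obj, False otherwise (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

nested_ok = is_picklable({'a': [1, 2.0, 3 + 4j], 'b': (None, True)})  # nested containers
builtin_ok = is_picklable(len)                   # built-in function, pickled by name
lambda_ok = is_picklable(lambda x: x)            # a lambda has no importable name
generator_ok = is_picklable(i for i in range(3)) # generators hold running state

print(nested_ok, builtin_ok, lambda_ok, generator_ok)
```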


## <font color="red">Applications of Pickling</font>

* Saving a program's state data to disk so that it can carry on where it left off when restarted (persistence)
* Sending Python data over a TCP connection in a multi-core or distributed system (marshalling)
* Storing Python objects in a database
* Converting an arbitrary Python object to a `bytes` object so that it can be used as a dictionary key (e.g. for caching & memoization)
* Machine Learning (saving <a href="https://pythonprogramming.net/pickle-classifier-save-nltk-tutorial/">trained ML algorithm</a>)
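
The caching use case can be sketched as follows. The decorator name `memoize_by_pickle` is hypothetical. The sketch assumes that repeated calls with the same arguments produce the same pickle, which holds for simple values but is not guaranteed in general (for example, two equal dictionaries built in a different order may pickle differently, which would only cause a cache miss).

```python
import functools
import pickle

def memoize_by_pickle(func):
    """Cache results keyed by the pickled arguments (hypothetical helper)."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Lists and dicts are unhashable, but their pickled bytes are hashable.
        key = pickle.dumps((args, sorted(kwargs.items())))
        if key not in cache:
            cache[key] = func(*args, **kwargs)
        return cache[key]

    return wrapper

calls = []  # records every time the wrapped function really runs

@memoize_by_pickle
def total(values):
    calls.append(values)
    return sum(values)

first = total([1, 2, 3])
second = total([1, 2, 3])   # served from the cache
print(first, second, 'real calls:', len(calls))
```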

## <font color="red">How to Use pickle</font>

In [None]:
import pickle

The main functions of `pickle` are:

* `dump()`: pickles an object and writes it to a file object opened in binary mode.
* `load()`: takes a file object, reconstructs the object from the pickled representation, and returns it.
* `dumps()`: returns the pickled data as a `bytes` object.
* `loads()`: reconstructs an object from a `bytes` object.

`dump()`/`load()` serialize/deserialize objects through files, whereas `dumps()`/`loads()` serialize/deserialize objects through in-memory `bytes` objects.

### Python Object Serialization

The pickle module turns an arbitrary Python object into a series of bytes. This process is also called serialization. 
* Useful for storing data
* Useful for inter-process communication

In [None]:
data_org = { 'a':'A', 'b':2, 'c':3.0 } 
print('DATA:', data_org)

The `dumps()` function creates a `bytes` representation of the object.

In [None]:
data_string = pickle.dumps(data_org)
print('PICKLE:', data_string )

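In Python 3, the default pickle format is binary, so the output generally contains non-printable bytes. Only the legacy protocol 0 (`protocol=0`) produces output restricted to ASCII characters.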
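
A quick way to inspect what `dumps()` produces is to compare the default protocol with protocol 0. This is a minimal sketch that assumes Python 3; the variable names are illustrative only.

```python
import pickle

data = {'a': 'A', 'b': 2, 'c': 3.0}

default_pickle = pickle.dumps(data)            # uses pickle.DEFAULT_PROTOCOL
ascii_pickle = pickle.dumps(data, protocol=0)  # legacy, text-style protocol

print(type(default_pickle))
print('Default protocol:', pickle.DEFAULT_PROTOCOL)
print('Highest protocol:', pickle.HIGHEST_PROTOCOL)

# Protocol 0 output can be decoded as ASCII text; binary protocols generally cannot.
print(ascii_pickle.decode('ascii'))

# Both representations reconstruct the same dictionary.
restored_default = pickle.loads(default_pickle)
restored_ascii = pickle.loads(ascii_pickle)
print(restored_default == restored_ascii == data)
```

A lower protocol number is also the usual way to keep a pickle readable by an older interpreter, at the cost of a larger and slower format.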

### Python Object Deserialization

* Once the data is serialized, you can write it to a file, socket, pipe, etc. 
* Then later you can read the file and unpickle the data to construct a new object with the same values.

**Get the data back from the serialized object**

In [None]:
print('BEFORE: ', data_org)

The `loads()` function reconstructs the object from its pickled `bytes` representation.

In [None]:
data2 = pickle.loads(data_string)
print('AFTER:  ',data2)

In [None]:
print('EQUAL?:', (data_org == data2))
print('SAME ?:', (data_org is data2))

**Write pickled data to a file and Read the data back**

The `dump()` function serializes the data and writes it to the file.

In [None]:
with open('pickled_data_file.pkl', 'wb') as fid:
    pickle.dump(data_org, fid)

The `load()` function takes a file object, reconstructs the object from the pickled representation, and returns it.

In [None]:
# Read the data from the file
with open('pickled_data_file.pkl', 'rb') as fid:
    data3 = pickle.load(fid)

In [None]:
print('Data Before Write:', data_org)
print('Data After  Read :', data3)
print('EQUAL?:', (data_org == data3))

### Pickling and Unpickling Custom Objects

**Example 1**: Instance of a class

In [None]:
class Planets:
    def __init__(self, planet_name, planet_size):
        self.size = planet_size
        self.name = planet_name

In [None]:
mercury = Planets('Mercury', 1516.0)

* The file is opened in binary mode for writing. 

In [None]:
with open('pickle_instance.pkl', 'wb') as pickle_out:
    pickle.dump(mercury, pickle_out)

* The file is opened in binary mode for reading. 

In [None]:
with open('pickle_instance.pkl', 'rb') as pickle_in:
    unpickled_mercury = pickle.load(pickle_in)

In [None]:
print("Name: ", unpickled_mercury.name)
print("Size: ", unpickled_mercury.size)

**Example 2**: Collection of objects

In [None]:
def my_func():
    return "my_func was called"

In [None]:
with open('pickle_objects.pkl', 'wb') as pickle_out:
    # serialize class object
    pickle.dump(Planets, pickle_out)
    # serialize class instance
    pickle.dump(Planets('Jupiter', 43441), pickle_out)
    # serialize function object
    pickle.dump(my_func, pickle_out)
    # serialize complex number
    pickle.dump(3.7 + 2.5j, pickle_out)
    # serialize bytes object
    pickle.dump(bytes([1, 2, 3, 4, 5]), pickle_out)

* Objects are returned in the same order in which they were pickled.
* When there is no more data to return, the `load()` function raises `EOFError`.

In [None]:
with open('pickle_objects.pkl', 'rb') as pickle_in:
    # deserialize class object
    NewPlanets = pickle.load(pickle_in)
    # deserialize class instance
    new_jupiter = pickle.load(pickle_in)
    # deserialize function object
    new_func = pickle.load(pickle_in)
    # deserialize complex number
    new_complex = pickle.load(pickle_in)
    # deserialize bytes object
    new_byte = pickle.load(pickle_in)
    # a sixth load() finds no more data and raises EOFError
    try:
        pickle.load(pickle_in)
    except EOFError:
        print("No more objects to load")
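
When the number of pickled objects is not known in advance, the `EOFError` described above can serve as the stop signal. The sketch below assumes that pattern; the generator name `load_all` is hypothetical, and an in-memory `io.BytesIO` buffer stands in for a file so that the example is self-contained.

```python
import io
import pickle

def load_all(file_obj):
    """Yield every pickled object in file_obj until the data runs out (hypothetical helper)."""
    while True:
        try:
            yield pickle.load(file_obj)
        except EOFError:
            return

# Write several objects back to back, as in the cell above.
buffer = io.BytesIO()
for item in ['first', 2, 3.0]:
    pickle.dump(item, buffer)

buffer.seek(0)  # rewind before reading
items = list(load_all(buffer))
print(items)
```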

* Once you have unpickled the data you can use it like an ordinary Python object.

In [None]:
mercury = NewPlanets('Mercury', 1516.0)
print(mercury.name, mercury.size)

In [None]:
print(new_jupiter.name, new_jupiter.size)

In [None]:
new_func()

In [None]:
print("Complex Number: ", new_complex)
print("Byte object: ", new_byte)

## <font color="red">Conclusions</font>

**Advantages**

1. Saves complicated, nested data structures without any manual conversion.
2. Easy to use: saving or loading an object takes only a line or two of code.
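3. Saved data is stored in a compact binary format. It is not human-readable, but this should not be mistaken for security, since anyone with Python can unpickle the file.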

**Disadvantages**

1. Non-Python programs may not be able to reconstruct pickled Python objects.
2. Security risks in unpickling data from malicious sources.

**When to Pickle**

* Pickling is useful for applications that need some degree of persistence in their data. Your program's state can be saved to disk so that you can continue working with it later.
* It can also be used to send data over a Transmission Control Protocol (TCP) or socket connection, or to store Python objects in a database.
* Pickle is very useful when working with machine learning algorithms: a trained model can be saved and later reloaded to make new predictions, without rewriting everything or training the model again.

**When Not to Pickle**

* If you want to use data across different programming languages, pickle is not recommended. Its protocol is specific to Python, so cross-language compatibility is not guaranteed.
* Similar caution applies across versions of Python itself. A newer Python can generally read pickles written by an older one, but a pickle written with a newer protocol than the reading interpreter supports cannot be loaded. Unpickling can also fail when the class or function definitions the pickle refers to have changed or are missing.
* Never unpickle data from an untrusted source. Malicious code inside the file can be executed upon unpickling.
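
One mitigation for the last point is to restrict which globals an unpickler may load, following the `find_class()` override pattern described in the `pickle` documentation. The sketch below is illustrative only: the allow-list `ALLOWED_BUILTINS` and the helper `restricted_loads` are assumptions chosen for this example, and this approach reduces the risk rather than removing it.

```python
import builtins
import collections
import io
import pickle

# Illustrative allow-list; a real application would choose its own.
ALLOWED_BUILTINS = {'range', 'complex', 'set', 'frozenset', 'slice'}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle asks for a global such as a class or function.
        if module == 'builtins' and name in ALLOWED_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data):
    """Like pickle.loads(), but only allow-listed globals can be loaded."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain containers and allow-listed built-ins load normally.
allowed = restricted_loads(pickle.dumps([1, 2, range(3)]))
print(allowed)

# Anything else is rejected before it can be constructed.
rejected = False
try:
    restricted_loads(pickle.dumps(collections.OrderedDict(a=1)))
except pickle.UnpicklingError as err:
    rejected = True
    print('Rejected:', err)
```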