# Serialization: More than Pickling
PyCon US 2022  
Joe Lucas

## What is "serialization"?

**to serialize** (_verb_): to translate a data structure into a format that can be _stored_ or _transmitted_, then _reconstructed_ later.

### Why?

You've spent hours training a machine learning model. How do you save it and use it later?

You've built an in-memory object from costly, time-dependent queries (e.g. a snapshot). How do you share it with colleagues?

### Types

Plaintext vs. **Binary**
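
To make the distinction concrete, here is a minimal sketch that encodes the same small record both ways. The `record` dictionary is a made-up example, not data from the talk.

```python
import json
import pickle

# Hypothetical record, used only for illustration.
record = {"species": "American Robin", "count": 3}

as_text = json.dumps(record)     # plaintext: a human-readable str
as_bytes = pickle.dumps(record)  # binary: an opaque bytes object

print(type(as_text).__name__, as_text)
print(type(as_bytes).__name__, as_bytes[:20])

# Both formats reconstruct an equal object.
assert json.loads(as_text) == record
assert pickle.loads(as_bytes) == record
```

The JSON string can be read and edited by hand; the pickle bytes cannot, but they can represent far more Python types.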

In [None]:
from utils import EBIRD_KEY

In [None]:
import requests
from sklearn.tree import DecisionTreeClassifier

class Bird_Counter:
    def __init__(self):
        self.count = 0
        self.clf = None
    
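    def get_birds(self):
        # Recent eBird observations for Utah (region code US-UT).
        url = "https://api.ebird.org/v2/data/obs/US-UT/recent"
        headers = {'X-eBirdApiToken': EBIRD_KEY}
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        # The endpoint returns a JSON list with one entry per observation,
        # so count entries rather than characters in the response text.
        self.count = len(response.json())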
    
    def predict_birds(self):
        # Placeholder: an unfitted classifier, so there is a model object to serialize.
        self.clf = DecisionTreeClassifier(random_state=1337)

In [None]:
%%time
b = Bird_Counter()
b.get_birds()
b.predict_birds()
print(f"There were {b.count} birds.")
print(f"Our classifier random state is {b.clf.random_state}")

## [Pickle](https://docs.python.org/3/library/pickle.html)

In [None]:
import pickle

with open("bird.pkl", "wb") as f:
    pickle.dump(b, f)  # <-- What's this look like on disk?
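
One way to answer that question is the standard library's `pickletools` module, which prints the opcodes in a pickle stream. The sketch below uses a `collections.Counter` as a stand-in (an assumption, so the block runs without scikit-learn or an eBird key); calling `pickletools.dis` on the bytes read from `bird.pkl` should work the same way.

```python
import io
import pickle
import pickletools
from collections import Counter

# Stand-in object (assumption): a small tally instead of the Bird_Counter instance.
tally = Counter({"robin": 3})
payload = pickle.dumps(tally, protocol=4)

# A protocol 2+ pickle begins with the PROTO opcode (0x80) and the protocol number.
print(payload[:2])

# Disassemble the stream into a human-readable opcode listing.
buffer = io.StringIO()
pickletools.dis(payload, out=buffer)
listing = buffer.getvalue()
print(listing)
```

The listing names the class by module and name (`collections`, `Counter`) rather than embedding its code. The stream is a small program for rebuilding the object, not a copy of the class itself.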

In [None]:
with open("bird.pkl", "rb") as f:
    c = pickle.load(f)

print(f"There were {c.count} birds.")
print(f"Our classifier random state is {c.clf.random_state}")

**Let's share our pickle with a friend.**
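
A rough way to preview what the friend might see, without a second machine: the sketch below pickles an instance whose class lives in a throwaway module, then removes that module before loading. The module name `birdlib_demo_only` and the trimmed-down `Bird_Counter` are assumptions made for illustration.

```python
import pickle
import sys
import types

class Bird_Counter:
    """Trimmed-down stand-in for the notebook's class."""
    def __init__(self):
        self.count = 0

# Pretend the class lives in a separate module (hypothetical name).
module_name = "birdlib_demo_only"
birdlib = types.ModuleType(module_name)
Bird_Counter.__module__ = module_name
birdlib.Bird_Counter = Bird_Counter
sys.modules[module_name] = birdlib

payload = pickle.dumps(Bird_Counter())

# Simulate the friend's machine: the module is not installed there.
del sys.modules[module_name]

try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as exc:
    load_error = exc

print(f"Load failed: {load_error!r}")
```

The pickle carries the instance's state and the class's name, so the receiving side needs an importable copy of the class before `load` can succeed.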

### Pros

1. Standard Library
2. We didn't have to define a schema

### Cons

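1. Security: loading a pickle can execute arbitrary code, so only unpickle data you trust
2. Only interoperable with Python
3. `load` still requires access to the class definition (the pickle stores a reference to the class, not its code)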
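
The security concern can be sketched in a few lines. An object's `__reduce__` method tells pickle which callable to invoke at load time, and that callable can be anything importable. The `Innocent` class is hypothetical, and a harmless `print` stands in for whatever an attacker would choose.

```python
import pickle

class Innocent:
    """Hypothetical class whose pickle runs a function when loaded."""
    def __reduce__(self):
        # pickle stores (callable, args) and calls callable(*args) on load.
        # A harmless print stands in for an attacker's choice of function.
        return (print, ("This ran inside pickle.loads()",))

payload = pickle.dumps(Innocent())
result = pickle.loads(payload)  # prints the message as a side effect

# The loaded value is print's return value, not an Innocent instance.
print(result)
```

The payload never mentions `Innocent`; it only names the callable to run, so the victim does not need the attacker's class installed.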

### How is it used?

https://github.com/scikit-learn/scikit-learn/search?l=Python&p=2&q=pickle


## References

[fickling](https://github.com/trailofbits/fickling): a decompiler and static analyzer for pickle files, from Trail of Bits