# Data Serialization

* Pickle
* Shelve
* JSON
* Numpy: npy, npz
* CSV
* HDF5, NetCDF
* Xarray
* Parquet


Further reading: [The best format to save pandas data](https://towardsdatascience.com/the-best-format-to-save-pandas-data-414dca023e0d).


## Pickle and Shelve

Python's standard library ships two native serialization modules:

* [Pickle](https://docs.python.org/3/library/pickle.html)
* [Shelve](https://docs.python.org/3/library/shelve.html)

### What can be pickled and unpickled?

The following types can be pickled:

* None, True, and False;
* integers, floating-point numbers, complex numbers;
* strings, bytes, bytearrays;
* tuples, lists, sets, and dictionaries containing only picklable objects.

For our purposes, the list stops here. The following can also be pickled, but I do not recommend doing so yet. Pickling them makes sense in some scenarios, but those are more advanced and we will not discuss them for now:

* functions (built-in and user-defined) defined at the top level of a module (using `def`, not `lambda`);
* classes defined at the top level of a module;
* instances of such classes whose `__dict__` or the result of calling `__getstate__()` is picklable.
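
The rules above can be checked by simply trying to pickle a few objects. The sketch below uses a hypothetical helper, `can_pickle` (not part of the `pickle` module), that reports whether `pickle.dumps` succeeds. The exact exception type raised on failure can vary between Python versions, so the helper catches several.

```python
import pickle


def can_pickle(obj):
    """Hypothetical helper: True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# One sample of each basic type from the list above.
basic = [None, True, 3, 2.5, 1 + 2j, "text", b"bytes", bytearray(b"ab"),
         (1, 2), [1, 2], {1, 2}, {"key": 1}]
results = [can_pickle(obj) for obj in basic]

# A lambda cannot be pickled, and neither can a container that holds one.
lambda_ok = can_pickle(lambda x: x + 1)
nested_ok = can_pickle([1, 2, lambda: 0])

print(all(results), lambda_ok, nested_ok)
```

The second check illustrates the "containing only picklable objects" condition: one unpicklable element is enough to make the whole container unpicklable.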

```{warning}

The `pickle` module **is not secure**. Only unpickle data you trust.

It is possible to construct malicious pickle data which will **execute arbitrary code during unpickling**. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.

Consider signing data with `hmac` if you need to ensure that it has not been tampered with.

Safer serialization formats such as `json` may be more appropriate if you are processing untrusted data.
```
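
One possible way to follow the signing advice in the warning is sketched below. It assumes a shared secret key and prepends an HMAC-SHA256 tag to the pickled bytes. The names `SECRET_KEY`, `dumps_signed` and `loads_signed` are hypothetical, chosen only for this illustration. Signing detects tampering, but it does not make unpickling data from an untrusted author safe.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key, for illustration only
TAG_SIZE = hashlib.sha256().digest_size     # 32 bytes for SHA-256


def dumps_signed(obj, key=SECRET_KEY):
    """Pickle obj and prepend an HMAC-SHA256 tag computed over the pickle bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def loads_signed(blob, key=SECRET_KEY):
    """Verify the tag first; unpickle only if the payload is unmodified."""
    tag, payload = blob[:TAG_SIZE], blob[TAG_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


blob = dumps_signed({"x": [1, 2, 3]})
restored = loads_signed(blob)
```

Flipping even a single bit of `blob` makes `loads_signed` raise `ValueError` before `pickle.loads` is ever called.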


```python
import pickle

# Serialize a single integer to bytes
pickle.dumps(1)
```

```
b'\x80\x04K\x01.'
```

```python
# Serialize a one-element list
pickle.dumps([1])
```

```
b'\x80\x04\x95\x06\x00\x00\x00\x00\x00\x00\x00]\x94K\x01a.'
```
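
To see what these bytes mean, the standard-library `pickletools` module can list the opcodes in a pickle. The sketch below assumes protocol 4, which is what the `\x80\x04` prefix in the output above indicates. The variable names are only illustrative.

```python
import pickle
import pickletools

# Pin the protocol so the bytes do not depend on the Python version's default.
data = pickle.dumps(1, protocol=4)

# genops yields (opcode, argument, position) for each instruction in the pickle.
opcodes = [(op.name, arg) for op, arg, pos in pickletools.genops(data)]
print(opcodes)

# dis prints a human-readable disassembly of the same bytes.
pickletools.dis(data)
```

A pickle is a small program for a stack machine: here it declares the protocol, pushes the integer 1, and stops.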

```python
a = [1, 1.5, "hello", {3, 4}, {'int': 9, 'real': 9.0, 'complex': 9j}]

# Write the object to disk (pickle files must be opened in binary mode)
with open('data.pkl', 'wb') as f:
    pickle.dump(a, f)

# Read it back
with open('data.pkl', 'rb') as f:
    b = pickle.load(f)

b
```

```
[1, 1.5, 'hello', {3, 4}, {'int': 9, 'real': 9.0, 'complex': 9j}]
```

## A shelf of pickles

The `shelve` module provides a disk-stored object that behaves like a dict, whose keys are strings and whose values can be any picklable object.
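
A minimal sketch of using a shelf follows. It writes to a temporary directory so that it leaves nothing behind. The file name `demo_shelf` and the stored keys are arbitrary choices for illustration.

```python
import shelve
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = str(Path(tmp) / "demo_shelf")

    # Store values under string keys; each value is pickled behind the scenes.
    with shelve.open(path) as db:
        db["numbers"] = [1, 2, 3]
        db["config"] = {"name": "run-1", "scale": 0.5}

    # Reopen the shelf later and read the values back like a dict.
    with shelve.open(path) as db:
        keys = sorted(db.keys())
        numbers = db["numbers"]
        config = db["config"]

print(keys, numbers, config)
```

Because values are stored as pickles, the security warning about `pickle` applies to shelves as well: only open shelf files you trust.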

The cells below load a CSV file into a pandas DataFrame, write it to a Feather file, and read it back.

```python
from pathlib import Path

import pandas as pd

df = pd.read_csv(Path.home() / "shared/climate-data/monthly_in_situ_co2_mlo_cleaned.csv")
df
```

| Unnamed: 0 | year | month | date_index | fraction_date | c02 | data_adjusted_season | data_fit | data_adjusted_seasonally_fit | data_filled | data_adjusted_seasonally_filed |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1958 | 1 | 21200 | 1958.0411 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 |
| 1 | 1958 | 2 | 21231 | 1958.1260 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 |
| 2 | 1958 | 3 | 21259 | 1958.2027 | 315.70 | 314.43 | 316.19 | 314.90 | 315.70 | 314.43 |
| 3 | 1958 | 4 | 21290 | 1958.2877 | 317.45 | 315.16 | 317.30 | 314.98 | 317.45 | 315.16 |
| 4 | 1958 | 5 | 21320 | 1958.3699 | 317.51 | 314.71 | 317.86 | 315.06 | 317.51 | 314.71 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 763 | 2021 | 8 | 44423 | 2021.6219 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 |
| 764 | 2021 | 9 | 44454 | 2021.7068 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 |
| 765 | 2021 | 10 | 44484 | 2021.7890 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 |
| 766 | 2021 | 11 | 44515 | 2021.8740 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 |


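```python
# Write the DataFrame to a Feather file
df.to_feather("co2.fth")

# IPython magic: list the file that was just written
%ls -l co2*
```

```
-rw-r--r--  1 fperez  staff  31850 19 Apr 17:00 co2.fth
```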


```python
# Read the Feather file back into a new DataFrame
df2 = pd.read_feather("co2.fth")
df2
```

| Unnamed: 0 | year | month | date_index | fraction_date | c02 | data_adjusted_season | data_fit | data_adjusted_seasonally_fit | data_filled | data_adjusted_seasonally_filed |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1958 | 1 | 21200 | 1958.0411 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 |
| 1 | 1958 | 2 | 21231 | 1958.1260 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 |
| 2 | 1958 | 3 | 21259 | 1958.2027 | 315.70 | 314.43 | 316.19 | 314.90 | 315.70 | 314.43 |
| 3 | 1958 | 4 | 21290 | 1958.2877 | 317.45 | 315.16 | 317.30 | 314.98 | 317.45 | 315.16 |
| 4 | 1958 | 5 | 21320 | 1958.3699 | 317.51 | 314.71 | 317.86 | 315.06 | 317.51 | 314.71 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 763 | 2021 | 8 | 44423 | 2021.6219 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 |
| 764 | 2021 | 9 | 44454 | 2021.7068 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 |
| 765 | 2021 | 10 | 44484 | 2021.7890 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 |
| 766 | 2021 | 11 | 44515 | 2021.8740 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 | -99.99 |
