[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/nuitrcs/NextStepsInPython/blob/master/pickleJson/pickle.ipynb)

# <br>json and pickle files

Today we're talking about saving Python objects like dictionaries, lists and strings.

If you haven't already found a need to save Python objects, you will.

If your Jupyter notebook is getting really long and slow, save that dictionary that you carefully built, shut down your kernel, and continue your analysis in another notebook. 

My general advice is, at a minimum, to keep data collection, cleaning, analysis, and visualization in separate notebooks. It is also often worth separating different analyses, especially if they create large objects held in memory, since those slow down your notebook.

### <br><br>Saving and loading objects using built-in Python functions

You might already save and load dictionaries and lists with your own code by **writing** text files, and then **reading** and parsing text files.

For example:

In [1]:
speed_dict = {"Peregrine falcon": "242 mph", 
              "Golden eagle": "150–200 mph", 
              "White-throated needletail swift": "105 mph", 
              "Eurasian hobby": "100 mph", 
              "Mexican free-tailed bat": "100 mph", 
              "Frigatebird": "95 mph", 
              "Rock dove (pigeon)": "92.5 mph", 
              "Spur-winged goose": "88 mph", 
              "Gyrfalcon": "80 mph", 
              "Grey-headed albatross": "79 mph", 
              "Cheetah": "68.0–75.0 mph", 
              "Sailfish": "67.85 mph", 
              "Anna's hummingbird": "61.06 mph", 
              "Swordfish": "60 mph", 
              "Pronghorn": "55.0 mph", 
              "Springbok": "55 mph", 
              "Quarter Horse": "55.0 mph", 
              "Blue wildebeest": "50.0 mph", 
              "Lion": "50.0 mph", 
              "Blackbuck": "50 mph"}

<br>We can open a new file, loop through the keys and values in our dictionary, and write them to the file:

In [2]:
with open("fastestAnimals.txt", "w") as f:
    for k, v in speed_dict.items():
        f.write(k + "\t" + v + "\n")

<br><br>Then we can read in the file and change its contents into a Python object. I often use a list comprehension or dictionary comprehension to change .txt files into a list or dictionary.

In [3]:
with open("fastestAnimals.txt", "r") as f:
    txt_speed_dict = {line.split("\t")[0]: line.rstrip("\n").split("\t")[1] for line in f}

In [4]:
txt_speed_dict

{'Peregrine falcon': '242 mph',
 'Golden eagle': '150–200 mph',
 'White-throated needletail swift': '105 mph',
 'Eurasian hobby': '100 mph',
 'Mexican free-tailed bat': '100 mph',
 'Frigatebird': '95 mph',
 'Rock dove (pigeon)': '92.5 mph',
 'Spur-winged goose': '88 mph',
 'Gyrfalcon': '80 mph',
 'Grey-headed albatross': '79 mph',
 'Cheetah': '68.0–75.0 mph',
 'Sailfish': '67.85 mph',
 "Anna's hummingbird": '61.06 mph',
 'Swordfish': '60 mph',
 'Pronghorn': '55.0 mph',
 'Springbok': '55 mph',
 'Quarter Horse': '55.0 mph',
 'Blue wildebeest': '50.0 mph',
 'Lion': '50.0 mph',
 'Blackbuck': '50 mph'}
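One limitation of this hand-written approach is that everything comes back as a string. The sketch below is a minimal illustration, assuming a small hypothetical dictionary with integer values and a hypothetical file in the system temp directory; it applies the same write-then-parse pattern:

```python
import os
import tempfile

# Hypothetical dictionary whose values are integers, not strings
counts = {"Peregrine falcon": 242, "Golden eagle": 200}

path = os.path.join(tempfile.gettempdir(), "counts_demo.txt")  # hypothetical file name

# Write one tab-separated key/value pair per line
with open(path, "w", encoding="utf-8") as f:
    for k, v in counts.items():
        f.write(f"{k}\t{v}\n")

# Read it back: each line splits into a [key, value] pair of strings
with open(path, "r", encoding="utf-8") as f:
    round_trip = dict(line.rstrip("\n").split("\t") for line in f)

print(round_trip)            # {'Peregrine falcon': '242', 'Golden eagle': '200'}
print(round_trip == counts)  # False: the integers came back as strings
```

Because `speed_dict` already stores its speeds as strings, this didn't matter above, but numbers, nested lists, or nested dictionaries would each need their own parsing code.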

<br><br>**Do you have to write your own code to save and then recreate these objects? Is there something faster than looping through all the items?**

### <br><br>Serializing Python objects

The process of converting a Python object into a format that can be stored is called **serialization**. The process of reconstructing that data is called **deserialization**.
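The two modules covered in this notebook, `pickle` and `json`, both offer in-memory versions of these steps: `dumps` serializes an object to a `bytes` value (pickle) or a `str` value (json), and `loads` deserializes it. A minimal sketch with a one-item dictionary:

```python
import json
import pickle

record = {"Peregrine falcon": "242 mph"}

as_bytes = pickle.dumps(record)  # serialize to bytes (pickle's binary format)
as_text = json.dumps(record)     # serialize to a str (JSON text)

print(type(as_bytes).__name__, type(as_text).__name__)  # bytes str
print(as_text)  # {"Peregrine falcon": "242 mph"}

# Deserialize: rebuild equal Python objects from the stored forms
print(pickle.loads(as_bytes) == record, json.loads(as_text) == record)  # True True
```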

We will cover two different file formats for serialization: pickle and json. There are now many other formats for serializing data, but these are two of the most recognizable formats, and they have very simple commands. <br><br>First let's go over the differences between the two and why you might want to use one or the other.

#### <br><br>pickle

- commonly ends in .pkl or .pickle
- stores objects in a binary format
- can save almost any Python object, including instances of your own classes. Functions and classes themselves are saved by name, so their definitions must be importable when you load the file, and some objects (such as lambdas and open files) cannot be pickled at all
- pickle files can only be read by Python
- is binary, so it is not human readable: you can't open the file and read the data
- WARNING: loading a pickle file can run arbitrary code, so pickle files can carry malicious code!!! Never open pickle files you receive from someone else. Only open your own pickle files, and only if no one else has had access to them.
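A quick sketch of what survives a pickle round trip, using `pickle.dumps`/`pickle.loads` (the in-memory counterparts of `dump` and `load`) on a hypothetical dictionary: a tuple value and an integer key come back unchanged, while a lambda cannot be pickled at all. The exact exception raised for the lambda may differ between Python versions, so the sketch catches the likely ones.

```python
import pickle

# Hypothetical record with a tuple value, an int key, and None
original = {"Cheetah speed range (mph)": (68, 75), 1: "Peregrine falcon", "notes": None}

restored = pickle.loads(pickle.dumps(original))
print(restored == original)                                  # True
print(type(restored["Cheetah speed range (mph)"]).__name__)  # tuple

# A lambda has no importable name, so pickle refuses it
try:
    pickle.dumps(lambda x: x * 2)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError, TypeError):
    lambda_pickled = False

print(lambda_pickled)  # False
```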

#### <br><br>json

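- ends in .json
- can store strings, integers, floats, booleans, None, lists, and dictionaries. Tuples are saved as lists, and non-string dictionary keys (such as integers) are converted to strings. Cannot store classes or functions
- can be opened in other languages
- is human readable: you can look at the actual file and read your data
- loading a json file does not run any code, so it cannot deliver malicious code the way a pickle file can
- speed depends on your data and Python version, so speed alone is rarely a good reason to choose json over pickle or vice versa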
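The type conversions in the list above are easy to see with `json.dumps`/`json.loads`, which work on strings in memory instead of files. A sketch, assuming a hypothetical dictionary with a tuple value, an integer key, and `None`:

```python
import json

# Hypothetical record with a tuple value, an int key, and None
original = {"Cheetah speed range (mph)": (68, 75), 1: "Peregrine falcon", "notes": None}

text = json.dumps(original)
print(text)
# {"Cheetah speed range (mph)": [68, 75], "1": "Peregrine falcon", "notes": null}

restored = json.loads(text)
print(restored == original)  # False: the tuple became a list and the key 1 became "1"
print(restored)
# {'Cheetah speed range (mph)': [68, 75], '1': 'Peregrine falcon', 'notes': None}
```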

### <br><br>Let's practice with pickle.

In [5]:
speed_dict

{'Peregrine falcon': '242 mph',
 'Golden eagle': '150–200 mph',
 'White-throated needletail swift': '105 mph',
 'Eurasian hobby': '100 mph',
 'Mexican free-tailed bat': '100 mph',
 'Frigatebird': '95 mph',
 'Rock dove (pigeon)': '92.5 mph',
 'Spur-winged goose': '88 mph',
 'Gyrfalcon': '80 mph',
 'Grey-headed albatross': '79 mph',
 'Cheetah': '68.0–75.0 mph',
 'Sailfish': '67.85 mph',
 "Anna's hummingbird": '61.06 mph',
 'Swordfish': '60 mph',
 'Pronghorn': '55.0 mph',
 'Springbok': '55 mph',
 'Quarter Horse': '55.0 mph',
 'Blue wildebeest': '50.0 mph',
 'Lion': '50.0 mph',
 'Blackbuck': '50 mph'}

<br>First, we import the pickle module.

In [6]:
import pickle

<br>To write our dictionary to a pickle file, we open the file with a with statement as before, but in **write binary mode** (`"wb"`), because pickle writes bytes rather than text.

Inside the with block, we **dump** the data into the pickle file.

In [7]:
with open("fastestAnimals.pkl", "wb") as f:
    pickle.dump(speed_dict, f)

<br>To load a pickle file, we use the same syntax, except we open the file in **read binary mode** and we **load** the data.

In [8]:
with open("fastestAnimals.pkl", "rb") as f:
    pkl_speed_dict = pickle.load(f)

In [9]:
pkl_speed_dict

{'Peregrine falcon': '242 mph',
 'Golden eagle': '150–200 mph',
 'White-throated needletail swift': '105 mph',
 'Eurasian hobby': '100 mph',
 'Mexican free-tailed bat': '100 mph',
 'Frigatebird': '95 mph',
 'Rock dove (pigeon)': '92.5 mph',
 'Spur-winged goose': '88 mph',
 'Gyrfalcon': '80 mph',
 'Grey-headed albatross': '79 mph',
 'Cheetah': '68.0–75.0 mph',
 'Sailfish': '67.85 mph',
 "Anna's hummingbird": '61.06 mph',
 'Swordfish': '60 mph',
 'Pronghorn': '55.0 mph',
 'Springbok': '55 mph',
 'Quarter Horse': '55.0 mph',
 'Blue wildebeest': '50.0 mph',
 'Lion': '50.0 mph',
 'Blackbuck': '50 mph'}
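`pickle.dump` can also be called more than once on the same open file; each later `pickle.load` call then returns the next object, in the order they were written. A minimal sketch with two small objects and a hypothetical temp-file path:

```python
import os
import pickle
import tempfile

speeds = {"Peregrine falcon": "242 mph", "Golden eagle": "150–200 mph"}
names = ["Peregrine falcon", "Golden eagle"]

path = os.path.join(tempfile.gettempdir(), "two_objects.pkl")  # hypothetical file name

with open(path, "wb") as f:
    pickle.dump(speeds, f)  # first object
    pickle.dump(names, f)   # second object, written right after the first

with open(path, "rb") as f:
    loaded_speeds = pickle.load(f)  # objects come back in the order they were written
    loaded_names = pickle.load(f)

print(loaded_speeds == speeds, loaded_names == names)  # True True
```

For more than a couple of objects, putting them all in one dictionary and pickling that is usually easier to keep track of.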

### <br><br>Pickle Exercise

In [10]:
speed_list = ["Peregrine falcon", 
              "Golden eagle", 
              "White-throated needletail swift", 
              "Eurasian hobby", 
              "Mexican free-tailed bat", 
              "Frigatebird", 
              "Rock dove (pigeon)", 
              "Spur-winged goose", 
              "Gyrfalcon", 
              "Grey-headed albatross", 
              "Cheetah", 
              "Sailfish", 
              "Anna's hummingbird", 
              "Swordfish", 
              "Pronghorn", 
              "Springbok", 
              "Quarter Horse", 
              "Blue wildebeest", 
              "Lion", 
              "Blackbuck"]

Run the code above to create `speed_list`. Then write code to save `speed_list` as a pickle file called `speedyAnimals.pkl`.

In [11]:
with open("speedyAnimals.pkl", "wb") as f:
    pickle.dump(speed_list, f)

Now write code to open the pickle file you just created as a list called `pickle_list`.

In [12]:
with open("speedyAnimals.pkl", "rb") as f:
    pickle_list = pickle.load(f)

In [13]:
print(pickle_list)

['Peregrine falcon', 'Golden eagle', 'White-throated needletail swift', 'Eurasian hobby', 'Mexican free-tailed bat', 'Frigatebird', 'Rock dove (pigeon)', 'Spur-winged goose', 'Gyrfalcon', 'Grey-headed albatross', 'Cheetah', 'Sailfish', "Anna's hummingbird", 'Swordfish', 'Pronghorn', 'Springbok', 'Quarter Horse', 'Blue wildebeest', 'Lion', 'Blackbuck']


<br><br>Notice that with both the dictionary and the list, pickle automatically maintained the type of Python object without us needing to specify what type of object it was. 

### <br><br>Let's practice with json.

In [14]:
import json

<br>With json, we open the file in regular text mode (`"w"` to write, `"r"` to read) instead of binary mode, because JSON is text. We still use dump and load.

In [15]:
with open("fastestAnimal.json", "w") as f:
    json.dump(speed_dict, f)

In [16]:
with open("fastestAnimal.json", "r") as f:
    json_speed_dict = json.load(f)

In [17]:
json_speed_dict

{'Peregrine falcon': '242 mph',
 'Golden eagle': '150–200 mph',
 'White-throated needletail swift': '105 mph',
 'Eurasian hobby': '100 mph',
 'Mexican free-tailed bat': '100 mph',
 'Frigatebird': '95 mph',
 'Rock dove (pigeon)': '92.5 mph',
 'Spur-winged goose': '88 mph',
 'Gyrfalcon': '80 mph',
 'Grey-headed albatross': '79 mph',
 'Cheetah': '68.0–75.0 mph',
 'Sailfish': '67.85 mph',
 "Anna's hummingbird": '61.06 mph',
 'Swordfish': '60 mph',
 'Pronghorn': '55.0 mph',
 'Springbok': '55 mph',
 'Quarter Horse': '55.0 mph',
 'Blue wildebeest': '50.0 mph',
 'Lion': '50.0 mph',
 'Blackbuck': '50 mph'}
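By default, `json.dump` writes everything on one line and escapes non-ASCII characters, so the en dash in `"150–200 mph"` is stored as `\u2013`. The file still loads correctly, but if you want to read it yourself, the standard `indent` and `ensure_ascii` parameters help. A sketch using `json.dumps` (which returns the text instead of writing it), a two-item sample, and a hypothetical temp-file path:

```python
import json
import os
import tempfile

sample = {"Golden eagle": "150–200 mph", "Lion": "50.0 mph"}

compact = json.dumps(sample)
print(compact)  # {"Golden eagle": "150\u2013200 mph", "Lion": "50.0 mph"}

# indent=2 puts each key on its own line; ensure_ascii=False keeps the en dash as-is
pretty = json.dumps(sample, indent=2, ensure_ascii=False)
print(pretty)

path = os.path.join(tempfile.gettempdir(), "fastest_pretty.json")  # hypothetical file name

# The file now contains a real en dash, so state the encoding explicitly
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f, indent=2, ensure_ascii=False)

with open(path, "r", encoding="utf-8") as f:
    reloaded = json.load(f)

print(reloaded == sample)  # True
```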

### <br><br>Json Exercise

In [18]:
falcon = "Peregrine falcons are among the world's most common birds of prey and live on all continents except Antarctica. They prefer wide-open spaces, and thrive near coasts where shorebirds are common, but they can be found everywhere from tundra to deserts. Peregrines are even known to live on bridges and skyscrapers in major cities."
print(falcon)

Peregrine falcons are among the world's most common birds of prey and live on all continents except Antarctica. They prefer wide-open spaces, and thrive near coasts where shorebirds are common, but they can be found everywhere from tundra to deserts. Peregrines are even known to live on bridges and skyscrapers in major cities.


Run the code above to create the string `falcon`. Then write code to save the string as a json file called `falcon_info.json`.

In [22]:
with open("falcon_info.json", "w") as f:
    json.dump(falcon, f)

Write code to open the file you just created and load its contents into a string called `json_string`.

In [23]:
with open("falcon_info.json", "r") as f:
    json_string = json.load(f)

In [24]:
print(json_string)

Peregrine falcons are among the world's most common birds of prey and live on all continents except Antarctica. They prefer wide-open spaces, and thrive near coasts where shorebirds are common, but they can be found everywhere from tundra to deserts. Peregrines are even known to live on bridges and skyscrapers in major cities.
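Because JSON is plain text, you can also read the file without the `json` module and see exactly what was stored; for a string, it is just the string wrapped in double quotes. A sketch with a shortened sentence and a hypothetical temp-file path:

```python
import json
import os
import tempfile

falcon_short = "Peregrine falcons live on all continents except Antarctica."
path = os.path.join(tempfile.gettempdir(), "falcon_demo.json")  # hypothetical file name

with open(path, "w") as f:
    json.dump(falcon_short, f)

# Read the raw file as ordinary text, without json.load
with open(path, "r") as f:
    raw = f.read()

print(raw)  # "Peregrine falcons live on all continents except Antarctica."
```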


### <br><br>BONUS: Saving pandas dataframes

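You can always save a pandas dataframe as a csv, a plain text format that is human readable. You can also save a dataframe as a pickle or json file. A pickle is typically faster to write and read than a csv and preserves the dataframe's index and column types exactly, while json, like csv, is plain text that other programs can read.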

In [25]:
import pandas as pd

In [26]:
df = pd.DataFrame.from_dict(speed_dict, orient="index", columns=["speed"])

In [27]:
df

|  | speed |
|---|---|
| Peregrine falcon | 242 mph |
| Golden eagle | 150–200 mph |
| White-throated needletail swift | 105 mph |
| Eurasian hobby | 100 mph |
| Mexican free-tailed bat | 100 mph |
| Frigatebird | 95 mph |
| Rock dove (pigeon) | 92.5 mph |
| Spur-winged goose | 88 mph |
| Gyrfalcon | 80 mph |
| Grey-headed albatross | 79 mph |


<br>Now that we have a dataframe, we can save it as a csv:

In [28]:
df.to_csv("speeds.csv")
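When you read a csv written this way back in, tell pandas which column holds the index; otherwise the index becomes an ordinary column named `Unnamed: 0`. A sketch using an in-memory `io.StringIO` buffer in place of a real file and a hypothetical two-row dataframe built the same way as `df`:

```python
import io

import pandas as pd

df_small = pd.DataFrame.from_dict(
    {"Lion": "50.0 mph", "Blackbuck": "50 mph"}, orient="index", columns=["speed"]
)

buffer = io.StringIO()  # stands in for a csv file on disk
df_small.to_csv(buffer)

buffer.seek(0)
without_index = pd.read_csv(buffer)
print(list(without_index.columns))  # ['Unnamed: 0', 'speed']

buffer.seek(0)
with_index = pd.read_csv(buffer, index_col=0)  # first column becomes the index
print(list(with_index.columns))  # ['speed']
print(list(with_index.index))    # ['Lion', 'Blackbuck']
```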

<br>Or we can save it as a pickle:

In [29]:
df.to_pickle("speeds.pkl")

<br>To confirm that it worked, let's load our pickled dataframe back in:

In [30]:
df2 = pd.read_pickle("speeds.pkl")

In [31]:
df2

|  | speed |
|---|---|
| Peregrine falcon | 242 mph |
| Golden eagle | 150–200 mph |
| White-throated needletail swift | 105 mph |
| Eurasian hobby | 100 mph |
| Mexican free-tailed bat | 100 mph |
| Frigatebird | 95 mph |
| Rock dove (pigeon) | 92.5 mph |
| Spur-winged goose | 88 mph |
| Gyrfalcon | 80 mph |
| Grey-headed albatross | 79 mph |


<br>You can also pickle a dataframe object the same way we did earlier, by opening a new file in write binary mode and dumping the dataframe.

<br>Now let's save our dataframe with json. You **cannot** save a dataframe as a json file using the open/dump method (`json.dump` raises a `TypeError` because a DataFrame is not JSON serializable), but pandas has its own `to_json` method:

In [32]:
df.to_json("speeds.json")

In [33]:
df3 = pd.read_json("speeds.json")

In [34]:
df3

|  | speed |
|---|---|
| Anna's hummingbird | 61.06 mph |
| Blackbuck | 50 mph |
| Blue wildebeest | 50.0 mph |
| Cheetah | 68.0–75.0 mph |
| Eurasian hobby | 100 mph |
| Frigatebird | 95 mph |
| Golden eagle | 150–200 mph |
| Grey-headed albatross | 79 mph |
| Gyrfalcon | 80 mph |
| Lion | 50.0 mph |
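Notice that the rows in `df3` came back in alphabetical order rather than in the order of `speed_dict`; whether this happens may depend on your pandas version. The `orient` parameter of `to_json` and `read_json` controls the JSON layout. A sketch assuming `orient="split"`, which stores the index as an explicit list and so should keep the original row order (the file path is hypothetical):

```python
import os
import tempfile

import pandas as pd

df_small = pd.DataFrame.from_dict(
    {"Peregrine falcon": "242 mph", "Golden eagle": "150–200 mph", "Blackbuck": "50 mph"},
    orient="index",
    columns=["speed"],
)

path = os.path.join(tempfile.gettempdir(), "speeds_split.json")  # hypothetical file name

# "split" writes the columns, the index, and the data as separate lists
df_small.to_json(path, orient="split")
df_split = pd.read_json(path, orient="split")  # must use the same orient to read it back

print(list(df_split.index))  # ['Peregrine falcon', 'Golden eagle', 'Blackbuck']
```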


<br>There are other formats for serializing pandas dataframes that are safer than pickle and often faster, including *feather* and *parquet* (`df.to_feather` and `df.to_parquet`, which need an extra package such as pyarrow).

<br>Also, keep in mind that json and pickle are for saving and loading your Python objects, not for compression. If a smaller file size is your goal, use a compression format such as *gzip* or *brotli*, with or without serialization.
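The standard-library `gzip` module can be combined with either format, because `gzip.open` returns a file object that compresses as it writes and decompresses as it reads. A minimal sketch with a small dictionary and hypothetical temp-file paths:

```python
import gzip
import json
import os
import pickle
import tempfile

speeds = {"Peregrine falcon": "242 mph", "Golden eagle": "150–200 mph", "Lion": "50.0 mph"}
tmp = tempfile.gettempdir()

# pickle + gzip: binary modes, just like a plain pickle file
pkl_path = os.path.join(tmp, "speeds_demo.pkl.gz")  # hypothetical file name
with gzip.open(pkl_path, "wb") as f:
    pickle.dump(speeds, f)
with gzip.open(pkl_path, "rb") as f:
    from_pickle = pickle.load(f)

# json + gzip: text modes ("wt"/"rt") because json writes str, not bytes
json_path = os.path.join(tmp, "speeds_demo.json.gz")  # hypothetical file name
with gzip.open(json_path, "wt", encoding="utf-8") as f:
    json.dump(speeds, f)
with gzip.open(json_path, "rt", encoding="utf-8") as f:
    from_json = json.load(f)

print(from_pickle == speeds, from_json == speeds)  # True True
```

For dataframes, `df.to_pickle` and `pd.read_pickle` infer the compression from the file extension by default, so a file name ending in `.gz` should produce a gzip-compressed pickle without opening the file yourself.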