# Why turn into a Pickle?

![](https://miro.medium.com/max/700/1*0YM3ADGfo0yo8FlZ9xT-0g.jpeg)

Image by [Photo Mix](https://pixabay.com/users/photomix-company-1546875/) from [Pixabay](https://pixabay.com/) \[I did not get a free-use Pickle Rick Image :( \]

## What is Pickle

Pickle (the `pickle` module in Python's standard library) serializes a Python object into a binary format and deserializes it back into a Python object.
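
To make the round trip concrete, here is a minimal sketch that serializes an object to bytes in memory and restores it. The `record` dictionary is made up for illustration; `pickle.dumps` and `pickle.loads` are the in-memory counterparts of the file-based `pickle.dump` and `pickle.load`.

```python
import pickle

# A made-up Python object; any picklable object would work the same way.
record = {"name": "sample", "scores": [0.91, 0.87], "shape": (3, 2)}

# Serialize: Python object -> bytes
payload = pickle.dumps(record)

# Deserialize: bytes -> a new, equal Python object
restored = pickle.loads(payload)

print(type(payload).__name__)  # bytes
print(restored == record)      # True
print(restored is record)      # False
```

The restored object is equal to the original but is a separate copy, which is what you would expect after saving to disk and loading in a fresh session.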

## Two important use cases for Pickle

> **_First Case:_**

You are working on a machine learning problem in everyone's favorite tool: a _Jupyter notebook_. You have analyzed the data and identified the principal features that will help your model. You perform feature engineering, and you know the result is the final data you will pass to the machine learning model. But feature engineering takes a lot of time, and you do not want to lose the feature-engineered data when you shut down the notebook.

> Here’s where Pickle comes in.

You just pass the feature engineered data to Pickle and save it in a binary format. Then load the pickled data back whenever you are ready to perform modeling.

**Dump the data (save it)**
```
# Code example
# Import the module
import pickle

# Do some feature engineering
feature_engineered_data = do_feature_engineering(data)

# Dump it (save it in binary format)
with open('fe_data.pickle', 'wb') as fe_data_file:
    pickle.dump(feature_engineered_data, fe_data_file)
```
**Load the data back**
```
# Code example
# Import the module
import pickle

# Load the data - no need to do feature engineering again
with open('fe_data.pickle', 'rb') as fe_data_file:
    feature_engineered_data = pickle.load(fe_data_file)

# Continue with your modeling
```
So what exactly is the advantage?

Feature engineering can be a heavy process and you do not want to redo it in case of a notebook shutdown. Hence, it is beneficial to store it in a reusable format.

Let’s say you manage separate notebooks for EDA, feature engineering, and modeling. Using Pickle, you can dump the data from one notebook and load it into a different notebook.
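
One way to avoid redoing the work after a restart is a small "load it if it exists, otherwise compute and save it" helper. The sketch below is only an illustration: `load_or_compute` and `slow_feature_engineering` are hypothetical names, the data is made up, and a temporary directory stands in for wherever your notebooks keep their files.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(path, compute):
    """Return the pickled object at `path`, computing and saving it first if needed."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute()
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result


calls = []  # records how many times the slow step actually ran


def slow_feature_engineering():
    # Stand-in for an expensive step; the data here is made up.
    calls.append(1)
    return [{"x": i, "x_squared": i * i} for i in range(5)]


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "fe_data.pickle"
    first = load_or_compute(cache_file, slow_feature_engineering)   # computes and saves
    second = load_or_compute(cache_file, slow_feature_engineering)  # loads from disk

print(len(calls))       # 1
print(first == second)  # True
```

The second call never runs the slow step. A different notebook pointing at the same file path would get the same shortcut.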

> **_Second Case:_**

A more popular case is to pickle the _machine learning model_ object.

You are done with feature engineering and modeling, and you have attained pretty good accuracy, hurrah! Your job here is done, so you turn off the laptop and get a good night’s sleep.

The next morning you get another test set to evaluate the model on, but because you shut down your laptop, you have to train the model again, which takes 6 hours!

> Here’s where Pickle comes in again

Save the trained model and load the model back whenever you have new data to test it on.

**Dump the model**
```
# Import the module
import pickle

# Train the model (fit on the training features and training labels)
model.fit(X_train, y_train)

# Dump the model
with open('fitted_model.pickle', 'wb') as model_file:
    pickle.dump(model, model_file)
```
**Load the model**
```
# Import the module
import pickle

# Load the model - no need to TRAIN it again (6 hours saved)
with open('fitted_model.pickle', 'rb') as model_file:
    model = pickle.load(model_file)

# Predict with the test set
prediction = model.predict(X_test)
```
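The dump and load steps above can be tried end to end without any machine learning library. In this sketch, `statistics.NormalDist` from the standard library is only a stand-in for a trained model (an assumption made for illustration, since this article does not name a specific library), and the training data and test sets are made up.

```python
import pickle
import tempfile
from pathlib import Path
from statistics import NormalDist

# Stand-in for a trained model: a normal distribution fitted to made-up data.
training_data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
model = NormalDist.from_samples(training_data)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "fitted_model.pickle"

    # Dump the fitted model once
    with model_path.open("wb") as model_file:
        pickle.dump(model, model_file)

    # Later: load it back instead of fitting again
    with model_path.open("rb") as model_file:
        loaded_model = pickle.load(model_file)

# Reuse the same loaded model on several (made-up) test sets
test_sets = {"monday": [3.0, 5.0], "tuesday": [1.0, 8.0, 10.0]}
for name, values in test_sets.items():
    scores = [round(loaded_model.cdf(v), 3) for v in values]
    print(name, scores)

print(loaded_model == model)  # True
```

The loaded object carries the same fitted parameters as the original, so it can score any number of new datasets without being fitted again.
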
> **You can use Pickle to save the final data and train multiple models on it, or you can save the model and test it on multiple datasets without training it again. How great is that?**

What do you think of these use cases, and what other use cases can you think of? Let me know in the comments!

Thanks to Elliot Gunn and Anne Bonner