<h2 style="color:green" align="center">Machine Learning With Python: Save And Load Trained Model</h2>

In [1]:
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression

In [2]:
df = pd.read_csv("homeprices.csv")
df.head()

   area   price
0  2600  550000
1  3000  565000
2  3200  610000
3  3600  680000
4  4000  725000


In [3]:
model = LinearRegression()
model.fit(df[['area']], df.price)

LinearRegression()

In [4]:
model.coef_

array([135.78767123])

In [5]:
model.intercept_

180616.43835616432

In [6]:
model.predict([[5000]])

array([859554.79452055])

<h3 style='color:purple'>Save Model To a File Using Python Pickle</h3>

In [7]:
import pickle

In [8]:
with open('model_pickle','wb') as file:
    pickle.dump(model,file)

<h4 style='color:purple'>Load Saved Model</h4>

In [9]:
with open('model_pickle','rb') as file:
    mp = pickle.load(file)

In [10]:
mp.coef_

array([135.78767123])

In [11]:
mp.intercept_

180616.43835616432

In [12]:
mp.predict([[5000]])

array([859554.79452055])
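
As a quick sanity check on the round trip above, the following sketch refits the same model, pickles it to a temporary file, and confirms that the restored model predicts the same values. The five rows are typed in by hand from the `df.head()` output, on the assumption that they make up the whole of `homeprices.csv`. The temporary path and the variable names are illustrative only.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Rows copied from the df.head() output above (assumed to be the full dataset).
df = pd.DataFrame({
    "area": [2600, 3000, 3200, 3600, 4000],
    "price": [550000, 565000, 610000, 680000, 725000],
})

model = LinearRegression()
model.fit(df[["area"]], df["price"])

# Query rows use the same column name the model was trained with.
queries = pd.DataFrame({"area": [3300, 5000]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_pickle")  # hypothetical location

    with open(path, "wb") as file:
        pickle.dump(model, file)

    with open(path, "rb") as file:
        restored = pickle.load(file)

original_pred = model.predict(queries)
restored_pred = restored.predict(queries)

# The restored model should carry the same fitted parameters,
# so its predictions should match the original's.
same_predictions = np.allclose(original_pred, restored_pred)
print(same_predictions)
```

Comparing predictions rather than only `coef_` and `intercept_` exercises the restored object the way it would be used in practice.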

<h3 style='color:purple'>Save Trained Model Using joblib</h3>

In [13]:
import joblib

In [14]:
joblib.dump(model, 'model_joblib')

['model_joblib']

<h4 style='color:purple'>Load Saved Model</h4>

In [15]:
mj = joblib.load('model_joblib')

In [16]:
mj.coef_

array([135.78767123])

In [17]:
mj.intercept_

180616.43835616432

In [18]:
mj.predict([[5000]])

array([859554.79452055])
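
`joblib.dump` also takes an optional `compress` argument. The sketch below is a minimal illustration using a made-up, highly repetitive NumPy array rather than the model above, which is too small for compression to matter. It writes the array with and without compression and compares the file sizes. The file names and the compression level of 3 are arbitrary choices.

```python
import os
import tempfile

import joblib
import numpy as np

# Illustrative data: one million zeros compress extremely well.
big_array = np.zeros(1_000_000, dtype=np.float64)

with tempfile.TemporaryDirectory() as tmp_dir:
    raw_path = os.path.join(tmp_dir, "array_raw.joblib")
    zipped_path = os.path.join(tmp_dir, "array_zipped.joblib")

    joblib.dump(big_array, raw_path)                 # no compression (default)
    joblib.dump(big_array, zipped_path, compress=3)  # integer level selects zlib

    raw_size = os.path.getsize(raw_path)
    zipped_size = os.path.getsize(zipped_path)

    # Loading is the same call either way; joblib detects the compression.
    reloaded = joblib.load(zipped_path)

print(f"uncompressed: {raw_size} bytes, compressed: {zipped_size} bytes")
print(np.array_equal(big_array, reloaded))
```

Real model weights are far less repetitive than an array of zeros, so the saving in practice will usually be smaller.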

## Pickle vs. Joblib
* **joblib is usually significantly faster on large NumPy arrays** because it has special handling for the array buffers of the NumPy data structure. It can also compress that data on the fly while pickling, using zlib or lz4.
* **joblib also makes it possible to memory-map** the data buffer of an uncompressed joblib-pickled NumPy array when loading it, which makes it possible to share memory between processes.
* **If you don't pickle large NumPy arrays, regular pickle can be significantly faster, especially on large collections of small Python objects** (e.g. a large dict of str objects), because the pickle module of the standard library is implemented in C while joblib is pure Python.
* Since PEP 574 (pickle protocol 5) was added in Python 3.8, it is much more efficient (memory-wise and CPU-wise) to pickle large NumPy arrays using the standard library. "Large arrays" in this context means 4 GB or more.
* But **joblib can still be useful on Python 3.8 and later to load objects that have nested NumPy arrays** in memory-mapped mode with `mmap_mode="r"`.
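
The memory-mapping point in the last bullet can be sketched as follows. This assumes a hypothetical dict holding two NumPy arrays, saved without compression to a temporary file. Loading it with `mmap_mode="r"` should then give read-only `numpy.memmap` views backed by the file rather than full copies in memory.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical object with nested NumPy arrays.
payload = {
    "weights": np.arange(1_000_000, dtype=np.float64),
    "bias": np.ones(10, dtype=np.float64),
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "payload.joblib")

    # Memory mapping only applies to uncompressed files, so no compress argument.
    joblib.dump(payload, path)

    loaded = joblib.load(path, mmap_mode="r")

    weights_is_memmap = isinstance(loaded["weights"], np.memmap)
    first_values = loaded["weights"][:3].tolist()  # touches only this slice

    # Release the memory maps before the temporary directory is removed.
    del loaded

print(weights_is_memmap, first_values)
```

Because the arrays are views onto the file, several processes that load the same path this way can share the underlying pages instead of each holding a private copy.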