# Save and Load Machine Learning Models
1. The importance of serializing models for reuse.
2. How to use pickle to serialize and deserialize machine learning models.
3. How to use Joblib to serialize and deserialize machine learning models.

Pickle is the standard way of serializing objects in Python. You can use the `pickle` module
to serialize a trained machine learning model and save the serialized bytes to a file. Later you
can load this file to deserialize the model and use it to make new predictions.
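The round trip can be seen in isolation, without a file or a dataset. The following is a minimal sketch using only the standard library. `ThresholdModel` is a hypothetical toy classifier used for illustration, not part of scikit-learn. `pickle.dumps` turns the fitted object into bytes, and `pickle.loads` rebuilds an equivalent object with the same learned state.

```python
import pickle


class ThresholdModel:
    """Hypothetical toy classifier: predicts 1 when the input exceeds a learned threshold."""

    def __init__(self):
        self.threshold = None

    def fit(self, xs):
        # "Learn" the threshold as the mean of the training inputs
        self.threshold = sum(xs) / len(xs)
        return self

    def predict(self, xs):
        return [int(x > self.threshold) for x in xs]


model = ThresholdModel().fit([1.0, 2.0, 3.0, 4.0])

# Serialize the fitted model to bytes, then rebuild it from those bytes
payload = pickle.dumps(model)
restored = pickle.loads(payload)

print(restored.threshold)            # 2.5
print(restored.predict([2.0, 3.0]))  # [0, 1]
```

Pickle stores a reference to the class (its module and name) rather than the class's code. The class must therefore be importable in whatever process later loads the file. This is also why a saved scikit-learn model needs scikit-learn installed to be loaded.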

In [1]:
# Pima Indians Diabetes Dataset
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from pickle import dump
from pickle import load

# Load the dataset; the CSV file has no header row, so column names are supplied
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
df = pd.read_csv('pima-indians-diabetes.data', names=names)

# Separate the data into input features (X) and the output label (Y)
X = df.drop('class', axis='columns')
Y = df['class']

In [2]:
# Hold out 33% of the rows for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.33, random_state=7)
# Fit the model on the remaining 67% (the training split)
model = LogisticRegression(solver='liblinear')
model.fit(X_train, Y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [3]:
# Serialize the fitted model to disk; the with-block closes (and flushes) the file
with open('finalized_model.sav', 'wb') as f:
    dump(model, f)

In [4]:
# some time later...

# Load the model from disk
with open('finalized_model.sav', 'rb') as f:
    loaded_model = load(f)

In [5]:
# score() returns the mean accuracy on the held-out test set
result = loaded_model.score(X_test, Y_test)
print(result)

0.755905511811
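A matching score suggests that the model was restored intact. A stricter check compares the predictions and learned coefficients directly. The following is a minimal sketch of that check, assuming scikit-learn and NumPy are installed. `make_classification` generates hypothetical synthetic data that stands in for the Pima file, so the snippet runs on its own. The model is round-tripped through bytes instead of a file, but the check is the same either way.

```python
import pickle

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the Pima data (hypothetical; any numeric X, y would do)
X, y = make_classification(n_samples=200, n_features=8, random_state=7)

model = LogisticRegression(solver='liblinear').fit(X, y)

# Round-trip through bytes instead of a file
restored = pickle.loads(pickle.dumps(model))

same_predictions = np.array_equal(model.predict(X), restored.predict(X))
same_coefficients = np.array_equal(model.coef_, restored.coef_)
print(same_predictions, same_coefficients)  # True True
```

Pickle preserves floating-point arrays exactly, so both comparisons hold. scikit-learn's documentation recommends loading a pickled model with the same scikit-learn version that saved it.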


# Finalize Your Model with Joblib
The Joblib library is part of the SciPy ecosystem and provides utilities for pipelining Python
jobs.

It provides utilities for efficiently saving and loading Python objects that make use of NumPy
data structures.

This can be useful for machine learning algorithms that have a lot of parameters or that store
the entire dataset (e.g. k-Nearest Neighbors).
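Joblib's efficiency matters most when an object holds large NumPy arrays. The following is a minimal sketch, assuming `joblib` and NumPy are installed. The `data` dictionary is a hypothetical stand-in for a model that keeps its training set, such as k-NN. `compress=3` asks joblib to compress with zlib at level 3, which trades some CPU time for a smaller file.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical payload: a large array, like the stored training set of a k-NN model
data = {"X": np.zeros((1000, 100)), "y": np.arange(1000)}

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "raw.joblib")
    packed_path = os.path.join(tmp, "packed.joblib")

    joblib.dump(data, raw_path)                  # default: no compression
    joblib.dump(data, packed_path, compress=3)   # zlib, level 3

    raw_size = os.path.getsize(raw_path)
    packed_size = os.path.getsize(packed_path)
    restored = joblib.load(packed_path)

print(raw_size > packed_size)  # True
print(np.array_equal(restored["X"], data["X"])
      and np.array_equal(restored["y"], data["y"]))  # True
```

`joblib.dump` takes a filename directly, with no need to open a file handle, and returns the list of files it wrote. This is why the notebook cell below echoes `['joblibsaving.sav']`.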



In [8]:
# sklearn.externals.joblib was deprecated and then removed from scikit-learn;
# import from the standalone joblib package instead
from joblib import dump as dp
from joblib import load as ld

In [7]:
# Fit a second model on the same training split (67% of the rows)
model1 = LogisticRegression(solver='liblinear')
model1.fit(X_train, Y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='liblinear', tol=0.0001,
          verbose=0, warm_start=False)

In [9]:
# dump() accepts a filename directly and returns the list of files it wrote
dp(model1, 'joblibsaving.sav')

['joblibsaving.sav']

In [10]:
job_model = ld('joblibsaving.sav')

In [11]:
job_model.score(X_test, Y_test)

0.75590551181102361