## Chapter 17 Save and Load Machine Learning Models

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

#### 1. Finalize Model with pickle

Pickle is the standard way of serializing objects in Python. You can use the `pickle` module to serialize your trained machine learning models and save the serialized format to a file. Later you can load this file to deserialize your model and use it to make new predictions.
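The same save/load round trip works for any picklable Python object. The minimal sketch below assumes nothing beyond the standard library: a plain dict of hypothetical linear-model parameters stands in for a fitted scikit-learn model. It is written to a temporary file with `pickle.dump`, read back with `pickle.load`, and then used to make a prediction.

```python
import os
import pickle
import tempfile

# Hypothetical parameters standing in for a fitted model, so the sketch
# runs with the standard library alone (no scikit-learn needed).
params = {"coef": [0.5, -1.25], "intercept": 0.75, "classes": (0, 1)}

path = os.path.join(tempfile.mkdtemp(), "toy_model.pkl")

# Serialize: pickle produces bytes, so the file must be opened in binary mode.
with open(path, "wb") as f:
    pickle.dump(params, f)

# Deserialize later (possibly in another process) from the same file.
with open(path, "rb") as f:
    restored = pickle.load(f)


def predict(p, x):
    """Hypothetical linear decision rule: class 1 if w.x + b > 0, else class 0."""
    score = sum(w * xi for w, xi in zip(p["coef"], x)) + p["intercept"]
    return p["classes"][1] if score > 0 else p["classes"][0]


print(restored == params)              # True: an equal object was rebuilt
print(restored is params)              # False: it is a new object
print(predict(restored, [1.0, 2.0]))   # 0.5 - 2.5 + 0.75 = -1.25 -> 0
print(predict(restored, [2.0, 0.0]))   # 1.0 + 0.0 + 0.75 = 1.75 -> 1
```

Opening the file in a `with` block ensures it is closed (and fully flushed) before it is read back.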

In [2]:
# load data
filename = 'data/pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
df = pd.read_csv(filename, names=names)
print(df.shape)
df.head()

(768, 9)


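|   | preg | plas | pres | skin | test | mass | pedi | age | class |
|---|------|------|------|------|------|------|------|-----|-------|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |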


In [3]:
# Save model using pickle
from pickle import dump, load

array = df.values
X = array[:, :-1]
Y = array[:, -1]
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.33, random_state=7)
# fit the model
model = LogisticRegression(max_iter=200)
model.fit(X_train, Y_train)
# save the model to disk
filename = 'finalized_model.sav'
with open(filename, 'wb') as f:  # pickle writes bytes, so open in binary mode
    dump(model, f)

In [4]:
# load the model from disk
with open(filename, 'rb') as f:  # read back in binary mode
    loaded_model = load(f)
result = loaded_model.score(X_test, Y_test)
print(result)

0.7874015748031497


#### 2. Finalize Model with Joblib

The Joblib library is part of the SciPy ecosystem and provides utilities for pipelining Python jobs. It also provides utilities for efficiently saving and loading Python objects that make use of NumPy data structures. This can be useful for machine learning algorithms that require a lot of parameters or that store the entire training dataset (e.g. k-Nearest Neighbors).
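To illustrate the NumPy-heavy case, the sketch below saves a hypothetical dict holding a large training array (the kind of state a k-Nearest Neighbors model keeps) with `joblib.dump`, then restores it with `joblib.load`. It assumes `numpy` and `joblib` are installed; the array shape, the `k` value and the optional `compress=3` setting are illustrative choices, not values from this chapter.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load

# Hypothetical model state: a large NumPy array plus a small hyperparameter,
# similar to what k-Nearest Neighbors stores (the whole training set).
rng = np.random.default_rng(0)
state = {"X_train": rng.normal(size=(10_000, 8)), "k": 5}

path = os.path.join(tempfile.mkdtemp(), "knn_state.joblib")

# joblib.dump takes a file name directly (no open() needed) and returns the
# list of file names it wrote. compress=3 is optional zlib compression.
written = dump(state, path, compress=3)
restored = load(path)

print(written == [path])                                     # True
print(np.array_equal(restored["X_train"], state["X_train"]))  # True
print(restored["k"])                                          # 5
```

Unlike the pickle example, no file object is opened by hand: `dump` and `load` accept the path directly, which is why the cells below pass `filename` straight through.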

In [5]:
# Save model using joblib
from joblib import dump, load

model = LogisticRegression(max_iter=200)
model.fit(X_train, Y_train)
# save the model to disk
filename = 'finalized_model.sav'
dump(model, filename)

['finalized_model.sav']

In [6]:
# load the model from disk
loaded_model = load(filename)
result = loaded_model.score(X_test, Y_test)
print(result)

0.7874015748031497
