# Save and Load Machine Learning Models in Python with `scikit-learn`

## Save Your Model with `pickle`

In [14]:
import joblib as jl
import pandas as pd
import pickle as pk
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

In [15]:
# load the csv
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv'
names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
dataframe = pd.read_csv(url, names=names)

In [16]:
# split target data from main df
array = dataframe.values
X = array[:, 0:8]
Y = array[:, 8]
test_size = 0.33
seed = 7

In [17]:
# split data into test and train
Xtrain, Xtest, Ytrain, Ytest = train_test_split(X, Y, test_size=test_size, random_state=seed)

# fit the model on training set
model = LogisticRegression()
model.fit(Xtrain, Ytrain)

# save the model to disk (the with-block closes the file handle)
filename = 'finalized_model.sav'
with open(filename, 'wb') as f:
    pk.dump(model, f)

# ======= main work here ======= #

# load the model from disk (only unpickle files you trust)
with open(filename, 'rb') as f:
    loaded_model = pk.load(f)
result = loaded_model.score(Xtest, Ytest)
print(result)

0.7874015748031497


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
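The warning above is scikit-learn reporting that the solver hit its iteration limit before converging. A minimal sketch of the two remedies it suggests follows: scaling the features and raising `max_iter`. The synthetic dataset from `make_classification`, the value `max_iter=1000`, and the name `scaled_model` are assumptions made so the example runs offline; they are not part of the original notebook.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 8 features, like the diabetes CSV
X, y = make_classification(n_samples=300, n_features=8, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=7
)

# Scale the features, then fit logistic regression with a higher iteration cap
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Turn ConvergenceWarning into an error so a failure to converge is not silent
with warnings.catch_warnings():
    warnings.simplefilter("error", category=ConvergenceWarning)
    scaled_model.fit(X_train, y_train)

accuracy = scaled_model.score(X_test, y_test)
print(accuracy)
```

Because the scaler and the classifier live in one pipeline object, saving `scaled_model` keeps the preprocessing step together with the model.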


## Save Your Model with `joblib`

In [19]:
# save the model to disk
filename = 'finalized_model.sav'
jl.dump(model, filename)

# ======= main work here ======= #

# load the model from disk
loaded_model = jl.load(filename)
result = loaded_model.score(Xtest, Ytest)
print(result)

0.7874015748031497


## Tips for Saving Your Model

- `Python version`. Take note of the Python version. You will almost certainly need the same major (and possibly minor) version of Python that was used to serialize the model when you later load and deserialize it.

- `library version`. The versions of all major libraries used in your machine learning project almost certainly need to be the same when deserializing a saved model. This applies to every library the model depends on, not only `NumPy` and `scikit-learn`.

- `manual serialization`. You might like to manually output the parameters of your learned model so that you can use them directly in `scikit-learn` or another platform in the future. Often, the computation needed to make predictions is much simpler than the algorithm used to learn the parameters, and it may be easy to implement in custom code that you control.
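The tips above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the notebook's method: the numbers in `coef` and `intercept` are hypothetical placeholders for what you would copy from a fitted binary `LogisticRegression` (for example `model.coef_[0].tolist()` and `float(model.intercept_[0])`), and the helper names `predict_proba` and `predict` are made up for this example.

```python
import json
import math
import platform

# Hypothetical parameters (assumption): placeholders for model.coef_[0] and
# model.intercept_[0] of a fitted binary LogisticRegression
params = {
    "python_version": platform.python_version(),  # record the environment
    "coef": [0.5, -0.25],
    "intercept": 0.1,
}

# Manual serialization: plain JSON text that any platform can read back
serialized = json.dumps(params)
restored = json.loads(serialized)

# Flag a Python version mismatch between saving and loading
if restored["python_version"] != platform.python_version():
    print("Warning: parameters were saved under a different Python version")


def predict_proba(features, coef, intercept):
    """Probability of class 1: sigmoid of the linear combination."""
    z = intercept + sum(w * x for w, x in zip(coef, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict(features, coef, intercept, threshold=0.5):
    """Class label: 1 if the probability exceeds the threshold, else 0."""
    return 1 if predict_proba(features, coef, intercept) > threshold else 0


sample = [2.0, 1.0]  # hypothetical input with one value per coefficient
probability = predict_proba(sample, restored["coef"], restored["intercept"])
label = predict(sample, restored["coef"], restored["intercept"])
print(probability, label)
```

Storing the parameters as JSON avoids the version coupling that `pickle` and `joblib` files carry, at the cost of reimplementing the prediction step yourself.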