# Scikit-Learn course 7

## VI. Saving and loading trained machine learning models

## 0. Build a machine learning model

In [27]:
from sklearn.metrics import (accuracy_score,
                            precision_score, 
                            recall_score,
                            f1_score)

In [28]:
def evaluate_preds(y_true, y_preds):
    """
    Evaluates predicted labels (y_preds) against true labels (y_true) and
    returns accuracy, precision, recall and F1 as a dictionary.
    """
    accuracy = accuracy_score(y_true, y_preds)
    precision = precision_score(y_true, y_preds)
    recall = recall_score(y_true, y_preds)
    f1 = f1_score(y_true, y_preds)
    metric_dict = {"accuracy": round(accuracy, 2),
                   "precision": round(precision, 2), 
                   "recall": round(recall, 2),
                   "f1": round(f1, 2)}
    print(f"Acc: {accuracy * 100:.2f}%")
    print(f"Precision: {precision:.2f}")
    print(f"Recall: {recall:.2f}")
    print(f"F1 score: {f1:.2f}")

    return metric_dict
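
To sanity-check what these four metrics report, here is a minimal sketch on a tiny set of labels that can be verified by hand. The `toy_y_true` and `toy_y_preds` lists are hypothetical values made up for illustration; they are not taken from the heart disease data. With 3 true positives, 1 false positive, 1 false negative and 1 true negative, accuracy is 4/6 and precision, recall and F1 are all 3/4.

```python
from sklearn.metrics import (accuracy_score,
                             precision_score,
                             recall_score,
                             f1_score)

# Hypothetical toy labels (assumption: made up for illustration only)
toy_y_true = [1, 0, 1, 1, 0, 1]
toy_y_preds = [1, 0, 0, 1, 1, 1]

# Same four metrics and the same rounding as evaluate_preds(),
# converted to plain floats so the dictionary prints cleanly
toy_metrics = {
    "accuracy": round(float(accuracy_score(toy_y_true, toy_y_preds)), 2),
    "precision": round(float(precision_score(toy_y_true, toy_y_preds)), 2),
    "recall": round(float(recall_score(toy_y_true, toy_y_preds)), 2),
    "f1": round(float(f1_score(toy_y_true, toy_y_preds)), 2),
}
print(toy_metrics)
# {'accuracy': 0.67, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```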

In [29]:
import pandas as pd 
import numpy as np

np.random.seed(42)

heart_disease = pd.read_csv("data/heart-disease.csv")
X = heart_disease.drop("target", axis=1)
y = heart_disease.target

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()
clf.fit(X_train, y_train)
y_preds = clf.predict(X_test)
evaluate_preds(y_test, y_preds)

Acc: 82.89%
Precision: 0.83
Recall: 0.85
F1 score: 0.84


{'accuracy': 0.83, 'precision': 0.83, 'recall': 0.85, 'f1': 0.84}

## 1. Saving and loading a model with pickle
https://docs.python.org/3/library/pickle.html

We'll use pickle's `dump()` function, passing it our model and a file object created with `open()`. The `open()` call takes the filename we want to save our model under and the mode string `"wb"`, which stands for "write binary". Pickle produces bytes rather than text, so the file has to be opened in binary mode.

In [30]:
import pickle

# Save an existing model to file (the with block closes the file afterwards)
with open("pkl_random_forest_model_1.pkl", "wb") as f:
    pickle.dump(clf, f)

Once it's saved, we can load it back using pickle's `load()` function, passing it a file object opened with the same filename and the mode string `"rb"`, which stands for "read binary".

In [31]:
# Load a saved model (the with block closes the file afterwards)
with open("pkl_random_forest_model_1.pkl", "rb") as f:
    loaded_pickle_model = pickle.load(f)

In [32]:
# Make predictions and evaluate the loaded model
np.random.seed(42)
pickle_y_preds = loaded_pickle_model.predict(X_test)
evaluate_preds(y_test, pickle_y_preds)

Acc: 82.89%
Precision: 0.83
Recall: 0.85
F1 score: 0.84


{'accuracy': 0.83, 'precision': 0.83, 'recall': 0.85, 'f1': 0.84}
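
The matching scores suggest the loaded model behaves like the original. A more direct check is to compare the predictions themselves. The sketch below does this on synthetic data, so it runs without the heart disease CSV. The names `X_demo`, `y_demo`, `demo_clf` and `demo_model.pkl` are assumptions made for this example, and the file is written to a temporary directory so nothing is left on disk.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption: not the heart disease dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)
demo_clf = RandomForestClassifier(n_estimators=20, random_state=42)
demo_clf.fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "demo_model.pkl"

    # Write the fitted model as bytes
    with open(model_path, "wb") as f:
        pickle.dump(demo_clf, f)

    # Read the bytes back into a new model object
    with open(model_path, "rb") as f:
        restored_clf = pickle.load(f)

# The restored model should predict exactly what the original predicts
pickle_round_trip_ok = np.array_equal(demo_clf.predict(X_demo),
                                      restored_clf.predict(X_demo))
print(pickle_round_trip_ok)
```

Comparing predictions element by element is stricter than comparing rounded metrics, since two different sets of predictions can still round to the same scores.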

## 2. Saving and loading a model with joblib
https://joblib.readthedocs.io/en/latest/persistence.html

The other way to save and load models is with joblib, which works in much the same way as pickle.

To save a model, we can use joblib's `dump()` function, passing it the model (`clf`) and the desired filename.

In [33]:
from joblib import dump, load

# Save a model to file
dump(clf, filename="joblib_random_forest_model_1.joblib") 

['joblib_random_forest_model_1.joblib']

Once you've saved a model using `dump()`, you can load it back using `load()`, passing it the filename of the model.

In [34]:
# Import a saved joblib model
loaded_joblib_model = load(filename="joblib_random_forest_model_1.joblib")

In [35]:
# Make and evaluate joblib predictions 
joblib_y_preds = loaded_joblib_model.predict(X_test)
evaluate_preds(y_test, joblib_y_preds)

Acc: 82.89%
Precision: 0.83
Recall: 0.85
F1 score: 0.84


{'accuracy': 0.83, 'precision': 0.83, 'recall': 0.85, 'f1': 0.84}
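
joblib's `dump()` also accepts an optional `compress` argument, which can help when models get large. The sketch below saves the same model with and without compression and checks that the compressed copy still loads into a model with identical predictions. It uses synthetic data; the names `X_demo`, `y_demo`, `demo_clf` and both filenames are assumptions made for this example, and `compress=3` is just one reasonable setting rather than a recommendation from the course.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption: not the heart disease dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=42)
demo_clf = RandomForestClassifier(n_estimators=20, random_state=42)
demo_clf.fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = Path(tmp_dir) / "demo_model.joblib"
    compressed_path = Path(tmp_dir) / "demo_model_compressed.joblib"

    # Save once without compression and once with compression level 3
    joblib.dump(demo_clf, str(plain_path))
    joblib.dump(demo_clf, str(compressed_path), compress=3)

    plain_size = plain_path.stat().st_size
    compressed_size = compressed_path.stat().st_size

    # load() detects the compression on its own
    restored_clf = joblib.load(str(compressed_path))

print(f"Uncompressed: {plain_size} bytes, compressed: {compressed_size} bytes")

# The restored model should predict exactly what the original predicts
joblib_round_trip_ok = np.array_equal(demo_clf.predict(X_demo),
                                      restored_clf.predict(X_demo))
print(joblib_round_trip_ok)
```

Compression trades some save and load time for a smaller file, so whether it is worth it depends on the size of the model.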

Which one should you use, `pickle` or `joblib`?

Scikit-Learn's documentation suggests that `joblib` may be the better choice, as it's more efficient with objects that carry large NumPy arrays internally (which is often the case for trained/fitted Scikit-Learn models).

Either way, they both function fairly similarly, so deciding which one to use shouldn't cause too much of an issue.