# 1. Description
### Usage of Joblib and Pickle
#### Saving the trained Models:
* After training a machine learning model, you can save it to disk. This lets you load the model later and make predictions without retraining.
* Training a **machine learning model** can be time-consuming when the training dataset is large. In that case it makes sense to **train the model once and save it to a file**, so that when making **predictions** later you only load the model from the file and **don't need to train** it again.
* Python's **pickle** module and the **joblib** library can both be used for this purpose.
* **Joblib** is generally more efficient with large **NumPy arrays**, so it is preferred when the fitted model holds many **NumPy** objects.


# 2. Import libraries

In [1]:
# import the required libraries
import pandas as pd
import numpy as np
from sklearn import linear_model

# 3. Load dataset

In [2]:
# load the raw dataset from the csv file
url = "https://raw.githubusercontent.com/akdubey2k/ML/main/ML_5_Save_Model_Using_Joblib_And_Pickle/ML_5_homeprices_Save_Model_Using_Joblib_And_Pickle.csv"
df = pd.read_csv(url)
df.head()

|   | area | price  |
|---|------|--------|
| 0 | 2600 | 550000 |
| 1 | 3000 | 565000 |
| 2 | 3200 | 610000 |
| 3 | 3600 | 680000 |
| 4 | 4000 | 725000 |


# 4. Data preprocessing

In [3]:
# independent variable
X = df.drop('price',axis='columns')
X.head()

|   | area |
|---|------|
| 0 | 2600 |
| 1 | 3000 |
| 2 | 3200 |
| 3 | 3600 |
| 4 | 4000 |


In [4]:
# dependent variable
y = df.price
y.head()

0    550000
1    565000
2    610000
3    680000
4    725000
Name: price, dtype: int64

# 5. Create and train the model

In [5]:
lr_model = linear_model.LinearRegression()
# independent variable, dependent variable
# lr_model.fit(df[['area']], df.price)
lr_model.fit(X,y)

# 6. Validate the model

### Predict the price of a 5000 sq. ft. home with the trained model

In [6]:
lr_model.predict([[5000]])



array([859554.79452055])

In [7]:
coef = lr_model.coef_

In [8]:
inter = lr_model.intercept_

### Using linear regression to predict the house price
dependent variable = slope/coefficient * independent variable + intercept<br>
**y = m * X + b**

In [9]:
# 135.78767123 * 5000 + 180616.43835616432
coef * 5000 + inter

array([859554.79452055])
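
The slope and intercept used above can also be recovered by hand with the closed-form least-squares formulas. The sketch below assumes the five rows shown by `df.head()` make up the whole training set (the full CSV may contain more rows); the names `areas`, `prices`, `slope`, `intercept`, and `predicted` are illustrative and not part of the notebook.

```python
# Five (area, price) rows as shown by df.head(); assumed to be the full dataset.
areas = [2600, 3000, 3200, 3600, 4000]
prices = [550000, 565000, 610000, 680000, 725000]

n = len(areas)
mean_x = sum(areas) / n
mean_y = sum(prices) / n

# slope m = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(areas, prices))
s_xx = sum((x - mean_x) ** 2 for x in areas)
slope = s_xy / s_xx

# intercept b = mean_y - m * mean_x
intercept = mean_y - slope * mean_x

# y = m * X + b for a 5000 sq. ft. home
predicted = slope * 5000 + intercept
print(slope, intercept, predicted)
```

Under that assumption the printed values should agree with `lr_model.coef_`, `lr_model.intercept_`, and the prediction above, up to floating-point rounding.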

# 7. Evaluate the model

In [10]:
# lr_model.score(df[['area']], df['price'])
lr_model.score(X,y)

0.9584301138199486
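
For a regressor, `score` returns the coefficient of determination R², that is, 1 minus the ratio of the residual sum of squares to the total sum of squares. As a rough check, the sketch below recomputes it in plain Python, again assuming the five rows shown by `df.head()` are the complete dataset; the helper `r_squared` is hypothetical.

```python
# Five (area, price) rows as shown by df.head(); assumed to be the full dataset.
areas = [2600, 3000, 3200, 3600, 4000]
prices = [550000, 565000, 610000, 680000, 725000]


def r_squared(xs, ys):
    """Return R^2 of the least-squares line fitted through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares: squared errors of the fitted line.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: squared deviations from the mean price.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


r2 = r_squared(areas, prices)
print(r2)
```

A value close to 1 means the fitted line explains most of the variation in price on the data it was scored on.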

# 8. Save Model To a File Using Python Pickle
**Pickle** is a module in the Python standard library used for **_serializing and de-serializing_** Python objects.
* The **pickle module** implements **binary protocols** that convert Python objects such as **lists, dictionaries, and fitted models into byte streams.**
* **“Pickling”** is the process whereby a <u>_Python object hierarchy is converted into a byte stream._</u>
* **“Unpickling”** is the <u>_inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy._</u>
* **Pickling (and unpickling)** is alternatively known as **“serialization”, “marshalling,” or “flattening”;** however, to avoid confusion, the terms used here are **“pickling” and “unpickling”.**
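
The pickling and unpickling steps described above can be tried in memory before any file is involved. This is a minimal sketch using `pickle.dumps` and `pickle.loads`; the `record` dictionary is made-up illustrative data, not something produced by the notebook.

```python
import pickle

# A small nested object hierarchy (illustrative data only).
record = {"model": "linear_regression", "features": ["area"], "coef": [135.78767123]}

# Pickling: object hierarchy -> byte stream.
blob = pickle.dumps(record)
print(type(blob))

# Unpickling: byte stream -> a new object equal to the original.
restored = pickle.loads(blob)
print(restored == record)
print(restored is record)
```

The restored object compares equal to the original but is a separate copy. Because unpickling can execute arbitrary code, load only byte streams and files you trust.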

## 8.1 Import pickle library

In [11]:
import pickle

## 8.2 Open the file and dump trained model onto it

In [12]:
with open('pickle_model','wb') as file:
    pickle.dump(lr_model, file) # object to serialize, open binary file

## 8.3 Load the saved model using Pickle

In [13]:
with open('pickle_model','rb') as file:
    p_model = pickle.load(file)
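
The dump and load steps can be combined into one self-contained round trip. The sketch below is an illustration under assumptions: it refits a `LinearRegression` on the five rows shown by `df.head()` (assumed to be the full dataset), writes to a temporary directory, and uses a hypothetical file name `pickle_model.pkl`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression

# The five rows shown by df.head(); assumed to be the full dataset.
X = np.array([[2600], [3000], [3200], [3600], [4000]])
y = np.array([550000, 565000, 610000, 680000, 725000])
model = LinearRegression().fit(X, y)

# Write to a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "pickle_model.pkl")  # hypothetical file name
    with open(path, "wb") as file:
        pickle.dump(model, file)
    with open(path, "rb") as file:
        restored = pickle.load(file)

# The restored estimator should reproduce the original prediction.
same = np.array_equal(model.predict([[5000]]), restored.predict([[5000]]))
print(same)
```

Both files must be opened in binary mode (`'wb'` and `'rb'`), since pickle produces and consumes bytes rather than text.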

# 9. Validate with trained and saved model

In [14]:
p_model.predict([[5000]])



array([859554.79452055])

In [15]:
p_model.coef_

array([135.78767123])

In [16]:
p_model.intercept_

180616.43835616432

# 10. Evaluate with trained and saved model

In [17]:
p_model.score(df[['area']],df.price)

0.9584301138199486

# 11. Save Model To a File Using Joblib

For fitted __scikit-learn__ estimators it may be better to use **joblib's** replacements for **_pickle's dump & load,_** which are more efficient on objects that carry **large NumPy arrays** internally, as fitted estimators often do.

## 11.1 Import the Joblib library

In [18]:
import joblib

## 11.2 Dump the trained model to a file with Joblib

In [19]:
joblib.dump(lr_model,'model_joblib') # object to save, file name

['model_joblib']

## 11.3 Load the saved model using Joblib

In [20]:
j_model = joblib.load('model_joblib')
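
`joblib.dump` also accepts a `compress` argument, which can help when the saved object holds large arrays. The following is a minimal sketch rather than a benchmark: `payload` is a made-up stand-in for a fitted estimator, and the file names are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np

# Stand-in for a fitted estimator that carries a large array (illustrative).
payload = {"weights": np.zeros((1000, 1000))}

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, "model_plain.joblib")  # hypothetical name
    packed_path = os.path.join(tmp_dir, "model_compressed.joblib")  # hypothetical name

    joblib.dump(payload, plain_path)  # no compression
    joblib.dump(payload, packed_path, compress=3)  # zlib compression, level 3

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)
    restored = joblib.load(packed_path)

print(plain_size, packed_size)
print(np.array_equal(restored["weights"], payload["weights"]))
```

An all-zero array compresses unusually well, so the size gap here is larger than a real model would show. For an estimator as small as the one in this notebook, compression makes little practical difference.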

# 12. Validate with trained and saved model

In [21]:
j_model.predict([[5000]])



array([859554.79452055])

In [22]:
j_model.intercept_

180616.43835616432

In [23]:
j_model.coef_

array([135.78767123])

# 13. Evaluate with trained and saved model

In [24]:
j_model.score(df[['area']],df.price)

0.9584301138199486