<h2 style="color:green" align="center">Machine Learning With Python: Save And Load Trained Model</h2>

In [1]:
import pandas as pd
import numpy as np
from sklearn import linear_model

In [2]:
df = pd.read_csv("homeprices.csv")
df.head()

|   | area | price  |
|---|------|--------|
| 0 | 2600 | 550000 |
| 1 | 3000 | 565000 |
| 2 | 3200 | 610000 |
| 3 | 3600 | 680000 |
| 4 | 4000 | 725000 |


In [3]:
model = linear_model.LinearRegression()
model.fit(df[['area']],df.price)

In [4]:
model.coef_

array([135.78767123])

In [5]:
model.intercept_

180616.43835616432

In [7]:
model.predict([[5000]])

array([859554.79452055])

<h3 style='color:purple'>Save Model To a File Using Python Pickle</h3>

In [4]:
import pickle

In [7]:
with open('model_pickle.pkl','wb') as file:
    pickle.dump(model,file)

<h4 style='color:purple'>Load Saved Model</h4>

In [8]:
with open('model_pickle.pkl','rb') as file:
    mp = pickle.load(file)

In [9]:
mp.coef_

array([135.78767123])

In [10]:
mp.intercept_

180616.43835616432

In [11]:
mp.predict([[5000]])

array([859554.79452055])

<h3 style='color:purple'>Save Trained Model Using joblib</h3>

In [12]:
# sklearn.externals.joblib was removed in scikit-learn 0.23; import joblib directly
import joblib

In [13]:
joblib.dump(model, 'model_joblib')

['model_joblib']

<h4 style='color:purple'>Load Saved Model</h4>

In [14]:
mj = joblib.load('model_joblib')

In [15]:
mj.coef_

array([135.78767123])

In [16]:
mj.intercept_

180616.43835616432

In [17]:
mj.predict([[5000]])

array([859554.79452055])

**Joblib** (a third-party library) and **pickle** (a standard-library module) both serialize and deserialize Python objects, which is how trained machine learning models, configurations, and other data structures are saved to disk and loaded back into memory. The comparison below covers these two, along with several alternatives and their best use cases:

### Pickle
- **Description**: The `pickle` module is part of the Python standard library and is used for serializing and deserializing Python object structures.
- **Usage**: It converts a Python object hierarchy into a byte stream (serialization) and can recreate the object from the byte stream (deserialization).
- **Pros**:
  - Built into Python, no need for external installation.
  - Can serialize almost any Python object.
- **Cons**:
  - Not secure: unpickling untrusted or maliciously constructed data can execute arbitrary code.
  - Slower than alternatives for large numerical data.
  - No built-in compression or memory mapping for large arrays.
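
The byte-stream round trip described above can be sketched without touching the file system. This is a minimal illustration, assuming a plain dictionary (`params`, a hypothetical stand-in whose values mirror the coefficients printed earlier) rather than the fitted model itself:

```python
import pickle

# Hypothetical stand-in for any picklable object; the numbers mirror the
# coefficient and intercept printed earlier in this notebook.
params = {"coef": [135.78767123], "intercept": 180616.43835616432}

# Serialize to an in-memory byte stream. HIGHEST_PROTOCOL selects the most
# efficient format supported by the running Python version.
payload = pickle.dumps(params, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialize. Only do this with bytes you created or otherwise trust,
# because unpickling can execute arbitrary code.
restored = pickle.loads(payload)

print(type(payload).__name__, restored == params)
```

`pickle.dump` and `pickle.load`, used in the cells above, do the same thing but write to and read from an open binary file.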

### Joblib
- **Description**: `joblib` is a library that provides utilities for lightweight pipelining in Python, particularly for numerical data. It extends `pickle` with features aimed at efficiently handling large data.
- **Usage**: Optimized for saving large numpy arrays and scikit-learn models.
- **Pros**:
  - Faster and more efficient for large numpy arrays compared to `pickle`.
  - Provides support for disk-cached pipelines and parallel processing.
- **Cons**:
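  - Third-party dependency (although it is installed alongside scikit-learn).
  - Built on `pickle`, so loading untrusted files carries the same security risk.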
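
How large a joblib file ends up depends heavily on the `compress` argument of `joblib.dump`. The sketch below is only an illustration under stated assumptions: the array `weights` and the file names are hypothetical, and an all-zeros array is chosen because it compresses unusually well.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical stand-in for large model weights (highly compressible).
weights = np.zeros((1000, 1000))

with tempfile.TemporaryDirectory() as tmp:
    raw_path = os.path.join(tmp, "weights_raw.joblib")
    zip_path = os.path.join(tmp, "weights_zip.joblib")

    joblib.dump(weights, raw_path)              # default: no compression
    joblib.dump(weights, zip_path, compress=3)  # zlib compression, level 3

    raw_size = os.path.getsize(raw_path)
    zip_size = os.path.getsize(zip_path)

    # Loading works the same way whether or not the file was compressed.
    restored = joblib.load(zip_path)

print(raw_size, zip_size)
```

Real model weights rarely compress this well, so it is worth measuring on your own data before settling on a compression level; higher levels trade slower writes for smaller files.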

### Other Alternatives

#### 1. **Dill**
- **Description**: `dill` extends `pickle` by adding the ability to serialize a wider variety of Python objects, including lambdas, nested functions, and classes defined interactively.
- **Usage**: Can be used for more complex objects that `pickle` cannot handle.
- **Pros**:
  - More flexible than `pickle`.
  - Handles more types of objects.
- **Cons**:
  - Larger serialized file sizes.
  - Not as widely adopted as `pickle` or `joblib`.
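
The difference is easiest to see with a lambda. The following is a minimal sketch, assuming `dill` may or may not be installed (it is a third-party package, `pip install dill`), so the import is guarded; `square` is a hypothetical example function.

```python
import pickle

# A lambda has no importable name, which is what trips up pickle.
square = lambda x: x * x

# The standard pickle module stores functions by reference (module + name),
# so it cannot serialize a lambda.
try:
    pickle.dumps(square)
    pickle_failed = False
except (pickle.PicklingError, AttributeError):
    pickle_failed = True

print("pickle failed on lambda:", pickle_failed)

# dill is imported lazily so this sketch still runs without it.
try:
    import dill
except ImportError:
    dill = None

if dill is not None:
    # dill serializes the function body itself, not just a reference to it.
    restored = dill.loads(dill.dumps(square))
    print(restored(4))
```

Like `pickle`, `dill` should only be used to load data from a trusted source.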

#### 2. **MessagePack**
- **Description**: MessagePack is a binary serialization format that is more efficient and compact than JSON; the `msgpack` package is its Python implementation.
- **Usage**: Suitable for web applications where compactness and speed are crucial.
- **Pros**:
  - Efficient and fast.
  - Language-independent.
- **Cons**:
  - Not designed specifically for Python objects.
  - Requires conversion of custom Python objects.
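
Because MessagePack only understands plain types (numbers, strings, lists, maps), a model has to be reduced to its parameters first. The sketch below assumes a linear model whose parameters have already been extracted into a dictionary; `params` and `predict` are hypothetical names, and the `msgpack` import is guarded because it is a third-party package (`pip install msgpack`).

```python
# msgpack is imported lazily so the sketch still runs without it.
try:
    import msgpack
except ImportError:
    msgpack = None

# Plain-type snapshot of a fitted linear model. With a real scikit-learn
# model this could be built from model.coef_.tolist() and
# float(model.intercept_); the numbers here mirror the earlier output.
params = {"coef": [135.78767123], "intercept": 180616.43835616432}


def predict(p, area):
    """Recompute a linear prediction from the stored parameters."""
    return p["coef"][0] * area + p["intercept"]


if msgpack is not None:
    packed = msgpack.packb(params, use_bin_type=True)  # dict -> bytes
    restored = msgpack.unpackb(packed, raw=False)      # bytes -> dict
else:
    restored = params  # fallback so the rest of the sketch still runs

print(predict(restored, 5000))
```

The trade-off is that the loading side must know how to rebuild a prediction from the raw parameters, which is simple for linear regression but quickly becomes impractical for more complex estimators.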

#### 3. **HDF5 (h5py)**
- **Description**: HDF5 is a file format and set of tools for managing complex data. `h5py` is a Python interface to the HDF5 binary data format.
- **Usage**: Excellent for handling large datasets, particularly numerical data.
- **Pros**:
  - Highly efficient storage for large numerical datasets.
  - Supports complex hierarchies of data.
- **Cons**:
  - More complex API.
  - Not as straightforward for serializing arbitrary Python objects.

#### 4. **Feather (Apache Arrow)**
- **Description**: Feather is a fast, lightweight, and easy-to-use binary file format for storing data frames. It is a part of the Apache Arrow project.
- **Usage**: Ideal for pandas data frames.
- **Pros**:
  - Extremely fast read and write performance.
  - Language-independent.
- **Cons**:
  - Designed specifically for tabular data, not for arbitrary Python objects.

### Best Use Cases

- **Pickle**: General-purpose serialization, especially when simplicity and built-in support are needed. Best for smaller or simpler objects.
- **Joblib**: Best for saving and loading large numpy arrays and scikit-learn models, especially when performance is a concern.
- **Dill**: When you need to serialize more complex Python objects like functions or lambda expressions.
- **MessagePack**: When you need efficient and compact serialization for cross-language applications.
- **HDF5 (h5py)**: Ideal for managing and storing large numerical datasets, especially when hierarchical structure is needed.
- **Feather (Apache Arrow)**: Best for high-performance storage of pandas data frames.

Each of these tools has its own strengths and is best suited for different scenarios depending on the nature of the data and the specific requirements of the application.