```python
import joblib
```

`joblib` is a popular library in Python for efficiently **saving and loading** large data objects, particularly NumPy arrays and scikit-learn models. It is often used to save trained machine learning models to disk and load them later for inference or further training. Below is a step-by-step guide on how to use `joblib` to save and load trained models.

### Installation
First, ensure you have `joblib` installed (it is also installed automatically as a dependency of scikit-learn). You can install it using pip:

```bash
pip install joblib
```

### Saving a Trained Model
After training a model (e.g., using scikit-learn), you can save it to a file using `joblib.dump`.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import joblib

# Load dataset and split into training and testing sets
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Train a model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Save the trained model to a file
joblib.dump(model, 'trained_model.pkl')
```

### Loading a Saved Model
To load the saved model from the file, use `joblib.load`.

```python
import joblib

# Load the model from the file
loaded_model = joblib.load('trained_model.pkl')

# Use the loaded model for prediction (X_test comes from the training snippet above)
predictions = loaded_model.predict(X_test)
print(predictions)
```
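A quick way to confirm that a save/load round trip preserved the model is to compare its outputs before and after. The sketch below is self-contained: it retrains the same model and writes to a temporary directory rather than the working directory. It also shows that `joblib.dump` returns the list of file names it wrote.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Same data split and model as in the saving example
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Save into a temporary directory to avoid overwriting files in the working directory
path = os.path.join(tempfile.mkdtemp(), "trained_model.pkl")
written = joblib.dump(model, path)  # list of file names written
loaded_model = joblib.load(path)

# A faithful round trip reproduces the original outputs exactly
same_labels = np.array_equal(model.predict(X_test), loaded_model.predict(X_test))
same_probas = np.array_equal(model.predict_proba(X_test), loaded_model.predict_proba(X_test))
print(written, same_labels, same_probas)
```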

### Key Points
1. **File Extension**: While `.pkl` is commonly used, you can use any file extension (or none). `joblib` doesn't enforce a specific extension.
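2. **Compression**: `joblib.dump` can compress the saved file to reduce its size. The `compress` argument accepts an integer level from 0 to 9 (using zlib), a method name such as `'gzip'`, `'bz2'`, or `'lzma'`, or a `(method, level)` tuple; `joblib.load` detects the compression automatically.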

   ```python
   joblib.dump(model, 'trained_model.pkl.gz', compress='gzip')
   ```

3. **Efficiency**: `joblib` stores large NumPy arrays efficiently, which can make it faster than Python's built-in `pickle` for objects that carry large arrays, such as many fitted scikit-learn models.
4. **Cross-Platform**: Saved models can be loaded on different machines or platforms, as long as the same version of the libraries is used.
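One way to act on the version requirement is to save the library versions next to the model and compare them at load time. This is a sketch of a convention, not a joblib feature: the bundle layout and the `save_bundle`/`load_bundle` helpers are hypothetical. scikit-learn itself already warns when an estimator is unpickled under a different scikit-learn version; an explicit record also covers other libraries you choose to track.

```python
import os
import tempfile
import warnings

import joblib
import sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def current_versions():
    # Libraries whose versions we want to track (extend as needed)
    return {"sklearn": sklearn.__version__, "joblib": joblib.__version__}


def save_bundle(model, path):
    # Hypothetical layout: the model plus the versions it was trained with
    joblib.dump({"model": model, "versions": current_versions()}, path)


def load_bundle(path):
    bundle = joblib.load(path)
    now = current_versions()
    for lib, saved in bundle["versions"].items():
        if now.get(lib) != saved:
            warnings.warn(f"{lib}: saved with {saved}, loading with {now.get(lib)}")
    return bundle["model"]


X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=42).fit(X, y)
bundle_path = os.path.join(tempfile.mkdtemp(), "model_bundle.joblib")
save_bundle(model, bundle_path)

restored = load_bundle(bundle_path)  # same environment, so no version warning
print((restored.predict(X) == model.predict(X)).all())
```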

### Example with Compression
Here’s an example of saving and loading a model with compression:

```python
# Save with compression
joblib.dump(model, 'trained_model.pkl.gz', compress='gzip')

# Load the compressed model
loaded_model = joblib.load('trained_model.pkl.gz')
```
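To see what compression buys for a particular model, you can save it with several `compress` settings and compare file sizes. In this sketch the file names are arbitrary and level 3 is just a mid-range choice; actual sizes depend on the model and library versions, so none are quoted here.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=42).fit(X, y)
out_dir = tempfile.mkdtemp()

# File name -> value passed as `compress`
variants = {
    "model_raw.pkl": 0,                  # no compression (the default)
    "model_zlib.pkl": 3,                 # integer level 0-9 uses zlib
    "model_gzip.pkl.gz": ("gzip", 3),    # explicit (method, level) tuple
    "model_lzma.pkl.lzma": ("lzma", 3),
}

sizes = {}
for name, compress in variants.items():
    path = os.path.join(out_dir, name)
    joblib.dump(model, path, compress=compress)
    sizes[name] = os.path.getsize(path)
    # joblib.load detects the compression method on its own
    reloaded = joblib.load(path)
    assert (reloaded.predict(X) == model.predict(X)).all()

for name, size in sizes.items():
    print(f"{name:22s} {size / 1024:8.1f} KiB")
```

Higher levels usually shrink the file further at the cost of slower saving and loading, so it is worth measuring on your own models.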

### Notes
- Ensure compatibility between the versions of `joblib`, scikit-learn, and other libraries used during training and loading.
- For large models, `joblib.load(filename, mmap_mode='r')` memory-maps the NumPy arrays stored in the file instead of reading them fully into RAM, which can speed up loading and reduce memory use. This only works for uncompressed files.
- When loading a trained model with `joblib` (or `pickle`), **all custom classes, functions, and dependencies** used by the model must be importable in the environment where you load it. Serialization stores each object's data together with a reference to its class (module path and name), not the class's code; when deserializing, Python imports the class from that reference to rebuild the object.
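The memory-mapping option is the `mmap_mode` argument of `joblib.load`. A minimal sketch, with a synthetic array standing in for a large model component:

```python
import os
import tempfile

import joblib
import numpy as np

# Synthetic stand-in for a large array held by a model (about 8 MB of float64)
weights = np.arange(1_000_000, dtype=np.float64).reshape(1000, 1000)
path = os.path.join(tempfile.mkdtemp(), "weights.joblib")

# Memory mapping needs an uncompressed file, so keep the default compress=0
joblib.dump({"weights": weights}, path)

# Arrays come back as read-only views onto the file instead of in-memory copies
loaded = joblib.load(path, mmap_mode="r")
w = loaded["weights"]
print(type(w).__name__, w[10, 10], w.flags.writeable)  # memmap 10010.0 False
```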

### Why Are Custom Classes/Functions Required?
When you save a trained model, `joblib` serializes the model's state (for example its hyperparameters and fitted attributes such as learned weights or tree structures) together with a reference to the class of each object. It does not save the Python code (e.g., custom classes, functions, or transformers) that defines the model or any preprocessing steps. Therefore, when you load the model, Python must be able to import that code to reconstruct the object.
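This can be observed directly. The sketch below registers a throwaway module at runtime (the name `my_custom_code` is hypothetical and stands in for the file that defines your class), saves an instance into an in-memory buffer, and shows that the saved bytes name the module and class but contain none of the class's code. Loading therefore fails once the class is gone and works again once it is importable under the same name.

```python
import io
import sys
import types

import joblib

# Source of a simple custom class; in practice this lives in a .py file
CLASS_SOURCE = (
    "class CustomScaler:\n"
    "    def __init__(self, factor=1.0):\n"
    "        self.factor = factor\n"
)

# Register a throwaway module (hypothetical name) so pickle can locate the class
mod = types.ModuleType("my_custom_code")
exec(CLASS_SOURCE, mod.__dict__)
sys.modules["my_custom_code"] = mod

# Save an instance into an in-memory buffer instead of a file
buffer = io.BytesIO()
joblib.dump(mod.CustomScaler(factor=2.0), buffer)
payload = buffer.getvalue()

# The bytes name the module and the class...
print(b"my_custom_code" in payload, b"CustomScaler" in payload)  # True True
# ...but do not contain the class body
print(b"self.factor" in payload)  # False

# Remove the class: unpickling can no longer rebuild the object
del mod.CustomScaler
load_error = None
try:
    joblib.load(io.BytesIO(payload))
except AttributeError as exc:
    load_error = exc
    print("Load failed:", exc)

# Make the class available again under the same module and name: loading works
exec(CLASS_SOURCE, mod.__dict__)
restored = joblib.load(io.BytesIO(payload))
print(restored.factor)  # 2.0
```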

### Steps to Load and Use a Trained Model with Custom Dependencies
1. **Ensure All Dependencies Are Available**:
   - Import all custom classes, functions, and libraries that were used during the training process.
   - Ensure that the versions of the libraries (e.g., scikit-learn, TensorFlow, etc.) are compatible with the ones used during training.

2. **Load the Model**:
   - Use `joblib.load` to load the saved model.

3. **Make Predictions**:
   - Use the loaded model to make predictions.

### Example Scenario
Suppose you trained a model that uses a custom preprocessing class and a custom metric function. Here's how you would handle loading and using the model:

#### Custom Code Used During Training
```python
from sklearn.base import BaseEstimator, TransformerMixin
import numpy as np

# Custom preprocessing class
class CustomScaler(BaseEstimator, TransformerMixin):
    def __init__(self, factor=1.0):
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X * self.factor

# Custom metric function
def custom_metric(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))
```
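Before wiring these helpers into a pipeline, a quick check shows what they compute. The sample arrays in this sketch are arbitrary; note that `custom_metric` is the mean absolute error, so it agrees with scikit-learn's `mean_absolute_error`.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.metrics import mean_absolute_error


class CustomScaler(BaseEstimator, TransformerMixin):
    def __init__(self, factor=1.0):
        self.factor = factor

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        return X * self.factor


def custom_metric(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))


X = np.array([[1.0, 2.0], [3.0, 4.0]])
scaled = CustomScaler(factor=2.0).fit_transform(X)
print(scaled)  # every entry doubled

y_true = np.array([0, 1, 2, 2])
y_pred = np.array([0, 2, 2, 1])
print(custom_metric(y_true, y_pred))  # 0.5
print(np.isclose(custom_metric(y_true, y_pred), mean_absolute_error(y_true, y_pred)))  # True
```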

#### Training and Saving the Model
```python
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import joblib

# Load dataset
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Create a pipeline with the custom scaler
pipeline = Pipeline([
    ('scaler', CustomScaler(factor=2.0)),
    ('model', RandomForestClassifier(random_state=42))
])

# Train the model
pipeline.fit(X_train, y_train)

# Save the trained model
joblib.dump(pipeline, 'trained_model.pkl')
```

#### Loading and Using the Model
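```python
import joblib
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Re-define the custom class (or import it from the original module).
# It must be available under the same module and name as at training time;
# for a class defined in a training script or notebook, that is __main__.
class CustomScaler(BaseEstimator, TransformerMixin):
    def __init__(self, factor=1.0):
        self.factor = factor

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X * self.factor

# Not referenced by the saved pipeline, so loading works without it;
# keep it if you want to evaluate predictions with the same metric
def custom_metric(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

# Load the trained model
pipeline = joblib.load('trained_model.pkl')

# Recreate the test split so this snippet also runs in a fresh session
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)

# Make predictions
predictions = pipeline.predict(X_test)
print(predictions)
```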

### Key Points to Remember
1. **Recreate the Environment**:
   - Ensure that the environment where you load the model has all the necessary dependencies, including custom classes, functions, and libraries.

2. **Version Compatibility**:
   - Use the same versions of libraries (e.g., scikit-learn, TensorFlow) as were used during training to avoid compatibility issues.

3. **Organize Code**:
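   - If you have many custom classes or functions, consider organizing them into a separate module and importing them during both training and inference. Classes defined in a notebook or script are recorded as `__main__.<ClassName>`, so every loading script would have to redefine them; a class imported from a module is recorded under that module's path and can be loaded anywhere the module is importable.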

4. **Error Handling**:
   - If you forget to import a custom class or function, you'll get an error like:
     ```
     AttributeError: Can't get attribute 'CustomScaler' on <module '__main__'>
     ```
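     This indicates that Python cannot find the required class or function. If the class was defined in a module that cannot be imported where you load the model, the error is a `ModuleNotFoundError` instead.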
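The **Organize Code** and **Error Handling** points can be checked end to end. This sketch writes the custom scaler into a small module (the name `custom_transformers` and the temporary directory are hypothetical stand-ins for a real project file), trains and saves a pipeline that uses it, and then loads the file in a fresh Python process twice: once with the module importable and once without. Because the pickle refers to `custom_transformers.CustomScaler`, the loader needs no explicit import, only the module on its path; when the module is missing, loading fails with `ModuleNotFoundError`.

```python
import importlib
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

# Write the custom class into its own (hypothetical) module file
MODULE_SOURCE = textwrap.dedent("""
    from sklearn.base import BaseEstimator, TransformerMixin

    class CustomScaler(BaseEstimator, TransformerMixin):
        def __init__(self, factor=1.0):
            self.factor = factor

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            return X * self.factor
""")
module_dir = Path(tempfile.mkdtemp())
(module_dir / "custom_transformers.py").write_text(MODULE_SOURCE)
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
CustomScaler = importlib.import_module("custom_transformers").CustomScaler

# Train and save a pipeline that uses the imported class
X, y = load_iris(return_X_y=True)
pipeline = Pipeline([
    ("scaler", CustomScaler(factor=2.0)),
    ("model", RandomForestClassifier(random_state=42)),
]).fit(X, y)
model_path = module_dir / "trained_model.pkl"
joblib.dump(pipeline, model_path)
print(CustomScaler.__module__)  # custom_transformers

# Loader run in a fresh interpreter: it only adds a directory to sys.path
LOADER = (
    "import sys, joblib\n"
    "sys.path.insert(0, sys.argv[1])\n"
    "pipe = joblib.load(sys.argv[2])\n"
    "print(type(pipe.named_steps['scaler']).__name__)\n"
)


def load_in_fresh_process(search_dir):
    return subprocess.run(
        [sys.executable, "-c", LOADER, str(search_dir), str(model_path)],
        capture_output=True, text=True,
    )


ok = load_in_fresh_process(module_dir)
print(ok.stdout.strip())  # CustomScaler

missing = load_in_fresh_process(tempfile.mkdtemp())  # empty directory
print("ModuleNotFoundError" in missing.stderr)  # True
```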

By ensuring that all dependencies are available, you can successfully load and use your trained model for predictions.

By following these steps, you can easily save and load trained machine learning models using `joblib`.