# Model Packaging and Persistence

In this notebook, we'll learn how to **save, load, and package** machine learning models for reuse and deployment.

We’ll explore:
- Different model persistence methods
- Saving models using Pickle and Joblib
- Packaging models with MLflow for production

## Why Model Persistence Matters

After training a machine learning model, you often need to **reuse** it for predictions without retraining.
Model persistence allows you to:
- Save trained models to disk
- Load them later for inference
- Share models across systems and environments

In [ ]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Load dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)

# Train a Random Forest model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

## 🧰 Method 1: Saving Model using Pickle

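`pickle` is a module in the Python standard library that serializes (saves) and deserializes (loads) Python objects. Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.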

In [ ]:
import pickle

# Save model
with open('rf_model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Load model
with open('rf_model.pkl', 'rb') as f:
    loaded_model = pickle.load(f)

print("✅ Model loaded successfully.")

In [ ]:
# Verify the loaded model
y_pred = loaded_model.predict(X_test)
print("Sample predictions:", y_pred[:5])
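Printing a few predictions shows that the loaded model runs, but not that it behaves the same as the original. A minimal sketch of a stricter check follows. It retrains the same model so the cell runs on its own, and it round-trips through bytes in memory with `pickle.dumps` and `pickle.loads` rather than through a file. The names `restored`, `same_labels`, and `same_probas` are illustrative and not part of the original notebook.

```python
import pickle

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Recreate the model from the earlier cells so this sketch is self-contained.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Serialize to bytes and back, without touching the disk.
restored = pickle.loads(pickle.dumps(model))

# Compare the restored model against the original on the full test set.
same_labels = np.array_equal(model.predict(X_test), restored.predict(X_test))
same_probas = np.allclose(
    model.predict_proba(X_test), restored.predict_proba(X_test)
)

print("Identical labels:", same_labels)
print("Identical probabilities:", same_probas)
```

The same comparison should work for a model loaded from `rf_model.pkl`; only the source of `restored` changes.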

## ⚡ Method 2: Using Joblib

`joblib` is optimized for objects that contain **large NumPy arrays**, which makes it an efficient choice for scikit-learn models.

In [ ]:
import joblib

# Save the model
joblib.dump(model, 'rf_model.joblib')

# Load the model
loaded_joblib_model = joblib.load('rf_model.joblib')

print("✅ Joblib model loaded successfully.")
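`joblib.dump` also accepts a `compress` argument, which trades some save and load time for a smaller file. The sketch below compares file sizes with and without compression. The level `3` is an arbitrary choice for illustration, and the temporary directory and file names are assumptions rather than part of the original notebook.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Recreate the model from the earlier cells so this sketch is self-contained.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp_dir:
    raw_path = os.path.join(tmp_dir, "rf_model.joblib")
    compressed_path = os.path.join(tmp_dir, "rf_model_compressed.joblib")

    # Default: no compression.
    joblib.dump(model, raw_path)
    # compress=3 is an illustrative level; higher values compress more but run slower.
    joblib.dump(model, compressed_path, compress=3)

    raw_size = os.path.getsize(raw_path)
    compressed_size = os.path.getsize(compressed_path)

    # joblib.load detects the compression automatically.
    reloaded = joblib.load(compressed_path)
    predictions_match = bool(
        (reloaded.predict(X_test) == model.predict(X_test)).all()
    )

print(f"Uncompressed: {raw_size} bytes")
print(f"Compressed:   {compressed_size} bytes")
print("Predictions match:", predictions_match)
```

Whether compression is worth it depends on model size and how often the file is loaded; for small models the difference rarely matters.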

## 📦 Method 3: Using MLflow Model Packaging

MLflow allows models to be saved in a **standardized format**, making them easy to load and deploy across platforms.

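Alongside the serialized model, MLflow writes an `MLmodel` descriptor file and records the environment and dependencies needed to load the model (for example, in `conda.yaml` and `requirements.txt`).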

In [ ]:
import mlflow
import mlflow.sklearn

# Log and save the model using MLflow
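with mlflow.start_run(run_name="RF_Model_Packaging") as run:
    mlflow.sklearn.log_model(model, "random_forest_model")
    # Keep the run ID: it is needed to build the model URI when reloading.
    print("✅ Model logged with MLflow. Run ID:", run.info.run_id)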

### 🔍 Loading MLflow Models
MLflow models can be reloaded using a model URI. For a model logged in a run, the URI has the form `runs:/<run_id>/<artifact_path>`.

In [ ]:
# Example: load a model from an MLflow URI.
# Replace <run_id> with the ID of the run that logged the model.
# model_uri = 'runs:/<run_id>/random_forest_model'
# loaded_mlflow_model = mlflow.sklearn.load_model(model_uri)

## 🧾 Saving Model Metadata and Versioning

It’s good practice to save **model metadata**, such as:
- Training dataset details
- Model parameters
- Evaluation metrics

This supports transparency and reproducibility: anyone who loads the model can see what it was trained on and how it performed.

In [ ]:
import json

metadata = {
    "model_name": "RandomForestClassifier",
    "version": "1.0",
    "accuracy": float(model.score(X_test, y_test)),
    "features": list(iris.feature_names)
}

with open('model_metadata.json', 'w') as f:
    json.dump(metadata, f, indent=4)

print("🗂️ Metadata saved successfully.")
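Pickle and Joblib files are tied to the library versions that produced them, so it can help to record those versions in the metadata and compare them at load time. The sketch below is one possible approach, not an established convention: the `python_version` and `sklearn_version` fields and the `find_version_mismatches` helper are hypothetical names introduced here.

```python
import json
import os
import platform
import tempfile

import sklearn

# Extend the metadata with the environment that produced the model.
metadata = {
    "model_name": "RandomForestClassifier",
    "version": "1.0",
    "python_version": platform.python_version(),
    "sklearn_version": sklearn.__version__,
}


def find_version_mismatches(saved):
    """Return (field, saved_value, current_value) for each version that differs."""
    current = {
        "python_version": platform.python_version(),
        "sklearn_version": sklearn.__version__,
    }
    return [
        (field, saved.get(field), value)
        for field, value in current.items()
        if saved.get(field) != value
    ]


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model_metadata.json")

    # Write the metadata, then read it back as a loader would.
    with open(path, "w") as f:
        json.dump(metadata, f, indent=4)
    with open(path) as f:
        saved_metadata = json.load(f)

# An empty list means the current environment matches the saved one.
mismatches = find_version_mismatches(saved_metadata)
print("Version mismatches:", mismatches)
```

In a different environment, a non-empty result is a signal to test the loaded model carefully before trusting its predictions.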

## 🧩 Best Practices

1. Use **Joblib** for large scikit-learn models.
2. Store **model version and metadata** with every save.
3. Use **MLflow** for team-based tracking and deployment.
4. Maintain a **consistent file structure** for saved artifacts.
5. Test **model loading** in a separate environment before deployment.
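Practices 2 and 4 can be sketched together as a pair of small helpers that always write a model and its metadata to the same layout. The layout `<root>/<name>/<version>/` and the function names `save_artifacts` and `load_artifacts` are assumptions made for illustration; any consistent scheme would serve the same purpose.

```python
import json
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def save_artifacts(model, metadata, root, name, version):
    """Write model.joblib and metadata.json under <root>/<name>/<version>/."""
    target = Path(root) / name / version
    # exist_ok=False refuses to overwrite an already published version.
    target.mkdir(parents=True, exist_ok=False)
    joblib.dump(model, str(target / "model.joblib"))
    (target / "metadata.json").write_text(json.dumps(metadata, indent=4))
    return target


def load_artifacts(root, name, version):
    """Load the model and metadata saved by save_artifacts."""
    target = Path(root) / name / version
    model = joblib.load(str(target / "model.joblib"))
    metadata = json.loads((target / "metadata.json").read_text())
    return model, metadata


# Small model trained on the full dataset, only to exercise the helpers.
iris = load_iris()
model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(iris.data, iris.target)
metadata = {"model_name": "RandomForestClassifier", "version": "1.0"}

with tempfile.TemporaryDirectory() as root:
    saved_dir = save_artifacts(model, metadata, root, "rf_iris", "1.0")
    saved_names = sorted(p.name for p in saved_dir.iterdir())
    loaded_model, loaded_metadata = load_artifacts(root, "rf_iris", "1.0")
    predictions_match = bool(
        (loaded_model.predict(iris.data) == model.predict(iris.data)).all()
    )

print("Saved files:", saved_names)
print("Predictions match:", predictions_match)
```

Running `load_artifacts` in a fresh environment is also a simple way to act on practice 5 before deployment.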

## ✅ Summary

In this notebook, you learned how to:
- Persist models using Pickle and Joblib
- Log models with MLflow
- Save model metadata for reproducibility
- Prepare models for deployment

Next → `04-Model_Deployment_Basics.ipynb`, where we’ll deploy models using Flask and MLflow serving.