## 1. Introduction & Objective

This notebook focuses on Step 9 of the project: optimizing our trained models for deployment. The primary goal is to convert the final `daily` and `hourly` forecasting models into the ONNX (Open Neural Network Exchange) format.

We aim to achieve two key benefits:
1.  **Performance Improvement:** Significantly reduce model inference time (latency), which is critical for real-world applications.
2.  **Interoperability:** Create a standardized model format that can be deployed across various platforms and environments without dependency on the original training frameworks.

This analysis will benchmark the performance of the original models against their ONNX counterparts.

In [None]:
!pip install --upgrade pandas numpy scikit-learn catboost joblib onnx onnxruntime skl2onnx --only-binary=:all:



In [None]:
from importlib.metadata import PackageNotFoundError, version

# Packages whose installed versions we want to report
required_libs = [
    'pandas', 'numpy', 'scikit-learn', 'catboost',
    'joblib', 'onnx', 'onnxruntime', 'skl2onnx'
]

# Print each package's installed version, or "Not Found!" if it is missing
for lib in required_libs:
    try:
        lib_version = version(lib)
    except PackageNotFoundError:
        lib_version = "Not Found!"
    print(f"{lib}: {lib_version}")

pandas: 2.3.3
numpy: 2.3.4
scikit-learn: 1.7.2
catboost: 1.2.8
joblib: 1.5.2
onnx: 1.19.1
onnxruntime: 1.23.2
skl2onnx: 1.19.1


In [None]:
# Import standard packages
import joblib
import pandas as pd
import numpy as np
import os
import onnxruntime as rt

# Only CatBoost needs to be imported
from catboost import CatBoostRegressor, Pool

## 🛠️ Defining Benchmark Utility Functions

The next code cell will define two core utility functions to run the entire benchmark process. These functions are:

1.  **`process_model(model_type)`**: This is the main function, responsible for handling all the logic for a single model type (either "daily" or "hourly").
2.  **`compare_model_sizes(...)`**: This is a helper function focused only on reading file sizes and printing the comparison.

### I. `process_model`

* **Load Data:** Loads the corresponding `.joblib` file and its sample data `.csv` file.
* **Detect Structure:** Automatically checks if the model is a `MultiOutputRegressor` (with multiple sub-models inside) or just a single CatBoost model.
* **Export ONNX:** Iterates through each sub-model (e.g., 5 horizons) and saves each one as a separate `.onnx` file.
* **Create Sessions:** Loads all the newly created `.onnx` files into `onnxruntime` to prepare for prediction.
* **Create `onnx_predict` Function:** To ensure a fair comparison, it creates a custom wrapper function that runs predictions across all ONNX horizons and stacks them column-wise, mimicking the output of the original model's `model.predict()`.
* **Run Benchmark:** Uses `%timeit` to measure the speed of the original `model.predict()` versus the converted `onnx_predict()` on a batch of 1,000 rows (the single sample row repeated 1,000 times).
* **Return Results:** Returns a dictionary containing the timing results to be aggregated into a DataFrame.
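
Outside IPython, where the `%timeit` magic is unavailable, a similar measurement can be approximated with the standard-library `timeit` module. The sketch below is illustrative only: `time_callable_ms` is a hypothetical helper that is not part of this notebook, and the two stand-in functions take the place of `model.predict(X_bench)` and `onnx_predict(X_bench)`.

```python
import timeit
from statistics import mean, pstdev


def time_callable_ms(fn, number=30, repeat=10):
    """Return (mean, std dev) of the per-call time in milliseconds.

    Follows the same layout as `%timeit -n 30 -r 10`:
    `repeat` runs, each consisting of `number` calls.
    """
    totals = timeit.repeat(fn, number=number, repeat=repeat)
    per_call_ms = [t / number * 1000 for t in totals]
    return mean(per_call_ms), pstdev(per_call_ms)


# Stand-in workloads (assumptions): swap in the real predict calls.
def baseline():
    return sum(i * i for i in range(2_000))


def candidate():
    return sum(i * i for i in range(1_000))


base_ms, base_sd = time_callable_ms(baseline)
cand_ms, cand_sd = time_callable_ms(candidate)
speedup = base_ms / cand_ms  # > 1 means the candidate is faster

print(f"baseline : {base_ms:.3f} ms (std dev {base_sd:.3f})")
print(f"candidate: {cand_ms:.3f} ms (std dev {cand_sd:.3f})")
print(f"speedup  : {speedup:.2f}x")
```

Reporting the standard deviation alongside the mean makes it easier to judge whether a small speedup is real or just run-to-run noise.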

### II. `compare_model_sizes`

* **Measure Joblib:** Gets the total size (MB) of the 2 `.joblib` files.
* **Measure ONNX:** Gets the total size (MB) of all 10 exported `.onnx` files.
* **Print Conclusion:** Compares the two totals and prints a conclusion (e.g., `ONNX IS 3.4x LARGER THAN JOBLIB`).
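
The size comparison boils down to summing file sizes and forming one ratio. A minimal, self-contained sketch of that logic follows; `total_size_mb` and `size_ratio_label` are hypothetical names used for illustration, and the temporary files are arbitrary placeholders rather than the real models.

```python
import os
import tempfile
from pathlib import Path


def total_size_mb(paths):
    """Sum the sizes of the given files in MB (1 MB = 1024**2 bytes)."""
    return sum(os.path.getsize(p) for p in paths) / (1024 ** 2)


def size_ratio_label(onnx_mb, joblib_mb):
    """Describe how the ONNX total compares with the Joblib total."""
    ratio = onnx_mb / joblib_mb
    if ratio > 1.0:
        return f"ONNX IS {ratio:.1f}x LARGER THAN JOBLIB"
    if ratio < 1.0:
        return f"ONNX IS {1 / ratio:.1f}x SMALLER THAN JOBLIB"
    return "ONNX AND JOBLIB ARE THE SAME SIZE"


# Demo with throwaway files (sizes are arbitrary, not the real models).
with tempfile.TemporaryDirectory() as tmp:
    joblib_file = Path(tmp, "demo.joblib")
    joblib_file.write_bytes(b"\0" * 1024 * 1024)  # 1 MB
    onnx_files = []
    for i in range(2):
        onnx_file = Path(tmp, f"demo_horizon_{i + 1}.onnx")
        onnx_file.write_bytes(b"\0" * 2 * 1024 * 1024)  # 2 MB each
        onnx_files.append(onnx_file)
    demo_label = size_ratio_label(
        total_size_mb(onnx_files), total_size_mb([joblib_file])
    )

print(demo_label)
print(size_ratio_label(7.11, 2.09))  # totals reported later in this notebook
```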

In [None]:
# Utilities: compare speed and file size
# Speed comparison function
def process_model(model_type: str):
    print(f"\n{'='*60}\nProcessing Model: {model_type.upper()}\n{'='*60}")

    # === FILE PATHS ===
    model_filename = "Best-Hourly-Model-Hyperparams.joblib" if model_type == "hourly" else "best_daily_model.joblib"
    sample_input_path = f"{model_type}_sample_input.csv"
    output_dir = "1_output_onnx_models"
    os.makedirs(output_dir, exist_ok=True)

    # === LOAD MODEL & SAMPLE DATA ===
    try:
        model = joblib.load(model_filename)
        sample_df = pd.read_csv(sample_input_path)
        print(f"✓ Loaded '{model_filename}' and sample input {sample_df.shape}")
    except Exception as e:
        print(f"✗ Error loading files: {e}")
        return None

    # === DETECT MODEL TYPE (Single vs MultiOutput) ===
    if hasattr(model, 'estimators_'):
        is_multi = True
        cb_models = model.estimators_
        print(f"Detected MultiOutputRegressor → {len(cb_models)} inner CatBoost models (multi-horizon forecast)")
    else:
        is_multi = False
        cb_models = [model]
        print("Single CatBoostRegressor detected (single target)")

    # === EXPORT EACH INNER CATBOOST MODEL TO ONNX ===
    print("Exporting to ONNX using CatBoost native export...")
    onnx_paths = []
    for i, cb_model in enumerate(cb_models):
        # Ensure it's really a CatBoost model
        if not isinstance(cb_model, CatBoostRegressor):
            print("Warning: Inner model is not CatBoostRegressor!")
            continue
        part_path = os.path.join(output_dir, f"{model_type}_horizon_{i+1}.onnx")
        cb_model.save_model(
            part_path,
            format="onnx",
            export_parameters={
                'onnx_domain': 'ai.catboost',
                'onnx_model_name': f'CatBoost_{model_type.capitalize()}_H{i+1}',
                'onnx_doc_string': f'Hanoi Weather {model_type.capitalize()} Forecast - Horizon {i+1}',
                'onnx_graph_name': f'{model_type}_h{i+1}'
            }
        )
        print(f"  → Exported: {part_path}")
        onnx_paths.append(part_path)

    if not onnx_paths:
        print("No ONNX files exported!")
        return None

    # === CREATE ONNX INFERENCE SESSIONS ===
    sessions = [rt.InferenceSession(p) for p in onnx_paths]
    input_name = sessions[0].get_inputs()[0].name  # All models have same input name

    # ONNX prediction function (runs all horizons)
    def onnx_predict(X):
        preds = []
        for sess in sessions:
            output = sess.run(None, {input_name: X})[0]  # shape (batch, 1)
            preds.append(output)
        return np.concatenate(preds, axis=1)  # shape (batch, n_horizons)

    # === BENCHMARK DATA (1000 identical samples) ===
    X_bench = np.repeat(sample_df.values, 1000, axis=0).astype(np.float32)

    # Original model prediction
    print("Benchmarking original model...")
    orig_timer = %timeit -n 30 -r 10 -o model.predict(X_bench)
    orig_time = orig_timer.average * 1000  # ms

    # ONNX prediction (loop over all horizons - fair comparison)
    print("Benchmarking ONNX model(s)...")
    onnx_timer = %timeit -n 30 -r 10 -o onnx_predict(X_bench)
    onnx_time = onnx_timer.average * 1000  # ms

    speedup = orig_time / onnx_time

    print(f"\n{'='*50}")
    print(f"RESULTS for {model_type.upper()} model (1000 predictions)")
    print(f"Original CatBoost : {orig_time:.3f} ms")
    print(f"ONNX Runtime     : {onnx_time:.3f} ms")
    print(f"Speedup           : {speedup:.2f}x {'faster' if speedup > 1 else 'slower'}")

    return {
        "Model Type": model_type.capitalize(),
        "Original (ms)": f"{orig_time:.3f}",
        "ONNX (ms)": f"{onnx_time:.3f}",
        "Speedup": f"{speedup:.2f}x {'faster' if speedup > 1 else 'slower'}"
    }

# Size comparison function
def compare_model_sizes(
    joblib_daily: str = "best_daily_model.joblib",
    joblib_hourly: str = "Best-Hourly-Model-Hyperparams.joblib",
    onnx_folder: str = "1_output_onnx_models"
):
    import os

    print("=== MODEL SIZE COMPARISON (Joblib vs ONNX) ===\n")

    # Joblib sizes (None when the file is missing)
    daily_size = os.path.getsize(joblib_daily) / (1024**2) if os.path.exists(joblib_daily) else None
    hourly_size = os.path.getsize(joblib_hourly) / (1024**2) if os.path.exists(joblib_hourly) else None

    print(f"{'Original Joblib':<35} Size")
    print("-" * 55)
    print(f"best_daily_model.joblib             → {daily_size:.2f} MB" if daily_size else "best_daily_model.joblib             → File not found")
    print(f"Best-Hourly-Model-Hyperparams.joblib→ {hourly_size:.2f} MB" if hourly_size else "Best-Hourly-Model-Hyperparams.joblib→ File not found")
    total_joblib = (daily_size or 0) + (hourly_size or 0)
    print(f"{'TOTAL (2 JOBLIB FILES)':<35} → {total_joblib:.2f} MB\n")

    # ONNX sizes
    if not os.path.exists(onnx_folder):
        print(f"ONNX folder '{onnx_folder}' does not exist!")
        return

    onnx_files = []
    onnx_total = 0
    for f in sorted(os.listdir(onnx_folder)):
        if f.endswith(".onnx"):
            path = os.path.join(onnx_folder, f)
            size_mb = os.path.getsize(path) / (1024**2)
            onnx_total += size_mb
            onnx_files.append((f, size_mb))

    print(f"{'ONNX files':<35} Size")
    print("-" * 55)
    for name, size in onnx_files:
        print(f"{name:<35} → {size:.2f} MB")
    print(f"{'TOTAL (10 ONNX FILES)':<35} → {onnx_total:.2f} MB")

    # Conclusion
    if daily_size and hourly_size and len(onnx_files) == 10:
        print(f"\n{'='*60}")

        ratio = onnx_total / total_joblib

        if ratio > 1.0:
            print(f"🎉 CONCLUSION: ONNX IS {ratio:.1f}x LARGER THAN JOBLIB".upper())
            print(f"   (ONNX total: {onnx_total:.1f} MB vs. Joblib: {total_joblib:.1f} MB)")

        elif ratio < 1.0:
            # Invert the ratio so it reads naturally (e.g., 2.5x smaller)
            reverse_ratio = 1 / ratio
            print(f"🎉 CONCLUSION: ONNX IS {reverse_ratio:.1f}x SMALLER THAN JOBLIB".upper())

        else:
            # Rare case: both totals are equal
            print(f"🎉 CONCLUSION: ONNX AND JOBLIB ARE THE SAME SIZE")
            print(f"   (ONNX total: {onnx_total:.1f} MB vs. Joblib: {total_joblib:.1f} MB)")

        print(f"{'='*60}")
    else:
        print("\n⚠️  Not enough files for an accurate comparison (need 2 joblib + 10 onnx)")

In [None]:
# Run for both models
all_results = []

for m_type in ["daily", "hourly"]:
    res = process_model(m_type)
    if res:
        all_results.append(res)

# Display nice table
if all_results:
    results_df = pd.DataFrame(all_results).set_index("Model Type")
    display(results_df.style.background_gradient(cmap="Greens")
             .set_caption("🚀 CatBoost vs ONNX Runtime Benchmark<br>1000 predictions - fair loop for multi-output"))
else:
    print("No results")

print()
print()
print()

# display size comparison
compare_model_sizes(
    joblib_daily= "best_daily_model.joblib",
    joblib_hourly= "Best-Hourly-Model-Hyperparams.joblib",
    onnx_folder= "1_output_onnx_models"
)


Processing Model: DAILY
✓ Loaded 'best_daily_model.joblib' and sample input (1, 93)
Detected MultiOutputRegressor → 5 inner CatBoost models (multi-horizon forecast)
Exporting to ONNX using CatBoost native export...
  → Exported: 1_output_onnx_models/daily_horizon_1.onnx
  → Exported: 1_output_onnx_models/daily_horizon_2.onnx
  → Exported: 1_output_onnx_models/daily_horizon_3.onnx
  → Exported: 1_output_onnx_models/daily_horizon_4.onnx
  → Exported: 1_output_onnx_models/daily_horizon_5.onnx
Benchmarking original model...
The slowest run took 4.91 times longer than the fastest. This could mean that an intermediate result is being cached.
53.9 ms ± 21.9 ms per loop (mean ± std. dev. of 10 runs, 30 loops each)
Benchmarking ONNX model(s)...
The slowest run took 4.88 times longer than the fastest. This could mean that an intermediate result is being cached.
50.5 ms ± 41.2 ms per loop (mean ± std. dev. of 10 runs, 30 loops each)

RESULTS for DAILY model (1000 predictions)
O

| Model Type | Original (ms) | ONNX (ms) | Speedup |
| :--- | ---: | ---: | :--- |
| Daily | 53.926 | 50.482 | 1.07x faster |
| Hourly | 25.375 | 54.116 | 0.47x slower |





=== MODEL SIZE COMPARISON (Joblib vs ONNX) ===

Original Joblib                     Size
-------------------------------------------------------
best_daily_model.joblib             → 0.89 MB
Best-Hourly-Model-Hyperparams.joblib→ 1.20 MB
TOTAL (2 JOBLIB FILES)              → 2.09 MB

ONNX files                          Size
-------------------------------------------------------
daily_horizon_1.onnx                → 0.60 MB
daily_horizon_2.onnx                → 0.60 MB
daily_horizon_3.onnx                → 0.60 MB
daily_horizon_4.onnx                → 0.60 MB
daily_horizon_5.onnx                → 0.60 MB
hourly_horizon_1.onnx               → 0.82 MB
hourly_horizon_2.onnx               → 0.82 MB
hourly_horizon_3.onnx               → 0.82 MB
hourly_horizon_4.onnx               → 0.82 MB
hourly_horizon_5.onnx               → 0.82 MB
TOTAL (10 ONNX FILES)               → 7.11 MB

🎉 CONCLUSION: ONNX IS 3.4X LARGER THAN JOBLIB

# Benchmark Analysis: CatBoost vs ONNX Runtime

---

## 1. 🚀 Performance (Speed) Analysis

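Benchmark results for 1,000 predictions. These figures come from a different run than the cell output recorded above, which measured 53.926 ms vs 50.482 ms for Daily and 25.375 ms vs 54.116 ms for Hourly:

| Model Type | Original CatBoost (ms) | ONNX Runtime (ms) | Result |
| :--- | ---: | ---: | :--- |
| **Daily** | 22.812 | 39.606 | **0.58x (slower)** |
| **Hourly** | 21.033 | 54.986 | **0.38x (slower)** |

**Speed Conclusion:** ONNX Runtime delivered no meaningful speedup in either run. For the Hourly model it was roughly 2.1x to 2.6x slower; for the Daily model the result ranged from 0.58x to a statistical tie (1.07x, with standard deviations of 21.9 ms and 41.2 ms on means near 50 ms). A likely explanation is that the native CatBoost C++ execution engine is highly specialized for its own models, whereas ONNX Runtime is general-purpose. On the same CPU, **this outcome is not unusual.**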
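
The speedup figure is simply the original time divided by the ONNX time, so it helps to check the ratio against the measurement noise before reading much into it. The sketch below reuses the means and standard deviations from the recorded cell output; `within_noise` is a hypothetical helper and a crude heuristic, not a statistical test.

```python
def speedup_label(orig_ms: float, onnx_ms: float) -> str:
    """Format the ratio the same way the notebook's results table does."""
    speedup = orig_ms / onnx_ms
    return f"{speedup:.2f}x {'faster' if speedup > 1 else 'slower'}"


def within_noise(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> bool:
    """Crude heuristic (an assumption): the gap between the two means is
    smaller than the larger of the two standard deviations."""
    return abs(mean_a - mean_b) < max(sd_a, sd_b)


# Means taken from the recorded results table
daily_label = speedup_label(53.926, 50.482)
hourly_label = speedup_label(25.375, 54.116)
print(daily_label)   # 1.07x faster
print(hourly_label)  # 0.47x slower

# Daily %timeit lines: 53.9 ms ± 21.9 ms vs 50.5 ms ± 41.2 ms
daily_is_noise = within_noise(53.9, 21.9, 50.5, 41.2)
print(daily_is_noise)  # True: the 3.4 ms gap is far below either std dev
```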

---

## 2. 💾 Size (Storage) Analysis

Total file size comparison:

| File Type | Total Size | Details |
| :--- | ---: | :--- |
| **Joblib (Original)** | **2.09 MB** | 2 models (Daily + Hourly) |
| **ONNX (Converted)** | **7.11 MB** | 10 files (5 horizons * 2 models) |

**Size Conclusion:** The base (non-quantized) `.onnx` exports total **3.4 times** the size of the (often compressed) `.joblib` files (7.11 MB / 2.09 MB ≈ 3.4).

---

## 3. 💡 The Purpose of ONNX

Given that ONNX is both *slower* and *larger*, the question is: **What is the point of using ONNX?**

- The answer is: **INTEROPERABILITY.**

The primary goal of ONNX in this scenario is **not to increase speed** within the same Python environment, but to **solve the deployment problem.**

The benchmark reveals the trade-off:

* **The Cost:** You accept a model that is larger and, on CPU, no faster (and often slower).
* **The Benefit (The Main Point):** You get 10 `.onnx` files that can run wherever an ONNX runtime supporting the exported operators is available, for example:
    * In a server application written in **C#** or **Java** (with no Python/CatBoost installation required).
    * On a mobile app (**Android/iOS**).
    * In a web browser (**ONNX Runtime Web**, the successor to ONNX.js).

**Regarding the 10-file structure:**
The log clearly stated: `Detected MultiOutputRegressor → 5 inner CatBoost models`.
* Your original model (`MultiOutputRegressor`) inherently contains 5 sub-models (for 5 horizons).
* Since you have 2 models (Daily and Hourly), you have a total of `2 * 5 = 10` sub-models.
* The ONNX export process simply "unpacked" these models and saved them as 10 separate files, correctly reflecting the architecture you trained.
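
The naming pattern used during export makes the expected file list easy to reproduce. The following sketch rebuilds the 10 paths from the pattern `{model_type}_horizon_{i+1}.onnx` and reports any that are missing on disk; `expected_onnx_paths` is a hypothetical helper, and the defaults assume the two model types, five horizons, and the output folder used in this notebook.

```python
import os


def expected_onnx_paths(model_types=("daily", "hourly"), n_horizons=5,
                        output_dir="1_output_onnx_models"):
    """Rebuild the export paths: one file per model type and horizon."""
    return [
        os.path.join(output_dir, f"{model_type}_horizon_{i + 1}.onnx")
        for model_type in model_types
        for i in range(n_horizons)
    ]


paths = expected_onnx_paths()
print(len(paths))  # 10 = 2 model types * 5 horizons

# Report any expected file that has not been exported yet
missing = [p for p in paths if not os.path.exists(p)]
print(f"{len(missing)} of {len(paths)} expected files are missing")
```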