In [1]:
!pip install matplotlib openpyxl scikit-learn lightgbm xgboost catboost  --quiet


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [2]:
import pandas as pd
import numpy as np

## **Brief Overview of ONNX**

Deploying deep learning models in products is as important, and as challenging, as researching more accurate and faster models. A major difficulty is converting a model from one framework to another, because each library uses its own functions and data types.

For example, researchers often use PyTorch for experimentation because it is easy to use and popular in the research community, which makes it easier to find references and collaborate. For deployment, however, some tools may support only TensorFlow, which forces a conversion from PyTorch to TensorFlow.

To address this, we need a standard format for both the functions (operations) and the data types involved in the conversion. ONNX provides such a format.

**ONNX**, which stands for **Open Neural Network Exchange**, is an open format that acts as an intermediary: models from various frameworks are converted into the standard ONNX representation, which makes it easy to move between frameworks. Converters exist for many popular frameworks, including Keras, TensorFlow, scikit-learn, PyTorch, and XGBoost.

### ONNX's Mechanism: The Key to Interoperability

ONNX achieves this seamless conversion through three core elements:

1.  **Standardized Graph Representation:** Since each framework has its own unique computational graph representation, ONNX provides a **standard graph**. This graph is expressed using multiple computational **nodes (operators)** that can represent the graphs of all supported frameworks.
2.  **Standardized Data Types:** ONNX offers standard data types, such as $\text{int}8$, $\text{int}16$, $\text{float}16$, and others.
3.  **Standardized Operators (Functions):** ONNX provides a set of standard functions (operators) that can be mapped to their corresponding functions in the target framework. For example, the softmax function in PyTorch will be converted to the corresponding softmax operator in ONNX.

### ONNX Conversion Types

When exporting a model (PyTorch's exporter is the typical example), there are two primary conversion methods:

1.  **Trace-based:** An example input is fed to the model and the model is executed; the operators (functions) used during this run are recorded (traced). Note that if your model is **dynamic** (e.g., it uses different functions depending on the input data), only the path taken for the example input is recorded, so the converted model may be inaccurate for other inputs.
2.  **Script-based:** The model is first compiled into a TorchScript **ScriptModule** (e.g., with `torch.jit.script`), which preserves data-dependent control flow, and is then exported to ONNX.
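To make the tracing pitfall concrete, here is a toy pure-Python illustration. It is not the PyTorch exporter; `Tracer`, `trace`, `double`, `negate`, and `dynamic_model` are hypothetical names. The tracer records the operations executed for one example input and replays exactly those operations later, so a data-dependent branch gets frozen.

```python
from typing import Callable, List


class Tracer:
    """Toy tracer: records every operation the model executes."""

    def __init__(self) -> None:
        self.ops: List[Callable[[float], float]] = []

    def record(self, op: Callable[[float], float], x: float) -> float:
        self.ops.append(op)
        return op(x)


def double(x: float) -> float:
    return 2.0 * x


def negate(x: float) -> float:
    return -x


def dynamic_model(x: float, tracer: Tracer = None) -> float:
    # Data-dependent branch: which operation runs depends on the input value
    op = double if x > 0 else negate
    return tracer.record(op, x) if tracer is not None else op(x)


def trace(model, example_input: float) -> Callable[[float], float]:
    """Run the model once and return a function that replays the recorded ops."""
    tracer = Tracer()
    model(example_input, tracer)
    recorded = list(tracer.ops)

    def replay(x: float) -> float:
        for op in recorded:
            x = op(x)
        return x

    return replay


traced = trace(dynamic_model, example_input=3.0)  # records only `double`
print(dynamic_model(-4.0), traced(-4.0))  # 4.0 -8.0
```

The traced version is correct for positive inputs, which follow the recorded branch, but wrong for negative ones. A script-based export avoids this because the branch itself is captured rather than a single path through it.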


## **When should you use ONNX?**

You should consider converting your model to ONNX if:

1. **You need to deploy the model in a cleaner production environment.**
Suppose you have a more complex scenario with several neural networks, some trained with PyTorch and others with TensorFlow. With ONNX, you do not need to install specific versions of PyTorch or TensorFlow when deploying, because you only need ONNX Runtime. This makes it much easier to combine multiple models.
ONNX is especially useful when:
* Running on high-performance servers
* Running on edge devices (IoT sensors, gateways)
* Deploying in a mobile app
* The system requires real-time predictions

2. **Performance gains**
  ONNX Runtime can run faster than the original frameworks, for example (figures vary widely with the model, hardware, batch size, and execution provider):
* 20–50% faster than TensorFlow
* 10–40% faster than PyTorch
  → This can be especially useful for time-series models such as LSTM, TCN, or Transformer.
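Because such speedups depend so much on the setup, it is worth measuring them on your own model. A minimal latency harness using only the standard library might look like the sketch below; `measure_latency` is a hypothetical helper and the summed range is a stand-in workload, not a model.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(predict: Callable[[], object], warmup: int = 5, runs: int = 50) -> Dict[str, float]:
    """Return the median and approximate 95th-percentile latency (ms) of `predict()`."""
    for _ in range(warmup):
        predict()  # warm-up calls (caches, lazy initialisation) are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        predict()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95_index = int(0.95 * (len(samples) - 1))
    return {"median_ms": statistics.median(samples), "p95_ms": samples[p95_index]}


# Stand-in workload so the sketch runs without any model files
stats = measure_latency(lambda: sum(range(10_000)))
print(stats)
```

To compare backends, wrap each call on the same input batch, e.g. `lambda: model.predict(X)` for the native model and `lambda: sess.run(None, {input_name: X})` for an `onnxruntime.InferenceSession`.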

3. **Using specific execution providers with ONNX Runtime**
   Execution providers let you use hardware accelerators such as NVIDIA TensorRT, Intel OpenVINO, or AMD MIOpen to speed up inference. ONNX Runtime also offers C#, C++, and Java APIs, so you can train in Python and deploy the model in an application written in another language.

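A small sketch of choosing execution providers with a CPU fallback. It assumes the provider strings used by ONNX Runtime (for example `TensorrtExecutionProvider`, `CUDAExecutionProvider`, `OpenVINOExecutionProvider`, `CPUExecutionProvider`); `select_providers` is a hypothetical helper, and the `available` list is an example value rather than the output of a real machine.

```python
from typing import List, Sequence


def select_providers(preferred: Sequence[str], available: Sequence[str]) -> List[str]:
    """Keep the preferred providers that are available, in order, and always end with CPU."""
    chosen = [p for p in preferred if p in available and p != "CPUExecutionProvider"]
    chosen.append("CPUExecutionProvider")  # fallback that every build includes
    return chosen


preferred = ["TensorrtExecutionProvider", "CUDAExecutionProvider", "OpenVINOExecutionProvider"]
available = ["CUDAExecutionProvider", "CPUExecutionProvider"]  # example value
print(select_providers(preferred, available))
# ['CUDAExecutionProvider', 'CPUExecutionProvider']
```

In a real deployment you would pass `onnxruntime.get_available_providers()` as `available` and give the result to `onnxruntime.InferenceSession(path, providers=...)`, which treats the list as a priority order.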
### **When is ONNX not necessary?**

You don't need ONNX if:

1. **You are only analyzing data in a notebook and not deploying**

* Training experimentally
* Running locally
* Only doing EDA or prototyping
  → PyTorch or TensorFlow alone is sufficient.

2. **The model uses many custom operators**
For example:

* Custom PyTorch modules
* Custom TensorFlow layers
  → Converting to ONNX may be difficult or not fully supported.

**Conclusion**

If you want to deploy a model for real-world use, **ONNX is recommended** for faster inference and easier deployment.
If you are only analyzing or experimenting in a notebook, **ONNX is not necessary**.


## Application to this project

In [3]:
df = pd.read_excel("HCMWeatherDaily.xlsx")
df["datetime"] = pd.to_datetime(df["datetime"])
df = df.sort_values("datetime")

In [4]:
def feature_eng(df):
        df = df.copy().sort_values(by = ['datetime'])
        # Drop unneeded columns
        columns_to_drop = [
            'name', 'snow', 'snowdepth', 'stations', 'conditions', 'description', 'severerisk', 'sunset', 'sunrise', 'moonphase', 'precipprob', 'uvindex'
        ]
        df_cleaned = df.drop(columns=columns_to_drop, errors='ignore')

        #create time feature
        df_cleaned['month'] = df_cleaned['datetime'].dt.month
        df_cleaned['dfy'] = df_cleaned['datetime'].dt.dayofyear
        # df_cleaned['year'] = df_cleaned['datetime'].dt.year
        # Encode month and day-of-year as sin/cos so the calendar wraps around (December sits next to January)
        df_cleaned['month_sin'] = np.sin(2 * np.pi * df_cleaned['month'] / 12)
        df_cleaned['month_cos'] = np.cos(2 * np.pi * df_cleaned['month'] / 12)
        df_cleaned['dfy_sin'] = np.sin(2 * np.pi * df_cleaned['dfy'] / 365)
        df_cleaned['dfy_cos'] = np.cos(2 * np.pi * df_cleaned['dfy'] / 365)

        time_df = df_cleaned[['month_sin', 'month_cos', 'dfy_sin', 'dfy_cos']]
        
        # Wind direction as sin/cos, plus the base columns used for the rolling statistics below
        df_cleaned['winddir_sin'] = np.sin(np.deg2rad(df_cleaned['winddir']))
        df_cleaned['winddir_cos'] = np.cos(np.deg2rad(df_cleaned['winddir']))
        rolling_fea = ['winddir_cos', 'winddir_sin', 'dew', 'humidity', 'precip', 'precipcover', 'visibility', 'solarenergy', 'cloudcover', 'windspeed']
        
        # DERIVED FEATURES
        derived = {}
        derived['temp_range_lag1'] = df_cleaned['tempmax'].shift(1) - df_cleaned['tempmin'].shift(1)
        derived['dew_temp_diff'] = df_cleaned['dew'].shift(1) - df_cleaned['temp'].shift(1)
        # cloudcover is a percentage, so convert it to a fraction before computing the clear-sky share
        derived['solar_per_cloud'] = df_cleaned['solarenergy'].shift(1) * (1 - df_cleaned['cloudcover'].shift(1) / 100)
        derived['humid_rad_ratio'] = df_cleaned['humidity'].shift(1)/ (df_cleaned['solarradiation'].shift(1)+1e-6)
        derived['wind_humidity_interaction'] = df_cleaned['humidity'].shift(1) * (df_cleaned['windspeed'].shift(1)) / 100
        derived['temp_humid'] = df_cleaned['temp'].shift(1) * df_cleaned['humidity'].shift(1)
        derived['heat_index'] = df_cleaned['feelslike'].shift(1) - df_cleaned['temp'].shift(1)
        derived['flmax_humid'] = df_cleaned['feelslikemax'].shift(1) * df_cleaned['humidity'].shift(1)/100
        derived['flmin_cloud'] = df_cleaned['feelslikemin'].shift(1) * df_cleaned['cloudcover'].shift(1)/100
        derived['sea_level_pressure_tendency'] = df_cleaned['sealevelpressure'].shift(1) - df_cleaned['sealevelpressure'].shift(6)
        # Combine the lag-1 derived features with the main frame
        derived_df = pd.DataFrame(derived)
        df_cleaned = pd.concat([df_cleaned, derived_df], axis=1)
        
        
        
        #onehot encode
        nominal_cols = ['icon','preciptype']
        df_encoded = pd.get_dummies(df_cleaned[nominal_cols], drop_first = True).astype(int)
        df_no_cat = df_cleaned.drop(columns = nominal_cols, errors = 'ignore')
        # Combine encoded features
        df_cleaned = pd.concat([df_no_cat, df_encoded], axis = 1)
        # print(df_cleaned.columns)
        
        # Season features
        season = {}
        for feature in ['humidity', 'dew', 'precip', 'windspeed']:
            season[f'{feature}_seasonal'] = df_cleaned[feature].shift(1).rolling(3).max() - df_cleaned[feature].shift(1).rolling(7).max()
            season[f'{feature}_trend'] = df_cleaned[feature].shift(1) - df_cleaned[feature].shift(2)
            season[f'{feature}_derivative'] = season[f'{feature}_trend'].shift(1) - season[f'{feature}_trend'].shift(2)

        season_df = pd.DataFrame(season)
        df_cleaned = pd.concat([df_cleaned, season_df], axis = 1)
        
        rolling_columns = {}
        for num in [7, 21, 42, 84, 126, 182]:
            for feature in rolling_fea:
                rolling_columns[f'{num}D_AVG_{feature}'] = df_cleaned[feature].shift(1).rolling(num).mean()
                rolling_columns[f'{num}D_STD_{feature}'] = df_cleaned[feature].shift(1).rolling(num).std()
            for feature in ['icon_partly-cloudy-day', 'icon_rain']:
                rolling_columns[f'{num}D_AVG_{feature}'] = df_cleaned[feature].shift(1).rolling(num).mean()
        rolling_columns_df = pd.DataFrame(rolling_columns)
        df_cleaned = pd.concat([df_cleaned, rolling_columns_df], axis = 1)   

        # df_fe = df_cleaned[['temp'] + time_features]
        full_features = ['temp', 'datetime']  + list(derived_df.columns) + list(time_df.columns) + list(season_df.columns) + list(rolling_columns_df.columns)
        df_fe = df_cleaned[full_features]
    
        # df_fe = df_fe.replace([np.inf, -np.inf], np.nan)
        # df_fe = df_fe.select_dtypes(include=[np.number]).fillna(0)
        # # 3. Drop NaNs 
        # print(f'num of na: {df_fe.isna().sum()}')
        df_fe = df_fe.fillna(0)


        return df_fe
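The feature function relies on two patterns that can be checked in isolation on toy data. The sketch below (made-up temperatures, hypothetical helper `day_of_year_encoding`) shows that `shift(1)` before `rolling(...)` makes each window use only earlier days, so the feature for day t never sees day t's own value, and that the sin/cos encoding places 31 December (day 365) next to 1 January (day 1) rather than at the opposite end of the scale.

```python
import numpy as np
import pandas as pd

# Leakage-free rolling feature: shift first, then roll
temps = pd.Series([27.0, 28.0, 29.0, 30.0, 31.0])
lagged_mean = temps.shift(1).rolling(3).mean()
# Day 3 averages days 0-2 (27, 28, 29); day 4 averages days 1-3 (28, 29, 30);
# the first three days lack a full window and are NaN
print(lagged_mean.tolist())


def day_of_year_encoding(day) -> np.ndarray:
    """Map day-of-year onto the unit circle, as in feature_eng (period 365)."""
    angle = 2 * np.pi * np.asarray(day, dtype=float) / 365
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)


enc = day_of_year_encoding([365, 1, 182])
dist_new_year = np.linalg.norm(enc[0] - enc[1])   # 31 Dec vs 1 Jan: about 0.017
dist_half_year = np.linalg.norm(enc[0] - enc[2])  # 31 Dec vs early July: about 2
print(dist_new_year, dist_half_year)
```

With a raw `dfy` column, days 365 and 1 would be 364 units apart; on the circle they are neighbours, which matches how the weather actually behaves.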
    
    



In [5]:
train_end = pd.Timestamp("2023-06-30")
gap_months = 9
test_start = train_end + pd.DateOffset(months=gap_months)

df_cleaned_fe = feature_eng(df)

# === Prepare data for 5-day forecasting ===
df_multi = df_cleaned_fe.copy()

# Create the targets d, d+1, d+2, d+3, d+4
df_multi['temp_d']   = df_multi['temp']          # today
df_multi['temp_d+1'] = df_multi['temp'].shift(-1)
df_multi['temp_d+2'] = df_multi['temp'].shift(-2)
df_multi['temp_d+3'] = df_multi['temp'].shift(-3)
df_multi['temp_d+4'] = df_multi['temp'].shift(-4)

# Features
X_multi = df_multi.drop(columns=['temp','temp_d','temp_d+1','temp_d+2','temp_d+3','temp_d+4','datetime'])
y_multi = df_multi[['temp_d','temp_d+1','temp_d+2','temp_d+3','temp_d+4']]
dates = df_multi['datetime']

# Split train/test chronologically
X_train_multi = X_multi[dates <= train_end].fillna(0)
y_train_multi = y_multi[dates <= train_end]

X_test_multi  = X_multi[dates >= test_start].fillna(0)
y_test_multi  = y_multi[dates >= test_start]

# Drop rows whose targets contain NaN (the last days, produced by shift(-k)) to avoid XGBoost errors
train_valid_idx = y_train_multi.dropna().index
X_train_multi = X_train_multi.loc[train_valid_idx]
y_train_multi = y_train_multi.loc[train_valid_idx]

test_valid_idx = y_test_multi.dropna().index
X_test_multi = X_test_multi.loc[test_valid_idx]
y_test_multi = y_test_multi.loc[test_valid_idx]
# Before calling convert_all_models_from_notebook(...)

# Convert X dataframes/arrays to float type, ensuring no strings remain
X_train_multi = X_train_multi.astype(np.float32)
X_test_multi = X_test_multi.astype(np.float32)

# The targets are temperatures (a regression task), so keep them as floats:
# casting to int32 would silently truncate values such as 27.9 to 27
y_train_multi = y_train_multi.astype(np.float32)
y_test_multi = y_test_multi.astype(np.float32)
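The target construction above can be checked on a toy series. This sketch (hypothetical helper `make_horizon_targets`, made-up temperatures) builds the same `temp_d` to `temp_d+4` columns with `shift(-k)`, shows why the last four rows have to be dropped, and illustrates how an integer cast truncates a temperature.

```python
import numpy as np
import pandas as pd


def make_horizon_targets(series: pd.Series, horizons=range(5)) -> pd.DataFrame:
    """Column 'temp_d+k' holds the value k steps ahead; 'temp_d' is k = 0."""
    cols = {}
    for k in horizons:
        name = "temp_d" if k == 0 else f"temp_d+{k}"
        cols[name] = series.shift(-k)  # negative shift pulls future values back
    return pd.DataFrame(cols)


temp = pd.Series([27.5, 28.1, 28.9, 29.4, 28.7, 27.9, 28.3], dtype=np.float64)
y = make_horizon_targets(temp)
y_valid = y.dropna()  # the last 4 rows have no d+4 target
print(len(temp), len(y_valid))  # 7 3

# Regression targets must stay floating point: an int cast truncates 28.9 to 28
print(y_valid["temp_d+2"].astype(np.int32).iloc[0], y_valid["temp_d+2"].iloc[0])  # 28 28.9
```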


In [6]:
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error, mean_absolute_percentage_error


base_models = [
    ("XGBoost", XGBRegressor(
        n_estimators=700,
        learning_rate=0.01211,
        max_depth=2,
        min_child_weight=3,
        subsample=0.53156,
        colsample_bytree=0.52213,
        reg_alpha=0.05247,
        reg_lambda=0.00000128,
        random_state=42,
        early_stopping_rounds=100,
    )
    ),
    
    ("CatBoost", CatBoostRegressor(
        iterations=1000, learning_rate=0.05, depth=6, loss_function='RMSE',
        random_seed=42, eval_metric='RMSE', verbose=False
    )),
    
    ("AdaBoost", AdaBoostRegressor(
        n_estimators=50,
        learning_rate=0.02765299922596566,
        random_state=42
    )
    )
]


day_targets = ['temp_d','temp_d+1','temp_d+2','temp_d+3','temp_d+4']
day_labels  = ['Day 0','Day 1','Day 2','Day 3','Day 4']


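# === Training loop over all models and forecast horizons ===
all_results = []

# Hold out the most recent 10% of the training period as a validation set for
# early stopping, so the test set is never used for model selection
# (the 10% share is an arbitrary choice)
n_val = max(1, int(len(X_train_multi) * 0.1))
X_fit, X_val = X_train_multi.iloc[:-n_val], X_train_multi.iloc[-n_val:]

for target, label in zip(day_targets, day_labels):
    y_train_day = y_train_multi[target]
    y_fit, y_val = y_train_day.iloc[:-n_val], y_train_day.iloc[-n_val:]
    y_test_day = y_test_multi[target]

    for name, model in base_models:
        # Fit model (the same estimator object is refit for each horizon)
        if name == "XGBoost":
            model.fit(X_fit, y_fit, eval_set=[(X_val, y_val)], verbose=False)
        elif name == "CatBoost":
            model.fit(X_fit, y_fit, eval_set=(X_val, y_val), use_best_model=True)
        else:
            model.fit(X_train_multi, y_train_day)

        # Predict on the test period and record metrics
        y_pred_test = model.predict(X_test_multi)
        all_results.append({
            "day": label,
            "model": name,
            "R2": r2_score(y_test_day, y_pred_test),
            "RMSE": np.sqrt(mean_squared_error(y_test_day, y_pred_test)),
            "MAE": mean_absolute_error(y_test_day, y_pred_test),
        })

results_df = pd.DataFrame(all_results)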

In [7]:
!pip install onnx --quiet
!pip install onnxruntime --quiet


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [8]:
pip install onnx onnxruntime skl2onnx onnxmltools --quiet


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [17]:
"""
Convert trained ML models to ONNX format for efficient deployment
Supports: XGBoost, CatBoost, LightGBM, RandomForest, AdaBoost, DecisionTree
"""

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor, AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from catboost import CatBoostRegressor
import onnx
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
from onnxmltools.convert import convert_xgboost, convert_lightgbm
from onnxmltools.convert.common.data_types import FloatTensorType as OnnxFloatTensorType
import os
class ModelToONNXConverter:
    """
    Convert various ML models to ONNX format
    """
    
    def __init__(self, n_features, model_save_dir='onnx_models'):
        """
        Initialize converter
        
        Args:
            n_features: Number of input features
            model_save_dir: Directory to save ONNX models
        """
        self.n_features = n_features
        self.model_save_dir = model_save_dir
        
        # Create directory if it doesn't exist
        os.makedirs(model_save_dir, exist_ok=True)
        
    def convert_xgboost(self, model, model_name, target_opset=12):
        """Convert XGBoost model to ONNX"""
        try:
            initial_type = [('float_input', OnnxFloatTensorType([None, self.n_features]))]
            onnx_model = convert_xgboost(
                model,
                initial_types=initial_type,
                target_opset=target_opset
            )
            save_path = os.path.join(self.model_save_dir, f"{model_name}.onnx")
            onnx.save_model(onnx_model, save_path)
            print(f"✅ XGBoost model saved: {save_path}")
            return save_path
        except Exception as e:
            print(f"❌ Error converting XGBoost: {e}")
            return None
        
    
    def convert_lightgbm(self, model, model_name, target_opset=12):
        """Convert LightGBM model to ONNX"""
        try:
            initial_type = [('float_input', OnnxFloatTensorType([None, self.n_features]))]
            onnx_model = convert_lightgbm(
                model,
                initial_types=initial_type,
                target_opset=target_opset
            )
            
            save_path = os.path.join(self.model_save_dir, f"{model_name}.onnx")
            onnx.save_model(onnx_model, save_path)
            print(f"✅ LightGBM model saved: {save_path}")
            return save_path
        except Exception as e:
            print(f"❌ Error converting LightGBM: {e}")
            return None
    
    def convert_catboost(self, model, model_name):
        """Convert CatBoost model to ONNX"""
        try:
            # CatBoost has built-in ONNX export
            save_path = os.path.join(self.model_save_dir, f"{model_name}.onnx")
            model.save_model(
                save_path,
                format="onnx",
                export_parameters={
                    'onnx_domain': 'ai.catboost',
                    'onnx_model_version': 1,
                    'onnx_doc_string': 'CatBoost model for temperature prediction',
                    'onnx_graph_name': 'CatBoostModel'
                }
            )
            print(f"✅ CatBoost model saved: {save_path}")
            return save_path
        except Exception as e:
            print(f"❌ Error converting CatBoost: {e}")
            return None
    
    def convert_sklearn(self, model, model_name, target_opset=12):
        """Convert sklearn-based models (RandomForest, AdaBoost, DecisionTree) to ONNX"""
        try:
            initial_type = [('float_input', FloatTensorType([None, self.n_features]))]
            onnx_model = convert_sklearn(
                model,
                initial_types=initial_type,
                target_opset=target_opset
            )
            
            save_path = os.path.join(self.model_save_dir, f"{model_name}.onnx")
            onnx.save_model(onnx_model, save_path)
            print(f"✅ Sklearn model saved: {save_path}")
            return save_path
        except Exception as e:
            print(f"❌ Error converting Sklearn model: {e}")
            return None
    
    def convert_model(self, model, model_name, model_type):
        """
        Universal converter that routes to appropriate conversion method
        
        Args:
            model: Trained model object
            model_name: Name for saving the model
            model_type: Type of model ('xgboost', 'lightgbm', 'catboost', 'sklearn')
        
        Returns:
            Path to saved ONNX model or None if failed
        """
        model_type = model_type.lower()
        
        if model_type == 'xgboost':
            return self.convert_xgboost(model, model_name)
        elif model_type == 'lightgbm':
            return self.convert_lightgbm(model, model_name)
        elif model_type == 'catboost':
            return self.convert_catboost(model, model_name)
        elif model_type in ['sklearn', 'randomforest', 'adaboost', 'decisiontree']:
            return self.convert_sklearn(model, model_name)
        else:
            print(f"❌ Unknown model type: {model_type}")
            return None
    
    def verify_onnx_model(self, onnx_path, X_test_sample):
        """
        Verify ONNX model by running inference
        
        Args:
            onnx_path: Path to ONNX model
            X_test_sample: Sample data for testing (numpy array)
        
        Returns:
            Prediction results or None if failed
        """
        try:
            # Load ONNX model
            ort_session = ort.InferenceSession(onnx_path)
            
            # Get input name
            input_name = ort_session.get_inputs()[0].name
            
            # Run inference
            ort_inputs = {input_name: X_test_sample.astype(np.float32)}
            ort_outputs = ort_session.run(None, ort_inputs)
            
            print(f"✅ ONNX model verified: {onnx_path}")
            print(f"   Output shape: {ort_outputs[0].shape}")
            return ort_outputs[0]
        except Exception as e:
            print(f"❌ Error verifying ONNX model: {e}")
            return None


# ============================================================================
# EXAMPLE USAGE WITH YOUR NOTEBOOK'S MODELS
# ============================================================================

def convert_all_models_from_notebook(X_train_multi, y_train_multi, X_test_multi, y_test_multi):
    """
    Train and convert all models from your notebook to ONNX
    """
    
    # Initialize converter
    n_features = X_train_multi.shape[1]
    converter = ModelToONNXConverter(n_features=n_features, model_save_dir='onnx_models')
    
    # Define models (note: these hyperparameters differ from those in the training cell above)
    base_models = [
        ("XGBoost", XGBRegressor(
            n_estimators=600, learning_rate=0.05, max_depth=3, min_child_weight=5,
            subsample=0.8, colsample_bytree=0.8, reg_alpha=0.1, reg_lambda=1.0,
            random_state=42
        ), "xgboost"),
        
        ("CatBoost", CatBoostRegressor(
            iterations=1000, learning_rate=0.05, depth=6, loss_function='RMSE',
            random_seed=42, eval_metric='RMSE', verbose=False
        ), "catboost"),
        
        ("AdaBoost", AdaBoostRegressor(
            n_estimators=400, learning_rate=0.05, random_state=42
        ), "sklearn"),
        
        ("RandomForest", RandomForestRegressor(
            n_estimators=200, max_depth=10, min_samples_split=22,
            min_samples_leaf=9, random_state=42, n_jobs=-1
        ), "sklearn"),
        
        ("DecisionTree", DecisionTreeRegressor(
            max_depth=6, min_samples_split=20, min_samples_leaf=10,
            random_state=42
        ), "sklearn")
    ]
    
    # Target columns for 5-day prediction
    day_targets = ['temp_d', 'temp_d+1', 'temp_d+2', 'temp_d+3', 'temp_d+4']
    day_labels = ['Day0', 'Day1', 'Day2', 'Day3', 'Day4']
    
    converted_models = []
    
    print("\n" + "="*70)
    print("üöÄ STARTING MODEL TRAINING AND ONNX CONVERSION")
    print("="*70)
    
    # Loop through each day prediction
    for i, (target, day_label) in enumerate(zip(day_targets, day_labels)):
        print(f"\nüìÖ Processing {day_label} ({target})")
        print("-" * 70)
        
        y_train_day = y_train_multi[target]
        y_test_day = y_test_multi[target]
        
        # Loop through each model
        for model_name, model, model_type in base_models:
            print(f"\nüîß Training {model_name} for {day_label}...")
            
            # Train model
            if model_type == "xgboost":
                model.fit(X_train_multi, y_train_day, verbose=False)
            elif model_type == "catboost":
                model.fit(X_train_multi, y_train_day, verbose=False)
            else:
                model.fit(X_train_multi, y_train_day)
            
            # Convert to ONNX
            onnx_name = f"{model_name}_{day_label}"
            onnx_path = converter.convert_model(model, onnx_name, model_type)
            
            if onnx_path:
                # Verify conversion
                X_sample = X_test_multi[:5].values if hasattr(X_test_multi, 'values') else X_test_multi[:5]
                converter.verify_onnx_model(onnx_path, X_sample)
                
                converted_models.append({
                    'day': day_label,
                    'model': model_name,
                    'onnx_path': onnx_path,
                    'status': 'success'
                })
            else:
                converted_models.append({
                    'day': day_label,
                    'model': model_name,
                    'onnx_path': None,
                    'status': 'failed'
                })
    
    # Summary
    print("\n" + "="*70)
    print("üìä CONVERSION SUMMARY")
    print("="*70)
    
    df_summary = pd.DataFrame(converted_models)
    print(df_summary)
    
    success_count = df_summary[df_summary['status'] == 'success'].shape[0]
    total_count = df_summary.shape[0]
    print(f"\n✅ Successfully converted: {success_count}/{total_count} models")
    
    return converted_models, df_summary


# ============================================================================
# STANDALONE CONVERSION FUNCTION (if you already have trained models)
# ============================================================================

def convert_single_model_to_onnx(model, model_name, model_type, n_features, save_dir='onnx_models'):
    """
    Convert a single trained model to ONNX
    
    Args:
        model: Trained model object
        model_name: Name for the ONNX file
        model_type: 'xgboost', 'lightgbm', 'catboost', or 'sklearn'
        n_features: Number of input features
        save_dir: Directory to save the model
    
    Returns:
        Path to saved ONNX model
    """
    converter = ModelToONNXConverter(n_features=n_features, model_save_dir=save_dir)
    return converter.convert_model(model, model_name, model_type)


# ============================================================================
# USAGE EXAMPLE
# ============================================================================

if __name__ == "__main__":
    """
    To use this script with your notebook data:
    
    1. Make sure you have the required packages installed:
       pip install onnx onnxruntime skl2onnx onnxmltools
    
    2. Load your data (from your notebook):
       - X_train_multi, y_train_multi, X_test_multi, y_test_multi
    
    3. Run the conversion:
       converted_models, summary = convert_all_models_from_notebook(
           X_train_multi, y_train_multi, X_test_multi, y_test_multi
       )
    
    4. Your ONNX models will be saved in the 'onnx_models' directory
    """
    
    print("üìñ ONNX Model Converter Ready!")
    print("\nTo convert your models, call:")
    print("convert_all_models_from_notebook(X_train_multi, y_train_multi, X_test_multi, y_test_multi)")

📖 ONNX Model Converter Ready!

To convert your models, call:
convert_all_models_from_notebook(X_train_multi, y_train_multi, X_test_multi, y_test_multi)
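`verify_onnx_model` confirms that inference runs and reports the output shape, but it does not check that the ONNX predictions match the original model. A minimal numeric check, assuming you compare against the native `model.predict` on the same rows, could look like the sketch below. `check_onnx_parity` and its tolerances are hypothetical choices: converted tree ensembles typically compute in float32, so small differences from a float64 native prediction are expected. Flattening both arrays also sidesteps the `(5, 1)` versus `(n,)` shape difference seen in the log.

```python
from typing import Tuple

import numpy as np


def check_onnx_parity(native_pred, onnx_pred, atol: float = 1e-3, rtol: float = 1e-5) -> Tuple[bool, float]:
    """Return (all values close?, max absolute difference) after flattening both inputs."""
    native = np.asarray(native_pred, dtype=np.float64).ravel()
    converted = np.asarray(onnx_pred, dtype=np.float64).ravel()
    if native.shape != converted.shape:
        raise ValueError(f"shape mismatch: {native.shape} vs {converted.shape}")
    max_abs_diff = float(np.max(np.abs(native - converted))) if native.size else 0.0
    return bool(np.allclose(native, converted, rtol=rtol, atol=atol)), max_abs_diff


# Synthetic example: an (n, 1) float32 ONNX output against an (n,) native prediction
ok, max_diff = check_onnx_parity([1.0, 2.0, 3.0], np.array([[1.0], [2.0], [3.0004]], dtype=np.float32))
print(ok, max_diff)
```

In the notebook this would be called as `check_onnx_parity(model.predict(X_sample), onnx_out)`, where `onnx_out` is the array returned by `converter.verify_onnx_model(onnx_path, X_sample)` (check that it is not `None` first).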


In [20]:
convert_all_models_from_notebook(X_train_multi, y_train_multi, X_test_multi, y_test_multi)


🚀 STARTING MODEL TRAINING AND ONNX CONVERSION

📅 Processing Day0 (temp_d)
----------------------------------------------------------------------

🔧 Training XGBoost for Day0...
❌ Error converting XGBoost: could not convert string to float: '[2.7926846E1]'

🔧 Training CatBoost for Day0...
✅ CatBoost model saved: onnx_models/CatBoost_Day0.onnx
✅ ONNX model verified: onnx_models/CatBoost_Day0.onnx
   Output shape: (5, 1)

🔧 Training AdaBoost for Day0...
2025-11-17 15:28:37.159975590 [W:onnxruntime:, execution_frame.cc:874 VerifyOutputSizes] Expected shape from model of {-1} does not match actual shape of {5,1} for output predictions
✅ Sklearn model saved: onnx_models/AdaBoost_Day0.onnx
✅ ONNX model verified: onnx_models/AdaBoost_Day0.onnx
   Output shape: (5, 1)

🔧 Training RandomForest for Day0...
✅ Sklearn model saved: onnx_models/RandomForest_Day0.onnx
✅ ONNX model verified: onnx_models/RandomForest_Day0.onnx
   Output shape: (5, 1)

🔧 Trai

([{'day': 'Day0', 'model': 'XGBoost', 'onnx_path': None, 'status': 'failed'},
  {'day': 'Day0',
   'model': 'CatBoost',
   'onnx_path': 'onnx_models/CatBoost_Day0.onnx',
   'status': 'success'},
  {'day': 'Day0',
   'model': 'AdaBoost',
   'onnx_path': 'onnx_models/AdaBoost_Day0.onnx',
   'status': 'success'},
  {'day': 'Day0',
   'model': 'RandomForest',
   'onnx_path': 'onnx_models/RandomForest_Day0.onnx',
   'status': 'success'},
  {'day': 'Day0',
   'model': 'DecisionTree',
   'onnx_path': 'onnx_models/DecisionTree_Day0.onnx',
   'status': 'success'},
  {'day': 'Day1', 'model': 'XGBoost', 'onnx_path': None, 'status': 'failed'},
  {'day': 'Day1',
   'model': 'CatBoost',
   'onnx_path': 'onnx_models/CatBoost_Day1.onnx',
   'status': 'success'},
  {'day': 'Day1',
   'model': 'AdaBoost',
   'onnx_path': 'onnx_models/AdaBoost_Day1.onnx',
   'status': 'success'},
  {'day': 'Day1',
   'model': 'RandomForest',
   'onnx_path': 'onnx_models/RandomForest_Day1.onnx',
   'status': 'success'},
 

NOTICE: Converting the XGBoost models fails with the error `could not convert string to float: '[2.7926846E1]'`. We tried to fix this inside the XGBoost library, even editing the line that raised the error, but it still did not work. We therefore let the other models run and saved their ONNX files in the `onnx_models` folder.
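The bracketed value in the error message suggests that the converter calls `float()` on a field that is stored as a one-element list. A plausible culprit, not verified in this notebook, is XGBoost's `base_score`: recent XGBoost versions can estimate it from the training target (27.9 looks like an average daily temperature), and newer releases appear to serialize it in square brackets. Workarounds worth trying are pinning an older XGBoost release or upgrading onnxmltools. The sketch below only shows how to inspect the stored value and parse either format; `parse_base_score` is a hypothetical helper and the JSON fragment is illustrative.

```python
import json


def parse_base_score(raw: str) -> float:
    """Parse base_score stored either as '2.79E1' or as a one-element list '[2.79E1]'."""
    text = raw.strip()
    if text.startswith("[") and text.endswith("]"):
        parts = [p for p in text[1:-1].split(",") if p.strip()]
        if len(parts) != 1:
            raise ValueError(f"expected a single base_score, got {raw!r}")
        return float(parts[0])
    return float(text)


# Illustrative config fragment; for a fitted model, use:
#   config = json.loads(model.get_booster().save_config())
config = json.loads('{"learner": {"learner_model_param": {"base_score": "[2.7926846E1]"}}}')
raw = config["learner"]["learner_model_param"]["base_score"]

try:
    float(raw)  # what a converter expecting a plain number would do
except ValueError as err:
    print("float() fails:", err)

print(parse_base_score(raw))
```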