
## Payment Delay Training 

This notebook trains a model that predicts payment delays. We load the prepared data, handle missing values, and encode categorical variables. We then tune an XGBoost regressor with Optuna: each trial fits a model with a different set of hyperparameters, and we compare the trials on several metrics to select the parameters for the final model.

### Install and Import Packages
The following cell installs the additional packages this notebook requires and then restarts the Python process so that they can be imported. Installing them explicitly in the notebook helps make the results reproducible.

In [0]:
%pip install xgboost
%pip install optuna
%pip install optuna-dashboard
%restart_python

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.


In [0]:
import mlflow
from mlflow.models import infer_signature
from mlflow.client import MlflowClient
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_percentage_error
import optuna
import optuna.visualization as ov
from sklearn.model_selection import train_test_split
from pathlib import Path
import pandas as pd



### Setup Spark Session and consume data product
To isolate the data assets we create, we work in a dedicated catalog and schema within Databricks. Replace `<CATALOG_NAME>` and `<SCHEMA_NAME>` with the values that match our use case and group. You can find the correct names in the **Unity Catalog** by looking for the patterns `uc_XXX` (catalog) and `grpX` (schema).

Please note: We adapted the code here to match our use case, so some of the lines are commented out. They are not needed for this run, but they can be useful for future applications.

In [0]:
%sql
-- CREATE CATALOG IF NOT EXISTS <CATALOG_NAME>;
SET CATALOG uc_delayed_payment;
-- CREATE SCHEMA IF NOT EXISTS <SCHEMA_NAME>;
USE SCHEMA grp1;

### Prepare Data
Replace `<TABLE_NAME>` with the name of the table that we created with the data preparation notebook. Additionally, set `<SEED_PARAMETER>` to a fixed integer so that the 25% sample drawn in the next cell is reproducible.

In [0]:
data = (spark.read.table("prepared_accounting_document").
    where(col("delay").isNotNull()).
    drop("ClearingDate", "NetDueDate").
    sample(0.25, seed=42))

In [0]:
mlflow.set_tracking_uri("databricks")
mlflow.set_registry_uri("databricks-uc")

### Create Training Data
In the following step we create a train-test split with the column `delay` as the target. We also check the data types and adjust them where needed. Once the data is ready, we can continue with the actual model training.

Please adjust the code by setting the values for `<SPLIT>`. We aim for an 80% / 20% train-test split. Also, set `<SEED_PARAMETER>` to a fixed integer so that the split is reproducible.

In [0]:
train_df, test_df = data.drop("CompanyCode", "AccountingDocument", "FiscalYear", "AccountingDocumentItem").randomSplit([0.8, 0.2], seed=42)
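
A note on the split sizes: as far as the Spark documentation describes it, `randomSplit` treats the weights as per-row probabilities, so the result is approximately 80% / 20% rather than an exact cut. The following is a minimal plain-Python sketch of that idea, not Spark's implementation. The helper `bernoulli_split` and the row count of 10,000 are assumptions made for illustration.

```python
import random


def bernoulli_split(n_rows, train_weight=0.8, seed=42):
    """Assign each row index to train or test with an independent random draw."""
    rng = random.Random(seed)
    train, test = [], []
    for row in range(n_rows):
        # Each row lands in the training set with probability `train_weight`
        (train if rng.random() < train_weight else test).append(row)
    return train, test


train_rows, test_rows = bernoulli_split(10_000)
train_share = len(train_rows) / 10_000
print(f"train share: {train_share:.3f}")  # close to 0.8, but usually not exactly 0.8
```

Fixing the seed makes the assignment repeatable, which is why the notebook asks for a fixed `<SEED_PARAMETER>`.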

The next cell converts the Spark DataFrames `train_df` and `test_df` into Pandas DataFrames.
It separates the target variable "delay" from the features for both training and testing datasets.

In [0]:
train_target = train_df.select("delay").toPandas()
train_data = train_df.drop("delay").toPandas()
test_target = test_df.select("delay").toPandas()
test_data = test_df.drop("delay").toPandas()

The function `infer_column_dtype` returns one of the strings `numeric`, `datetime`, or `boolean` depending on the content of the column, and falls back to `string` if none of them applies. Please adjust the return statements accordingly by replacing `<DTYPE>`.

In [0]:
def infer_column_dtype(series):
    values = series.dropna()

    # Try to convert to numeric
    try:
        pd.to_numeric(values)
        return 'numeric'
    except (ValueError, TypeError):
        pass

    # Try to convert to datetime
    # (infer_datetime_format is deprecated since pandas 2.0, where strict format inference is the default)
    try:
        pd.to_datetime(values, errors='raise')
        return 'datetime'
    except (ValueError, TypeError, OverflowError):
        pass

    # Check whether all unique values look like booleans ('true' / 'false' / '1' / '0')
    lower_vals = set(str(v).strip().lower() for v in values.unique())
    if lower_vals <= {'true', 'false', '1', '0'}:
        return 'boolean'

    return 'string'
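
The order of the checks in `infer_column_dtype` matters: a column is tested as numeric first, then as datetime, and only then as boolean. The sketch below illustrates one consequence, namely that a column holding the strings `"0"` and `"1"` already parses as numeric and therefore never reaches the boolean check. The helper `parses_as` and the sample values are assumptions made for this illustration.

```python
import warnings

import pandas as pd


def parses_as(series, parser):
    """Return True if `parser` accepts every non-null value of `series`."""
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # silence format-inference warnings
            parser(series.dropna())
        return True
    except (ValueError, TypeError):
        return False


# Hypothetical sample columns
digits = pd.Series(["0", "1", "1", None], dtype=object)
dates = pd.Series(["2024-01-31", "2024-02-29"], dtype=object)

digits_numeric = parses_as(digits, pd.to_numeric)    # flag-like digits count as numeric
dates_numeric = parses_as(dates, pd.to_numeric)      # date strings are not numeric
dates_datetime = parses_as(dates, pd.to_datetime)    # but they do parse as datetimes
print(digits_numeric, dates_numeric, dates_datetime)
```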


In [0]:
def convert_columns(df):
    # Cast every column of df to its inferred dtype (modifies df in place).
    # The loop variable is named `column` so that it does not overwrite the
    # imported pyspark function `col`.
    for column in df.columns:
        inferred = infer_column_dtype(df[column])
        if inferred == 'numeric':
            df[column] = pd.to_numeric(df[column], errors='coerce')
        elif inferred == 'datetime':
            df[column] = pd.to_datetime(df[column], errors='coerce')
        elif inferred == 'boolean':
            # astype('bool') alone would turn every non-empty string, including
            # 'false' and '0', into True. Map the text explicitly instead;
            # missing values become False.
            df[column] = df[column].map(
                lambda v: str(v).strip().lower() in ('true', '1')
            ).astype('bool')
        else:
            df[column] = df[column].astype('category')

# apply to train data
convert_columns(train_data)

# apply to test data
convert_columns(test_data)
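
One caveat worth keeping in mind when casting text flags to booleans: on an object column, pandas' `astype("bool")` follows Python truthiness, so every non-empty string becomes `True`. The following minimal sketch contrasts that with an explicit mapping. The sample values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical flag column stored as text
flags = pd.Series(["true", "false", "False", "1", "0"], dtype=object)

# Truthiness-based cast: every non-empty string counts as True
naive = flags.astype("bool")

# Explicit mapping: only 'true' and '1' (case-insensitive) count as True
mapped = flags.map(lambda v: str(v).strip().lower() in {"true", "1"})

print(naive.tolist())   # [True, True, True, True, True]
print(mapped.tolist())  # [True, False, False, True, False]
```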

## Model Training
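The function `model_training` is the objective function that Optuna calls once per trial. It performs the following steps:

1. Receive the training data and convert unsupported data types
2. Set the model parameters suggested by the trial
3. Create a train-validation split
4. Train the model
5. Evaluate the model's performance on the validation dataset
6. Return the validation MSE, which Optuna minimizes across trials to fine-tune the model parameters

Running the code may take a while: in the run logged below, the first 28 trials took about 24 minutes.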
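
The evaluation step relies on three metrics: mean squared error (MSE), mean absolute percentage error (MAPE), and the coefficient of determination (R²). The sketch below recomputes them by hand on three made-up delay values to show what each one measures. The function names and the sample numbers are assumptions made for illustration; the notebook itself uses the scikit-learn implementations.

```python
def mse(y_true, y_pred):
    """Mean of the squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean of |error| / |true value|, as a fraction (not multiplied by 100)."""
    # Undefined when a true value is exactly 0
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical true and predicted delays
delays_true = [10.0, 20.0, 30.0]
delays_pred = [12.0, 18.0, 33.0]

mse_value = mse(delays_true, delays_pred)    # (4 + 4 + 9) / 3
mape_value = mape(delays_true, delays_pred)  # (0.2 + 0.1 + 0.1) / 3
r2_value = r2(delays_true, delays_pred)      # 1 - 17 / 200
print(mse_value, mape_value, r2_value)
```

Two points may help when reading the results. First, the MSE is in squared units of `delay`, so the best value among the trials shown in the log below (about 9.33) corresponds to a root mean squared error of roughly 3.05. Second, if `delay` can be 0, MAPE should be read with care: scikit-learn's `mean_absolute_percentage_error` avoids the division by zero with a tiny denominator, which can produce very large values.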

In [0]:
mlflow.xgboost.autolog(log_input_examples=True)
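
The training function in the next cell replaces categorical columns with their integer codes. The following sketch shows how pandas assigns those codes, and why the same label can receive different codes in two DataFrames that were converted separately. The country codes are made-up sample values, and sharing a `pd.CategoricalDtype` is one possible way to align the codes, not something the notebook does.

```python
import pandas as pd

# Hypothetical categorical column in the training and test data
train_col = pd.Series(["DE", "FR", "US", None], dtype=object).astype("category")
test_col = pd.Series(["FR", "US"], dtype=object).astype("category")

# Codes index into the sorted categories; missing values get -1
print(train_col.cat.codes.tolist())  # [0, 1, 2, -1]
# Converted on its own, the test column numbers 'FR' as 0 instead of 1
print(test_col.cat.codes.tolist())   # [0, 1]

# Reusing the training categories keeps the codes consistent
shared = pd.CategoricalDtype(categories=train_col.cat.categories)
aligned = pd.Series(["FR", "US"], dtype=object).astype(shared)
print(aligned.cat.codes.tolist())    # [1, 2]
```

This matters once the tuned model is scored on `test_data`, because the codes the model saw during training should mean the same thing at prediction time.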



In [0]:
def model_training(trial: optuna.trial.Trial, X: pd.DataFrame, y: pd.Series) -> float:
    # Convert data types that XGBoost cannot consume directly
    X = X.copy()
    for column in X.select_dtypes(include=['category', 'datetime', 'datetimetz']).columns:
        if X[column].dtype.name == 'category':
            # Integer codes; missing values become -1
            X[column] = X[column].cat.codes
        else:
            # Timestamps (timezone-naive or timezone-aware) as integers since the epoch
            X[column] = X[column].astype('int64')

    # Hyperparameters: fixed settings plus the values suggested for this trial
    params = {
        "booster": "gbtree",
        "objective": "reg:squarederror",
        "tree_method": "hist",
        "max_depth": trial.suggest_int("max_depth", 3, 10),
        "learning_rate": trial.suggest_float("learning_rate", 1e-2, 3e-1, log=True),
        "n_estimators": trial.suggest_int("n_estimators", 200, 1500),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
        "gamma": trial.suggest_float("gamma", 0.0, 5.0),
        "min_child_weight": trial.suggest_float("min_child_weight", 1.0, 10.0),
        "reg_alpha": trial.suggest_float("reg_alpha", 0.0, 1.0),
        "reg_lambda": trial.suggest_float("reg_lambda", 0.0, 1.0),
        "random_state": 42,
    }

    # Hold out 20% of the training data for early stopping and scoring
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    model = XGBRegressor(**params, enable_categorical=True, n_jobs=-1, early_stopping_rounds=10)
    model.fit(
        X_train,
        y_train,
        eval_set=[(X_valid, y_valid)],
        verbose=False
    )
    preds = model.predict(X_valid)
    mse_score = mean_squared_error(y_valid, preds)
    mape_score = mean_absolute_percentage_error(y_valid, preds)
    r2_metric = r2_score(y_valid, preds)

    # Keep the secondary metrics with the trial so they can be compared later
    trial.set_user_attr("mape", float(mape_score))
    trial.set_user_attr("r2", float(r2_metric))

    # Optuna minimizes the validation MSE
    return mse_score

mlflow.set_registry_uri("databricks-uc")

# Note: a pruner only takes effect if the objective calls trial.report() and
# trial.should_prune(). model_training does neither, so no trial is pruned.
study = optuna.create_study(
    direction="minimize",
    sampler=optuna.samplers.TPESampler(seed=42),
    pruner=optuna.pruners.MedianPruner(n_warmup_steps=5),
)
study.optimize(lambda trial: model_training(trial, train_data, train_target["delay"]), n_trials=50)

[I 2025-07-08 06:31:42,887] A new study created in memory with name: no-name-c050be5b-9622-4272-883a-71be9372e000
2025/07/08 06:31:43 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'f1323ffa03a9497983275a530e4731dc', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


Uploading artifacts:   0%|          | 0/11 [00:00<?, ?it/s]

2025/07/08 06:31:53 INFO mlflow.store.artifact.cloud_artifact_repo: The progress bar can be disabled by setting the environment variable MLFLOW_ENABLE_ARTIFACTS_PROGRESS_BAR to false
[I 2025-07-08 06:31:59,150] Trial 0 finished with value: 44.72896889231868 and parameters: {'max_depth': 5, 'learning_rate': 0.2536999076681772, 'n_estimators': 1152, 'subsample': 0.7993292420985183, 'colsample_bytree': 0.5780093202212182, 'gamma': 0.7799726016810132, 'min_child_weight': 1.5227525095137953, 'reg_alpha': 0.8661761457749352, 'reg_lambda': 0.6011150117432088}. Best is trial 0 with value: 44.72896889231868.
2025/07/08 06:31:59 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '44bf46dd8ea8476b88110157dbe3d8ea', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:33:33,593] Trial 1 finished with value: 11.98869495222585 and parameters: {'max_depth': 8, 'learning_rate': 0.010725209743171997, 'n_estimators': 1461, 'subsample': 0.9162213204002109, 'colsample_bytree': 0.6061695553391381, 'gamma': 0.9091248360355031, 'min_child_weight': 2.650640588680904, 'reg_alpha': 0.3042422429595377, 'reg_lambda': 0.5247564316322378}. Best is trial 1 with value: 11.98869495222585.
2025/07/08 06:33:34 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '2f9ff31a2cb14b059786e614f067f109', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:34:24,560] Trial 2 finished with value: 34.326006405149315 and parameters: {'max_depth': 6, 'learning_rate': 0.02692655251486473, 'n_estimators': 996, 'subsample': 0.569746930326021, 'colsample_bytree': 0.6460723242676091, 'gamma': 1.8318092164684585, 'min_child_weight': 5.104629857953324, 'reg_alpha': 0.7851759613930136, 'reg_lambda': 0.19967378215835974}. Best is trial 1 with value: 11.98869495222585.
2025/07/08 06:34:25 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '6b20a961470d4b8ebd684b3996f8e826', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:34:51,456] Trial 3 finished with value: 23.157061119395447 and parameters: {'max_depth': 7, 'learning_rate': 0.07500118950416987, 'n_estimators': 260, 'subsample': 0.8037724259507192, 'colsample_bytree': 0.5852620618436457, 'gamma': 0.3252579649263976, 'min_child_weight': 9.539969835279999, 'reg_alpha': 0.9656320330745594, 'reg_lambda': 0.8083973481164611}. Best is trial 1 with value: 11.98869495222585.
2025/07/08 06:34:52 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '82bdae2d64914cfc8a2755dd0bafcf8c', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:35:55,925] Trial 4 finished with value: 55.092616006660776 and parameters: {'max_depth': 5, 'learning_rate': 0.013940346079873234, 'n_estimators': 1090, 'subsample': 0.7200762468698007, 'colsample_bytree': 0.5610191174223894, 'gamma': 2.475884550556351, 'min_child_weight': 1.3094966900369656, 'reg_alpha': 0.9093204020787821, 'reg_lambda': 0.2587799816000169}. Best is trial 1 with value: 11.98869495222585.
2025/07/08 06:35:56 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '5119ffe764db4f658ded13666781ad83', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:36:45,394] Trial 5 finished with value: 17.029137220252917 and parameters: {'max_depth': 8, 'learning_rate': 0.028869220380495747, 'n_estimators': 876, 'subsample': 0.7733551396716398, 'colsample_bytree': 0.5924272277627636, 'gamma': 4.847923138822793, 'min_child_weight': 7.976195410250031, 'reg_alpha': 0.9394989415641891, 'reg_lambda': 0.8948273504276488}. Best is trial 1 with value: 11.98869495222585.
2025/07/08 06:36:45 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'ff4d6795a7d84e63a7347fb925dfa9e8', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:37:00,980] Trial 6 finished with value: 30.854916291248383 and parameters: {'max_depth': 7, 'learning_rate': 0.22999586428143728, 'n_estimators': 315, 'subsample': 0.5979914312095727, 'colsample_bytree': 0.522613644455269, 'gamma': 1.6266516538163218, 'min_child_weight': 4.498095607205338, 'reg_alpha': 0.2713490317738959, 'reg_lambda': 0.8287375091519293}. Best is trial 1 with value: 11.98869495222585.
2025/07/08 06:37:01 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'e2a7725c35f84dd0921c213260888060', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:37:44,531] Trial 7 finished with value: 68.5464131360344 and parameters: {'max_depth': 5, 'learning_rate': 0.026000059117302653, 'n_estimators': 906, 'subsample': 0.5704621124873813, 'colsample_bytree': 0.9010984903770198, 'gamma': 0.3727532183988541, 'min_child_weight': 9.881982429404655, 'reg_alpha': 0.7722447692966574, 'reg_lambda': 0.1987156815341724}. Best is trial 1 with value: 11.98869495222585.
2025/07/08 06:37:45 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'e5042edd24ab412887c43ec11561db66', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:38:08,858] Trial 8 finished with value: 58.01445705352781 and parameters: {'max_depth': 3, 'learning_rate': 0.1601531217136121, 'n_estimators': 1119, 'subsample': 0.8645035840204937, 'colsample_bytree': 0.8856351733429728, 'gamma': 0.3702232586704518, 'min_child_weight': 4.226191556898454, 'reg_alpha': 0.11586905952512971, 'reg_lambda': 0.8631034258755935}. Best is trial 1 with value: 11.98869495222585.
2025/07/08 06:38:09 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '83622f7c8e924b60bea14cad199b000b', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:38:48,857] Trial 9 finished with value: 24.957522318067685 and parameters: {'max_depth': 7, 'learning_rate': 0.030816017044468066, 'n_estimators': 282, 'subsample': 0.6554911608578311, 'colsample_bytree': 0.6625916610133735, 'gamma': 3.64803089169032, 'min_child_weight': 6.738017242196918, 'reg_alpha': 0.8872127425763265, 'reg_lambda': 0.4722149251619493}. Best is trial 1 with value: 11.98869495222585.
2025/07/08 06:38:49 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '76b537219bf04e79a6ca933930a3400d', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:40:05,978] Trial 10 finished with value: 11.560412789149192 and parameters: {'max_depth': 10, 'learning_rate': 0.010206070557576998, 'n_estimators': 1461, 'subsample': 0.9762904054253753, 'colsample_bytree': 0.7728591400411077, 'gamma': 3.451235367845725, 'min_child_weight': 3.2989911288867892, 'reg_alpha': 0.4040260720186447, 'reg_lambda': 0.4792951727359478}. Best is trial 10 with value: 11.560412789149192.
2025/07/08 06:40:06 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'b0ac5bdf52c54aee9b85190dcb58deae', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:41:30,114] Trial 11 finished with value: 11.262265516712958 and parameters: {'max_depth': 10, 'learning_rate': 0.010290509463842875, 'n_estimators': 1458, 'subsample': 0.9856075751261573, 'colsample_bytree': 0.7692492775881493, 'gamma': 3.513496163789992, 'min_child_weight': 2.7994580130099718, 'reg_alpha': 0.4635918553211216, 'reg_lambda': 0.4834189706494368}. Best is trial 11 with value: 11.262265516712958.
2025/07/08 06:41:30 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '2152f112193d469a99560bad90cf391a', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:42:29,966] Trial 12 finished with value: 12.231520680552569 and parameters: {'max_depth': 10, 'learning_rate': 0.016090378457491377, 'n_estimators': 1478, 'subsample': 0.9993825915423253, 'colsample_bytree': 0.773494291083593, 'gamma': 3.588729908849948, 'min_child_weight': 2.992087886377533, 'reg_alpha': 0.5615195711001008, 'reg_lambda': 0.39452773428750565}. Best is trial 11 with value: 11.262265516712958.
2025/07/08 06:42:30 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '4e763caa001c48bf925e301876eb62bd', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:42:56,628] Trial 13 finished with value: 13.916697778512427 and parameters: {'max_depth': 10, 'learning_rate': 0.06372126605598062, 'n_estimators': 609, 'subsample': 0.9996812193207756, 'colsample_bytree': 0.7783018090268712, 'gamma': 3.5857872450314563, 'min_child_weight': 3.1765062820968044, 'reg_alpha': 0.5293469466158548, 'reg_lambda': 0.66932019101184}. Best is trial 11 with value: 11.262265516712958.
2025/07/08 06:42:57 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'fef8b778f2a64d33b195a8f791f832d1', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:44:11,868] Trial 14 finished with value: 20.533538139428956 and parameters: {'max_depth': 9, 'learning_rate': 0.010434934234060698, 'n_estimators': 1309, 'subsample': 0.9209681375680994, 'colsample_bytree': 0.997344085519017, 'gamma': 4.501945220447685, 'min_child_weight': 6.02918724738983, 'reg_alpha': 0.3849679031223864, 'reg_lambda': 0.3846362884431037}. Best is trial 11 with value: 11.262265516712958.
2025/07/08 06:44:12 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '587eae2f66da4db88b6a301578b14784', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:45:17,122] Trial 15 finished with value: 10.512458621904626 and parameters: {'max_depth': 9, 'learning_rate': 0.01929336929209302, 'n_estimators': 1251, 'subsample': 0.9255692062543908, 'colsample_bytree': 0.7136008139243852, 'gamma': 2.9672393737603278, 'min_child_weight': 3.7851067242311287, 'reg_alpha': 0.6412492882382235, 'reg_lambda': 0.6847322143200381}. Best is trial 15 with value: 10.512458621904626.
2025/07/08 06:45:17 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'd6b4d966c41d4b1989c833e626e54b9b', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:46:17,073] Trial 16 finished with value: 12.203730747053488 and parameters: {'max_depth': 9, 'learning_rate': 0.018052178148322568, 'n_estimators': 1292, 'subsample': 0.8656827772929472, 'colsample_bytree': 0.6975731931568029, 'gamma': 2.8847458694483894, 'min_child_weight': 2.162372349491923, 'reg_alpha': 0.6743015516475324, 'reg_lambda': 0.013179214380923399}. Best is trial 15 with value: 10.512458621904626.
2025/07/08 06:46:17 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '72fed7eed52540089153ec39fd719ec7', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:46:47,900] Trial 17 finished with value: 11.662779382942322 and parameters: {'max_depth': 9, 'learning_rate': 0.043935402772298154, 'n_estimators': 691, 'subsample': 0.9257586930950283, 'colsample_bytree': 0.8448372126058836, 'gamma': 4.1740978967231275, 'min_child_weight': 3.9506154053406375, 'reg_alpha': 0.6903532720592127, 'reg_lambda': 0.7074708755705578}. Best is trial 15 with value: 10.512458621904626.
2025/07/08 06:46:48 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'def5d7ecc58c425d9921d10f5d8899cb', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:47:06,192] Trial 18 finished with value: 15.933352582325908 and parameters: {'max_depth': 8, 'learning_rate': 0.09691933397433784, 'n_estimators': 1312, 'subsample': 0.8484798249927401, 'colsample_bytree': 0.7190927655406338, 'gamma': 2.5466354943676404, 'min_child_weight': 5.95037183074397, 'reg_alpha': 0.09017405294742764, 'reg_lambda': 0.7083280293583518}. Best is trial 15 with value: 10.512458621904626.
2025/07/08 06:47:06 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '1edbf652c4d94d078b5484bc4ba392c1', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:47:51,512] Trial 19 finished with value: 14.835381426143323 and parameters: {'max_depth': 9, 'learning_rate': 0.019226623717450545, 'n_estimators': 1245, 'subsample': 0.7159288929998384, 'colsample_bytree': 0.8276805446655708, 'gamma': 3.013861386823152, 'min_child_weight': 7.049838668255706, 'reg_alpha': 0.5853112490757284, 'reg_lambda': 0.9668032162157972}. Best is trial 15 with value: 10.512458621904626.
2025/07/08 06:47:52 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'b12b5a32e6db40feb79cefb8345676cf', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:48:47,459] Trial 20 finished with value: 75.67914736353873 and parameters: {'max_depth': 3, 'learning_rate': 0.03724758407777501, 'n_estimators': 583, 'subsample': 0.9472069741612222, 'colsample_bytree': 0.9640514341164363, 'gamma': 1.8943797783382172, 'min_child_weight': 2.122574885094949, 'reg_alpha': 0.45969085941145454, 'reg_lambda': 0.5801848125515162}. Best is trial 15 with value: 10.512458621904626.
2025/07/08 06:48:48 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'dfb8e314bd514bdb9302644eb7adb52c', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:50:17,821] Trial 21 finished with value: 11.102502084621092 and parameters: {'max_depth': 10, 'learning_rate': 0.010383459778716245, 'n_estimators': 1436, 'subsample': 0.96435817396435, 'colsample_bytree': 0.7403689004603271, 'gamma': 3.9288169650889704, 'min_child_weight': 3.621522340874989, 'reg_alpha': 0.4163277297894666, 'reg_lambda': 0.4240818488368138}. Best is trial 15 with value: 10.512458621904626.
2025/07/08 06:50:18 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'fb13f2652dbc4ab69e21f02b4f6c33d0', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:51:31,458] Trial 22 finished with value: 10.175806118731453 and parameters: {'max_depth': 10, 'learning_rate': 0.01350198546958803, 'n_estimators': 1356, 'subsample': 0.8930320162511838, 'colsample_bytree': 0.7298260008718997, 'gamma': 4.109596684564759, 'min_child_weight': 3.8515169013186075, 'reg_alpha': 0.2510912068515665, 'reg_lambda': 0.3606445241280036}. Best is trial 22 with value: 10.175806118731453.
2025/07/08 06:51:32 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '8ce9b1f41ea442069ef5db82975f68c8', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:52:15,265] Trial 23 finished with value: 13.410539551919827 and parameters: {'max_depth': 10, 'learning_rate': 0.020527549289163977, 'n_estimators': 1364, 'subsample': 0.5091560990452557, 'colsample_bytree': 0.7185874321408128, 'gamma': 4.323215453508567, 'min_child_weight': 4.807597955747773, 'reg_alpha': 0.2164675633138951, 'reg_lambda': 0.3343335292409134}. Best is trial 22 with value: 10.175806118731453.
2025/07/08 06:52:15 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'fdc186c28a284f44a4678fd30b39e9ca', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:53:34,840] Trial 24 finished with value: 9.332473672942985 and parameters: {'max_depth': 9, 'learning_rate': 0.014117424137166936, 'n_estimators': 1196, 'subsample': 0.8943960712921115, 'colsample_bytree': 0.6549162921667234, 'gamma': 4.004895558289636, 'min_child_weight': 3.8008078602246087, 'reg_alpha': 0.16448804475298373, 'reg_lambda': 0.04420583478650142}. Best is trial 24 with value: 9.332473672942985.
2025/07/08 06:53:35 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '04dea49e3cda4da78e3092f06cf01067', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:54:37,923] Trial 25 finished with value: 15.080888345798364 and parameters: {'max_depth': 8, 'learning_rate': 0.013937049675539477, 'n_estimators': 1189, 'subsample': 0.8795066159430124, 'colsample_bytree': 0.6590513565761367, 'gamma': 4.986507545524743, 'min_child_weight': 5.436846792119293, 'reg_alpha': 0.0209928166776997, 'reg_lambda': 0.0046861665342635694}. Best is trial 24 with value: 9.332473672942985.
2025/07/08 06:54:38 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '94de2d201f1345fbac1f3edf2108c7a1', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:55:30,717] Trial 26 finished with value: 10.42088414327574 and parameters: {'max_depth': 9, 'learning_rate': 0.021190684085194478, 'n_estimators': 999, 'subsample': 0.8213190480367871, 'colsample_bytree': 0.6860182059693651, 'gamma': 3.0925833546550705, 'min_child_weight': 3.8093372402148153, 'reg_alpha': 0.19471762754841937, 'reg_lambda': 0.10339216915858127}. Best is trial 24 with value: 9.332473672942985.
2025/07/08 06:55:31 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '83f2965fc0f14d95a4f31d46efe403f0', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:55:58,625] Trial 27 finished with value: 12.726008260083649 and parameters: {'max_depth': 9, 'learning_rate': 0.048081685128972984, 'n_estimators': 751, 'subsample': 0.8249756343229655, 'colsample_bytree': 0.672532753459829, 'gamma': 4.00983726092563, 'min_child_weight': 4.562923455720088, 'reg_alpha': 0.1859693355072623, 'reg_lambda': 0.1054117882344251}. Best is trial 24 with value: 9.332473672942985.
2025/07/08 06:55:59 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '897f82fd93ce48fe83894279b4964b60', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:57:04,362] Trial 28 finished with value: 12.067741806957464 and parameters: {'max_depth': 8, 'learning_rate': 0.014157401328455449, 'n_estimators': 1040, 'subsample': 0.7643080707754748, 'colsample_bytree': 0.6323585726328625, 'gamma': 4.592594666998508, 'min_child_weight': 1.8434640523761237, 'reg_alpha': 0.318305901366059, 'reg_lambda': 0.09972590588033578}. Best is trial 24 with value: 9.332473672942985.
2025/07/08 06:57:05 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '03f2c31795a14e1dbac95d683d8b0b06', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:58:14,762] Trial 29 finished with value: 43.119085989535385 and parameters: {'max_depth': 4, 'learning_rate': 0.02426884638925049, 'n_estimators': 982, 'subsample': 0.8914222396588931, 'colsample_bytree': 0.8241909080677585, 'gamma': 3.192650726911488, 'min_child_weight': 1.2339729906147516, 'reg_alpha': 0.010348438969096208, 'reg_lambda': 0.10372772654542939}. Best is trial 24 with value: 9.332473672942985.
2025/07/08 06:58:15 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'dbe91d7d82db405bb512a8fe61ae1e92', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:58:49,072] Trial 30 finished with value: 32.559660418839535 and parameters: {'max_depth': 6, 'learning_rate': 0.03724032616152533, 'n_estimators': 1155, 'subsample': 0.8244811489476258, 'colsample_bytree': 0.5130042585806177, 'gamma': 2.5323521403865534, 'min_child_weight': 5.767145294580365, 'reg_alpha': 0.15257239535096628, 'reg_lambda': 0.2610443578946208}. Best is trial 24 with value: 9.332473672942985.
2025/07/08 06:58:49 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'ed12195200f34658854ef877c1643a7d', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 06:59:54,175] Trial 31 finished with value: 9.621065662658918 and parameters: {'max_depth': 9, 'learning_rate': 0.021345250600159186, 'n_estimators': 1217, 'subsample': 0.897784209718928, 'colsample_bytree': 0.7039147285447092, 'gamma': 2.8786675525452003, 'min_child_weight': 3.8021514796796514, 'reg_alpha': 0.27250629468456516, 'reg_lambda': 0.1766659478136557}. Best is trial 24 with value: 9.332473672942985.
2025/07/08 06:59:54 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'b1f8bffe7c6342399cfac963a99e05ea', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:01:08,428] Trial 32 finished with value: 9.658714484154249 and parameters: {'max_depth': 9, 'learning_rate': 0.013525229985682587, 'n_estimators': 1183, 'subsample': 0.8959673761515539, 'colsample_bytree': 0.6155388525294209, 'gamma': 3.895320527515483, 'min_child_weight': 2.5321091221801084, 'reg_alpha': 0.24495025389028824, 'reg_lambda': 0.16301968007531487}. Best is trial 24 with value: 9.332473672942985.
2025/07/08 07:01:09 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'ecc3240fa68740099de3bdf1da067f49', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:02:23,286] Trial 33 finished with value: 13.486679204052448 and parameters: {'max_depth': 8, 'learning_rate': 0.011914021328050006, 'n_estimators': 1198, 'subsample': 0.8884902947783615, 'colsample_bytree': 0.6238461898334897, 'gamma': 3.960813093439146, 'min_child_weight': 2.6054125337904774, 'reg_alpha': 0.2775096538591462, 'reg_lambda': 0.19182393407958548}. Best is trial 24 with value: 9.332473672942985.
2025/07/08 07:02:24 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '50ec52339f8d47348cda7f630d020794', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:03:40,317] Trial 34 finished with value: 8.900314778404285 and parameters: {'max_depth': 10, 'learning_rate': 0.015039144865954868, 'n_estimators': 1374, 'subsample': 0.9072299628255358, 'colsample_bytree': 0.5485611209656829, 'gamma': 4.639445397921186, 'min_child_weight': 2.4678462317269156, 'reg_alpha': 0.36093417397674615, 'reg_lambda': 0.2974792980378668}. Best is trial 34 with value: 8.900314778404285.
2025/07/08 07:03:40 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '2e7cbdd0f9854292b59c527885ebddc9', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:04:43,816] Trial 35 finished with value: 16.756363258049024 and parameters: {'max_depth': 7, 'learning_rate': 0.016850000164798068, 'n_estimators': 1382, 'subsample': 0.8459335049414679, 'colsample_bytree': 0.5511379469641278, 'gamma': 4.596536129196698, 'min_child_weight': 2.39503396785085, 'reg_alpha': 0.33098555707788685, 'reg_lambda': 0.27451713916857085}. Best is trial 34 with value: 8.900314778404285.
2025/07/08 07:04:44 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'f42f47125cb04afdb68ec260a5c272ea', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:05:54,500] Trial 36 finished with value: 8.826051693245187 and parameters: {'max_depth': 9, 'learning_rate': 0.01580372108716155, 'n_estimators': 1113, 'subsample': 0.9523375317668039, 'colsample_bytree': 0.6049044997968827, 'gamma': 1.1385901155352176, 'min_child_weight': 1.038422724685363, 'reg_alpha': 0.3569195296510095, 'reg_lambda': 0.16104864285898113}. Best is trial 36 with value: 8.826051693245187.
2025/07/08 07:05:55 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'eccee44e438f436eb59100faa507bd70', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:06:48,386] Trial 37 finished with value: 10.926325298106082 and parameters: {'max_depth': 8, 'learning_rate': 0.023135169768908333, 'n_estimators': 1073, 'subsample': 0.9414114791972179, 'colsample_bytree': 0.5642172125113805, 'gamma': 1.3871236494473245, 'min_child_weight': 1.0366230917702177, 'reg_alpha': 0.3518945671645744, 'reg_lambda': 0.15655003990024863}. Best is trial 36 with value: 8.826051693245187.
2025/07/08 07:06:48 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '3e02c78fcd2642578bf791df3322d87c', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:07:23,186] Trial 38 finished with value: 29.901897201949247 and parameters: {'max_depth': 6, 'learning_rate': 0.031319903413048106, 'n_estimators': 929, 'subsample': 0.7861562977208958, 'colsample_bytree': 0.5362269695598301, 'gamma': 1.0856999138677597, 'min_child_weight': 1.6364017393093762, 'reg_alpha': 0.09576753799042381, 'reg_lambda': 0.27074193700811006}. Best is trial 36 with value: 8.826051693245187.
2025/07/08 07:07:23 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '26e96d9a74b8436bb8fe2175022860cf', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:08:27,409] Trial 39 finished with value: 10.006186307583418 and parameters: {'max_depth': 9, 'learning_rate': 0.015257075715733674, 'n_estimators': 825, 'subsample': 0.7326376312518101, 'colsample_bytree': 0.5920101531009481, 'gamma': 2.0821671691646517, 'min_child_weight': 1.6334241715361169, 'reg_alpha': 0.48963278700435997, 'reg_lambda': 0.04661188405549688}. Best is trial 36 with value: 8.826051693245187.
2025/07/08 07:08:27 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '1b20aca87f854ffcb8a7d7a76b060c61', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:08:50,269] Trial 40 finished with value: 10.63453370429075 and parameters: {'max_depth': 10, 'learning_rate': 0.0879846124270813, 'n_estimators': 1092, 'subsample': 0.9531351885747591, 'colsample_bytree': 0.57910755437531, 'gamma': 0.7786100565158638, 'min_child_weight': 4.952036125997431, 'reg_alpha': 0.3613336285387709, 'reg_lambda': 0.3043301951278135}. Best is trial 36 with value: 8.826051693245187.
2025/07/08 07:08:50 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '759b31b086cf4d72bbe30dcc3a742829', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:10:02,371] Trial 41 finished with value: 10.3259067882768 and parameters: {'max_depth': 9, 'learning_rate': 0.012472311107037816, 'n_estimators': 1224, 'subsample': 0.907805341088646, 'colsample_bytree': 0.6198082748938648, 'gamma': 4.8152105563420635, 'min_child_weight': 2.4762923847121607, 'reg_alpha': 0.23099781767407693, 'reg_lambda': 0.1532002853424131}. Best is trial 36 with value: 8.826051693245187.
2025/07/08 07:10:02 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'd3bf54f66f084f958a0b501bbbe7bca4', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:11:05,153] Trial 42 finished with value: 11.978094123493024 and parameters: {'max_depth': 8, 'learning_rate': 0.016646930589847002, 'n_estimators': 1152, 'subsample': 0.9069326384134401, 'colsample_bytree': 0.6421074133562764, 'gamma': 0.07492156319395349, 'min_child_weight': 3.266241075214757, 'reg_alpha': 0.29746893181760387, 'reg_lambda': 0.21667849525387442}. Best is trial 36 with value: 8.826051693245187.
2025/07/08 07:11:05 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '410b21eae2034eb2b35f5705dbbf437e', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:12:21,490] Trial 43 finished with value: 10.48830363087157 and parameters: {'max_depth': 9, 'learning_rate': 0.012141165919774304, 'n_estimators': 1399, 'subsample': 0.6819921021282449, 'colsample_bytree': 0.6047373587571407, 'gamma': 2.1501169470198875, 'min_child_weight': 1.8972809656596934, 'reg_alpha': 0.1341907123308819, 'reg_lambda': 0.05278464679103616}. Best is trial 36 with value: 8.826051693245187.
2025/07/08 07:12:22 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'cb9c4a5840264fc49937f4a4e9ffe9ba', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:13:11,182] Trial 44 finished with value: 12.948353707207618 and parameters: {'max_depth': 10, 'learning_rate': 0.027641314721001974, 'n_estimators': 1271, 'subsample': 0.8577815484047985, 'colsample_bytree': 0.5370364214177105, 'gamma': 3.8043711386796293, 'min_child_weight': 8.742149915049968, 'reg_alpha': 0.42406897020130696, 'reg_lambda': 0.21733791355727924}. Best is trial 36 with value: 8.826051693245187.
2025/07/08 07:13:11 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'd67c70841437488598c78055132d9bea', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:14:06,980] Trial 45 finished with value: 18.402820397950364 and parameters: {'max_depth': 7, 'learning_rate': 0.022712178406131787, 'n_estimators': 412, 'subsample': 0.9655408478001469, 'colsample_bytree': 0.5005714332257098, 'gamma': 4.326747561834271, 'min_child_weight': 4.3004195468253315, 'reg_alpha': 0.30085495279152275, 'reg_lambda': 0.1401639526516369}. Best is trial 36 with value: 8.826051693245187.
2025/07/08 07:14:07 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'c3644b7cbc86454cb1f05194c928341a', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:15:10,203] Trial 46 finished with value: 13.316481817850615 and parameters: {'max_depth': 8, 'learning_rate': 0.015425182502893828, 'n_estimators': 1120, 'subsample': 0.8027598579800908, 'colsample_bytree': 0.6040287344577832, 'gamma': 3.329643470857208, 'min_child_weight': 2.911613508919771, 'reg_alpha': 0.16296584616500184, 'reg_lambda': 0.07277129688016515}. Best is trial 36 with value: 8.826051693245187.
2025/07/08 07:15:10 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '92e47ce7311543239fda773e35671809', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:15:29,585] Trial 47 finished with value: 7.241406239515516 and parameters: {'max_depth': 10, 'learning_rate': 0.13746742522706792, 'n_estimators': 1498, 'subsample': 0.9410169544337184, 'colsample_bytree': 0.5690616004538263, 'gamma': 2.8117383494756174, 'min_child_weight': 3.4426778600704266, 'reg_alpha': 0.06966279967143818, 'reg_lambda': 0.17887385727906888}. Best is trial 47 with value: 7.241406239515516.
2025/07/08 07:15:30 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'd02d2cfc0a53480b89f1bdaeef2a852a', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:15:47,399] Trial 48 finished with value: 9.003522596074749 and parameters: {'max_depth': 10, 'learning_rate': 0.17469860990694802, 'n_estimators': 1489, 'subsample': 0.9365418149669783, 'colsample_bytree': 0.5591497092345725, 'gamma': 2.7349543455176843, 'min_child_weight': 5.2323876547166055, 'reg_alpha': 0.06126732953817399, 'reg_lambda': 0.22698239588828079}. Best is trial 47 with value: 7.241406239515516.
2025/07/08 07:15:47 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '2aae57b65d1941b4ace954ea86e2f937', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current xgboost workflow


[I 2025-07-08 07:16:06,428] Trial 49 finished with value: 11.59696853107651 and parameters: {'max_depth': 10, 'learning_rate': 0.15748946094851674, 'n_estimators': 1492, 'subsample': 0.9372380787558727, 'colsample_bytree': 0.5622031919346753, 'gamma': 2.7063595066366917, 'min_child_weight': 7.051166610918642, 'reg_alpha': 0.03420802548260446, 'reg_lambda': 0.330445222653825}. Best is trial 47 with value: 7.241406239515516.


## Show Plots
Next, we look at the optimization plots to see how the search progressed across trials and which hyperparameters had the largest influence on the objective.

In [0]:
%matplotlib inline

In [0]:
ov.plot_optimization_history(study)

In [0]:
ov.plot_param_importances(study)

In [0]:
ov.plot_parallel_coordinate(study)

## Retrieve Optimal Parameters
To retrieve the best parameters for our final model, run the following code and check the output.

In [0]:
print("Number of finished trials:", len(study.trials))
print("Best MSE:", study.best_value)
print("Best params:")
for k, v in study.best_params.items():
    print(f"  {k}: {v}")

Number of finished trials: 50
Best MSE: 7.241406239515516
Best params:
  max_depth: 10
  learning_rate: 0.13746742522706792
  n_estimators: 1498
  subsample: 0.9410169544337184
  colsample_bytree: 0.5690616004538263
  gamma: 2.8117383494756174
  min_child_weight: 3.4426778600704266
  reg_alpha: 0.06966279967143818
  reg_lambda: 0.17887385727906888
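
The best value above is a squared error, which is hard to read directly. As a small sketch, assuming the study's objective really returns the MSE (as the printed label suggests), taking the square root gives the RMSE in the same units as the target. The variable names below are illustrative and not part of the notebook.

```python
import math

# Best objective value reported by the study above (labelled "Best MSE" in the output).
best_mse = 7.241406239515516

# RMSE is the square root of the MSE and is expressed in the units of the target.
best_rmse = math.sqrt(best_mse)
print(f"Best RMSE: {best_rmse:.3f}")
```

An RMSE of roughly 2.69 means that a typical prediction error is on the order of 2.7 target units, with larger errors weighted more heavily than smaller ones.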


## Fit Model
Now we fit the final model with the best parameters found by the study, evaluate it on the test data, and register it. Afterwards, we set the alias `prod` on the registered model version.

In [0]:
# Convert the datetime column to integer epoch timestamps
train_data['__TIMESTAMP'] = train_data['__TIMESTAMP'].astype('int64')
test_data['__TIMESTAMP'] = test_data['__TIMESTAMP'].astype('int64')

# Replace categorical columns with their integer category codes.
# Note: the codes only line up between train and test if both
# DataFrames share the same category definitions.
categorical_columns = train_data.select_dtypes(include=['category']).columns
train_data[categorical_columns] = train_data[categorical_columns].apply(lambda x: x.cat.codes)
test_data[categorical_columns] = test_data[categorical_columns].apply(lambda x: x.cat.codes)

with mlflow.start_run(run_name="Delay_Prediction_Training") as run:
    # Build the final model from the best hyperparameters of the Optuna study
    final_xgbmodel = XGBRegressor(
        **study.best_params,
        enable_categorical=True,
        n_jobs=-1
    )
    final_xgbmodel.fit(
        train_data,
        train_target,
        verbose=False
    )

    # Evaluate the final model on the held-out test set
    predictions = final_xgbmodel.predict(test_data)
    mse = mean_squared_error(test_target, predictions)
    r2_metric = r2_score(test_target, predictions)
    mape_score = mean_absolute_percentage_error(test_target, predictions)

    # Record the test metrics on the run so they are visible in MLflow
    mlflow.log_metrics({"test_mse": mse, "test_r2": r2_metric, "test_mape": mape_score})



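
To make the three evaluation metrics used above more concrete, the following sketch recomputes them by hand in plain Python. The toy values and the helper names (`manual_mse`, `manual_r2`, `manual_mape`) are hypothetical and only serve as an illustration; they are not taken from the notebook's data.

```python
# Hypothetical toy targets and predictions, not taken from the notebook's data.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 39.0]


def manual_mse(y_true, y_pred):
    """Mean of the squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def manual_r2(y_true, y_pred):
    """1 minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def manual_mape(y_true, y_pred):
    """Mean absolute error relative to the true value, as a fraction (not percent)."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


print("MSE :", manual_mse(y_true, y_pred))
print("R2  :", manual_r2(y_true, y_pred))
print("MAPE:", manual_mape(y_true, y_pred))
```

Note that `mean_absolute_percentage_error` in scikit-learn also returns a fraction rather than a percentage. Because the metric divides by the true value, it can become very large when the target is zero or close to zero, so it may be worth checking whether that occurs in the delay target before relying on MAPE.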


In [0]:
registered_model = mlflow.register_model(f"runs:/{run.info.run_id}/model", "delay_prediction")

Registered model 'delay_prediction' already exists. Creating a new version of this model...




Created version '2' of model 'uc_delayed_payment.grp1.delay_prediction'.


Finally, set the alias `prod` on the newly registered model version.

In [0]:
mlflow_client = mlflow.MlflowClient()
mlflow_client.set_registered_model_alias(name=registered_model.name, alias="prod", version=registered_model.version)
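
Once the alias is set, downstream code can refer to the model by alias instead of a fixed version number. The sketch below shows one way this could look; `alias_model_uri` and `load_model_by_alias` are hypothetical helper names, and the example assumes the registry is configured as in the rest of this notebook. The model name is the one reported in the registration output above.

```python
def alias_model_uri(model_name: str, alias: str) -> str:
    """Build an MLflow model URI that points at a registered model alias."""
    return f"models:/{model_name}@{alias}"


def load_model_by_alias(model_name: str, alias: str = "prod"):
    """Load the model version that currently carries the given alias.

    mlflow is imported lazily so that the URI helper can be used without it.
    """
    import mlflow

    return mlflow.pyfunc.load_model(alias_model_uri(model_name, alias))


# Name as reported in the registration output above.
prod_uri = alias_model_uri("uc_delayed_payment.grp1.delay_prediction", "prod")
print(prod_uri)
```

Calling `load_model_by_alias("uc_delayed_payment.grp1.delay_prediction")` would then return a generic pyfunc model whose `predict` method accepts a pandas DataFrame. Because the alias is resolved at load time, moving `prod` to a newer version changes what consumers load without changing their code.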