# AutoML Training on Semantic Model

###### GOAL: Train and register an AutoML regression model that predicts delivery_days_actual for shipments using the semantic model tables: shipments, carriers, warehouses.

### ⭐ 1. Imports

- pandas: dataframe manipulation
- mlflow: experiment tracking and model registry
- AutoML (FLAML): automatic model selection / tuning
- sklearn: train/test split + evaluation metrics
- sempy.fabric: read tables from a Power BI semantic model

In [17]:
import pandas as pd
import mlflow
from flaml import AutoML
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score

import sempy.fabric as fabric  # read from semantic model



### ⭐ 2. Load data from your semantic model

Update DATASET if your semantic model has a different name. The table names passed to fabric.read_table must match the table names in the semantic model.

We pull the three core tables:
- shipments: fact table with delivery_days_actual + dates
- carriers: carrier-level attributes (speed_factor, name)
- warehouses: warehouse origin region
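A few cheap checks before the join can catch problems that would otherwise surface later as a silently wrong model. The helper below, check_tables, is a hypothetical name and not part of sempy. It is a minimal sketch that assumes the column names used elsewhere in this notebook. The toy frames contain made-up values and exist only to exercise it.

```python
import pandas as pd


def check_tables(shipments: pd.DataFrame, carriers: pd.DataFrame, warehouses: pd.DataFrame) -> dict:
    """Hypothetical pre-join checks; column names follow the ones used in this notebook."""
    return {
        # Row count of the fact table (a correct left join should preserve it)
        "shipment_rows": len(shipments),
        # Dimension keys must be unique, or a left merge duplicates shipment rows
        "carrier_ids_unique": bool(carriers["carrier_id"].is_unique),
        "warehouse_ids_unique": bool(warehouses["warehouse_id"].is_unique),
        # Rows with a missing target cannot be used for training
        "missing_target": int(shipments["delivery_days_actual"].isna().sum()),
    }


# Toy frames with made-up values, only to exercise the checks
shipments_demo = pd.DataFrame({
    "carrier_id": [1, 2, 1],
    "warehouse_id": [10, 10, 20],
    "delivery_days_actual": [3.0, None, 5.0],
})
carriers_demo = pd.DataFrame({"carrier_id": [1, 2], "speed_factor": [1.0, 0.8]})
warehouses_demo = pd.DataFrame({"warehouse_id": [10, 20], "origin_region": ["North", "South"]})

report = check_tables(shipments_demo, carriers_demo, warehouses_demo)
print(report)
```

In the notebook itself, you would call check_tables(shipments, carriers, warehouses) right after the read_table calls.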

In [18]:
DATASET = "delivery semantic model"

shipments = fabric.read_table(DATASET, "shipments")
carriers = fabric.read_table(DATASET, "carriers")
warehouses = fabric.read_table(DATASET, "warehouses")



### ⭐ 3. Join the tables (simple, clean enrichment)

We enrich shipments with carrier and warehouse attributes using left joins, so every shipment is kept even if it has no matching carrier or warehouse.
After the second merge, both shipments and warehouses contain an **origin_region** column, so pandas creates **origin_region_x** (from shipments) and **origin_region_y** (from warehouses).

We want a single origin_region to feed the model:
- Use origin_region from the warehouse (**origin_region_y**), since the warehouse is where the shipment originates; fall back to **origin_region_x** if only that column exists
- Drop the intermediate _x / _y columns if they exist
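The suffix behavior and the coalescing step can be reproduced on toy frames. The values are made up, and the demo variable names keep the sketch from overwriting the notebook's tables. The sketch also passes pandas' validate="many_to_one" to each merge, which the original cell does not do. With that option, a duplicated carrier_id or warehouse_id in a dimension table raises a MergeError instead of silently duplicating shipment rows.

```python
import pandas as pd

# Toy tables (made-up values): both shipments and warehouses carry origin_region
ship_demo = pd.DataFrame({
    "shipment_id": [1, 2, 3],
    "carrier_id": [100, 200, 100],
    "warehouse_id": [10, 20, 10],
    "origin_region": ["North", "South", "North"],
})
carriers_demo = pd.DataFrame({"carrier_id": [100, 200], "carrier_name": ["A", "B"]})
warehouses_demo = pd.DataFrame({"warehouse_id": [10, 20], "origin_region": ["North", "East"]})

df_demo = (
    ship_demo
    # validate= checks that each right-hand key appears at most once
    .merge(carriers_demo, on="carrier_id", how="left", validate="many_to_one")
    .merge(warehouses_demo, on="warehouse_id", how="left", validate="many_to_one")
)
print(sorted(c for c in df_demo.columns if c.startswith("origin_region")))
# ['origin_region_x', 'origin_region_y']

# Same rule as the notebook: prefer the warehouse value (_y), then drop both suffixed columns
df_demo["origin_region"] = df_demo["origin_region_y"]
df_demo = df_demo.drop(columns=["origin_region_x", "origin_region_y"])
print(df_demo["origin_region"].tolist())

# A duplicated dimension key is rejected instead of multiplying shipment rows
caught = False
try:
    ship_demo.merge(pd.concat([carriers_demo, carriers_demo]), on="carrier_id",
                    how="left", validate="many_to_one")
except pd.errors.MergeError:
    caught = True
print("duplicate keys rejected:", caught)
```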

In [19]:
df = (
    shipments
    .merge(carriers, on="carrier_id", how="left")
    .merge(warehouses, on="warehouse_id", how="left")
)

# Fix duplicate region columns
if "origin_region_y" in df.columns:
    df["origin_region"] = df["origin_region_y"]
elif "origin_region_x" in df.columns:
    df["origin_region"] = df["origin_region_x"]

# Drop the duplicates if they exist
df = df.drop(columns=[c for c in ["origin_region_x", "origin_region_y"] if c in df.columns])


### ⭐ 4. Prepare features + target

Define the features (X) and the target (y = delivery_days_actual).

Features:
   - **carrier_id, warehouse_id**: which carrier/warehouse handled the shipment
   - **origin_region, destination_region**: geography / distance proxy
   - **distance_band**: Short/Medium/Long
   - **service_level**: Ground/Fast
   - **order_to_ship_days**: warehouse handling time
   - **ship_dayofweek, ship_month**: weekday and seasonality patterns
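The list above treats order_to_ship_days, ship_dayofweek and ship_month as ready-made columns. If your shipments table stores only the raw dates, the three columns could be derived along the lines below. The column names order_date and ship_date are assumptions, so adjust them to match your semantic model.

```python
import pandas as pd

# Hypothetical raw date columns (order_date, ship_date are assumed names)
raw = pd.DataFrame({
    "order_date": pd.to_datetime(["2024-01-01", "2024-03-15"]),
    "ship_date": pd.to_datetime(["2024-01-03", "2024-03-15"]),
})

features = pd.DataFrame({
    # Warehouse handling time in whole days
    "order_to_ship_days": (raw["ship_date"] - raw["order_date"]).dt.days,
    # pandas convention: Monday=0 ... Sunday=6
    "ship_dayofweek": raw["ship_date"].dt.dayofweek,
    # Calendar month 1..12
    "ship_month": raw["ship_date"].dt.month,
})
print(features)
```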

In [20]:
# Target (what we predict)
target_col = "delivery_days_actual"

# Simple feature list
feature_cols = [
    "carrier_id",
    "warehouse_id",
    "origin_region",
    "destination_region",
    "distance_band",
    "service_level",
    "order_to_ship_days",
    "ship_dayofweek",
    "ship_month",
]

X = df[feature_cols]
y = df[target_col]



### ⭐ 5. Train/Test Split

We hold out 20% of the data as a test set to evaluate how well the model generalizes to unseen shipments.
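Here is the same call on a toy array, to show what test_size and random_state do. Twenty percent of 50 rows goes to the test set, and repeating the call with the same random_state reproduces exactly the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(100).reshape(50, 2)  # 50 toy rows, 2 features
y_demo = np.arange(50)

split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

X_tr, X_te, y_tr, y_te = split_a
print(len(X_tr), len(X_te))  # 40 10

# Same random_state -> identical train/test arrays
same_split = all(np.array_equal(p, q) for p, q in zip(split_a, split_b))
print("reproducible:", same_split)
```

A random split mixes shipments from all periods into both sets. If the model is meant to score future shipments, holding out the most recent shipments by date may give a more realistic estimate. That is a judgment call; this notebook uses a random split.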

In [21]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)



### ⭐ 6. AutoML Training (FLAML)

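FLAML searches for a good regression model within a time budget. By default it can choose among several learners (e.g., LightGBM, XGBoost, random forest, extra trees). Here, **estimator_list** restricts the search to random forest, XGBoost, and extra trees.
- We optimize for MAE (Mean Absolute Error).
- The log reports "Evaluation method: cv", so **automl.best_loss** is a cross-validated MAE on the training data, not a test-set score.

FLAML tries different model types and hyperparameters, and stores the best-performing fitted model in **automl.model**.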
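The core idea of picking a learner by cross-validated MAE can be shown with plain scikit-learn on synthetic data. This is a deliberately simplified illustration, not FLAML's algorithm. FLAML's real search is cost-aware and tunes hyperparameters as it goes, while this loop scores two fixed configurations and keeps the one with the lowest mean CV MAE.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic data stands in for the shipment features
X_demo, y_demo = make_regression(n_samples=300, n_features=6, noise=0.5, random_state=0)

candidates = {
    "rf": RandomForestRegressor(n_estimators=50, random_state=0),
    "extra_tree": ExtraTreesRegressor(n_estimators=50, random_state=0),
}

cv_mae = {}
for name, model in candidates.items():
    # sklearn reports MAE as a negative score (higher is better), so flip the sign
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="neg_mean_absolute_error")
    cv_mae[name] = -scores.mean()

best_name = min(cv_mae, key=cv_mae.get)  # lowest CV MAE wins, as with metric="mae"
print(cv_mae, "best:", best_name)
```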

In [26]:
automl = AutoML()

settings = {
    "time_budget": 180,  # 3 minutes
    "task": "regression",
    "metric": "mae",
    "estimator_list": [
        "rf",         # sklearn RandomForestRegressor
        "xgboost",    # xgboost.XGBRegressor
        "extra_tree"  # sklearn ExtraTreesRegressor
    ],
    "log_file_name": "automl_safe.log",
}

automl.fit(X_train=X_train, y_train=y_train, **settings)

print("Best model:", automl.best_estimator)
print("Best config:", automl.best_config)
print("Best MAE:", automl.best_loss)


[flaml.automl.logger: 11-13 07:08:08] {1863} INFO - task = regression
[flaml.automl.logger: 11-13 07:08:08] {1874} INFO - Evaluation method: cv
[flaml.automl.logger: 11-13 07:08:08] {1973} INFO - Minimizing error metric: mae
[flaml.automl.logger: 11-13 07:08:08] {2091} INFO - List of ML learners in AutoML Run: ['rf', 'xgboost', 'extra_tree']
[flaml.automl.logger: 11-13 07:08:08] {2404} INFO - iteration 0, current learner rf
[flaml.automl.logger: 11-13 07:08:08] {2539} INFO - Estimated sufficient time budget=1589s. Estimated necessary time budget=2s.


[flaml.automl.logger: 11-13 07:08:10] {2590} INFO -  at 1.1s,	estimator rf's best error=0.7982,	best estimator rf's best error=0.7982
[flaml.automl.logger: 11-13 07:08:10] {2404} INFO - iteration 1, current learner xgboost


[flaml.automl.logger: 11-13 07:08:22] {2590} INFO -  at 2.8s,	estimator xgboost's best error=1.0131,	best estimator rf's best error=0.7982
[flaml.automl.logger: 11-13 07:08:22] {2404} INFO - iteration 2, current learner extra_tree


[flaml.automl.logger: 11-13 07:08:24] {2590} INFO -  at 15.2s,	estimator extra_tree's best error=0.7670,	best estimator extra_tree's best error=0.7670
[flaml.automl.logger: 11-13 07:08:24] {2404} INFO - iteration 3, current learner xgboost


[flaml.automl.logger: 11-13 07:08:36] {2590} INFO -  at 16.5s,	estimator xgboost's best error=1.0131,	best estimator extra_tree's best error=0.7670
[flaml.automl.logger: 11-13 07:08:36] {2404} INFO - iteration 4, current learner extra_tree


[flaml.automl.logger: 11-13 07:08:37] {2590} INFO -  at 28.6s,	estimator extra_tree's best error=0.4168,	best estimator extra_tree's best error=0.4168
[flaml.automl.logger: 11-13 07:08:37] {2404} INFO - iteration 5, current learner rf


[flaml.automl.logger: 11-13 07:08:50] {2590} INFO -  at 29.9s,	estimator rf's best error=0.4292,	best estimator extra_tree's best error=0.4168
[flaml.automl.logger: 11-13 07:08:50] {2404} INFO - iteration 6, current learner rf


[flaml.automl.logger: 11-13 07:08:51] {2590} INFO -  at 42.9s,	estimator rf's best error=0.4292,	best estimator extra_tree's best error=0.4168
[flaml.automl.logger: 11-13 07:08:51] {2404} INFO - iteration 7, current learner rf


[flaml.automl.logger: 11-13 07:08:52] {2590} INFO -  at 44.3s,	estimator rf's best error=0.3987,	best estimator rf's best error=0.3987
[flaml.automl.logger: 11-13 07:08:52] {2404} INFO - iteration 8, current learner rf


[flaml.automl.logger: 11-13 07:09:04] {2590} INFO -  at 45.3s,	estimator rf's best error=0.3987,	best estimator rf's best error=0.3987
[flaml.automl.logger: 11-13 07:09:04] {2404} INFO - iteration 9, current learner rf


[flaml.automl.logger: 11-13 07:09:05] {2590} INFO -  at 57.2s,	estimator rf's best error=0.3987,	best estimator rf's best error=0.3987
[flaml.automl.logger: 11-13 07:09:05] {2404} INFO - iteration 10, current learner xgboost


[flaml.automl.logger: 11-13 07:09:06] {2590} INFO -  at 58.0s,	estimator xgboost's best error=0.7062,	best estimator rf's best error=0.3987
[flaml.automl.logger: 11-13 07:09:06] {2404} INFO - iteration 11, current learner extra_tree


[flaml.automl.logger: 11-13 07:09:07] {2590} INFO -  at 58.9s,	estimator extra_tree's best error=0.4168,	best estimator rf's best error=0.3987
[flaml.automl.logger: 11-13 07:09:07] {2404} INFO - iteration 12, current learner extra_tree


[flaml.automl.logger: 11-13 07:09:08] {2590} INFO -  at 60.0s,	estimator extra_tree's best error=0.3990,	best estimator rf's best error=0.3987
[flaml.automl.logger: 11-13 07:09:08] {2404} INFO - iteration 13, current learner xgboost


[flaml.automl.logger: 11-13 07:09:09] {2590} INFO -  at 60.8s,	estimator xgboost's best error=0.4731,	best estimator rf's best error=0.3987
[flaml.automl.logger: 11-13 07:09:09] {2404} INFO - iteration 14, current learner xgboost


[flaml.automl.logger: 11-13 07:09:10] {2590} INFO -  at 61.6s,	estimator xgboost's best error=0.4731,	best estimator rf's best error=0.3987
[flaml.automl.logger: 11-13 07:09:10] {2404} INFO - iteration 15, current learner xgboost


[flaml.automl.logger: 11-13 07:09:11] {2590} INFO -  at 62.5s,	estimator xgboost's best error=0.4731,	best estimator rf's best error=0.3987
[flaml.automl.logger: 11-13 07:09:11] {2404} INFO - iteration 16, current learner extra_tree


[flaml.automl.logger: 11-13 07:09:12] {2590} INFO -  at 63.7s,	estimator extra_tree's best error=0.3990,	best estimator rf's best error=0.3987
[flaml.automl.logger: 11-13 07:09:12] {2404} INFO - iteration 17, current learner rf


[flaml.automl.logger: 11-13 07:09:12] {2590} INFO -  at 64.5s,	estimator rf's best error=0.3987,	best estimator rf's best error=0.3987
[flaml.automl.logger: 11-13 07:09:12] {2404} INFO - iteration 18, current learner extra_tree


[flaml.automl.logger: 11-13 07:09:14] {2590} INFO -  at 65.4s,	estimator extra_tree's best error=0.3990,	best estimator rf's best error=0.3987
[flaml.automl.logger: 11-13 07:09:14] {2404} INFO - iteration 19, current learner xgboost


[flaml.automl.logger: 11-13 07:09:15] {2590} INFO -  at 66.5s,	estimator xgboost's best error=0.4511,	best estimator rf's best error=0.3987
[flaml.automl.logger: 11-13 07:09:15] {2404} INFO - iteration 20, current learner extra_tree


[flaml.automl.logger: 11-13 07:09:15] {2590} INFO -  at 67.6s,	estimator extra_tree's best error=0.3983,	best estimator extra_tree's best error=0.3983
[flaml.automl.logger: 11-13 07:09:15] {2404} INFO - iteration 21, current learner rf


[flaml.automl.logger: 11-13 07:09:28] {2590} INFO -  at 68.5s,	estimator rf's best error=0.3987,	best estimator extra_tree's best error=0.3983
[flaml.automl.logger: 11-13 07:09:28] {2404} INFO - iteration 22, current learner rf


[flaml.automl.logger: 11-13 07:09:28] {2590} INFO -  at 80.6s,	estimator rf's best error=0.3976,	best estimator rf's best error=0.3976
[flaml.automl.logger: 11-13 07:09:28] {2404} INFO - iteration 23, current learner extra_tree


[flaml.automl.logger: 11-13 07:09:41] {2590} INFO -  at 81.6s,	estimator extra_tree's best error=0.3979,	best estimator rf's best error=0.3976
[flaml.automl.logger: 11-13 07:09:41] {2404} INFO - iteration 24, current learner extra_tree


[flaml.automl.logger: 11-13 07:09:42] {2590} INFO -  at 94.1s,	estimator extra_tree's best error=0.3979,	best estimator rf's best error=0.3976
[flaml.automl.logger: 11-13 07:09:42] {2404} INFO - iteration 25, current learner xgboost


[flaml.automl.logger: 11-13 07:09:43] {2590} INFO -  at 94.9s,	estimator xgboost's best error=0.4235,	best estimator rf's best error=0.3976
[flaml.automl.logger: 11-13 07:09:43] {2404} INFO - iteration 26, current learner xgboost


[flaml.automl.logger: 11-13 07:09:44] {2590} INFO -  at 95.8s,	estimator xgboost's best error=0.3998,	best estimator rf's best error=0.3976
[flaml.automl.logger: 11-13 07:09:44] {2404} INFO - iteration 27, current learner extra_tree


[flaml.automl.logger: 11-13 07:09:45] {2590} INFO -  at 96.7s,	estimator extra_tree's best error=0.3979,	best estimator rf's best error=0.3976
[flaml.automl.logger: 11-13 07:09:45] {2404} INFO - iteration 28, current learner xgboost


[flaml.automl.logger: 11-13 07:09:46] {2590} INFO -  at 97.7s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:09:46] {2404} INFO - iteration 29, current learner xgboost


[flaml.automl.logger: 11-13 07:09:59] {2590} INFO -  at 98.7s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:09:59] {2404} INFO - iteration 30, current learner xgboost


[flaml.automl.logger: 11-13 07:10:00] {2590} INFO -  at 111.7s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:00] {2404} INFO - iteration 31, current learner xgboost


[flaml.automl.logger: 11-13 07:10:01] {2590} INFO -  at 112.8s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:01] {2404} INFO - iteration 32, current learner xgboost


[flaml.automl.logger: 11-13 07:10:02] {2590} INFO -  at 113.7s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:02] {2404} INFO - iteration 33, current learner xgboost


[flaml.automl.logger: 11-13 07:10:03] {2590} INFO -  at 114.8s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:03] {2404} INFO - iteration 34, current learner xgboost


[flaml.automl.logger: 11-13 07:10:04] {2590} INFO -  at 115.8s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:04] {2404} INFO - iteration 35, current learner extra_tree


[flaml.automl.logger: 11-13 07:10:05] {2590} INFO -  at 116.9s,	estimator extra_tree's best error=0.3979,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:05] {2404} INFO - iteration 36, current learner xgboost


[flaml.automl.logger: 11-13 07:10:06] {2590} INFO -  at 117.8s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:06] {2404} INFO - iteration 37, current learner xgboost


[flaml.automl.logger: 11-13 07:10:07] {2590} INFO -  at 118.6s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:07] {2404} INFO - iteration 38, current learner xgboost


[flaml.automl.logger: 11-13 07:10:07] {2590} INFO -  at 119.5s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:07] {2404} INFO - iteration 39, current learner xgboost


[flaml.automl.logger: 11-13 07:10:09] {2590} INFO -  at 120.3s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:09] {2404} INFO - iteration 40, current learner xgboost


[flaml.automl.logger: 11-13 07:10:10] {2590} INFO -  at 121.7s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:10] {2404} INFO - iteration 41, current learner rf


[flaml.automl.logger: 11-13 07:10:10] {2590} INFO -  at 122.6s,	estimator rf's best error=0.3976,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:10] {2404} INFO - iteration 42, current learner xgboost


[flaml.automl.logger: 11-13 07:10:11] {2590} INFO -  at 123.4s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:11] {2404} INFO - iteration 43, current learner extra_tree


[flaml.automl.logger: 11-13 07:10:12] {2590} INFO -  at 124.3s,	estimator extra_tree's best error=0.3979,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:12] {2404} INFO - iteration 44, current learner xgboost


[flaml.automl.logger: 11-13 07:10:13] {2590} INFO -  at 125.1s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:13] {2404} INFO - iteration 45, current learner xgboost


[flaml.automl.logger: 11-13 07:10:14] {2590} INFO -  at 125.9s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:14] {2404} INFO - iteration 46, current learner xgboost


[flaml.automl.logger: 11-13 07:10:15] {2590} INFO -  at 126.8s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:15] {2404} INFO - iteration 47, current learner rf


[flaml.automl.logger: 11-13 07:10:16] {2590} INFO -  at 127.9s,	estimator rf's best error=0.3976,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:16] {2404} INFO - iteration 48, current learner xgboost


[flaml.automl.logger: 11-13 07:10:17] {2590} INFO -  at 128.8s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:17] {2404} INFO - iteration 49, current learner xgboost


[flaml.automl.logger: 11-13 07:10:18] {2590} INFO -  at 129.7s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:18] {2404} INFO - iteration 50, current learner xgboost


[flaml.automl.logger: 11-13 07:10:18] {2590} INFO -  at 130.6s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:18] {2404} INFO - iteration 51, current learner xgboost


[flaml.automl.logger: 11-13 07:10:19] {2590} INFO -  at 131.4s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:19] {2404} INFO - iteration 52, current learner rf


[flaml.automl.logger: 11-13 07:10:20] {2590} INFO -  at 132.3s,	estimator rf's best error=0.3976,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:20] {2404} INFO - iteration 53, current learner rf


[flaml.automl.logger: 11-13 07:10:22] {2590} INFO -  at 133.4s,	estimator rf's best error=0.3955,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:22] {2404} INFO - iteration 54, current learner xgboost


[flaml.automl.logger: 11-13 07:10:23] {2590} INFO -  at 134.5s,	estimator xgboost's best error=0.3949,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:23] {2404} INFO - iteration 55, current learner rf


[flaml.automl.logger: 11-13 07:10:24] {2590} INFO -  at 135.7s,	estimator rf's best error=0.3955,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:24] {2404} INFO - iteration 56, current learner rf


[flaml.automl.logger: 11-13 07:10:25] {2590} INFO -  at 137.2s,	estimator rf's best error=0.3955,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:25] {2404} INFO - iteration 57, current learner rf


[flaml.automl.logger: 11-13 07:10:26] {2590} INFO -  at 138.1s,	estimator rf's best error=0.3955,	best estimator xgboost's best error=0.3949
[flaml.automl.logger: 11-13 07:10:26] {2404} INFO - iteration 58, current learner rf


[flaml.automl.logger: 11-13 07:10:28] {2590} INFO -  at 139.8s,	estimator rf's best error=0.3939,	best estimator rf's best error=0.3939
[flaml.automl.logger: 11-13 07:10:28] {2404} INFO - iteration 59, current learner xgboost


[flaml.automl.logger: 11-13 07:10:40] {2590} INFO -  at 140.7s,	estimator xgboost's best error=0.3949,	best estimator rf's best error=0.3939
[flaml.automl.logger: 11-13 07:10:40] {2404} INFO - iteration 60, current learner rf


[flaml.automl.logger: 11-13 07:10:43] {2590} INFO -  at 154.9s,	estimator rf's best error=0.3939,	best estimator rf's best error=0.3939
[flaml.automl.logger: 11-13 07:10:43] {2404} INFO - iteration 61, current learner xgboost


[flaml.automl.logger: 11-13 07:10:44] {2590} INFO -  at 155.8s,	estimator xgboost's best error=0.3949,	best estimator rf's best error=0.3939
[flaml.automl.logger: 11-13 07:10:44] {2404} INFO - iteration 62, current learner xgboost


[flaml.automl.logger: 11-13 07:10:45] {2590} INFO -  at 156.6s,	estimator xgboost's best error=0.3949,	best estimator rf's best error=0.3939
[flaml.automl.logger: 11-13 07:10:45] {2404} INFO - iteration 63, current learner rf


[flaml.automl.logger: 11-13 07:10:46] {2590} INFO -  at 157.9s,	estimator rf's best error=0.3938,	best estimator rf's best error=0.3938
[flaml.automl.logger: 11-13 07:10:46] {2404} INFO - iteration 64, current learner rf


[flaml.automl.logger: 11-13 07:10:58] {2590} INFO -  at 160.0s,	estimator rf's best error=0.3938,	best estimator rf's best error=0.3938
[flaml.automl.logger: 11-13 07:10:58] {2404} INFO - iteration 65, current learner xgboost


[flaml.automl.logger: 11-13 07:10:59] {2590} INFO -  at 171.2s,	estimator xgboost's best error=0.3949,	best estimator rf's best error=0.3938
[flaml.automl.logger: 11-13 07:10:59] {2404} INFO - iteration 66, current learner rf


[flaml.automl.logger: 11-13 07:11:01] {2590} INFO -  at 172.3s,	estimator rf's best error=0.3938,	best estimator rf's best error=0.3938
[flaml.automl.logger: 11-13 07:11:01] {2404} INFO - iteration 67, current learner rf


[flaml.automl.logger: 11-13 07:11:02] {2590} INFO -  at 173.6s,	estimator rf's best error=0.3938,	best estimator rf's best error=0.3938
[flaml.automl.logger: 11-13 07:11:02] {2404} INFO - iteration 68, current learner rf


[flaml.automl.logger: 11-13 07:11:03] {2590} INFO -  at 175.3s,	estimator rf's best error=0.3938,	best estimator rf's best error=0.3938
[flaml.automl.logger: 11-13 07:11:03] {2404} INFO - iteration 69, current learner xgboost


[flaml.automl.logger: 11-13 07:11:04] {2590} INFO -  at 176.2s,	estimator xgboost's best error=0.3949,	best estimator rf's best error=0.3938
[flaml.automl.logger: 11-13 07:11:04] {2404} INFO - iteration 70, current learner rf


[flaml.automl.logger: 11-13 07:11:05] {2590} INFO -  at 177.3s,	estimator rf's best error=0.3929,	best estimator rf's best error=0.3929
[flaml.automl.logger: 11-13 07:11:05] {2404} INFO - iteration 71, current learner xgboost


[flaml.automl.logger: 11-13 07:11:18] {2590} INFO -  at 178.4s,	estimator xgboost's best error=0.3949,	best estimator rf's best error=0.3929
[flaml.automl.logger: 11-13 07:11:18] {2848} INFO - retrain rf for 0.1s
[flaml.automl.logger: 11-13 07:11:18] {2851} INFO - retrained model: RandomForestRegressor(max_leaf_nodes=37, n_estimators=49, n_jobs=-1,
                      random_state=12032022)
[flaml.automl.logger: 11-13 07:11:18] {2852} INFO - Auto Feature Engineering pipeline: None
[flaml.automl.logger: 11-13 07:11:18] {2854} INFO - Best MLflow run name: placid_lemon_fnnyrt8v
[flaml.automl.logger: 11-13 07:11:18] {2855} INFO - Best MLflow run id: 02eaba5d-e4ab-4837-a077-bb310172536c
[flaml.automl.logger: 11-13 07:11:18] {2127} INFO - fit succeeded
[flaml.automl.logger: 11-13 07:11:18] {2128} INFO - Time taken to find the best model: 177.3381314277649
Best model: rf
Best config: {'n_estimators': 49, 'max_features': 1.0, 'max_leaves': 37}
Best MAE: 0.39292432505849056


### ⭐ 7. Evaluate the model performance on the test set

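We compute:
- MAE: average absolute error, in days
- R²: proportion of the variance in delivery_days_actual explained by the model (1.0 is a perfect fit; 0 is no better than always predicting the mean)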
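A small worked example makes the two definitions concrete. The values are toy numbers. The sketch computes both metrics by hand and checks them against the sklearn functions imported in step 1.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

y_true = np.array([2.0, 3.0, 5.0, 7.0])  # actual delivery days (toy values)
y_pred = np.array([2.5, 3.0, 4.0, 8.0])  # predicted delivery days

# MAE: mean of |actual - predicted|, in days
mae_manual = np.mean(np.abs(y_true - y_pred))

# R²: 1 - (residual sum of squares / total sum of squares around the mean)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mae_manual, r2_manual)  # MAE = 0.625 days, R² ≈ 0.847
assert np.isclose(mae_manual, mean_absolute_error(y_true, y_pred))
assert np.isclose(r2_manual, r2_score(y_true, y_pred))
```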

In [27]:
preds = automl.predict(X_test)

mae = mean_absolute_error(y_test, preds)
r2 = r2_score(y_test, preds)

print("MAE:", mae)
print("R²:", r2)



MAE: 0.3851305644354279
R²: 0.9068991100362601


### ⭐ 8. Register the model in MLflow

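Log and register the best model in MLflow.

FLAML's best fitted model is available at automl.model.
- This is a FLAML estimator wrapper (here flaml.automl.model.RandomForestEstimator, as the load step at the end shows). The underlying scikit-learn model (here a RandomForestRegressor) is at automl.model.estimator.
- The wrapper may not include the column preprocessing that automl.predict applies, so a loaded copy may need inputs encoded the same way.
- Registering the model lets other notebooks / pipelines load it by name and version.

We:
1) Start an MLflow run
2) Log the test-set MAE and R²
3) Log the fitted model (automl.model) as an MLflow artifact
4) Register it as "POC-DeliveryTimeModel-AutoML-Safe"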
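One way to make a registered artifact score raw rows on its own is to bundle preprocessing and the estimator into a single scikit-learn Pipeline. This is an alternative the notebook does not use; it swaps FLAML's internal categorical handling for an OrdinalEncoder. The sketch uses a subset of the notebook's features and made-up rows. Such a pipeline could be passed as sk_model to mlflow.sklearn.log_model. Its encoding differs from FLAML's, so its predictions would not exactly match automl.predict.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

categorical = ["origin_region", "service_level"]  # subset of the notebook's string features
numeric = ["order_to_ship_days"]

# Toy training rows (made-up values)
train_demo = pd.DataFrame({
    "origin_region": ["North", "South", "North", "East"],
    "service_level": ["Ground", "Fast", "Fast", "Ground"],
    "order_to_ship_days": [1, 2, 0, 3],
})
y_demo = [4.0, 2.0, 2.5, 5.0]

pipe = Pipeline([
    ("prep", ColumnTransformer(
        # Unseen categories at prediction time are encoded as -1 instead of raising
        [("cat", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1), categorical)],
        remainder="passthrough",  # numeric columns pass through unchanged
    )),
    ("model", RandomForestRegressor(n_estimators=20, random_state=0)),
])
pipe.fit(train_demo[categorical + numeric], y_demo)

# A raw row, including a region never seen in training, can be scored directly
new_row = pd.DataFrame({"origin_region": ["West"], "service_level": ["Fast"], "order_to_ship_days": [1]})
pred = pipe.predict(new_row)
print(pred)
```

In practice, the RandomForestRegressor settings could be copied from FLAML's best config. The retrained model in the log used n_estimators=49 and max_leaf_nodes=37.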

In [33]:
import mlflow
import mlflow.sklearn

# FLAML's wrapper around the best fitted estimator (RandomForestEstimator in this run)
best_model = automl.model

with mlflow.start_run() as run:

    # Log the test-set metrics computed in step 7
    mlflow.log_metric("mae", mae)
    mlflow.log_metric("r2", r2)

    # Log the fitted model as an artifact under "model" in this run
    mlflow.sklearn.log_model(
        sk_model=best_model,
        artifact_path="model"
    )

    model_uri = f"runs:/{run.info.run_id}/model"

# Register the logged artifact as a new version of the named model
registered = mlflow.register_model(
    model_uri=model_uri,
    name="POC-DeliveryTimeModel-AutoML-Safe"
)

print("Model registered.")
print("Version:", registered.version)


Registered model 'POC-DeliveryTimeModel-AutoML-Safe' already exists. Creating a new version of this model...
Created version '2' of model 'POC-DeliveryTimeModel-AutoML-Safe'.


In [34]:
model_uri = "models:/POC-DeliveryTimeModel-AutoML-Safe/2"
model = mlflow.sklearn.load_model(model_uri)

print("Model loaded:", type(model))



Model loaded: <class 'flaml.automl.model.RandomForestEstimator'>
