# POC Delivery Time Prediction: Scoring Notebook

This notebook performs inference on delivery shipment data using a previously trained AutoML model stored in the Fabric MLflow Model Registry.

It:

1. Loads the trained AutoML model
2. Loads raw shipment input data from the Fabric semantic model
3. Applies the same preprocessing the model expects
4. Ensures features are in the correct order
5. Generates delivery-time predictions
6. Writes the results to a Lakehouse table (`shipment_predictions`)


### 🟦 1. Setup

In [13]:
import pandas as pd
import mlflow
import sempy.fabric as fabric


### 🟦 2. Configuration

In [14]:
# Semantic model name
DATASET_NAME = "delivery semantic model"

# Tables from Fabric semantic model
SHIPMENTS_TABLE = "shipments"
CARRIERS_TABLE = "carriers"
WAREHOUSES_TABLE = "warehouses"

# Registered AutoML model
MODEL_NAME = "POC-DeliveryTimeModel-AutoML-Safe"
MODEL_VERSION = "2"   # Update if newer versions are created

MODEL_URI = f"models:/{MODEL_NAME}/{MODEL_VERSION}"
print("Using model:", MODEL_URI)


Using model: models:/POC-DeliveryTimeModel-AutoML-Safe/2
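
Because `MODEL_VERSION` is pinned by hand, one option is to resolve the newest registered version at run time instead. The sketch below only illustrates the selection step: `pick_latest_version` is a hypothetical helper, not part of MLflow, and it assumes version identifiers are numeric strings such as "1", "2", "10".

```python
def pick_latest_version(versions):
    """Return the numerically largest version string, e.g. "10" beats "2"."""
    versions = list(versions)
    if not versions:
        raise ValueError("no registered versions found")
    # Compare as integers so that "10" sorts after "2".
    return max(versions, key=int)


print(pick_latest_version(["1", "2", "10"]))
```

In the notebook, the input list could come from `mlflow.tracking.MlflowClient().search_model_versions(f"name='{MODEL_NAME}'")`, taking the `version` attribute of each result. Pinning an explicit version remains the more reproducible choice for a POC.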


### 🟦 3. Load the AutoML Model

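Loads the registered AutoML model from the MLflow Model Registry.
The loaded object is a FLAML estimator wrapper, and its `model` attribute holds the underlying fitted estimator that later cells use for feature names and predictions. Categorical encoding is applied separately in step 6.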

In [15]:
# Load the AutoML model (a FLAML RandomForestEstimator in this case)
model = mlflow.sklearn.load_model(MODEL_URI)

print("Loaded model:", type(model))


Loaded model: <class 'flaml.automl.model.RandomForestEstimator'>


### 🟦 4. Load Semantic Model Tables

Reads the three tables (shipments, carriers, warehouses) from the Fabric semantic model.
The next step joins them to reconstruct the columns used during training.

In [16]:
shipments = fabric.read_table(DATASET_NAME, SHIPMENTS_TABLE)
carriers = fabric.read_table(DATASET_NAME, CARRIERS_TABLE)
warehouses = fabric.read_table(DATASET_NAME, WAREHOUSES_TABLE)

print("Shipments:", shipments.shape)
print("Carriers :", carriers.shape)
print("Warehouses:", warehouses.shape)



Shipments: (5000, 14)
Carriers : (3, 3)
Warehouses: (4, 3)
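
A left join silently duplicates shipment rows if a lookup table contains repeated keys. As a minimal sketch on toy data, the `validate` argument of pandas `merge` can guard against that. The `carrier_id` column comes from the notebook; `carrier_name` and all values are made up for illustration.

```python
import pandas as pd

shipments = pd.DataFrame({"shipment_id": [1, 2, 3], "carrier_id": [10, 11, 10]})
carriers = pd.DataFrame({"carrier_id": [10, 11], "carrier_name": ["A", "B"]})

# validate="many_to_one" raises pandas.errors.MergeError if carrier_id
# is not unique in the right-hand table.
joined = shipments.merge(carriers, on="carrier_id", how="left", validate="many_to_one")

# A left join against unique keys keeps the shipment row count unchanged.
assert len(joined) == len(shipments)
print(joined.shape)
```

With only 3 carriers and 4 warehouses the check is cheap, and it fails early rather than producing duplicate predictions per shipment.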


### 🟦 5. Join Tables Into a Single Scoring Dataset

- The model was trained on a joined dataset, so inference requires the same join logic.
- Depending on how the training data was built, the merge may produce `origin_region_x` and `origin_region_y`.

In [19]:
df = (
    shipments
    .merge(carriers, on="carrier_id", how="left")
    .merge(warehouses, on="warehouse_id", how="left")
)

# Clean up column names after join collisions.
# Depending on merge order, training sometimes produced origin_region_x / origin_region_y.
if "origin_region_y" in df.columns:
    df["origin_region"] = df["origin_region_y"]
elif "origin_region_x" in df.columns:
    df["origin_region"] = df["origin_region_x"]

# Drop the suffixed columns whichever branch ran.
df = df.drop(columns=[c for c in ["origin_region_x", "origin_region_y"] if c in df.columns])
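
To make the suffix behaviour concrete, here is a small sketch on toy data showing how `_x` and `_y` columns arise when both frames carry a non-key column with the same name. The values are invented, and the `fillna` fallback is a variant for illustration rather than what the cell above does.

```python
import pandas as pd

shipments = pd.DataFrame(
    {"shipment_id": [1, 2], "warehouse_id": [100, 101], "origin_region": ["North", None]}
)
warehouses = pd.DataFrame(
    {"warehouse_id": [100, 101], "origin_region": ["North", "South"]}
)

df = shipments.merge(warehouses, on="warehouse_id", how="left")
# Both inputs carry origin_region, so pandas renames them with _x (left) and _y (right).
print(sorted(c for c in df.columns if c.startswith("origin_region")))

# Prefer the right-hand (warehouse) value and fall back to the left-hand one.
df["origin_region"] = df["origin_region_y"].fillna(df["origin_region_x"])
df = df.drop(columns=["origin_region_x", "origin_region_y"])
print(df["origin_region"].tolist())
```

Passing explicit `suffixes=` to `merge` is another way to make the resulting names predictable regardless of merge order.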



### 🟦 6. Infer Feature Order From the AutoML Model

In [20]:
trained_feature_order = list(model.model.feature_names_in_)
print("Model expects features in this order:")
print(trained_feature_order)

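X_score = df[trained_feature_order].copy()

# Convert categorical variables to numeric codes.
# Note: cat.codes are derived from the values present in this frame, so they
# only match the training codes if the same set of levels appears here.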
categorical_cols = [
    'origin_region',
    'destination_region',
    'distance_band',
    'service_level'
]

for col in categorical_cols:
    X_score[col] = X_score[col].astype("category").cat.codes


Model expects features in this order:
['origin_region', 'destination_region', 'distance_band', 'service_level', 'carrier_id', 'warehouse_id', 'order_to_ship_days', 'ship_dayofweek', 'ship_month']
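
Because `astype("category").cat.codes` infers its levels from whatever values are present, a scoring batch that lacks one level can receive different codes than the training data did. The sketch below contrasts inferred codes with codes from a pinned level list. `SERVICE_LEVELS` and its values are hypothetical; the real list would have to match the levels and order used at training time.

```python
import pandas as pd

# Hypothetical level list; in practice it must match what training used.
SERVICE_LEVELS = ["economy", "express", "standard"]

batch = pd.DataFrame({"service_level": ["standard", "express"]})

# Codes inferred from the batch alone: only two levels are seen here.
inferred = batch["service_level"].astype("category").cat.codes.tolist()

# Codes from a fixed level list stay stable regardless of which levels appear.
fixed_dtype = pd.CategoricalDtype(categories=SERVICE_LEVELS)
pinned = batch["service_level"].astype(fixed_dtype).cat.codes.tolist()

print(inferred)  # [1, 0]
print(pinned)    # [2, 1]
```

With a pinned dtype, values outside the list are encoded as -1, which makes unseen categories easy to detect before predicting.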


### 🟦 7. Run Predictions

In [21]:
df["predicted_delivery_days"] = model.model.predict(X_score)

df[["shipment_id", "predicted_delivery_days"]].head()



| | shipment_id | predicted_delivery_days |
|---|---|---|
| 0 | 2501 | 3.095927 |
| 1 | 2502 | 0.922295 |
| 2 | 2503 | 1.104245 |
| 3 | 2504 | 3.075344 |
| 4 | 2505 | 2.415477 |
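
Before writing results, a quick sanity check on the predictions can catch encoding or join problems. This is a minimal sketch: `check_predictions` is a hypothetical helper, and the assumption that delivery days should be finite and non-negative is ours, not something the notebook states.

```python
import math


def check_predictions(values, lower=0.0):
    """Raise ValueError if any prediction is NaN, infinite, or below `lower`."""
    bad = [v for v in values if not math.isfinite(v) or v < lower]
    if bad:
        raise ValueError(f"{len(bad)} invalid prediction(s), e.g. {bad[0]!r}")
    return True


print(check_predictions([3.095927, 0.922295, 1.104245]))
```

In the notebook this could be called as `check_predictions(df["predicted_delivery_days"])` right after the prediction cell.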


### 🟦 8. Write Predictions Back Into a Fabric Table

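Writing to a separate table is recommended for a POC because it:

- Stores predictions keyed by `shipment_id`, so they can be joined back to the raw data
- Allows Power BI or a semantic model to consume the results easily
- Avoids modifying the semantic model directly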

In [22]:
predictions_pdf = df[["shipment_id", "predicted_delivery_days"]].copy()
predictions_spark = spark.createDataFrame(predictions_pdf)

predictions_spark.write.mode("overwrite").saveAsTable("shipment_predictions")

print("✅ Wrote predictions to Lakehouse table: shipment_predictions")


✅ Wrote predictions to Lakehouse table: shipment_predictions
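
Since the table is overwritten on every run, it can help to record which model version produced the rows and when. The sketch below adds two audit columns to a pandas frame before it is handed to Spark. The column names `model_version` and `scored_at_utc` are hypothetical and not part of the original table schema.

```python
from datetime import datetime, timezone

import pandas as pd

MODEL_VERSION = "2"  # same value as in the Configuration cell

predictions_pdf = pd.DataFrame(
    {"shipment_id": [2501, 2502], "predicted_delivery_days": [3.095927, 0.922295]}
)

# Hypothetical audit columns for traceability across scoring runs.
predictions_pdf["model_version"] = MODEL_VERSION
predictions_pdf["scored_at_utc"] = datetime.now(timezone.utc).isoformat()

print(list(predictions_pdf.columns))
```

The extended frame can then be passed to `spark.createDataFrame` and written exactly as in the cell above.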
