**Goal:**
Automatically triage late-shipment risk and take the right next action with guardrails:
1. Detect risk (ETA deviation)
2. Classify severity (rules + ML)
3. Decide action (policy)
4. Execute (simulated connectors: TMS/Email/Slack)
5. Learn (feedback loop: store outcomes → retrain weekly)
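
The notebook below does not implement step 5, so here is a minimal sketch of what a weekly feedback loop could look like. The names `record_outcome`, `should_retrain`, and `min_new_outcomes` are hypothetical, and the in-memory list stands in for whatever store a real deployment would use.

```python
from datetime import datetime, timedelta

RETRAIN_INTERVAL = timedelta(days=7)  # "retrain weekly" from step 5


def record_outcome(outcome_log, shipment_id, agent_action, resolved, ts):
    """Append one observed outcome to an in-memory log (a list of dicts)."""
    outcome_log.append({
        "shipment_id": shipment_id,
        "agent_action": agent_action,
        "resolved": bool(resolved),
        "ts": ts,
    })


def should_retrain(last_trained_ts, now, new_outcomes, min_new_outcomes=1):
    """Retrain once a week has passed AND there is new feedback to learn from."""
    week_elapsed = (now - last_trained_ts) >= RETRAIN_INTERVAL
    return week_elapsed and new_outcomes >= min_new_outcomes


outcome_log = []
record_outcome(outcome_log, "S000001", "notify_planner", 1, datetime(2026, 2, 2))
print(should_retrain(datetime(2026, 1, 26), datetime(2026, 2, 2), len(outcome_log)))
```

Gating on both elapsed time and new outcomes avoids retraining on an unchanged dataset; the minimum count is an assumption to tune.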

**1. Problem Identification → Requirements**

  A. Operational pain you’re solving

    Late shipments create manual triage:
    1. Hundreds of alerts
    2. Low signal-to-noise
    3. Slow escalation
    4. Missed recovery options (rebook, expedite, appointment change)

  B. The agent’s “job story”

     When a shipment is at risk of missing delivery,
     the agent should detect it early, classify urgency,
     propose and/or execute the best recovery action,
     and learn which actions actually work.

  C. Inputs your agent needs (minimal viable)

    1. Scheduled ETA, latest predicted ETA, last known event time/location
    2. Carrier reliability / lane baseline variability (optional)
    3. Customer priority / service level
    4. Time buffers (appointment windows, SLA)
    5. Available actions: notify / expedite / rebook / escalate

  D. Outputs

    1. risk_score (0–1)
    2. severity (low/medium/high/critical)
    3. recommended_action
    4. confidence
    5. explanation (why this decision)
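
One way to pin down the output contract listed above is a small validated record. This is only a sketch: the class name `TriageDecision` and the validation rules are assumptions, and the notebook itself stores these values as DataFrame columns instead.

```python
from dataclasses import dataclass

SEVERITIES = ("low", "medium", "high", "critical")


@dataclass(frozen=True)
class TriageDecision:
    """Hypothetical container for the five agent outputs."""
    risk_score: float          # 0-1
    severity: str              # one of SEVERITIES
    recommended_action: str
    confidence: float          # 0-1
    explanation: str           # why this decision

    def __post_init__(self):
        # Reject out-of-range values early so downstream code can trust them
        if not 0.0 <= self.risk_score <= 1.0:
            raise ValueError("risk_score must be in [0, 1]")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        if self.severity not in SEVERITIES:
            raise ValueError(f"severity must be one of {SEVERITIES}")


decision = TriageDecision(
    risk_score=0.72,
    severity="high",
    recommended_action="notify_planner",
    confidence=0.80,
    explanation="predicted ETA is past the promised delivery time",
)
print(decision)
```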

In [None]:
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

np.random.seed(42)

N = 4000

customers = ["Walmart", "CardinalHealth", "Target", "AcmeCPG", "MedSupplyCo"]
lanes = ["CHI->DAL", "CHI->ATL", "LAX->PHX", "NYC->BOS", "SEA->PDX", "DFW->HOU"]
carriers = ["CarrierA", "CarrierB", "CarrierC", "CarrierD"]
event_types = ["pickup", "in_transit", "delay_notice", "arrived_hub", "out_for_delivery"]
service_levels = ["standard", "expedite"]

base_start = datetime(2025, 1, 1)

rows = []
for i in range(N):
    shipment_id = f"S{i:06d}"
    # Sampling weights stand in for factors such as customer demand share, market share, and traffic routing
    customer = np.random.choice(customers, p=[0.25, 0.18, 0.18, 0.22, 0.17])
    lane = np.random.choice(lanes)
    carrier = np.random.choice(carriers)

    priority = np.random.choice([1,2,3,4,5], p=[0.10,0.20,0.35,0.25,0.10])
    service = np.random.choice(service_levels, p=[0.85, 0.15])

    # reliability: some carriers better than others
    carrier_base = {"CarrierA":0.92, "CarrierB":0.86, "CarrierC":0.80, "CarrierD":0.88}[carrier]
    carrier_reliability = np.clip(np.random.normal(carrier_base, 0.05), 0.6, 0.98)

    weather_risk = np.random.binomial(1, 0.12)

    created_ts = base_start + timedelta(days=int(np.random.uniform(0, 365)))
    planned_transit_hours = np.random.uniform(8, 72) * (0.9 if service=="expedite" else 1.0)

    planned_eta_ts = created_ts + timedelta(hours=float(planned_transit_hours))
    promised_buffer_hours = np.random.uniform(1, 18) + (6 if priority >= 4 else 0)
    promised_delivery_ts = planned_eta_ts + timedelta(hours=float(promised_buffer_hours))

    # Current predicted ETA deviates due to risk factors (weather impact, carrier quality, service level (standard vs. expedite), random disruption, etc.)
    deviation_hours = np.random.normal(0, 3)
    deviation_hours += (8 * weather_risk)
    deviation_hours += (6 * (1 - carrier_reliability))
    deviation_hours += (4 if service=="standard" else -2)
    deviation_hours += (3 if np.random.rand() < 0.12 else 0)  # random incident

    current_predicted_eta_ts = planned_eta_ts + timedelta(hours=float(deviation_hours))

    last_event_ts = created_ts + timedelta(hours=float(np.random.uniform(1, planned_transit_hours)))
    last_event_type = np.random.choice(event_types, p=[0.10,0.55,0.08,0.18,0.09])

    distance_to_destination_km = np.clip(np.random.normal(600, 250), 30, 1800)

    # Actual delivery depends on predicted + noise; add mitigation sometimes
    mitigation = (np.random.rand() < 0.18)  # sometimes ops took action
    mitigation_hours = -np.random.uniform(1, 6) if mitigation else 0

    actual_eta = current_predicted_eta_ts + timedelta(hours=float(np.random.normal(0, 2) + mitigation_hours))
    actual_delivery_ts = actual_eta

    was_late = int(actual_delivery_ts > promised_delivery_ts)

    rows.append({
        "shipment_id": shipment_id,
        "customer": customer,
        "lane": lane,
        "carrier": carrier,
        "service_level": service,
        "priority": priority,
        "carrier_reliability": float(carrier_reliability),
        "weather_risk": int(weather_risk),
        "created_ts": created_ts,
        "planned_eta_ts": planned_eta_ts,
        "promised_delivery_ts": promised_delivery_ts,
        "current_predicted_eta_ts": current_predicted_eta_ts,
        "last_event_ts": last_event_ts,
        "last_event_type": last_event_type,
        "distance_to_destination_km": float(distance_to_destination_km),
        "actual_delivery_ts": actual_delivery_ts,
        "was_late": was_late
    })

df = pd.DataFrame(rows)

# Derived features
df["eta_deviation_hours"] = (df["current_predicted_eta_ts"] - df["planned_eta_ts"]).dt.total_seconds()/3600
df["time_to_promised_hours"] = (df["promised_delivery_ts"] - df["last_event_ts"]).dt.total_seconds()/3600
df["predicted_late_hours"] = (df["current_predicted_eta_ts"] - df["promised_delivery_ts"]).dt.total_seconds()/3600

df.head()

Unnamed: 0,shipment_id,customer,lane,carrier,service_level,priority,carrier_reliability,weather_risk,created_ts,planned_eta_ts,promised_delivery_ts,current_predicted_eta_ts,last_event_ts,last_event_type,distance_to_destination_km,actual_delivery_ts,was_late,eta_deviation_hours,time_to_promised_hours,predicted_late_hours
0,S000000,CardinalHealth,SEA->PDX,CarrierC,standard,4,0.788292,0,2025-11-13,2025-11-14 22:28:16.898706,2025-11-15 17:30:30.940467,2025-11-15 06:02:21.105216,2025-11-14 21:06:11.234823,arrived_hub,454.780466,2025-11-15 04:59:19.882604,0,7.567835,20.405474,-11.469399
1,S000001,CardinalHealth,DFW->HOU,CarrierA,standard,3,0.869358,0,2025-05-14,2025-05-15 13:11:18.524364,2025-05-16 03:32:11.293201,2025-05-15 18:54:54.253166,2025-05-14 19:36:33.703813,in_transit,630.554791,2025-05-15 16:33:32.187725,0,5.726591,31.927108,-8.6214
2,S000002,MedSupplyCo,NYC->BOS,CarrierB,standard,4,0.874552,0,2025-02-14,2025-02-15 15:41:28.760090,2025-02-15 23:16:33.337582,2025-02-15 18:32:14.398174,2025-02-14 11:00:45.142773,delay_notice,652.215899,2025-02-15 14:37:04.773282,0,2.846011,36.263387,-4.738594
3,S000003,Walmart,CHI->ATL,CarrierB,standard,3,0.868635,0,2025-07-28,2025-07-29 17:20:00.213510,2025-07-30 10:40:23.941400,2025-07-29 23:05:39.050997,2025-07-29 07:08:31.039552,in_transit,715.204075,2025-07-29 23:15:57.699993,0,5.760788,27.531362,-11.579136
4,S000004,CardinalHealth,SEA->PDX,CarrierC,standard,2,0.706216,0,2025-01-06,2025-01-06 20:43:33.289902,2025-01-07 09:49:07.417410,2025-01-06 22:23:17.773716,2025-01-06 12:57:11.211358,out_for_delivery,765.571128,2025-01-06 22:52:13.853287,0,1.662357,20.865613,-11.430457


In [None]:
#Save Data

df.to_csv("sample_shipments.csv", index=False)
print("Wrote sample_shipments.csv with rows:", len(df))

Wrote sample_shipments.csv with rows: 4000


In [None]:
#Baseline rule-based risk + severity
def severity_from_rules(row):
    late_hours = row["predicted_late_hours"]
    dev = row["eta_deviation_hours"]
    priority = row["priority"]
    weather = row["weather_risk"]

    # simple scoring
    score = 0
    score += max(0, late_hours) * 0.12
    score += max(0, dev) * 0.08
    score += (priority - 1) * 0.10
    score += 0.25 if weather == 1 else 0
    score = min(1.0, score)

    # severity tiers
    if late_hours >= 12 or (priority >= 4 and late_hours > 4):
        sev = "critical"
    elif late_hours >= 6 or (priority >= 4 and late_hours > 2):
        sev = "high"
    elif late_hours >= 2 or dev >= 4:
        sev = "medium"
    else:
        sev = "low"

    return score, sev

df[["risk_score_rules","severity_rules"]] = df.apply(
    lambda r: pd.Series(severity_from_rules(r)), axis=1
)
df[["predicted_late_hours","eta_deviation_hours","risk_score_rules","severity_rules"]].head()

Unnamed: 0,predicted_late_hours,eta_deviation_hours,risk_score_rules,severity_rules
0,-11.469399,7.567835,0.905427,medium
1,-8.6214,5.726591,0.658127,medium
2,-4.738594,2.846011,0.527681,low
3,-11.579136,5.760788,0.660863,medium
4,-11.430457,1.662357,0.232989,low
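
To see why row 0 gets a high score but only a medium severity, it can help to redo the arithmetic by hand. The sketch below copies the row 0 inputs from the tables above (priority 4, no weather risk) and repeats the scoring and tier logic of `severity_from_rules` step by step.

```python
# Row 0 inputs, copied from the output tables above
late_hours = -11.469399   # predicted ETA is ahead of the promised time
dev = 7.567835            # predicted ETA is later than the planned ETA
priority = 4
weather = 0

score = 0.0
score += max(0, late_hours) * 0.12     # 0.0: not predicted late
score += max(0, dev) * 0.08            # about 0.605
score += (priority - 1) * 0.10         # about 0.30
score += 0.25 if weather == 1 else 0   # 0.0
score = min(1.0, score)

# Severity tiers depend mostly on late_hours, so a high score can pair with "medium"
if late_hours >= 12 or (priority >= 4 and late_hours > 4):
    sev = "critical"
elif late_hours >= 6 or (priority >= 4 and late_hours > 2):
    sev = "high"
elif late_hours >= 2 or dev >= 4:
    sev = "medium"
else:
    sev = "low"

print(round(score, 6), sev)  # 0.905427 medium
```

The score and the severity tier are computed from different terms: priority and ETA deviation push the score up even when the shipment is still predicted to arrive before the promised time.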


In [None]:
#ML model to predict probability of being late

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import roc_auc_score, classification_report
from sklearn.ensemble import RandomForestClassifier

features_num = ["eta_deviation_hours","time_to_promised_hours","distance_to_destination_km",
                "carrier_reliability","priority","weather_risk","predicted_late_hours"]
features_cat = ["customer","lane","carrier","service_level","last_event_type"]

X = df[features_num + features_cat]
y = df["was_late"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, stratify=y)

preprocess = ColumnTransformer([
    ("num", "passthrough", features_num),
    ("cat", OneHotEncoder(handle_unknown="ignore"), features_cat)
])

model = RandomForestClassifier(
    n_estimators=250,
    max_depth=12,
    random_state=42,
    class_weight="balanced"
)

clf = Pipeline([
    ("prep", preprocess),
    ("rf", model)
])

clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:,1]
pred = (proba >= 0.5).astype(int)

print("AUC:", roc_auc_score(y_test, proba))
print(classification_report(y_test, pred))

AUC: 0.9682455071580871
              precision    recall  f1-score   support

           0       0.97      0.93      0.95       804
           1       0.75      0.89      0.81       196

    accuracy                           0.92      1000
   macro avg       0.86      0.91      0.88      1000
weighted avg       0.93      0.92      0.92      1000



In [None]:
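#ML probability for every shipment
# Note: X includes the training rows, so p_late_ml is in-sample for 75% of shipments
# and will look more accurate there than on unseen data.
df["p_late_ml"] = clf.predict_proba(X)[:,1]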
df[["p_late_ml","risk_score_rules","severity_rules","was_late"]].head()

Unnamed: 0,p_late_ml,risk_score_rules,severity_rules,was_late
0,0.010258,0.905427,medium,0
1,0.045711,0.658127,medium,0
2,0.001129,0.527681,low,0
3,0.047162,0.660863,medium,0
4,0.018979,0.232989,low,0


In [None]:
#Decision policy (action selection)

def decide_action(row):
    p = row["p_late_ml"]
    sev = row["severity_rules"]
    priority = row["priority"]
    late_hours = row["predicted_late_hours"]

    # Autonomy mode: start conservative.
    # Low-risk actions (notify, monitor, no action) run as "auto";
    # escalations are only recommended and left to a human.
    if p >= 0.85 and sev in ["high","critical"]:
        if late_hours >= 10:
            return "escalate_to_ops", "recommend", 0.80
        else:
            return "notify_customer_and_carrier", "auto", 0.90

    if p >= 0.70:
        if priority >= 4:
            return "escalate_to_ops", "recommend", 0.75
        return "notify_planner", "auto", 0.80

    if p >= 0.50:
        return "monitor_and_notify_if_worsens", "auto", 0.70

    return "no_action", "auto", 0.60

df[["agent_action","autonomy_mode","action_confidence"]] = df.apply(
    lambda r: pd.Series(decide_action(r)), axis=1
)
df[["p_late_ml","severity_rules","agent_action","autonomy_mode","action_confidence"]].head()

Unnamed: 0,p_late_ml,severity_rules,agent_action,autonomy_mode,action_confidence
0,0.010258,medium,no_action,auto,0.6
1,0.045711,medium,no_action,auto,0.6
2,0.001129,low,no_action,auto,0.6
3,0.047162,medium,no_action,auto,0.6
4,0.018979,low,no_action,auto,0.6
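
The policy above encodes its guardrails inline. A separate check between decision and execution is one way to make them explicit, sketched below. The sets `SIDE_EFFECT_FREE` and `LOW_RISK_AUTO`, the threshold `MIN_AUTO_CONFIDENCE`, and the function `apply_guardrails` are all hypothetical and not part of the notebook.

```python
# Actions with no external side effects: always safe to run automatically
SIDE_EFFECT_FREE = {"no_action", "monitor_and_notify_if_worsens"}
# Actions that send a message: allowed automatically only above a confidence floor
LOW_RISK_AUTO = {"notify_planner", "notify_customer_and_carrier"}
MIN_AUTO_CONFIDENCE = 0.80  # assumed floor; tune against override rates


def apply_guardrails(action, mode, confidence):
    """Downgrade an 'auto' decision to 'recommend' when it fails a guardrail."""
    if mode != "auto":
        return action, mode, confidence
    if action in SIDE_EFFECT_FREE:
        return action, "auto", confidence
    if action in LOW_RISK_AUTO and confidence >= MIN_AUTO_CONFIDENCE:
        return action, "auto", confidence
    # Anything else (unknown action, escalation, low confidence) needs a human
    return action, "recommend", confidence


print(apply_guardrails("notify_planner", "auto", 0.80))
print(apply_guardrails("escalate_to_ops", "auto", 0.95))
```

Keeping the allowlist outside `decide_action` means a policy change cannot silently widen what runs without review.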


In [None]:
#Execution Layer (simulated TMS/Slack/email)
from datetime import datetime, timedelta

actions = []
now = datetime(2026, 2, 2)

for _, r in df.sample(600, random_state=1).iterrows():
    # simulate human override sometimes (more likely when confidence lower)
    override_prob = 0.05 if r["action_confidence"] > 0.85 else 0.18
    human_override = int(np.random.rand() < override_prob)

    # simulate resolution effectiveness by action type
    base_resolve = {
        "notify_customer_and_carrier": 0.35,
        "notify_planner": 0.25,
        "monitor_and_notify_if_worsens": 0.10,
        "escalate_to_ops": 0.45,
        "no_action": 0.02
    }[r["agent_action"]]

    # higher priority & earlier detection improves odds
    bonus = 0.08 if r["priority"] >= 4 else 0
    bonus += 0.07 if r["time_to_promised_hours"] > 10 else 0
    resolved = int((np.random.rand() < min(0.90, base_resolve + bonus)) and human_override == 0)

    res_hours = np.random.uniform(0.5, 6) if resolved else np.random.uniform(6, 30)
    resolution_ts = now + timedelta(hours=float(res_hours))

    actions.append({
        "shipment_id": r["shipment_id"],
        "agent_action": r["agent_action"],
        "action_ts": now,
        "autonomy_mode": r["autonomy_mode"],
        "human_override": human_override,
        "override_reason": ("planner_disagreed" if human_override else ""),
        "resolved": resolved,
        "resolution_ts": resolution_ts
    })

actions_log = pd.DataFrame(actions)
actions_log.head()

Unnamed: 0,shipment_id,agent_action,action_ts,autonomy_mode,human_override,override_reason,resolved,resolution_ts
0,S000200,no_action,2026-02-02,auto,0,,0,2026-02-03 01:34:40.184694
1,S001078,notify_planner,2026-02-02,auto,0,,0,2026-02-03 05:35:22.906938
2,S000610,no_action,2026-02-02,auto,0,,0,2026-02-02 14:46:15.261843
3,S002159,notify_planner,2026-02-02,auto,0,,0,2026-02-03 03:55:51.156827
4,S001169,notify_customer_and_carrier,2026-02-02,auto,0,,1,2026-02-02 04:30:24.657462
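
The loop above simulates outcomes but never routes an action to a channel. A simulated connector layer could look roughly like this. The `ROUTES` mapping, the `review_queue` channel, and `execute_action` are assumptions made for illustration; nothing is actually sent.

```python
from datetime import datetime

# Hypothetical routing: which simulated channels each action touches
ROUTES = {
    "notify_customer_and_carrier": ["email", "tms"],
    "notify_planner": ["slack"],
    "monitor_and_notify_if_worsens": ["tms"],
    "escalate_to_ops": ["slack", "email"],
    "no_action": [],
}


def execute_action(shipment_id, action, mode, outbox, ts):
    """Append simulated messages to `outbox`; return how many were queued."""
    if action not in ROUTES:
        raise ValueError(f"unknown action: {action}")
    # Non-auto decisions go to a human review queue instead of a connector
    channels = ROUTES[action] if mode == "auto" else ["review_queue"]
    for channel in channels:
        outbox.append({
            "channel": channel,
            "shipment_id": shipment_id,
            "action": action,
            "ts": ts,
        })
    return len(channels)


outbox = []
execute_action("S001169", "notify_customer_and_carrier", "auto", outbox, datetime(2026, 2, 2))
execute_action("S000042", "escalate_to_ops", "recommend", outbox, datetime(2026, 2, 2))
print([m["channel"] for m in outbox])  # ['email', 'tms', 'review_queue']
```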


In [None]:
#Metrics (automation, MTTR proxy, workload)
# % issues auto-resolved = resolved AND not overridden AND autonomy in auto
auto_resolved = actions_log[
    (actions_log["resolved"]==1) &
    (actions_log["human_override"]==0) &
    (actions_log["autonomy_mode"]=="auto")
].shape[0] / actions_log.shape[0]

# MTTR proxy
actions_log["mttr_hours"] = (actions_log["resolution_ts"] - actions_log["action_ts"]).dt.total_seconds()/3600
mttr = actions_log["mttr_hours"].mean()

# planner workload proxy = % requiring escalation or override
planner_work = actions_log[
    (actions_log["agent_action"]=="escalate_to_ops") | (actions_log["human_override"]==1)
].shape[0] / actions_log.shape[0]

print("Auto-resolved rate:", round(auto_resolved*100,2), "%")
print("Avg MTTR (hours):", round(mttr,2))
print("Planner workload proxy:", round(planner_work*100,2), "%")

Auto-resolved rate: 10.0 %
Avg MTTR (hours): 16.24
Planner workload proxy: 18.5 %


In [None]:
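#OTIF proxy improvement (simple)

# Compare late rates for shipments where the agent acted vs. the "no_action" group.
# Caveat: was_late is generated before any action is simulated, and the agent acts
# only on shipments it already predicts to be late, so the gap below reflects which
# shipments the policy selects, not an improvement caused by acting.

# Join label to actions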
joined = actions_log.merge(df[["shipment_id","was_late","priority"]], on="shipment_id", how="left")

baseline_late = joined[joined["agent_action"]=="no_action"]["was_late"].mean()
acted_late = joined[joined["agent_action"]!="no_action"]["was_late"].mean()

print("Baseline late rate (no_action):", round(baseline_late*100,2), "%")
print("Late rate when acted:", round(acted_late*100,2), "%")

Baseline late rate (no_action): 0.42 %
Late rate when acted: 89.26 %
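
Because the agent acts almost exclusively on shipments it expects to be late, the two groups above are not comparable. One common way to estimate the effect of acting is a randomized holdout: among shipments the policy would act on, withhold the action from a random subset and compare late rates. The sketch below is a toy simulation. The base late rate, the assumed reduction from acting, and the function name `simulate_holdout` are all illustrative assumptions, not results from this notebook.

```python
import random


def simulate_holdout(n=4000, base_late=0.90, assumed_reduction=0.30,
                     holdout_frac=0.5, seed=0):
    """Late rate per group when actions are randomly withheld (toy numbers)."""
    rng = random.Random(seed)
    counts = {"treated": [0, 0], "control": [0, 0]}  # [late, total]
    for _ in range(n):
        # Random assignment makes the two groups comparable on average
        group = "control" if rng.random() < holdout_frac else "treated"
        p_late = base_late if group == "control" else base_late * (1 - assumed_reduction)
        counts[group][0] += int(rng.random() < p_late)
        counts[group][1] += 1
    return {g: late / total for g, (late, total) in counts.items()}


rates = simulate_holdout()
print("Control late rate:", round(rates["control"] * 100, 2), "%")
print("Treated late rate:", round(rates["treated"] * 100, 2), "%")
```

In this setup the difference between the control and treated rates estimates the effect of acting, because assignment does not depend on how risky a shipment looks.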
