### 🛠️ Day 12 Tasks:

1. Train a simple regression model
2. Log parameters, metrics, and the model
3. View the run in the MLflow UI
4. Compare runs

## Task 1: Train a Simple Regression Model

Load Feature Data 

In [0]:
df = spark.table("ecommerce_catalog.gold.daily_sales_features")
df.display()


| order_date | day_of_week | is_weekend | total_orders | total_revenue | avg_order_value | prev_day_revenue |
|---|---|---|---|---|---|---|
| 2019-10-01 | 3 | 0 | 1242116 | 370635242.2599296 | 298.3902004804137 | |
| 2019-10-02 | 4 | 0 | 1189507 | 357573538.82987684 | 300.6065023828165 | 370635242.2599296 |
| 2019-10-03 | 5 | 0 | 1125950 | 339481302.6499579 | 301.5065523779545 | 357573538.82987684 |
| 2019-10-04 | 6 | 0 | 1415671 | 423692819.4202042 | 299.2876306855224 | 339481302.6499579 |
| 2019-10-05 | 7 | 1 | 1329047 | 395676497.2899051 | 297.7144504971646 | 423692819.4202042 |
| 2019-10-06 | 1 | 1 | 1317309 | 397025067.050164 | 301.39099258424864 | 395676497.2899051 |
| 2019-10-07 | 2 | 0 | 1198695 | 355449038.4999612 | 296.5300084675094 | 397025067.050164 |
| 2019-10-08 | 3 | 0 | 1365036 | 380871709.1899185 | 279.01953442247566 | 355449038.4999612 |
| 2019-10-09 | 4 | 0 | 1342556 | 379775354.0200936 | 282.8748700390103 | 380871709.1899185 |
| 2019-10-10 | 5 | 0 | 1281337 | 372054919.30024576 | 290.3646107934492 | 379775354.0200936 |


Prepare Pandas Data (Simple ML)

In [0]:
pdf = df.select(
    "total_orders",
    "is_weekend",
    "total_revenue"
).dropna().toPandas()


Train Linear Regression Model

In [0]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

X = pdf[["total_orders", "is_weekend"]]
y = pdf["total_revenue"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)


Evaluate Model

In [0]:
import numpy as np
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

rmse, r2

(np.float64(20281105.90859352), 0.9975949779445243)
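
To make the two numbers above easier to read, here is a minimal sketch of how RMSE and R² are defined, written in plain Python with no dependencies. The helper name `rmse_and_r2` and the four demo values are made up for illustration and are not part of the notebook. RMSE is expressed in the same units as the target, so the value of about 20.3 million above should be read against daily revenues in the 340 to 420 million range shown in the sample rows.

```python
import math


def rmse_and_r2(y_true, y_pred):
    """Return (RMSE, R^2) for two equal-length sequences of numbers."""
    n = len(y_true)
    # Residual sum of squares: squared prediction errors
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviations from the mean of the targets
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)   # square root of the mean squared error
    r2 = 1.0 - ss_res / ss_tot     # share of variance explained
    return rmse, r2


# Hypothetical toy values, chosen only so the arithmetic is easy to follow
y_true_demo = [3.0, 5.0, 7.0, 9.0]
y_pred_demo = [2.0, 5.0, 8.0, 9.0]

demo_rmse, demo_r2 = rmse_and_r2(y_true_demo, y_pred_demo)
print(demo_rmse, demo_r2)
```

For the toy values, the squared errors sum to 2, so the MSE is 0.5 and the RMSE is its square root. The total sum of squares is 20, which gives an R² of 1 - 2/20 = 0.9.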

## Task 2: Log Parameters, Metrics & Model (MLflow)

In [0]:
# --------------------------------------------------
# STEP 1: Import required MLflow libraries
# --------------------------------------------------
import mlflow
import mlflow.sklearn
from mlflow.models.signature import infer_signature


# --------------------------------------------------
# STEP 2: Set (or create) an MLflow experiment
# All runs will be tracked under this experiment
# --------------------------------------------------
mlflow.set_experiment("/Day12_Ecommerce_Revenue_Prediction")


# --------------------------------------------------
# STEP 3: Create an input example
# This helps MLflow understand expected model inputs
# We take one row from training data
# --------------------------------------------------
input_example = X_train.iloc[:1]


# --------------------------------------------------
# STEP 4: Infer model signature
# Signature captures:
# - input schema (feature names & types)
# - output schema (prediction type)
# This is critical for deployment & validation
# --------------------------------------------------
signature = infer_signature(
    X_train,
    model.predict(X_train)
)


# --------------------------------------------------
# STEP 5: Start an MLflow run
# Everything inside this block belongs to one run
# --------------------------------------------------
with mlflow.start_run():

    # ----------------------------------------------
    # STEP 6: Log model parameters
    # Parameters describe how the model was built
    # ----------------------------------------------
    mlflow.log_param("model_type", "LinearRegression")
    mlflow.log_param("features_used", "total_orders, is_weekend")

    # ----------------------------------------------
    # STEP 7: Log evaluation metrics
    # Metrics describe model performance
    # ----------------------------------------------
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("r2_score", r2)

    # ----------------------------------------------
    # STEP 8: Log the trained model
    # - Stores model artifact
    # - Attaches signature
    # - Attaches input example
    # ----------------------------------------------
    mlflow.sklearn.log_model(
        sk_model=model,
        artifact_path="model",
        input_example=input_example,
        signature=signature
    )

# --------------------------------------------------
# STEP 9: Run automatically ends here
# MLflow assigns a Run ID and saves everything
# --------------------------------------------------




## Task 3: View in MLflow UI

📌 Commands / Steps
1. Open the Databricks notebook used for Day 12
2. Click the Experiments icon (🧪 flask) on the right sidebar
3. Select experiment: Day12_Ecommerce_Revenue_Prediction
4. View the list of experiment runs
5. Click on a Run ID


🔍 What you will see


- Run ID
- Start / End time
- Parameters (model_type, features_used)
- Metrics (rmse, r2_score)
- Model artifacts (sklearn model, signature, input example)


## Task 4: Compare Runs in MLflow UI

📌 Commands / Steps
1. Open MLflow Experiments UI
2. Select experiment: Day12_Ecommerce_Revenue_Prediction
3. Select multiple runs using checkboxes
4. Click Compare


📊 Comparison View Shows
- Side-by-side parameters
- RMSE comparison
- R2 score comparison
- Feature differences
- Parallel coordinates plot

✅ Decision Rule
- Lower RMSE = better model
- Higher R2 score = better model
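
The decision rule can also be applied in code once the run metrics are collected. The following is a minimal sketch in plain Python. The run names and metric values are hypothetical placeholders, not results from this experiment, and `pick_best_run` is an illustrative helper, not an MLflow function.

```python
# Hypothetical run summaries; the values are invented for illustration only
run_summaries = [
    {"run_name": "run_a", "rmse": 120.0, "r2_score": 0.91},
    {"run_name": "run_b", "rmse": 95.0, "r2_score": 0.95},
    {"run_name": "run_c", "rmse": 110.0, "r2_score": 0.93},
]


def pick_best_run(runs):
    """Return the run with the lowest RMSE."""
    return min(runs, key=lambda run: run["rmse"])


best_run = pick_best_run(run_summaries)
print(best_run["run_name"])
```

When all runs are scored on the same test rows, the two rules agree: R² equals 1 minus the residual sum of squares divided by a total sum of squares that depends only on the targets, so the run with the lowest RMSE also has the highest R². The rules can only disagree when runs were evaluated on different test sets, in which case the metrics are not directly comparable.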

🧠 Final Conclusion

Both compared runs achieved similar performance.
`total_orders` alone explains most of the revenue variance.
Additional features improve robustness but not accuracy for this dataset.