## HeatRiskModel with MLflow and LinearRegression

A custom MLflow `pyfunc` model that wraps a scikit-learn `LinearRegression`. The regression predicts temperature, which is the basis for assessing heat risk.


In [0]:
from sklearn.linear_model import LinearRegression
import mlflow
import mlflow.pyfunc
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

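class HeatRiskModel(mlflow.pyfunc.PythonModel):
    """MLflow pyfunc wrapper around a scikit-learn LinearRegression."""

    def __init__(self):
        # Create the (unfitted) estimator up front so the attribute always exists
        self.model = LinearRegression()

    def load_context(self, context):
        # No external artifacts to load; the estimator is pickled with this object
        pass

    def fit(self, X, y):
        # Train the wrapped estimator
        self.model.fit(X, y)

    def predict(self, context, model_input):
        # `context` is supplied by MLflow when the logged model is loaded; unused here
        return self.model.predict(model_input)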


## HeatRiskModel with Inline LinearRegression Initialization

A variant of `HeatRiskModel` that creates the `LinearRegression` inside `fit` instead of `__init__`. It redefines, and therefore replaces, the class from the previous cell.


In [0]:
class HeatRiskModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        pass

    def fit(self, X, y):
        self.model = LinearRegression()  # Create the estimator at training time
        self.model.fit(X, y)

    def predict(self, context, model_input):
        return self.model.predict(model_input)
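
One consequence of creating the estimator inside `fit` is that an instance has no `model` attribute until it has been trained, so calling `predict` first would fail with an `AttributeError`. The sketch below illustrates the pattern with a dependency-free stand-in. `MeanModel` and `LazyWrapper` are hypothetical names, not part of MLflow or scikit-learn, and the explicit `RuntimeError` guard is an optional addition that the notebook itself does not include.

```python
class MeanModel:
    """Hypothetical stand-in for LinearRegression: always predicts the training mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class LazyWrapper:
    """Mirrors the fit-time initialization pattern, without the MLflow base class."""

    def fit(self, X, y):
        self.model = MeanModel()  # the estimator only exists after fit()
        self.model.fit(X, y)

    def predict(self, context, model_input):
        # Optional guard (an assumption, not in the notebook): fail with a clear message
        if not hasattr(self, "model"):
            raise RuntimeError("Call fit() before predict()")
        return self.model.predict(model_input)


wrapper = LazyWrapper()
try:
    wrapper.predict(None, [[1.0]])
except RuntimeError as err:
    print("Not fitted:", err)

wrapper.fit([[1.0], [2.0]], [30.0, 32.0])
print(wrapper.predict(None, [[3.0]]))
```

Whether such a guard is worth adding depends on how the class is used; in this notebook `fit` is always called before the model is logged.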


## Training and Logging HeatRiskModel with MLflow on Weather Data

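Loads climate features from the Delta table `climaguard.singapore.climate_weather_features`, trains `HeatRiskModel` to predict temperature on an 80/20 train/test split, and logs the model to MLflow together with the test-set mean squared error.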


In [0]:
import mlflow

# Use the Databricks workspace as the MLflow model registry
mlflow.set_registry_uri("databricks")

# Load the combined dataset from Delta into a pandas DataFrame
df = spark.read.table("climaguard.singapore.climate_weather_features").toPandas()

# Select the input features and the target column
features = ["rainfall", "relative_humidity", "wind_speed", "wind_direction"]
X = df[features]
y = df["temperature"]

# Hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Train model
model = HeatRiskModel()
model.fit(X_train, y_train)

# Log the trained model and its test-set MSE in a single MLflow run
with mlflow.start_run():
    mlflow.pyfunc.log_model(artifact_path="heat_risk_model", python_model=model)
    preds = model.predict(None, X_test)
    mse = mean_squared_error(y_test, preds)
    mlflow.log_metric("mse", mse)
    print("MSE:", mse)
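
For reference, the `mse` metric logged above is the average of the squared differences between actual and predicted values. The following is a small dependency-free sketch of that calculation; the function name `mean_squared_error_manual` is hypothetical and the temperatures are made-up illustrative numbers, not values from the dataset.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of squared residuals between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only (not from the climate table)
actual = [30.0, 31.0, 29.0, 32.0]
predicted = [29.0, 31.0, 30.0, 34.0]

mse = mean_squared_error_manual(actual, predicted)  # (1 + 0 + 1 + 4) / 4
rmse = mse ** 0.5  # square root puts the error back in the target's units
print("MSE:", mse)
print("RMSE:", rmse)
```

Because MSE is expressed in squared units of the target, the root of it (RMSE) is often easier to interpret when the target is a temperature.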


## Predicting Heat Risk and Saving Results to Delta Table

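Generates temperature predictions for the full dataset and stores them in the Delta table `climaguard.singapore.climate_heat_risk` using Spark. The cell adds only the `predicted_temp` column; it does not include a heat-risk classification step.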


In [0]:
# Step 1: Predict temperature for every row (no risk classification is applied here)
df["predicted_temp"] = model.predict(None, X)

# Step 2: Convert Pandas DataFrame to Spark DataFrame
spark_df = spark.createDataFrame(df)

# Step 3: Save as a Delta table, overwriting any existing data and schema
spark_df.write \
    .format("delta") \
    .mode("overwrite") \
    .option("overwriteSchema", "true") \
    .saveAsTable("climaguard.singapore.climate_heat_risk")
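
The cell above stores only the predicted temperature. One way to derive a heat-risk category from it is sketched below. The thresholds, the Celsius unit, the label names, and the function name `classify_heat_risk` are all assumptions made for illustration; none of them come from the notebook, and real cut-offs should be chosen from domain guidance.

```python
# Hypothetical thresholds in degrees Celsius (assumed unit); adjust to the use case
MODERATE_THRESHOLD_C = 31.0
HIGH_THRESHOLD_C = 33.0


def classify_heat_risk(temp_c):
    """Map a predicted temperature to a coarse risk label."""
    if temp_c >= HIGH_THRESHOLD_C:
        return "high"
    if temp_c >= MODERATE_THRESHOLD_C:
        return "moderate"
    return "low"


# Illustrative predictions only (not from the model)
sample_predictions = [28.5, 31.0, 34.2]
labels = [classify_heat_risk(t) for t in sample_predictions]
print(labels)
```

In the notebook this could be applied between Step 1 and Step 2, for example with `df["heat_risk"] = df["predicted_temp"].apply(classify_heat_risk)`, where the column name `heat_risk` is likewise an assumption.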
