# Table of Contents
1. [Importing Libraries](#importing-libraries)
2. [Load Data](#load-data)
3. [MLflow Setup](#mlflow-setup)
4. [Data Preparation](#data-preparation)
5. [Feature Scaling](#feature-scaling)
6. [Linear Regression Model](#linear-regression-model)
7. [Regularized Linear Models (Lasso & Ridge)](#regularized-linear-models-lasso--ridge)


## 1. Importing Libraries
Import necessary libraries for data manipulation, machine learning, and experiment tracking.

In [14]:
# Import libraries
import pandas as pd
import numpy as np
import mlflow
import mlflow.sklearn
from mlflow.models import infer_signature
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.compose import TransformedTargetRegressor
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

## 2. Load Data
Load the processed training and evaluation datasets.

In [2]:
train_df = pd.read_csv(r"../data/processed/train.csv")
eval_df = pd.read_csv(r"../data/processed/eval.csv")

## 3. MLflow Setup
Enable MLflow autologging for scikit-learn to automatically track parameters, metrics, and models.

In [3]:
# Enable autologging for scikit-learn
mlflow.sklearn.autolog()

Configure the MLflow tracking URI and set the experiment name.

In [4]:
# Set the tracking server URI for logging
mlflow.set_tracking_uri(uri="http://127.0.0.1:8000")

# Select the experiment by name; MLflow creates it if it does not exist yet
mlflow.set_experiment("Experiment Tracking - House Price Prediction")

2025/12/04 08:21:33 INFO mlflow.tracking.fluent: Experiment with name 'Experiment Tracking - House Price Prediction' does not exist. Creating a new experiment.


<Experiment: artifact_location='mlflow-artifacts:/289806203540587791', creation_time=1764829293310, experiment_id='289806203540587791', last_update_time=1764829293310, lifecycle_stage='active', name='Experiment Tracking - House Price Prediction', tags={}>

## 4. Data Preparation
Separate the target variable ('price') from the features for both training and evaluation sets.

In [None]:
# ================================================
# Define target & features
# ================================================
target = "price"
X_train = train_df.drop(columns=[target])
y_train = train_df[target]

X_eval = eval_df.drop(columns=[target])
y_eval = eval_df[target]
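
The scaler and models in the following sections assume that the training and evaluation features share the same columns in the same order. A minimal sketch of that check, using a hypothetical helper `split_features_target` and small made-up frames (the column names `sqft` and `beds` are illustrative assumptions, not taken from the real CSV files):

```python
import pandas as pd


def split_features_target(df: pd.DataFrame, target: str):
    """Return (features, target) for a frame that contains the target column."""
    if target not in df.columns:
        raise KeyError(f"target column {target!r} not found")
    return df.drop(columns=[target]), df[target]


# Made-up stand-ins for train_df / eval_df (column names are illustrative only).
demo_train = pd.DataFrame(
    {"sqft": [800, 1200, 1500], "beds": [2, 3, 3], "price": [100, 150, 180]}
)
demo_eval = pd.DataFrame(
    {"sqft": [1000, 1400], "beds": [2, 3], "price": [120, 170]}
)

X_demo_train, y_demo_train = split_features_target(demo_train, "price")
X_demo_eval, y_demo_eval = split_features_target(demo_eval, "price")

# The scaler and the models assume identical feature columns in identical order.
same_columns = list(X_demo_train.columns) == list(X_demo_eval.columns)
print("Feature columns match:", same_columns)
```

Running the same comparison on `X_train.columns` and `X_eval.columns` would catch a mismatch before it reaches the scaler.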

## 5. Feature Scaling
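Fit a StandardScaler on the training features and apply the same transformation to the evaluation features. Each feature then has zero mean and unit variance on the training set, and no evaluation statistics leak into the fit.

In [12]:
# Standard scaler: fit on training data only, then reuse for evaluation data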
scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_eval_scaled  = scaler.transform(X_eval)
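
The same fit-on-train, transform-on-eval pattern can also be expressed with the `Pipeline` class imported in section 1, which keeps the scaler and the model together as one estimator. The sketch below is one possible arrangement, not the notebook's own method. It runs on synthetic data, and the names `pipe`, `X_demo_train`, `y_demo_train`, and `X_demo_eval` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the real notebook would pass X_train / X_eval instead.
rng = np.random.default_rng(0)
X_demo_train = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y_demo_train = X_demo_train @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=200)
X_demo_eval = rng.normal(loc=50.0, scale=10.0, size=(50, 3))

# The pipeline fits the scaler on the training data only, then reuses those
# statistics when predicting on evaluation data.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LinearRegression()),
])
pipe.fit(X_demo_train, y_demo_train)
demo_pred = pipe.predict(X_demo_eval)

# After fitting, the scaled training features have (approximately) zero mean
# and unit variance per column.
scaled_train = pipe.named_steps["scaler"].transform(X_demo_train)
print(scaled_train.mean(axis=0).round(6), scaled_train.std(axis=0).round(6))
```

Keeping both steps in one object also means the scaler's fitted statistics are stored together with the model when the pipeline is logged.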

## 6. Linear Regression Model
Train a standard Linear Regression model and evaluate its performance using MAE, RMSE, and R².

In [15]:
# --- Linear Regression ---
lr = LinearRegression()
lr.fit(X_train_scaled, y_train)
y_pred_lr = lr.predict(X_eval_scaled)

print("Linear Regression:")
print(" MAE:", mean_absolute_error(y_eval, y_pred_lr))
print(" RMSE:", np.sqrt(mean_squared_error(y_eval, y_pred_lr)))
print(" R²:", r2_score(y_eval, y_pred_lr))

2025/12/04 08:27:41 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '8bc7bdb8d2b342719705d75cc66600ce', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow


🏃 View run sincere-bird-551 at: http://127.0.0.1:8000/#/experiments/289806203540587791/runs/8bc7bdb8d2b342719705d75cc66600ce
🧪 View experiment at: http://127.0.0.1:8000/#/experiments/289806203540587791
Linear Regression:
 MAE: 80756.77957289686
 RMSE: 155974.1437511413
 R²: 0.8119958713807821
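
The three metrics are computed the same way for every model in this notebook, so they could be collected in one small helper. A minimal sketch, assuming a hypothetical function name `regression_metrics` and a tiny hand-checkable example instead of the house price data:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def regression_metrics(y_true, y_pred) -> dict:
    """Compute MAE, RMSE and R² for one set of predictions."""
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "R2": r2_score(y_true, y_pred),
    }


# Tiny hand-checkable example (not the house price data).
demo_metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
for name, value in demo_metrics.items():
    print(f" {name}: {value:.4f}")
```

In the example only the last prediction is off by 1, so MAE is 1/3, RMSE is the square root of 1/3, and R² is 1 - 1/2 = 0.5.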


## 7. Regularized Linear Models (Lasso & Ridge)
Train and evaluate Lasso (alpha=0.01) and Ridge (alpha=1.0) regression models, logging each one in its own named MLflow run.

In [17]:
models = {
    "Lasso": Lasso(alpha=0.01),
    "Ridge": Ridge(alpha=1.0)
}

for model_name, model in models.items():
    with mlflow.start_run(run_name=model_name):
        print(f"Training ... {model_name}")
        model.fit(X_train_scaled, y_train)
        
        predictions = model.predict(X_eval_scaled)

        print(" MAE:", mean_absolute_error(y_eval, predictions))
        print(" RMSE:", np.sqrt(mean_squared_error(y_eval, predictions)))
        print(" R²:", r2_score(y_eval, predictions))
        print("_" * 100)




Training ... Lasso


  model = cd_fast.enet_coordinate_descent(


 MAE: 80666.97249449498
 RMSE: 155906.27275352075
 R²: 0.8121594530007019
____________________________________________________________________________________________________
🏃 View run Lasso at: http://127.0.0.1:8000/#/experiments/289806203540587791/runs/6797e8aa035b4ad49c499964ebb3e2bf
🧪 View experiment at: http://127.0.0.1:8000/#/experiments/289806203540587791




Training ... Ridge




 MAE: 80757.22637655938
 RMSE: 155974.59683609768
 R²: 0.8119947791232567
____________________________________________________________________________________________________
🏃 View run Ridge at: http://127.0.0.1:8000/#/experiments/289806203540587791/runs/5c9b7398416d49f89cc68d6e19da0cd1
🧪 View experiment at: http://127.0.0.1:8000/#/experiments/289806203540587791
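
To compare the three models side by side, the printed metrics can be gathered into one table. The sketch below copies the values from the outputs above. The frame name `results` and the column `rmse_change_pct` are assumptions introduced for illustration.

```python
import pandas as pd

# Metrics copied from the printed outputs in this notebook.
results = pd.DataFrame(
    {
        "MAE": [80756.77957289686, 80666.97249449498, 80757.22637655938],
        "RMSE": [155974.1437511413, 155906.27275352075, 155974.59683609768],
        "R2": [0.8119958713807821, 0.8121594530007019, 0.8119947791232567],
    },
    index=["Linear Regression", "Lasso", "Ridge"],
)

# Relative RMSE change versus the unregularized baseline, in percent.
baseline_rmse = results.loc["Linear Regression", "RMSE"]
results["rmse_change_pct"] = (results["RMSE"] / baseline_rmse - 1.0) * 100.0

print(results.round(4))
print("Lowest RMSE:", results["RMSE"].idxmin())
```

On these numbers the three models stay within roughly 0.05% of each other in RMSE, which suggests (but does not prove) that regularization at these strengths has little effect on this dataset.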
