In [2]:
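import pandas as pd

# Load the data and fill missing Age values with the column mean.
dataset = pd.read_csv("house_data.csv")
dataset["Age"] = dataset["Age"].fillna(dataset["Age"].mean())
# Features: every column except the last. Target: Price.
# This only excludes the target from x if Price is the last column.
x = dataset.iloc[:, :-1]
y = dataset["Price"]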

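Missing values in Age are filled with the column mean. All columns except the last are used as features, and Price is the target.

This setup is used to compare three regression models: linear regression, polynomial regression, and a decision tree.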
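
As a rough illustration of the imputation step, the sketch below reproduces mean filling in plain Python. The `ages` values and the `fill_with_mean` helper are hypothetical and not taken from house_data.csv. Like the pandas default, the mean is computed over the non-missing entries only. The notebook computes this mean over the full dataset before splitting; a stricter variant would compute it from the training rows only.

```python
import math
from statistics import fmean

def fill_with_mean(values):
    """Replace NaN entries with the mean of the non-missing entries."""
    observed = [v for v in values if not math.isnan(v)]
    mean = fmean(observed)
    return [mean if math.isnan(v) else v for v in values]

# Hypothetical Age column with two missing entries.
ages = [10.0, float("nan"), 20.0, 30.0, float("nan")]
filled = fill_with_mean(ages)
print(filled)
```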


In [4]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

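# Hold out 20% of the rows for testing.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler()
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)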
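
The scaling cell learns its statistics from the training split and reuses them on the test split. A minimal plain-Python sketch of that idea follows. The helper names and the numbers are hypothetical, and the population standard deviation is used because that is what StandardScaler applies.

```python
from statistics import fmean, pstdev

def fit_standardizer(train_column):
    """Return the mean and population standard deviation of the training data."""
    return fmean(train_column), pstdev(train_column)

def standardize(column, mean, std):
    """Apply (value - mean) / std using statistics learned elsewhere."""
    return [(v - mean) / std for v in column]

# Hypothetical single feature, split into train and test values.
train = [10.0, 20.0, 30.0]
test = [20.0, 40.0]

mean, std = fit_standardizer(train)          # learned from train only
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)   # reuses the train statistics

print(train_scaled)
print(test_scaled)
```

The scaled training values have mean 0 and standard deviation 1. The test values generally do not, because they are transformed with the training statistics.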

In [17]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

model_lr = LinearRegression()
model_lr.fit(x_train_scaled, y_train)

# Score on the held-out test split.
y_pred = model_lr.predict(x_test_scaled)
lr_r2 = r2_score(y_test, y_pred)
print("Multiple Linear Regression R2: ", lr_r2)

Multiple Linear Regression R2:  1.0
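
For reference, R2 is defined as 1 - SS_res / SS_tot, so a value of exactly 1.0 means every test prediction matches its target. The sketch below computes it by hand on hypothetical prices. The `r2` helper is illustrative only and, unlike `r2_score`, does not handle a constant target.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (SS_tot > 0).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [100.0, 200.0, 300.0]                 # hypothetical prices
perfect = r2(y_true, [100.0, 200.0, 300.0])    # exact predictions
baseline = r2(y_true, [200.0, 200.0, 200.0])   # always predict the mean
print(perfect, baseline)
```

Predicting the mean for every row gives 0.0, and exact predictions give 1.0.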


In [18]:
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2, include_bias=False)
x_train_poly = poly.fit_transform(x_train_scaled)
x_test_poly = poly.transform(x_test_scaled)

model_poly = LinearRegression()
model_poly.fit(x_train_poly, y_train)

y_pred_poly = model_poly.predict(x_test_poly)
poly_r2 = r2_score(y_test, y_pred_poly)
print("Polynomial Regression R2: ", poly_r2)

Polynomial Regression R2:  0.8919239331200692
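
A degree-2 expansion without the bias term turns n features into n(n+3)/2 columns: the originals, their squares, and all pairwise products. The sketch below lists those terms for three hypothetical feature names (the real columns come from house_data.csv). The growth in column count is one possible reason the polynomial model does not beat the linear one here.

```python
from itertools import combinations_with_replacement

def polynomial_terms(feature_names, degree=2):
    """List the monomials of degree 1..degree, without the bias term."""
    terms = []
    for d in range(1, degree + 1):
        for combo in combinations_with_replacement(feature_names, d):
            terms.append(" * ".join(combo))
    return terms

# Hypothetical feature names, for illustration only.
terms = polynomial_terms(["size", "rooms", "age"])
print(len(terms))
print(terms)
```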


In [19]:
from sklearn.tree import DecisionTreeRegressor

# Trees split on feature thresholds, so the unscaled features are used directly.
model_tree = DecisionTreeRegressor(random_state=42)
model_tree.fit(x_train, y_train)

y_tree = model_tree.predict(x_test)
tree_r2 = r2_score(y_test, y_tree)
print("Decision Tree Regressor R2: ", tree_r2)

Decision Tree Regressor R2:  0.7959183673469388


In [20]:
comparison = pd.DataFrame({
    "Model": ["Linear Regression", "Polynomial Regression", "Decision Tree"],
    "R2 Score": [lr_r2, poly_r2, tree_r2]
})

comparison


                   Model  R2 Score
0      Linear Regression  1.000000
1  Polynomial Regression  0.891924
2          Decision Tree  0.795918


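Linear regression scores highest on the test split (R2 = 1.0). A perfect score is unusual on real data, so it is worth confirming that the feature matrix does not contain the target or a column derived from it.

Polynomial regression (degree 2) scores lower (R2 ≈ 0.89). The added squared and interaction terms do not improve the fit on this test split.

The decision tree scores lowest (R2 ≈ 0.80). Trees can model complex patterns but risk overfitting on small datasets.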


# Bias–Variance conclusion:

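This comparison is consistent with the bias–variance tradeoff.

Simpler models like linear regression have higher bias but lower variance, while more flexible models like decision trees have lower bias but higher variance.

On this test split the simplest model generalizes best, and the more flexible models score lower.

Model selection should weigh fit on the training data against generalization to unseen data.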
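
The tradeoff can be sketched on toy data, assuming a true relation y = 2x with fixed noise on the training targets. A least-squares line stands in for the simple model, and a 1-nearest-neighbour lookup stands in for a highly flexible model that memorizes its training data. Neither the data nor these stand-in models come from the notebook.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def nearest_neighbour(xs, ys, query):
    """Return the target of the closest training point (memorizes the data)."""
    index = min(range(len(xs)), key=lambda i: abs(xs[i] - query))
    return ys[index]

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy data: y = 2x, with fixed +/-1 noise on the training targets only.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [1.0, 1.0, 5.0, 5.0, 9.0, 9.0]
x_test = [0.4, 1.4, 2.4, 3.4, 4.4]
y_test = [2 * x for x in x_test]

slope, intercept = fit_line(x_train, y_train)
line_train = mse(y_train, [slope * x + intercept for x in x_train])
line_test = mse(y_test, [slope * x + intercept for x in x_test])

nn_train = mse(y_train, [nearest_neighbour(x_train, y_train, x) for x in x_train])
nn_test = mse(y_test, [nearest_neighbour(x_train, y_train, x) for x in x_test])

print("line:", line_train, line_test)
print("1-NN:", nn_train, nn_test)
```

The memorizing model has zero training error but a larger test error than the line, which is the pattern that higher variance produces.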
