# Tasks

1. Train a linear regression model on a dataset (e.g., California Housing) and compute MAE, MSE, RMSE, and R².

2. Compare results of different models (e.g., LinearRegression vs. DecisionTreeRegressor) using the same metrics.

3. Normalize your dataset and check whether the metrics improve.

In [None]:
# Task 1. Train a linear regression model on the California Housing dataset.
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor  
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

In [15]:
data = fetch_california_housing()

X = data.data
y = data.target

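# Hold out 20% of the rows as a test set; a fixed random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training set only, then apply the same transform to the test set.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)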

model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.2f}")
print(f"MSE: {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R²: {r2:.2f}") 


MAE: 0.53
MSE: 0.56
RMSE: 0.75
R²: 0.58
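The four metrics above follow directly from their definitions. The sketch below recomputes them with plain NumPy using a hypothetical helper, `regression_metrics`, and small made-up arrays (not the California Housing data). Applied to `y_test` and `y_pred` from the cell above, it should agree with scikit-learn's functions up to floating-point rounding.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Hypothetical helper: MAE, MSE, RMSE and R² computed from their definitions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mae = np.mean(np.abs(residuals))                # mean absolute error
    mse = np.mean(residuals ** 2)                   # mean squared error
    rmse = np.sqrt(mse)                             # same units as the target
    ss_res = np.sum(residuals ** 2)                 # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                      # share of variance explained
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Toy values for illustration only.
y_true = [3.0, 2.0, 4.0, 5.0]
y_pred = [2.5, 2.0, 4.5, 6.0]
metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For these toy arrays it prints MAE 0.5000, MSE 0.3750, RMSE 0.6124 and R2 0.7000. RMSE is often the easier number to report because it is in the target's units (here, units of $100,000 of median house value).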


In [16]:
# Task 2. Compare results of different models (e.g., LinearRegression vs. DecisionTreeRegressor) using the same metrics.

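# Reload with the same split and random_state as Task 1 so both models are scored on identical test rows.
data = fetch_california_housing()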

X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

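# Default settings: no max_depth limit, so the tree keeps splitting until its leaves are pure or too small to split.
model = DecisionTreeRegressor(random_state=42)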
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print(f"MAE: {mae:.2f}")
print(f"MSE: {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R²: {r2:.2f}")

MAE: 0.45
MSE: 0.49
RMSE: 0.70
R²: 0.62
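Tasks 1 and 2 repeat the same load, split, scale and evaluate code. A minimal sketch of one way to avoid that is to loop over a dictionary of models with a hypothetical `evaluate` helper. To keep the sketch runnable offline it uses scikit-learn's synthetic `make_friedman1` data as an assumed stand-in; swapping in `fetch_california_housing(return_X_y=True)` should reproduce the comparison above.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

def evaluate(model, X_train, X_test, y_train, y_test):
    """Hypothetical helper: fit `model` and return its test-set metrics."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    return {
        "MAE": mean_absolute_error(y_test, y_pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": r2_score(y_test, y_pred),
    }

# Synthetic stand-in so the sketch runs offline.
X, y = make_friedman1(n_samples=2000, noise=1.0, random_state=42)

# One split and one scaler shared by every model, so the comparison is fair.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

models = {
    "LinearRegression": LinearRegression(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=42),
}
results = {name: evaluate(m, X_train, X_test, y_train, y_test) for name, m in models.items()}

for name, model_metrics in results.items():
    summary = ", ".join(f"{k}: {v:.2f}" for k, v in model_metrics.items())
    print(f"{name:<22} {summary}")
```

Keeping the split, scaling and metric code in one place means adding a third model is a one-line change to `models`.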


**Error metrics (MAE, MSE, RMSE):**
- The Decision Tree has lower error than Linear Regression on all three metrics (MAE 0.45 vs. 0.53, MSE 0.49 vs. 0.56, RMSE 0.70 vs. 0.75).

**R² score:**
- Linear Regression: 0.58, so it explains about 58% of the variance in the test targets.
- Decision Tree: 0.62, so it explains about 62%.

This is a modest improvement of about 4 percentage points, not a large jump.
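Both R² values are computed on the same test set, so they share a denominator. Writing the definition in terms of MSE makes this explicit, where $n$ is the number of test rows, $\hat y_i$ the predictions and $\bar y$ the mean test target:

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat y_i)^2}{\sum_{i=1}^{n} (y_i - \bar y)^2}
    = 1 - \frac{\mathrm{MSE}}{\operatorname{Var}(y)},
\qquad
\operatorname{Var}(y) = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar y)^2
$$

Because $\operatorname{Var}(y)$ is identical for both models, the gap in R² is the gap in MSE divided by that variance, so ranking the models by R² or by MSE gives the same order.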

**Model nature:**
- Linear Regression assumes the target is a linear combination of the features, so it underfits when the true relationship is non-linear.
- A Decision Tree can capture non-linear patterns and feature interactions, which likely explains its better performance here.
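Task 3 asks whether normalizing the features improves the metrics, but both cells above already standardize before fitting, so there is no unscaled baseline. The sketch below fits each model on raw and on standardized features and compares the results. For ordinary least squares with an intercept, standardizing each feature is an affine change of variables that the coefficients absorb, so predictions should match up to floating-point error. A decision tree splits on per-feature thresholds, and a strictly increasing transform preserves the ordering of values, so it should find an equivalent partition (apart from rare floating-point ties near a threshold). The expectation is therefore that scaling leaves these two models' metrics essentially unchanged. As before, `make_friedman1` is an assumed offline stand-in for the California Housing data.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in (offline); swap in fetch_california_housing(return_X_y=True).
X, y = make_friedman1(n_samples=2000, noise=1.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardized copies: the scaler is fitted on the training split only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

model_factories = [
    ("LinearRegression", LinearRegression),
    ("DecisionTreeRegressor", lambda: DecisionTreeRegressor(random_state=42)),
]

preds = {}
for name, make_model in model_factories:
    raw = make_model().fit(X_train, y_train).predict(X_test)         # unscaled features
    scaled = make_model().fit(X_train_s, y_train).predict(X_test_s)  # standardized features
    preds[name] = (raw, scaled)
    print(f"{name:<22} R² raw: {r2_score(y_test, raw):.4f}  "
          f"R² scaled: {r2_score(y_test, scaled):.4f}  "
          f"max |Δpred|: {np.max(np.abs(raw - scaled)):.2e}")
```

Scaling matters more for scale-sensitive models, such as k-nearest neighbours, regularized linear models like Ridge or Lasso, and models trained by gradient descent.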