## 1. Import Libraries & Load Dataset
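We use the California Housing dataset, loaded as a pandas DataFrame with `fetch_california_housing` from `sklearn.datasets`. It has 20,640 rows, eight numeric features, and the target `MedHouseVal` (median house value, in units of $100,000).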


In [1]:
import numpy as np
from sklearn.datasets import fetch_california_housing

# Load California housing dataset
data = fetch_california_housing(as_frame=True)
df = data.frame

# Preview dataset
df.head()


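|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | MedHouseVal |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|-------------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |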


## 2. Split Data into Train/Test Sets
We separate the features (`X`) from the target (`y`), then hold out 20% of the rows as a test set. Fixing `random_state=42` makes the split reproducible.


In [2]:
from sklearn.model_selection import train_test_split

# Features and target
X = df.drop(columns=['MedHouseVal'])
y = df['MedHouseVal']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print("Train shape:", X_train.shape)
print("Test shape:", X_test.shape)


Train shape: (16512, 8)
Test shape: (4128, 8)
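
To make the split concrete, the sketch below reproduces the idea behind a shuffled hold-out split with plain NumPy: permute the row indices, then slice off the test portion. The helper name `simple_split` is hypothetical, and rounding the test size up is an assumption about how scikit-learn sizes the test set. The shuffled order will not match what `train_test_split` produces for the same seed; only the set sizes are comparable.

```python
import math
import numpy as np

def simple_split(n_samples, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) for a shuffled hold-out split (hypothetical helper)."""
    n_test = math.ceil(n_samples * test_size)  # assumed rounding rule
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)      # random order of row indices
    return shuffled[n_test:], shuffled[:n_test]

# 20640 is the number of rows in the dataset (16512 + 4128)
train_idx, test_idx = simple_split(20640)
print("Train rows:", len(train_idx))
print("Test rows:", len(test_idx))
```

Because every row index lands in exactly one of the two arrays, no test row can leak into training.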


## 3. Fit Linear Regression Model
We fit an ordinary least squares `LinearRegression` model on the training data. The coefficients are printed in the same order as the columns of `X`.


In [3]:
from sklearn.linear_model import LinearRegression

# Initialize model
model = LinearRegression()

# Fit model
model.fit(X_train, y_train)

print("Model coefficients:", model.coef_)
print("Model intercept:", model.intercept_)


Model coefficients: [ 4.48674910e-01  9.72425752e-03 -1.23323343e-01  7.83144907e-01
 -2.02962058e-06 -3.52631849e-03 -4.19792487e-01 -4.33708065e-01]
Model intercept: -37.023277706064064
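
As a rough illustration of what an ordinary least squares fit computes, the sketch below solves the same kind of problem with `np.linalg.lstsq` on small synthetic data. The data, the variable names (`X_demo`, `y_demo`, `true_coef`), and the ones-column approach to the intercept are assumptions made for this example; it is not scikit-learn's internal implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): y = 3*x1 - 2*x2 + 5 + small noise
X_demo = rng.normal(size=(200, 2))
true_coef = np.array([3.0, -2.0])
true_intercept = 5.0
y_demo = X_demo @ true_coef + true_intercept + rng.normal(scale=0.01, size=200)

# Append a column of ones so the intercept is estimated together with the slopes
A = np.column_stack([X_demo, np.ones(len(X_demo))])
solution, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

coef_demo, intercept_demo = solution[:-1], solution[-1]
print("Recovered coefficients:", coef_demo)
print("Recovered intercept:", intercept_demo)
```

Each coefficient is the predicted change in the target for a one-unit increase in that feature with the others held fixed. Since the housing features are on very different scales (compare `Population` with `AveBedrms`), the raw coefficient magnitudes above are not directly comparable as measures of importance.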


## 4. Evaluate Model Performance
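We predict on the test set and evaluate using **RMSE** and **R² score**. RMSE is expressed in the units of the target, and R² is the fraction of the test target's variance that the model explains.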


In [4]:
from sklearn.metrics import mean_squared_error, r2_score

# Predictions
y_pred = model.predict(X_test)

# Evaluation
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

print("Root Mean Squared Error (RMSE):", rmse)
print("R² Score:", r2)


Root Mean Squared Error (RMSE): 0.7455813830127761
R² Score: 0.5757877060324511
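
The two metrics can also be computed by hand, which makes their definitions explicit. The sketch below uses a tiny made-up example; the function names `rmse_manual` and `r2_manual` are hypothetical. It also converts the RMSE printed above into dollars, relying on `MedHouseVal` being in units of $100,000.

```python
import numpy as np

def rmse_manual(y_true, y_hat):
    """Square root of the mean squared residual (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_hat) ** 2)))

def r2_manual(y_true, y_hat):
    """1 - SS_res / SS_tot, relative to always predicting the mean (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y_true - y_hat) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Made-up values for illustration only
y_true_demo = [3.0, 5.0, 7.0, 9.0]
y_hat_demo = [2.0, 5.0, 8.0, 9.0]
print("RMSE:", rmse_manual(y_true_demo, y_hat_demo))
print("R²:", r2_manual(y_true_demo, y_hat_demo))

# RMSE reported above, converted from units of $100,000 to dollars
rmse_reported = 0.7455813830127761
rmse_dollars = rmse_reported * 100_000
print("Reported RMSE in dollars:", round(rmse_dollars))
```

Read this way, the reported RMSE corresponds to a typical error of roughly $75,000 per district, and the R² of about 0.58 means the linear model explains a little over half of the variance in test-set house values.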
