## Load and Convert to DataFrame
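The `fetch_california_housing` dataset contains information about housing in California, one row per census block group, including features such as population, median income, and house age. The target is the median house value, in units of 100,000 US dollars. The loader returns a scikit-learn `Bunch` (a dictionary-like object with `data`, `target`, and `feature_names` fields), so we will convert it into a pandas DataFrame.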




In [None]:
# Load the California Housing dataset

from sklearn.datasets import fetch_california_housing
import pandas as pd

dataset = fetch_california_housing()

In [2]:
# Convert to a DataFrame

df = pd.DataFrame(dataset.data, columns=dataset.feature_names)
df['Target'] = dataset.target

df.head()

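|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | Target |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|--------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.02381 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.97188 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.80226 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |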


## Handle Missing Values
The dataset typically does not contain missing values, but we will verify this and handle them if necessary. If missing values exist, we can use mean or median imputation for the numerical features.
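The check and the imputation step described above can be sketched with pandas. This is a minimal example on a small hypothetical DataFrame (`toy` is made up for illustration and is not part of the housing data); on the real data the same `df.isnull().sum()` call applies.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a few missing entries (not the housing dataset)
toy = pd.DataFrame({
    "MedInc": [8.3, np.nan, 7.3, 5.6],
    "HouseAge": [41.0, 21.0, np.nan, 52.0],
})

# Count missing values per column
missing_counts = toy.isnull().sum()
print(missing_counts)

# Median imputation: fill each column's NaNs with that column's median
imputed = toy.fillna(toy.median())
print(imputed)
```

Median imputation is usually preferred over the mean when a feature is skewed, because a few extreme values pull the mean but not the median.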

In [4]:
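# Standardisation
# Note: df still contains the 'Target' column, so X_scale has 9 columns and the
# last one is the scaled target. Using X_scale as model input leaks the target.

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_scale = scaler.fit_transform(df)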

The California Housing data contains no missing values (`df.isnull().sum()` returns 0 for every column), so no imputation is needed and we can proceed.

## Feature Scaling
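Feature scaling is important because:

- The features are on very different scales: for example, median income ranges from about 0.5 to 15 (in units of 10,000 dollars), while population ranges from a few people to tens of thousands.
- Models trained by gradient descent (e.g., linear models fitted with SGD, neural networks) and distance- or kernel-based models (e.g., k-NN, SVMs) perform better when features are standardized. Tree-based models (Decision Trees, Random Forests, Gradient Boosting) are not affected by this kind of scaling.

We will apply Standardization (Z-score normalization).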


Standardization formula: $X' = \frac{X - \mu}{\sigma}$, where $\mu$ and $\sigma$ are the mean and standard deviation of each feature.
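As a quick check of the formula, the sketch below applies $X' = (X - \mu)/\sigma$ by hand to a small made-up matrix and compares it with `StandardScaler`. `StandardScaler` uses the population standard deviation (`ddof=0`), which is also NumPy's default for `std`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: two features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

# Apply X' = (X - mu) / sigma column by column
mu = X.mean(axis=0)
sigma = X.std(axis=0)  # population std (ddof=0), as StandardScaler uses
X_manual = (X - mu) / sigma

X_sklearn = StandardScaler().fit_transform(X)

print(X_manual)
print(np.allclose(X_manual, X_sklearn))  # True
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates purely because of its units.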

In [5]:
X_scale

array([[ 2.34476576,  0.98214266,  0.62855945, ...,  1.05254828,
        -1.32783522,  2.12963148],
       [ 2.33223796, -0.60701891,  0.32704136, ...,  1.04318455,
        -1.32284391,  1.31415614],
       [ 1.7826994 ,  1.85618152,  1.15562047, ...,  1.03850269,
        -1.33282653,  1.25869341],
       ...,
       [-1.14259331, -0.92485123, -0.09031802, ...,  1.77823747,
        -0.8237132 , -0.99274649],
       [-1.05458292, -0.84539315, -0.04021111, ...,  1.77823747,
        -0.87362627, -1.05860847],
       [-0.78012947, -1.00430931, -0.07044252, ...,  1.75014627,
        -0.83369581, -1.01787803]])

In [71]:
# Linear Regression

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

X_train, X_test, y_train, y_test = train_test_split(
    X_scale, dataset.target, test_size=0.33, random_state=42)

lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)

## Linear Regression
### How It Works:
Linear Regression finds the best-fitting line (a hyperplane, when there are several features) to predict the target variable $Y$ from the features $X$.

The equation for Linear Regression is $Y = w_1X_1 + w_2X_2 + ... + w_nX_n + b$,
where $w_i$ are the weights (coefficients) and $b$ is the intercept.

It minimizes the Mean Squared Error (MSE) to find the best line.
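To illustrate what minimizing the MSE means, this sketch (on hypothetical synthetic data with known weights) solves the least-squares problem directly with `np.linalg.lstsq`, treating the intercept $b$ as an extra weight on a column of ones, and compares the result with scikit-learn's `LinearRegression`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Hypothetical data: y = 2*x1 - 3*x2 + 5 + small noise
X = rng.normal(size=(200, 2))
y = 2 * X[:, 0] - 3 * X[:, 1] + 5 + rng.normal(scale=0.1, size=200)

# Least squares: append a column of ones so the intercept b is learned as a weight
X_design = np.column_stack([X, np.ones(len(X))])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
w_manual, b_manual = coef[:-1], coef[-1]

lr_check = LinearRegression().fit(X, y)

print(w_manual, b_manual)                  # close to [2, -3] and 5
print(lr_check.coef_, lr_check.intercept_)  # same solution
```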

In [73]:
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

### Mean Squared Error (MSE)

The Mean Squared Error (MSE) is given by the formula:

$$
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
$$

 
Measures the average squared difference between actual and predicted values.

Penalizes large errors more than small ones because of squaring.

Lower MSE means better model performance.
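A small worked example of the MSE formula with made-up values. The single error of 2 contributes 4.0 of the 4.5 total squared error, which shows how squaring emphasizes large errors.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = np.array([3.0, 2.0, 4.0, 5.0])
y_pred = np.array([2.5, 2.0, 4.5, 7.0])

# MSE = (1/n) * sum((y_i - y_hat_i)^2) = (0.25 + 0 + 0.25 + 4) / 4
mse_manual = np.mean((y_true - y_pred) ** 2)
print(mse_manual)                          # 1.125
print(mean_squared_error(y_true, y_pred))  # 1.125
```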

### Mean Absolute Error (MAE)

$$
MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|
$$
    
Measures the average absolute difference between actual and predicted values.

Unlike MSE, it weights each error in proportion to its size, so large errors are not penalized disproportionately. This makes MAE less sensitive to outliers.

Lower MAE means better model accuracy.
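The difference between MAE and MSE is easiest to see with two hypothetical prediction sets that have the same total absolute error: MAE treats them identically, while MSE is four times larger when the error is concentrated in a single point.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.zeros(4)

# Two hypothetical prediction sets with the same total absolute error (4.0)
spread_out = np.array([1.0, 1.0, 1.0, 1.0])    # four errors of 1
concentrated = np.array([4.0, 0.0, 0.0, 0.0])  # one error of 4

for name, y_pred in [("spread_out", spread_out), ("concentrated", concentrated)]:
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    print(name, "MAE:", mae, "MSE:", mse)
# spread_out MAE: 1.0 MSE: 1.0
# concentrated MAE: 1.0 MSE: 4.0
```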

### R-squared Score ($R^2$)

$$
R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}
$$

The **R-squared (R²)** score represents how well the model explains the variance in the data.

- $R^2 = 1$: The model perfectly explains the data.
- $R^2 = 0$: The model explains none of the variance (equivalent to predicting the mean).
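- $R^2 < 0$: The model is worse than simply predicting the mean.

Unlike MSE and MAE, $R^2$ is not symmetric in its arguments: scikit-learn's `r2_score(y_true, y_pred)` expects the true values first. Calls that pass the predictions first, such as `r2_score(y_pred_lr, y_test)` below, therefore compute a slightly different number.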
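The three $R^2$ cases can be checked with `r2_score` on a few made-up values. The last line also shows that swapping the arguments changes the result (0.7 versus -0.2 here), which is why the true values must come first.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])

perfect = y_true.copy()
mean_only = np.full_like(y_true, y_true.mean())  # always predict the mean (2.5)
worse = np.array([4.0, 3.0, 2.0, 1.0])           # reversed order

print(r2_score(y_true, perfect))    # 1.0
print(r2_score(y_true, mean_only))  # 0.0
print(r2_score(y_true, worse))      # -3.0

# R^2 is not symmetric: scikit-learn expects r2_score(y_true, y_pred)
y_pred = np.array([1.5, 2.0, 2.5, 3.0])
print(r2_score(y_true, y_pred), r2_score(y_pred, y_true))
```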

In [8]:
mean_squared_error(y_test, y_pred_lr)

1.2446582939033036e-30

In [9]:
mean_absolute_error(y_test, y_pred_lr)

7.059825419507913e-16

The MSE of about $1.2 \times 10^{-30}$ is essentially zero: it is at the level of floating-point rounding error, so the predictions reproduce the actual values almost exactly. On held-out test data, an error this close to zero is a red flag rather than a sign of excellent performance, since real housing prices cannot be predicted this precisely from these features.

In [10]:
r2_score(y_pred_lr, y_test)

1.0

The MAE of about $7.1 \times 10^{-16}$ is also essentially zero, meaning the average absolute difference between the predicted and actual values is negligible. Like the MSE, it indicates that the predictions match the targets almost exactly.

In [11]:
from sklearn.tree import DecisionTreeRegressor


dt = DecisionTreeRegressor(random_state=42)
dt.fit(X_train, y_train)  
y_pred_dt = dt.predict(X_test)

An $R^2$ score of 1.0 means that the model explains 100% of the variance in the target variable on the test set.

Taken together, the Linear Regression results (MSE and MAE at floating-point noise level, $R^2 = 1$) are too good to be real. The cause is data leakage: `X_scale` was computed by fitting the `StandardScaler` on `df`, and `df` still contains the `Target` column, so the scaled target is one of the input features. Because the target is an exact linear function of that column, Linear Regression can reproduce it perfectly. The same leaked column is available to every model below, so their scores are inflated as well.
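The effect of keeping the target in the scaled feature matrix can be reproduced on synthetic data. This is a minimal sketch with hypothetical data (`held_out_r2` is a helper defined here for illustration): with clean features, Linear Regression reaches a realistic $R^2$, but once the target is stacked into the matrix before scaling, as happens when `df` still contains `Target`, the held-out $R^2$ becomes essentially 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical data: the target depends on the features plus genuine noise
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.0, size=500)

def held_out_r2(features, target):
    """Fit LinearRegression on a train split and return R^2 on the test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, target, test_size=0.33, random_state=42)
    model = LinearRegression().fit(X_tr, y_tr)
    return r2_score(y_te, model.predict(X_te))

# Correct: scale only the features
r2_clean = held_out_r2(StandardScaler().fit_transform(X), y)

# Leaky: scale features and target together, as when df still contains 'Target'
leaky = StandardScaler().fit_transform(np.column_stack([X, y]))
r2_leaky = held_out_r2(leaky, y)

print(f"R^2 without leakage: {r2_clean:.3f}")
print(f"R^2 with the target among the features: {r2_leaky:.6f}")
```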

In [12]:
mean_squared_error(y_test, y_pred_dt)

1.8403552701115834e-06

## Decision Tree Regressor
### How It Works:
A Decision Tree splits data based on feature values, creating a tree-like structure.

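At each node, it chooses the split that minimizes the variance (mean squared error) of the target within the resulting subsets.

The prediction for a new sample is the average target value of the training samples in the leaf node it falls into.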

Decision Trees capture non-linear relationships in data.
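The split rule described above can be sketched for a single feature. `best_split` is a simplified, hypothetical helper that tries the midpoints between sorted values and keeps the one with the lowest total squared error in the two children; scikit-learn's `DecisionTreeRegressor` with its default `squared_error` criterion optimizes the same objective, applied recursively across all features.

```python
import numpy as np

def best_split(x, y):
    """Return the threshold on feature x that minimizes the total squared
    error (n_child * variance, summed over both children) of y."""
    best_threshold, best_cost = None, np.inf
    values = np.unique(x)
    # Candidate thresholds: midpoints between consecutive distinct values
    for threshold in (values[:-1] + values[1:]) / 2:
        left, right = y[x <= threshold], y[x > threshold]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost

# Hypothetical data with a jump in y between x = 3 and x = 4
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.0, 1.2, 0.9, 5.0, 5.1, 4.9])

threshold, cost = best_split(x, y)
print(threshold)  # 3.5
# Each leaf predicts the mean of its training targets
print(y[x <= threshold].mean(), y[x > threshold].mean())
```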


In [13]:
mean_absolute_error(y_test, y_pred_dt)

0.0003097489724045361

The Decision Tree's MSE of about $1.8 \times 10^{-6}$ is very small, indicating that the predictions are extremely close to the actual values. Unlike Linear Regression's, however, it is clearly above zero, so some small discrepancies between the predicted and actual values remain.

In [14]:
r2_score(y_pred_dt, y_test)

0.9999986189132986

The MAE of about 0.0003 means that, on average, the predictions deviate from the actual values by roughly 0.0003, or about 31 dollars given that the target is in units of 100,000 dollars.

In [15]:
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)  
y_pred_rf = rf.predict(X_test)

An $R^2$ score of 0.9999986 means the model explains almost 100% of the variance in the test data. The Decision Tree Regressor reproduces the target almost exactly, although slightly less precisely than Linear Regression's $R^2$ of 1.0.

## Random Forest Regressor
### How It Works:
It is an ensemble of Decision Trees, where multiple trees are trained on different data samples (bagging).

Predictions are made by averaging outputs from all trees.

It typically reduces overfitting and improves accuracy compared to a single Decision Tree, because averaging many trees lowers the variance of the predictions.
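Bagging itself can be sketched by hand: train several `DecisionTreeRegressor`s on bootstrap samples and average their outputs. This uses made-up data and is a simplification; scikit-learn's `RandomForestRegressor` additionally lets each split consider a random subset of features (the `max_features` parameter).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical noisy non-linear data
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

n_trees = 25
trees = []
for i in range(n_trees):
    # Bootstrap sample: draw n row indices with replacement
    idx = rng.integers(0, len(X), size=len(X))
    trees.append(DecisionTreeRegressor(random_state=i).fit(X[idx], y[idx]))

X_new = np.array([[-1.5], [0.0], [1.5]])
# Bagged prediction = average of the individual tree predictions
all_preds = np.stack([t.predict(X_new) for t in trees])
bagged = all_preds.mean(axis=0)
print(bagged)  # should be close to sin(X_new), roughly [-1, 0, 1]
```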


In [16]:
mean_squared_error(y_test, y_pred_rf)

1.1978431515986009e-06

In [17]:
mean_absolute_error(y_test, y_pred_rf)

0.0002538481062841674

The Random Forest's MSE of about $1.2 \times 10^{-6}$ is very low, slightly below the Decision Tree's.

In [18]:
r2_score(y_pred_rf, y_test)

0.9999991010612432

The MAE of about 0.00025 is also very small, showing that the average difference between the predicted and actual values is minimal.


An $R^2$ score of 0.9999991 indicates that the model explains almost 100% of the variance in the test data; the Random Forest Regressor fits slightly better than the single Decision Tree (0.9999986).

## Gradient Boosting Regressor
### How It Works:

Uses Boosting, meaning it builds trees sequentially, with each new tree correcting errors from the previous ones.
    
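Each new tree is fit to the negative gradient of the loss with respect to the current predictions (for squared-error loss, simply the residuals), so boosting performs gradient descent in function space.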
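The notebook uses `y_pred_gb` below, but the cell that trains the Gradient Boosting model is not shown. A plausible version, assuming default hyperparameters (an assumption, so the metrics below may not reproduce exactly), would be `gb = GradientBoostingRegressor(random_state=42)`, then `gb.fit(X_train, y_train)` and `y_pred_gb = gb.predict(X_test)`. The sketch below, on hypothetical synthetic data, implements the boosting loop for squared-error loss by hand and compares it with scikit-learn's `GradientBoostingRegressor`; the two are close but not identical, since scikit-learn's default split criterion is `friedman_mse`.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical data
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate, n_rounds = 0.1, 100

# Start from a constant prediction (the mean minimizes squared error)
pred = np.full_like(y, y.mean())
for _ in range(n_rounds):
    # For squared-error loss, the negative gradient is the residual y - pred
    residuals = y - pred
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    # Take a small step in the direction suggested by the new tree
    pred += learning_rate * tree.predict(X)

train_mse_manual = np.mean((y - pred) ** 2)

# scikit-learn defaults: squared error, learning_rate=0.1, n_estimators=100, max_depth=3
gb_check = GradientBoostingRegressor(random_state=42).fit(X, y)
train_mse_sklearn = np.mean((y - gb_check.predict(X)) ** 2)

print(train_mse_manual, train_mse_sklearn)
```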

In [20]:
mean_squared_error(y_test, y_pred_gb)

7.276096209518953e-05

In [21]:
mean_absolute_error(y_test, y_pred_gb)

0.0061615204489604

The MSE of about $7.3 \times 10^{-5}$ is small but larger than that of the Linear Regression, Decision Tree, and Random Forest models, indicating that the Gradient Boosting model leaves more residual error in its predictions.

In [22]:
r2_score(y_pred_gb, y_test)

0.9999453992473412

The MAE of about 0.006 means that, on average, the predictions deviate from the actual values by around 0.006 (about 600 dollars). While still small, this is about 20 to 25 times the MAE of the Decision Tree and Random Forest models.

In [23]:
from sklearn.svm import SVR
svr = SVR(kernel='rbf')
svr.fit(X_train, y_train)  # SVR is sensitive to feature scale, so it uses the standardized features
y_pred_svr = svr.predict(X_test)

The $R^2$ of about 0.99995 is very close to 1, meaning the Gradient Boosting model explains nearly 100% of the variance in the test data.

## Support Vector Regressor (SVR)
### How It Works:

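SVR fits a function (a hyperplane, possibly in a kernel-induced feature space) that is as flat as possible while keeping the training points within a margin of tolerance $\epsilon$; errors smaller than $\epsilon$ are ignored, and points outside the margin are penalized linearly.

It uses the kernel trick (e.g., the RBF kernel) to model non-linear relationships.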
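The $\epsilon$-insensitive idea can be written as a loss function, and an RBF-kernel SVR can be wrapped in a pipeline so that scaling happens automatically. Both parts use made-up data; `epsilon_insensitive_loss` is a hypothetical helper, while `SVR(kernel="rbf", epsilon=...)` is scikit-learn's API.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Errors inside the epsilon tube cost nothing; outside it, cost grows linearly."""
    return np.maximum(0.0, np.abs(y_true - y_pred) - epsilon)

losses = epsilon_insensitive_loss(np.array([1.0, 1.0, 1.0]), np.array([1.05, 1.3, 0.5]))
print(losses)  # approximately 0.0, 0.2, 0.4

# Hypothetical non-linear data fitted with an RBF-kernel SVR
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0])
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", epsilon=0.1)).fit(X, y)

# Fraction of training points whose error lies inside the epsilon tube
residuals = np.abs(y - model.predict(X))
inside_fraction = (residuals <= 0.1 + 1e-6).mean()
print(inside_fraction)
```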

In [24]:
mean_squared_error(y_test, y_pred_svr)

0.00564516069778184

In [25]:
mean_absolute_error(y_test, y_pred_svr)

0.05361453748098604

In [26]:
r2_score(y_pred_svr, y_test)

0.9958713606071191

In [None]:
The SVR has low error values (MSE ≈ 0.0056, MAE ≈ 0.054) and a very high $R^2$ of about 0.996, so it captures most of the variance in the data, although its errors are clearly larger than those of the other models.

In [None]:
# Final conclusion

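Based on the metrics, all models achieve near-perfect scores, and Linear Regression ranks best (MSE ≈ $1.2 \times 10^{-30}$, $R^2 = 1.0$), followed by Random Forest, Decision Tree, Gradient Boosting, and SVR. These scores are not a reliable comparison, however: `X_scale` was produced by fitting the scaler on `df`, which still contains the `Target` column, so every model received the target as an input feature. A fair comparison requires dropping `Target` before scaling (e.g., scaling `df.drop(columns='Target')`), ideally fitting the scaler on the training split only, and recomputing the metrics.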
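A leak-free version of the workflow splits the data first and lets a `Pipeline` fit the scaler on the training features only. The sketch uses `make_regression` as a synthetic stand-in; with the real data one would instead use `X = df.drop(columns="Target")` and `y = df["Target"]`. The resulting metrics are not the notebook's and only illustrate the pattern.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with 8 features, like the California dataset
X, y = make_regression(n_samples=1000, n_features=8, noise=20.0, random_state=42)

# Split first; the pipeline then fits the scaler on the training features only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
model = make_pipeline(StandardScaler(), LinearRegression()).fit(X_train, y_train)
y_pred = model.predict(X_test)

# scikit-learn metric signature: metric(y_true, y_pred)
mse = mean_squared_error(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MSE={mse:.2f}  MAE={mae:.2f}  R^2={r2:.3f}")
```

The same pipeline pattern works for the other models (for example, `make_pipeline(StandardScaler(), SVR(kernel="rbf"))`), so each one is evaluated without access to the target.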