# California House Price Prediction
This notebook predicts house prices in California using linear regression, Lasso, and Ridge regression. It covers data preprocessing, model training, and a comparison of the three models by their validation and test errors.

## Import the Necessary Libraries

```python
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.preprocessing import StandardScaler
```


## Load the Dataset
We will load the dataset and prepare it for analysis by separating the features and the target variable.


```python
data = pd.read_csv('California_Houses.csv', header=0)
y = data['Median_House_Value']                 # Target: median house value
X = data.drop(columns=['Median_House_Value'])  # Features: every other column
```
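
Before fitting, it can help to confirm that the target column is excluded from the features and to count missing values, since scikit-learn's linear models raise an error on NaN inputs. The sketch below runs on a tiny stand-in DataFrame rather than `California_Houses.csv`; the `Median_Income` column and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for California_Houses.csv; "Median_Income" is a hypothetical column
data = pd.DataFrame({
    "Median_House_Value": [452600.0, 358500.0, 352100.0],
    "Median_Income": [8.3252, 8.3014, np.nan],
})

y = data["Median_House_Value"]                 # target
X = data.drop(columns=["Median_House_Value"])  # features

# The target must not leak into the features
assert "Median_House_Value" not in X.columns

# Count missing values per feature; LinearRegression, Lasso and Ridge reject NaN
missing = X.isna().sum()
print(missing[missing > 0])
```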


## Split the Data
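We will split the data into training (70%), validation (15%), and test (15%) sets. The validation set is used to choose the regularization strength, and the test set is held out for the final comparison.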


```python
# First split: 70% train, 30% temp
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42, shuffle=True)

# Second split: 15% validation, 15% test from the 30% temp
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42, shuffle=True)
```
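
The two-stage split can be checked on synthetic data: taking 30% out first and then halving that remainder should leave a 70/15/15 partition. The 1,000-row random arrays below are illustrative stand-ins, not the housing data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1,000 rows, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = rng.normal(size=1000)

# First split: 70% train, 30% held out
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)
# Second split: halve the held-out 30% into 15% validation and 15% test
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 700 150 150
```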


## Train Linear Regression Model
We will initialize and train a Linear Regression model on the training data.


```python
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
```


## Evaluate the Model
We will evaluate the model's performance using Mean Squared Error (MSE) and Mean Absolute Error (MAE) for the validation set and the test set.


```python
y_val_pred = linear_model.predict(X_val)
linear_val_mse = mean_squared_error(y_val, y_val_pred)
linear_val_mae = mean_absolute_error(y_val, y_val_pred)

y_test_pred = linear_model.predict(X_test)
linear_test_mse = mean_squared_error(y_test, y_test_pred)
linear_test_mae = mean_absolute_error(y_test, y_test_pred)

print("Model\tMSE(validation set)\tMAE(validation set)\tMSE(test set)\tMAE(test set)\tBest Alpha")
print("==================================================================================================")
print(f"Linear\t{linear_val_mse:.2f}\t\t{linear_val_mae:.2f}\t\t{linear_test_mse:.2f}\t{linear_test_mae:.2f}\t--")
```


```
Model	MSE(validation set)	MAE(validation set)	MSE(test set)	MAE(test set)	Best Alpha
Linear	4907211997.37		50790.06		4400953150.61	48782.03	--
```
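
MSE is the mean of the squared residuals and MAE is the mean of their absolute values, so MSE is in squared units of the target while MAE is in the target's own units. Taking the square root of MSE (RMSE) makes the two easier to compare: the linear model's validation MSE of 4907211997.37 corresponds to an RMSE of roughly 70,000. A minimal check with made-up values that the hand-computed formulas match scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Made-up true and predicted house values, for illustration only
y_true = np.array([200000.0, 150000.0, 300000.0])
y_pred = np.array([210000.0, 140000.0, 270000.0])

residuals = y_true - y_pred           # [-10000, 10000, 30000]
mse = np.mean(residuals ** 2)         # mean squared error (squared units)
mae = np.mean(np.abs(residuals))      # mean absolute error (target units)
rmse = np.sqrt(mse)                   # back to target units

print(mse, mae, rmse)
```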


## Scale the Data
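We standardize the features with StandardScaler (zero mean, unit variance). Lasso and Ridge penalize the size of the coefficients, so without scaling the penalty would depend on each feature's units. The scaler is fitted on the training set only and then applied to the validation and test sets, so no information from the held-out data leaks into training. Plain linear regression does not need this step, because rescaling the features does not change its predictions.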


```python
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from the training set only
X_val_scaled = scaler.transform(X_val)          # reuse the training statistics
X_test_scaled = scaler.transform(X_test)
```
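
Two properties of this step can be checked on synthetic data: StandardScaler applies the training mean and the training population standard deviation (ddof=0) to every later set, and ordinary least squares gives the same predictions whether or not the features are standardized. The random arrays and coefficients below are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (illustrative only)
rng = np.random.default_rng(1)
X_train = rng.normal(loc=[5.0, 1000.0], scale=[2.0, 300.0], size=(200, 2))
X_test = rng.normal(loc=[5.0, 1000.0], scale=[2.0, 300.0], size=(50, 2))
y_train = X_train @ np.array([3.0, 0.01]) + rng.normal(size=200)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Manual z-score using training statistics (np.std defaults to ddof=0)
manual = (X_test - X_train.mean(axis=0)) / X_train.std(axis=0)

# OLS predictions on raw vs. standardized features
pred_raw = LinearRegression().fit(X_train, y_train).predict(X_test)
pred_scaled = LinearRegression().fit(X_train_scaled, y_train).predict(X_test_scaled)
```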


## Train Lasso and Ridge Regression Models
We will train Lasso and Ridge regression models on the scaled features for 10 alpha values spaced logarithmically from 0.0001 to 10, recording the validation MSE and MAE for each.


```python
alphas = np.logspace(-4, 1, 10)  # 10 values from 1e-4 to 1e1
lasso_mse = []
lasso_mae = []
ridge_mse = []
ridge_mae = []

for alpha in alphas:
    # Lasso (L1 penalty); more iterations help coordinate descent converge
    lasso_model = Lasso(alpha=alpha, max_iter=5000)
    lasso_model.fit(X_train_scaled, y_train)
    y_val_pred_lasso = lasso_model.predict(X_val_scaled)

    lasso_mse.append(mean_squared_error(y_val, y_val_pred_lasso))
    lasso_mae.append(mean_absolute_error(y_val, y_val_pred_lasso))

    # Ridge (L2 penalty)
    ridge_model = Ridge(alpha=alpha)
    ridge_model.fit(X_train_scaled, y_train)
    y_val_pred_ridge = ridge_model.predict(X_val_scaled)

    ridge_mse.append(mean_squared_error(y_val, y_val_pred_ridge))
    ridge_mae.append(mean_absolute_error(y_val, y_val_pred_ridge))
```
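
It helps to see how alpha enters each model. The scikit-learn documentation gives the two objectives as follows, where $n$ is the number of training rows, $w$ the coefficients, and the intercept $b$ is fitted but not penalized:

$$
\text{Lasso:}\quad \min_{w,\,b}\ \frac{1}{2n}\,\lVert y - Xw - b\mathbf{1}\rVert_2^2 + \alpha\,\lVert w\rVert_1
$$

$$
\text{Ridge:}\quad \min_{w,\,b}\ \lVert y - Xw - b\mathbf{1}\rVert_2^2 + \alpha\,\lVert w\rVert_2^2
$$

In the idealized case of standardized, uncorrelated features ($X^\top X = nI$), Ridge shrinks each least-squares coefficient by a factor of $n/(n+\alpha)$, and Lasso soft-thresholds each one by $\alpha$. With thousands of training rows and coefficients on the scale of the target, an $\alpha$ of at most 10 therefore changes the fit very little under this assumption.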


## Find Best Alpha Values
For each model, we select the alpha with the lowest validation MSE and record the validation MAE at that same alpha.


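```python
# Index of the alpha with the lowest validation MSE for each model
best_lasso_idx = int(np.argmin(lasso_mse))
best_lasso_alpha = alphas[best_lasso_idx]
lasso_val_mse = lasso_mse[best_lasso_idx]
lasso_val_mae = lasso_mae[best_lasso_idx]

best_ridge_idx = int(np.argmin(ridge_mse))
best_ridge_alpha = alphas[best_ridge_idx]
ridge_val_mse = ridge_mse[best_ridge_idx]
ridge_val_mae = ridge_mae[best_ridge_idx]
```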
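
If the best alpha sits at either end of the grid, the true optimum may lie outside the range that was searched, and widening the grid is worth trying. The helper below is a hypothetical name (`best_alpha_on_boundary`) for a small check of this; the validation MSE values passed to it are made up to mimic errors that rise with alpha.

```python
import numpy as np

def best_alpha_on_boundary(alphas, val_errors):
    """Hypothetical helper: return the best alpha and whether it is at an end of the grid."""
    best_idx = int(np.argmin(val_errors))  # argmin returns the first minimum on ties
    on_edge = best_idx in (0, len(alphas) - 1)
    return alphas[best_idx], on_edge

alphas = np.logspace(-4, 1, 10)
# Made-up validation MSEs that increase monotonically with alpha
val_mse = 4.9e9 + np.linspace(0.0, 1e3, 10)

best, on_edge = best_alpha_on_boundary(alphas, val_mse)
print(best, on_edge)
```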


## Final Model Evaluation
We refit Lasso and Ridge on the training set with their best alpha values and evaluate them on the test set.


```python
# Refit Lasso with the selected alpha and score it on the test set
final_lasso_model = Lasso(alpha=best_lasso_alpha, max_iter=5000)
final_lasso_model.fit(X_train_scaled, y_train)
y_test_pred_lasso = final_lasso_model.predict(X_test_scaled)
lasso_test_mse = mean_squared_error(y_test, y_test_pred_lasso)
lasso_test_mae = mean_absolute_error(y_test, y_test_pred_lasso)

# Refit Ridge with the selected alpha and score it on the test set
final_ridge_model = Ridge(alpha=best_ridge_alpha)
final_ridge_model.fit(X_train_scaled, y_train)
y_test_pred_ridge = final_ridge_model.predict(X_test_scaled)
ridge_test_mse = mean_squared_error(y_test, y_test_pred_ridge)
ridge_test_mae = mean_absolute_error(y_test, y_test_pred_ridge)

print(f"Lasso\t{lasso_val_mse:.2f}\t\t{lasso_val_mae:.2f}\t\t{lasso_test_mse:.2f}\t{lasso_test_mae:.2f}\t{best_lasso_alpha}")
print(f"Ridge\t{ridge_val_mse:.2f}\t\t{ridge_val_mae:.2f}\t\t{ridge_test_mse:.2f}\t{ridge_test_mae:.2f}\t{best_ridge_alpha}")
```


```
Lasso	4907211998.88		50790.06		4400953139.89	48782.03	0.0001
Ridge	4907212001.13		50790.06		4400953105.24	48782.03	0.0001
```


## Final Output

![Image of output](output.png)


## Conclusion
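For both Lasso and Ridge, the smallest alpha in the grid (0.0001) gave the lowest validation MSE, so within the tested range regularization did not improve on plain linear regression, and the Lasso and Ridge errors are essentially the same as those of the unregularized model. This result does not by itself show that the model generalizes well. The selected alpha sits at the lower edge of the grid, so the search only indicates that weaker regularization was preferred. In addition, because the target is measured in dollars, penalties with alpha of at most 10 are likely very small relative to the squared-error term, so every model in the grid stays close to ordinary least squares. A linear model fitted on many training rows has limited capacity to overfit, which is consistent with regularization offering little benefit here.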
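
One way to check whether alphas up to 10 are simply too small to matter for a dollar-scale target is to compare regularized and unregularized coefficients on synthetic data. The sketch below assumes 2,000 rows, 5 standardized features, and made-up coefficients in the tens of thousands; it measures how far Ridge and Lasso coefficients move from plain least squares at a few alpha values, including one far above the original grid.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Synthetic data with a dollar-like target (assumed values, for illustration)
rng = np.random.default_rng(0)
n = 2000
X = StandardScaler().fit_transform(rng.normal(size=(n, 5)))
true_coef = np.array([50000.0, -20000.0, 10000.0, 0.0, 5000.0])
y = 200000.0 + X @ true_coef + rng.normal(scale=50000.0, size=n)

ols_coef = LinearRegression().fit(X, y).coef_

results = {}
for alpha in [1e-4, 10.0, 1e4]:
    ridge_coef = Ridge(alpha=alpha).fit(X, y).coef_
    lasso_coef = Lasso(alpha=alpha, max_iter=5000).fit(X, y).coef_
    # Largest absolute coefficient change relative to plain least squares
    results[alpha] = (np.abs(ridge_coef - ols_coef).max(),
                      np.abs(lasso_coef - ols_coef).max())
    print(f"alpha={alpha:g}: ridge shift={results[alpha][0]:.1f}, "
          f"lasso shift={results[alpha][1]:.1f}")
```

On data like this, the shifts at alpha = 10 are a small fraction of the coefficient sizes, while alpha = 10,000 changes the fit substantially, which suggests extending the alpha grid upward before concluding that regularization has no effect.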