### Load dataset and extract features and target

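```python
from sklearn.datasets import fetch_california_housing

# Downloaded on first use and cached locally afterwards
ca_housing = fetch_california_housing()
X = ca_housing.data    # shape (20640, 8): one row per district
y = ca_housing.target  # median house value, in units of $100,000
```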

### Split dataset into train and test sets

```python
from sklearn.model_selection import train_test_split

# Hold out 20% of the rows for testing; random_state fixes the shuffle for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=19)
```
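Conceptually, `train_test_split` shuffles the row indices and slices off the test portion. The NumPy sketch below mimics that on a toy array. `simple_train_test_split` is a hypothetical helper: it rounds the test size up the way scikit-learn does, but it is not guaranteed to reproduce the library's exact split for a given `random_state`.

```python
import numpy as np

def simple_train_test_split(X, y, test_size=0.2, seed=19):
    """Hypothetical helper: shuffle row indices, then take the first
    ceil(test_size * n) of them as the test set and the rest as training."""
    n = len(X)
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n)
    n_test = int(np.ceil(test_size * n))
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    # Index X and y with the same positions so rows stay aligned with targets
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Toy data: row i is [2i, 2i + 1] and its target is i
X_toy = np.arange(20, dtype=float).reshape(10, 2)
y_toy = np.arange(10, dtype=float)
X_toy_train, X_toy_test, y_toy_train, y_toy_test = simple_train_test_split(X_toy, y_toy)
print(len(X_toy_train), len(X_toy_test))
```

Fixing the seed matters because every later number (scaling statistics, chosen alpha, test scores) depends on which rows land in each split.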

### Scale features using StandardScaler

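```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Learn each feature's mean and standard deviation from the training set only...
X_train = scaler.fit_transform(X_train)
# ...then reuse those statistics on the test set to avoid data leakage
X_test = scaler.transform(X_test)
```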
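`StandardScaler` applies z = (x - mean) / std to each feature, where the mean and the population standard deviation are learned during `fit`. A minimal NumPy sketch on synthetic, normally distributed stand-in data (not the housing data) shows why the test split reuses the training statistics:

```python
import numpy as np

# Synthetic stand-in data: 100 training rows and 20 test rows, 3 features
rng = np.random.default_rng(0)
X_syn_train = rng.normal(loc=5.0, scale=3.0, size=(100, 3))
X_syn_test = rng.normal(loc=5.0, scale=3.0, size=(20, 3))

# "fit": per-feature statistics come from the training split only
train_mean = X_syn_train.mean(axis=0)
train_std = X_syn_train.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# "transform": the same statistics are applied to both splits
X_syn_train_scaled = (X_syn_train - train_mean) / train_std
X_syn_test_scaled = (X_syn_test - train_mean) / train_std
```

The training columns come out with mean 0 and standard deviation 1; the test columns only approximately so, which is expected, because no test-set information went into the statistics.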

### Train a basic Lasso regression model

```python
from sklearn.linear_model import Lasso

# Default regularization strength: alpha=1.0
lasso = Lasso()
lasso.fit(X_train, y_train)
```
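scikit-learn documents the Lasso objective as (1 / (2 n_samples)) ||y - Xw||²₂ + alpha ||w||₁, with an unpenalized intercept. The sketch below minimizes that objective by cyclic coordinate descent with soft-thresholding. It is a hand-rolled illustration on hypothetical toy data, not scikit-learn's implementation (which also uses coordinate descent, but in compiled code with a duality-gap stopping rule). Its key property: starting from all-zero weights, coefficient j can only become non-zero when |x_jᵀ(y - ȳ)| / n exceeds alpha.

```python
import numpy as np

def soft_threshold(a, t):
    """Shrink a toward zero by t, returning exactly 0.0 when |a| <= t."""
    return np.sign(a) * max(abs(a) - t, 0.0)

def lasso_cd(X, y, alpha, n_iter=200):
    """Hypothetical helper: minimise (1/(2n))*||y - Xw - b||^2 + alpha*||w||_1
    by cyclic coordinate descent; the intercept b is not penalised."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centring absorbs the intercept
    w = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    resid = yc.copy()  # equals yc - Xc @ w because w starts at zero
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: leave its weight at zero
            resid += Xc[:, j] * w[j]  # drop feature j's current contribution
            rho = Xc[:, j] @ resid / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            resid -= Xc[:, j] * w[j]  # add back the updated contribution
    intercept = y_mean - x_mean @ w
    return w, intercept

# Toy data: 3 standardized features; the target depends on the first two
rng = np.random.default_rng(0)
X_cd = rng.normal(size=(500, 3))
X_cd = (X_cd - X_cd.mean(axis=0)) / X_cd.std(axis=0)
y_cd = 0.5 * X_cd[:, 0] - 0.3 * X_cd[:, 1] + rng.normal(scale=0.5, size=500)

w_strong, b_strong = lasso_cd(X_cd, y_cd, alpha=1.0)  # penalty exceeds every |cov(x_j, y)|
w_weak, b_weak = lasso_cd(X_cd, y_cd, alpha=0.001)
print(w_strong, b_strong)  # all-zero weights; intercept is just the mean of y_cd
print(w_weak)
```

With standardized features, x_jᵀ(y - ȳ) / n is simply the covariance between feature j and the target, so whenever every such covariance is below alpha in magnitude, the fitted model predicts the training mean for every row.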

### Evaluate the basic model using test data

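```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_pred = lasso.predict(X_test)
# A notebook cell only echoes its last expression, so only R² is displayed below.
# Wrap the first two calls in print() to see MAE and MSE as well.
mean_absolute_error(y_test, y_pred)
mean_squared_error(y_test, y_pred)
r2_score(y_test, y_pred)
```

Out (test-set R²): `-4.109353628090062e-05`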
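The three metrics reduce to short formulas. The NumPy sketch below writes them out as hypothetical helpers `mae`, `mse`, and `r2` (scikit-learn's `r2_score` uses the same 1 - SS_res / SS_tot definition) and applies them to toy targets, with a model that predicts a constant, here the mean of a toy training target. That is the regime an R² near zero, like the default model's, points to.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the target's units."""
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    """Mean squared error, in squared target units."""
    return np.mean((y_true - y_pred) ** 2)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy targets: a model with all-zero coefficients predicts the training mean everywhere
y_train_toy = np.array([1.0, 2.0, 4.0, 6.0, 4.0])  # mean 3.4
y_test_toy = np.array([2.0, 3.0, 4.0, 5.0])         # mean 3.5
const_pred = np.full_like(y_test_toy, y_train_toy.mean())

print(mae(y_test_toy, const_pred), mse(y_test_toy, const_pred), r2(y_test_toy, const_pred))
```

For any constant prediction c, SS_res = SS_tot + n(ȳ_test - c)², so R² = -n(ȳ_test - c)² / SS_tot, which is never positive. Here that is -4 × 0.1² / 5 = -0.008. On the housing data, MAE is in units of $100,000 and MSE in squared units of that.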

### Define a grid of alpha values for GridSearchCV

```python
# Log-spaced candidates: each value is 10 times the previous one
param_grid = {
    'alpha': [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
}
```
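Because useful alpha values can span several orders of magnitude, grids like this are usually log-spaced. `np.logspace` produces the same six values, which makes a denser search a one-argument change. In the sketch below, the three-points-per-decade grid is an illustrative assumption, not a recommendation from the original analysis:

```python
import numpy as np

# 10**-3 through 10**2, one value per decade (same as the hand-written list)
alphas = np.logspace(-3, 2, num=6)
param_grid = {'alpha': alphas}

# A denser alternative: three points per decade over the same range
dense_alphas = np.logspace(-3, 2, num=16)
```

GridSearchCV accepts a 1-D NumPy array as a parameter list, so either array can be dropped into `param_grid` directly.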

### Use GridSearchCV to tune alpha in Lasso regression

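```python
from sklearn.model_selection import GridSearchCV

# Scores every alpha in param_grid with 3-fold cross-validation on the training set,
# using R² (a regressor's default score), then refits the best alpha on the full
# training set. GridSearchCV works on clones, so the fitted `lasso` above only
# serves as a template. n_jobs=-1 runs the fits on all available CPU cores.
lasso_cv = GridSearchCV(lasso, param_grid, cv=3, n_jobs=-1)
lasso_cv.fit(X_train, y_train)
```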
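To make the search less of a black box, the sketch below reproduces its core loop with `cross_val_score` and checks the result against `GridSearchCV`. The data come from `make_regression` as a synthetic stand-in for the scaled housing features, and `manual_grid_search` is a hypothetical helper. Both paths use the same folds, because an integer `cv` for a regressor means unshuffled K-fold.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, cross_val_score

def manual_grid_search(X, y, alphas, cv=3):
    """Hypothetical helper: return the alpha with the best mean CV R², the model
    refit on all of X, y with that alpha, and the per-alpha mean scores."""
    mean_scores = []
    for alpha in alphas:
        # For a regressor, cv=3 means unshuffled 3-fold KFold; default scoring is R²
        fold_scores = cross_val_score(Lasso(alpha=alpha), X, y, cv=cv)
        mean_scores.append(fold_scores.mean())
    best_alpha = alphas[int(np.argmax(mean_scores))]  # first alpha wins ties
    # Mirrors GridSearchCV's default refit=True: retrain on the whole training set
    best_model = Lasso(alpha=best_alpha).fit(X, y)
    return best_alpha, best_model, mean_scores

# Synthetic stand-in for the scaled housing training data
X_demo, y_demo = make_regression(n_samples=300, n_features=8, n_informative=3,
                                 noise=10.0, random_state=0)
alphas = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
best_alpha, best_model, mean_scores = manual_grid_search(X_demo, y_demo, alphas)

# Cross-check against GridSearchCV on the same data and grid
gs = GridSearchCV(Lasso(), {'alpha': alphas}, cv=3).fit(X_demo, y_demo)
print(best_alpha, gs.best_params_['alpha'])
```

The final refit is why `lasso_cv` can be used directly for prediction once the search finishes.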

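### Evaluate the best model found by GridSearchCV

```python
# lasso_cv.predict delegates to the refit best estimator
y_pred = lasso_cv.predict(X_test)
# As before, only the last expression (R²) is echoed
mean_absolute_error(y_test, y_pred)
mean_squared_error(y_test, y_pred)
r2_score(y_test, y_pred)
```

Out (test-set R²): `0.6009875098119732`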

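### Inspect the best estimator and its fitted parameters

```python
# lasso_cv.best_params_ holds the selected alpha; only the last expression (coef_) is echoed
lasso_cv.best_estimator_
lasso_cv.best_estimator_.intercept_
lasso_cv.best_estimator_.coef_
```

Out (fitted coefficients, in the dataset's feature order):

```text
array([ 0.83673788,  0.12126534, -0.26089701,  0.30370697, -0.00173652,
       -0.02849403, -0.8865986 , -0.86020295])
```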
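Every coefficient in this output is non-zero, so the selected alpha was small enough that Lasso dropped no feature. The sketch below, on synthetic `make_regression` data standing in for the housing features, counts the surviving coefficients across a wide range of alpha values to show the other extreme, where a large enough alpha zeroes out the entire model. The specific alpha values and `max_iter` setting are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 8 features, only 3 of which actually drive the target
X_demo, y_demo = make_regression(n_samples=300, n_features=8, n_informative=3,
                                 noise=10.0, random_state=0)
X_demo = StandardScaler().fit_transform(X_demo)

nonzero_counts = {}
for alpha in [1e-4, 1e-2, 1.0, 100.0, 1e6]:
    model = Lasso(alpha=alpha, max_iter=10_000).fit(X_demo, y_demo)
    nonzero_counts[alpha] = int(np.count_nonzero(model.coef_))
    print(f"alpha={alpha:g}: {nonzero_counts[alpha]} of 8 coefficients non-zero")
```

Between the extremes, noise-only features tend to be zeroed before informative ones, which is the feature-selection behavior that distinguishes Lasso from ridge regression.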

### Display coefficients along with feature names in a DataFrame

```python
import pandas as pd

# Reuse the names shipped with the dataset instead of retyping them
feature_names = ca_housing.feature_names
df = pd.DataFrame({'Feature Names': feature_names, 'Coefficients': lasso_cv.best_estimator_.coef_})
df
```

Out:

| | Feature Names | Coefficients |
|---|---|---|
| 0 | MedInc | 0.836738 |
| 1 | HouseAge | 0.121265 |
| 2 | AveRooms | -0.260897 |
| 3 | AveBedrms | 0.303707 |
| 4 | Population | -0.001737 |
| 5 | AveOccup | -0.028494 |
| 6 | Latitude | -0.886599 |
| 7 | Longitude | -0.860203 |
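Because the features were standardized, each coefficient is the predicted change in the target (in units of $100,000) for a one-standard-deviation change in that feature, holding the others fixed. Sorting by absolute value therefore gives a rough importance ranking. The sketch below copies the tuned coefficients from the output above and assumes pandas 1.1 or later for the `key` argument of `sort_values`:

```python
import pandas as pd

# Tuned coefficients copied from the coef_ output above, in the dataset's feature order
coefs = pd.Series(
    [0.83673788, 0.12126534, -0.26089701, 0.30370697,
     -0.00173652, -0.02849403, -0.8865986, -0.86020295],
    index=['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms',
           'Population', 'AveOccup', 'Latitude', 'Longitude'],
)

# Rank by magnitude while keeping the sign for interpretation
ranked = coefs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

Treat the ranking with care: correlated features such as Latitude and Longitude, or AveRooms and AveBedrms, can share or trade off weight, so an individual coefficient is not an isolated effect.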


### Conclusion

This analysis fits a Lasso regression model to the California housing dataset, first with scikit-learn's default regularization strength and then with a value tuned by cross-validation. The default model (alpha=1.0) performs no better than predicting the mean target value: its test-set R² is about -0.00004. On standardized features, this is consistent with the L1 penalty shrinking the coefficients to, or very close to, zero, so the baseline captures essentially none of the relationship between the features and house prices.

Tuning alpha with GridSearchCV (3-fold cross-validation over values from 0.001 to 100) raises the test-set R² to about 0.60. Here the default was underfitting, so the improvement comes from selecting a smaller alpha and weakening the regularization; in other settings the same search can instead guard against overfitting by selecting a larger one. In the tuned model every coefficient is non-zero, and the largest magnitudes belong to Latitude (-0.89), Longitude (-0.86), and MedInc (0.84), while Population contributes almost nothing (-0.0017).

These results underscore the value of tuning hyperparameters rather than relying on library defaults. Because the features were standardized, the tuned coefficients offer a rough ranking of feature importance, although correlated features such as Latitude and Longitude make individual coefficients harder to interpret in isolation. With an R² of about 0.60, the tuned linear model still leaves much of the variance in house prices unexplained, so it is best treated as a baseline for housing-market analyses or similar regression tasks.