# Predicting House Prices with the California Housing Dataset

In [1]:
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import Ridge, Lasso
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, r2_score
import pandas as pd
import numpy as np

#### Business Understanding
- Context: You’re using the California Housing dataset to predict median house prices based on several features like median income, housing age, and population. Your goal is to use Ridge and Lasso to create models that handle overfitting and interpret feature importance.
- Question: Why might regularization techniques like Ridge and Lasso help in creating better models for predicting house prices?

In [2]:
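## Regularisation adds a penalty on the size of the coefficients, which discourages the model from fitting noise in the training data (overfitting).
## Ridge shrinks all coefficients towards zero but keeps every feature, which helps when features are correlated.
## Lasso can shrink coefficients exactly to zero, so it also performs feature selection, leaving only the key features.
## Both can therefore give models that generalise better to unseen data.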
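
For reference, these are the objectives that scikit-learn's `Ridge` and `Lasso` minimise according to its documentation, written here with weights $w$, $n$ training samples and penalty strength $\alpha$ (the intercept is omitted for brevity):

```latex
\begin{aligned}
\text{Ridge:}\quad & \min_{w}\; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2 \\
\text{Lasso:}\quad & \min_{w}\; \frac{1}{2n} \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1
\end{aligned}
```

The squared (L2) penalty shrinks coefficients smoothly, while the absolute-value (L1) penalty can set them exactly to zero. Note the $1/(2n)$ factor in the Lasso loss: the same numeric alpha is a much stronger penalty for Lasso than for Ridge.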

#### Data Understanding
- Explore the Data:
    - Load the California Housing dataset.
    - Convert it into a Pandas DataFrame and explore the first few rows using .head().
    - Use .describe() and .info() to check for any missing values or data issues.
- Question: What do the dataset’s key statistics and summary information tell you about the features?

In [3]:
raw_data = fetch_california_housing()
df = pd.DataFrame(raw_data.data)
df['target'] = raw_data.target

print(df.head())
print(df.describe())
df.info()  # info() prints its summary directly and returns None, so no print() is needed

## There are no nulls, so nothing needs cleaning or removing.
## The features aren't yet standardised, so I will have to do that.

        0     1         2         3       4         5      6       7  target
0  8.3252  41.0  6.984127  1.023810   322.0  2.555556  37.88 -122.23   4.526
1  8.3014  21.0  6.238137  0.971880  2401.0  2.109842  37.86 -122.22   3.585
2  7.2574  52.0  8.288136  1.073446   496.0  2.802260  37.85 -122.24   3.521
3  5.6431  52.0  5.817352  1.073059   558.0  2.547945  37.85 -122.25   3.413
4  3.8462  52.0  6.281853  1.081081   565.0  2.181467  37.85 -122.25   3.422
                  0             1             2             3             4  \
count  20640.000000  20640.000000  20640.000000  20640.000000  20640.000000   
mean       3.870671     28.639486      5.429000      1.096675   1425.476744   
std        1.899822     12.585558      2.474173      0.473911   1132.462122   
min        0.499900      1.000000      0.846154      0.333333      3.000000   
25%        2.563400     18.000000      4.440716      1.006079    787.000000   
50%        3.534800     29.000000      5.229129      1.048780   
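
The integer column labels make the summary hard to read. The sketch below shows one way to attach the dataset's feature names to such a frame. The two rows are copied by hand from the `head()` output so that the example runs without downloading the data; `df_demo` and `df_named` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Feature order of the California Housing dataset (raw_data.feature_names)
FEATURE_NAMES = ["MedInc", "HouseAge", "AveRooms", "AveBedrms",
                 "Population", "AveOccup", "Latitude", "Longitude"]

# First two rows of the head() output, typed in by hand for illustration
rows = [
    [8.3252, 41.0, 6.984127, 1.023810, 322.0, 2.555556, 37.88, -122.23],
    [8.3014, 21.0, 6.238137, 0.971880, 2401.0, 2.109842, 37.86, -122.22],
]
df_demo = pd.DataFrame(rows)  # integer column labels 0..7, as in the notebook

# Map each integer label to the feature name at the same position
df_named = df_demo.rename(columns=dict(enumerate(FEATURE_NAMES)))
print(df_named)
```

On the real data, passing `columns=raw_data.feature_names` to `pd.DataFrame` gives the same labelling at load time.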

#### Data Preparation
- Data Preprocessing:
    - Split the dataset into features (X) and target (y) (house prices).
    - Split the data into training and testing sets using an 80-20 split.
    - Scale the features using StandardScaler for both the training and test sets.
- Question: Why is it important to scale your features before applying regularization techniques like Ridge or Lasso?

In [5]:
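data_unscaled = df.drop(columns = 'target')
y = df['target']

# Split first so that the scaler never sees the test rows
X_train, X_test, y_train, y_test = train_test_split(data_unscaled, y, test_size = 0.2)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean and std from the training data only
X_test = scaler.transform(X_test)        # apply the training statistics to the test data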

## Ridge and Lasso penalise the size of the coefficients, and a coefficient's size depends on the units of its feature.
## Standardising puts every feature on the same scale (mean 0, standard deviation 1), so the penalty treats all features equally rather than favouring some because of their units.
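
A minimal sketch of scaling without letting the test rows influence the scaler, run on synthetic data. The two columns are made-up stand-ins for an income-like and a population-like feature; all names ending in `_demo` or `_scaled` are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two synthetic features on very different scales
X_demo = np.column_stack([
    rng.normal(4.0, 2.0, size=500),
    rng.normal(1400.0, 1100.0, size=500),
])
y_demo = rng.normal(size=500)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

scaler_demo = StandardScaler()
X_tr_scaled = scaler_demo.fit_transform(X_tr)  # statistics come from training rows only
X_te_scaled = scaler_demo.transform(X_te)      # test rows reuse those statistics

print("train means:", X_tr_scaled.mean(axis=0).round(6))
print("train stds: ", X_tr_scaled.std(axis=0).round(6))
print("test means: ", X_te_scaled.mean(axis=0).round(3))
```

The training columns come out with mean 0 and standard deviation 1. The test columns are only approximately standardised, which is expected: they are transformed with statistics they did not contribute to.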

#### Modeling
- Ridge and Lasso Regression:
    - Train a Ridge Regression and Lasso Regression model using the training data.
    - Predict house prices using both models on the test set.
    - Evaluate the models using R² and Mean Squared Error (MSE).
- Question: What are the R² and MSE scores for both Ridge and Lasso models on the test set?

In [6]:
ridge = Ridge()
lasso = Lasso()

ridge.fit(X_train, y_train)
lasso.fit(X_train, y_train)

ridge_pred = ridge.predict(X_test)
lasso_pred = lasso.predict(X_test)

ridge_mse = mean_squared_error(y_test, ridge_pred)
lasso_mse = mean_squared_error(y_test, lasso_pred)

ridge_r2 = r2_score(y_test, ridge_pred)
lasso_r2 = r2_score(y_test, lasso_pred)

print(f'Ridge MSE = {ridge_mse:.4f} compared to Lasso MSE = {lasso_mse:.4f}')
print(f'Ridge R2 = {ridge_r2:.4f} compared to Lasso R2 = {lasso_r2:.4f}')

Ridge MSE = 0.5367 compared to Lasso MSE = 1.3583
Ridge R2 = 0.6039 compared to Lasso R2 = -0.0025
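
To make the two metrics concrete, the sketch below computes R² by hand and checks it against `r2_score`, shows that a model predicting only the mean scores an R² of zero, and converts the Ridge MSE into dollars. The targets are the first five values from the `head()` output; the predictions in `y_hat` are made up for illustration. The dollar conversion relies on the dataset's target being expressed in units of $100,000.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([4.526, 3.585, 3.521, 3.413, 3.422])  # targets from head()
y_hat = np.array([4.1, 3.9, 3.4, 3.2, 3.6])             # made-up predictions

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot
print("manual R2: ", round(r2_manual, 4))
print("sklearn R2:", round(r2_score(y_true, y_hat), 4))

# A constant prediction equal to the mean of y_true explains no variance
mean_only = np.full_like(y_true, y_true.mean())
r2_mean_only = r2_score(y_true, mean_only)
print("mean-only R2:", round(r2_mean_only, 4))

# RMSE puts the error back into the target's units ($100,000s)
rmse_dollars = np.sqrt(0.5367) * 100_000
print(f"Ridge RMSE in dollars: {rmse_dollars:,.0f}")
```

A constant prediction that uses the training mean rather than the test mean can score slightly below zero on the test set, which is the pattern seen in the Lasso R² of -0.0025.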


#### Evaluation
- Model Tuning with GridSearchCV:
    - Perform a Grid Search to find the optimal alpha values for both Ridge and Lasso models.
    - Use a range of alpha values, such as [0.001, 0.01, 0.1, 1, 10], and tune the models using GridSearchCV with 5-fold cross-validation.
    - Once tuned, evaluate the models again on the test set using R² and MSE.
- Question: What is the best alpha for Ridge and Lasso, and how did tuning improve the model’s performance?

In [8]:
parameters = [{'alpha': [0.001, 0.01, 0.1, 1, 10]}]

ridge_grid = GridSearchCV(ridge, parameters, cv = 5)
lasso_grid = GridSearchCV(lasso, parameters, cv = 5)

ridge_grid.fit(X_train, y_train)
lasso_grid.fit(X_train, y_train)

ridge_grid_pred = ridge_grid.predict(X_test)
lasso_grid_pred = lasso_grid.predict(X_test)

ridge_grid_mse = mean_squared_error(y_test, ridge_grid_pred)
lasso_grid_mse = mean_squared_error(y_test, lasso_grid_pred)

ridge_grid_r2 = r2_score(y_test, ridge_grid_pred)
lasso_grid_r2 = r2_score(y_test, lasso_grid_pred)

ridge_grid_alpha = ridge_grid.best_params_
lasso_grid_alpha = lasso_grid.best_params_

print(f'Ridge Grid MSE = {ridge_grid_mse:.4f} compared to Lasso Grid MSE = {lasso_grid_mse:.4f}')
print(f'Ridge Grid R2 = {ridge_grid_r2:.4f} compared to Lasso Grid R2 = {lasso_grid_r2:.4f}')
print(f'Ridge Grid best alpha = {ridge_grid_alpha} compared to Lasso Grid best alpha = {lasso_grid_alpha}')

## Ridge maintained its performance under tuning (R² went from 0.6039 to 0.6041). Lasso, however, benefited greatly and now performs at the same level as Ridge.

Ridge Grid MSE = 0.5365 compared to Lasso Grid MSE = 0.5357
Ridge Grid R2 = 0.6041 compared to Lasso Grid R2 = 0.6046
Ridge Grid best alpha = {'alpha': 10} compared to Lasso Grid best alpha = {'alpha': 0.001}
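
One possible variant of the search, sketched on synthetic data from `make_regression`, wraps the scaler and the model in a pipeline so that the scaler is refitted inside every cross-validation fold. The step name `lasso` in the parameter key is the name `make_pipeline` derives from the class; the data and variable names are illustrative only.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression problem with 8 features, 4 of them informative
X_syn, y_syn = make_regression(
    n_samples=400, n_features=8, n_informative=4, noise=10.0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=0
)

# The scaler is part of the estimator, so each CV fold fits its own scaler
pipe = make_pipeline(StandardScaler(), Lasso(max_iter=10_000))
alphas = [0.001, 0.01, 0.1, 1, 10]
grid = GridSearchCV(pipe, {"lasso__alpha": alphas}, cv=5)
grid.fit(X_tr, y_tr)

# Mean cross-validated R^2 for every alpha, not just the winner
for alpha, score in zip(alphas, grid.cv_results_["mean_test_score"]):
    print(f"alpha={alpha:<6} mean CV R2={score:.4f}")

print("best params:", grid.best_params_)
print("test R2:", round(grid.score(X_te, y_te), 4))
```

Printing the full `cv_results_` table shows how flat or steep the score is across the grid, which `best_params_` alone hides.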


#### Deployment
- Feature Importance:
    - Interpret the coefficients from the best Ridge or Lasso model to understand which features have the most significant impact on house prices.
- Question: Which features are most important for predicting house prices, and how can you use this information to make data-driven decisions?

In [13]:
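# Coefficients of the untuned models fitted earlier (default alpha = 1.0 for both)
ridge_coefficients = ridge.coef_
lasso_coefficients = lasso.coef_

# Coefficients of the tuned models selected by GridSearchCV
# (computed here for reference; the tables below show the untuned models)
best_ridge_coefficients = ridge_grid.best_estimator_.coef_
best_lasso_coefficients = lasso_grid.best_estimator_.coef_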

# Create a DataFrame to display the feature importance (coefficients)
feature_names = data_unscaled.columns
ridge_coef_df = pd.DataFrame({
    'Feature': feature_names,
    'Ridge Coefficient': ridge_coefficients
}).sort_values(by='Ridge Coefficient', key=abs, ascending=False)

lasso_coef_df = pd.DataFrame({
    'Feature': feature_names,
    'Lasso Coefficient': lasso_coefficients
}).sort_values(by='Lasso Coefficient', key=abs, ascending=False)

print(ridge_coef_df)
print(lasso_coef_df)

##

  Feature  Ridge Coefficient
6       6          -0.896470
7       7          -0.867410
0       0           0.845514
3       3           0.351453
2       2          -0.296408
1       1           0.115745
5       5          -0.035570
4       4          -0.010607
  Feature  Lasso Coefficient
0       0                0.0
1       1                0.0
2       2                0.0
3       3               -0.0
4       4               -0.0
5       5               -0.0
6       6               -0.0
7       7               -0.0
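
Since the tables identify features only by column number, the sketch below relabels the recorded Ridge coefficients with the dataset's feature names. The coefficient values are copied from the Ridge table; `ridge_coefs_by_column` and `named` are illustrative names.

```python
import pandas as pd

# Feature order of the California Housing dataset (raw_data.feature_names)
FEATURE_NAMES = ["MedInc", "HouseAge", "AveRooms", "AveBedrms",
                 "Population", "AveOccup", "Latitude", "Longitude"]

# Ridge coefficients copied from the printed table, keyed by column number
ridge_coefs_by_column = {
    6: -0.896470, 7: -0.867410, 0: 0.845514, 3: 0.351453,
    2: -0.296408, 1: 0.115745, 5: -0.035570, 4: -0.010607,
}

named = pd.DataFrame({
    "Feature": [FEATURE_NAMES[i] for i in ridge_coefs_by_column],
    "Ridge Coefficient": list(ridge_coefs_by_column.values()),
}).sort_values(by="Ridge Coefficient", key=abs, ascending=False)

print(named.to_string(index=False))
```

Read this way, location (Latitude, Longitude) and median income (MedInc) carry the largest coefficients, while Population and AveOccup contribute very little.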


#### Interpretation Notes:
- Model Performance:
    - Ridge Model: With the default alpha, Ridge reached an R² of 0.6039, meaning it explains ~60% of the variance in house prices on the test set. Its MSE of 0.5367 corresponds to an RMSE of about 0.73; since the target is the median house value in units of $100,000, that is a typical error of roughly $73,000. After tuning, Ridge was essentially unchanged, with a marginally higher R² (0.6041) and slightly lower MSE (0.5365).
    - Lasso Model: Before tuning, Lasso performed poorly, with an R² of -0.0025, meaning it explained none of the variance in the target. The coefficient table shows why: at the default alpha of 1, Lasso shrank every coefficient to zero, so the model predicts the same value (its intercept) for every district. After tuning, the best alpha was 0.001 and Lasso improved dramatically, with an R² of 0.6046 and an MSE of 0.5357, matching Ridge's performance. The much weaker penalty allowed Lasso to keep the informative features.

- Ridge Features:
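    - Interpretation: Ridge retains all the features. Because the features were standardised, the coefficients are comparable: each one is the change in predicted price for a one-standard-deviation increase in that feature, with the other features held fixed. A higher absolute value means a stronger influence. The table shows the default-alpha Ridge model, and the columns are numbered in the dataset's feature order (MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude). For instance:
        - Feature 6 (Latitude) has the largest negative coefficient (-0.896), closely followed by Feature 7 (Longitude, -0.867). With the other features held fixed, districts further north or further east are associated with lower house prices.
        - Feature 0 (MedInc, median income) has the largest positive coefficient (0.846), indicating that higher-income districts are associated with higher house prices.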

- Lasso Features:
    - Interpretation: Lasso can shrink coefficients to exactly 0, which excludes those features from the model. This is useful when you want to simplify the model by retaining only the most relevant features. In this case:
        - The table shows the untuned Lasso (alpha = 1), which set all eight coefficients to zero. This does not mean the features lack predictive power, since Ridge reaches an R² of ~0.60 with the same features. It means the default penalty was too strong for data on this scale.
    
- Lasso Improvement:
    - Initially, Lasso shrank all the coefficients to zero, which led to poor predictions (R² near 0, high MSE).
    - After tuning, Lasso's alpha was reduced from 1 to 0.001, weakening the regularization penalty, so it retained the useful features. This is why it achieved an R² score comparable to Ridge.
    
- Takeaways:
    - Ridge helps when all features are somewhat important and when multicollinearity (correlated features) is present. It keeps all features but shrinks their impact, leading to a balanced model.
    - Lasso is beneficial when you want to simplify the model by removing irrelevant or less important features. Its performance depends heavily on alpha, so tuning is essential, as seen here.
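
The all-zero Lasso fit can be reproduced on synthetic data. Given scikit-learn's Lasso objective and centred features, every coefficient is zero once alpha exceeds the largest absolute value of $x_j^\top (y - \bar{y}) / n$ across features. The sketch below computes that threshold (called `alpha_max` here, an illustrative name) for made-up data and counts the surviving coefficients at a few penalty strengths.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data: 8 features, 4 of them informative
X_raw, y_syn = make_regression(
    n_samples=300, n_features=8, n_informative=4, noise=5.0, random_state=1
)
X_std = StandardScaler().fit_transform(X_raw)

# Smallest alpha at which Lasso zeroes every coefficient
n = X_std.shape[0]
alpha_max = np.max(np.abs(X_std.T @ (y_syn - y_syn.mean()))) / n
print(f"alpha_max = {alpha_max:.3f}")

# Count the non-zero coefficients above and below the threshold
nonzero_counts = {}
for factor in [1.01, 0.5, 0.01]:
    model = Lasso(alpha=factor * alpha_max, max_iter=10_000).fit(X_std, y_syn)
    nonzero_counts[factor] = int(np.count_nonzero(model.coef_))
    print(f"alpha = {factor:>4} * alpha_max -> non-zero coefficients: {nonzero_counts[factor]}")
```

The all-zero Lasso table in this notebook is consistent with the default alpha of 1 lying above this threshold for the housing data, and with the tuned alpha of 0.001 lying far below it.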