
# Question 4: Using 2 data sets from our econometrics course, write Python code that compares:

LS and WLS
LS and LASSO
LS and LOESS
Pick 1 of these 3 pairs. Compute a test statistic that would show that 1 model does better than the other.

For example, if you picked LS and WLS, you would compute a statistic for each model, and show that one of those test statistics is better than the other.

Answer:
I compare Least Squares (LS) and LASSO regression, using test-set Mean Squared Error (MSE) as the comparison statistic. MSE is the average squared difference between predicted and actual values, so a lower MSE indicates more accurate predictions.
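
To make the definition concrete, here is a minimal sketch of MSE computed by hand. The helper name mse and the four toy values are illustrative assumptions, not course data; the notebook itself uses sklearn.metrics.mean_squared_error, which computes the same quantity.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared prediction errors."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared_errors) / len(squared_errors)


# Toy example (made-up numbers): the errors are 0.5, -0.5, 0.0, -1.0
toy_mse = mse([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0])
print(toy_mse)  # 0.375
```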

For this comparison, we'll use the California Housing dataset, a widely used regression benchmark available through scikit-learn's fetch_california_housing.

In [3]:
# Step 1: Import necessary libraries
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression, LassoCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import pandas as pd

In [4]:
# Step 2: Load the California Housing dataset
housing = fetch_california_housing()
X = pd.DataFrame(housing.data, columns=housing.feature_names)
y = housing.target


In [5]:
# Step 3: Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [6]:
# Step 4: Fit a Least Squares (Linear Regression) model
lr_model = LinearRegression()
lr_model.fit(X_train, y_train)
y_pred_lr = lr_model.predict(X_test)
mse_lr = mean_squared_error(y_test, y_pred_lr)

In [7]:
# Step 5: Fit a LASSO Regression model with cross-validation to tune alpha
lasso_model = LassoCV(cv=5, random_state=42)
lasso_model.fit(X_train, y_train)
y_pred_lasso = lasso_model.predict(X_test)
mse_lasso = mean_squared_error(y_test, y_pred_lasso)

In [8]:
# Step 6: Print the MSE for both models
print(f"Linear Regression MSE: {mse_lr:.4f}")
print(f"LASSO Regression MSE: {mse_lasso:.4f}")
print(f"Best alpha for LASSO: {lasso_model.alpha_:.4f}")

Linear Regression MSE: 0.5559
LASSO Regression MSE: 0.5556
Best alpha for LASSO: 0.0342
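
One caveat about the fit above: the LASSO penalty acts on coefficient magnitudes, so the result depends on the units of each feature, and the cells above pass unscaled features to LassoCV. A common remedy is to standardize the features inside a pipeline, so that the scaler is fit on the training data only. The sketch below shows that pattern on synthetic data from make_regression (an assumption made here so the example runs without downloading anything). The function name scaled_lasso_mse is hypothetical, and no claim is made about how the numbers above would change.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def scaled_lasso_mse(X_train, y_train, X_test, y_test, seed=42):
    """Fit StandardScaler + LassoCV on the training data; return (test MSE, alpha)."""
    pipe = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=seed))
    pipe.fit(X_train, y_train)
    test_mse = mean_squared_error(y_test, pipe.predict(X_test))
    return test_mse, pipe[-1].alpha_  # pipe[-1] is the fitted LassoCV step


# Synthetic stand-in data (NOT the California Housing data)
X_syn, y_syn = make_regression(n_samples=400, n_features=8, n_informative=5,
                               noise=10.0, random_state=0)
# Rescale the columns to mimic features measured in very different units
X_syn = X_syn * np.array([1, 10, 100, 1000, 1, 10, 100, 1000])

Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42)
syn_mse, syn_alpha = scaled_lasso_mse(Xs_train, ys_train, Xs_test, ys_test)
print(f"Scaled LASSO test MSE (synthetic data): {syn_mse:.4f}")
print(f"Selected alpha (synthetic data): {syn_alpha:.4f}")
```

In the notebook, the same function could be called as scaled_lasso_mse(X_train, y_train, X_test, y_test).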


In [9]:
# Step 7 (optional): Count the non-zero coefficients retained by LASSO
n_nonzero = int((lasso_model.coef_ != 0).sum())
print(f"Number of features used by LASSO: {n_nonzero} out of {X.shape[1]}")

Number of features used by LASSO: 7 out of 8
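
To see which feature LASSO dropped rather than only how many, the zero coefficients can be mapped back to column names. A minimal sketch: the helper name dropped_features is hypothetical, and the names and coefficients in the example are made up for illustration.

```python
def dropped_features(feature_names, coefficients):
    """Return the names whose coefficient is exactly zero."""
    return [name for name, coef in zip(feature_names, coefficients) if coef == 0]


# Made-up names and coefficients, for illustration only
example = dropped_features(["x1", "x2", "x3"], [0.8, 0.0, -1.3])
print(example)  # ['x2']
```

With the fitted model above, the call would be dropped_features(X.columns, lasso_model.coef_).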


In [10]:
# Step 8: Difference in test MSE (a positive value means LASSO has the lower error)
improvement = mse_lr - mse_lasso
print(f"Improvement in MSE by using LASSO: {improvement:.4f}")


Improvement in MSE by using LASSO: 0.0003
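
A difference of 0.0003 is about 0.05% of the LS test MSE, and it comes from a single train-test split, so it is worth checking whether it exceeds sampling noise. One option is a paired bootstrap over test observations, sketched below with NumPy only. The function name paired_mse_diff_ci, the default of 2,000 resamples, and the toy arrays are assumptions for illustration; this is one reasonable check, not the only one.

```python
import numpy as np


def paired_mse_diff_ci(y_true, pred_a, pred_b, n_boot=2000, level=0.95, seed=0):
    """Bootstrap a percentile interval for MSE(model A) - MSE(model B).

    Resamples test observations with replacement, keeping both models'
    errors for the same observation paired together.
    """
    y_true = np.asarray(y_true, dtype=float)
    err_a = (y_true - np.asarray(pred_a, dtype=float)) ** 2
    err_b = (y_true - np.asarray(pred_b, dtype=float)) ** 2
    d = err_a - err_b  # per-observation squared-error difference
    rng = np.random.default_rng(seed)
    n = d.size
    boot_means = np.array(
        [d[rng.integers(0, n, size=n)].mean() for _ in range(n_boot)]
    )
    tail = (1.0 - level) / 2.0
    lo, hi = np.quantile(boot_means, [tail, 1.0 - tail])
    return d.mean(), lo, hi


# Toy check (made-up data): every per-observation difference is 1.0 - 0.25 = 0.75,
# so the point estimate and both interval ends are all 0.75.
y_toy = np.zeros(50)
pred_a_toy = np.ones(50)       # squared error 1.0 everywhere
pred_b_toy = np.full(50, 0.5)  # squared error 0.25 everywhere
diff, lo, hi = paired_mse_diff_ci(y_toy, pred_a_toy, pred_b_toy, n_boot=200)
print(diff, lo, hi)
```

In the notebook this would be called as paired_mse_diff_ci(y_test, y_pred_lr, y_pred_lasso). An interval that includes 0 would indicate that the two models are not distinguishable on this test set.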
