# Predicting Housing Prices with Regularized Regression

# 1. Data Preparation:

a. Load the dataset using pandas.
b. Explore and clean the data. Handle missing values and outliers.
c. Split the dataset into training and testing sets.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the dataset
df = pd.read_csv('Housing.csv')

# Define the features (X) and target variable (y)
X = df[['area', 'bedrooms', 'bathrooms', 'stories']]
y = df['price']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
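Step b (exploring and cleaning) is not shown in the cell above. A minimal sketch of one way to do it follows. The `toy` frame and its values are made up for illustration and stand in for `Housing.csv`; median imputation and the 1.5 × IQR rule are assumptions about what "handle missing values and outliers" could mean here, not the notebook's own method.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for Housing.csv; all values are made up.
toy = pd.DataFrame({
    "area": [3000, 4500, np.nan, 5200, 60000, 3800],
    "bedrooms": [2, 3, 3, np.nan, 4, 3],
    "price": [3.1e6, 4.4e6, 4.0e6, 5.0e6, 5.5e6, 3.9e6],
})

# Count missing values per column before deciding how to handle them.
missing_counts = toy.isna().sum()
print(missing_counts)

# Fill numeric gaps with the column median (one simple option among several).
cleaned = toy.fillna(toy.median(numeric_only=True))

# Flag outliers in 'area' with the 1.5 * IQR rule and drop those rows.
q1, q3 = cleaned["area"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = cleaned["area"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
cleaned = cleaned[in_range].reset_index(drop=True)
print(cleaned)
```

On real data, cleaning would normally happen before `train_test_split`, and any imputation statistics would ideally be computed on the training split only.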


# 2. Implement Lasso Regression:

a. Choose a set of features (independent variables, X) and house prices as the dependent variable (y).
b. Implement Lasso regression using scikit-learn to predict house prices based on the selected features.
c. Discuss the impact of L1 regularization on feature selection and coefficients.

In [2]:
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Create a Lasso regression model
lasso = Lasso(alpha=0.01)  # You can adjust the alpha (penalty parameter) for regularization

# Fit the model to the training data
lasso.fit(X_train, y_train)

# Make predictions on the test data
y_pred = lasso.predict(X_test)
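For task c, the effect of the L1 penalty is easiest to see on data where the true coefficients are known. The sketch below uses synthetic data (not the housing dataset) in which only the first two of four features matter; the names `X_demo`, `y_demo` and `coef_by_alpha` are illustrative.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic data: four roughly unit-variance features, only the first two matter.
X_demo = rng.normal(size=(200, 4))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=200)

# Fit Lasso at increasing penalty strengths and record the coefficients.
coef_by_alpha = {}
for alpha in (0.01, 0.5, 5.0):
    coef_by_alpha[alpha] = Lasso(alpha=alpha).fit(X_demo, y_demo).coef_
    n_zero = int(np.sum(coef_by_alpha[alpha] == 0))
    print(f"alpha={alpha}: coef={np.round(coef_by_alpha[alpha], 3)}, zeros={n_zero}")
```

As `alpha` grows, the coefficients shrink and the irrelevant ones are set exactly to zero, which is why Lasso doubles as a feature selector. On the housing data, the target is in the millions and the features are unscaled, so `alpha=0.01` is a very weak penalty and the fit is likely close to ordinary least squares; inspecting `lasso.coef_` would confirm this.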


In [5]:
# Assuming you have a trained Lasso model (lasso) and new data in a DataFrame (new_data)
new_data = pd.DataFrame({
    'area': [6000],
    'bedrooms': [4],
    'bathrooms': [1],
    'stories': [2],
    # Column names and order must match the features the model was trained on
})

# Make predictions for new data
predicted_prices_lasso = lasso.predict(new_data)

print("Predicted House Prices (Lasso Model):")
for i, price in enumerate(predicted_prices_lasso):
    print(f"Prediction {i+1}: ${price:.2f}")


Predicted House Prices (Lasso Model):
Prediction 1: $4954326.83


# 3. Evaluate the Lasso Regression Model:

a. Calculate the Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) for the Lasso regression model.
b. Discuss how the Lasso model helps prevent overfitting and reduces the impact of irrelevant features.

In [6]:
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
# RMSE is the square root of MSE; this avoids the `squared` argument,
# which newer scikit-learn releases no longer accept
rmse = mse ** 0.5

print("Mean Absolute Error:", mae)
print("Mean Squared Error:", mse)
print("Root Mean Squared Error:", rmse)


Mean Absolute Error: 1158970.480386592
Mean Squared Error: 2457741643358.91
Root Mean Squared Error: 1567718.6110265164
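To make the three metrics concrete, the following sketch computes them by hand on four made-up values and checks the results against scikit-learn. The arrays `y_true_demo` and `y_hat_demo` are illustrative and unrelated to the housing data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up true values and predictions.
y_true_demo = np.array([100.0, 200.0, 300.0, 400.0])
y_hat_demo = np.array([110.0, 190.0, 330.0, 360.0])

errors = y_true_demo - y_hat_demo      # [-10, 10, -30, 40]
mae_manual = np.mean(np.abs(errors))   # (10 + 10 + 30 + 40) / 4 = 22.5
mse_manual = np.mean(errors ** 2)      # (100 + 100 + 900 + 1600) / 4 = 675.0
rmse_manual = np.sqrt(mse_manual)      # sqrt(675), about 25.98

# The hand-computed values agree with scikit-learn's implementations.
print(mae_manual, mean_absolute_error(y_true_demo, y_hat_demo))
print(mse_manual, mean_squared_error(y_true_demo, y_hat_demo))
print(rmse_manual)
```

MAE and RMSE are in the same units as the target (here, price), while MSE is in squared units; RMSE weights large errors more heavily than MAE does.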


# 4. Implement Ridge Regression:

a. Select the same set of features as independent variables (X) and house prices as the dependent variable (y).
b. Implement Ridge regression using scikit-learn to predict house prices based on the selected features.
c. Explain how L2 regularization in Ridge regression differs from L1 regularization in Lasso.
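As a reference point for task c, the two objectives as scikit-learn defines them can be written side by side. Here $w$ is the coefficient vector, $n$ the number of training samples, and $\alpha$ the `alpha` argument.

```latex
\begin{aligned}
\text{Lasso (L1):}\quad & \min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_1,
  \qquad \lVert w \rVert_1 = \sum_j |w_j| \\
\text{Ridge (L2):}\quad & \min_{w}\; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2,
  \qquad \lVert w \rVert_2^2 = \sum_j w_j^2
\end{aligned}
```

The absolute-value penalty can drive coefficients exactly to zero, whereas the squared penalty only shrinks them toward zero. Because the loss terms are scaled differently (Lasso divides by $2n$, Ridge does not), the same numeric `alpha` does not mean the same penalty strength in the two models.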

In [8]:
from sklearn.linear_model import Ridge

# Create a Ridge regression model
ridge = Ridge(alpha=1.0)  # You can adjust the alpha (penalty parameter) for regularization

# Fit the model to the training data
ridge.fit(X_train, y_train)

# Make predictions on the test data
y_pred_ridge = ridge.predict(X_test)



In [9]:
# Assuming you have a trained Ridge model (ridge) and new data in a DataFrame (new_data)
new_data = pd.DataFrame({
    'area': [6000],
    'bedrooms': [4],
    'bathrooms': [1],
    'stories': [2],
    # Column names and order must match the features the model was trained on
})

# Make predictions for new data
predicted_prices = ridge.predict(new_data)

print("Predicted House Prices:")
for i, price in enumerate(predicted_prices):
    print(f"Prediction {i+1}: ${price:.2f}")

Predicted House Prices:
Prediction 1: $4961460.04


# 5. Evaluate the Ridge Regression Model:

a. Calculate the MAE, MSE, and RMSE for the Ridge regression model.
b. Discuss the benefits of Ridge regression in handling multicollinearity among features and its impact on the model's coefficients.

In [10]:
# Calculate evaluation metrics
mae_ridge = mean_absolute_error(y_test, y_pred_ridge)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
# RMSE is the square root of MSE; this avoids the `squared` argument,
# which newer scikit-learn releases no longer accept
rmse_ridge = mse_ridge ** 0.5

print("Ridge Regression Metrics:")
print("Mean Absolute Error:", mae_ridge)
print("Mean Squared Error:", mse_ridge)
print("Root Mean Squared Error:", rmse_ridge)


Ridge Regression Metrics:
Mean Absolute Error: 1158471.4534767317
Mean Squared Error: 2456765538413.524
Root Mean Squared Error: 1567407.2662883517
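For task b, a small synthetic experiment can illustrate what Ridge does under multicollinearity. In the sketch below, `x2` is almost a copy of `x1`; the data and the names `X_col`, `y_col`, `ols_coef` and `ridge_coef` are made up for illustration and are not derived from the housing dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(1)
n = 100

# Two nearly identical features: x2 is x1 plus a tiny amount of noise.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X_col = np.column_stack([x1, x2])

# The true relationship gives both features the same weight.
y_col = 2.0 * x1 + 2.0 * x2 + rng.normal(scale=0.5, size=n)

ols_coef = LinearRegression().fit(X_col, y_col).coef_
ridge_coef = Ridge(alpha=1.0).fit(X_col, y_col).coef_

print("OLS coefficients:  ", np.round(ols_coef, 3))
print("Ridge coefficients:", np.round(ridge_coef, 3))
```

Ordinary least squares cannot tell the two features apart, so its individual coefficients are poorly determined and can swing far from the true values even though their sum is well estimated. The L2 penalty favors the smallest coefficient vector that fits the data, which splits the weight roughly evenly between the correlated features and stabilizes the estimates.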


# 6. Model Comparison:

a. Compare the results of the Lasso and Ridge regression models.
b. Discuss when it is preferable to use Lasso, Ridge, or plain linear regression.
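In the results reported above, the two models are almost indistinguishable: the Ridge MAE and RMSE are lower than the Lasso values by well under 0.1%. A like-for-like comparison that also includes plain linear regression could be sketched as follows. The helper `compare_models` is hypothetical, and the data comes from `make_regression` rather than the housing file, so the printed numbers say nothing about house prices.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split


def compare_models(models, X_tr, X_te, y_tr, y_te):
    """Fit each model on the training split and report test-set MAE and RMSE."""
    rows = []
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        rows.append({
            "model": name,
            "MAE": mean_absolute_error(y_te, pred),
            "RMSE": np.sqrt(mean_squared_error(y_te, pred)),
        })
    return pd.DataFrame(rows).set_index("model")


# Synthetic stand-in data with four features.
X_syn, y_syn = make_regression(n_samples=300, n_features=4, noise=10.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=42)

comparison = compare_models(
    {"Linear": LinearRegression(), "Lasso": Lasso(alpha=0.01), "Ridge": Ridge(alpha=1.0)},
    X_tr, X_te, y_tr, y_te,
)
print(comparison)
```

With the notebook's own variables, the same helper could be called as `compare_models(models, X_train, X_test, y_train, y_test)`.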

# 7. Hyperparameter Tuning:

a. Explore hyperparameter tuning for Lasso and Ridge, such as the strength of regularization, and discuss how different hyperparameters affect the models.
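One common way to tune the regularization strength is cross-validation over a grid of `alpha` values. The sketch below uses scikit-learn's `LassoCV` and `RidgeCV` on synthetic data; the grid, the 5-fold setting and the data are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV

# Synthetic stand-in data with four features.
X_syn, y_syn = make_regression(n_samples=300, n_features=4, noise=10.0, random_state=0)

# Candidate penalty strengths on a log scale, from 0.001 to 1000.
alphas = np.logspace(-3, 3, 13)

# Each estimator picks the alpha with the best 5-fold cross-validated score.
lasso_cv = LassoCV(alphas=alphas, cv=5, max_iter=10000).fit(X_syn, y_syn)
ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X_syn, y_syn)

print("Best Lasso alpha:", lasso_cv.alpha_)
print("Best Ridge alpha:", ridge_cv.alpha_)
```

A very small `alpha` leaves the model close to ordinary least squares, while a very large one shrinks all coefficients heavily (to zero for Lasso) and underfits. A sensible grid depends on the scale of the features and target, so a range that works on this synthetic data may need widening for prices in the millions.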

# 8. Model Improvement:

a. Investigate any feature engineering or data preprocessing techniques that can enhance the performance of the regularized regression models.
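Two preprocessing steps that often help regularized linear models are standardizing the features, so the penalty treats them on a comparable scale, and modeling the log of a skewed target such as price. The following is a sketch on made-up data; the multiplicative price formula and the names `X_toy`, `price_toy` and `model` are assumptions for illustration only.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Made-up features on very different scales, loosely resembling area and bedrooms.
area = rng.uniform(1500, 10000, size=200)
bedrooms = rng.integers(1, 6, size=200)
X_toy = np.column_stack([area, bedrooms])

# Made-up multiplicative price relationship, which yields a right-skewed target.
price_toy = 1e6 * np.exp(0.0001 * area + 0.1 * bedrooms + rng.normal(scale=0.1, size=200))

# Scale the features inside a pipeline and fit Ridge on log(1 + price);
# predictions are mapped back to the original price scale automatically.
model = TransformedTargetRegressor(
    regressor=make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    func=np.log1p,
    inverse_func=np.expm1,
)
model.fit(X_toy, price_toy)

print(model.predict(X_toy[:3]))
print("R^2 on the toy data:", model.score(X_toy, price_toy))
```

Keeping the scaler inside the pipeline means it is fitted on the training data only, which avoids leaking test-set statistics. Whether a log transform helps on the housing data is an empirical question that the evaluation metrics can answer.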

# 9. Conclusion:

a. Summarize the findings and provide insights into how Lasso and Ridge regression can be valuable tools for estimating house prices and handling complex datasets.

# Diagnosing and Remedying Heteroscedasticity and Multicollinearity