# Comparison of Linear Regression Models

This notebook compares two linear regression models to predict real estate prices:
1. A model that incorporates zip codes as a predictor variable.
2. A model that is specific to a single zip code.

We aim to understand how a single model trained on all areas, with zip code included as a predictor, performs compared with a model trained only on data from one zip code.
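
To make the two designs concrete, here is a minimal sketch on synthetic data. It does not reproduce `get_model_with_zip` or `get_model_for_zip`, whose internals are not shown in this notebook. The feature `sqft`, the second zip code `78701`, the price formula, and the helper `fit_ols` are all assumptions made up for illustration. The sketch also assumes zip code is treated as a categorical variable (one-hot encoded) rather than as a number.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): two zip codes with different base prices.
n = 200
zips = rng.choice([78660, 78701], size=n)
sqft = rng.uniform(800, 3500, size=n)
base = np.where(zips == 78701, 400_000.0, 150_000.0)
price = base + 120.0 * sqft + rng.normal(0, 10_000, size=n)


def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X must already contain an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Approach 1: one model for all rows, with zip code one-hot encoded.
# The first level is dropped so the dummies are not collinear with the intercept.
levels = np.unique(zips)
dummies = (zips[:, None] == levels[1:]).astype(float)
X_pooled = np.column_stack([np.ones(n), sqft, dummies])
coef_pooled = fit_ols(X_pooled, price)

# Approach 2: a separate model fitted only on rows from one zip code.
mask = zips == 78660
X_single = np.column_stack([np.ones(mask.sum()), sqft[mask]])
coef_single = fit_ols(X_single, price[mask])

print("pooled coefficients [intercept, sqft, zip dummy]:", coef_pooled.round(2))
print("single-zip coefficients [intercept, sqft]:", coef_single.round(2))
```

In the pooled design every zip code shares the same slope for `sqft` and differs only by an offset, whereas the single-zip design estimates its own intercept and slope from local data alone. That difference is one plausible reason the two approaches can perform differently.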


In [10]:
import pandas as pd
import numpy as np
from model_training_funcs import get_model_with_zip, get_model_for_zip
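import warnings
# Silence all warnings to keep the notebook output readable. Note that this also
# hides potentially useful messages such as deprecation warnings.
warnings.filterwarnings("ignore")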


In [11]:
# This is handled within the functions, so no need to load data here.
# Just an informational cell.
print("Data loading and cleaning functions are encapsulated within the model functions.")


Data loading and cleaning functions are encapsulated within the model functions.


In [12]:
# Model using zip codes as a predictor
model_with_zip, rmse_with_zip, r_squared_with_zip = get_model_with_zip()
print(f"Model with Zip Codes as Predictors: RMSE = {rmse_with_zip}, R-squared = {r_squared_with_zip}")


Model with Zip Codes as Predictors: RMSE = 377200.6817884638, R-squared = 0.5909159887906463


In [13]:
# Model for a specific zip code (e.g., 78660)
zip_code = 78660
model_for_zip, rmse_for_zip, r_squared_for_zip = get_model_for_zip(zip_code)
print(f"Model for Zip Code {zip_code}: RMSE = {rmse_for_zip}, R-squared = {r_squared_for_zip}")


Model for Zip Code 78660: RMSE = 42708.71002938853, R-squared = 0.9005311593567898


## Model Comparison Results

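The comparison of the two models indicates the following:
- **Model with Zip Codes as Predictors**:
  - RMSE: 377,200.68
  - R-squared: 0.591
- **Model for Specific Zip Code (78660)**:
  - RMSE: 42,708.71
  - R-squared: 0.901

From these results, we can observe that the single-zip model fits its data considerably better: its RMSE is roughly one-ninth that of the model with zip codes as predictors, and it explains about 90% of the variance in price compared with about 59%. The two sets of metrics are not computed on the same properties, however. RMSE is expressed in the units of the target, so if prices vary more widely across all zip codes than within 78660, part of the RMSE gap reflects the difference in price scale rather than model quality alone. R-squared, being scale-free, is the more comparable of the two figures here.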
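
For reference, the sketch below shows the standard definitions of RMSE and R-squared, along with a scale-free variant of RMSE that is easier to compare across data sets with different price levels. It assumes the helper functions in `model_training_funcs` use these standard definitions, which this notebook does not confirm. The function names and the toy values are hypothetical and are not taken from the notebook's data.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def relative_rmse(y_true, y_pred):
    """RMSE divided by the mean of the target (a scale-free error measure)."""
    return rmse(y_true, y_pred) / float(np.mean(np.asarray(y_true, dtype=float)))


# Toy values for illustration only.
y_true = [200_000, 250_000, 300_000, 350_000]
y_pred = [210_000, 240_000, 310_000, 340_000]

print("RMSE:", rmse(y_true, y_pred))
print("R-squared:", r_squared(y_true, y_pred))
print("Relative RMSE:", relative_rmse(y_true, y_pred))
```

Dividing RMSE by the mean price is only one of several possible normalizations; scoring both models on the same set of properties would give a more direct comparison.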


## Conclusion

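This analysis provides insight into how the scope of the training data, either pooled across zip codes or restricted to one, influences the performance of predictive models for real estate prices. In this run, the model for zip code 78660 reached an R-squared of about 0.90 and an RMSE of about 42,700, against about 0.59 and 377,200 for the model with zip codes as predictors, so local specificity was more predictive here. This result covers a single zip code; repeating the comparison for other zip codes, and scoring both models on the same properties, would be needed before generalizing.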
