# HOPUS

HOPUS (**HO**using **P**ricing **U**tilitie**S**) contains a variety of routines used to predict real estate prices.

This notebook highlights what HOPUS can do, namely
- clean the raw data,
- perform exploratory analysis of the data,
- train a variety of models for the prediction of real estate prices, and
- evaluate the performance of these models.

## Technical preliminaries

First, we clone the HOPUS repo to have access to all the data and routines therein.

In [1]:
!git clone https://github.com/aremondtiedrez/hopus.git
%cd hopus

Cloning into 'hopus'...
remote: Enumerating objects: 86, done.
remote: Counting objects: 100% (86/86), done.
remote: Compressing objects: 100% (61/61), done.
remote: Total 86 (delta 32), reused 69 (delta 18), pack-reused 0 (from 0)
Receiving objects: 100% (86/86), 596.60 KiB | 2.79 MiB/s, done.
Resolving deltas: 100% (32/32), done.
/content/hopus


We can now import the requisite modules from HOPUS.

In [2]:
import preprocessing

## Problem: Real estate prices vary over time

Predicting how real estate prices vary over time is a valuable and complicated endeavour that requires accounting for the evolution of macro-economic phenomena such as mortgage rates, housing supply, public policy, and good ol' demographics.

HOPUS is *not* built to do that. Instead, the focus of HOPUS is on predicting the sale price of a home given the *characteristics* of that home. Nonetheless, home prices in the U.S. have grown at a rate of about 5\% a year for the past 30 years, so accounting for *when* a home is bought or sold is clearly crucial to estimating its price accurately.
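To get a feel for how much timing matters, here is a quick back-of-the-envelope computation. It is only a sketch: it takes the approximate 5% annual rate quoted above at face value and compounds it over 30 years. The variable names are illustrative and are not part of HOPUS.

```python
# Approximate figures quoted in the text above (assumptions, not measured values)
annual_growth = 0.05
years = 30

# Compound growth: the price is multiplied by (1 + rate) once per year
growth_factor = (1 + annual_growth) ** years
print(f"Price multiple after {years} years: {growth_factor:.2f}")
```

Under these assumptions, two otherwise identical homes sold 30 years apart would differ in price by a factor of more than four, which is far larger than the effect of most individual home characteristics.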

Thankfully, Standard & Poor's publishes, every month, an index tracking the price of single-family homes in the U.S. We will use this index to, admittedly coarsely, account for the temporal variation in home prices.

In [5]:
# Load the home price index as a pandas DataFrame
hpi = preprocessing.home_price_index.load()

In [6]:
hpi

| | observation_date | CSUSHPINSA |
|---|---|---|
| 0 | 1987-01-01 | 63.732 |
| 1 | 1987-02-01 | 64.131 |
| 2 | 1987-03-01 | 64.467 |
| 3 | 1987-04-01 | 64.972 |
| 4 | 1987-05-01 | 65.546 |
| ... | ... | ... |
| 461 | 2025-06-01 | 331.627 |
| 462 | 2025-07-01 | 330.986 |
| 463 | 2025-08-01 | 329.885 |
| 464 | 2025-09-01 | 328.978 |
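One simple way to use such an index is to rescale a sale price by the ratio of the index level in a reference month to the index level in the sale month. The sketch below illustrates the idea with a handful of values copied from the output above. The helper `adjust_price` and the dictionary `HPI_SAMPLE` are hypothetical names introduced for illustration; this is not necessarily how HOPUS performs the adjustment.

```python
from datetime import date

# Index levels copied from a few of the rows displayed above, keyed by
# observation date. The full index would cover every month in between.
HPI_SAMPLE = {
    date(1987, 1, 1): 63.732,
    date(1987, 5, 1): 65.546,
    date(2025, 6, 1): 331.627,
    date(2025, 9, 1): 328.978,
}


def adjust_price(price, sale_date, reference_date, index=HPI_SAMPLE):
    """Hypothetical helper: express `price` in `reference_date` terms."""
    # The index is monthly, so snap both dates to the first of their month.
    sale_level = index[sale_date.replace(day=1)]
    reference_level = index[reference_date.replace(day=1)]
    # Scale the price by the relative change in the index.
    return price * reference_level / sale_level


# A home sold for $100,000 in January 1987, expressed in September 2025 terms
adjusted = adjust_price(100_000, date(1987, 1, 15), date(2025, 9, 30))
print(f"{adjusted:,.0f}")
```

This adjustment is coarse by design: it applies the same national ratio to every home, regardless of location or property type, and it raises a `KeyError` for any month missing from the lookup.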
