## Housing Market Analysis
Last Edited: 2021/10/10, amacias

This notebook covers:

* downloading a public dataset
* preparing the data as pandas DataFrames

## Datasets 

### FHFA Datasets (index only, no hard sales data)

* <a href="https://www.fhfa.gov/DataTools/Downloads" target="_blank">FHFA Datasets</a>

* <a href="https://www.fhfa.gov/DataTools/Downloads/Pages/House-Price-Index.aspx" target="_blank">House-Price-Index-Datasets</a>

* <a href="https://www.fhfa.gov/DataTools/Downloads/Documents/HPI/HPI_PO_monthly_hist.xls" target="_blank">Direct Download: Monthly-House-Price-Index-Datasets</a>

* <a href="https://www.fhfa.gov/DataTools/Downloads/Pages/Public-Use-Databases.aspx" target="_blank">Public-Use-Datasets, incl Multi-Family</a>

* <a href="https://www.fhfa.gov/DataTools/Downloads/Documents/Enterprise-PUDB/Multi-Family_National_File_/2020_MFNationalFile2020.zip" target="_blank">Direct Download: Public-Use-Datasets, incl Multi-Family</a>


### Other Datasets

* <a href="https://www.huduser.gov/portal/pdrdatas_landing.html" target="_blank">HUD Datasets</a>

* <a href="https://www.census.gov/construction/chars/microdata.html" target="_blank">US Census Survey of Construction (SOC) Micro Datasets</a>





In [6]:
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

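# Scikit-Learn ≥0.20 is required
import re
import sklearn
# compare (major, minor) as integers; as strings, "0.9" would sort above "0.20"
assert tuple(map(int, re.match(r"(\d+)\.(\d+)", sklearn.__version__).groups())) >= (0, 20)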

# Common imports
import numpy as np
import pandas as pd

### Data Collection

In [15]:
import os
import datetime
#import tarfile
import urllib.request

# the date format used for the directory name sets how often the data is refreshed
today = datetime.date.today()
strdir = today.strftime("%Y%m")  # monthly
#strdir = today.strftime("%Y%m%d")  # daily

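# File IO
PROJECT_ROOT_DIR = "."
# NOTE: the active URL is an Excel workbook, while the output shown under
# "Data Analysis" appears to come from the CSV file; the loader handles both.
#HOUSING_URL = "https://www.fhfa.gov/HPI_master.csv"
HOUSING_URL = "https://www.fhfa.gov/DataTools/Downloads/Documents/HPI/HPI_PO_monthly_hist.xls"
HOUSING_PATH = os.path.join(PROJECT_ROOT_DIR, strdir, 'hpimaster')

def housing_file_path(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    # keep the extension of the source file (.csv or .xls) in the local file name
    ext = os.path.splitext(housing_url)[1] or ".csv"
    return os.path.join(housing_path, "housing" + ext)

# only download the data if the file for the current period doesn't exist yet
def fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):
    file_path = housing_file_path(housing_url, housing_path)
    if not os.path.isfile(file_path):
        print('downloading to: ', file_path)
        os.makedirs(housing_path, exist_ok=True)
        urllib.request.urlretrieve(housing_url, file_path)

def load_housing_data(housing_path=HOUSING_PATH, housing_url=HOUSING_URL):
    file_path = housing_file_path(housing_url, housing_path)
    if file_path.lower().endswith((".xls", ".xlsx")):
        # reading .xls needs the xlrd package; the sheet layout is not verified here
        return pd.read_excel(file_path)
    return pd.read_csv(file_path)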
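
The refresh behaviour above depends entirely on the directory name: a new name triggers a new download. A minimal sketch of that idea, using a hypothetical helper `period_dir` (not part of the notebook) and two fixed dates chosen only for illustration:

```python
import datetime
import os

def period_dir(day, fmt, root=".", name="hpimaster"):
    """Build the download directory for a given date (hypothetical helper)."""
    return os.path.join(root, day.strftime(fmt), name)

d1 = datetime.date(2021, 10, 10)
d2 = datetime.date(2021, 10, 11)

# Monthly format: both days map to the same directory, so nothing is re-downloaded.
monthly_same = period_dir(d1, "%Y%m") == period_dir(d2, "%Y%m")
# Daily format: the directory changes each day, so the data is fetched again.
daily_same = period_dir(d1, "%Y%m%d") == period_dir(d2, "%Y%m%d")

print(period_dir(d1, "%Y%m"))
print(monthly_same, daily_same)
```

With the monthly format the two consecutive days share a directory; with the daily format they do not, which is what drives the refresh frequency.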


In [16]:
HOUSING_URL = "https://www.fhfa.gov/DataTools/Downloads/Documents/HPI/HPI_PO_monthly_hist.xls"
HOUSING_PATH = os.path.join(PROJECT_ROOT_DIR, strdir, 'hpimaster')

fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH)
housing = load_housing_data()
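
The two candidate URLs above point at different file types (a CSV file and an Excel workbook), so it may help to check what actually landed on disk before choosing a reader. The sketch below is one possible approach, not part of the notebook: `sniff_format` is a hypothetical helper that inspects the first bytes of a file, and the demo files are synthetic stand-ins rather than real FHFA downloads.

```python
import os
import tempfile

XLS_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 container used by legacy .xls files
ZIP_MAGIC = b"PK\x03\x04"                      # zip container used by .xlsx files

def sniff_format(path):
    """Guess whether a file is an Excel workbook or plain text (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(XLS_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    return "text"

# Self-contained demo with two small temporary files.
with tempfile.TemporaryDirectory() as tmp:
    csv_file = os.path.join(tmp, "housing.csv")
    with open(csv_file, "w") as f:
        f.write("yr,period,index_nsa\n2021,2,190.68\n")

    # A file with a .csv name but Excel content, as happens when an .xls URL
    # is saved under a .csv file name.
    mislabelled = os.path.join(tmp, "mislabelled.csv")
    with open(mislabelled, "wb") as f:
        f.write(XLS_MAGIC + b"\x00" * 16)

    csv_kind = sniff_format(csv_file)
    xls_kind = sniff_format(mislabelled)

print(csv_kind, xls_kind)
```

A result of `"xls"` or `"xlsx"` would suggest `pd.read_excel` rather than `pd.read_csv`, whatever the file is called.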


### Data Analysis

In [17]:
housing.tail()

#housing.info()

| | hpi_type | hpi_flavor | frequency | level | place_name | place_id | yr | period | index_nsa | index_sa |
|---|---|---|---|---|---|---|---|---|---|---|
| 116433 | developmental | purchase-only | quarterly | Puerto Rico | Puerto Rico | PR | 2020 | 2 | 164.88 | 163.6 |
| 116434 | developmental | purchase-only | quarterly | Puerto Rico | Puerto Rico | PR | 2020 | 3 | 166.57 | 168.33 |
| 116435 | developmental | purchase-only | quarterly | Puerto Rico | Puerto Rico | PR | 2020 | 4 | 174.85 | 171.3 |
| 116436 | developmental | purchase-only | quarterly | Puerto Rico | Puerto Rico | PR | 2021 | 1 | 182.45 | 187.17 |
| 116437 | developmental | purchase-only | quarterly | Puerto Rico | Puerto Rico | PR | 2021 | 2 | 190.68 | 188.83 |
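
As one possible next step, the quarterly index values shown above can be turned into a year-over-year change. The sketch below retypes those five Puerto Rico rows by hand so it runs on its own. The `yoy_pct` column name is an assumption for illustration, and the 4-row shift assumes sorted rows with no missing quarters.

```python
import pandas as pd

# The five rows shown above, retyped by hand for illustration.
sample = pd.DataFrame({
    "place_id": ["PR"] * 5,
    "yr": [2020, 2020, 2020, 2021, 2021],
    "period": [2, 3, 4, 1, 2],
    "index_nsa": [164.88, 166.57, 174.85, 182.45, 190.68],
})

# Quarterly data: the same quarter one year earlier sits 4 rows back.
sample = sample.sort_values(["yr", "period"]).reset_index(drop=True)
sample["yoy_pct"] = sample["index_nsa"].pct_change(periods=4) * 100

latest = sample.iloc[-1]
print(f"{int(latest['yr'])}Q{int(latest['period'])}: "
      f"{float(latest['yoy_pct']):.2f}% year over year")
```

Only the last row has a value here, since the earlier rows have no observation four quarters back. For 2021 Q2 the non-seasonally-adjusted index is about 15.65% above its 2020 Q2 level.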
