# Housing Data Preprocessing
_Calvin Whealton_

This notebook takes the raw Zillow housing data and converts it into a monthly percentage change in housing value. The Zillow data are available from https://www.zillow.com/research/data/; specifically, this analysis uses the Zillow Home Value Index (ZHVI). The output of this notebook will be incorporated into the feature matrix for each zip code-time interval and into the predictions following floods.

In [None]:
import pandas as pd 
import numpy as np
import os

Read the ZHVI file directly from the Zillow URL.

In [None]:
zillow_data = pd.read_csv('http://files.zillowstatic.com/research/public_v2/zhvi/Zip_zhvi_uc_sfrcondo_tier_0.33_0.67_sm_sa_mon.csv')
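By default `pd.read_csv` infers a numeric type for a column of digits, so any leading zeros in zip codes are dropped on read. The sketch below uses a tiny made-up CSV (the values and the leading zero in `00501` are hypothetical; whether the real Zillow file stores leading zeros is not checked here) to show that passing `dtype={"RegionName": str}` keeps the text exactly as written.

```python
import io

import pandas as pd

# Tiny hypothetical extract mimicking the first columns of the Zillow file;
# the values are made up purely for illustration.
csv_text = (
    "RegionID,SizeRank,RegionName,2000-01-31,2000-02-29\n"
    "1,0,00501,100.0,101.0\n"
    "2,1,10025,200.0,198.0\n"
)

# Default parsing: RegionName is inferred as an integer, so 00501 becomes 501.
default_df = pd.read_csv(io.StringIO(csv_text))

# Declaring RegionName as str keeps exactly what is in the file.
str_df = pd.read_csv(io.StringIO(csv_text), dtype={"RegionName": str})

print(default_df["RegionName"].tolist())  # [501, 10025]
print(str_df["RegionName"].tolist())      # ['00501', '10025']
```

Either way, the zero-padding step later in this notebook still guarantees five-character codes.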

In [None]:
zillow_data.head()

Extract the column names that form the time index (every column from the tenth onward).

In [None]:
cols_time = zillow_data.columns[9:]
cols_time
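Slicing from position 9 assumes the file always has exactly nine non-date columns before the monthly values. A minimal alternative sketch selects the time columns by name instead, assuming (as a hypothesis, not something this notebook checks) that every time column is named like `YYYY-MM-DD`. The column list below is a hypothetical layout used only for illustration.

```python
import pandas as pd

# Hypothetical column layout: nine non-date columns followed by month-end dates.
columns = pd.Index(["RegionID", "SizeRank", "RegionName", "RegionType",
                    "StateName", "State", "City", "Metro", "CountyName",
                    "2000-01-31", "2000-02-29", "2000-03-31"])

# Keep only the columns whose names look like YYYY-MM-DD dates.
is_date = columns.str.match(r"\d{4}-\d{2}-\d{2}$")
cols_time_by_name = columns[is_date]

# Positional slicing, which relies on exactly nine leading non-date columns.
cols_time_by_position = columns[9:]

print(list(cols_time_by_name))  # ['2000-01-31', '2000-02-29', '2000-03-31']
```

The name-based selection keeps working if Zillow adds or removes a metadata column; the positional version would silently include or drop a column.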

Count the missing (null) values in the monthly time series and compare them with the total number of zip code-month cells.

In [None]:
# number of nulls
zillow_data[cols_time].isnull().sum(1).sum()

In [None]:
# number of possible values
zillow_data.shape[0]*len(cols_time)

In [None]:
# number of non-null values
zillow_data[cols_time].notnull().sum().sum()
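A minimal sketch, on a small made-up panel (3 zip codes by 4 months), of how the null, non-null, and total counts relate. Computing all three from the data keeps them consistent if the Zillow file is updated.

```python
import numpy as np
import pandas as pd

# Made-up panel: rows are zip codes, columns are months; NaN marks months
# with no ZHVI value (for example, a series that starts late).
panel = pd.DataFrame(
    {
        "2000-01-31": [100.0, np.nan, 50.0],
        "2000-02-29": [101.0, np.nan, 51.0],
        "2000-03-31": [102.0, 80.0, np.nan],
        "2000-04-30": [103.0, 81.0, 52.0],
    }
)

n_null = int(panel.isnull().sum().sum())       # missing cells
n_non_null = int(panel.notnull().sum().sum())  # observed cells
n_total = panel.size                           # rows x columns

print(n_null, n_non_null, n_total, n_null / n_total)  # 3 9 12 0.25
```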

Compute the monthly percentage increase in the Zillow Home Value Index (ZHVI); negative values indicate a decrease. The formula is:


<div align="center">$$\text{Pct Increase}_i = 100 \times \frac{\text{ZHVI}_i - \text{ZHVI}_{i-1}}{\text{ZHVI}_{i-1}}$$</div>


For example, if the value is 100 in month _i-1_ and 110 in month _i_, the result is 100 × (110 - 100)/100 = 10%.
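The formula can be checked with a small helper. `monthly_pct_change` is a hypothetical name used only for this illustration; it reproduces the 10% example and shows that a decrease comes out negative.

```python
def monthly_pct_change(prev_value: float, curr_value: float) -> float:
    """Percentage change from month i-1 (prev_value) to month i (curr_value)."""
    return 100 * (curr_value - prev_value) / prev_value


print(monthly_pct_change(100, 110))            # 10.0
print(monthly_pct_change(110, 99))             # -10.0
print(round(monthly_pct_change(110, 100), 2))  # -9.09
```

The last line illustrates that the measure is relative to the previous month: rising from 100 to 110 is +10%, but falling back from 110 to 100 is only about -9.09%, because the denominator changes.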

Create a new DataFrame to hold the month-over-month percentage change.

In [None]:
zillow_mon_pct_val = pd.DataFrame()

In [None]:
# zero-pad zip codes to five-character strings (e.g., 501 -> '00501')
zillow_mon_pct_val['GEOID10_str'] = zillow_data['RegionName'].apply(lambda x: '{0:0>5}'.format(x))
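The format spec `'{0:0>5}'` right-aligns the value in a five-character field and fills the left with `'0'`. The sketch below, on made-up integer zip codes, compares it with the vectorized `Series.str.zfill`. Both assume `RegionName` is integer-typed; if it had been parsed as float (for example because of missing values), both would produce strings such as `'501.0'` instead.

```python
import pandas as pd

# Made-up zip codes as they look after being parsed as integers.
region_name = pd.Series([501, 2134, 10025])

# Format-spec approach: right-align in width 5, padding with '0'.
padded_format = region_name.apply(lambda x: "{0:0>5}".format(x))

# Equivalent vectorized string approach.
padded_zfill = region_name.astype(str).str.zfill(5)

print(padded_format.tolist())  # ['00501', '02134', '10025']
print(padded_zfill.tolist())   # ['00501', '02134', '10025']
```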

In [None]:
# loop over the time columns:
# the first iteration compares the second month with the first month, and
# i stops one short of the end because the first month has no predecessor
for i in range(len(cols_time)-1):
    zillow_mon_pct_val[cols_time[i+1]] = 100*(zillow_data[cols_time[i+1]]-zillow_data[cols_time[i]])/(zillow_data[cols_time[i]])
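The same month-over-month calculation can be written without a loop using `DataFrame.diff(axis=1)` (value in month i minus month i-1) and `DataFrame.shift(axis=1)` (value in month i-1). This is a sketch on made-up values, not the notebook's own code; it checks that the two approaches agree, including where values are missing. `DataFrame.pct_change` is deliberately not used here because, depending on the pandas version, its default `fill_method` forward-fills missing values, which the loop does not do.

```python
import numpy as np
import pandas as pd

# Hypothetical values: rows are zip codes, columns are months.
values = pd.DataFrame(
    {
        "2000-01-31": [100.0, np.nan, 200.0],
        "2000-02-29": [110.0, 80.0, 198.0],
        "2000-03-31": [99.0, 84.0, np.nan],
    }
)
cols_time = values.columns

# Loop version, mirroring the notebook cell.
loop_result = pd.DataFrame(index=values.index)
for i in range(len(cols_time) - 1):
    prev, curr = values[cols_time[i]], values[cols_time[i + 1]]
    loop_result[cols_time[i + 1]] = 100 * (curr - prev) / prev

# Vectorized version: diff and shift are both aligned on month i.
vectorized = 100 * values.diff(axis=1) / values.shift(axis=1)
vectorized = vectorized[cols_time[1:]]  # first month has no predecessor

print(vectorized.round(2))
```

A missing value in either month yields NaN in both versions, so gaps in the ZHVI series are not filled in.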

In [None]:
zillow_mon_pct_val.head()

Save the processed data.

In [None]:
# output directory on the author's machine; adjust for other environments
out_dir = '/Users/calvinwhealton/Documents/GitHub/floods_housing_zipcode/data/processed_data'
zillow_mon_pct_val.to_csv(os.path.join(out_dir, 'zillow_mon_pct_val.csv'))
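When the saved file is read back, `pd.read_csv` will again parse the zero-padded `GEOID10_str` column as integers unless told otherwise. A minimal sketch of the round trip, using an in-memory buffer and a hypothetical two-row frame in place of the real output file:

```python
import io

import pandas as pd

# Hypothetical processed frame with a zero-padded zip code column.
processed = pd.DataFrame(
    {"GEOID10_str": ["00501", "10025"], "2000-02-29": [1.0, -0.99]}
)

# Write with the index included, as to_csv does by default.
buffer = io.StringIO()
processed.to_csv(buffer)

# Default re-read: zip codes become integers and lose their leading zeros.
buffer.seek(0)
naive = pd.read_csv(buffer, index_col=0)

# Declaring the column as str keeps the padding.
buffer.seek(0)
safe = pd.read_csv(buffer, index_col=0, dtype={"GEOID10_str": str})

print(naive["GEOID10_str"].tolist())  # [501, 10025]
print(safe["GEOID10_str"].tolist())   # ['00501', '10025']
```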