In [None]:
The sudden rush of people to formerly-sparse city centers has created some undesirable effects such as increased housing prices, and raised concerns over gentrification in historically working-class neighborhoods. Currently the housing market is tight in our hottest cities, leading to a housing crisis.
https://www.tennessean.com/story/opinion/columnists/david-plazas/2018/07/13/affordable-housing-nashville-urban-crisis-grows-more-severe/777899002/

Importantly, gentrification complaints are often overstated. Research has found little evidence of displacement caused by gentrification in America, which could mean that urban house prices are, in general, not rising very fast.

https://www.planetizen.com/blogs/105866-gentrification-mania

In [None]:
#Plotting median PPS (price per square foot) per month
df.groupby(df['Sale Date'].dt.to_period('M'))['PPS'].median().plot(figsize=(9,9))
plt.ylabel('Median PPS', fontsize=15)
plt.xlabel('Sale Date', fontsize=15)
plt.title('Median PPS Over Time', fontsize=15)
#Look for luxury development completed in Jan 2018

In [None]:
#Plotting median sale price to compare to PPS
df.groupby(df['Sale Date'].dt.to_period('M'))['Sale Price'].median().plot(figsize=(8,8))
plt.xlabel('Sale Date', fontsize=15)
plt.ylabel('Median Sale Price (thousands)', fontsize=15)
plt.title('Median Sale Price Over Time', fontsize=15)

In [1]:
#Plotting Sale Count to see if there's any correlation
#Note that sale price correlates with the sale volume (exhibits similar seasonality)
df.groupby(df['Sale Date'].dt.to_period('M'))['Sale Price'].count().plot(fontsize=12, figsize=(12,8))
plt.ylabel('Sales Count', fontsize=15)
plt.xlabel('Sale Date', fontsize=15)
plt.title('Sale Count by Month', fontsize=15)
#TODO: check against another city or a housing index (MLS and Redfin publish data) to see if the pattern is normal

NameError: name 'df' is not defined

## Median sale price of a house in Nashville has increased over 40% in 5 years (according to Davidson County data).

Meanwhile the Consumer Price Index shows an inflation rate of [8.3%](https://www.usinflationcalculator.com/) over the same time period.

Nationally, the median sale price of a house has grown at very close to the inflation rate. Nashville's prices are rising much faster.

![Source](https://fred.stlouisfed.org/graph/fredgraph.png?g=oNxC)
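To put the two figures on the same footing, here is a minimal sketch that deflates the nominal price increase by the CPI figure and annualizes both. It assumes exactly 40% nominal growth (the text says "over 40%", so this is a lower bound) and the 8.3% inflation figure quoted above; the function names are illustrative only.

```python
def real_growth(nominal_growth: float, inflation: float) -> float:
    """Inflation-adjusted growth over the same period, as a fraction."""
    return (1 + nominal_growth) / (1 + inflation) - 1


def annualized(total_growth: float, years: float) -> float:
    """Constant yearly rate that compounds to total_growth over `years`."""
    return (1 + total_growth) ** (1 / years) - 1


nominal = 0.40     # assumption: "over 40%" in the text, so a lower bound
inflation = 0.083  # CPI figure quoted above
years = 5

real = real_growth(nominal, inflation)
print(f"Real growth over {years} years: {real:.1%}")
print(f"Annualized nominal growth: {annualized(nominal, years):.2%}")
print(f"Annualized real growth: {annualized(real, years):.2%}")
```

Under these assumptions the inflation-adjusted increase is still roughly 29% over the five years, so inflation explains only a small part of the rise.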

### Do I think this means Nashville's market is absolutely on fire right now? No.

### My theory is: Nashville had almost no high-end multifamily units (it was a southern, suburban-style city), and now people are building luxury condo towers and luxury mid-rises, filling a giant gap in the market. Additionally, construction of these luxury multifamily units raises the price of real estate nearby.
### But ultimately, I think a lot of this rise in median price can be attributed to the increasing popularity of renting among Americans. More Americans are renting now than at any point in the last 50 years.
https://www.pewresearch.org/fact-tank/2017/07/19/more-u-s-households-are-renting-than-at-any-point-in-50-years/
### Thus, homeowning is becoming a more exclusive activity.
### I'd like to make a time series graph showing which segments of the housing market grew. Is the sale price rising because all house prices are rising, or is it just because of an influx of high-end units, or some combination thereof?
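One possible way to approach that graph is to track several price quantiles per month instead of only the median. The sketch below uses synthetic data standing in for `df`; the helper `price_quantiles_by_month` and the choice of the 10th, 50th and 90th percentiles are assumptions for illustration, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the notebook's `df`; only the two columns used here.
rng = np.random.default_rng(0)
n = 2000
offsets = pd.to_timedelta(rng.integers(0, 5 * 365, n), unit='D')
sales = pd.DataFrame({
    'Sale Date': pd.Timestamp('2014-01-01') + offsets,
    'Sale Price': rng.lognormal(mean=12.3, sigma=0.5, size=n),
})


def price_quantiles_by_month(frame, quantiles=(0.1, 0.5, 0.9)):
    """Monthly sale-price quantiles, one column per quantile."""
    by_month = frame.groupby(frame['Sale Date'].dt.to_period('M'))['Sale Price']
    return by_month.quantile(list(quantiles)).unstack()


segments = price_quantiles_by_month(sales)
# Index each segment to its first month so growth rates are comparable.
indexed = segments / segments.iloc[0]
print(indexed.tail())
```

On the real data, `price_quantiles_by_month(df).plot()` would draw one line per segment. If the top decile pulls away while the bottom decile and median stay flat, that would point to an influx of high-end units; if all three rise together, prices are rising across the board. Note that this tracks the mix of homes sold each month, not the price of the same home over time.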

### The monthly sale count shows heavy seasonality, but appears roughly on track with the general trend of housing sales in the US.

![Source](https://fred.stlouisfed.org/graph/fredgraph.png?g=oNxy)
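The seasonality can be quantified with a simple seasonal index: the average sale count for each calendar month divided by the overall monthly average. This is a rough sketch on synthetic dates (deliberately weighted toward summer), not the notebook's data; `seasonal_index` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Synthetic sale dates standing in for df['Sale Date'], weighted toward summer.
rng = np.random.default_rng(1)
days = pd.date_range('2014-01-01', '2018-12-31', freq='D')
day_of_year = days.dayofyear.to_numpy()
weights = 1 + 0.5 * np.sin(2 * np.pi * (day_of_year - 80) / 365)  # peaks in late June
picks = rng.choice(len(days), size=5000, p=weights / weights.sum())
sale_dates = pd.Series(days[picks])


def seasonal_index(dates):
    """Mean sale count per calendar month, relative to the overall monthly mean."""
    monthly = dates.groupby(dates.dt.to_period('M')).size()
    by_calendar_month = monthly.groupby(monthly.index.month).mean()
    return by_calendar_month / monthly.mean()


index = seasonal_index(sale_dates)
print(index.round(2))
```

Values above 1 mark months that are busier than average. Running `seasonal_index(df['Sale Date'])` on the Nashville data and comparing the result with the same index computed from a national series would be one way to check whether the local pattern is typical.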

https://tradingeconomics.com/united-states/housing-starts

The St. Louis Federal Reserve also has a great data interface for housing starts; their data comes from the U.S. Census Bureau and the U.S. Department of Housing and Urban Development.

https://fred.stlouisfed.org/series/HOUST

In general, housing starts have cooled off in 2019 (probably due to tariffs on lumber, steel and aluminum, and crackdowns on undocumented labor). But a strain of "NIMBYism" persists among Americans of all stripes, and it seems particularly hostile to mid-rise development. We could verify this by checking building permit data to see how long permits take to be approved, or whether they are denied outright. One recent example is a building moratorium in the DC area (which is currently experiencing rapid increases in house prices). This antipathy towards "urbanist" development could be stronger in Sunbelt cities because of their traditionally sprawling layout.
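As a sketch of what the permit check might look like, the example below computes the approval lag and the denial rate per permit type. The records, column names (`Permit Type`, `Date Applied`, `Date Issued`, `Status`) and values are all invented for illustration; a real permit dataset would need its own column mapping.

```python
import pandas as pd

# Hypothetical permit records; column names and values are invented.
permits = pd.DataFrame({
    'Permit Type': ['Single-family', 'Mid-rise', 'Mid-rise', 'Single-family', 'Mid-rise'],
    'Date Applied': pd.to_datetime(
        ['2018-01-10', '2018-02-01', '2018-03-15', '2018-04-02', '2018-05-20']),
    'Date Issued': pd.to_datetime(
        ['2018-02-09', '2018-08-30', None, '2018-05-02', '2018-12-16']),
    'Status': ['Issued', 'Issued', 'Denied', 'Issued', 'Issued'],
})

# Days between application and issuance; denied permits have no issue date (NaN).
permits['Days to Approve'] = (permits['Date Issued'] - permits['Date Applied']).dt.days

summary = permits.groupby('Permit Type').agg(
    median_days=('Days to Approve', 'median'),
    denial_rate=('Status', lambda s: (s == 'Denied').mean()),
)
print(summary)
```

A consistently longer median lag or higher denial rate for mid-rise permits than for single-family permits would be consistent with the NIMBYism hypothesis, though it would not prove it, since larger projects may simply take longer to review.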

https://wamu.org/story/19/04/16/despite-housing-crunch-montgomery-county-expected-to-freeze-new-development/

But don't get the impression the anti-development attitude is limited to the Southeast. California, also in a housing crisis, recently shot down one of the most celebrated upzoning measures in decades.

https://sf.curbed.com/2019/5/10/18563360/senate-bill-50-chart-sb50-explainer-housing-transit

If the supply of new housing is heavily constricted, prices will probably keep rising.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd 
%matplotlib inline
sns.set(style='darkgrid')
import matplotlib.dates as mdates

In [2]:
df1=pd.read_csv('nashville_20190827200234.csv', parse_dates = ['Most Recent Sale Date', 'Sale Date'], dtype={'Zone': str, 'Neighborhood': str})


In [3]:
#Correcting badly-entered data
df1.loc[176025, 'Sale Price'] = 37000
df1.loc[27056, 'Sale Price'] = 161000
df1.loc[191004, 'Sale Price'] = 810000
df1.loc[239278, 'Sale Price'] = 280395
df1.loc[138891, 'Sale Price'] = 200000
df1.loc[241961, 'Sale Date'] = pd.Timestamp('2019-03-04')
df1.loc[241961, 'Sale Price'] = 370000
df1.loc[230115, 'Sale Price'] = 325000
df1.loc[53151, 'Sale Price'] = 310900
df1.loc[259815, 'Sale Price'] = 1513142
df1.loc[129682, 'Sale Price'] = 1300000
df1.loc[154271, 'Square Footage Improved'] = 10094
df1.drop(190142, inplace=True) #Has a massive ADU ~4 times the size of the 'main' structure. Main struct is low-grade, ADU is luxury.
df1.drop(128094, inplace=True) #Can't find out anything about this house. Improbably large sqft and acreage, low price.

In [4]:
#Creating new dataframe without duplicate entries (keeping the most recent sale of each parcel)
df=df1.sort_values(by='Sale Date').drop_duplicates(subset='Map & Parcel', keep = 'last')
#Want to create a new column which is the mean sale price per square foot of a parcel in that neighborhood
df['PPS']=df['Sale Price']/df['Square Footage Improved']
df['PPS']=df['PPS'].replace(np.inf, np.nan)
meanpps=df.groupby('Neighborhood')['PPS'].mean().to_frame().rename(columns={'PPS':'NeighborhoodPPS'})
df=df.merge(meanpps, how='left', left_on = 'Neighborhood',right_index=True)
#Dropping parcels that were involved in multi-parcel sales
df=df[df['Multiple Parcels Involved in Sale'] == 'No']
#Testing the averaged assessment ratio idea.
df['Assessment Ratio'] = df['Assessment Land Improved'] / df['Total Appraisal Value Improved']
nbhdratio=df.groupby('Neighborhood')['Assessment Ratio'].mean().to_frame().rename(columns={'Assessment Ratio':'Nbhd Ratio'})
df=df.merge(nbhdratio, how='left', left_on = 'Neighborhood',right_index=True)
df['Month']= df['Sale Date'].dt.month
df['Quarter'] = df['Sale Date'].dt.quarter
df['Year'] = df['Sale Date'].dt.year
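#Blanking out obscure three-character grade codes we can't decipher
#(regex=True is required in recent pandas; the trailing .dropna() had no effect after index alignment)
df['Building Grade'] = df['Building Grade'].str.replace(r'\w\w\w', '', regex=True)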
#Dropping some outliers
df=df[df.PPS <= 1150]
df=df[df.Fixtures <=23]
df=df[df['Land Area Acres'] <=10]
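The neighborhood-level columns above are built with a `groupby` followed by a `merge`. As a possible simplification, `groupby(...).transform('mean')` produces the same per-row column in one step. The sketch below checks the equivalence on a tiny synthetic frame (`toy` is a made-up stand-in for `df`).

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for the notebook's `df`.
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B', 'B', 'B'],
    'Sale Price': [200000, 300000, 150000, 450000, 100000],
    'Square Footage Improved': [1000, 1500, 1000, 1500, 0],
})
# Zero square footage gives inf, which is replaced with NaN as in the notebook.
toy['PPS'] = (toy['Sale Price'] / toy['Square Footage Improved']).replace(np.inf, np.nan)

# groupby + merge, as in the cell above.
meanpps = (toy.groupby('Neighborhood')['PPS'].mean()
              .to_frame().rename(columns={'PPS': 'NeighborhoodPPS'}))
merged = toy.merge(meanpps, how='left', left_on='Neighborhood', right_index=True)

# Same column in one step; transform returns a result aligned to toy's index.
toy['NeighborhoodPPS'] = toy.groupby('Neighborhood')['PPS'].transform('mean')
print(toy[['Neighborhood', 'PPS', 'NeighborhoodPPS']])
```

One thing to keep in mind with either approach: in the cell above the neighborhood means are computed before multi-parcel sales and outliers are dropped, so those rows still contribute to `NeighborhoodPPS`. Whether that is intended is a judgment call for the analysis.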

In [6]:
import statsmodels.api as sm
from statsmodels.tsa.statespace.sarimax import SARIMAX