## Get Zillow Values

This notebook takes the King County data for properties in the danger zone and runs their addresses through the Zillow API to get each property's market value (Zestimate).

It then compares those values with the county's taxable values and tests whether the difference is statistically significant.

In [None]:
import urllib.parse
import xml.etree.ElementTree as ET
import pandas as pd
import numpy as np
import requests
from scipy import stats

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

In [None]:
# Set some Zillow stuff

baseurl = 'http://www.zillow.com/webservice/GetSearchResults.htm?'
zws_id = 'zws-id=X1-ZWz1h0jrvdalmz_52lx1'
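
The request URL combines these settings with an address and a city/state/zip string. As a minimal sketch of how that query string can be assembled, the standard library's `urlencode` handles the escaping in one step. The helper name `build_search_url` and the `YOUR-ZWS-ID` placeholder are assumptions for illustration, not part of the notebook. Note that `urlencode` writes spaces as `+`, where `urllib.parse.quote` writes `%20`.

```python
from urllib.parse import urlencode

# Same endpoint as the notebook, without the trailing '?'
BASE_URL = 'http://www.zillow.com/webservice/GetSearchResults.htm'


def build_search_url(zws_id, address, citystatezip):
    """Hypothetical helper: build the search URL from its three parameters."""
    query = urlencode({
        'zws-id': zws_id,
        'address': address,
        'citystatezip': citystatezip,
    })
    return BASE_URL + '?' + query


# Placeholder key and a made-up address, for illustration only
example_url = build_search_url('YOUR-ZWS-ID', '123 Main St', 'Seattle WA 98101')
print(example_url)
```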

In [None]:
# Takes a dataframe of addresses and returns a list of Zestimates, one per row (0 when Zillow has no value)

def get_zillow_values(df):
    zillow_values = []

    for i, data in df.iterrows():
        # Create a url-encoded version of the street address
        urlstr_addy = urllib.parse.quote(data['ADDR_FULL'])

        # Create a url-encoded version of the city, state and zip
        url_city_state = urllib.parse.quote(data['CTYNAME'] + ' WA ' + str(data['ZIP5']))

        # Create the zillow api url
        url = baseurl + zws_id + '&address=' + urlstr_addy + '&citystatezip=' + url_city_state
        response = requests.get(url)
        root = ET.fromstring(response.text)

        # Zillow's response code; '0' means the request succeeded
        response_code = root[1][1].text

        # If Zillow returns a value, capture it and put it in the list. If there is no value due to
        # an error, put a 0 in the list and move on; the 0s are dropped later
        if response_code == '0':
            try:
                zillow_values.append(int(root[2][0][0][3][0].text))
            except (IndexError, TypeError, ValueError):
                zillow_values.append(0)
        else:
            zillow_values.append(0)

    return zillow_values
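
The function above reads the response by position (`root[1][1]`, `root[2][0][0][3][0]`), which is hard to follow and fragile if the element order changes. A hedged alternative is to look elements up by name. The sample XML below is a hypothetical, trimmed-down response whose element names are assumed from those index positions, and `parse_zestimate` is an illustrative helper, not part of the notebook. A real response may differ, for example by using a namespaced root tag.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down response used only for illustration
SAMPLE_XML = """
<searchresults>
  <request><address>123 Main St</address><citystatezip>Seattle WA 98101</citystatezip></request>
  <message><text>Request successfully processed</text><code>0</code></message>
  <response><results><result>
    <zpid>1</zpid>
    <links/>
    <address/>
    <zestimate><amount currency="USD">500000</amount></zestimate>
  </result></results></response>
</searchresults>
""".strip()


def parse_zestimate(xml_text):
    """Return the Zestimate as an int, or 0 when it is missing or the call failed."""
    root = ET.fromstring(xml_text)

    # Any code other than '0' is treated as a failed lookup
    if root.findtext('message/code') != '0':
        return 0

    # findtext returns None if the element is absent and '' if it is empty
    amount = root.findtext('response/results/result/zestimate/amount')
    try:
        return int(amount)
    except (TypeError, ValueError):
        return 0


print(parse_zestimate(SAMPLE_XML))
```

Returning 0 for every failure matches the notebook's convention, so the later step that drops zero values keeps working unchanged.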

In [None]:
# Read the csv file that has all of the threatened King County properties.

king_df = pd.read_csv('../data/danger_king_robinson.csv')
king_df.head()

In [None]:
# Since I only need 1000 values, I'm going to drop any nulls

king_df.dropna(inplace=True)

In [None]:
# Add a column for total taxable values

king_df['TOTALTAX'] = king_df['TAX_IMPR'] + king_df['TAX_LNDVAL']

In [None]:
# Get only the ones with a taxable value, and then pull 1k of those.

zillow_df = king_df[ king_df['TOTALTAX'] != 0].sample(1000, random_state = 42)

In [None]:
# Get the zillow values

zillow_values = get_zillow_values(zillow_df)

In [None]:
# Add them to the sampled rows so we can more easily compare.
# From here on, king_df holds only the 1,000 sampled properties.

king_df = zillow_df.assign(ZILLOW=zillow_values)

In [None]:
# Drop any 0 zillow values, as we don't want them messing up the mean

king_df = king_df[ king_df['ZILLOW'] != 0]

In [None]:
# Check to see how many remain

king_df.shape

In [None]:
zil_mean = king_df['ZILLOW'].mean()
kc_mean = king_df['TOTALTAX'].mean()
# Mean Zestimate as a percentage of the mean taxable value (100 would mean they are equal)
diff = np.round(((zil_mean / kc_mean) * 100), 4)
print(diff)

In [None]:
# Time for a t-test

stats.ttest_rel(king_df['ZILLOW'], king_df['TOTALTAX'])
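
For intuition about the paired test used here: its t statistic is the mean of the per-property differences divided by the standard error of those differences. The sketch below computes only that statistic with the standard library, on made-up numbers. It is not the notebook's data and it does not compute a p-value, which `scipy.stats.ttest_rel` provides.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic for paired samples: mean(d) / (stdev(d) / sqrt(n))."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error


# Made-up values for four properties, for illustration only
zillow = [120, 130, 125, 140]
taxable = [100, 110, 100, 120]

t_stat = paired_t_statistic(zillow, taxable)
print(t_stat)
```

A paired test fits this setup because each Zestimate is compared with the taxable value of the same property rather than with an independent sample.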

#### Analysis:

The difference is statistically significant: the paired t-test returns a very low p-value.

The mean Zestimate is 120.1053% of the mean taxable value, i.e. about 20% greater.
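
The figure printed by the notebook is a ratio of means multiplied by 100, so the excess over taxable value is that figure minus 100. A tiny sketch of the conversion, with `percent_higher` as a hypothetical helper name:

```python
def percent_higher(ratio_pct):
    """Convert 'A is ratio_pct % of B' into 'A is this many % higher than B'."""
    return ratio_pct - 100.0


# 120.1053 is the ratio reported above
excess = percent_higher(120.1053)
print(round(excess, 4))
```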