# Read before running the code

1. The Zillow API have 1000 daily request limitation, please use your own zws_id and gkey (google api) to run the code for your city (each person responsible for two city)
2. Various function were written to pull: 1) 200 lat/long by city per request; 2) Address generated by lat/long; 3) Zillow information based on address. Nothing need to change / update for the function when run through the code.
3. When first run the code, please go the the next Markdown to read the instruction. Before running the code to get housing data, please __enter the city you want to get the data with__.
4. To gather valid data, we removed data that could not generate complete and valid zillow housing information. In addition, we will remove any data that did not belong to the right city.
5. To insure that we have at least 50 valid data for each city, we run a loop at the end if valid data count is less than 50
6. Once we have more than 50 valid data for each city, the code will save the file into the Clean Data Folder
7. When code related to pull Lat/Lng information from Google API have "Index" error, that means we have reached the API limit and need to change the API key
8. When message code form zillow is 7, it means we have reached the API limit and we need to change to another key

In [22]:
# ZILLOW DATA EXTRACTION WRITTEN BY SONIA YANG

# Dependencies
import requests
import urllib
import random
import math
import pandas as pd
import xml.etree.ElementTree as ET
import time
# from config import zws_id, gkey # please use your own Zillow & Google API keys!
zws_id='X1-ZWz1gbvc8dh5vv_1vfab'
gkey="AIzaSyDuR6Ej6fNbaY-gjZRaA0t3THaJw-UNai8"
from urllib.request import urlopen

In [23]:
# FUNCTION to grab the exact address based on longitude and latitude
# modified from here https://gist.github.com/bradmontgomery/5397472
# their example didn't include an API key, but I added it otherwise you'd hit the rate limit easily

def reverse_geocode(latitude, longitude):
    # Did the geocoding request comes from a device with a
    # location sensor? Must be either true or false
    sensor = 'true'

    # Hit Google's reverse geocoder directly
    # NOTE: I *think* their terms state that you're supposed to
    # use google maps if you use their api for anything.
    base = "https://maps.googleapis.com/maps/api/geocode/json?"
    params = "latlng={lat},{lon}&sensor={sen}&key={key}".format(
        lat=latitude,
        lon=longitude,
        sen=sensor,
        key=gkey
    )
    url = "{base}{params}".format(base=base, params=params)
    #print(url)
    response = requests.get(url).json()
    address = response['results'][0]['formatted_address']
    return address

In [24]:
# FUNCTION to generate random lat & lng within a certain radius 
# modified from here: http://hadoopguru.blogspot.com/2014/12/python-generate-random-latitude-and.html
# changed to take in an empty initial dataframe and load in the data + return it
# this calls the reverse geocode function to grab the addresses of each randomly generated lat & lng

def generate_addresses(latitude, longitude, df):
    
    radius = 5000                         #Choose your own radius
    radiusInDegrees=radius/111300            
    r = radiusInDegrees

    counter = 0
    
    for i in range(1,50):                 #Choose number of Lat Long to be generated

        u = float(random.uniform(0.0,1.0))
        v = float(random.uniform(0.0,1.0))

        w = r * math.sqrt(u)
        t = 2 * math.pi * v
        x = w * math.cos(t) 
        y = w * math.sin(t)

        xLat  = x + latitude
        yLng = y + longitude

        df.set_value(counter, "latitude", xLat)
        df.set_value(counter, "longitude", yLng)
        
        #print(format(counter) + ": " + format(xLat) + ", " + format(yLng))
        address = reverse_geocode(xLat, yLng).split(',')
        citystatezip = address[1] + address[2]
        
        df.set_value(counter, "address", address[0])
        df.set_value(counter, "city_state_zip", citystatezip)
        
        # Add to counter
        counter = counter + 1
    
    return df

In [25]:
# FUNCTION to call Zillow API's GetSearchResults and will check to see if a house exists at that address
# message code will be written to dataframe
# zillow url format
# http://www.zillow.com/webservice/GetSearchResults.htm?zws-id=<ZWSID>&address=2114+Bigelow+Ave&citystatezip=Seattle%2C+WA
        
def get_message_codes(df):

    for index, row in df.iterrows():

        try:
            url = 'https://www.zillow.com/webservice/GetSearchResults.htm?zws-id='
            address = row['address']
            citystatezip =row['city_state_zip']


            query_url = url + zws_id + '&address=' + urllib.parse.quote(address) + '&citystatezip=' + urllib.parse.quote(citystatezip) 
            #print(query_url)

            root = ET.parse(urlopen(query_url)).getroot()

            for message in root.iter('message'):
                message_code = message[1].text

            print(format(index) + ": " + message_code)

            df.set_value(index, 'message_code', message_code)

            time.sleep(0.5) #necessary bc bombarding Zillow with API calls doesn't allow enough time to respond to each

        except:
            break
    

In [26]:
# FUNCTION to call Zillow's GetDeepSearchResults and look up Zestimate, bed, and bath
# http://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=<ZWSID>&address=2114+Bigelow+Ave&citystatezip=Seattle%2C+WA
# there are some limitations such as multiple zestimates depending on when the house was sold/if it was sold multiple times
# the code to handle that would get too convoluted so I am just writing in the most recent (according to the API) values
# probably not what we would do in real life
# but a decision we made re: the scope of a classroom project on a short time constraint

def search_zillow(df):
    
    for index, row in df.iterrows():
        try:
            url = 'https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id='
            address = df['address'][index]
            citystatezip = df['city_state_zip'][index]


            query_url = url + zws_id + '&address=' + urllib.parse.quote(address) + '&citystatezip=' + urllib.parse.quote(citystatezip) 


            root = ET.parse(urlopen(query_url)).getroot()

            print("row " + format(index) + ": " + address + citystatezip)
            print(query_url)

            '''
               "year built","lot size","finished sq ft"'''
            
            #zpid
            for zpid in root.iter('zpid'):
                df.set_value(index,'zpid', zpid.text)
            
            # we already have the address from the address + citystatezip variables
            # so we don't need to grab it again
            # same with lat & lng already being in the table
            
            #valuation (high and low)
            for valuation in root.iter('valuationRange'):
                highValuation = valuation[1].text
                lowValuation = valuation[0].text
                df.set_value(index, 'valuation_high', highValuation)
                df.set_value(index, 'valuation_low', lowValuation)
            
            #zestimate
            for zestimate in root.iter('zestimate'):
                zestimate_value = zestimate[0].text

                if zestimate_value is None:
                    print('not for sale')
                else:
                    print ('zestimate (value): ' + format(zestimate[0].text)) 
                    df.set_value(index, 'zestimate', zestimate_value)
             
            #home value index
            for zindexValue in root.iter('zindexValue'):
                df.set_value(index, 'home value index', zindexValue.text)
            
            #tax assessment
            for taxAssessment in root.iter('taxAssessment'):
                df.set_value(index, 'tax assessment', taxAssessment.text)
                
            #tax assessment year
            for taxAssessmentYear in root.iter('taxAssessmentYear'):
                df.set_value(index, 'tax assess year', taxAssessmentYear.text)
                
            #year built
            for yearBuilt in root.iter('yearBuilt'):
                df.set_value(index, 'year built', yearBuilt.text)
             
            #lot size sq ft
            for lotSizeSqFt in root.iter('lotSizeSqFt'):
                df.set_value(index, 'lot size', lotSizeSqFt.text)
            
            #finished sq ft
            for finishedSqFt in root.iter('finishedSqFt'):
                df.set_value(index, 'finished sq ft', finishedSqFt.text)
            
            #bedrooms
            for bedroom in root.iter('bedrooms'):
                bedrooms = bedroom.text
                #print("bedrooms: " + bedrooms)
                df.set_value(index, 'bedrooms', bedrooms)

            #bathrooms
            for bathroom in root.iter('bathrooms'):
                bathrooms = bathroom.text
                #print("bathrooms: " + bathrooms + "\n")
                df.set_value(index, 'bathrooms', bathrooms)           
            
            print('\n')

            time.sleep(0.5) 


        except:
            break


<h2>HOW TO RUN THIS CODE</h2>
<ol>
<li>Initialize an empty dataframe with the fields as marked below</li>
<li>Call the <strong>generate_addresses</strong> function passing in your empty dataframe</li>
<li>Call the <strong>get_message_codes</strong> to update your dataframe with message codes indicating whether or not a valid property exists at each address. <strong>IMPORTANT:</strong> please register your own Zillow account/get your own key for this!! If we all keep using the same one we'll easily hit the rate limit </li>
<li>Drop the rows in the dataframe for which a property does not exist at that address</li>
<li>Call the <strong>search_zillow</strong> function to get the zestimate (aka price of the property), # of bedrooms, and # of bathrooms</li>
<li>I did not include it in my code, but once you get a sample size of data that you are satisfied with for the city, maybe write it out to a CSV so you don't have to keep running this code/can use it later</li>
</ol>

feel free to comment out my print statements while the functions are running if you find them distracting

In [36]:
# Read cities file to pull the Latitude and Longtitude
Cities=pd.read_csv('../Raw_Data/LA_cities_Lat_lng_codes_data.csv')
print(f'{Cities["address"]}')
city1 = input("Please input first city your want to pull data")
selectcity = Cities.loc[Cities["address"] == city1, :]
LAT = selectcity.iloc[0,1]
LNG = selectcity.iloc[0,2]

0       Los Angeles
1        Long Beach
2          Glendale
3         Lancaster
4          Palmdale
5     Santa Clarita
6            Pomona
7          Torrance
8          Pasadena
9         Inglewood
10          Compton
11           Downey
12      West Covina
13          Norwalk
14          Burbank
15       South Gate
16         El Monte
17         Whittier
18         Alhambra
Name: address, dtype: object
Please input first city your want to pull dataSanta Clarita


In [37]:
# HOW TO RUN ALL THE FUNCTIONS, USING LOS ANGELES AS AN EXAMPLE

# coordinates taken from the CitiesGeo_Output.csv
# we should manually run the following code on each individual city instead of nesting it in another loop
# while this may be hardcoded, it's better than waiting on one gigantic loop that takes forever

# STEP 1: INITALIZE THE DATAFRAME
# if we need any more fields, let me know
la_df = pd.DataFrame({"zpid": '',
                      "address":'',
                      "city_state_zip":'',
                      "latitude":'',
                      "longitude":'',
                      "message_code":'',
                      "zestimate":'',
                      "valuation_high":'',
                      "valuation_low": '',
                      "home value index":'',
                      "tax assessment":'',
                      "tax assess year":'',
                      "year built":'',
                      "lot size":'',
                      "finished sq ft":'',
                      "bedrooms":'',
                      "bathrooms":''}, index=[0])

# reorder the columns
la_df = la_df[["zpid", "address","city_state_zip","latitude","longitude","message_code","zestimate",
               "valuation_high","valuation_low","home value index","tax assessment","tax assess year",
               "year built","lot size","finished sq ft","bedrooms","bathrooms"]]

# STEP 2: GENERATE RANDOM ADDRESSES IN THE DESIGNATED AREA
# pass in the coordinates for Los Angeles plus the empty dataframe
generate_addresses(LAT,LNG, la_df) 

#la_df

Unnamed: 0,zpid,address,city_state_zip,latitude,longitude,message_code,zestimate,valuation_high,valuation_low,home value index,tax assessment,tax assess year,year built,lot size,finished sq ft,bedrooms,bathrooms
0,,I-5,Santa Clarita CA 91355,34.3912,-118.57,,,,,,,,,,,,
1,,22920-22926 Posada Dr,Santa Clarita CA 91354,34.4342,-118.538,,,,,,,,,,,,
2,,25532 Tournament Rd,Santa Clarita CA 91355,34.3851,-118.557,,,,,,,,,,,,
3,,26564-26598 N Cocklebur Ln,Santa Clarita CA 91351,34.42,-118.51,,,,,,,,,,,,
4,,23957-24053 Magic Mountain Pkwy,Santa Clarita CA 91355,34.4179,-118.555,,,,,,,,,,,,
5,,25000-25198 Barnhill Rd,Saugus CA 91350,34.3937,-118.518,,,,,,,,,,,,
6,,23871 McBean Pkwy,Santa Clarita CA 91355,34.3966,-118.555,,,,,,,,,,,,
7,,22977 Sierra Hwy,Newhall CA 91321,34.3583,-118.515,,,,,,,,,,,,
8,,23662 White Oak Ct,Santa Clarita CA 91321,34.3618,-118.55,,,,,,,,,,,,
9,,25674 Oak Meadow Dr,Valencia CA 91381,34.3968,-118.582,,,,,,,,,,,,


In [38]:
# STEP 3: CALL THE ZILLOW API TO GET MESSAGE CODES
# 0 means there is a valid property at that address
# 508 and anything else means there isn't
# if you get nothing but invalid message codes, re-run STEP 2
# you might have to sign up for a new Zillow account if you keep getting invalid results here
# there is a possibility you hit the rate limit

get_message_codes(la_df)

0: 508
1: 508
2: 508
3: 508
4: 508
5: 508
6: 508
7: 508
8: 0
9: 0
10: 0
11: 508
12: 0
13: 508
14: 508
15: 508
16: 0
17: 508
18: 508
19: 508
20: 508
21: 508
22: 508
23: 508
24: 508
25: 508
26: 508
27: 508
28: 508
29: 0
30: 508
31: 0
32: 508
33: 508
34: 508
35: 0
36: 0
37: 508
38: 0
39: 508
40: 508
41: 508
42: 508
43: 508
44: 508
45: 0
46: 508
47: 508
48: 508


In [39]:
# STEP 4: DROP INVALID ENTRIES FROM DATAFRAME 
# cull all the rows where houses do not exist at the address
# take what is valid (message code of '0')
# the code sometimes might break/not get a response from the server so it's better to take what IS valid

la_df = la_df[la_df.message_code == '0']

# take out items that does not belong to the select city
la_df=la_df[la_df.city_state_zip.str.contains(city1) == True]

la_df

Unnamed: 0,zpid,address,city_state_zip,latitude,longitude,message_code,zestimate,valuation_high,valuation_low,home value index,tax assessment,tax assess year,year built,lot size,finished sq ft,bedrooms,bathrooms
8,,23662 White Oak Ct,Santa Clarita CA 91321,34.3618,-118.55,0,,,,,,,,,,,
10,,21563 Cleardale St,Santa Clarita CA 91321,34.3895,-118.51,0,,,,,,,,,,,
12,,24001 Briardale Way,Santa Clarita CA 91321,34.3578,-118.54,0,,,,,,,,,,,
16,,26327 Emerald Dove Dr,Santa Clarita CA 91355,34.4047,-118.565,0,,,,,,,,,,,
29,,22405 Cardiff Dr,Santa Clarita CA 91350,34.4033,-118.525,0,,,,,,,,,,,
31,,26215 Paolino Pl,Santa Clarita CA 91355,34.3992,-118.566,0,,,,,,,,,,,
35,,25458 Ave Escalera,Santa Clarita CA 91355,34.3865,-118.553,0,,,,,,,,,,,
36,,23649 Newhall Ave,Santa Clarita CA 91321,34.3687,-118.518,0,,,,,,,,,,,
38,,23614 Neargate Dr,Santa Clarita CA 91321,34.3728,-118.547,0,,,,,,,,,,,
45,,24902 Avignon Dr,Santa Clarita CA 91355,34.4198,-118.571,0,,,,,,,,,,,


In [40]:
# STEP 5: SEARCH ZILLOW AND GET ZESTIMATE, BEDROOMS, & BATHROOMS
# fill the dataframe with the data

search_zillow(la_df)
la_df

row 8: 23662 White Oak Ct Santa Clarita CA 91321
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=23662%20White%20Oak%20Ct&citystatezip=%20Santa%20Clarita%20CA%2091321
zestimate (value): 1070100


row 10: 21563 Cleardale St Santa Clarita CA 91321
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=21563%20Cleardale%20St&citystatezip=%20Santa%20Clarita%20CA%2091321
zestimate (value): 413606


row 12: 24001 Briardale Way Santa Clarita CA 91321
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=24001%20Briardale%20Way&citystatezip=%20Santa%20Clarita%20CA%2091321
zestimate (value): 743508


row 16: 26327 Emerald Dove Dr Santa Clarita CA 91355
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=26327%20Emerald%20Dove%20Dr&citystatezip=%20Santa%20Clarita%20CA%2091355
zestimate (value): 514185


row 29: 2240

Unnamed: 0,zpid,address,city_state_zip,latitude,longitude,message_code,zestimate,valuation_high,valuation_low,home value index,tax assessment,tax assess year,year built,lot size,finished sq ft,bedrooms,bathrooms
8,20205962,23662 White Oak Ct,Santa Clarita CA 91321,34.3618,-118.55,0,1070100,1166409,1016595,459200,18.0,2017,,137476,,,
10,20208623,21563 Cleardale St,Santa Clarita CA 91321,34.3895,-118.51,0,413606,454967,351565,476300,128569.0,2017,1915.0,436280,534.0,1.0,1.0
12,20205856,24001 Briardale Way,Santa Clarita CA 91321,34.3578,-118.54,0,743508,780683,706333,459200,593077.0,2017,1985.0,15921,3059.0,4.0,3.0
16,20227938,26327 Emerald Dove Dr,Santa Clarita CA 91355,34.4047,-118.565,0,514185,539894,483334,530000,436742.0,2017,1988.0,7083,1532.0,3.0,3.0
29,20210604,22405 Cardiff Dr,Santa Clarita CA 91350,34.4033,-118.525,0,768470,806893,730047,476300,672258.0,2017,1988.0,21498,2573.0,4.0,3.0
31,20227860,26215 Paolino Pl,Santa Clarita CA 91355,34.3992,-118.566,0,656110,688915,623305,530000,579477.0,2017,1987.0,12000,1839.0,3.0,3.0
35,20222345,25458 Ave Escalera,Santa Clarita CA 91355,34.3865,-118.553,0,697020,731871,662169,530000,631734.0,2017,1969.0,7069,2327.0,4.0,3.0
36,20205334,23649 Newhall Ave,Santa Clarita CA 91321,34.3687,-118.518,0,638108,689157,555154,459200,289438.0,2017,1948.0,118401,1530.0,3.0,1.0
38,20207160,23614 Neargate Dr,Santa Clarita CA 91321,34.3728,-118.547,0,769070,807523,730617,459200,415000.0,2017,1964.0,12673,2200.0,4.0,4.0
45,20228830,24902 Avignon Dr,Santa Clarita CA 91355,34.4198,-118.571,0,621945,653042,590848,530000,196847.0,2017,1998.0,15447,1919.0,2.0,3.0


In [41]:
# do any further data cleaning you need to yourself
# for example, dropping any rows with NaN values
la_df = la_df.dropna(axis=0, how='any')
la_df

# maybe write to CSV to store the data for usage later/before doing plots? so you don't have to rerun everything

Unnamed: 0,zpid,address,city_state_zip,latitude,longitude,message_code,zestimate,valuation_high,valuation_low,home value index,tax assessment,tax assess year,year built,lot size,finished sq ft,bedrooms,bathrooms
10,20208623,21563 Cleardale St,Santa Clarita CA 91321,34.3895,-118.51,0,413606,454967,351565,476300,128569.0,2017,1915,436280,534,1,1.0
12,20205856,24001 Briardale Way,Santa Clarita CA 91321,34.3578,-118.54,0,743508,780683,706333,459200,593077.0,2017,1985,15921,3059,4,3.0
16,20227938,26327 Emerald Dove Dr,Santa Clarita CA 91355,34.4047,-118.565,0,514185,539894,483334,530000,436742.0,2017,1988,7083,1532,3,3.0
29,20210604,22405 Cardiff Dr,Santa Clarita CA 91350,34.4033,-118.525,0,768470,806893,730047,476300,672258.0,2017,1988,21498,2573,4,3.0
31,20227860,26215 Paolino Pl,Santa Clarita CA 91355,34.3992,-118.566,0,656110,688915,623305,530000,579477.0,2017,1987,12000,1839,3,3.0
35,20222345,25458 Ave Escalera,Santa Clarita CA 91355,34.3865,-118.553,0,697020,731871,662169,530000,631734.0,2017,1969,7069,2327,4,3.0
36,20205334,23649 Newhall Ave,Santa Clarita CA 91321,34.3687,-118.518,0,638108,689157,555154,459200,289438.0,2017,1948,118401,1530,3,1.0
38,20207160,23614 Neargate Dr,Santa Clarita CA 91321,34.3728,-118.547,0,769070,807523,730617,459200,415000.0,2017,1964,12673,2200,4,4.0
45,20228830,24902 Avignon Dr,Santa Clarita CA 91355,34.4198,-118.571,0,621945,653042,590848,530000,196847.0,2017,1998,15447,1919,2,3.0


In [42]:
# review current data and see if more data is needed (at least 50 valid data per city)
add_df = pd.DataFrame(la_df)
final_df = add_df
final_df = final_df.reset_index(drop=True)
len(final_df)

9

In [43]:
# If minimum 50 valid data counts is not met, we will loop through the codes above to make sure we have sufficient data
while(len(final_df)<50):
    generate_addresses(LAT,LNG, la_df) 
    get_message_codes(la_df)
    la_df = la_df[la_df.message_code == '0']
    la_df=la_df[la_df.city_state_zip.str.contains(city1) == True]
    search_zillow(la_df)
    la_df = la_df.dropna(axis=0, how='any')
    add_df = add_df.append(la_df, ignore_index=True)
    final_df = add_df.drop_duplicates()
len(final_df)

10: 508
12: 508
16: 0
29: 508
31: 508
35: 508
36: 508
38: 508
45: 508
0: 508
1: 508
2: 0
3: 508
4: 0
5: 508
6: 508
7: 508
8: 0
9: 508
11: 508
13: 0
14: 508
15: 508
17: 508
18: 508
19: 0
20: 508
21: 508
22: 508
23: 508
24: 508
25: 508
26: 508
27: 508
28: 508
30: 508
32: 508
33: 508
34: 508
37: 508
39: 508
40: 508
41: 508
42: 508
43: 508
44: 0
46: 0
47: 0
48: 508
row 16: 23649 Newhall Ave Santa Clarita CA 91321
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=23649%20Newhall%20Ave&citystatezip=%20Santa%20Clarita%20CA%2091321
zestimate (value): 638108


row 2: 22503 Los Tigres Dr Santa Clarita CA 91350
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=22503%20Los%20Tigres%20Dr&citystatezip=%20Santa%20Clarita%20CA%2091350
zestimate (value): 526268


row 8: 23401 Westford Pl Santa Clarita CA 91354
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=2

33: 508
34: 508
35: 508
36: 508
37: 508
39: 7
40: 508
41: 0
42: 7
43: 508
44: 508
46: 508
47: 508
48: 508
row 0: 24234 Lema Dr Santa Clarita CA 91355
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=24234%20Lema%20Dr&citystatezip=%20Santa%20Clarita%20CA%2091355
zestimate (value): 459677


row 11: 22415 Cardiff Dr Santa Clarita CA 91350
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=22415%20Cardiff%20Dr&citystatezip=%20Santa%20Clarita%20CA%2091350
zestimate (value): 686203


row 19: 23944 Windward Ln Santa Clarita CA 91355
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=23944%20Windward%20Ln&citystatezip=%20Santa%20Clarita%20CA%2091355
zestimate (value): 873162


row 25: 22601 White Wing Way Santa Clarita CA 91350
https://www.zillow.com/webservice/GetDeepSearchResults.htm?zws-id=X1-ZWz1gbvc8dh5vv_1vfab&address=22601%20White%20Wing%20Way&ci

55

In [44]:
# review final city data before save the file
final_df.head()

Unnamed: 0,zpid,address,city_state_zip,latitude,longitude,message_code,zestimate,valuation_high,valuation_low,home value index,tax assessment,tax assess year,year built,lot size,finished sq ft,bedrooms,bathrooms
0,20208623,21563 Cleardale St,Santa Clarita CA 91321,34.3895,-118.51,0,413606,454967,351565,476300,128569.0,2017,1915,436280,534,1,1.0
1,20205856,24001 Briardale Way,Santa Clarita CA 91321,34.3578,-118.54,0,743508,780683,706333,459200,593077.0,2017,1985,15921,3059,4,3.0
2,20227938,26327 Emerald Dove Dr,Santa Clarita CA 91355,34.4047,-118.565,0,514185,539894,483334,530000,436742.0,2017,1988,7083,1532,3,3.0
3,20210604,22405 Cardiff Dr,Santa Clarita CA 91350,34.4033,-118.525,0,768470,806893,730047,476300,672258.0,2017,1988,21498,2573,4,3.0
4,20227860,26215 Paolino Pl,Santa Clarita CA 91355,34.3992,-118.566,0,656110,688915,623305,530000,579477.0,2017,1987,12000,1839,3,3.0


In [46]:
# Save file into Clean Data folder
final_df.to_csv(f'../Clean_Data/5-1.{city1}_zillow_data.csv')