## Download Zillow Properties That Are For Sale

Using scraped Zillow data from RapidAPI.

Based on <https://colab.research.google.com/drive/1yKXmGjFgDeBuq2l2PkvyAuDiywl8DF2c?usp=sharing&utm_source=google#scrollTo=Gu9OGlGTrx4k>

Note that the Zillow API returns at most 40 rows of data per request and allows only a limited number of free API uses per month. After that, each use costs $0.08. The "free" plan also allows only 10 requests per minute.
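If you run several searches in a row, you can stay under the free plan's 10-requests-per-minute limit by spacing calls at least 6 seconds apart. This is a minimal sketch, assuming the limit behaves like a simple per-minute count. `seconds_to_wait` and `Throttle` are hypothetical helpers and are not part of the API.

```python
import time

REQUESTS_PER_MINUTE = 10                 # free-plan limit quoted above
MIN_INTERVAL = 60 / REQUESTS_PER_MINUTE  # 6.0 seconds between calls


def seconds_to_wait(last_call, now, min_interval=MIN_INTERVAL):
    """Return how long to sleep so consecutive calls are >= min_interval apart."""
    if last_call is None:  # first request: no need to wait
        return 0.0
    return max(0.0, min_interval - (now - last_call))


class Throttle:
    """Remember when the last request went out and pause before the next one."""

    def __init__(self, min_interval=MIN_INTERVAL):
        self.min_interval = min_interval
        self.last_call = None

    def wait(self):
        delay = seconds_to_wait(self.last_call, time.monotonic(), self.min_interval)
        if delay > 0:
            time.sleep(delay)
        self.last_call = time.monotonic()
```

Calling `throttle.wait()` right before each request keeps the spacing even if a cell is re-run quickly.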

In [4]:
import pandas as pd
import requests
import json
import time

# show all columns
pd.set_option('display.max_columns', None)


The search is set to Raleigh, NC. The location can also be a county or a zip code; see the API documentation for the accepted formats.

In [10]:

city = 'raleigh'
state = 'nc'
search_str = city + ', ' + state
print('Search string:', search_str)


Search string: raleigh, nc


This cell sends a request to the Zillow API on RapidAPI and returns the listing data. The important settings are in `querystring`. You can change these options or add new ones; see the documentation.

Remember, each request returns at most 40 rows, i.e., 40 houses. So I search with different minimum and maximum prices, trying to keep each search under 40 results.
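Rather than editing `minPrice` and `maxPrice` by hand for each run, the price bands can be generated from a list of breakpoints. This is a sketch: `price_bands` and `make_querystring` are hypothetical helpers, and the breakpoints below simply reproduce the six ranges saved at the end of this notebook.

```python
def price_bands(breakpoints):
    """Turn sorted breakpoints into non-overlapping (min, max) price pairs.

    Each band starts one dollar above the previous band's max, matching the
    ranges used in this notebook (e.g. 0-100000, then 100001-200000).
    """
    bands = []
    lower = breakpoints[0]
    for upper in breakpoints[1:]:
        bands.append((lower, upper))
        lower = upper + 1
    return bands


def make_querystring(location, min_price, max_price):
    """Build the query parameters for one price band (values as strings, as above)."""
    return {"location": location,
            "home_type": "Houses",
            "status_type": "ForSale",
            "minPrice": str(min_price),
            "maxPrice": str(max_price)}


bands = price_bands([0, 100000, 200000, 300000, 350000, 375000, 400000])
queries = [make_querystring("raleigh, nc", lo, hi) for lo, hi in bands]
```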

Finally, you need to insert your own API key.
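Pasting the key into the cell means it travels with every copy of the notebook you share. One alternative is to keep it in an environment variable. This is a sketch, assuming a variable named `RAPIDAPI_KEY` (a hypothetical name; use whatever you set). The header names are the same ones used in the cell below.

```python
import os


def get_rapidapi_key(var_name="RAPIDAPI_KEY"):
    """Read the RapidAPI key from an environment variable (name is hypothetical)."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable to your RapidAPI key.")
    return key


def make_headers(api_key):
    """Build the same request headers as the cell below, with the key filled in."""
    return {"x-rapidapi-host": "zillow-com1.p.rapidapi.com",
            "x-rapidapi-key": api_key}
```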

In [34]:
# get data
url = "https://zillow-com1.p.rapidapi.com/propertyExtendedSearch"

querystring = {"location":search_str,
               "home_type":"Houses",
               "status_type":"ForSale",
               "minPrice":"375001",
               "maxPrice":"400000"}

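headers = {
    'x-rapidapi-host': "zillow-com1.p.rapidapi.com",
    'x-rapidapi-key': "INSERT YOUR API KEY HERE"
    }

# send the request; stop with an error if the API returns an HTTP error
# (e.g., an invalid key or exceeding the rate limit)
z_for_sale_resp = requests.get(url, headers=headers, params=querystring, timeout=30)
z_for_sale_resp.raise_for_status()

# parse the JSON response into a Python dictionary
z_for_sale_resp_json = z_for_sale_resp.json()
z_for_sale_resp_json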

{'props': [{'dateSold': None,
   'propertyType': 'SINGLE_FAMILY',
   'lotAreaValue': 0.28,
   'address': '7912 Flanagan Pl, Raleigh, NC 27612',
   'daysOnZillow': -1,
   'price': 389000,
   'listingDateTime': None,
   'longitude': -78.72406,
   'latitude': 35.86662,
   'contingentListingType': None,
   'listingStatus': 'FOR_SALE',
   'zpid': '53455833',
   'listingSubType': {'is_FSBA': True},
   'imgSrc': 'https://photos.zillowstatic.com/fp/8f121315a1ac62204d52c3f4bfbf2f58-p_e.jpg',
   'livingArea': 1389,
   'bathrooms': 3,
   'lotAreaUnit': 'acres',
   'country': 'USA',
   'currency': 'USD',
   'bedrooms': 3,
   'hasImage': True},
  {'dateSold': None,
   'propertyType': 'SINGLE_FAMILY',
   'lotAreaValue': 6969.6,
   'address': '9208 Shallcross Way, Raleigh, NC 27617',
   'daysOnZillow': -1,
   'price': 384500,
   'listingDateTime': None,
   'longitude': -78.73121,
   'latitude': 35.90533,
   'contingentListingType': None,
   'listingStatus': 'FOR_SALE',
   'zpid': '50117060',
   'list

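The API returns the data as JSON, a very common format for nested data, and `.json()` parses it into a Python dictionary. The listings are stored under the `'props'` key, and `pd.json_normalize` converts them into a DataFrame, flattening nested fields such as `listingSubType` into columns like `listingSubType.is_FSBA`.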
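To see what `pd.json_normalize` does with the nested `listingSubType` field, here is a small sketch on a hand-made response with the same shape as the output above: two listings copied from it, trimmed to a few fields.

```python
import pandas as pd

# A trimmed-down response: a dict whose 'props' key holds a list of listings
sample_response = {
    "props": [
        {"address": "7912 Flanagan Pl, Raleigh, NC 27612",
         "price": 389000,
         "zpid": "53455833",
         "listingSubType": {"is_FSBA": True}},
        {"address": "3101 Sherry Dr, Raleigh, NC 27604",
         "price": 389000,
         "zpid": "6377140",
         "listingSubType": {"is_FSBA": True, "is_openHouse": True}},
    ]
}

# Each listing becomes a row; the nested listingSubType dict becomes dotted
# columns. Keys missing from a listing (is_openHouse in the first) become NaN.
df = pd.json_normalize(sample_response["props"])
print(df.columns.tolist())
```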

Check the number of rows. If a search returns exactly 40 rows, you are probably hitting the per-request limit and missing some listings; narrow the price range and search again.

In [35]:
# view data
df_z_for_sale = pd.json_normalize(data=z_for_sale_resp_json['props'])
print('Num of rows:', len(df_z_for_sale))
print('Num of cols:', len(df_z_for_sale.columns))
df_z_for_sale.head()

Num of rows: 30
Num of cols: 24


| | dateSold | propertyType | lotAreaValue | address | daysOnZillow | price | listingDateTime | longitude | latitude | contingentListingType | listingStatus | zpid | imgSrc | livingArea | bathrooms | lotAreaUnit | country | currency | bedrooms | hasImage | listingSubType.is_FSBA | listingSubType.is_openHouse | listingSubType.is_comingSoon | listingSubType.is_newHome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | SINGLE_FAMILY | 0.28 | 7912 Flanagan Pl, Raleigh, NC 27612 | -1 | 389000 | | -78.72406 | 35.86662 | | FOR_SALE | 53455833 | https://photos.zillowstatic.com/fp/8f121315a1a... | 1389 | 3 | acres | USA | USD | 3 | True | True | | | |
| 1 | | SINGLE_FAMILY | 6969.6 | 9208 Shallcross Way, Raleigh, NC 27617 | -1 | 384500 | | -78.73121 | 35.90533 | | FOR_SALE | 50117060 | https://photos.zillowstatic.com/fp/2dfb7fe5dc9... | 1443 | 3 | sqft | USA | USD | 3 | True | True | | | |
| 2 | | SINGLE_FAMILY | 0.46 | 3101 Sherry Dr, Raleigh, NC 27604 | -1 | 389000 | | -78.585014 | 35.820602 | | FOR_SALE | 6377140 | https://photos.zillowstatic.com/fp/ce6420acbfd... | 1983 | 3 | acres | USA | USD | 4 | True | True | True | | |
| 3 | | SINGLE_FAMILY | 0.6 | 1509 Yakimas Rd, Raleigh, NC 27603 | -1 | 399900 | | -78.65298 | 35.608692 | | FOR_SALE | 215490802 | https://photos.zillowstatic.com/fp/f0a2f1a719a... | 2221 | 3 | acres | USA | USD | 3 | True | True | | | |
| 4 | | SINGLE_FAMILY | 0.46 | 5612 Cardinal Landing Dr, Raleigh, NC 27603 | -1 | 398500 | | -78.655945 | 35.62761 | | FOR_SALE | 50118263 | https://photos.zillowstatic.com/fp/f44e77027df... | 2121 | 3 | acres | USA | USD | 3 | True | True | | | |
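The sample shows that `lotAreaValue` is not in consistent units: row 0 is 0.28 with `lotAreaUnit` = `acres`, while row 1 is 6969.6 with `lotAreaUnit` = `sqft`. Before comparing lot sizes, it helps to convert everything to one unit. This is a sketch; `lot_area_acres` and the `lotAcres` column are hypothetical names, and it assumes `acres` and `sqft` are the only units (the two seen here).

```python
import pandas as pd

SQFT_PER_ACRE = 43_560  # 1 acre = 43,560 square feet


def lot_area_acres(df):
    """Return lot size in acres, converting rows whose lotAreaUnit is 'sqft'.

    Any unit other than 'acres' or 'sqft' comes back as NaN so it is easy to spot.
    """
    factor = df["lotAreaUnit"].map({"acres": 1.0, "sqft": 1 / SQFT_PER_ACRE})
    return df["lotAreaValue"] * factor


# Two rows taken from the table above
sample = pd.DataFrame({"lotAreaValue": [0.28, 6969.6],
                       "lotAreaUnit": ["acres", "sqft"]})
sample["lotAcres"] = lot_area_acres(sample)
print(sample)
```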


Here I store the downloaded data in a new variable named after the search parameters (minimum and maximum price), and then save each one as a CSV file.

This is really ugly code. I should store the price ranges in a list and loop through it, but it's hard to know in advance how many listings each search will return, so I'm doing it by hand.
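One way to avoid picking the bands by hand is to split a band automatically whenever it comes back full. The sketch below swaps in that recursive-splitting technique; it is not the author's code. `fetch(min_price, max_price)` is a hypothetical function that makes one API call (for example with the `querystring` above) and returns the list under `'props'`. A result with 40 rows probably means listings were cut off, so the band is halved and each half is fetched separately.

```python
ROW_CAP = 40  # rows returned per request, per the limit noted above


def collect_band(fetch, min_price, max_price, cap=ROW_CAP):
    """Fetch listings priced in [min_price, max_price], splitting full bands in half.

    `fetch` is a hypothetical callable: fetch(min_price, max_price) -> list of listings.
    """
    rows = fetch(min_price, max_price)
    # Stop if the band is not full, or if it is a single price and can't be split
    if len(rows) < cap or max_price - min_price < 1:
        return rows
    mid = (min_price + max_price) // 2
    return (collect_band(fetch, min_price, mid, cap)
            + collect_band(fetch, mid + 1, max_price, cap))
```

Every split costs extra requests, which count against the rate limit and the monthly quota, so it is still worth starting from reasonable bands.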

In [36]:
df_z_for_sale_375001_400000 = df_z_for_sale

In [38]:
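# Each DataFrame below was created by re-running the query cell above with a
# different minPrice/maxPrice and assigning the result to a new name.
# index=False keeps the row index from being written as an extra column.
df_z_for_sale_0_100000.to_csv('df_z_for_sale_0_100000.csv', index=False)
df_z_for_sale_100001_200000.to_csv('df_z_for_sale_100001_200000.csv', index=False)
df_z_for_sale_200001_300000.to_csv('df_z_for_sale_200001_300000.csv', index=False)
df_z_for_sale_300001_350000.to_csv('df_z_for_sale_300001_350000.csv', index=False)
df_z_for_sale_350001_375000.to_csv('df_z_for_sale_350001_375000.csv', index=False)
df_z_for_sale_375001_400000.to_csv('df_z_for_sale_375001_400000.csv', index=False)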
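Instead of six separate files, the per-band DataFrames can be stacked into one table. This is a sketch; `combine_bands` and the output file name are hypothetical. It drops repeated `zpid` values (Zillow's property ID) in case a listing shows up in more than one search, for example if its price changed between runs.

```python
import pandas as pd


def combine_bands(frames):
    """Stack per-band DataFrames and keep the first row for each zpid."""
    combined = pd.concat(frames, ignore_index=True)
    return combined.drop_duplicates(subset="zpid").reset_index(drop=True)


# Tiny stand-ins for two of the per-band DataFrames above
band_a = pd.DataFrame({"zpid": ["53455833", "50117060"], "price": [389000, 384500]})
band_b = pd.DataFrame({"zpid": ["50117060", "6377140"], "price": [384500, 389000]})

df_all = combine_bands([band_a, band_b])
df_all.to_csv("df_z_for_sale_all.csv", index=False)  # hypothetical file name
```

With the real data, the list passed to `combine_bands` would be the six `df_z_for_sale_*` DataFrames.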

