## **Step 1: Necessary Packages**

Before we start, make sure the libraries needed to scrape Zillow listings through the Scrapeak API and analyze the results are installed (`getpass`, used later to enter the API key, is part of the Python standard library):

- **requests**: To interact with web resources
- **pandas**: For data manipulation
- **plotly**: For visualizations

In [None]:
!pip install requests pandas plotly -q

In [1]:
import requests
import pandas as pd
import getpass

In [15]:
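# Function to get listing data from the Scrapeak API
def get_zillow_listings(api_key, listing_url, timeout=60):
    """Request one Zillow search URL through Scrapeak and return the raw response."""
    api_url = "https://app.scrapeak.com/v1/scrapers/zillow/listing"
    params = {"api_key": api_key, "url": listing_url}
    # A timeout (in seconds) keeps the notebook from hanging if the API never answers
    response = requests.get(api_url, params=params, timeout=timeout)
    return response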
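To check what `get_zillow_listings` actually sends, the query string can be built offline. requests encodes a `params` dict essentially with `urllib.parse.urlencode(..., doseq=True)`, so the sketch below uses that directly as an approximation (`preview_request_url` is a hypothetical helper and `MY_KEY` a placeholder). It also shows that a list value becomes repeated `url=` keys in one request rather than several requests:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API_URL = "https://app.scrapeak.com/v1/scrapers/zillow/listing"


def preview_request_url(api_key, listing_url):
    """Approximate the GET URL that requests builds from the `params` dict."""
    # doseq=True expands list values into repeated keys, as requests does
    query = urlencode({"api_key": api_key, "url": listing_url}, doseq=True)
    return f"{API_URL}?{query}"


# Placeholder key; never hard-code a real API key in a notebook
single = preview_request_url("MY_KEY", "https://www.zillow.com/houston-tx/")
multi = preview_request_url("MY_KEY", ["https://a.example/", "https://b.example/"])

print(single)  # the Zillow URL is percent-encoded into one `url` value

# A list value turns into two `url=` keys inside a single request
print(parse_qs(urlsplit(multi).query)["url"])
# → ['https://a.example/', 'https://b.example/']
```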

In [16]:
def extract_listings(response):
    """Return single-family listings from a Scrapeak response as a list of dicts."""
    listings = []
    try:
        search_results = response.json()['data']['cat1']['searchResults']['listResults']
    except (ValueError, KeyError, TypeError) as e:
        # Invalid JSON, an error payload, or a missing key along the expected path
        print(f"⚠️ Error extracting data: {e}")
        return listings

    for x in search_results:
        # Not every result has hdpData/homeInfo; .get() skips those instead of crashing
        home_info = (x.get('hdpData') or {}).get('homeInfo') or {}
        if home_info.get('homeType') != 'SINGLE_FAMILY':
            continue
        lat_long = x.get('latLong') or {}
        listings.append({
            'zpid': x.get('zpid'),
            'detailUrl': x.get('detailUrl'),
            'imgSrc': x.get('imgSrc'),
            'price': x.get('unformattedPrice'),
            'address': x.get('address'),
            'beds': x.get('beds'),
            'baths': x.get('baths'),
            'area': x.get('area'),
            'homeType': home_info.get('homeType'),
            'latitude': lat_long.get('latitude'),
            'longitude': lat_long.get('longitude'),
            'zestimate': home_info.get('zestimate'),
            'rentZestimate': home_info.get('rentZestimate'),
            'daysOnZillow': home_info.get('daysOnZillow'),
            'priceChange': home_info.get('priceChange'),
            'datePriceChanged': home_info.get('datePriceChanged'),
            'availabilityDate': x.get('availabilityDate'),
            'marketingTreatments': x.get('marketingTreatments'),
        })
    return listings
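As an alternative to listing every field by hand (not what this notebook does), `pandas.json_normalize` can flatten the nested result dicts into dotted column names such as `hdpData.homeInfo.homeType`. The payload below is a hypothetical, heavily trimmed stand-in for `data['cat1']['searchResults']['listResults']`:

```python
import pandas as pd

# Hypothetical, trimmed payload shaped like the path used in extract_listings
sample_payload = {
    "data": {"cat1": {"searchResults": {"listResults": [
        {"zpid": "1", "unformattedPrice": 250000, "beds": 4,
         "hdpData": {"homeInfo": {"homeType": "SINGLE_FAMILY", "rentZestimate": 2159.0}}},
        {"zpid": "2", "unformattedPrice": 410000, "beds": 2,
         "hdpData": {"homeInfo": {"homeType": "CONDO"}}},
        {"zpid": "3", "unformattedPrice": 199000, "beds": 3},  # no hdpData at all
    ]}}}
}

results = sample_payload["data"]["cat1"]["searchResults"]["listResults"]

# Keep single-family homes; the .get() chain tolerates results without hdpData
single_family = [
    r for r in results
    if (r.get("hdpData") or {}).get("homeInfo", {}).get("homeType") == "SINGLE_FAMILY"
]

# Nested dicts become dotted column names, e.g. "hdpData.homeInfo.rentZestimate"
flat = pd.json_normalize(single_family)
print(flat.columns.tolist())
```

Hand-picking fields, as `extract_listings` does, gives stable column names; `json_normalize` is quicker for exploring which fields exist.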


In [17]:
# List of Zillow URLs to scrape
urls = [
    "https://www.zillow.com/houston-tx/?searchQueryState=%7B%22isMapVisible%22%3Atrue%2C%22mapBounds%22%3A%7B%22north%22%3A30.155416657868145%2C%22south%22%3A29.478655994984194%2C%22east%22%3A-94.88464413085939%2C%22west%22%3A-95.96405086914064%7D%2C%22usersSearchTerm%22%3A%22Houston%2C%20TX%22%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22globalrelevanceex%22%7D%7D%2C%22isListVisible%22%3Atrue%2C%22category%22%3A%22cat1%22%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A39051%2C%22regionType%22%3A6%7D%5D%7D",
    "https://www.zillow.com/san-francisco-ca/?searchQueryState=%7B%22isMapVisible%22%3Atrue%2C%22mapBounds%22%3A%7B%22north%22%3A37.85232054612083%2C%22south%22%3A37.69818302232692%2C%22east%22%3A-122.29840365771484%2C%22west%22%3A-122.56825534228516%7D%2C%22usersSearchTerm%22%3A%22San%20Francisco%2C%20CA%22%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22globalrelevanceex%22%7D%7D%2C%22isListVisible%22%3Atrue%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A20330%2C%22regionType%22%3A6%7D%5D%2C%22mapZoom%22%3A12%7D"
]
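Each Zillow search URL carries its filters as percent-encoded JSON in the `searchQueryState` parameter. Decoding it shows the map bounds, search term, and region a URL covers, which helps confirm it targets the intended area. A standard-library sketch (`decode_search_state` is a hypothetical helper) applied to the Houston URL above; its `"category": "cat1"` appears to match the `cat1` key that `extract_listings` reads, though that mapping is inferred, not documented here:

```python
import json
from urllib.parse import parse_qs, urlsplit

# The Houston search URL from the list above, split across lines for readability
houston_url = (
    "https://www.zillow.com/houston-tx/?searchQueryState="
    "%7B%22isMapVisible%22%3Atrue%2C%22mapBounds%22%3A%7B%22north%22%3A30.155416657868145"
    "%2C%22south%22%3A29.478655994984194%2C%22east%22%3A-94.88464413085939"
    "%2C%22west%22%3A-95.96405086914064%7D%2C%22usersSearchTerm%22%3A%22Houston%2C%20TX%22"
    "%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22globalrelevanceex%22%7D%7D"
    "%2C%22isListVisible%22%3Atrue%2C%22category%22%3A%22cat1%22"
    "%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A39051%2C%22regionType%22%3A6%7D%5D%7D"
)


def decode_search_state(zillow_url):
    """Return the JSON search state embedded in a Zillow search URL as a dict."""
    query = parse_qs(urlsplit(zillow_url).query)
    # parse_qs already percent-decodes the value, leaving a plain JSON string
    return json.loads(query["searchQueryState"][0])


state = decode_search_state(houston_url)
print(state["usersSearchTerm"])   # → Houston, TX
print(state["mapBounds"])
print(state["category"], state["regionSelection"])
```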

In [18]:
# Ask for API key
api_key = getpass.getpass("🔑 Enter your Scrapeak API key: ")

🔑 Enter your Scrapeak API key: ··········


In [20]:
# Test one URL first; passing the whole `urls` list would send repeated `url` query parameters
response = get_zillow_listings(api_key, urls[0])

In [21]:
# view keys
response.json().keys()

dict_keys(['is_success', 'data', 'message', 'info'])

In [22]:
len(response.json()['data']['cat1']['searchResults']['listResults'])

41

In [23]:
# Scrape all URLs and combine into one list
all_listings = []
for url in urls:
    print(f"📡 Scraping: {url}")
    response = get_zillow_listings(api_key, url)
    listings = extract_listings(response)
    all_listings.extend(listings)
    print(f"✅ Found {len(listings)} listings.")

📡 Scraping: https://www.zillow.com/houston-tx/?searchQueryState=%7B%22isMapVisible%22%3Atrue%2C%22mapBounds%22%3A%7B%22north%22%3A30.155416657868145%2C%22south%22%3A29.478655994984194%2C%22east%22%3A-94.88464413085939%2C%22west%22%3A-95.96405086914064%7D%2C%22usersSearchTerm%22%3A%22Houston%2C%20TX%22%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22globalrelevanceex%22%7D%7D%2C%22isListVisible%22%3Atrue%2C%22category%22%3A%22cat1%22%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A39051%2C%22regionType%22%3A6%7D%5D%7D
✅ Found 39 listings.
📡 Scraping: https://www.zillow.com/san-francisco-ca/?searchQueryState=%7B%22isMapVisible%22%3Atrue%2C%22mapBounds%22%3A%7B%22north%22%3A37.85232054612083%2C%22south%22%3A37.69818302232692%2C%22east%22%3A-122.29840365771484%2C%22west%22%3A-122.56825534228516%7D%2C%22usersSearchTerm%22%3A%22San%20Francisco%2C%20CA%22%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22globalrelevanceex%22%7D%7D%2C%22isListVisible%22%3Atrue%2C%22regionSel

In [24]:
# Convert to DataFrame
df = pd.DataFrame(all_listings)
print(f"\n📊 Total listings collected: {len(df)}")
df.head(3)


📊 Total listings collected: 66


,zpid,detailUrl,imgSrc,price,address,beds,baths,area,homeType,latitude,longitude,zestimate,rentZestimate,daysOnZillow,priceChange,datePriceChanged,availabilityDate,marketingTreatments
0,28150253,https://www.zillow.com/homedetails/5934-Miller...,https://photos.zillowstatic.com/fp/4ef050c4264...,250000,"5934 Miller Valley Dr, Houston, TX 77066",4,3.0,2312,SINGLE_FAMILY,29.975105,-95.51731,243200.0,2159.0,5,,,2025-07-17 00:00:00,
1,28308613,https://www.zillow.com/homedetails/10534-Twili...,https://photos.zillowstatic.com/fp/5425b61698f...,279000,"10534 Twilight Moon Dr, Houston, TX 77064",3,2.0,1650,SINGLE_FAMILY,29.924118,-95.573845,273500.0,1862.0,6,,,,
2,28005745,https://www.zillow.com/homedetails/10521-Cathe...,https://photos.zillowstatic.com/fp/42891cc1ad4...,185900,"10521 Cathedral Dr, Houston, TX 77051",4,2.0,1368,SINGLE_FAMILY,29.64514,-95.37496,188000.0,1389.0,36,-10000.0,1753081000000.0,,
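`datePriceChanged` (1753081000000.0 in row 2 above) looks like a Unix timestamp in milliseconds; that is an assumption based on its magnitude, not something the output documents. If it holds, `pd.to_datetime(..., unit="ms")` makes it readable. The value below is copied from the display, which may be rounded:

```python
import pandas as pd

# zpids and datePriceChanged copied from the head(3) output above
sample = pd.DataFrame({
    "zpid": [28150253, 28308613, 28005745],
    "datePriceChanged": [float("nan"), float("nan"), 1753081000000.0],
})

# Assumes epoch milliseconds; missing values become NaT
sample["datePriceChanged"] = pd.to_datetime(sample["datePriceChanged"], unit="ms", utc=True)
print(sample)
```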


In [25]:
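# Save to CSV in the current working directory
df.to_csv("zillow_listings.csv", index=False)

# Download to your computer (google.colab only exists inside Google Colab)
try:
    from google.colab import files
    files.download("zillow_listings.csv")
except ImportError:
    print("Not running in Colab; zillow_listings.csv is in the working directory.")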



In [26]:
df.groupby(['beds']).agg({'zpid': 'count', 'price': 'mean'}).reset_index()

,beds,zpid,price
0,2,7,985571.4
1,3,19,805462.5
2,4,27,669462.8
3,5,12,883125.0
4,10,1,26000000.0
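The mean is sensitive to outliers such as the single 10-bed, $26,000,000 listing, and the `zpid` column above actually holds a count. pandas named aggregation can give each output column a readable name and add a median next to the mean. The toy prices below are illustrative only, not scraped values:

```python
import pandas as pd

# Illustrative data; the 9,000,000 price plays the role of an outlier
df_demo = pd.DataFrame({
    "zpid": [1, 2, 3, 4, 5, 6],
    "beds": [4, 4, 4, 5, 5, 5],
    "price": [250000, 279000, 185900, 800000, 850000, 9000000],
})

summary = (
    df_demo.groupby("beds")
    .agg(listings=("zpid", "count"),
         mean_price=("price", "mean"),
         median_price=("price", "median"))
    .reset_index()
)
print(summary)
```

In the 5-bed group the mean (3,550,000) is far above the median (850,000), which is the gap the median is meant to expose.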


In [27]:
# create features
df_features = df.copy()
df_features['price_per_sqft'] = df_features['price'] / df_features['area']
# Gross annual yield: 12 months of estimated rent (rentZestimate) divided by list price
df_features['rent_to_price_ratio'] = (df_features['rentZestimate'] * 12) / df_features['price']
# 1% rule: estimated monthly rent is at least 1% of the list price (missing rent -> False)
df_features['one_percent_rule'] = (df_features['rentZestimate'] / df_features['price']) >= 0.01
df_features.head(2)

,zpid,detailUrl,imgSrc,price,address,beds,baths,area,homeType,latitude,...,zestimate,rentZestimate,daysOnZillow,priceChange,datePriceChanged,availabilityDate,marketingTreatments,price_per_sqft,rent_to_price_ratio,one_percent_rule
0,28150253,https://www.zillow.com/homedetails/5934-Miller...,https://photos.zillowstatic.com/fp/4ef050c4264...,250000,"5934 Miller Valley Dr, Houston, TX 77066",4,3.0,2312,SINGLE_FAMILY,29.975105,...,243200.0,2159.0,5,,,2025-07-17 00:00:00,,108.131488,0.103632,False
1,28308613,https://www.zillow.com/homedetails/10534-Twili...,https://photos.zillowstatic.com/fp/5425b61698f...,279000,"10534 Twilight Moon Dr, Houston, TX 77064",3,2.0,1650,SINGLE_FAMILY,29.924118,...,273500.0,1862.0,6,,,,,169.090909,0.080086,False


In [12]:
df_features.groupby(['beds']).agg({'zpid': 'count', 'price': 'mean'}).reset_index()

,beds,zpid,price
0,2,7,985571.428571
1,3,19,805462.526316
2,4,28,700910.535714
3,5,12,883125.0


In [28]:
df_features.tail(4)

,zpid,detailUrl,imgSrc,price,address,beds,baths,area,homeType,latitude,...,zestimate,rentZestimate,daysOnZillow,priceChange,datePriceChanged,availabilityDate,marketingTreatments,price_per_sqft,rent_to_price_ratio,one_percent_rule
62,15181516,https://www.zillow.com/homedetails/4258-26th-S...,https://photos.zillowstatic.com/fp/524cb1fa3f9...,5495000,"4258 26th St, San Francisco, CA 94131",5,5.0,3854,SINGLE_FAMILY,37.748222,...,,21144.0,4,,,,,1425.791386,0.046174,False
63,15142447,https://www.zillow.com/homedetails/1035-Natoma...,https://photos.zillowstatic.com/fp/e2f2b4bf1a5...,1195000,"1035 Natoma St, San Francisco, CA 94103",3,2.0,1370,SINGLE_FAMILY,37.77299,...,,,13,,,,,872.262774,,False
64,15152616,https://www.zillow.com/homedetails/1338-York-S...,https://photos.zillowstatic.com/fp/b6803a1d766...,1088999,"1338 York St, San Francisco, CA 94110",3,1.0,1446,SINGLE_FAMILY,37.75059,...,,4988.0,13,,,,,753.111342,0.054964,False
65,15080791,https://www.zillow.com/homedetails/2898-Broadw...,https://photos.zillowstatic.com/fp/727bccc7a89...,26000000,"2898 Broadway St, San Francisco, CA 94115",10,7.0,11155,SINGLE_FAMILY,37.793377,...,,7706.0,79,,,,,2330.793366,0.003557,False
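A small reusable function makes the two rent metrics explicit: gross yield is 12 * monthly rent / price, and the 1% rule asks whether monthly rent is at least 1% of the price. This sketch assumes Zillow's `rentZestimate` is a reasonable stand-in for monthly rent; the demo rows reuse price/rentZestimate pairs from rows 0, 1 and 63 of the output above, and `rent_needed_for_1pct` is a hypothetical helper column:

```python
import numpy as np
import pandas as pd


def add_rental_metrics(frame):
    """Return a copy of `frame` with gross-yield and 1%-rule columns.

    Assumes `price` is the list price and `rentZestimate` the monthly rent.
    Rows without a rent estimate get NaN for the ratio and False for the rule.
    """
    out = frame.copy()
    monthly_rent = out["rentZestimate"]
    out["rent_to_price_ratio"] = (monthly_rent * 12) / out["price"]
    out["one_percent_rule"] = monthly_rent >= out["price"] / 100
    # Monthly rent that would be needed to pass the 1% rule at this price
    out["rent_needed_for_1pct"] = out["price"] / 100
    return out


demo = pd.DataFrame({"price": [250000, 279000, 1195000],
                     "rentZestimate": [2159.0, 1862.0, np.nan]})
metrics = add_rental_metrics(demo)
print(metrics)
```

None of the three pass: the first home would need $2,500 a month against an estimate of $2,159.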


In [31]:
df.shape

(66, 18)

In [None]:
# Save the feature table to a CSV file (e.g. to open it in Google Sheets or Excel)
df_features.to_csv('sample_rental_listings.csv', index=False)


## **Possible Enhancements**

### 1. **Data Accuracy & Enrichment**
   - **APIs for Additional Data**: Consider integrating more data sources, such as property history and owner financing information, to enrich your dataset and make more informed decisions.
   - **Skip Tracing Services**: Once you’ve identified potential leads, using a skip tracing service to find the landlord's contact information can streamline your outreach efforts.
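Whatever extra source you add, joining on `zpid` keeps the enrichment aligned with each listing. A minimal sketch with a hypothetical property-history table (its columns `last_sale_year` and `times_sold` are placeholders, not a real provider's schema); `validate` guards against duplicate keys and `indicator` shows which listings found a match:

```python
import pandas as pd

# zpids and prices from the scraped output; history rows are hypothetical
listings = pd.DataFrame({"zpid": [28150253, 28308613], "price": [250000, 279000]})
history = pd.DataFrame({"zpid": [28150253], "last_sale_year": [2012], "times_sold": [2]})

enriched = listings.merge(
    history,
    on="zpid",
    how="left",               # keep every listing, even without history
    validate="one_to_one",    # raise if either side repeats a zpid
    indicator=True,           # adds a _merge column: "both" or "left_only"
)
print(enriched)
```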

### 2. **Automating Data Retrieval**
   - **Scheduling the Scrape**: Instead of manually running this notebook, you could set up a process to scrape rental listings at regular intervals using a cloud service (e.g., AWS Lambda or Google Cloud Functions) to stay updated with new leads.
   - **Web Scraping with Automations**: If the Scrapeak API becomes restrictive or costly, tools like **Browse AI** or **PhantomBuster** can scrape listings dynamically; whichever tool you use, review Zillow's terms of service before scraping.
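Whichever scheduler runs the scrape, each run has to work out what changed since the last one. A minimal sketch, assuming each run's results are kept as a DataFrame with `zpid` and `price` columns like `df`; the snapshots are hypothetical, and `find_new_listings` and `find_price_cuts` are illustrative names:

```python
import pandas as pd


def find_new_listings(previous, current, key="zpid"):
    """Rows in `current` whose key did not appear in `previous`."""
    return current[~current[key].isin(set(previous[key]))]


def find_price_cuts(previous, current, key="zpid"):
    """Listings present in both snapshots whose price went down."""
    merged = current.merge(previous[[key, "price"]], on=key, suffixes=("", "_prev"))
    return merged[merged["price"] < merged["price_prev"]]


# Hypothetical snapshots from two scheduled runs
yesterday = pd.DataFrame({"zpid": [1, 2, 3], "price": [250000, 279000, 185900]})
today = pd.DataFrame({"zpid": [2, 3, 4], "price": [279000, 175900, 320000]})

new_rows = find_new_listings(yesterday, today)
cuts = find_price_cuts(yesterday, today)
print(new_rows)
print(cuts)
```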

### 3. **Enhanced Analysis**
   - **Sentiment Analysis on Listings**: You could analyze the text of listing descriptions to detect motivated sellers; phrases like "price negotiable" can hint at seller motivation. The description text is not among the fields `extract_listings` keeps, so it would need to be collected separately.
   - **Time-on-Market Analysis**: Track time on market (`daysOnZillow`) over several months to detect trends in particular neighborhoods, indicating areas where landlords are struggling to rent out properties.
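Both ideas can start from simple pandas string operations. The sketch below uses a keyword match rather than a trained sentiment model and assumes a hypothetical `description` column (not produced by `extract_listings`); the addresses and `daysOnZillow` values come from the Houston rows shown earlier, while the phrase list and description text are made up:

```python
import re

import pandas as pd

# Hypothetical phrase list; a keyword heuristic, not a sentiment model
MOTIVATED_PHRASES = ["price negotiable", "motivated seller", "must sell", "bring all offers"]
PATTERN = "|".join(re.escape(p) for p in MOTIVATED_PHRASES)


def flag_motivated(descriptions):
    """True where a description contains any phrase (case-insensitive)."""
    return descriptions.fillna("").str.contains(PATTERN, case=False, regex=True)


def median_days_by_zip(frame):
    """Median daysOnZillow per 5-digit ZIP taken from the end of the address."""
    zips = frame["address"].str.extract(r"(\d{5})$", expand=False)
    return frame.assign(zip=zips).groupby("zip")["daysOnZillow"].median()


demo = pd.DataFrame({
    "address": ["5934 Miller Valley Dr, Houston, TX 77066",
                "10534 Twilight Moon Dr, Houston, TX 77064",
                "10521 Cathedral Dr, Houston, TX 77051"],
    "daysOnZillow": [5, 6, 36],
    "description": ["Price negotiable, bring all offers!", None, "Freshly updated kitchen."],
})

print(flag_motivated(demo["description"]).tolist())  # → [True, False, False]
print(median_days_by_zip(demo))
```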

### 4. **Contact Management & Outreach**
   - **CRM Integration**: Once you have a list of tired-landlord leads, consider loading the data into a CRM like **GoHighLevel** or **HubSpot**, where you can set up automated email or SMS campaigns targeting those landlords.
   - **Personalization in Outreach**: The more personalized your outreach, the better. Consider using the listing details in your email or SMS templates to make your offers more appealing.
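For personalization, Python's standard-library `string.Template` can fill listing fields into a message. A sketch with a hypothetical template whose `$` placeholders match the DataFrame's column names; `safe_substitute` leaves a missing placeholder in place instead of raising, so incomplete rows still render:

```python
from string import Template

# Hypothetical outreach template; placeholders match listing column names
OUTREACH = Template(
    "Hi, I noticed your $beds-bed home at $address has been on the market "
    "for $daysOnZillow days. Would you be open to a direct offer?"
)


def render_message(listing):
    """Fill the template from a dict, e.g. df.iloc[0].to_dict()."""
    return OUTREACH.safe_substitute(listing)


# Field values from row 2 of the scraped output
msg = render_message({"beds": 4,
                      "address": "10521 Cathedral Dr, Houston, TX 77051",
                      "daysOnZillow": 36})
print(msg)
```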

### 5. **Predictive Modeling**
   - **Machine Learning for Lead Scoring**: You could build a predictive model that scores each property based on how likely the landlord is to sell. Inputs could include time on market, number of price reductions, and rental market conditions in the area.
   - **Market Demand Prediction**: Predict future rental demand in certain areas using external data such as employment trends, population growth, and new developments.
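Before training a model (which needs labelled outcomes, such as which owners actually sold), a transparent hand-weighted score can rank leads from the inputs mentioned above. This is a heuristic baseline, not machine learning: the 0.6/0.4 weights are arbitrary assumptions, and it assumes `priceChange` is negative for a price reduction, as in row 2 of the earlier output (-10000.0). `heuristic_lead_score` is a hypothetical name:

```python
import pandas as pd


def heuristic_lead_score(frame, w_days=0.6, w_cut=0.4):
    """Score each listing from 0 to 1 using days on market and relative price cut.

    Both inputs are min-max style normalized, then combined with fixed weights.
    """
    days = frame["daysOnZillow"].astype(float)
    span = days.max() - days.min()
    days_norm = (days - days.min()) / span if span > 0 else days * 0.0

    # A negative priceChange is a reduction; express it as a fraction of price
    cut_frac = (-frame["priceChange"].fillna(0)).clip(lower=0) / frame["price"]
    cut_norm = cut_frac / cut_frac.max() if cut_frac.max() > 0 else cut_frac

    return w_days * days_norm + w_cut * cut_norm


# Prices, days on market and price change from rows 0-2 of the scraped output
demo = pd.DataFrame({"price": [250000, 279000, 185900],
                     "daysOnZillow": [5, 6, 36],
                     "priceChange": [None, None, -10000.0]})
scores = heuristic_lead_score(demo)
print(scores)
```

If outcome labels are ever collected, these same inputs could become features for a supervised model.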

---

Adding some or all of these enhancements can turn this one-off scrape into a repeatable lead-generation workflow for finding off-market real estate deals.

# End Notebook

In [None]:
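# Investigate the structure of the data returned by the API call
# (if the cells ran in order, `response` is the last one fetched in the loop: the San Francisco URL)
data_response = response.json()['data']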
print(f"Type of response.json()['data']: {type(data_response)}")
if isinstance(data_response, dict):
    print(f"Keys in response.json()['data']: {data_response.keys()}")

Type of response.json()['data']: <class 'dict'>
Keys in response.json()['data']: dict_keys(['user', 'mapState', 'regionState', 'searchPageSeoObject', 'requestId', 'cat1', 'categoryTotals'])
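To find paths like `cat1.searchResults.listResults` without printing the whole payload, a small recursive helper can list nested key paths. A sketch (`collect_key_paths` is a hypothetical helper; lists are summarized by their first element) demonstrated on a trimmed, made-up payload; on the real response it could be called as `collect_key_paths(response.json()['data'])`:

```python
def collect_key_paths(obj, prefix="", max_depth=6):
    """Return dotted key paths in nested dicts/lists, up to max_depth levels."""
    paths = []
    if max_depth == 0:
        return paths
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            paths.append(path)
            paths.extend(collect_key_paths(value, path, max_depth - 1))
    elif isinstance(obj, list) and obj:
        # Only the first element is inspected, assuming list items share a shape
        path = f"{prefix}[0]"
        paths.append(path)
        paths.extend(collect_key_paths(obj[0], path, max_depth - 1))
    return paths


# Made-up payload echoing two of the keys printed above
sample = {
    "cat1": {"searchResults": {"listResults": [{"zpid": "1", "hdpData": {"homeInfo": {}}}]}},
    "requestId": 7,
}
paths = collect_key_paths(sample)
for p in paths:
    print(p)
```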


In [None]:
print(f"The DataFrame has {df.shape[0]} rows.")