## Web Scraping Zillow Part 1

### Project Overview

In this mini-project we’ll demonstrate a streamlined workflow for extracting real-estate listings directly from Zillow. Using `requests` to fetch each results page and **BeautifulSoup** for HTML parsing, we’ll harvest key attributes—address, price, beds, baths, and square footage—and load them into a **pandas** DataFrame for analysis. The goal is to keep the scraping logic lightweight and transparent while laying the groundwork for more advanced features (pagination, error handling, and API interception) that can be added later.


## I. Preparation

First, import the core libraries—**pandas** and **NumPy**—to handle and prepare the data. We’ll then use **Requests** to fetch the webpage and **BeautifulSoup** to parse and display the HTML content.

In [213]:
## Necessary Libraries
import pandas as pd 
import numpy as np 

## Reading the HTML
import requests 
from bs4 import BeautifulSoup

Next, assign the target URL to a variable so you can reference it throughout the script.

In [214]:
url = 'https://www.zillow.com/sacramento-ca/'

Many sites reject traffic that looks automated, so we’ll add a realistic `User-Agent` (and any other needed headers) to make the request resemble a normal browser visit. After sending the request with `requests.get()`, we’ll check `response.status_code` to confirm the page loaded successfully and handle any errors.


In [215]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br, zstd",
    "Cookie": "zgcus_aeut=AEUUT_f8285d35-d62a-11ef-84f9-3685c7040b65; zgcus_aeuut=AEUUT_f8285d35-d62a-11ef-84f9-3685c7040b65; optimizelyEndUserId=oeu1748681974282r0.7023196952060992; zguid=24|%245fcea5b3-17e1-4f31-9bec-46406835c8f5; zjs_anonymous_id=%225fcea5b3-17e1-4f31-9bec-46406835c8f5%22; zjs_user_id=null; zg_anonymous_id=%221582b5e4-ab92-4629-a63d-3dab7241331a%22; _pxvid=92193fe5-3dfd-11f0-9e08-aa8e85a15028; _ga=GA1.2.855722002.1748681975; zjs_user_id_type=%22encoded_zuid%22; _gcl_au=1.1.1107335690.1748681977; _scid=T0MaaKLYvgqlo8mGS15e80zY1uQsPeOUBkZrMg; _tt_enable_cookie=1; _ttp=01JWJS2VTA2DJWZVC3KWWBQYCJ_.tt.1; _pin_unauth=dWlkPU1qaGxZMk5sWmpjdE1HSXdOaTAwTTJSakxXRXpaREl0WldFME1ETXdOekZsTVRGbQ; _fbp=fb.1.1748681977986.1014500896111575; optimizelySession=1748682064919; _gid=GA1.2.983266200.1752018609; _ScCbts=%5B%22236%3Bchrome.2%3A2%3A5%22%5D; _sctr=1%7C1751958000000; _clck=2ftutl%7C2%7Cfxg%7C0%7C1977; _lr_env_src_ats=false; _lr_sampling_rate=100; zgsession=1|24c48bb5-9fd6-4c0d-9100-dc987418c2a6; pxcts=368d501c-5cf3-11f0-aa03-c6ecd6c7b9a5; DoubleClickSession=true; AWSALB=0zulJC/UDPYSurW5e8itP94hdXrvtAGwRYotCYIimBuY1dT9mR2tMAh+/O2x4bIFYfn8yUcWDyqoMrdvB4LDVywzun5Vd05D5XvHEtTDl6Pua6q4wjsMOCELqxF+; AWSALBCORS=0zulJC/UDPYSurW5e8itP94hdXrvtAGwRYotCYIimBuY1dT9mR2tMAh+/O2x4bIFYfn8yUcWDyqoMrdvB4LDVywzun5Vd05D5XvHEtTDl6Pua6q4wjsMOCELqxF+; JSESSIONID=531AEFE98F2F210DD11A6061327BD095; connectId=%7B%22puid%22%3A%22c13d67b00ee65378ea66a4efe20714a232295878014dc419589dcfeef2baa883%22%2C%22vmuid%22%3A%22EMtNg3cAVC12XuUKowM8SnsqgaQQwkmDb8VljAb_SJmwfNw33JqP_uhh7FXRyiZZ8AOfs-DEKJBRdfjJc8FUoQ%22%2C%22connectid%22%3A%22EMtNg3cAVC12XuUKowM8SnsqgaQQwkmDb8VljAb_SJmwfNw33JqP_uhh7FXRyiZZ8AOfs-DEKJBRdfjJc8FUoQ%22%2C%22connectId%22%3A%22EMtNg3cAVC12XuUKowM8SnsqgaQQwkmDb8VljAb_SJmwfNw33JqP_uhh7FXRyiZZ8AOfs-DEKJBRdfjJc8FUoQ%22%2C%22ttl%22%3A86400000%2C%22lastSynced%22%3A1752019802305%2C%22lastUsed%22%3A1752090425606%7D; _lr_retry_request=true; "
              "_scid_r=V0MaaKLYvgqlo8mGS15e80zY1uQsPeOUBkZrOg; _rdt_uuid=1748681977097.4fc67117-4812-4301-9a10-8145dd2d7a31; _uetsid=48adfea05c5611f0b4897d0926926b7e; _uetvid=f97b4a30d62a11efb61c0b9de1eb7738; _px3=3c01ba1b5a8c320ce70ce5efd72ac83951a7b307752b486c0313e8c745cc0325:Xad+q0p70YTSfJuXFIM3G93kuWAma141w5QWiEmkUPr//vQYfew7u4kTh323Ukk6tjOZn52caGfAsXnoM/SOIw==:1000:IaAO7tLVSMABl+2Vt4pRqL2K//3xSAVyjjLC3iMYB7geVZE3FMndi/IDyaWOXtOkpv0wVXfzs2MiFO07+Wt6Y3iPe1LcEAyCq/WsIATdvb/fAHAi429iS+ce8P/A1666N6hBMgH6UsTMXZB7JbciDYvGbgxHdiWXTwfX7JKiVDaRLXTGjq0wRQuUNwlcXPSez2KOHqpDhsqXEnQNoK7RT29hYXPqSGj8Jo27sCMhZAE=; tfpsi=53c6625c-5a80-41b4-b206-dc89e7dfcea3; ttcsid=1752092388150::UOWwNVdCEeAEwyEyMoYq.8.1752092874328; _clsk=127bt1u%7C1752092874786%7C3%7C0%7Cd.clarity.ms%2Fcollect; search=6|1754684872032%7Crect%3D39.18123381104551%2C-120.24362099609375%2C38.03192689431649%2C-122.68807900390625%26rid%3D20288%26disp%3Dmap%26mdm%3Dauto%26p%3D1%26listPriceActive%3D1%26lt%3Dfsba%2Cfsbo%2Cnew%2Ccmsn%26fs%3D1%26fr%3D0%26mmm%3D0%26rs%3D0%26singlestory%3D0%26housing-connector%3D0%26parking-spots%3Dnull-%26abo%3D0%26garage%3D0%26pool%3D0%26ac%3D0%26waterfront%3D0%26finished%3D0%26unfinished%3D0%26cityview%3D0%26mountainview%3D0%26parkview%3D0%26waterview%3D0%26hoadata%3D1%26zillow-owned%3D0%263dhome%3D0%26showcase%3D0%26featuredMultiFamilyBuilding%3D0%26onlyRentalStudentHousingType%3D0%26onlyRentalIncomeRestrictedHousingType%3D0%26onlyRentalMilitaryHousingType%3D0%26onlyRentalDisabledHousingType%3D0%26onlyRentalSeniorHousingType%3D0%26commuteMode%3Ddriving%26commuteTimeOfDay%3Dnow%09%0920288%09%7B%22isList%22%3Atrue%2C%22isMap%22%3Atrue%7D%09%09%09%09%09; ttcsid_CN5P33RC77UF9CBTPH9G=1752092388150::Je_O1JzH5MAHxu8gYyBm.8.1752092875944; __gads=ID=c83766f28519d0c9:T=1752019799:RT=1752092873:S=ALNI_MZBTIaWPhARLmWAuP9XUYKs_g_jqw; __gpi=UID=000010f89a9f7723:T=1752019799:RT=1752092873:S=ALNI_MZJPVsJlL9dZZRRCM5sK7V-YGs8Tw; __eoi=ID=c7dea6c347eed0eb:T=1752019799:RT=1752092873:S=AA-AfjZ_dIt-mlBMVfOiX8wRqtcC"
}

response = requests.get(url, headers=headers)

response.status_code

200

An HTTP **200 OK** status means the request succeeded and the page content is available—so we can proceed with parsing.
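
The status check above can also be done in code rather than by eye. The following is a minimal sketch, not part of the original notebook: `fetch_html` is a hypothetical helper name, and the 10-second timeout is an arbitrary assumption. It relies on `raise_for_status()` to turn 4xx/5xx responses into exceptions and catches `requests.RequestException`, the base class for errors raised by Requests.

```python
import requests


def fetch_html(url, headers=None, timeout=10):
    """Return the response for `url`, or None if the request fails.

    Hypothetical helper for illustration; the timeout value is an assumption.
    """
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        # Raise requests.HTTPError for 4xx/5xx status codes
        response.raise_for_status()
    except requests.RequestException as exc:
        # Covers connection errors, timeouts, invalid URLs, and HTTP errors
        print(f"Request failed: {exc}")
        return None
    return response
```

With this helper, `response = fetch_html(url, headers=headers)` would replace the bare `requests.get()` call, and the rest of the script would only run when `response is not None`.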


## II. Soup

Now that we have a 200 OK status, we can hand the page to **BeautifulSoup** for parsing and inspection. Before doing so, we decode the raw bytes ourselves. We take the character set from `response.encoding`, fall back to `response.apparent_encoding` (and finally to UTF-8) if the server did not declare one, and call `response.content.decode(..., errors="replace")` so that undecodable bytes are replaced instead of raising an error.


In [216]:
## Use the declared charset; fall back to the detected one, then to UTF-8
encod = response.encoding or response.apparent_encoding or "utf-8"
contents = response.content.decode(encod, errors="replace")

soup = BeautifulSoup(contents, 'html.parser')
soup.prettify()

'<!DOCTYPE html>\n<html lang="en">\n <head>\n  <script id="scripts.pfs.appInfo">\n   if (typeof Object.assign === \'function\') {\n            window.appInfo = Object.assign(\n                typeof window.appInfo === \'object\' ? window.appInfo : {},\n                {"@zillow/page-frame-content":"728621ce"}\n            );\n        }\n  </script>\n  <script id="scripts.client-profiler.config">\n   window.CLIENT_PROFILER_CONFIG = window.CLIENT_PROFILER_CONFIG || {\n            staticDimensions: {\n                ABDecisionToken: "WwERFZ_1753134191",\n            },\n        };\n  </script>\n  <script id="scripts.ua">\n   (function(i,s,o,g,r,a,m){i[\'GoogleAnalyticsObject\']=r;i[r]=i[r]||function(){\n        (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),\n        m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)\n        })(window,document,\'script\',\'https://www.google-analytics.com/analytics.js\',\'ga\');\n  </script>\n 

With the full HTML in hand, a quick sanity check is to read the page’s `<title>` element and confirm that it matches the page we requested, in this case **“Sacramento CA Real Estate.”**


In [217]:
titles = soup.title.text

print(titles)

Sacramento CA Real Estate - Sacramento CA Homes For Sale | Zillow
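
The title check above can be turned into a small programmatic guard. This is only a sketch: `title_contains` is a hypothetical helper, the sample HTML is a toy stand-in for the real response body, and the idea that a mismatched title signals a block page or redirect is an assumption rather than documented Zillow behavior.

```python
from bs4 import BeautifulSoup


def title_contains(html, expected):
    """Return True if the page's <title> contains `expected` (case-insensitive)."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is None when the document has no <title> element
    if soup.title is None:
        return False
    return expected.lower() in soup.title.get_text().lower()


# Toy HTML standing in for the real response body
sample = "<html><head><title>Sacramento CA Real Estate | Zillow</title></head></html>"
print(title_contains(sample, "Sacramento CA Real Estate"))
```

In the notebook, the same call could be made with `contents` as the first argument before any listing cards are extracted.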


Finally, we’ll loop through every property card, pull out the relevant fields, collect each record in a list, and build a DataFrame from that list. Wrapping the extraction logic in a `try / except` block lets the loop continue even if a particular card is malformed or a field is missing; in that case every field of that record is stored as `None`. Note that the long class names used below look auto-generated, so they will probably need updating whenever Zillow changes its markup.

In [218]:
## Column order for the final DataFrame (a list, unlike a set, keeps this order)
columns = ["price", "posted_note", "address", "detail", "link"]

## Find all listing cards
houses = soup.find_all("div", {"class":"StyledCard-c11n-8-109-3__sc-1w6p0lv-0 icLwbG StyledPropertyCardBody-c11n-8-109-3__sc-1danayh-0 iHqwT PropertyCardWrapper__StyledPropertyCardBody-srp-8-109-3__sc-16e8gqd-3 iSbqZf" })

## Loop through the listing cards and collect one record per card
records = []
for house in houses:
    try:
        address = house.find("address").get_text(strip=True)
        price = house.find("span", {"data-test":"property-card-price"}).get_text(strip=True)
        link = house.find("div", {"class":"StyledPhotoCarouselSlide-c11n-8-109-3__sc-jwte3-0 bYzbke"}).a["href"]
        posted = house.find("div", class_ = "StyledPropertyCardBadgeArea-c11n-8-109-3__sc-11omngf-0 izNoTZ").get_text(strip=True)
        detail = house.find("ul", {"class":"StyledPropertyCardHomeDetailsList-c11n-8-109-3__sc-1j0som5-0 dFYPvN"}).get_text(strip=True)
    except Exception:
        ## find() returns None for a missing tag, so any failure resets the whole record
        address = price = link = posted = detail = None

    records.append({"address": address, "price": price, "detail": detail, "link": link,
                    "posted_note": posted})

## Build the DataFrame once; DataFrame.append was removed in pandas 2.0
table1 = pd.DataFrame(records, columns=columns)

table1



| | price | posted_note | address | detail | link |
|---|---|---|---|---|---|
| 0 | $585,000 | 3D Tour | 1925 Woodstock Way, Sacramento, CA 95825 | 4bds3ba2,092sqft | https://www.zillow.com/homedetails/1925-Woodst... |
| 1 | $559,000 | Price cut: $6,000 (6/28) | 2312 Endeavor Way, Sacramento, CA 95834 | 4bds3ba2,148sqft | https://www.zillow.com/homedetails/2312-Endeav... |
| 2 | $595,000 | Private bath | 3312 Mas Amilos Way, Sacramento, CA 95835 | 4bds2ba2,078sqft | https://www.zillow.com/homedetails/3312-Mas-Am... |
| 3 | $574,950 | Price cut: $24,050 (7/16) | 3301 Myna Way, Sacramento, CA 95834 | 4bds3ba2,090sqft | https://www.zillow.com/homedetails/3301-Myna-W... |
| 4 | $564,999 | Upstairs laundry room | 560 Willie Hausey Way, Sacramento, CA 95838 | 4bds2.5ba2,370sqft | https://www.zillow.com/homedetails/560-Willie-... |
| 5 | $569,000 | Price cut: $5,900 (7/19) | 2560 Greg Jarvis Avenue, Sacramento, CA 95834 | 4bds4ba2,220sqft | https://www.zillow.com/homedetails/2560-Greg-J... |
| 6 | $555,000 | Beautifully landscaped yard | 7043 Catlen Way, Sacramento, CA 95831 | 4bds3ba1,804sqft | https://www.zillow.com/homedetails/7043-Catlen... |
| 7 | $599,900 | Modern flooring | 8053 Bothwell Dr, Sacramento, CA 95829 | 4bds2ba2,078sqft | https://www.zillow.com/homedetails/8053-Bothwe... |
| 8 | $650,000 | Showcase | 1956 4th Ave, Sacramento, CA 95818 | 2bds1ba965sqft | https://www.zillow.com/homedetails/1956-4th-Av... |
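
The `detail` column fuses beds, baths, and square footage into one string (for example `4bds2.5ba2,370sqft`), and `price` is still text. The sketch below shows one possible way to split them into numbers with the standard `re` module. The helper names `parse_detail` and `parse_price` are hypothetical, and the pattern assumes every string follows the `<beds>bds<baths>ba<sqft>sqft` shape seen in the table above and that prices are written out as full dollar amounts.

```python
import re

# Pattern for strings such as "4bds2.5ba2,370sqft" (shape assumed from the table above)
DETAIL_RE = re.compile(
    r"(?P<beds>\d+(?:\.\d+)?)\s*bds?\s*"
    r"(?P<baths>\d+(?:\.\d+)?)\s*ba\s*"
    r"(?P<sqft>[\d,]+)\s*sqft",
    re.IGNORECASE,
)


def parse_detail(text):
    """Split a fused detail string into beds, baths, and square footage."""
    match = DETAIL_RE.search(text or "")
    if match is None:
        # Unknown format or missing value: keep the fields empty
        return {"beds": None, "baths": None, "sqft": None}
    return {
        "beds": float(match["beds"]),
        "baths": float(match["baths"]),
        "sqft": int(match["sqft"].replace(",", "")),
    }


def parse_price(text):
    """Convert a price such as "$585,000" to an integer, or None if it has no digits."""
    digits = re.sub(r"[^\d]", "", text or "")
    return int(digits) if digits else None


print(parse_detail("4bds2.5ba2,370sqft"))
print(parse_price("$585,000"))
```

Applied to the DataFrame, something like `table1["detail"].map(parse_detail)` would yield one dictionary per row, which can then be expanded into separate columns.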


## III. Conclusion 


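This walkthrough showed a basic way to collect listing data with BeautifulSoup. Notice that our DataFrame holds only 9 records, even though the page displays 41 listings. A likely explanation, which we have not verified here, is that the remaining cards are missing from the initial HTML that `requests` receives, for example because they are added by JavaScript as the page is scrolled. The full result set also spans several pages, so our next iteration will focus on handling pagination so we can capture every listing.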
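
As a starting point for the pagination step mentioned above, the sketch below only builds the list of page URLs. It assumes that result page N is served at the base URL followed by `N_p/` (for example `.../sacramento-ca/2_p/`); that pattern is an assumption to confirm in the browser before relying on it, and `page_urls` is a hypothetical helper name.

```python
def page_urls(base_url, n_pages):
    """Build result-page URLs, assuming page N lives at '<base_url>N_p/'."""
    base = base_url if base_url.endswith("/") else base_url + "/"
    # Page 1 is the base URL itself; later pages get the assumed "N_p/" suffix
    return [base if n == 1 else f"{base}{n}_p/" for n in range(1, n_pages + 1)]


for page_url in page_urls("https://www.zillow.com/sacramento-ca/", 3):
    print(page_url)
```

Each URL could then go through the same request, parse, and extract steps used for the first page, ideally with a short pause between requests.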
