## Web Scraping Zillow Part 1

### Project Overview

In this mini-project we’ll demonstrate a streamlined workflow for extracting real-estate listings directly from Zillow. Using `requests` to fetch each results page and **BeautifulSoup** for HTML parsing, we’ll harvest key attributes—address, price, beds, baths, and square footage—and load them into a **pandas** DataFrame for analysis. The goal is to keep the scraping logic lightweight and transparent while laying the groundwork for more advanced features (pagination, error handling, and API interception) that can be added later.


## I. Preparation

First, import the core libraries—**pandas** and **NumPy**—to handle and prepare the data. We’ll then use **Requests** to fetch the webpage and **BeautifulSoup** to parse and display the HTML content.

In [1]:
## Necessary Libraries
import pandas as pd 
import numpy as np 

## Reading the HTML
import requests 
from bs4 import BeautifulSoup

import warnings
warnings.filterwarnings('ignore')

Next, assign the target URL to a variable so you can reference it throughout the script.

In [2]:
url = 'https://www.zillow.com/sacramento-ca/'

Many sites reject traffic that looks automated, so we’ll add a realistic `User-Agent` (and any other needed headers) to make the request resemble a normal browser visit. After sending the request with `requests.get()`, we’ll check `response.status_code` to confirm the page loaded successfully and handle any errors.


In [3]:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
    "Accept-Language": "en-US,en;q=0.9",
    # Note: requests can only decode br / zstd bodies if the brotli / zstandard packages are installed
    "Accept-Encoding": "gzip, deflate, br, zstd",
    "Cookie": ""
}

response = requests.get(url, headers=headers)

response.status_code

200

An HTTP **200** status means the request succeeded and the page content is available—so we can proceed with parsing.
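
Checking `response.status_code` by eye works in a notebook, but a script needs the failure to surface on its own. The sketch below shows one way to do that with a timeout and `raise_for_status()`. The helper name `fetch_html` and the 10-second timeout are assumptions for illustration, not part of the original notebook.

```python
import requests


def fetch_html(url, headers, session=None, timeout=10.0):
    """Return the page body as text, or raise if the request fails.

    `fetch_html` is a hypothetical helper, not part of the original notebook.
    """
    # Reuse a caller-supplied session if given; otherwise create one.
    if session is None:
        session = requests.Session()
    response = session.get(url, headers=headers, timeout=timeout)
    # raise_for_status() raises requests.HTTPError on 4xx/5xx responses,
    # so a blocked request (for example a 403) is never parsed by accident.
    response.raise_for_status()
    return response.text
```

In the notebook this could be called as `html = fetch_html(url, headers)`, wrapped in `try / except requests.RequestException` if the script should keep running after a failed page.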


## II. Soup

Now that we have a 200 OK status, we can hand the page to **BeautifulSoup** for parsing and inspection. Before doing so, we read `response.encoding`, fall back to `response.apparent_encoding` when the server did not declare a charset, and call `response.content.decode(encod, errors="replace")` so the HTML is decoded with the right character set and any undecodable bytes are replaced rather than raising an error.


In [4]:
## Use the declared charset, or the detected one if none was declared
encod = response.encoding or response.apparent_encoding or "utf-8"
contents = response.content.decode(encod, errors="replace")

soup = BeautifulSoup(contents, 'html.parser')
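
`response.encoding` can be `None` when the server declares no charset, and a declared charset may be a name Python does not recognise. The fallback logic can be sketched in isolation as below. The helper `decode_body` is hypothetical, and UTF-8 as the default is an assumption.

```python
from typing import Optional


def decode_body(raw: bytes, declared: Optional[str], default: str = "utf-8") -> str:
    """Decode raw response bytes, falling back to `default` when needed.

    `decode_body` is a hypothetical helper, not part of the original notebook.
    """
    encoding = declared or default
    try:
        return raw.decode(encoding, errors="replace")
    except LookupError:
        # The declared charset name is unknown to Python; use the default.
        return raw.decode(default, errors="replace")
```

Called as `decode_body(response.content, response.encoding)`, it returns a string in every case, with undecodable bytes replaced by U+FFFD.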


With the full HTML in hand, a quick sanity check is to print the page's `<title>` and confirm that we received the page we asked for, in this case **“Sacramento CA Real Estate.”**


In [5]:
titles = soup.title.text

print(titles)

Sacramento CA Real Estate - Sacramento CA Homes For Sale | Zillow


Finally, we’ll build a DataFrame from the information extracted from the HTML. We’ll loop through every property card, pull out the relevant fields, and store one record per card. Wrapping the extraction logic in a `try / except` block lets the loop continue even if a particular card is malformed or a field is missing.
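
One caveat: with a single `try / except` around the whole card, one failed lookup throws away the fields that did parse. A hedged alternative is to guard each field separately, sketched here on a made-up HTML snippet. The helper `text_or_none`, the `card` class, and the sample addresses are hypothetical. Only the `address` tag and the `data-test="property-card-price"` attribute come from this notebook's selectors.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; real Zillow cards are more complex.
SAMPLE = """
<div class="card">
  <address>1 Example St, Sacramento, CA 95800</address>
  <span data-test="property-card-price">$500,000</span>
</div>
<div class="card">
  <address>2 Example Ave, Sacramento, CA 95800</address>
</div>
"""


def text_or_none(tag):
    """Return stripped text for a found tag, or None when find() found nothing."""
    return tag.get_text(strip=True) if tag is not None else None


soup_demo = BeautifulSoup(SAMPLE, "html.parser")
records = []
for card in soup_demo.find_all("div", class_="card"):
    records.append({
        "address": text_or_none(card.find("address")),
        "price": text_or_none(card.find("span", {"data-test": "property-card-price"})),
    })
```

The second card has no price element, so its record keeps the address and stores `None` for the price instead of losing both.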

In [6]:
## Column order for the final dataframe (a list, because a set has no defined order)
columns = ['address', 'price', 'beds', 'baths', 'sqft', 'link', 'posted_note']

## Find all listing cards
houses = soup.find_all("div", {"class":"StyledCard-c11n-8-109-3__sc-1w6p0lv-0 icLwbG StyledPropertyCardBody-c11n-8-109-3__sc-1danayh-0 iHqwT PropertyCardWrapper__StyledPropertyCardBody-srp-8-109-3__sc-16e8gqd-3 iSbqZf" })

## Loop through the listing cards and collect one record per card
rows = []
for house in houses: 
    try:
        address = house.find("address").get_text(strip=True)
        price = house.find("span", {"data-test":"property-card-price"}).get_text(strip=True)
        link = house.find("div", {"class":"StyledPhotoCarouselSlide-c11n-8-109-3__sc-jwte3-0 bYzbke"}).a["href"]
        posted = house.find("div", class_ = "StyledPropertyCardBadgeArea-c11n-8-109-3__sc-11omngf-0 izNoTZ").get_text(strip=True)
        ## The first three <li> items without a class hold beds, baths, and sqft
        details = house.find_all("li", {"class":""})
        beds = details[0].get_text(strip=True)
        baths = details[1].get_text(strip=True)
        sqft = details[2].get_text(strip=True)
    except (AttributeError, IndexError, KeyError, TypeError):
        ## A tag, attribute, or list item was missing: blank every field so
        ## no value leaks in from the previous card
        address = price = link = posted = beds = baths = sqft = None

    rows.append({
        "address": address,
        "price": price,
        "beds": beds,
        "baths": baths,
        "sqft": sqft,
        "link": link, 
        "posted_note": posted})

## DataFrame.append was removed in pandas 2.0, so build the frame in one step
table1 = pd.DataFrame(rows, columns=columns)

table1

| | address | price | beds | baths | sqft | link | posted_note |
|---|---|---|---|---|---|---|---|
| 0 | 3800 William Way, Sacramento, CA 95821 | $559,900 | 3bds | 2ba | 2,057sqft | https://www.zillow.com/homedetails/3800-Willia... | Showcase |
| 1 | 2940 Wheat Grass St, Sacramento, CA 95833 | $599,000 | 3bds | 3ba | 1,850sqft | https://www.zillow.com/homedetails/2940-Wheat-... | Private backyard oasis |
| 2 | 1370 Pebblewood Dr, Sacramento, CA 95833 | $250,000 | 3bds | 3ba | 1,862sqft | https://www.zillow.com/homedetails/1370-Pebble... | Beautifully designed tri-level home |
| 3 | 5906 Caddington Way, Sacramento, CA 95835 | $529,900 | 3bds | 3ba | 1,875sqft | https://www.zillow.com/homedetails/5906-Caddin... | Brand new carpet |
| 4 | 2466 Buzz Aldrin Way, Sacramento, CA 95834 | $539,950 | 4bds | 3ba | 1,996sqft | https://www.zillow.com/homedetails/2466-Buzz-A... | Fruit bearing trees |
| 5 | 3097 Tintorera Way, Sacramento, CA 95833 | $555,000 | 3bds | 2ba | 1,757sqft | https://www.zillow.com/homedetails/3097-Tintor... | 3D Tour |
| 6 | 3065 Sand Dollar Way, Sacramento, CA 95821 | $565,000 | 3bds | 2ba | 1,447sqft | https://www.zillow.com/homedetails/3065-Sand-D... | Sparkling pool and spa |
| 7 | 2060 Monarch Ave, Sacramento, CA 95832 | $370,000 | 3bds | 2ba | 1,018sqft | https://www.zillow.com/homedetails/2060-Monarc... | 3 days on Zillow |
| 8 | 7312 Hutchins Way, North Highlands, CA 95660 | $365,000 | 3bds | 2ba | 1,090sqft | https://www.zillow.com/homedetails/7312-Hutchi... | Functional layout |
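
The scraped values are still strings such as `$559,900`, `3bds`, and `2,057sqft`, so they need converting before any numeric analysis. A minimal sketch follows, assuming each value contains one whole number with optional thousands separators. `to_int` is a hypothetical helper name, and a half-bath value such as `2.5ba` would need a float-aware pattern instead.

```python
import re


def to_int(value):
    """Extract the first integer from strings like '$559,900' or '3bds'.

    Hypothetical helper; returns None for missing or non-string values.
    """
    if not isinstance(value, str):
        return None  # covers None and NaN placeholders
    match = re.search(r"\d[\d,]*", value)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return int(match.group().replace(",", ""))
```

Applied column by column, for example `table1["price"].map(to_int)`, this would turn the text columns into numbers that pandas can aggregate.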


## III. Conclusion 


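This walkthrough showed a basic way to collect listing data with BeautifulSoup. Notice that our DataFrame holds only 9 records, even though the page displays 41 listings. A likely explanation is that the remaining cards are rendered by JavaScript as the page scrolls, so they are missing from the static HTML that `requests` receives. The results also span several pages, so our next iteration will focus on handling pagination so we can capture more of the listings.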
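
As a starting point for that pagination work, the page URLs could be generated up front. This sketch rests on an assumption that is not taken from the notebook: that results page N (for N of 2 or more) lives at the base URL followed by `N_p/`. Confirm the pattern in a browser before relying on it. `page_urls` is a hypothetical helper.

```python
from typing import List


def page_urls(base_url: str, n_pages: int) -> List[str]:
    """Build one URL per results page.

    ASSUMPTION: page N (N >= 2) is served at '<base_url>N_p/'. This pattern
    is not confirmed by the notebook and should be checked in a browser.
    """
    base = base_url.rstrip("/") + "/"
    return [base if n == 1 else f"{base}{n}_p/" for n in range(1, n_pages + 1)]
```

Each URL would then go through the same fetch, parse, and extract steps shown above, ideally with a pause between requests.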
