This notebook explores historical 1-bedroom median rents in major U.S. cities using Zillow’s public rent data. I want to understand how affordability has shifted over time, and this notebook is the first step: loading and transforming the raw rent data.

## 1. Gather data

First, I'll download the raw Zillow Observed Rent Index (ZORI) data for the cities I'm interested in. I'm looking for city-level rents over time, so I want a longitudinal dataset.

![Alt text](image1.png)

In [7]:
   %pip install pandas requests

Collecting pandas
  Downloading pandas-2.2.3-cp310-cp310-macosx_11_0_arm64.whl (11.3 MB)
Collecting requests
  Downloading requests-2.32.3-py3-none-any.whl (64 kB)
Collecting numpy>=1.22.4
  Downloading numpy-2.2.4-cp310-cp310-macosx_14_0_arm64.whl (5.4 MB)
Collecting tzdata>=2022.7
  Downloading tzdata-2025.2-py2.py3-none-any.whl (347 kB)
Collecting pytz>=2020.1
  Downloading pytz-2025.2-py2.py3-none-any.whl (509 kB)
Collecting charset-normalizer<4,>=2
  Downloading charset_normalizer-3.4.1-cp310-cp310-macosx_10_9_universal2.whl (198 kB)


In [8]:
import pandas as pd
import requests

In [11]:
# Read in the CSV file
df = pd.read_csv('City_zori_uc_sfrcondomfr_sm_month.csv')

# Display basic information about the dataframe
print("DataFrame Info:")
print(df.info())

# Show the first few rows
print("\nFirst few rows:")
print(df.head())

# Display column names
print("\nColumn names:")
print(df.columns.tolist())


DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3307 entries, 0 to 3306
Columns: 130 entries, RegionID to 2025-02-28
dtypes: float64(122), int64(2), object(6)
memory usage: 3.3+ MB
None

First few rows:
   RegionID  SizeRank   RegionName RegionType StateName State  \
0      6181         0     New York       city        NY    NY   
1     12447         1  Los Angeles       city        CA    CA   
2     39051         2      Houston       city        TX    TX   
3     17426         3      Chicago       city        IL    IL   
4      6915         4  San Antonio       city        TX    TX   

                                   Metro          CountyName   2015-01-31  \
0  New York-Newark-Jersey City, NY-NJ-PA       Queens County  2519.633872   
1     Los Angeles-Long Beach-Anaheim, CA  Los Angeles County  1829.491393   
2   Houston-The Woodlands-Sugar Land, TX       Harris County  1192.772833   
3     Chicago-Naperville-Elgin, IL-IN-WI         Cook County  1525.898035   
4  

From this output, we can tell that the data:
- Has one row per city (perfect)
- Includes the metro area, state, county, etc. (not really necessary)
- Stores the observed rent (the mean of the 35th to 65th percentile, per Zillow) in one column per month-end date
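Since the rents sit in one column per month, a longitudinal analysis is usually easier once the table is reshaped to one row per city per month. Below is a minimal sketch of that reshaping, not necessarily how the rest of this notebook does it. It uses a toy two-city frame with the same column layout as the output above; the rent values and the names `wide`, `tidy`, `id_cols`, and `meta_cols` are placeholders I made up for illustration, not Zillow data.

```python
import pandas as pd

# Toy frame mimicking the column layout shown above.
# The rent values are illustrative placeholders, NOT real Zillow figures.
wide = pd.DataFrame({
    "RegionID": [6181, 12447],
    "SizeRank": [0, 1],
    "RegionName": ["New York", "Los Angeles"],
    "RegionType": ["city", "city"],
    "StateName": ["NY", "CA"],
    "State": ["NY", "CA"],
    "Metro": ["New York-Newark-Jersey City, NY-NJ-PA",
              "Los Angeles-Long Beach-Anaheim, CA"],
    "CountyName": ["Queens County", "Los Angeles County"],
    "2015-01-31": [2500.0, 1800.0],
    "2015-02-28": [2510.0, 1810.0],
})

# Columns that identify a city (kept) and metadata we don't need (dropped).
id_cols = ["RegionName", "State"]
meta_cols = ["RegionID", "SizeRank", "RegionType", "StateName", "Metro", "CountyName"]

# Everything else is assumed to be a month-end date column.
date_cols = [c for c in wide.columns if c not in id_cols + meta_cols]

# Reshape wide -> long: one row per (city, month).
tidy = wide.melt(id_vars=id_cols, value_vars=date_cols,
                 var_name="date", value_name="rent")
tidy["date"] = pd.to_datetime(tidy["date"])
tidy = tidy.sort_values(["RegionName", "date"]).reset_index(drop=True)

print(tidy)
```

With the real file, the same steps would apply to `df` in place of `wide`, assuming the metadata columns match the ones listed in the output above.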