This notebook performs Exploratory Data Analysis (EDA) on Zillow’s Home Value Index (ZHVI) dataset. I focus on Colorado home prices from 2000 to 2025. The goal of this notebook is to clean the data, understand patterns, visualize trends, and prepare the dataset for the modeling work that I complete in my second notebook, Modeling Colorado Home Prices (2000–2025).

In [None]:

# STEP 1 — Import libraries
import pandas as pd              # for data loading, cleaning, transformations
import matplotlib.pyplot as plt  # for plotting charts
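import warnings

# Hide library warnings to keep the notebook output readable.
# Note that this also hides deprecation and data-handling warnings.
warnings.filterwarnings("ignore")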



In [None]:
# STEP 2 — Load the Zillow Home Value Index dataset
df = pd.read_csv("/kaggle/input/zillow-home-value-index/ZHVI.csv")

# Preview the first few rows
df.head()




In [None]:

# STEP 3 — Rename the date column
# (pandas labels a header-less first column 'Unnamed: 0')
df = df.rename(columns={'Unnamed: 0': 'Date'})

df.head()


In [None]:
# STEP 4 — Convert Date to proper datetime format
df['Date'] = pd.to_datetime(df['Date'])

df.head()


In [None]:
# STEP 5 — Keep only the Date and Colorado columns
co = df[['Date', 'Colorado']].dropna()

co.head()


In [None]:
# STEP 6 — Convert monthly ZHVI data into yearly averages
co_yearly = (
    co.groupby(co['Date'].dt.year)['Colorado']
      .mean()
      .reset_index()
)

# Rename columns
co_yearly.columns = ['Year', 'AveragePrice']

# Filter to 2000–2025 only
co_yearly = co_yearly[(co_yearly['Year'] >= 2000) & (co_yearly['Year'] <= 2025)]

co_yearly
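
A yearly mean is only comparable across years when each year contributes the same number of months. If the file ends partway through 2025, that year's average rests on fewer observations than the others. The sketch below is one way to check this. The helper name `months_per_year` and the demo data are illustrative assumptions, not part of the original notebook, and the demo values are synthetic rather than real ZHVI figures.

```python
import pandas as pd

def months_per_year(monthly: pd.DataFrame, date_col: str = "Date") -> pd.Series:
    """Count how many monthly rows each calendar year contributes."""
    return monthly.groupby(monthly[date_col].dt.year).size().rename_axis("Year")

# Synthetic demo data (NOT real ZHVI values): 15 consecutive months from Jan 2024
demo_monthly = pd.DataFrame({
    "Date": pd.date_range("2024-01-01", periods=15, freq="MS"),
    "Colorado": [float(v) for v in range(15)],
})

counts = months_per_year(demo_monthly)
# Years whose average would rest on fewer than 12 months
partial_years = counts[counts < 12]
print(partial_years)
```

In this notebook the same check would be `months_per_year(co)`, run on the monthly Colorado table before it is collapsed into yearly averages.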


In [None]:
# STEP 7 — Display summary statistics
print("=== Summary stats for Colorado ZHVI (Yearly, 2000–2025) ===")
# Describe the price column only; statistics of the Year column are not meaningful
co_yearly['AveragePrice'].describe()
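
Mean and standard deviation say little about a series that trends upward, so a growth rate can be a useful companion to the summary table. The sketch below computes a compound annual growth rate (CAGR) from the first and last rows of a yearly table. The function name `cagr` is a hypothetical helper, the column names are assumed to match `co_yearly`, and the demo values are synthetic.

```python
import pandas as pd

def cagr(yearly: pd.DataFrame, year_col: str = "Year",
         value_col: str = "AveragePrice") -> float:
    """Compound annual growth rate between the earliest and latest year."""
    ordered = yearly.sort_values(year_col)
    first, last = ordered.iloc[0], ordered.iloc[-1]
    n_years = last[year_col] - first[year_col]
    return float((last[value_col] / first[value_col]) ** (1 / n_years) - 1)

# Synthetic demo data (NOT real ZHVI values): 10% growth per year
demo_yearly = pd.DataFrame({
    "Year": [2000, 2001, 2002],
    "AveragePrice": [100.0, 110.0, 121.0],
})

demo_rate = cagr(demo_yearly)
print(f"{demo_rate:.1%}")
```

Applied to the notebook's table, this would be `cagr(co_yearly)`. Because only the endpoints enter the formula, the result is sensitive to a partial final year.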


In [None]:
# STEP 8 — Line chart for yearly home values
plt.figure(figsize=(12,6))
plt.plot(co_yearly['Year'], co_yearly['AveragePrice'], marker='o', linewidth=2)
plt.title("Average Home Value in Colorado (2000–2025)", fontsize=16)
plt.xlabel("Year", fontsize=13)
plt.ylabel("Average Home Price (USD)", fontsize=13)
plt.grid(True)
plt.show()


### Trend Interpretation

The chart shows that the average home value in Colorado rose over the 2000–2025 period as a whole. From 2000 to about 2007, prices increased slowly. They then declined between 2008 and 2011, which coincides with the U.S. housing market crash during the financial crisis.
From 2012 onward, prices rose again, and growth accelerated around 2015. The steepest increase came between 2020 and 2022. After 2022, prices dip slightly and then level off from 2023 to 2025.
Overall, home values in Colorado grew substantially over these 25 years, with ups and downs that track broader economic conditions.
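
The periods described above are read off the chart by eye. A year-over-year percent change makes them checkable, since slow growth, a decline, and a jump appear as small, negative, and large values respectively. The sketch below is a minimal version. The helper `add_yoy_change` and the `YoYPct` column are illustrative names, and the demo values are synthetic rather than real ZHVI figures.

```python
import pandas as pd

def add_yoy_change(yearly: pd.DataFrame, year_col: str = "Year",
                   value_col: str = "AveragePrice") -> pd.DataFrame:
    """Return a sorted copy with a year-over-year percent change column."""
    out = yearly.sort_values(year_col).reset_index(drop=True)
    out["YoYPct"] = out[value_col].pct_change() * 100
    return out

# Synthetic demo data (NOT real ZHVI values)
demo_prices = pd.DataFrame({
    "Year": [2019, 2020, 2021, 2022],
    "AveragePrice": [100.0, 110.0, 132.0, 125.4],
})

demo_yoy = add_yoy_change(demo_prices)
# Year with the largest percent increase over the previous year
steepest_year = int(demo_yoy.loc[demo_yoy["YoYPct"].idxmax(), "Year"])
print(demo_yoy)
print("Steepest increase:", steepest_year)
```

Running `add_yoy_change(co_yearly)` would show whether the 2008–2011 decline and the 2020–2022 jump are as pronounced in the numbers as they look in the chart.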


In [None]:
# STEP 10 — Compare Colorado with Washington and Texas (yearly averages)

# Choose states for comparison
states = ['Colorado', 'Washington', 'Texas']

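# Keep only Date + selected states, drop missing values
# (work on an explicit copy so the new column is added to states_df only)
states_df = df[['Date'] + states].dropna().copy()

# Extract the year
states_df['Year'] = states_df['Date'].dt.year

# Keep 2000–2025 only, to match the Colorado table and the chart title
states_df = states_df[states_df['Year'].between(2000, 2025)]

# Group by year and compute average for each state
states_yearly = (
    states_df
    .groupby('Year')[states]
    .mean()
    .reset_index()
)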

# Show the first few rows
states_yearly.head()


In [None]:
# STEP 11 — Line chart for Colorado vs Washington vs Texas

plt.figure(figsize=(12, 6))

for s in states:
    plt.plot(states_yearly['Year'], states_yearly[s], marker='o', linewidth=2, label=s)

plt.title("Average Home Values: Colorado vs Washington vs Texas (2000–2025)", fontsize=16)
plt.xlabel("Year", fontsize=13)
plt.ylabel("Average Home Price (USD)", fontsize=13)
plt.grid(True)
plt.legend()
plt.show()


This chart compares how home values changed in Colorado, Washington, and Texas from 2000 to 2025. Washington has the highest values of the three and the fastest growth, especially after 2015, which is consistent with the tech-driven housing boom in the Seattle area. Texas remains the most affordable of the three, with slower and steadier increases across the whole period. Colorado starts closer to Texas in the early 2000s but rises much faster after 2012. By 2020–2022, Colorado’s trend looks more like Washington’s, which points to strong housing demand and rapid price growth. This shift suggests that Colorado moved from a mid-priced market to a much higher-cost one over the last decade. Overall, the comparison shows that Colorado’s housing is now significantly more expensive, and likely more competitive, than it used to be.
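
The claim that Colorado moved away from Texas and toward Washington can be expressed as two price ratios tracked over time. If Colorado/Texas rises while Colorado/Washington approaches 1, the numbers support that reading. The sketch below shows the calculation. The helper `price_ratio` is a hypothetical name, and the demo values are synthetic rather than real ZHVI figures.

```python
import pandas as pd

def price_ratio(yearly: pd.DataFrame, numerator: str, denominator: str) -> pd.Series:
    """Ratio of one state's yearly average to another's, per year."""
    ratio = yearly[numerator] / yearly[denominator]
    return ratio.rename(f"{numerator}/{denominator}")

# Synthetic demo data (NOT real ZHVI values), indexed by year
demo_states = pd.DataFrame({
    "Year": [2000, 2012, 2022],
    "Colorado": [120.0, 150.0, 270.0],
    "Washington": [150.0, 200.0, 300.0],
    "Texas": [100.0, 120.0, 180.0],
}).set_index("Year")

co_vs_wa = price_ratio(demo_states, "Colorado", "Washington")
co_vs_tx = price_ratio(demo_states, "Colorado", "Texas")
print(pd.concat([co_vs_wa, co_vs_tx], axis=1))
```

With the notebook's data, the input would be `states_yearly.set_index('Year')`.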

In [None]:
# STEP 12 — Save the yearly table to a CSV for later modeling

OUT = "colorado_zhvi_2000_2025.csv"
co_yearly.to_csv(OUT, index=False)
print(f"Saved: {OUT}  (rows: {len(co_yearly)})")
