# 📘 **Project Data Description: Housing Cost Drivers by City (ACS 5-Year 2021 vs 2024)**

This project analyzes **why housing costs are rising in U.S. cities** by examining key structural, economic, and demographic variables from the **American Community Survey (ACS) 5-Year Estimates** for **2021 and 2024**.
The goal is to identify how changes in supply, demand, and affordability pressures contribute to increases in rent and home values at the city level.

To do this, the analysis uses a **minimal set of ACS indicators**, grouped into four core categories plus a fifth set of zoning proxies:

1. Housing Cost Outcomes
2. Supply Constraints
3. Demand Pressure
4. Affordability Stress
5. Zoning Proxies

These variables collectively allow us to measure how fast cities are growing, how quickly they are adding housing, and whether residents can afford rising costs.

---


## 🏠 **1. Housing Cost Metrics (Outcomes to Explain)**

These are the primary outcomes in the model: **changes in rent and home value**.

| Concept               | ACS Table ID |
| --------------------- | ------------ |
| **Median Gross Rent** | **B25064**   |
| **Median Home Value** | **B25077**   |

Tracking these over time reveals how housing prices are evolving in each city.

---

```python
import pandas as pd
import sqlite3

# Load the 2021 and 2024 ACS extracts into DataFrames
MGR_2021 = pd.read_csv('Housing_Cost_Metrics/Median_Gross_Rent_2021.csv')
MGR_2024 = pd.read_csv('Housing_Cost_Metrics/Median_Gross_Rent_2024.csv')
MV_2021 = pd.read_csv('Housing_Cost_Metrics/Median_Value_2021.csv')
MV_2024 = pd.read_csv('Housing_Cost_Metrics/Median_Value_2024.csv')

# ---------------------------------------------------
# 🗃️ Load DataFrames into an SQLite in-memory database
# ---------------------------------------------------
conn = sqlite3.connect(':memory:')

MGR_2021.to_sql('Median_Gross_Rent_2021', conn, index=False, if_exists='replace')
MGR_2024.to_sql('Median_Gross_Rent_2024', conn, index=False, if_exists='replace')
MV_2021.to_sql('Median_Value_2021', conn, index=False, if_exists='replace')
MV_2024.to_sql('Median_Value_2024', conn, index=False, if_exists='replace')

# ---------------------------
# 📊 SQL Query to Join Tables
# ---------------------------
# Start from the 2021 rent table and LEFT JOIN the other three on city name,
# so every city present in 2021 is kept even if a later table lacks it.
sql_query = """
SELECT DISTINCT
    g21.[Geographic Area Name],
    g21.[Median gross rent (2021)],
    g24.[Median gross rent (2024)],
    v21.[Median value (2021)],
    v24.[Median value (2024)]
FROM Median_Gross_Rent_2021 AS g21
LEFT JOIN Median_Gross_Rent_2024 AS g24
  ON g21.[Geographic Area Name] = g24.[Geographic Area Name]
LEFT JOIN Median_Value_2021 AS v21
  ON g21.[Geographic Area Name] = v21.[Geographic Area Name]
LEFT JOIN Median_Value_2024 AS v24
  ON g21.[Geographic Area Name] = v24.[Geographic Area Name];
"""

# Execute the query and load results into a DataFrame
Housing_Cost_Metrics_df = pd.read_sql_query(sql_query, conn)

# Save the merged DataFrame to a new CSV file
csv_filename = "Housing_Cost_Metrics_df.csv"
Housing_Cost_Metrics_df.to_csv(csv_filename, index=False)

# Close the connection
conn.close()

# Output the final DataFrame
Housing_Cost_Metrics_df
```

|     | Geographic Area Name              | Median gross rent (2021) | Median gross rent (2024) | Median value (2021) | Median value (2024) |
| --- | --------------------------------- | -----------------------: | -----------------------: | ------------------: | ------------------: |
| 0   | Auburn city, Alabama              | 1009                     | 1159                     | 319300              | 392900              |
| 1   | Birmingham city, Alabama          | 895                      | 1206                     | 117600              | 189800              |
| 2   | Dothan city, Alabama              | 832                      | 1016                     | 169200              | 222800              |
| 3   | Hoover city, Alabama              | 1212                     | 1441                     | 363200              | 424100              |
| 4   | Huntsville city, Alabama          | 983                      | 1241                     | 250400              | 339400              |
| ... | ...                               | ...                      | ...                      | ...                 | ...                 |
| 629 | Caguas zona urbana, Puerto Rico   | 608                      | 642                      | 118200              | 136400              |
| 630 | Carolina zona urbana, Puerto Rico | 645                      | 665                      | 138400              | 161600              |
| 631 | Guaynabo zona urbana, Puerto Rico | 997                      | 1057                     | 213900              | 257100              |
| 632 | Ponce zona urbana, Puerto Rico    | 535                      | 495                      | 102500              | 125300              |
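
With the merged table in hand, the comparison for each city reduces to a percent change between the two survey years. The sketch below assumes the column names shown in the output above; the three sample rows are copied from that output for illustration, and `add_pct_change` is a hypothetical helper name. Non-numeric entries (for example, top-coded or annotated values that can appear in ACS exports) are coerced to `NaN` rather than raising an error.

```python
import pandas as pd

def add_pct_change(df: pd.DataFrame, label: str) -> pd.DataFrame:
    """Hypothetical helper: add a '<label> % change' column (2021 -> 2024)."""
    # Coerce to numbers; non-numeric annotations become NaN
    start = pd.to_numeric(df[f"{label} (2021)"], errors="coerce")
    end = pd.to_numeric(df[f"{label} (2024)"], errors="coerce")
    df[f"{label} % change"] = ((end - start) / start * 100).round(2)
    return df

# Three rows copied from the merged output above
sample = pd.DataFrame({
    "Geographic Area Name": [
        "Auburn city, Alabama",
        "Birmingham city, Alabama",
        "Ponce zona urbana, Puerto Rico",
    ],
    "Median gross rent (2021)": [1009, 895, 535],
    "Median gross rent (2024)": [1159, 1206, 495],
    "Median value (2021)": [319300, 117600, 102500],
    "Median value (2024)": [392900, 189800, 125300],
})

for label in ["Median gross rent", "Median value"]:
    add_pct_change(sample, label)

print(sample[["Geographic Area Name", "Median gross rent % change", "Median value % change"]])
```

On the full table, the same loop would be applied to `Housing_Cost_Metrics_df`. In these rows, Birmingham's rent rises about 34.75% while Ponce's falls about 7.48%, which is the kind of spread the later driver variables are meant to explain.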



## 🏗️ **2. Housing Supply Indicators (Primary Drivers of Cost Changes)**

These variables capture **how much housing exists**, how fast it's being built, and whether units are sitting vacant. Supply shortages are one of the most direct causes of rising costs.

| Supply Driver                                | ACS Table ID |
| -------------------------------------------- | ------------ |
| **Total Housing Units**                      | **B25001**   |
| **Year Structure Built (New Construction)**  | **B25034**   |
| **Vacancy Rate (from Occupancy Status)**     | **B25002**   |
| **Housing Density Mix (Units in Structure)** | **B25024**   |

**Why these matter:**

* **Low vacancy** and **slow construction** lead to tighter markets and higher rents.
* The **mix of housing types** can signal zoning and density limitations: cities dominated by single-family homes often have higher prices because development is constrained.
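
The two supply signals above can be computed directly from the ACS counts: the vacancy rate is vacant units divided by total housing units (from B25002), and supply growth is the percent change in total housing units (B25001). This is a minimal sketch; the column names (`city`, `total_units_2021`, `total_units_2024`, `vacant_2024`) and the figures are hypothetical, not ACS data.

```python
import pandas as pd

def supply_metrics(df: pd.DataFrame) -> pd.DataFrame:
    """Per-city vacancy rate and housing-unit growth, in percent.

    Expects hypothetical columns: 'city', 'total_units_2021',
    'total_units_2024' (B25001) and 'vacant_2024' (vacant count, B25002).
    """
    out = df[["city"]].copy()
    # Vacancy rate = vacant units / total housing units
    out["vacancy_rate_2024"] = (
        df["vacant_2024"] / df["total_units_2024"] * 100
    ).round(2)
    # Growth in the housing stock between the two survey periods
    out["unit_growth_pct"] = (
        (df["total_units_2024"] - df["total_units_2021"])
        / df["total_units_2021"] * 100
    ).round(2)
    return out

# Hypothetical figures for illustration only (not ACS data)
cities = pd.DataFrame({
    "city": ["City A", "City B"],
    "total_units_2021": [50_000, 80_000],
    "total_units_2024": [52_000, 80_800],
    "vacant_2024": [2_600, 2_424],
})

metrics = supply_metrics(cities)
print(metrics)
```

Under the reasoning above, City B (3% vacancy, 1% unit growth) would be the tighter market compared with City A (5% vacancy, 4% unit growth).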

---

```python
import pandas as pd
import sqlite3

# Load the 2021 and 2024 supply extracts into DataFrames
THU_2021 = pd.read_csv('Housing_Supply_Indicators/Total_Housing_Units_2021.csv')
THU_2024 = pd.read_csv('Housing_Supply_Indicators/Total_Housing_Units_2024.csv')
TOS_2021 = pd.read_csv('Housing_Supply_Indicators/Total_Occupancy_Status_2021.csv')
TOS_2024 = pd.read_csv('Housing_Supply_Indicators/Total_Occupancy_Status_2024.csv')
# Each year's frame reads its own year's file
# (assumes the 2021 vacancy file follows the same naming pattern as the others)
TV_2021 = pd.read_csv('Housing_Supply_Indicators/Total_Vacant_2021.csv')
TV_2024 = pd.read_csv('Housing_Supply_Indicators/Total_Vacant_2024.csv')
UB_2021 = pd.read_csv('Housing_Supply_Indicators/Units_Built_2021.csv')
UB_2024 = pd.read_csv('Housing_Supply_Indicators/Units_Built_2024.csv')
HDM_2021 = pd.read_csv('Housing_Supply_Indicators/Units_in_Structure_2021.csv')
HDM_2024 = pd.read_csv('Housing_Supply_Indicators/Units_in_Structure_2024.csv')

# ---------------------------------------------------
# 🗃️ Load DataFrames into an SQLite in-memory database
# ---------------------------------------------------
conn = sqlite3.connect(':memory:')

THU_2021.to_sql('Total_Housing_Units_2021', conn, index=False, if_exists='replace')
THU_2024.to_sql('Total_Housing_Units_2024', conn, index=False, if_exists='replace')
TOS_2021.to_sql('Total_Occupancy_Status_2021', conn, index=False, if_exists='replace')
TOS_2024.to_sql('Total_Occupancy_Status_2024', conn, index=False, if_exists='replace')
TV_2021.to_sql('Total_Vacant_2021', conn, index=False, if_exists='replace')
TV_2024.to_sql('Total_Vacant_2024', conn, index=False, if_exists='replace')
UB_2021.to_sql('Units_Built_2021', conn, index=False, if_exists='replace')
UB_2024.to_sql('Units_Built_2024', conn, index=False, if_exists='replace')
HDM_2021.to_sql('Units_in_Structure_2021', conn, index=False, if_exists='replace')
HDM_2024.to_sql('Units_in_Structure_2024', conn, index=False, if_exists='replace')

# ---------------------------
# 📊 SQL Query to Join Tables
# ---------------------------
# Join every supply table to the 2021 housing-unit table on city name.
# USING keeps a single copy of the key column. This assumes every CSV has a
# [Geographic Area Name] column and that its other columns are year-labelled
# (as in the cost tables), so they do not collide across tables.
sql_query = """
SELECT DISTINCT *
FROM Total_Housing_Units_2021
LEFT JOIN Total_Housing_Units_2024    USING ([Geographic Area Name])
LEFT JOIN Total_Occupancy_Status_2021 USING ([Geographic Area Name])
LEFT JOIN Total_Occupancy_Status_2024 USING ([Geographic Area Name])
LEFT JOIN Total_Vacant_2021           USING ([Geographic Area Name])
LEFT JOIN Total_Vacant_2024           USING ([Geographic Area Name])
LEFT JOIN Units_Built_2021            USING ([Geographic Area Name])
LEFT JOIN Units_Built_2024            USING ([Geographic Area Name])
LEFT JOIN Units_in_Structure_2021     USING ([Geographic Area Name])
LEFT JOIN Units_in_Structure_2024     USING ([Geographic Area Name]);
"""

# Execute the query and load results into a DataFrame
Housing_Supply_Indicators_df = pd.read_sql_query(sql_query, conn)

# Save to its own CSV so the cost-metrics file is not overwritten
csv_filename = "Housing_Supply_Indicators_df.csv"
Housing_Supply_Indicators_df.to_csv(csv_filename, index=False)

# Close the connection
conn.close()

# Output the final DataFrame
Housing_Supply_Indicators_df
```

## 📈 **3. Demand Pressure Indicators (Economic Factors Pushing Prices Up)**

Housing becomes more expensive when **more people**, **higher incomes**, and **stronger employment** increase demand faster than supply expands.

| Demand Driver                           | ACS Table ID |
| --------------------------------------- | ------------ |
| **Total Population**                    | **B01003**   |
| **Median Household Income**             | **B19013**   |
| **Employment / Labor Force Indicators** | **S2301**    |

**Key idea:**
When population and incomes grow faster than housing units, cost pressures rise.
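
One simple way to express this idea is a percentage-point gap between growth in a demand measure (population from B01003, or median household income from B19013) and growth in housing units (B25001). The sketch below is a hypothetical formulation, not a published index; the function names and the figures are illustrative only.

```python
def growth_pct(start: float, end: float) -> float:
    """Percent change from start to end."""
    return (end - start) / start * 100

def demand_supply_gap(demand_2021: float, demand_2024: float,
                      units_2021: float, units_2024: float) -> float:
    """Hypothetical measure: demand growth minus housing-unit growth,
    in percentage points. Positive values mean demand outpaced supply."""
    return growth_pct(demand_2021, demand_2024) - growth_pct(units_2021, units_2024)

# Hypothetical city: population grows 6%, housing units grow 2%
gap = demand_supply_gap(100_000, 106_000, 40_000, 40_800)
print(f"{gap:.1f} percentage points")  # prints: 4.0 percentage points
```

A positive gap flags a city where, under the reasoning above, cost pressure would be expected to build.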

---


## 💸 **4. Affordability Stress (Ability of Residents to Pay)**

These indicators measure how much of residents' income is being consumed by housing costs.

| Affordability Indicator              | ACS Table ID            |
| ------------------------------------ | ----------------------- |
| **Gross Rent as % of Income**        | **B25070**              |
| **Owner Costs as % of Income**       | **B25091**              |
| **Severely Burdened Renters (≥50%)**   | Derived from **B25070** |

Rising burden shares between 2021 and 2024 indicate that housing costs are growing faster than incomes, which helps identify cities where affordability is deteriorating.
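
B25070 reports renter households in bins of gross rent as a percentage of household income, so the burden shares are sums of bins. The sketch below uses the common 30% cost-burden convention alongside the document's ≥50% "severe" threshold, and excludes the "Not computed" households from the denominator (an assumption about how this analysis defines the base). The labels are modeled on the B25070 categories; actual export headers will differ, and the counts are hypothetical.

```python
# Hypothetical renter counts, labels modeled on B25070 categories
B25070_COUNTS = {
    "Total": 10_000,
    "Less than 10.0 percent": 300,
    "10.0 to 14.9 percent": 700,
    "15.0 to 19.9 percent": 1_100,
    "20.0 to 24.9 percent": 1_300,
    "25.0 to 29.9 percent": 1_400,
    "30.0 to 34.9 percent": 900,
    "35.0 to 39.9 percent": 700,
    "40.0 to 49.9 percent": 1_000,
    "50.0 percent or more": 2_200,
    "Not computed": 400,
}

# Bins at or above the 30% cost-burden threshold
BURDENED_BINS = [
    "30.0 to 34.9 percent",
    "35.0 to 39.9 percent",
    "40.0 to 49.9 percent",
    "50.0 percent or more",
]

def burden_shares(counts: dict[str, int]) -> tuple[float, float]:
    """Return (% cost-burdened, % severely burdened) among renters
    whose rent-to-income ratio could be computed."""
    computed = counts["Total"] - counts["Not computed"]
    burdened = sum(counts[b] for b in BURDENED_BINS)
    severe = counts["50.0 percent or more"]
    return round(burdened / computed * 100, 2), round(severe / computed * 100, 2)

cost_burdened, severely_burdened = burden_shares(B25070_COUNTS)
print(cost_burdened, severely_burdened)  # prints: 50.0 22.92
```

Computing these shares for both 2021 and 2024 and taking the difference gives the change in burden that the paragraph above describes.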

---


## 🧱 **5. Zoning Proxies (Structural Indicators of Density Constraints)**

While ACS does not provide zoning data, it does offer **housing structure types**, which can serve as indirect proxies for zoning restrictions.

| Zoning Indicator                      | ACS Table ID |
| ------------------------------------- | ------------ |
| **% Single-Family Detached (1-unit)** | **B25024**   |
| **% Multi-Family (10+ units)**        | **B25024**   |

**Interpretation:**

* Cities dominated by **single-family homes** tend to have higher land costs and limited supply expansion.
* Cities with more **10+ unit buildings** absorb demand better and typically see slower rent spikes.
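
Both proxies come from the same B25024 (Units in Structure) breakdown: the single-family share is the "1, detached" count over the total, and the 10+ share sums the three largest structure bins. This is a minimal sketch; the labels are modeled on the B25024 categories (real export headers differ), the counts are hypothetical, and `zoning_proxies` is a hypothetical helper name.

```python
# Hypothetical housing-unit counts, labels modeled on B25024 categories
B25024_COUNTS = {
    "Total": 20_000,
    "1, detached": 11_000,
    "1, attached": 1_000,
    "2": 800,
    "3 or 4": 1_200,
    "5 to 9": 1_500,
    "10 to 19": 1_600,
    "20 to 49": 1_200,
    "50 or more": 1_400,
    "Mobile home": 280,
    "Boat, RV, van, etc.": 20,
}

# Structure-size bins that make up the "10+ units" share
LARGE_MULTIFAMILY = ["10 to 19", "20 to 49", "50 or more"]

def zoning_proxies(counts: dict[str, int]) -> tuple[float, float]:
    """Return (% single-family detached, % in 10+ unit structures)."""
    total = counts["Total"]
    pct_sf_detached = round(counts["1, detached"] / total * 100, 2)
    pct_10_plus = round(sum(counts[k] for k in LARGE_MULTIFAMILY) / total * 100, 2)
    return pct_sf_detached, pct_10_plus

sf_share, mf_share = zoning_proxies(B25024_COUNTS)
print(sf_share, mf_share)  # prints: 55.0 21.0
```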

---

# ✔️ **Summary**

This minimal dataset provides the core inputs needed to assess whether rising city housing costs are driven by:

* **Insufficient supply**
* **Surging demand (population, income, employment)**
* **Affordability constraints**
* **Structural zoning limitations**

By comparing these indicators between **2021 and 2024**, the project aims to identify the most important drivers of local housing cost changes and the cities experiencing the most severe pressure.
