# Heat Exposure and Environmental Inequity – Initial Analysis

Goal: Quantify how **heat exposure** varies by **income level** and **demographic characteristics**, 
as a first step toward environmental equity analysis.

Part of the **Equity & Environmental Analytics** portfolio:
- Task: Descriptive + simple statistical analysis
- Focus: Who is more exposed to heat?


In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

import statsmodels.api as sm


## 1. Load Data

The analysis assumes a merged dataset at the census tract or block group level with these columns:

- `geo_id`
- `heat_exposure` (e.g., mean summer LST or degree days)
- `median_income`
- `pct_people_of_color` or a similar demographic indicator
- `population`
- Optional: `poverty_rate`, `pct_elderly`, etc.

Replace the path with your dataset.
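
Before running the cells that follow, it can help to confirm that the file actually contains the expected columns. The sketch below is one minimal way to do that. `check_schema` and `REQUIRED_COLUMNS` are hypothetical names introduced here, the column names are taken from the list above (with `pct_people_of_color` spelled as in the regression cell), and the toy frame holds made-up values rather than real tracts.

```python
import pandas as pd

# Hypothetical helper: these column names are assumptions about your dataset.
REQUIRED_COLUMNS = [
    "geo_id",
    "heat_exposure",
    "median_income",
    "pct_people_of_color",
    "population",
]


def check_schema(df, required=REQUIRED_COLUMNS):
    """Return the required columns that are missing from df."""
    return [col for col in required if col not in df.columns]


# Toy frame for illustration only (made-up values, not real data).
toy = pd.DataFrame({
    "geo_id": ["A", "B"],
    "heat_exposure": [30.1, 32.4],
    "median_income": [42000, 61000],
    "population": [1200, 800],
})

missing = check_schema(toy)
print(missing)  # the toy frame deliberately omits pct_people_of_color
```

Running the same check on the loaded `df` right after `pd.read_csv` would surface a naming mismatch early instead of as a `KeyError` in a later cell.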


In [4]:
data_path = "../data/heat_equity_sample.csv"
df = pd.read_csv(data_path)
df.head()


FileNotFoundError: [Errno 2] No such file or directory: '../data/heat_equity_sample.csv'

In [5]:
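# Summary statistics; print explicitly because only a cell's last expression is displayed
print(df.describe())

# Population-weighted mean heat exposure: sum(population * heat) / sum(population).
# Drop rows with missing values first, since np.average propagates NaN.
valid = df.dropna(subset=["heat_exposure", "population"])
weighted_mean = np.average(valid["heat_exposure"], weights=valid["population"])
weighted_mean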


NameError: name 'df' is not defined
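
To illustrate why the mean is weighted by population, here is a small sketch with made-up numbers (three hypothetical tracts, not real data). The simple mean treats every tract equally, while the weighted mean reflects the exposure of the average resident, so it shifts toward the values of the more populous tracts.

```python
import numpy as np

# Made-up values for three hypothetical tracts.
heat = np.array([30.0, 34.0, 38.0])   # heat exposure per tract
pop = np.array([1000, 3000, 6000])    # residents per tract

simple_mean = heat.mean()                     # every tract counts equally
weighted = np.average(heat, weights=pop)      # every resident counts equally
manual = (heat * pop).sum() / pop.sum()       # the same quantity written out

print(simple_mean, weighted, manual)
```

Here the hottest tract is also the most populous, so the weighted mean is higher than the simple mean.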

## 2. Heat Exposure by Income Quintile

In [6]:
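# Create income quintiles (1 = lowest income, 5 = highest)
df["income_quintile"] = pd.qcut(df["median_income"], 5, labels=[1, 2, 3, 4, 5])

# Population-weighted mean heat exposure within each quintile.
# observed=True and the explicit column selection avoid pandas deprecation warnings.
exposure_by_income = (
    df.groupby("income_quintile", observed=True)[["heat_exposure", "population"]]
      .apply(lambda g: np.average(g["heat_exposure"], weights=g["population"]))
      .reset_index(name="pop_weighted_heat")
)

exposure_by_income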


NameError: name 'df' is not defined
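
As a quick illustration of what `pd.qcut` does in the quintile step, the sketch below bins ten made-up income values into five equal-sized groups. The numbers are purely illustrative and are not drawn from any dataset.

```python
import pandas as pd

# Ten made-up median incomes, evenly spaced for clarity.
income = pd.Series([20, 25, 30, 35, 40, 45, 50, 55, 60, 65]) * 1000

# Five quantile-based bins, labelled 1 (lowest) to 5 (highest).
quintile = pd.qcut(income, 5, labels=[1, 2, 3, 4, 5])

# Each bin should hold roughly the same number of observations.
counts = quintile.value_counts().sort_index()
print(counts)
```

Note that these quintiles split tracts, not residents, into equal groups. If many tracts share the same income value, the quantile edges can coincide and `pd.qcut` raises a `ValueError`, which may need handling on real data.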

In [7]:
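# Cast the categorical quintile labels to int so the x-axis is numeric with integer ticks
quintiles = exposure_by_income["income_quintile"].astype(int)

plt.figure(figsize=(6, 4))
plt.plot(quintiles, exposure_by_income["pop_weighted_heat"], marker="o")
plt.xticks(quintiles)
plt.xlabel("Income quintile (1 = lowest)")
plt.ylabel("Population-weighted heat exposure")
plt.title("Heat Exposure by Income Quintile")
plt.tight_layout()
plt.show()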


NameError: name 'exposure_by_income' is not defined

<Figure size 600x400 with 0 Axes>

## 3. Simple Regression

In [8]:
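# Exploratory, unweighted OLS of heat exposure on income and demographic composition
cols = ["median_income", "pct_people_of_color"]
X = sm.add_constant(df[cols])
y = df["heat_exposure"]

# missing="drop" excludes rows with NaN in y or X instead of raising an error
model = sm.OLS(y, X, missing="drop").fit()
print(model.summary())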


NameError: name 'df' is not defined

## 4. Interpretation (Notes)

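- Sign and magnitude of coefficients:
  - `median_income`: a negative coefficient would suggest that higher-income areas have lower heat exposure, holding `pct_people_of_color` constant.
  - `pct_people_of_color`: a positive coefficient would suggest disproportionate exposure in areas with a larger share of people of color, holding income constant.
- This is an exploratory, cross-sectional model; it does **not** establish causality.
- Next steps: add controls (e.g., `poverty_rate`), fit spatial models to account for correlation between neighboring areas, and run robustness checks.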
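
When reading coefficient magnitudes, the units of each predictor matter: with income measured in dollars, the income coefficient is the change in heat exposure per single dollar and will look tiny. The sketch below illustrates this on synthetic data with an assumed data-generating process (the coefficients, ranges, and noise level are invented for illustration and are not results). It uses `numpy.linalg.lstsq` to obtain least-squares point estimates, the same quantity `sm.OLS` estimates, and the helper `ols` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic predictors (assumed ranges, not real data).
income = rng.uniform(20_000, 150_000, n)   # dollars
pct_poc = rng.uniform(0, 100, n)           # percent

# Assumed data-generating process, chosen only for illustration.
heat = 35.0 - 2e-5 * income + 0.03 * pct_poc + rng.normal(0, 0.5, n)


def ols(y, *columns):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


b_raw = ols(heat, income, pct_poc)              # income in dollars
b_scaled = ols(heat, income / 10_000, pct_poc)  # income in units of $10,000

print("per dollar:  ", b_raw[1])
print("per $10,000: ", b_scaled[1])
```

Rescaling income changes only the units of its coefficient, not the fit: the per-$10,000 coefficient is 10,000 times the per-dollar one, and the sign is unchanged.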
