<a href="https://colab.research.google.com/github/cchen744/uhi-extreme-heat-response/blob/main/notebooks/01_data_exploration.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook, I quantify the Urban Heat Island effect and extreme heat within a single city (Chicago). This data preprocessing workflow will be applied to more cities later.
The variables of interest are defined as follows:
- **Urban Heat Island**: Based on land surface temperature (LST), it is defined as SUHI(day) = LST_urb(day) − LST_rur(day), where LST_urb(day) is the aggregate (mean or median, chosen in advance) over urban pixels and LST_rur(day) is the aggregate over rural reference pixels.
- **Extreme Heat Window**: three consecutive days on which the daily mean land surface temperature exceeds the 95th percentile of its historical record.
- **Urban & Rural Definition**:

  - Urban: US Census Urbanized Area
  - Rural: Spatial mean of all non-urban, non-water pixels within the same UA
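
The definitions above can be sketched in pandas. This is a minimal illustration, not the `uhi_pipeline` implementation: the function names `daily_suhi` and `extreme_heat_flags` are hypothetical, the threshold is taken as a quantile of the same series that is passed in, and the series is assumed to hold one value per calendar day with no gaps (otherwise consecutive rows are not consecutive days).

```python
import pandas as pd


def daily_suhi(lst_urb: pd.Series, lst_rur: pd.Series) -> pd.Series:
    """SUHI(day) = LST_urb(day) - LST_rur(day), aligned on the index."""
    return lst_urb - lst_rur


def extreme_heat_flags(lst: pd.Series, percentile: float = 95, min_run: int = 3) -> pd.Series:
    """Flag days that belong to a run of at least `min_run` consecutive
    days above the given percentile of `lst` (assumes a gap-free daily series)."""
    threshold = lst.quantile(percentile / 100)
    hot = lst > threshold
    # A new run starts whenever the hot/not-hot state changes.
    run_id = (hot != hot.shift()).cumsum()
    # Number of hot days in each run, broadcast back to every day of the run.
    run_len = hot.astype(int).groupby(run_id).transform("sum")
    return hot & (run_len >= min_run)


# Toy data (made up for illustration): ten days of LST in degrees Celsius.
days = pd.date_range("2019-07-01", periods=10, freq="D")
toy_lst = pd.Series([20, 21, 30, 31, 32, 20, 33, 21, 20, 19], index=days, dtype=float)

# A low percentile is used here only so the short toy series has a run to find.
toy_flags = extreme_heat_flags(toy_lst, percentile=50, min_run=3)
toy_suhi = daily_suhi(pd.Series([30.0, 31.0]), pd.Series([27.5, 29.0]))
```

In the toy series the three-day run (July 3 to 5) is flagged, while the isolated hot day on July 7 is not, because it does not belong to a run of three.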



In [60]:
import ee

# Authenticate and initialize Earth Engine before any other ee.* call.
ee.Authenticate()
ee.Initialize(project='extremeweatheruhi')

In [79]:
from pathlib import Path
import os
import pandas as pd
import ee
import uhi_pipeline
import importlib
importlib.reload(uhi_pipeline)
print("uhi_pipeline module reloaded.")

DATA_DIR = Path("data/cities")
DATA_DIR.mkdir(parents=True, exist_ok=True)

ua_fc = ee.FeatureCollection("projects/extremeweatheruhi/assets/uac20_2025")


uhi_pipeline module reloaded.


In [80]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [81]:
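def append_csv(df_part: pd.DataFrame, out_csv: Path):
    """Append df_part to out_csv, writing the header only when the file is new."""
    # Nothing to write for a missing or empty chunk.
    if df_part is None or df_part.empty:
        return
    # Emit the header only on the first write, when the file does not exist yet.
    header = not out_csv.exists()
    df_part.to_csv(out_csv, mode="a", header=header, index=False)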
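
To make the append behaviour concrete, here is a small self-contained check that writes two chunks to a temporary file and reads the result back. The column names `date` and `suhi` and the values are made up for the demo. The helper is repeated only so the snippet runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd


def append_csv(df_part, out_csv):
    # Same logic as the helper above, repeated so this demo is self-contained.
    if df_part is None or df_part.empty:
        return
    header = not out_csv.exists()
    df_part.to_csv(out_csv, mode="a", header=header, index=False)


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "demo_daily_suhi.csv"
    # First chunk creates the file and writes the header.
    append_csv(pd.DataFrame({"date": ["2013-06-01", "2013-06-02"], "suhi": [1.2, 0.8]}), out)
    # Empty chunks are skipped and leave the file unchanged.
    append_csv(pd.DataFrame(), out)
    # Later chunks are appended without repeating the header.
    append_csv(pd.DataFrame({"date": ["2013-07-01"], "suhi": [1.5]}), out)

    combined = pd.read_csv(out)
    raw_lines = out.read_text().splitlines()
```

The header appears once, followed by three data rows. Note that appending assumes every chunk has the same columns in the same order; the helper does not check this.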

In [82]:
START_DATE = "2013-06-01"
END_DATE   = "2019-08-31"

CFG = dict(
    lst_band="LST_Night_1km",
    qc_band="QC_Night",
    agg_func="mean",
    ring_outer_m=30000,
    ring_inner_m=5000,
    lst_scale_m=1000,
    min_urban_pixels=30,
    min_rural_pixels=30,
    extreme_percentile=90,
)

def build_city_safe(city_name: str):
    out_csv = DATA_DIR / f"{city_name}_daily_suhi.csv"

    # Months already present in the CSV are skipped, so an interrupted run can resume.
    done_months = set()
    if out_csv.exists():
        tmp = pd.read_csv(out_csv, usecols=["date"])
        done_months = set(pd.to_datetime(tmp["date"]).dt.strftime("%Y-%m"))

    months_done = 0
    rows_written = 0

    for s, e in uhi_pipeline.month_starts(START_DATE, END_DATE):
        ym = s[:7]
        if ym in done_months:
            continue

        df_m = uhi_pipeline.run_city(
            ua_contains=city_name,
            ua_fc=ua_fc,
            start_date=s,
            end_date=e,
            out_csv=None,
            **CFG,
        )
        if df_m is None or df_m.empty:
            continue

        append_csv(df_m, out_csv)
        months_done += 1
        rows_written += len(df_m)

    return {"city": city_name, "file": str(out_csv), "months_done": months_done, "rows_written": rows_written}
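
The resume logic depends on `uhi_pipeline.month_starts`, whose source is not shown here. The sketch below is a hypothetical stand-in, named `month_windows`, that yields one `(start, end)` pair of ISO date strings per calendar month. It assumes the end of each window is exclusive (the first day of the next month); the real helper may differ on that point.

```python
from datetime import date


def month_windows(start: str, end: str):
    """Yield (month_start, next_month_start) ISO strings for every
    calendar month that overlaps the range start..end."""
    cur = date.fromisoformat(start).replace(day=1)
    last = date.fromisoformat(end)
    while cur <= last:
        # First day of the following month; December rolls over to January.
        nxt = date(cur.year + (cur.month == 12), cur.month % 12 + 1, 1)
        yield cur.isoformat(), nxt.isoformat()
        cur = nxt


windows = list(month_windows("2013-06-01", "2019-08-31"))

# Months already on disk (made-up example) are skipped, as in build_city_safe.
done_months = {"2013-06", "2013-07"}
pending = [(s, e) for s, e in windows if s[:7] not in done_months]
```

For the date range used in this notebook, the sketch produces 75 monthly windows, which agrees with the `months_done` value of 75 reported for each city in the results table.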

In [83]:
import geemap

intended_cities = ['Phoenix', 'Houston', 'Chicago']
for city_name in intended_cities:
    city = ua_fc.filter(ee.Filter.stringContains("NAME20", city_name))
    city_count = city.size().getInfo()

    if city_count > 0:
        print(f"{city_name} found in ua_fc with {city_count} features.")
        display(geemap.ee_to_df(city))
    else:
        print(f"{city_name} not found in ua_fc.")

Phoenix found in ua_fc with 2 features.


Unnamed: 0,ALAND20,AWATER20,FUNCSTAT20,GEOID20,GEOIDFQ20,INTPTLAT20,INTPTLON20,LSAD20,MTFCC20,NAME20,NAMELSAD20,UACE20
0,330537813,985600,S,69192,400C200US69192,33.4679624,-112.3639331,67,G3500,"Phoenix West--Goodyear--Avondale, AZ","Phoenix West--Goodyear--Avondale, AZ Urban Area",69192
1,2876252398,8810551,S,69184,400C200US69184,33.4999005,-111.9631531,67,G3500,"Phoenix--Mesa--Scottsdale, AZ","Phoenix--Mesa--Scottsdale, AZ Urban Area",69184


Houston found in ua_fc with 1 features.


Unnamed: 0,ALAND20,AWATER20,FUNCSTAT20,GEOID20,GEOIDFQ20,INTPTLAT20,INTPTLON20,LSAD20,MTFCC20,NAME20,NAMELSAD20,UACE20
0,4540250252,65791956,S,40429,400C200US40429,29.7730212,-95.4003948,67,G3500,"Houston, TX","Houston, TX Urban Area",40429


Chicago found in ua_fc with 1 features.


Unnamed: 0,ALAND20,AWATER20,FUNCSTAT20,GEOID20,GEOIDFQ20,INTPTLAT20,INTPTLON20,LSAD20,MTFCC20,NAME20,NAMELSAD20,UACE20
0,6055713369,100561449,S,16264,400C200US16264,41.8304099,-87.9086694,67,G3500,"Chicago, IL--IN","Chicago, IL--IN Urban Area",16264
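
The substring filter returns two urban areas for Phoenix. How `uhi_pipeline.run_city` handles several matches for `ua_contains` is not shown in this notebook. If only the principal urban area is wanted, one option is a stricter name match. The sketch below works in plain Python on the `NAME20` strings printed above; `principal_matches` is a hypothetical helper, and it assumes the wanted name starts with the city followed directly by `--` or `,`.

```python
# NAME20 values copied from the outputs above.
ua_names = [
    "Phoenix West--Goodyear--Avondale, AZ",
    "Phoenix--Mesa--Scottsdale, AZ",
    "Houston, TX",
    "Chicago, IL--IN",
]


def substring_matches(city, names):
    """Mimic a 'contains' filter: any name that includes the city string."""
    return [n for n in names if city in n]


def principal_matches(city, names):
    """Keep only names where the city is the leading component,
    i.e. followed directly by '--' or ','."""
    return [n for n in names if n.startswith((city + "--", city + ","))]


phoenix_loose = substring_matches("Phoenix", ua_names)
phoenix_strict = principal_matches("Phoenix", ua_names)
```

Here `phoenix_loose` holds both Phoenix entries, while `phoenix_strict` keeps only "Phoenix--Mesa--Scottsdale, AZ". Filtering on the `UACE20` code instead of the name would be another way to remove the ambiguity.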


In [84]:
CITIES = ["Phoenix", "Houston", "Chicago"]

results = []
for c in CITIES:
    print("Building:", c)
    results.append(build_city_safe(c))

pd.DataFrame(results)[["city", "months_done", "rows_written"]].head()


Building: Phoenix
Building: Houston
Building: Chicago

Unnamed: 0,city,months_done,rows_written
0,Phoenix,75,2000
1,Houston,75,1699
2,Chicago,75,1714
