# 05: Climate data aggregation
*Aggregate gridded WBGT in the shade estimates into region-averaged estimates. Do this for both the reference dataset (UHE-Daily) and the climate change projections developed by `02_generate.ipynb`. This code is based on the Pangeo post [Conservative Region Aggregation with Xarray, Geopandas and Sparse](https://discourse.pangeo.io/t/conservative-region-aggregation-with-xarray-geopandas-and-sparse/2715/1) by Ryan Abernathey. Much of the functionality is from the [extended example](https://discourse.pangeo.io/t/conservative-region-aggregation-with-xarray-geopandas-and-sparse/2715/16) by Rich Signell.*

In [1]:
import coiled
import dask
import numpy as np
import xarray as xr
from dask import delayed
from dask.distributed import progress
from utils import gcm_list, load_regions, prep_sparse, spatial_aggregation, load_ds

Set up a Coiled cluster and a Dask client to handle parallel processing.

In [2]:
cluster = coiled.Cluster(
    n_workers=1,
    name="05",
    worker_vm_types=["m7g.medium"],
    scheduler_vm_types=["c7g.large"],
    region="us-west-2",
    spot_policy="spot_with_fallback",
)

cluster.adapt(minimum=1, maximum=100)

client = cluster.get_client()

[2025-01-24 11:04:57,765][INFO    ][coiled] Fetching latest package priorities...
[2025-01-24 11:04:57,766][INFO    ][coiled.package_sync] Resolving your local extreme-heat Python environment...
[2025-01-24 11:04:58,181][INFO    ][coiled.package_sync] Scanning 285 conda packages...
[2025-01-24 11:04:58,183][INFO    ][coiled.package_sync] Scanning 154 python packages...
[2025-01-24 11:04:58,397][INFO    ][coiled] Running pip check...
[2025-01-24 11:04:58,795][INFO    ][coiled] Validating environment...
[2025-01-24 11:05:00,705][INFO    ][coiled] Creating wheel for ~/Documents/carbonplan/extreme-heat-extension/central_asia/notebooks...
[2025-01-24 11:05:00,810][INFO    ][coiled] Uploading coiled_local_notebooks...
[2025-01-24 11:05:02,342][INFO    ][coiled] Requesting package sync build...
[2025-01-24 11:05:03,364][INFO    ][coiled] Creating Cluster (name: 05, https://cloud.coiled.io/clusters/739294?account=carbonplan ). This usually takes 1-2 minutes...
2025-01-24 11:05:53,383 - distrib

[2025-01-24 11:09:03,454][INFO    ][coiled] Adaptive scaling up to 2 workers.


Define functions for the notebook.

In [3]:
def load_population(grid_name="CarbonPlan"):
    """
    Load the population data generated in `03_population.ipynb`.
    """
    population_dict = {
        "CHC": "s3://carbonplan-climate-impacts/extreme-heat/v1.0/inputs/"
        "GHS_POP_E2030_GLOBE_R2023A_4326_30ss_V1_0_resampled_to_UHE_daily.zarr",
        "CarbonPlan": "s3://carbonplan-climate-impacts/extreme-heat/v1.0/"
        "inputs/GHS_POP_E2030_GLOBE_R2023A_4326_30ss_V1_0_resampled_to_CP.zarr",
    }
    population = xr.open_zarr(population_dict[grid_name])
    # `Dataset.drop` is deprecated in xarray; `drop_vars` is the current equivalent
    population = population.rename({"x": "lon", "y": "lat"}).drop_vars("spatial_ref")
    return population

In [None]:
def load_ds(gcm: str, scenario: str, years: np.ndarray):
    """
    Load in the gridded WBGT in the shade estimates from `02_generate.ipynb`.
    """
    ds = xr.open_zarr(
        f"s3://carbonplan-scratch/extreme-heat/wbgt-shade-gridded/years/{gcm}/{gcm}-{scenario}.zarr",
    )
    ds = ds.sel(time=slice(str(years[0]), str(years[-1])))
    ds = ds.assign_coords(lon=(((ds["lon"] + 180) % 360) - 180)).sortby("lon")

    return ds
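
The longitude handling in `load_ds` can be checked in isolation. The following is a minimal sketch, assuming the source grid uses a 0 to 360 longitude convention (which the arithmetic implies). The sample values are hypothetical and chosen only to show the wrap-around.

```python
import numpy as np

# Hypothetical longitudes on a 0-360 grid (illustrative values only).
lon_0_360 = np.array([0.0, 90.0, 179.75, 180.0, 270.0, 359.75])

# Same arithmetic as in `load_ds`: shift by 180, wrap at 360, shift back.
lon_180 = ((lon_0_360 + 180) % 360) - 180

# Sorting restores a monotonically increasing axis, mirroring `.sortby("lon")`.
order = np.argsort(lon_180)
lon_sorted = lon_180[order]

print(lon_sorted)
```

Values at or above 180 map to the western hemisphere, so the result lies in the half-open interval [-180, 180). The sort matters because slicing by coordinate range later assumes a monotonic longitude axis.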

In [5]:
lon = "lon"
lat = "lat"
scenario_years = {
    "historical": np.arange(1985, 2015),
    "ssp245": np.arange(2015, 2100),
    "ssp370": np.arange(2015, 2100),
}

In [6]:

regions_df = load_regions(extension="central-asia")

# padding to expand bounds to ensure you grab the data covering each region
buffer = 0.5
bbox = tuple(
    [
        regions_df.total_bounds[0] - buffer,
        regions_df.total_bounds[1] - buffer,
        regions_df.total_bounds[2] + buffer,
        regions_df.total_bounds[3] + buffer,
    ]
)
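
As a standalone illustration of the padding step, the sketch below applies the same buffer to a hypothetical bounds array. It assumes the `(minx, miny, maxx, maxy)` ordering that `GeoDataFrame.total_bounds` returns; the numbers are made up and are not the real Central Asia extent.

```python
import numpy as np

# Hypothetical bounds in (minx, miny, maxx, maxy) order. Illustrative values only.
total_bounds = np.array([46.5, 35.1, 87.3, 55.4])
buffer = 0.5

# Subtract the buffer from the two minima and add it to the two maxima.
padded = total_bounds + buffer * np.array([-1, -1, 1, 1])
bbox = tuple(padded.tolist())

print(bbox)
```

This is equivalent to the element-by-element construction in the cell above, just expressed as one vectorized operation.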

Access the gridded UHE-Daily data from Tuholske et al. (2021) and extract timeseries for the regions of interest. These will form the reference dataset for `06_bias_correction.ipynb`. Thanks to Cascade Tuholske (Montana State University) and Pete Peterson (University of California, Santa Barbara) for making the gridded dataset available. The source gridded dataset may not remain available indefinitely, but the full city- and region-aggregated version is available here alongside the other inputs for the analysis, maintaining reproducibility of the project.

The next steps aggregate the gridded datasets to region-average estimates. The non-city regions encompass all land area and thus sometimes include significant stretches of uninhabited land with potentially erroneously high or low temperatures (e.g., deserts). Weighting the aggregation by a gridded population product helps ensure that the estimates are human-relevant.
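
To make the effect of population weighting concrete, here is a toy sketch comparing an unweighted mean with a population-weighted mean over a single region. It is a simplification: it assumes every grid cell lies fully inside the region, whereas the notebook's `prep_sparse` also accounts for partial cell-region overlap. All values are hypothetical.

```python
import numpy as np

# Hypothetical 2x3 grid of WBGT values (deg C) covering one region.
wbgt = np.array([[30.0, 31.0, 45.0],
                 [29.0, 30.0, 44.0]])

# Hypothetical population counts; the hot right-hand column is uninhabited.
population = np.array([[1000.0, 3000.0, 0.0],
                       [2000.0, 4000.0, 0.0]])

# Unweighted mean treats every cell equally, including the empty desert cells.
area_mean = wbgt.mean()

# Population-weighted mean: sum(value * weight) / sum(weight).
pop_weighted_mean = (wbgt * population).sum() / population.sum()

print(area_mean, pop_weighted_mean)
```

In this example the uninhabited cells pull the unweighted mean several degrees above the weighted one, which is the distortion the weighting is meant to avoid.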

Load the UHE-Daily dataset and calculate weights.

In [None]:
# created from: https://github.com/carbonplan/uhe-daily-recipe
ds = xr.open_zarr(
    "s3://carbonplan-climate-impacts/extreme-heat-extension/v1.0/inputs/uhe_daily_zarr_v2.zarr",
    consolidated=True,
    chunks={},
)
ds
ds = ds.sel(lon=slice(bbox[0], bbox[2]), lat=slice(bbox[1], bbox[3]))

population = load_population(grid_name="CHC")
population = population.sel(lon=slice(bbox[0], bbox[2]), lat=slice(bbox[1], bbox[3]))
sparse_weights, population = prep_sparse(
    ds, population, regions_df, return_population=True, variables_to_drop=["WBGT"]
)

Use weights to aggregate gridded estimates into region-average estimates.

Repeat the above process but for our gridded WBGT estimates developed in `02_generate.ipynb`. 

Load a sample dataset as a template to calculate weights. The same weights can be used for every projection because all GCMs are on the same 0.25 degree grid.
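
The reason one set of weights suffices can be sketched with a small dense matrix. This is an illustration only: the real weights are sparse and built by `prep_sparse`, and the shapes and values below are hypothetical. The sketch assumes each dataset has been flattened to `(time, cell)` and that each row of the weights matrix sums to 1.

```python
import numpy as np

# Hypothetical weights: 2 regions x 4 grid cells, each row summing to 1.
weights = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.25, 0.75],
])

# Two hypothetical "models" on the same grid, flattened to (time, cell).
model_a = np.array([[20.0, 22.0, 30.0, 34.0],
                    [21.0, 23.0, 31.0, 35.0]])
model_b = model_a + 1.0

# The same matrix aggregates both, giving arrays of shape (time, region).
regions_a = model_a @ weights.T
regions_b = model_b @ weights.T

print(regions_a)
print(regions_b)
```

Because the weights depend only on grid geometry, region shapes, and population, and not on the WBGT values, they can be computed once from a template and applied to every GCM and scenario that shares the grid.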

In [7]:
ds = load_ds("ACCESS-CM2", "historical", np.arange(1985, 1986))  # noqa : F821
ds = ds.sel(lon=slice(bbox[0], bbox[2]), lat=slice(bbox[1], bbox[3]))
population = load_population(grid_name="CarbonPlan")
population = population.sel(lon=slice(bbox[0], bbox[2]), lat=slice(bbox[1], bbox[3]))
sparse_weights, population = prep_sparse(
    ds, population, regions_df, return_population=True, variables_to_drop=["WBGT"]
)



Generating weights...
population_weights calculated


Aggregate all gridded estimates into region-average estimates.

In [8]:
@delayed
def region_avg_estimate(gcm_scenario_tuple: tuple, population: xr.Dataset) -> tuple:
    gcm, scenario = gcm_scenario_tuple
    ds = load_ds(gcm, scenario, scenario_years[scenario])  # noqa : F821
    ds = ds.sel(lon=slice(bbox[0], bbox[2]), lat=slice(bbox[1], bbox[3]))
    population = population.sel(
        lon=slice(bbox[0], bbox[2]), lat=slice(bbox[1], bbox[3])
    )

    regridded = spatial_aggregation(ds, sparse_weights, load=False)
    regridded = regridded.chunk(chunks={"time": -1, "processing_id": 100})
    fp = f"s3://carbonplan-scratch/extreme-heat/wbgt-shade-regions/{gcm}-{scenario}_ori.zarr"
    regridded.to_zarr(fp, mode="w", consolidated=True)
    return gcm_scenario_tuple

In [9]:
gcm_scenario_tuples = [
    (gcm, scenario)
    for gcm in gcm_list
    for scenario in ["historical", "ssp245", "ssp370"]
]

In [10]:
# ############## SUBSET TEMP #################
delayed_results = []
for gcm_scenario in [("ACCESS-CM2", "historical")]:
    result = region_avg_estimate(gcm_scenario, population)
    delayed_results.append(result)

In [11]:
results = dask.persist(delayed_results, retries=1)
progress(results)

VBox()

<!-- half this took about ~16 minutes -->

In [None]:
cluster.adapt(minimum=1, maximum=1000)

In [13]:
cluster.shutdown()

2025-01-24 11:14:37,986 - distributed.deploy.adaptive - INFO - Adaptive scaling stopped: minimum=1 maximum=100. Reason: unknown
[2025-01-24 11:14:38,297][INFO    ][coiled] Cluster 739294 deleted successfully.
