# 05: Climate data aggregation
*Aggregate gridded estimates of WBGT in the shade into region-average estimates, both for the reference dataset (UHE-Daily) and for the climate change projections developed in `02_generate.ipynb`. This code is based on the Pangeo post [Conservative Region Aggregation with Xarray, Geopandas and Sparse](https://discourse.pangeo.io/t/conservative-region-aggregation-with-xarray-geopandas-and-sparse/2715/1) by Ryan Abernathey, and much of the functionality comes from the [extended example](https://discourse.pangeo.io/t/conservative-region-aggregation-with-xarray-geopandas-and-sparse/2715/16) by Rich Signell.*

In [2]:
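import logging
import time

import coiled
import numpy as np
import xarray as xr
from utils import gcm_list, load_regions, prep_sparse, spatial_aggregation

# Show logging.info progress messages (the root logger defaults to WARNING)
logging.basicConfig(level=logging.INFO)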

Set up a Coiled cluster and connect a Dask client to it so the aggregation runs in parallel across workers.

In [None]:
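# Start with two workers on AWS Graviton (m7g) instances, using spot capacity when available
cluster = coiled.Cluster(
    n_workers=2,
    worker_vm_types=["m7g.large"],
    scheduler_vm_types=["m7g.4xlarge"],
    region="us-west-2",
    spot_policy="spot_with_fallback",
)

# Let the cluster scale between 2 and 50 workers depending on load
cluster.adapt(minimum=2, maximum=50)

client = cluster.get_client()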

Define helper functions for loading the population and WBGT datasets.

In [2]:
def load_population(grid_name="CarbonPlan"):
    """
    Load the population data generated in `03_population.ipynb`.
    """
    population_dict = {
        "CHC": "s3://carbonplan-climate-impacts/extreme-heat/v1.0/inputs/"
        "GHS_POP_E2030_GLOBE_R2023A_4326_30ss_V1_0_resampled_to_UHE_daily.zarr",
        "CarbonPlan": "s3://carbonplan-climate-impacts/extreme-heat/v1.0/"
        "inputs/GHS_POP_E2030_GLOBE_R2023A_4326_30ss_V1_0_resampled_to_CP.zarr",
    }
    population = xr.open_zarr(population_dict[grid_name])
    # Match the lon/lat naming of the WBGT datasets and drop the CRS helper variable
    population = population.rename({"x": "lon", "y": "lat"}).drop_vars("spatial_ref")
    return population

In [4]:
def load_ds(gcm: str, scenario: str, years: np.ndarray):
    """
    Load in the gridded WBGT in the shade estimates from `02_generate.ipynb`.
    """
    ds = xr.open_zarr(
        f"s3://carbonplan-scratch/extreme-heat/wbgt-shade-gridded/years/{gcm}/{gcm}-{scenario}.zarr",
        # chunks={},
    )
    ds = ds.sel(time=slice(str(years[0]), str(years[-1])))
    # Convert longitudes from 0 to 360 into -180 to 180 and sort so bbox slices work
    ds = ds.assign_coords(lon=(((ds["lon"] + 180) % 360) - 180)).sortby("lon")

    return ds
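The longitude step in `load_ds` maps coordinates from the 0 to 360 convention to -180 to 180. A small NumPy check of the same expression on a few sample longitudes (the values are illustrative, not taken from the data):

```python
import numpy as np

# Longitudes on a 0-to-360 grid, as some GCM outputs store them
lon_0_360 = np.array([0.0, 90.0, 179.75, 180.0, 270.0, 359.75])

# Same expression as in load_ds: shift by 180, wrap with modulo, shift back
lon_pm180 = ((lon_0_360 + 180) % 360) - 180

# Sorting mirrors .sortby("lon"), so label slices with ascending bounds work
print(np.sort(lon_pm180))
```

Note that 180 maps to -180, so the converted axis runs over [-180, 180).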

In [4]:
# Quick look at a full, unsubset projection store
ds = xr.open_zarr(
    "s3://carbonplan-scratch/extreme-heat/wbgt-shade-gridded/years/ACCESS-CM2/ACCESS-CM2-ssp245.zarr",
    # chunks={},
)

In [4]:
ds = load_ds("ACCESS-CM2", "historical", np.arange(1985, 1986))  # noqa : F821


In [5]:
ds = load_ds("ACCESS-CM2", "ssp245", np.arange(2080, 2099))  # noqa : F821


In [1]:
lon = "lon"
lat = "lat"
scenario_years = {
    "historical": np.arange(1985, 2015),
    "ssp245": np.arange(2015, 2100),
    "ssp370": np.arange(2015, 2100),
}
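Because `np.arange` excludes its stop value, the year ranges above end one year before the stop argument. A quick check of the coverage each scenario gets:

```python
import numpy as np

scenario_years = {
    "historical": np.arange(1985, 2015),
    "ssp245": np.arange(2015, 2100),
    "ssp370": np.arange(2015, 2100),
}

# Print the first and last year each range actually contains
for name, years in scenario_years.items():
    print(f"{name}: {years[0]}-{years[-1]} ({len(years)} years)")
```

`load_ds` then selects `slice(str(years[0]), str(years[-1]))`, and string slicing on a datetime index includes the whole final year, so the projections run through the end of 2099 and 2100 is not selected.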


In [6]:
regions_df = load_regions(extension="central-asia")

# Padding (in degrees) that expands the bounds so the subset covers the grid cells at each region's edge
buffer = 0.5
minx, miny, maxx, maxy = regions_df.total_bounds
bbox = (minx - buffer, miny - buffer, maxx + buffer, maxy + buffer)
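The bounding box is the regions' total bounds `(minx, miny, maxx, maxy)` padded on every side. The sketch below uses hypothetical bounds in place of `regions_df.total_bounds`, and also shows a boolean-mask subset. The later `ds.sel(lat=slice(bbox[1], bbox[3]))` calls assume ascending latitude; if a grid stores latitude in descending order, that label slice returns nothing, while a mask works either way.

```python
import numpy as np

def buffered_bbox(total_bounds, buffer=0.5):
    """Expand (minx, miny, maxx, maxy) bounds by `buffer` degrees on every side."""
    minx, miny, maxx, maxy = total_bounds
    return (minx - buffer, miny - buffer, maxx + buffer, maxy + buffer)

# Hypothetical bounds standing in for regions_df.total_bounds
bounds = np.array([46.5, 35.1, 87.3, 55.4])
bbox = buffered_bbox(bounds, buffer=0.5)

# A descending latitude axis, as some grids store it
lat = np.arange(60.0, 30.0, -0.25)

# Boolean masks select cells inside the box regardless of coordinate order
lat_mask = (lat >= bbox[1]) & (lat <= bbox[3])
print(bbox, lat[lat_mask].min(), lat[lat_mask].max())
```

With xarray, a descending axis can instead be sliced with the bounds reversed, e.g. `ds.sel(lat=slice(bbox[3], bbox[1]))`.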

Access the gridded UHE-Daily data from Tuholske et al. (2021) and extract time series for the regions of interest. These will form the reference dataset for `06_bias_correction.ipynb`. Thanks to Cascade Tuholske (Montana State University) and Pete Peterson (University of California, Santa Barbara) for making the gridded dataset available. The source gridded dataset may not remain available indefinitely, but the city- and region-aggregated version is archived here alongside the other analysis inputs, so the project remains reproducible.

The next steps aggregate the gridded datasets into region-average estimates. Non-city regions cover all of their land area and so can include large stretches of uninhabited land (e.g., deserts) whose unusually high or low temperatures would skew a simple area average. Weighting the aggregation by a gridded population product keeps the estimates representative of where people live.
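A toy example of why population weighting matters, using hypothetical values for one region of four grid cells: an empty, very hot desert cell and three populated cells. This simplification ignores how much of each cell falls inside the region, which the conservative weights also account for.

```python
import numpy as np

# Hypothetical region: one empty desert cell and three populated cells
wbgt = np.array([36.0, 28.0, 27.0, 29.0])              # daily WBGT per cell (degrees C)
population = np.array([0.0, 5000.0, 3000.0, 2000.0])   # people per cell

area_mean = wbgt.mean()                          # every cell counts equally
pop_mean = np.average(wbgt, weights=population)  # cells count in proportion to population
print(f"area-average: {area_mean:.2f}, population-weighted: {pop_mean:.2f}")
```

The unweighted mean (30.00) is pulled up by the desert cell, while the population-weighted mean (27.90) reflects only the cells where people live.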

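Load the UHE-Daily dataset, subset it and the population data resampled to the UHE-Daily grid to the bounding box, and calculate the aggregation weights.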

In [None]:
# created from: https://github.com/carbonplan/uhe-daily-recipe
ds = xr.open_zarr(
    "s3://carbonplan-climate-impacts/extreme-heat-extension/v1.0/inputs/uhe_daily.zarr",
    zarr_format=3,
    consolidated=False,
    chunks={},
)
ds = ds.sel(lon=slice(bbox[0], bbox[2]), lat=slice(bbox[1], bbox[3]))
population = load_population(grid_name="CHC")
population = population.sel(lon=slice(bbox[0], bbox[2]), lat=slice(bbox[1], bbox[3]))
sparse_weights, population = prep_sparse(
    ds, population, regions_df, return_population=True, variables_to_drop=["WBGT"]
)
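`prep_sparse` lives in `utils` and is not shown in this notebook, so the following is only a hypothetical NumPy sketch of how population-weighted conservative weights can be formed, not its actual implementation. It assumes population is spread evenly within each cell: each region's weight on a cell is the share of that cell inside the region times the cell's population, normalized so each region's weights sum to 1.

```python
import numpy as np

def population_weights(overlap_fraction, population):
    """Hypothetical sketch: combine region/cell overlap with population.

    overlap_fraction: (n_regions, n_cells) share of each cell's area inside each region
    population: (n_cells,) population of each cell
    Returns an (n_regions, n_cells) matrix whose nonzero rows sum to 1.
    """
    raw = overlap_fraction * population[np.newaxis, :]
    totals = raw.sum(axis=1, keepdims=True)
    # Regions with no population get all-zero rows instead of dividing by zero
    return np.divide(raw, totals, out=np.zeros_like(raw), where=totals > 0)

# Two hypothetical regions over four cells; cell 1 is split evenly between them
overlap = np.array([
    [1.0, 0.5, 0.0, 0.0],
    [0.0, 0.5, 1.0, 1.0],
])
pop = np.array([100.0, 200.0, 0.0, 300.0])
w = population_weights(overlap, pop)
print(w)
```

Most entries of a real weight matrix are zero because each region touches only a few cells, which is why the Pangeo approach stores the weights in a sparse array rather than a dense one as here.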

Use weights to aggregate gridded estimates into region-average estimates.

In [None]:
# Aggregate the gridded data to region averages with the sparse weights
regridded = spatial_aggregation(ds, sparse_weights, load=False)
# Each chunk holds the full time series for 1000 processing_id entries
regridded = regridded.chunk(chunks={"time": -1, "processing_id": 1000})
logging.info(f"{time.ctime()}: Adjusting time dtype")
regridded_dt = regridded.assign_coords(
    {"time": regridded.time.astype("datetime64[ns]")}
)
logging.info(f"{time.ctime()}: Writing Zarr store")
fp = "s3://carbonplan-climate-impacts/extreme-heat/v1.1/inputs/wbgt-UHE-daily-historical.zarr"
regridded_dt.to_zarr(fp, consolidated=True, mode="w")
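`spatial_aggregation` is also defined in `utils`; conceptually, applying the weights is a matrix product between the flattened grid and the weight matrix. A hypothetical dense NumPy sketch of that step (the notebook's version works on sparse weights and Dask-backed data, and may handle missing cells differently):

```python
import numpy as np

def aggregate_to_regions(data, weights):
    """Hypothetical sketch of weight-based aggregation.

    data: (n_time, n_lat, n_lon) gridded values
    weights: (n_regions, n_lat * n_lon) row-normalized weights
    Returns (n_time, n_regions) region-average time series.
    """
    n_time = data.shape[0]
    flat = data.reshape(n_time, -1)  # (time, cell), row-major like the weight columns
    return flat @ weights.T          # weighted sum over cells for every region

# Two time steps on a 2x2 grid (cells ordered (0,0), (0,1), (1,0), (1,1))
data = np.array([
    [[30.0, 28.0], [26.0, 24.0]],
    [[31.0, 29.0], [27.0, 25.0]],
])
weights = np.array([
    [0.5, 0.5, 0.0, 0.0],    # region A: top row, equal weights
    [0.0, 0.25, 0.0, 0.75],  # region B: mostly cell (1,1)
])
result = aggregate_to_regions(data, weights)
print(result)
```

Real WBGT grids contain missing cells (e.g., over ocean); those would need to be masked or filled before the product, since a single NaN propagates into the region average.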

Repeat the above process for our gridded WBGT estimates developed in `02_generate.ipynb`.

Load a sample dataset as a template for calculating weights. The same weights can be reused for every projection because all GCMs share the same 0.25-degree grid.
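Reusing one set of weights is only valid if every projection really is on the template's grid. A cheap guard, sketched here as a hypothetical helper `same_grid`, compares the coordinate arrays before the weights are applied:

```python
import numpy as np

def same_grid(lon_a, lat_a, lon_b, lat_b, atol=1e-6):
    """Hypothetical guard: True when two grids share the same lon/lat coordinates."""
    # Check shapes first so allclose never tries to broadcast mismatched axes
    return (
        lon_a.shape == lon_b.shape
        and lat_a.shape == lat_b.shape
        and np.allclose(lon_a, lon_b, atol=atol)
        and np.allclose(lat_a, lat_b, atol=atol)
    )

# A 0.25-degree template grid
lon = np.arange(-180.0, 180.0, 0.25)
lat = np.arange(-60.0, 90.0, 0.25)
print(same_grid(lon, lat, lon.copy(), lat.copy()))  # identical grid
print(same_grid(lon, lat, lon + 0.125, lat))        # offset by half a cell
```

In the loop below, this could be asserted against the template's `lon` and `lat` values for each loaded projection before calling `spatial_aggregation`.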

In [None]:
ds = load_ds("ACCESS-CM2", "historical", np.arange(1985, 1986))  # noqa : F821
ds = ds.sel(lon=slice(bbox[0], bbox[2]), lat=slice(bbox[1], bbox[3]))
population = load_population(grid_name="CarbonPlan")
population = population.sel(lon=slice(bbox[0], bbox[2]), lat=slice(bbox[1], bbox[3]))
sparse_weights, population = prep_sparse(
    ds, population, regions_df, return_population=True, variables_to_drop=["WBGT"]
)

Aggregate all gridded estimates into region-average estimates.

In [None]:
# Currently subset to the first GCM; widen the slice to process every model in gcm_list
for gcm in gcm_list[0:1]:
    for scenario in ["historical", "ssp245", "ssp370"]:
        logging.info(f"Starting: {time.ctime()}: {gcm}-{scenario}")
        ds = load_ds(gcm, scenario, scenario_years[scenario])  # noqa : F821
        ds = ds.sel(lon=slice(bbox[0], bbox[2]), lat=slice(bbox[1], bbox[3]))

        # Reuse the template weights: every GCM shares the same grid
        regridded = spatial_aggregation(ds, sparse_weights, load=False)
        regridded = regridded.chunk(chunks={"time": -1, "processing_id": 100})
        fp = f"s3://carbonplan-scratch/extreme-heat/wbgt-shade-regions/{gcm}-{scenario}.zarr"
        logging.info(f"Writing: {time.ctime()}: {fp}")
        # Write Zarr format 2 to work around the BytesBytesCodec error
        regridded.to_zarr(fp, zarr_format=2, mode="w", consolidated=True)
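With `time: -1`, each chunk holds entire time series, so chunk size grows with series length times the number of `processing_id` entries per chunk. A rough size estimate, assuming a single float32 variable and a 365-day calendar over the 85 projection years (both are assumptions, not read from the data):

```python
def chunk_nbytes(n_time, n_regions_per_chunk, itemsize=4):
    """Bytes in one (time, processing_id) chunk of a single variable.

    itemsize=4 assumes float32 values; use 8 if the variable is stored as float64.
    """
    return n_time * n_regions_per_chunk * itemsize

# 85 projection years of daily data on an assumed 365-day calendar
n_time = 85 * 365
for regions_per_chunk in (100, 1000):
    mb = chunk_nbytes(n_time, regions_per_chunk) / 1e6
    print(f"{regions_per_chunk} regions per chunk: {mb:.1f} MB")
```

Under these assumptions, 100 entries per chunk gives roughly 12 MB chunks, while 1000 would give roughly 124 MB, which is one plausible reason the projection stores use the smaller chunk size.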
