# Climate indicator forecasts over Vanuatu using CMIP6 data

In this tutorial, we'll cover the following:
- Selecting climate datasets from the CMIP6 archive
- Loading CMIP6 data stored in the Zarr format
- Calculating climate indices for extreme weather forecasts

In [None]:
# !mamba uninstall odc-loader
# !mamba install cartopy obstore 'zarr>=3' 'python=3.11'

In [None]:
import cartopy.crs as ccrs
import dask.diagnostics
import matplotlib.pyplot as plt
import obstore
import obstore.auth.planetary_computer
import pandas as pd
import planetary_computer
import pystac_client
import tqdm
import xarray as xr
import xclim
import zarr

## Part 1: Getting cloud hosted CMIP6 data

The [Coupled Model Intercomparison Project Phase 6 (CMIP6)](https://en.wikipedia.org/wiki/CMIP6#CMIP_Phase_6)
dataset is a rich archive of modelling experiments carried out to project the impacts of climate change.
The cloud-hosted copies are stored in the [Zarr](https://zarr.dev) format, and we'll go over how to access them.

**Note**: This section was adapted from https://tutorial.xarray.dev/intermediate/remote_data/cmip6-cloud.html

Sources:
- https://esgf-node.llnl.gov/search/cmip6/
- CMIP6 data hosted on Google Cloud - https://console.cloud.google.com/marketplace/details/noaa-public/cmip6
- Pangeo/ESGF Cloud Data Access tutorial - https://pangeo-data.github.io/pangeo-cmip6-cloud/accessing_data.html

First, let's open a CSV file that lists the available CMIP6 datasets:

In [None]:
df = pd.read_csv("https://cmip6.storage.googleapis.com/pangeo-cmip6.csv")
print(f"Number of rows: {len(df)}")
df.head()

Over 500,000 rows! Let's filter it down to the variable and experiment
we're interested in, e.g. daily max near-surface air temperature.

For the `variable_id`, you can look it up by keyword at
https://docs.google.com/spreadsheets/d/1UUtoz6Ofyjlpx5LdqhKcwHFz2SGoTQV2_yekHyMfL9Y

For the `experiment_id`, download the spreadsheet from
https://github.com/ES-DOC/esdoc-docs/blob/master/cmip6/experiments/spreadsheet/experiments.xlsx,
go to the 'experiment' tab, and find the one you're interested in.

Another good place to find the right model runs is https://esgf-node.llnl.gov/search/cmip6
(once you get your head around the acronyms and short names).

Below, we'll filter to CMIP6 experiments matching:
- Daily Maximum Near-Surface Air Temperature [K] (variable_id: `tasmax`)
  - Alternatively, you can choose `pr` to get precipitation
- Shared Socioeconomic Pathway 3 (experiment_id: `ssp370`)
  - Alternatively, you can also try `ssp585` for the worst case scenario

In [None]:
df_tasmax = df.query("variable_id == 'tasmax' & experiment_id == 'ssp370'")
df_tasmax
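
The query above matches on variable and experiment only. The catalog also has a `table_id` column (for example `day` for daily output and `Amon` for monthly atmospheric output), so one option is to filter on it as well and then count the matches per model. A minimal sketch on a tiny hand-made table; the rows and the `MODEL-A`/`MODEL-B`/`MODEL-C` names are hypothetical, not real catalog entries:

```python
import pandas as pd

# Hypothetical rows mimicking the catalog's columns; not real entries.
catalog = pd.DataFrame(
    {
        "source_id": ["MODEL-A", "MODEL-A", "MODEL-B", "MODEL-B", "MODEL-C"],
        "experiment_id": ["ssp370", "ssp370", "ssp370", "ssp585", "ssp370"],
        "table_id": ["day", "Amon", "day", "day", "Amon"],
        "variable_id": ["tasmax"] * 5,
        "member_id": ["r1i1p1f1"] * 5,
    }
)

# Keep only daily tasmax for SSP3-7.0.
daily = catalog.query(
    "variable_id == 'tasmax' & experiment_id == 'ssp370' & table_id == 'day'"
)

# Count how many matching datasets each model contributes.
runs_per_model = daily.groupby("source_id").size()
print(runs_per_model)
```

The same `query` string should work unchanged on the real `df`, although the number of matches will differ from the count quoted in this tutorial.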

There are 376 matching datasets for SSP3-7.0.
Let's just get the URL to the last one in the list for now.

In [None]:
print(df_tasmax.zstore.iloc[-1])
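
Each `zstore` entry is a `gs://` URL whose path segments appear to mirror the catalog's columns. The sketch below splits such a URL back into named facets, which can help when checking which model, member and version a store belongs to. The field order is an assumption inferred from the CMCC-ESM2 URL used in Part 2, and `parse_zstore` is a hypothetical helper, not part of any library:

```python
from urllib.parse import urlparse

# Field order assumed from the example URL and the catalog's column names.
FIELDS = [
    "activity_id", "institution_id", "source_id", "experiment_id",
    "member_id", "table_id", "variable_id", "grid_label", "version",
]


def parse_zstore(url: str) -> dict[str, str]:
    """Split a gs://cmip6/CMIP6/... store URL into named facets."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    # Expect a leading "CMIP6" segment followed by one segment per field.
    if parts[:1] != ["CMIP6"] or len(parts) != len(FIELDS) + 1:
        raise ValueError(f"unexpected store layout: {url}")
    return dict(zip(FIELDS, parts[1:]))


facets = parse_zstore(
    "gs://cmip6/CMIP6/ScenarioMIP/CMCC/CMCC-ESM2/ssp370/r1i1p1f1/day/tasmax/gn/v20210202/"
)
print(facets["source_id"], facets["table_id"], facets["version"])
```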

## Part 2: Reading from the remote Zarr store

In many cases, you'll first need to connect to the cloud provider.
The CMIP6 dataset allows anonymous access, but in other cases
you may need to authenticate.

We'll connect to the CMIP6 Zarr store on Google Cloud using
[`zarr.storage.ObjectStore`](https://zarr.readthedocs.io/en/v3.1.0/user-guide/storage.html#object-store):

In [None]:
gcs_store = obstore.store.from_url(
    url="gs://cmip6/CMIP6/ScenarioMIP/CMCC/CMCC-ESM2/ssp370/r1i1p1f1/day/tasmax/gn/v20210202/",
    skip_signature=True,
)
store = zarr.storage.ObjectStore(store=gcs_store, read_only=True)

Once the Zarr store connection is in place, we can open it into an `xarray.Dataset` like so.

In [None]:
ds = xr.open_zarr(store=store, consolidated=True, zarr_format=2)
ds

### Selecting time slices

Let's say we want to calculate temperature changes between
2025 and 2050. We can access just the specific time points
needed using [`xr.Dataset.sel`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.sel.html).

In [None]:
tasmax_2025jan = ds.tasmax.sel(time="2025-01-16").squeeze()
tasmax_2050dec = ds.tasmax.sel(time="2050-12-16").squeeze()

The temperature change is then the December 2050 value minus the January 2025 value.

In [None]:
tasmax_change = tasmax_2050dec - tasmax_2025jan
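
Differencing two single days mixes the long-term trend with day-to-day weather variability. A common alternative is to difference averages over a longer window. The sketch below illustrates this on a synthetic series for one grid cell, using pandas as a stand-in for xarray; all numbers are made up, and in xarray the analogous selection would be something like `ds.tasmax.sel(time="2050").mean("time")`:

```python
import numpy as np
import pandas as pd

# Synthetic daily series for one grid cell (made-up numbers, in kelvin):
# a 0.04 K/year step trend plus random day-to-day variability.
rng = np.random.default_rng(seed=0)
time = pd.date_range("2025-01-01", "2050-12-31", freq="D")
trend = 0.04 * (time.year.to_numpy() - 2025)
noise = rng.normal(loc=0.0, scale=1.0, size=len(time))
tasmax = pd.Series(300.0 + trend + noise, index=time)

# Single-day difference: strongly affected by daily noise.
single_day_change = tasmax.loc["2050-12-16"] - tasmax.loc["2025-01-16"]

# Difference of annual means: the noise largely averages out.
annual_change = tasmax.loc["2050"].mean() - tasmax.loc["2025"].mean()

print(f"single-day change:  {single_day_change:+.2f} K")
print(f"annual-mean change: {annual_change:+.2f} K (built-in trend: +1.00 K)")
```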

Note that up to this point, we have not actually downloaded any
(big) data yet from the cloud. This is all working based on
metadata only.

To bring the data from the cloud to your local computer, call `.compute()`.
This may take a while, depending on your connection speed.

In [None]:
tasmax_change = tasmax_change.compute()

We can do a quick plot to show how maximum near-surface temperature
is predicted to change between 2025 and 2050 (from one modelled experiment).

In [None]:
tasmax_change.plot.imshow()

This temperature change is for the entire planet. Let's zoom in over Vanuatu.

In [None]:
projection = ccrs.epsg(code=3832)  # PDC Mercator
fig, ax = plt.subplots(nrows=1, ncols=1, subplot_kw=dict(projection=projection))
tasmax_change.sel(lon=slice(166, 170), lat=slice(-22, -11)).plot.imshow(
    ax=ax,
    transform=ccrs.PlateCarree()
)
ax.set_extent(extents=[166, 170, -22, -11], crs=ccrs.PlateCarree())
ax.coastlines()

This CMIP6 output is very coarse, so there are only 4x11=44 pixels covering Vanuatu.
You could try to say that the northern regions are forecast to experience
higher daily maximum temperatures than the south in 2050 compared to 2025,
but it's best to get more data before jumping to such conclusions.
Specifically:

- Look at an ensemble of forecasts from different CMIP6 models
- Potentially look at downscaled data that will show local patterns.
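
To put the pixel count in perspective, the sketch below estimates how many grid cells cover the Vanuatu bounding box at two nominal resolutions. It assumes a regular grid with the stated spacing, which is only an approximation of any real model grid; `count_cells` is a hypothetical helper:

```python
import math


def count_cells(lon_min, lon_max, lat_min, lat_max, resolution_deg):
    """Approximate number of grid cells covering a lon/lat box."""
    n_lon = math.ceil((lon_max - lon_min) / resolution_deg)
    n_lat = math.ceil((lat_max - lat_min) / resolution_deg)
    return n_lon * n_lat


# Bounding box used for Vanuatu in this tutorial.
box = dict(lon_min=166, lon_max=170, lat_min=-22, lat_max=-11)

coarse = count_cells(**box, resolution_deg=1.0)   # nominal 1 degree grid
fine = count_cells(**box, resolution_deg=0.25)    # nominal 0.25 degree grid
print(coarse, fine)
```

Under these assumptions the box holds 44 cells at 1 degree and 704 cells at 0.25 degrees, a sixteen-fold increase.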

## Part 2b: Getting downscaled climate data

For a small country like Vanuatu that is ~400km wide by ~800km long,
it may be desirable to obtain higher spatial resolution projections.
The original CMIP6 datasets have a coarse spatial resolution of
1 arc degree or more (>100 km).
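
The link between grid spacing in degrees and distance on the ground can be checked with a little arithmetic. This is a rough sketch that treats the Earth as a sphere of mean radius 6371 km and evaluates the east-west size at 16.5°S, the midpoint of the latitude range used in this tutorial; `degrees_to_km` is a hypothetical helper:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical-Earth approximation


def degrees_to_km(resolution_deg, latitude_deg):
    """Return (north-south, east-west) size in km of one grid cell."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360
    north_south = resolution_deg * km_per_degree
    # Meridians converge towards the poles, so east-west size shrinks.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for res in (1.0, 0.25):
    ns, ew = degrees_to_km(res, latitude_deg=-16.5)
    print(f"{res} deg -> {ns:.0f} km north-south, {ew:.0f} km east-west")
```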

There are groups that have taken these CMIP6 datasets and processed them
using statistical downscaling + bias correction algorithms to produce
higher spatial resolution outputs of 0.25 arc degrees (~25km) or so.
Examples include:

- Climate Impact Lab's [Global Downscaled Projections for Climate Impacts Research](https://github.com/ClimateImpactLab/downscaleCMIP6)
- NASA Earth Exchange Global Daily Downscaled Projection ([NEX-GDDP-CMIP6](https://www.nccs.nasa.gov/services/data-collections/land-based-products/nex-gddp-cmip6))
- Carbonplan's CMIP6 downscaled products based on [4 different methods](https://carbonplan.org/research/cmip6-downscaling-explainer)

References:
- Gergel, D. R., Malevich, S. B., McCusker, K. E., Tenezakis, E., Delgado, M. T., Fish, M. A., and Kopp, R. E.: Global Downscaled Projections for Climate Impacts Research (GDPCIR): preserving quantile trends for modeling future climate impacts, Geosci. Model Dev., 17, 191–227, https://doi.org/10.5194/gmd-17-191-2024, 2024.
- TODO

The examples below use the
[CIL-GDPCIR-CC0](https://planetarycomputer.microsoft.com/dataset/cil-gdpcir-cc0)
dataset hosted on Planetary Computer.

In [None]:
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1/",
    modifier=planetary_computer.sign_inplace,
)

In [None]:
search = catalog.search(
    collections=["cil-gdpcir-cc0"],  # add "cil-gdpcir-cc-by" for more models
    query={"cmip6:experiment_id": {"eq": "ssp370"}},
)
ensemble = search.item_collection()
len(ensemble)

Let's look at precipitation (pr).

Code based on:
- https://planetarycomputer.microsoft.com/dataset/cil-gdpcir-cc0#Ensemble-example
- https://developmentseed.org/obstore/v0.7.1/examples/zarr/#example

In [None]:
# select this variable ID for all models in a collection
variable_id = "pr"  # or "tasmax"

datasets_by_model = []

for item in tqdm.tqdm(ensemble):
    asset = item.assets[variable_id]
    # print(asset.href, asset.extra_fields["xarray:open_kwargs"])

    credential_provider = obstore.auth.planetary_computer.PlanetaryComputerCredentialProvider(
        url=asset.href, account_name="rhgeuwest"
    )
    azure_store = obstore.store.from_url(url=asset.href, credential_provider=credential_provider)
    
    zarr_store = zarr.storage.ObjectStore(store=azure_store, read_only=True)
    datasets_by_model.append(xr.open_zarr(store=zarr_store, chunks={}))

all_datasets = xr.concat(
    datasets_by_model,
    dim=pd.Index([d.attrs["source_id"] for d in datasets_by_model], name="model"),
    combine_attrs="drop_conflicts",
)

In [None]:
all_datasets

We now have a data cube consisting of 3 models, spanning a time range from 2015 to 2100.
Let's subset the data over Vanuatu.

In [None]:
ds_vanuatu = all_datasets.sel(lon=slice(166, 170), lat=slice(-22, -11))
ds_vanuatu
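
With the models stacked along a `model` dimension, ensemble statistics are reductions along that dimension. The NumPy sketch below uses made-up values on a tiny grid to show the idea; in xarray the analogous calls would be along the lines of `ds_vanuatu.mean(dim="model")`:

```python
import numpy as np

# Made-up data standing in for three models on a tiny (time, lat, lon) grid.
rng = np.random.default_rng(seed=42)
n_models, n_time, n_lat, n_lon = 3, 10, 4, 2
pr = rng.gamma(shape=2.0, scale=3.0, size=(n_models, n_time, n_lat, n_lon))

# Reduce along the model axis (axis 0) to summarise the ensemble.
ensemble_mean = pr.mean(axis=0)
ensemble_spread = pr.max(axis=0) - pr.min(axis=0)

print(ensemble_mean.shape, ensemble_spread.shape)
```

The spread (here simply maximum minus minimum across models) gives a first impression of how much the models disagree at each time step and grid cell.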

## Part 3: Compute climate indicators using `xclim`

Compute the [maximum consecutive dry days](https://xclim.readthedocs.io/en/stable/api_indicators.html#xclim.indicators.atmos.maximum_consecutive_dry_days)
indicator, counting a day as dry when precipitation is below 1 mm/day.
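
To make the definition concrete, here is a minimal sketch of the idea for a single grid cell: find the longest run of consecutive days with precipitation below the threshold. It is not xclim's implementation and ignores unit handling and missing values. The precipitation values are made up, `longest_dry_spell` is a hypothetical helper, and the grouping by calendar year is an assumption meant to mirror a yearly output:

```python
import pandas as pd


def longest_dry_spell(pr_mm_per_day, thresh=1.0):
    """Length of the longest run of consecutive days with pr < thresh."""
    longest = current = 0
    for value in pr_mm_per_day:
        # Extend the current run on a dry day, reset it on a wet day.
        current = current + 1 if value < thresh else 0
        longest = max(longest, current)
    return longest


# Made-up daily precipitation (mm/day) for one grid cell over two years.
time = pd.date_range("2015-01-01", "2016-12-31", freq="D")
pr = pd.Series(5.0, index=time)          # wet by default
pr.loc["2015-06-01":"2015-06-20"] = 0.2  # 20 dry days
pr.loc["2015-09-01":"2015-09-05"] = 0.0  # 5 dry days
pr.loc["2016-03-10":"2016-03-16"] = 0.9  # 7 dry days

# One value per calendar year.
cdd_per_year = pr.groupby(pr.index.year).apply(longest_dry_spell)
print(cdd_per_year.to_dict())
```

For this synthetic series the longest dry spell is 20 days in 2015 and 7 days in 2016.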

In [None]:
da_cdd = xclim.indicators.atmos.maximum_consecutive_dry_days(
    ds=ds_vanuatu,
    thresh="1 mm/day",
)
da_cdd

In [None]:
with dask.diagnostics.ProgressBar():
    vu_cdd = da_cdd.isel(model=0, time=-1).compute()

In [None]:
projection = ccrs.epsg(code=3832)  # PDC Mercator
fig, ax = plt.subplots(nrows=1, ncols=1, subplot_kw=dict(projection=projection))
vu_cdd.sel(lon=slice(166, 170), lat=slice(-22, -11)).plot.imshow(
    ax=ax,
    transform=ccrs.PlateCarree()
)
ax.set_extent(extents=[166, 170, -22, -11], crs=ccrs.PlateCarree())
ax.coastlines()

In [None]:
# TODO plot time-series
# TODO use ensemble

That's all! Hopefully this will get you started on how to handle CMIP6 climate data!