In [None]:
# Silence flox's informational log messages

import logging

# Get the logger for the 'flox' package
logger = logging.getLogger("flox")
# Set the logging level to WARNING
logger.setLevel(logging.WARNING)

# Ensembles

## Ensemble reduction

This tutorial explores ensemble reduction (also known as ensemble selection) with `xscen`, using pre-computed annual mean temperatures from `xclim.testing`.

In [None]:
import pooch
import xarray as xr
from xclim.testing.utils import nimbus

import xscen as xs

downloader = pooch.HTTPDownloader(headers={"User-Agent": f"xscen-{xs.__version__}"})

datasets = {
    "ACCESS": "EnsembleStats/BCCAQv2+ANUSPLIN300_ACCESS1-0_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc",
    "BNU-ESM": "EnsembleStats/BCCAQv2+ANUSPLIN300_BNU-ESM_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc",
    "CCSM4-r1": "EnsembleStats/BCCAQv2+ANUSPLIN300_CCSM4_historical+rcp45_r1i1p1_1950-2100_tg_mean_YS.nc",
    "CCSM4-r2": "EnsembleStats/BCCAQv2+ANUSPLIN300_CCSM4_historical+rcp45_r2i1p1_1950-2100_tg_mean_YS.nc",
    "CNRM-CM5": "EnsembleStats/BCCAQv2+ANUSPLIN300_CNRM-CM5_historical+rcp45_r1i1p1_1970-2050_tg_mean_YS.nc",
}

for d in datasets:
    file = nimbus().fetch(datasets[d], downloader=downloader)
    ds = xr.open_dataset(file).isel(lon=slice(0, 4), lat=slice(0, 4))
    ds = xs.climatological_op(
        ds,
        op="mean",
        window=30,
        periods=[[1981, 2010], [2021, 2050]],
        horizons_as_dim=True,
    ).drop_vars("time")
    datasets[d] = xs.compute_deltas(ds, reference_horizon="1981-2010")

In [None]:
datasets

### Preparing the data

Ensemble reduction is built upon climate indicators that represent the ensemble's variability for a given application. In this case, we'll use the mean temperature delta between 2021-2050 and 1981-2010, but monthly or seasonal indicators may also be needed. The `horizons_as_dim` argument in `climatological_op` can help combine indicators of multiple frequencies into a single dataset. Alternatively, `xscen.utils.unstack_dates` can achieve the same result if the climatological operations have already been computed.
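To make the indicator concrete, here is a minimal NumPy sketch of what a 30-year climatological mean and an additive delta amount to for a single grid cell. The synthetic temperature series and the helper `horizon_mean` are hypothetical; on real data, `xs.climatological_op` and `xs.compute_deltas` do this work along with the horizon bookkeeping and attributes.

```python
import numpy as np


def horizon_mean(years: np.ndarray, values: np.ndarray, start: int, end: int) -> float:
    """Mean of annual values over the inclusive window [start, end] (hypothetical helper)."""
    mask = (years >= start) & (years <= end)
    return float(values[mask].mean())


# Synthetic annual mean temperatures (°C) with a linear trend of 0.02 °C/year.
years = np.arange(1950, 2101)
tg_mean = 5.0 + 0.02 * (years - 1950)

ref = horizon_mean(years, tg_mean, 1981, 2010)  # reference horizon
fut = horizon_mean(years, tg_mean, 2021, 2050)  # future horizon

# For temperature, the delta is additive: future climatology minus reference climatology.
delta = fut - ref
print(round(delta, 2))  # 0.8
```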

The functions implemented in `xclim.ensembles._reduce` require a very specific 2-D DataArray with dimensions "realization" and "criteria". One option is to first create an ensemble using `xclim.ensembles.create_ensemble`, then pass the result to `xclim.ensembles.make_criteria`. Alternatively, the datasets can be passed directly to `xscen.ensembles.reduce_ensemble`, which performs these preliminary steps automatically.

In this example, the number of criteria corresponds to `indicators x horizons x longitude x latitude`, but criteria that are NaN across all realizations are removed.
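The shape requirement can be illustrated with a small NumPy sketch: every axis other than `realization` is flattened into a single `criteria` axis, and columns that are NaN for every realization are dropped. This mirrors the idea behind `xclim.ensembles.make_criteria` but is not its implementation; the array sizes and the masked grid cell are made up.

```python
import numpy as np

# Hypothetical deltas: 5 realizations x 1 indicator x 1 horizon x 4 lat x 4 lon.
rng = np.random.default_rng(0)
deltas = rng.normal(loc=1.5, scale=0.3, size=(5, 1, 1, 4, 4))

# Pretend one grid cell is masked (NaN) in every realization.
deltas[:, :, :, 0, 0] = np.nan

# Flatten everything except the first (realization) axis into "criteria".
criteria = deltas.reshape(deltas.shape[0], -1)

# Drop criteria that are NaN across all realizations.
keep = ~np.all(np.isnan(criteria), axis=0)
criteria = criteria[:, keep]

print(criteria.shape)  # (5, 15)
```

With xarray objects, the equivalent reshaping would be `.stack(criteria=[...])` followed by `.dropna("criteria", how="all")`.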

Note that `xs.spatial_mean` could have been used before calling `xs.reduce_ensemble` to remove the spatial dimensions, which would reduce the criteria to `indicators x horizons`.
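As a rough illustration of why averaging first shrinks the problem, the sketch below applies a cos(latitude)-weighted mean, assuming a regular lat/lon grid, so that each realization keeps a single criterion per indicator and horizon. `xs.spatial_mean` offers latitude weighting among its methods; the arrays and coordinates here are hypothetical.

```python
import numpy as np

# Hypothetical deltas: 5 realizations x 1 indicator x 1 horizon x 4 lat x 4 lon.
lat = np.array([45.0, 45.5, 46.0, 46.5])
rng = np.random.default_rng(1)
deltas = rng.normal(loc=1.5, scale=0.3, size=(5, 1, 1, lat.size, 4))

# On a regular lat/lon grid, cell area is roughly proportional to cos(lat).
w_lat = np.cos(np.deg2rad(lat))
weights = w_lat[:, np.newaxis] * np.ones(4)  # shape (lat, lon)
weights /= weights.sum()

# Weighted mean over the trailing (lat, lon) axes.
regional = (deltas * weights).sum(axis=(-2, -1))

# Only indicators x horizons remain as criteria.
criteria = regional.reshape(regional.shape[0], -1)
print(criteria.shape)  # (5, 1)
```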

### Selecting a reduced ensemble

<div class="alert alert-info"> <b>NOTE</b>
    
Ensemble reduction in `xscen` is built upon `xclim.ensembles`. For more information on basic usage and available methods, [please consult their documentation](https://xclim.readthedocs.io/en/stable/notebooks/ensembles-advanced.html).
</div>

Ensemble reduction through `xscen.reduce_ensemble` consists of a single call to `xclim`. The arguments are:
- `data`: the aforementioned 2-D DataArray, or the list/dict of datasets required to build it.
- `method`: either `kkz` or `kmeans`. See the link above for details on each technique.
- `horizons`: the horizon(s) from which to build the criteria, if `data` needs to be constructed.
- `create_kwargs`: the arguments passed to `xclim.ensembles.create_ensemble`, if `data` needs to be constructed.
- `**kwargs`: additional keyword arguments passed to the chosen clustering method (e.g. `max_clusters` in the example below).
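For intuition about the `kkz` option, here is a compact NumPy sketch of the KKZ (Katsavounidis-Kuo-Zhang) selection idea as we understand it: standardize each criterion, start from the realization closest to the ensemble centroid, then repeatedly add the realization whose distance to its nearest already-selected member is largest. It is a simplified illustration, not xclim's `kkz_reduce_ensemble` (which has its own options); the helper `kkz_select` and the toy data are hypothetical.

```python
import numpy as np


def kkz_select(criteria: np.ndarray, num_select: int) -> list[int]:
    """Indices of `num_select` realizations chosen with a KKZ-style procedure.

    `criteria` has shape (realization, criteria) and contains no NaN.
    """
    # Standardize each criterion so that no single one dominates the distances.
    std = criteria.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant criteria
    z = (criteria - criteria.mean(axis=0)) / std

    # Step 1: the realization closest to the centroid (the origin after standardizing).
    selected = [int(np.argmin(np.linalg.norm(z, axis=1)))]

    # Pairwise Euclidean distances between realizations.
    dist = np.linalg.norm(z[:, np.newaxis, :] - z[np.newaxis, :, :], axis=-1)

    while len(selected) < num_select:
        # Distance from every realization to its nearest already-selected member.
        nearest = dist[:, selected].min(axis=1)
        nearest[selected] = -np.inf  # never pick the same realization twice
        selected.append(int(np.argmax(nearest)))
    return selected


# Toy example: 5 realizations described by 2 criteria.
crit = np.array([
    [1.0, 1.0],
    [1.1, 0.9],
    [3.0, 3.2],
    [0.0, 0.1],
    [1.9, 2.1],
])
print(kkz_select(crit, 3))  # [0, 2, 4]
```

The max-min rule favors realizations that cover the spread of the ensemble, which is why the near-duplicate realization 1 is picked last.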

In [None]:
selected, clusters, fig_data = xs.reduce_ensemble(
    data=datasets, method="kmeans", horizons=["2021-2050"], max_clusters=3
)

The function always returns three outputs (`selected`, `clusters`, `fig_data`):
- `selected` is a DataArray of dimension 'realization' listing the selected simulations.
- `clusters` (kmeans only) is a Python dictionary assigning every realization to its cluster.
- `fig_data` (kmeans only) can be passed to `xclim.ensembles.plot_rsqprofile(fig_data)` to plot the R² profile.
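The two kmeans-only outputs can be made more tangible with a small NumPy sketch: cluster the criteria with a plain Lloyd's k-means, keep from each cluster the realization closest to its centroid, and compute the share of variance explained (R²), the quantity an R² profile tracks as the number of clusters grows. This is an illustrative simplification with a deterministic initialization and invented data; xclim's `kmeans_reduce_ensemble` relies on scikit-learn and has its own rules for choosing the number of clusters. All helper names below are hypothetical.

```python
import numpy as np


def _nearest(z: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the closest centroid for every row of `z`."""
    d = np.linalg.norm(z[:, np.newaxis, :] - centroids[np.newaxis, :, :], axis=-1)
    return d.argmin(axis=1)


def kmeans(z: np.ndarray, k: int, n_iter: int = 100):
    """Plain Lloyd's k-means, initialized with the first `k` rows (deterministic)."""
    centroids = z[:k].astype(float)
    for _ in range(n_iter):
        labels = _nearest(z, centroids)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new = np.array([
            z[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _nearest(z, centroids), centroids


def representatives(z, labels, centroids) -> list[int]:
    """For each non-empty cluster, the member closest to the cluster centroid."""
    reps = []
    for c in range(len(centroids)):
        members = np.flatnonzero(labels == c)
        if members.size:
            d = np.linalg.norm(z[members] - centroids[c], axis=1)
            reps.append(int(members[d.argmin()]))
    return reps


def r_squared(z, labels, centroids) -> float:
    """Share of the total sum of squares explained by the clustering."""
    total = ((z - z.mean(axis=0)) ** 2).sum()
    within = ((z - centroids[labels]) ** 2).sum()
    return float(1.0 - within / total)


# Toy criteria (assumed already standardized): 9 realizations, 2 criteria, 3 visible groups.
z = np.array([
    [0.0, 0.0], [5.0, 5.0], [10.0, 0.0],
    [0.5, 0.2], [0.3, 0.1], [5.2, 4.6],
    [4.9, 5.1], [9.7, 0.4], [10.6, 0.2],
])

labels, centroids = kmeans(z, k=3)
clusters = {c: np.flatnonzero(labels == c).tolist() for c in range(3)}
print(clusters)  # {0: [0, 3, 4], 1: [1, 5, 6], 2: [2, 7, 8]}
print(representatives(z, labels, centroids))  # [4, 1, 2]

# An R² profile: explained variance as a function of the number of clusters.
profile = [r_squared(z, *kmeans(z, k)) for k in range(1, 5)]
```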

In [None]:
selected

In [None]:
# To see the clusters in more detail
clusters

In [None]:
from xclim.ensembles import plot_rsqprofile

plot_rsqprofile(fig_data)

## Ensemble partition
This tutorial will show how to use xscen to create the input for [xclim partition functions](https://xclim.readthedocs.io/en/stable/api.html#uncertainty-partitioning).

In [None]:
# Get catalog
from pathlib import Path

import xclim as xc

output_folder = Path().absolute() / "_data"
cat = xs.DataCatalog(str(output_folder / "tutorial-catalog.json"))

# create a dictionary of datasets wanted for the partition
input_dict = cat.search(variable="tas", member="r1i1p1f1").to_dataset_dict(
    xarray_open_kwargs={"engine": "h5netcdf"}
)
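datasets = {}
for k, v in input_dict.items():
    # Annual mean temperature (tg_mean resamples to yearly values by default)
    ds = xc.atmos.tg_mean(v.tas).to_dataset()
    # Keep the original attributes, which describe each simulation (source, experiment, ...)
    ds.attrs = v.attrs
    datasets[k] = ds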

From a dictionary of datasets, `xs.ensembles.build_partition_data` creates a single dataset with new dimensions for the entries of `partition_dim` (`["source", "experiment", "bias_adjust_project"]`, if they exist). In this toy example, only the experiment varies.
- By default, it translates the xscen vocabulary (e.g. `experiment`) to the xclim partition vocabulary (e.g. `scenario`). Pass `rename_dict` to use other dimension names.
- If the inputs are not on the same grid, they can be regridded through `regrid_kw` or subset to a point through `subset_kw`. The function assumes that different `bias_adjust_project`s are on different grids (with all `source`s sharing the same grid within a project). If there is at most one `bias_adjust_project`, it instead assumes that each `source` has its own grid.
- You can also compute indicators on the data if the input is daily. This is especially useful when the daily inputs use different calendars.
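The reshaping step can be pictured with a small, library-free sketch: time series keyed by their (source, experiment) pair are placed on a grid whose axes carry the xclim names, and missing combinations are left as NaN. The helper `to_partition_array`, the keys and the values are hypothetical, and only `experiment` → `scenario` is stated above; the `source` → `model` mapping is an assumption. This is not `build_partition_data`'s internal code.

```python
import numpy as np

# Assumed mapping from xscen attribute names to xclim dimension names.
RENAME = {"source": "model", "experiment": "scenario"}


def to_partition_array(series: dict[tuple[str, str], np.ndarray]):
    """Stack 1-D series keyed by (source, experiment) into a (model, scenario, time) array."""
    sources = sorted({src for src, _ in series})
    experiments = sorted({exp for _, exp in series})
    n_time = len(next(iter(series.values())))

    # Missing (source, experiment) combinations stay NaN.
    out = np.full((len(sources), len(experiments), n_time), np.nan)
    for (src, exp), values in series.items():
        out[sources.index(src), experiments.index(exp), :] = values

    coords = {
        RENAME["source"]: sources,
        RENAME["experiment"]: experiments,
        "time": list(range(n_time)),
    }
    return out, coords


# Made-up annual values for two hypothetical sources; src-B has no ssp585 run.
series = {
    ("src-A", "ssp245"): np.array([10.1, 10.3]),
    ("src-A", "ssp585"): np.array([10.2, 10.6]),
    ("src-B", "ssp245"): np.array([9.8, 9.9]),
}
arr, coords = to_partition_array(series)
print(arr.shape)  # (2, 2, 2)
print(list(coords))  # ['model', 'scenario', 'time']
```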

In [None]:
# build a single dataset
import xclim as xc

ds = xs.ensembles.build_partition_data(
    datasets,
    subset_kw=dict(name="mtl", method="gridpoint", lat=[45.5], lon=[-73.6]),
)
ds

Pass the input to an xclim partition function.

In [None]:
# This is a hidden cell.
# extend with fake data to have at least 3 years
import xarray as xr

ds2 = ds.copy()
ds["time"] = xr.cftime_range(start="2001-01-01", periods=len(ds["time"]), freq="YS")
ds2["time"] = xr.cftime_range(start="2003-01-01", periods=len(ds["time"]), freq="YS")
ds = xr.concat([ds, ds2], dim="time")

In [None]:
# compute uncertainty partitioning
mean, uncertainties = xc.ensembles.hawkins_sutton(ds.tg_mean)
uncertainties
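To see what the returned components represent, here is a simplified NumPy sketch of the Hawkins and Sutton (2009) decomposition: each (scenario, model) series is smoothed with a 4th-order polynomial, internal variability is the residual variance (held constant in time), model uncertainty is the across-model variance of the smoothed series averaged over scenarios, and scenario uncertainty is the across-scenario variance of the multi-model mean. It omits features of `xc.ensembles.hawkins_sutton` such as model weights and anomalies relative to a baseline; the function name and the synthetic data are hypothetical.

```python
import numpy as np


def hawkins_sutton_sketch(data: np.ndarray, deg: int = 4) -> dict[str, np.ndarray]:
    """Partition the variance of `data`, shaped (scenario, model, time); unweighted illustration."""
    n_scen, n_model, n_time = data.shape
    t = np.arange(n_time, dtype=float)

    # Smooth each series with a polynomial fit over time.
    smooth = np.empty_like(data)
    for i in range(n_scen):
        for j in range(n_model):
            coeffs = np.polyfit(t, data[i, j], deg)
            smooth[i, j] = np.polyval(coeffs, t)

    # Internal variability: residual variance, averaged over all series, constant in time.
    variability = np.full(n_time, (data - smooth).var(axis=-1).mean())
    # Model uncertainty: spread across models, averaged over scenarios.
    model = smooth.var(axis=1).mean(axis=0)
    # Scenario uncertainty: spread of the multi-model mean across scenarios.
    scenario = smooth.mean(axis=1).var(axis=0)

    total = variability + model + scenario
    return {"variability": variability, "model": model, "scenario": scenario, "total": total}


# Synthetic data: 2 scenarios x 3 models x 30 years.
rng = np.random.default_rng(42)
years = np.arange(30)
trend = np.array([0.02, 0.05])[:, None, None] * years  # scenario-dependent warming
offset = np.array([-0.3, 0.0, 0.3])[None, :, None]  # model-dependent offset
data = trend + offset + rng.normal(scale=0.1, size=(2, 3, 30))

parts = hawkins_sutton_sketch(data)
fractions = {k: parts[k] / parts["total"] for k in ("variability", "model", "scenario")}
```

As the scenarios diverge over time, the scenario fraction grows, which is the usual pattern in such plots.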

<div class="alert alert-info"> <b>NOTE</b>
    
The [figanos library](https://figanos.readthedocs.io/en/latest/) provides a function, `fg.partition`, to plot the uncertainties.
    
</div>