# Ensemble reduction

This tutorial explores ensemble reduction (also known as ensemble selection) with `xscen`. It reuses the catalog created in the Getting Started notebook, so make sure you have run GettingStarted.ipynb before this one.

In [None]:
from pathlib import Path

import xscen as xs

output_folder = Path().absolute() / "_data"

# Open the Getting Started catalog
gettingStarted_cat = xs.DataCatalog(str(output_folder / "example-gettingstarted.json"))

## Preparing the data

Ensemble reduction relies on climate indicators chosen to represent the ensemble's variability for a given application. In the Getting Started notebook, two indicators were computed:

In [None]:
gettingStarted_cat.search(processing_level="deltas").unique("variable")

However, the functions implemented in `xclim.ensembles._reduce` require a very specific 2-D DataArray with the dimensions "realization" and "criteria". This means that all variables need to be combined and renamed, and that all other dimensions need to be stacked together.
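To make the target layout concrete, here is a minimal NumPy sketch of the stacking step on synthetic values. The indicator names and array sizes are hypothetical, and the sketch only illustrates the change of shape; it is not how `xscen` or `xclim` implement it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 realizations, 2 horizons, 4 latitudes, 5 longitudes.
n_real, n_hor, n_lat, n_lon = 3, 2, 4, 5

# Two synthetic indicators, each of shape (realization, horizon, lat, lon).
indicator_a = rng.normal(size=(n_real, n_hor, n_lat, n_lon))
indicator_b = rng.normal(size=(n_real, n_hor, n_lat, n_lon))

# Combine the indicators along a new axis placed right after "realization"...
combined = np.stack([indicator_a, indicator_b], axis=1)

# ...then flatten everything except "realization" into one "criteria" axis.
stacked = combined.reshape(n_real, -1)

print(stacked.shape)  # (3, 80)
```

Each row of `stacked` is one realization, and each column is one combination of indicator, horizon and grid cell.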

`xs.build_reduction_data` can be used to prepare the data for ensemble reduction. Its arguments are:

- `datasets` (dict, list) is the dictionary or list of datasets to build the data from.
- `xrfreqs` are the unique frequencies of the indicators.
- `horizons` specifies the horizon(s) to build the data from.

Because a simulation can be spread over multiple datasets (one per frequency), the function attempts to determine each dataset's ID and frequency from its metadata.

In [None]:
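# Subset the catalog to the deltas computed on the "regular0-25" domain
ds_dict = gettingStarted_cat.search(processing_level="deltas", domain="regular0-25")

# Combine the indicators into a 2-D (realization, criteria) DataArray
data = xs.build_reduction_data(
    datasets=ds_dict.to_dataset_dict(),
    xrfreqs=ds_dict.unique("xrfreq"),
    horizons=["2005-2009", "2010-2014"],
)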

data

The number of criteria corresponds to: `indicators x horizons x longitude x latitude`, but criteria that are purely NaN across all realizations are removed.
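The counting rule above can be checked on a toy array. This is a minimal NumPy sketch with made-up sizes and a hypothetical missing grid cell; it mimics the bookkeeping only and does not call `xscen`.

```python
import numpy as np

# Hypothetical sizes: 3 realizations, 2 indicators, 2 horizons, 4 x 5 grid.
n_real, n_ind, n_hor, n_lat, n_lon = 3, 2, 2, 4, 5
values = np.ones((n_real, n_ind, n_hor, n_lat, n_lon))

# Hypothetical mask: one grid cell has no data for any realization.
values[:, :, :, 0, 0] = np.nan

# Stack everything except "realization" into a "criteria" axis.
stacked = values.reshape(n_real, -1)
n_before = stacked.shape[1]  # 2 * 2 * 4 * 5 = 80

# Keep only the criteria that have at least one non-NaN realization.
keep = ~np.isnan(stacked).all(axis=0)
reduced = stacked[:, keep]
n_after = reduced.shape[1]  # 80 - (2 indicators * 2 horizons) = 76

print(n_before, n_after)  # 80 76
```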

Note that `xs.spatial_mean` could have been used before calling `xs.build_reduction_data` to remove the spatial dimensions.
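The effect of spatial averaging on the number of criteria can be sketched with NumPy. The example below assumes a cosine-of-latitude weighting on a regular grid, chosen purely for illustration; `xs.spatial_mean` has its own methods and options, which are not reproduced here. All sizes and coordinates are hypothetical.

```python
import numpy as np

# Hypothetical sizes and latitudes for a small regular grid.
n_real, n_ind, n_hor, n_lon = 3, 2, 2, 5
lat = np.array([45.0, 46.0, 47.0, 48.0])

rng = np.random.default_rng(0)
values = rng.normal(size=(n_real, n_ind, n_hor, lat.size, n_lon))

# Cosine-of-latitude weights, repeated along the longitude axis.
weights = np.cos(np.deg2rad(lat))[:, np.newaxis] * np.ones(n_lon)

# Weighted mean over the two spatial axes (lat, lon).
averaged = (values * weights).sum(axis=(-2, -1)) / weights.sum()

# Only indicators x horizons remain as criteria.
criteria = averaged.reshape(n_real, -1)
print(criteria.shape)  # (3, 4)
```

With the spatial dimensions averaged out, the number of criteria drops from `indicators x horizons x longitude x latitude` to `indicators x horizons`.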

## Selecting a reduced ensemble

<div class="alert alert-info"> <b>NOTE</b>
    
Ensemble reduction in `xscen` is built upon `xclim.ensembles`. For more information on basic usage and available methods, [please consult their documentation](https://xclim.readthedocs.io/en/stable/notebooks/ensembles-advanced.html).
</div>

Ensemble reduction through `xscen.reduce_ensemble` consists of a simple call to `xclim`. The arguments are:
- `data` is the 2-D DataArray created by `xs.build_reduction_data`.
- `method` is either `kkz` or `kmeans`. See the link above for further details on each technique.
- `kwargs` is a dictionary of arguments passed to the chosen method.
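To give an intuition for the `kkz` option, here is a minimal NumPy sketch of a basic KKZ selection on a 2-D `(realization, criteria)` array. It is an illustration under simplifying assumptions (standardized criteria, Euclidean distance), not `xclim`'s implementation; the function name `kkz_select` and the toy values are hypothetical.

```python
import numpy as np


def kkz_select(data: np.ndarray, num_select: int) -> list:
    """Return the row indices picked by a basic KKZ procedure.

    data: 2-D array of shape (realization, criteria).
    """
    # Standardize each criterion so that none dominates the distances.
    std = data.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid dividing by zero for constant criteria
    z = (data - data.mean(axis=0)) / std

    # Pairwise Euclidean distances between realizations.
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)

    # First pick: the realization closest to the ensemble centroid,
    # which sits at the origin after standardization.
    selected = [int(np.argmin(np.linalg.norm(z, axis=1)))]

    # Next picks: the realization whose nearest selected member is farthest.
    while len(selected) < num_select:
        min_dist = dist[:, selected].min(axis=1)
        min_dist[selected] = -np.inf  # never re-select a member
        selected.append(int(np.argmax(min_dist)))
    return selected


# Toy ensemble: 5 realizations, 2 criteria (values are made up).
toy = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [4.0, 4.0], [-5.0, -5.0]])
print(kkz_select(toy, 3))  # [0, 4, 3]
```

The first pick is the most central member, and each later pick is the member farthest from those already chosen, so the selection spreads across the ensemble's range.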

In [None]:
# Reduce the ensemble with k-means clustering, requesting 2 clusters
selected, clusters, fig_data = xs.reduce_ensemble(
    data=data, method="kmeans", kwargs={"method": {"n_clusters": 2}}
)

The function always returns three outputs (`selected`, `clusters`, `fig_data`):
- `selected` is a DataArray of dimension 'realization' listing the selected simulations.
- `clusters` (kmeans only) is a Python dictionary that groups the realizations by cluster.
- `fig_data` (kmeans only) can be used to call `xclim.ensembles.plot_rsqprofile(fig_data)`.
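As an illustration of what grouping realizations by cluster means, the sketch below builds such a dictionary from one cluster label per realization. The realization names and labels are invented, and the exact container types returned by `xs.reduce_ensemble` may differ.

```python
import numpy as np

# Hypothetical realization names and cluster labels (one per realization).
realizations = np.array(["sim_A", "sim_B", "sim_C", "sim_D"])
labels = np.array([0, 1, 0, 1])

# Group the realization names by cluster label.
grouped = {
    int(label): realizations[labels == label].tolist()
    for label in np.unique(labels)
}

print(grouped)  # {0: ['sim_A', 'sim_C'], 1: ['sim_B', 'sim_D']}
```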

In [None]:
selected

In [None]:
clusters

In [None]:
from xclim.ensembles import plot_rsqprofile

plot_rsqprofile(fig_data)