# Customizing and controlling xclim

xclim's behaviour can be controlled globally or contextually through `xclim.set_options`, which acts the same way as `xarray.set_options`.
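The global-or-contextual behaviour follows a pattern that can be sketched in a few lines of plain Python. The `OPTIONS` dictionary and the `set_options` class below are hypothetical stand-ins written for illustration only, not xclim's or xarray's actual code. The idea is that options are applied as soon as the object is created, and the previous values are restored only when the object is used in a `with` block.

```python
# Illustrative sketch only: OPTIONS and this set_options class are hypothetical
# stand-ins, not the real xclim implementation.
OPTIONS = {"check_missing": "any", "cf_compliance": "warn"}


class set_options:
    """Apply options on creation; restore the old values when leaving a ``with`` block."""

    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(OPTIONS)
        if unknown:
            raise ValueError(f"Unknown option(s): {sorted(unknown)}")
        # Remember the previous values so they can be restored later.
        self._old = {key: OPTIONS[key] for key in kwargs}
        OPTIONS.update(kwargs)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        OPTIONS.update(self._old)


# Global use: the change persists.
set_options(cf_compliance="log")

# Contextual use: the change is undone at the end of the block.
with set_options(check_missing="wmo"):
    inside = OPTIONS["check_missing"]
outside = OPTIONS["check_missing"]
print(inside, outside, OPTIONS["cf_compliance"])
```

Calling the object on its own changes the option for the rest of the session, while the `with` form limits the change to a single block.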

In [None]:
import xarray as xr
import xclim

Let's create fake data with some missing values by masking the 10th, 20th and 30th of every month. This masks between 9.7% and 10% of the data in every month except February, where it is 7.1%.

In [None]:
tasmax = xr.tutorial.open_dataset('air_temperature').air.resample(time='D').max(keep_attrs=True)
tasmax = tasmax.where(tasmax.time.dt.day % 10 != 0)
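The masked percentages can be checked with the standard library alone. The short sketch below counts, for each month of 2013, the days whose day-of-month is a multiple of 10, mirroring the `day % 10 != 0` condition used in the cell above. The helper name `masked_fraction` is only illustrative.

```python
import calendar


def masked_fraction(year, month):
    """Fraction of days in the month that fall on the 10th, 20th or 30th."""
    n_days = calendar.monthrange(year, month)[1]
    n_masked = sum(1 for day in range(1, n_days + 1) if day % 10 == 0)
    return n_masked / n_days


# Percentage of masked days for each month of 2013, keyed by month number.
percentages = {month: round(100 * masked_fraction(2013, month), 1) for month in range(1, 13)}
print(percentages)
```

February only has two masked days (the 10th and the 20th), which is why its percentage is lower than that of the other months.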

## Checks
Above, we created fake temperature data from an xarray tutorial dataset that doesn't have all the standard CF attributes. By default, when triggering a computation with an Indicator from xclim, warnings will be raised:

In [None]:
tx_mean = xclim.atmos.tx_mean(tasmax=tasmax, freq='MS')  # compute the monthly mean of daily maximum temperature

Setting `cf_compliance` to `'log'` mutes those warnings and sends them to the log instead.

In [None]:
xclim.set_options(cf_compliance='log')

tx_mean = xclim.atmos.tx_mean(tasmax=tasmax, freq='MS')  # compute the monthly mean of daily maximum temperature

## Missing values

The method used to flag missing values can be changed globally. For example, change the default missing method to "pct" and set its tolerance to 8%:

In [None]:
xclim.set_options(check_missing='pct', missing_options={'pct': {'tolerance': 0.08}})

tx_mean = xclim.atmos.tx_mean(tasmax=tasmax, freq='MS')  # compute the monthly mean of daily maximum temperature
tx_mean.sel(time='2013', lat=75, lon=200)

Only February has non-masked data. To use the "wmo" method (with its default options) for a single computation only, wrap the call in a context manager:

In [None]:
with xclim.set_options(check_missing="wmo"):
    tx_mean = xclim.atmos.tx_mean(tasmax=tasmax, freq='MS')  # compute the monthly mean of daily maximum temperature
tx_mean.sel(time='2013', lat=75, lon=200)

This method checks that there are fewer than `nm=5` invalid values in a month and that there are no consecutive runs of `nc>=4` invalid values. Thus, every month is now valid.
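To see why the two methods disagree on this dataset, both rules can be restated in plain Python. This is a rough sketch that uses the thresholds quoted in the text (`tolerance=0.08`, `nm=5`, `nc=4`). It is not xclim's implementation, and the helper names `month_mask`, `longest_true_run`, `pct_is_missing` and `wmo_is_missing` are hypothetical.

```python
import calendar


def month_mask(year, month):
    """True where the day is masked in the example data (10th, 20th, 30th)."""
    n_days = calendar.monthrange(year, month)[1]
    return [day % 10 == 0 for day in range(1, n_days + 1)]


def longest_true_run(mask):
    """Length of the longest run of consecutive True values."""
    longest = current = 0
    for value in mask:
        current = current + 1 if value else 0
        longest = max(longest, current)
    return longest


def pct_is_missing(mask, tolerance=0.08):
    # Assumed rule: the period is missing if the invalid fraction exceeds the tolerance.
    return sum(mask) / len(mask) > tolerance


def wmo_is_missing(mask, nm=5, nc=4):
    # Rule as described in the text: the period is valid only if it has fewer
    # than nm invalid values and no run of nc or more consecutive invalid values.
    return sum(mask) >= nm or longest_true_run(mask) >= nc


pct_valid = [m for m in range(1, 13) if not pct_is_missing(month_mask(2013, m))]
wmo_valid = [m for m in range(1, 13) if not wmo_is_missing(month_mask(2013, m))]
print(pct_valid)  # months kept by the "pct" rule
print(wmo_valid)  # months kept by the "wmo" rule
```

Every month has at most three isolated masked days, so the count and run-length criteria are never met, whereas the percentage criterion rejects every month but February.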

Finally, advanced users can register their own method. xclim's missing methods are based on class instances, so a custom method is implemented as a subclass of `xclim.core.missing.MissingBase` that overrides at least the `is_missing` method. This method should take a `null` argument, a `count` argument and any number of other keyword arguments.

- `null` is a `DataArrayResample` instance of the resampled mask of invalid values in the input DataArray.
- `count` is the number of days in each resampled period.

The `is_missing` method should return a boolean mask, at the same frequency as the indicator output (same as `count`), where True values are for elements that are considered missing and masked on the output.

When registering the class with the `xclim.core.missing.register_missing_method` decorator, the keyword arguments will be registered as options for the missing method. One can also implement a `validate` static method that receives only those options and returns whether they should be considered valid or not.

In [None]:
from xclim.core.missing import register_missing_method
from xclim.core.missing import MissingBase
from xclim.indices.run_length import longest_run

@register_missing_method("consecutive")
class MissingConsecutive(MissingBase):
    """Any period with at least max_n consecutive missing values is considered invalid."""
    def is_missing(self, null, count, max_n=5):
        return null.map(longest_run, dim="time") >= max_n

    @staticmethod
    def validate(max_n):
        return max_n > 0


The new method is now accessible and usable with:

In [None]:
with xclim.set_options(check_missing="consecutive", missing_options={'consecutive': {'max_n': 2}}):
    tx_mean = xclim.atmos.tx_mean(tasmax=tasmax, freq='MS')  # compute the monthly mean of daily maximum temperature
tx_mean.sel(time='2013', lat=75, lon=200)

## Defining new indicators

xclim's Indicators are instances of subclasses of `xclim.core.indicator.Indicator`. They define the following key ingredients:

- the `identifier`, a string that uniquely identifies the indicator,
- the `realm`, one of "atmos", "land", "seaIce" or "ocean", classifying the domain of use of the indicator,
- the `compute` function that returns one or more DataArrays,
- the `cfcheck` and `datacheck` methods that make sure the inputs are appropriate and valid,
- the `missing` function that masks elements based on null values in the input,
- all metadata attributes that will be attributed to the output and that document the indicator.

See the [class documentation](../api.rst#indicator-tools) for more info on the available options for creating indicators. The [indicators](https://github.com/Ouranosinc/xclim/tree/master/xclim/indicators) module contains over 50 examples of indicators to draw inspiration from.

New indicators can be created using standard Python subclasses:

In [None]:
class NewIndicator(xclim.core.indicator.Indicator):
    identifier = "new_indicator"
    missing = "any"
    realm = "atmos"

    @staticmethod
    def compute(tas):
        return tas.mean(dim="time")

    @staticmethod
    def cfcheck(tas):
        xclim.core.cfchecks.check_valid(tas, "standard_name", "air_temperature")

    @staticmethod
    def datacheck(tas):
        xclim.core.datachecks.check_daily(tas)

# An instance must be created to register and make the indicator usable
newind = NewIndicator()

Another mechanism to create subclasses is to call Indicator with all the attributes passed as arguments:

In [None]:
from xclim.core.indicator import Indicator

newind = Indicator(identifier="new_indicator", realm="atmos", compute=xclim.indices.tg_mean, var_name='tmean', units="K")

Behind the scenes, this will create a `NEW_INDICATOR` subclass and return an instance. As in the case above, creating an indicator with a name already existing in the registry raises a warning.

One pattern to create multiple indicators is to write a standard subclass that declares all the attributes that are common to indicators, then call this subclass with the custom attributes. See for example in [xclim.indicators.atmos](https://github.com/Ouranosinc/xclim/blob/master/xclim/indicators/atmos/_temperature.py) how indicators based on daily mean temperatures are created from the `Tas` subclass of the `Daily` subclass.

### Subclass registries
All subclasses that are created from `Indicator` are stored in a *registry*. So for example:

In [None]:
from xclim.core.indicator import Daily, registry
my_indicator = Daily(identifier="my_indicator", realm="atmos", compute=lambda x: x.mean())
assert "MY_INDICATOR" in registry

This registry is meant to facilitate user customization of existing indicators. Keys in the registry are the uppercase version of the indicator's identifier. So for example, if you'd like a `tg_mean` indicator returning values in Celsius instead of Kelvin, you could simply do:

In [None]:
tg_mean_c = registry["TG_MEAN"](identifier="tg_mean_c", units="C")

Another use case for the registry is to iterate over all available indicators. To retrieve an instance from a subclass in the registry, one can use:

In [None]:
tg_mean = registry["TG_MEAN"].get_instance()
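The registry mechanism can be pictured with a small, self-contained sketch. Everything below is hypothetical and much simpler than xclim's actual code: the `Indicator` class, the `customize` classmethod and the `TgMean` subclass are written for illustration only. xclim creates the new subclass when the class itself is called, whereas the sketch uses an explicit classmethod to keep the mechanics visible.

```python
# Hypothetical, simplified sketch of a subclass registry; not xclim's actual code.
registry = {}


class Indicator:
    identifier = None
    units = ""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Every subclass with an identifier is stored under its uppercase name.
        if cls.identifier is not None:
            registry[cls.identifier.upper()] = cls

    @classmethod
    def customize(cls, **attrs):
        """Create a new subclass overriding some attributes and return an instance."""
        name = attrs["identifier"].upper()
        # type() builds the subclass, which triggers __init_subclass__ above.
        return type(name, (cls,), attrs)()


class TgMean(Indicator):
    identifier = "tg_mean"
    units = "K"


# Derive a variant of an existing indicator by looking it up in the registry.
tg_mean_c = registry["TG_MEAN"].customize(identifier="tg_mean_c", units="degC")
print(sorted(registry))
```

The derived indicator inherits everything it does not override, and it is registered under its own uppercase identifier alongside the original.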

Note that in the case of compute functions returning multiple outputs, metadata attributes may be given as lists of strings or strings. In the latter case, the string is assumed to be identical for all variables. However, the `var_name` attribute must be a list and have the same length as the number of outputs.

In [None]:
def compute_stats(data, freq='YS'):
    """Simple function returning the min, mean and max for each resampling period."""
    with xr.set_options(keep_attrs=True):
        da = data.resample(time=freq)
        return da.min(), da.mean(), da.max()

tg_stat = registry["TG_MEAN"](
    identifier="tg_stats",
    realm="atmos",
    compute=compute_stats,
    var_name=["tg_min", "tg_mean", "tg_max"],
    units="C",  # As only a str is passed, the three outputs will use the same value as attribute.
    long_name=["Minimum temperature", "Mean temperature", "Max temperature"],
)

In [None]:
tas = xr.tutorial.open_dataset('air_temperature').air.resample(time='D').mean(keep_attrs=True)
tas.attrs.update(cell_methods="time: mean within days", standard_name="air_temperature")

out = tg_stat(tas, freq='MS')  # Outputs 3 DataArrays
xr.merge(out)
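The metadata rule for multi-output indicators described in this section (a single string is shared by all outputs, a list must match the number of outputs) can be pictured as a small broadcasting helper. The function `broadcast_attr` is a hypothetical illustration and not part of xclim.

```python
def broadcast_attr(value, n_outputs):
    """Return one attribute value per output (hypothetical helper, for illustration)."""
    if isinstance(value, str):
        # A single string is reused for every output.
        return [value] * n_outputs
    values = list(value)
    if len(values) != n_outputs:
        raise ValueError(f"Expected {n_outputs} values, got {len(values)}.")
    return values


var_names = ["tg_min", "tg_mean", "tg_max"]
units = broadcast_attr("C", len(var_names))
long_names = broadcast_attr(
    ["Minimum temperature", "Mean temperature", "Max temperature"], len(var_names)
)
for name, unit, long_name in zip(var_names, units, long_names):
    print(name, unit, long_name)
```

A list of the wrong length raises an error in this sketch, which mirrors the requirement that `var_name` has as many entries as there are outputs.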