# Basic Usage


## Climate indicator computations

`xclim` is a library of climate indicators that operate on [xarray](https://xarray.pydata.org/) `DataArray` objects. 

`xclim` provides two layers of computation. The computation layer, the **indices**, is responsible for calculations and units handling. The CF layer, the **indicators** (CF refers to the Climate and Forecast conventions), is responsible for input health checks and metadata formatting. Functions from the computation layer are found in `xclim.indices`, while indicator objects from the CF layer are found in *realm* modules (`xclim.atmos`, `xclim.land` and `xclim.seaIce`). Users should prefer the indicators, and fall back to indices only as a last resort if the indicator machinery becomes too heavy for a special edge case.

To use xclim in a project, import both `xclim` and `xarray`. 

In [None]:
from __future__ import annotations

import xarray as xr

import xclim
from xclim.testing import open_dataset

Indice calculations are performed by opening a netCDF-like file, accessing the variable of interest, and calling the indice function, which returns a new DataArray. 

For this example, we'll first open a demonstration dataset storing surface air temperature and compute the number of growing degree days (the sum of degrees above a certain threshold) at the yearly frequency.

In [None]:
# ds = xr.open_dataset("your_file.nc")
ds = open_dataset("ERA5/daily_surface_cancities_1990-1993.nc")
ds.tas

In [None]:
gdd = xclim.atmos.growing_degree_days(tas=ds.tas, thresh="10.0 degC", freq="YS")
gdd

This computation was made using the `growing_degree_days` **indicator**. The same computation could be made through the **indice**. You can see how the metadata is a lot poorer here.

In [None]:
gdd = xclim.indices.growing_degree_days(tas=ds.tas, thresh="10.0 degC", freq="YS")
gdd

The call to `xclim.indices.growing_degree_days` first checked that the input variable units were units of temperature, ran the computation, then set the output's units to the appropriate unit (here `K d` or kelvin days). As you can see, the **indicator** returned the same output but with more metadata. It also performed more checks, as explained below.

`growing_degree_days` makes most sense with **daily input**, but could theoretically accept other source frequencies. The computational layer (*indice*) assumes that users have checked that the input data has the expected temporal frequency and has no missing values. Because the indice itself performs no such checks, the output data could be wrong. That's why it's always safer to use **`Indicator`** objects from the CF layer, as done in the following section.
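To make the quantity itself concrete, here is a minimal plain-Python sketch of a growing degree days sum for daily mean temperatures. The function name `gdd_sketch` and the sample values are hypothetical and only illustrate the definition given above (the sum of degrees above a threshold); this is not how xclim implements the indice, and it does no unit or frequency handling.

```python
def gdd_sketch(daily_mean_temps_c, thresh_c=10.0):
    """Sum the excess of each daily mean temperature over the threshold.

    Hypothetical helper for illustration only; days at or below the
    threshold contribute nothing.
    """
    return sum(max(t - thresh_c, 0.0) for t in daily_mean_temps_c)


# Five days of made-up daily mean temperatures, in degrees Celsius.
temps = [8.0, 12.5, 15.0, 9.5, 11.0]

# Excesses over 10 degC are 0, 2.5, 5.0, 0 and 1.0.
total = gdd_sketch(temps)
print(total)
```

With daily input, each degree of excess counts for one day, which is why the result is expressed in degree-days.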

<div class="alert alert-warning">
    
New unit handling paradigm in xclim 0.24 for indices

As of xclim 0.24, the paradigm in unit handling has changed slightly. Indices are now written to be more flexible with regard to the sampling frequency and units of the data. You _can_ use `growing_degree_days` on, for example, 6-hourly data. The output will then be in degree-hour units (`K h`). Moreover, all units, even when untouched by the calculation, will be reformatted to a CF-compliant symbol format. This was done to ensure consistency between all indices.
    
Very few indices will convert their output to specific units; rather, it is the dimensionality that will be consistent. The [Unit handling](units.ipynb) page goes into more detail on how unit conversion can easily be done.
    
This doesn't apply to **Indicators**. Those will always output data in a specific unit, the one listed in the `Indicator.cf_attrs` metadata dictionary.
    
</div>
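The relationship between degree-hour and degree-day outputs can be sketched in plain Python. This assumes the accumulated excess is multiplied by the length of the sampling step, which is our reading of the note above and not a description of xclim's internals; `degree_time_sketch` and the sample values are hypothetical.

```python
def degree_time_sketch(temps_c, thresh_c, step_hours):
    """Accumulate the excess over the threshold, weighted by the step length.

    Hypothetical helper: the result is in "degree-hours" because each
    sample is assumed to represent `step_hours` hours.
    """
    excess = sum(max(t - thresh_c, 0.0) for t in temps_c)
    return excess * step_hours


# One day of made-up 6-hourly temperatures, in degrees Celsius.
six_hourly = [8.0, 14.0, 16.0, 10.0]

# Excesses over 10 degC are 0, 4, 6 and 0; each sample covers 6 hours.
degree_hours = degree_time_sketch(six_hourly, thresh_c=10.0, step_hours=6)

# Same dimensionality, different unit: 24 hours per day.
degree_days = degree_hours / 24
print(degree_hours, degree_days)
```

Both numbers describe the same physical quantity, which is the sense in which the dimensionality stays consistent while the unit may vary.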


Finally, like almost all indices, the function takes a `freq` argument specifying the time period over which it is computed. These strings are called "Offset Aliases" and are the same as the resampling string arguments. Valid arguments are detailed in the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases) (note that aliases involving "business" notions are not supported by `xarray` and thus could raise issues in xclim).
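As a rough illustration of what two common aliases mean, the sketch below uses pandas directly (not xclim) on a synthetic daily series. The series of ones is made up so that each resampled sum simply counts the days in the period; `"YS"` groups by year start and `"MS"` by month start.

```python
import pandas as pd

# Two years of synthetic daily data: every day contributes 1.0.
time = pd.date_range("1990-01-01", "1991-12-31", freq="D")
ones = pd.Series(1.0, index=time)

# "YS": one value per year, labelled with the first day of the year.
yearly = ones.resample("YS").sum()

# "MS": one value per month, labelled with the first day of the month.
monthly = ones.resample("MS").sum()

print(yearly)
print(monthly.head(3))
```

The same strings passed as `freq` to an indice or indicator select the corresponding output periods.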

## Health checks and metadata attributes

Indicator instances from the CF layer are found in modules bearing the name of the computational realm in which their input variables are found: `xclim.atmos`, `xclim.land` and `xclim.seaIce`. These objects run sanity checks on the input variables and set the output's metadata according to the CF conventions where they apply. Some of the checks involve:

* Identifying periods where missing data significantly impacts the calculation and omitting calculations for those periods. The methods used for this are called "missing methods" and are detailed in section [Health checks](../checks.rst).
* Appending process history and maintaining the historical provenance of file metadata.
* Writing [Climate and Forecast Convention](http://cfconventions.org/) compliant metadata based on the variables and indices calculated.

Those modules are best used for producing NetCDF files that will be shared with users. See [Climate Indicators](../indicators.rst) for a list of available indicators.

If we run the `growing_degree_days` indicator over a non-daily dataset, the input frequency check will reject it because the data is not daily. That is, running `xclim.atmos.growing_degree_days(ds6h.air, thresh='10.0 degC', freq='MS')` will fail with a `ValidationError`:

In [None]:
ds6h = xr.tutorial.open_dataset("air_temperature")
xr.infer_freq(ds6h.time)  # Show that it is not daily

In [None]:
gdd = xclim.atmos.growing_degree_days(tas=ds6h.air, thresh="10.0 degC", freq="MS")

Resampling to a daily frequency and running the same indicator succeeds, but we still get warnings from the CF metadata checks.

In [None]:
daily_ds = ds6h.resample(time="D").mean(keep_attrs=True)
gdd = xclim.atmos.growing_degree_days(daily_ds.air, thresh="10.0 degC", freq="YS")

To suppress the CF validation warnings in the following examples, we set xclim to send them to the log instead of raising a warning or an error. We could also have set `data_validation='warn'` to be able to run the indicator on non-daily data. These options are set globally or within a context with [set_options](../api.rst#options-submodule).

The missing method, which determines whether a period should be considered missing, can be controlled through the `check_missing` option, globally or contextually. The main missing methods also have options that can be modified.

In [None]:
with xclim.set_options(
    check_missing="pct",
    missing_options={"pct": dict(tolerance=0.1)},
    cf_compliance="log",
):
    # Change the missing method to "percent", instead of the default "any".
    # Set the tolerance to 10%: periods with more than 10% of missing data
    #     in the input will be masked in the output.
    gdd = xclim.atmos.growing_degree_days(daily_ds.air, thresh="10.0 degC", freq="MS")
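The difference between the two missing methods named above can be sketched in plain Python. This is a simplified illustration, not xclim's implementation: `period_is_missing` is a hypothetical helper, missing values are represented as NaN, and the exact handling of a period sitting precisely at the tolerance is an assumption based on the comment in the cell above.

```python
import math


def period_is_missing(values, method="any", tolerance=0.1):
    """Decide whether a period should be masked, given its input values.

    Hypothetical helper for illustration only.
    - "any": masked as soon as one input value is missing.
    - "pct": masked when the fraction of missing values exceeds `tolerance`.
    """
    n_missing = sum(1 for v in values if math.isnan(v))
    if method == "any":
        return n_missing > 0
    if method == "pct":
        return n_missing / len(values) > tolerance
    raise ValueError(f"Unknown missing method: {method!r}")


# A made-up 30-day month with 2 missing days (about 6.7% missing).
month = [1.0] * 28 + [float("nan")] * 2

masked_any = period_is_missing(month, method="any")
masked_pct = period_is_missing(month, method="pct", tolerance=0.1)
print(masked_any, masked_pct)
```

Under these assumptions, the month would be masked with `"any"` but kept with `"pct"` and a 10% tolerance.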

Some indicators also expose time-selection arguments as `**indexer` keywords. This allows running the indice on a subset of the time coordinates, for example only on a specific season, month, or between two dates in every year. It relies on the [select_time](../xclim.core.rst#xclim.core.calendar.select_time) function. Some indicators will simply select the time period and run the calculations, while others will perform the selection at the appropriate step, when the order of operations makes a difference. All will pass the `indexer` kwargs to the missing value handling, ensuring that missing values _outside_ the valid time period are **not** considered.

The next example computes the annual sum of growing degree days over 10 °C, but only considering days from the 1st of April to the 30th of September.

In [None]:
with xclim.set_options(cf_compliance="log"):
    gdd = xclim.atmos.growing_degree_days(
        tas=daily_ds.air, thresh="10 degC", freq="YS", date_bounds=("04-01", "09-30")
    )
gdd
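The effect of a `date_bounds` pair such as `("04-01", "09-30")` can be sketched with the standard library alone. The helper `in_date_bounds` is hypothetical and only mimics the idea of keeping the same window of days in every year; it does not handle bounds that wrap over the new year, and it is not how `select_time` is implemented.

```python
from datetime import date, timedelta


def in_date_bounds(day, start="04-01", end="09-30"):
    """Return True if `day` falls within the inclusive MM-DD window.

    Hypothetical helper: zero-padded "MM-DD" strings compare in calendar
    order, so plain string comparison works when start <= end.
    """
    return start <= day.strftime("%m-%d") <= end


# Every day of a non-leap year.
days = [date(2013, 1, 1) + timedelta(days=i) for i in range(365)]

# Keep only April 1st through September 30th, bounds included.
selected = [d for d in days if in_date_bounds(d)]
print(selected[0], selected[-1], len(selected))
```

Only the retained days would then enter the yearly sum, and only missing values inside that window would count towards the missing-data check.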

Finally, xclim also allows calling indicators using datasets and variable names.

In [None]:
with xclim.set_options(cf_compliance="log"):
    gdd = xclim.atmos.growing_degree_days(
        tas="air", thresh="10.0 degC", freq="MS", ds=daily_ds
    )

    # variable names default to xclim names, so we can even do this:
    renamed_daily_ds = daily_ds.rename(air="tas")
    gdd = xclim.atmos.growing_degree_days(
        thresh="10.0 degC", freq="MS", ds=renamed_daily_ds
    )

## Graphics

In [None]:
import matplotlib.pyplot

%matplotlib inline

# Summary statistics histogram
gdd.plot()

In [None]:
# Show time series at a given geographical coordinate
gdd.isel(lon=20, lat=10).plot()

In [None]:
# Show spatial pattern at a specific time period
gdd.sel(time="2013-07").plot()

For more examples, see [xarray's plotting documentation](https://xarray.pydata.org/en/stable/plotting.html).

To save the data as a new NetCDF, use `to_netcdf`.

In [None]:
gdd.to_netcdf("monthly_growing_degree_days_data.nc")

<div class="alert alert-info">
    
It's possible to save Dataset objects to other file formats. For more information see: [xarray's documentation](https://xarray.pydata.org/en/stable/generated/xarray.Dataset.html)

</div>