# Basic Usage


## Climate indicator computations

`xclim` is a library of climate indicators that operate on [xarray](https://xarray.pydata.org/) `DataArray` objects. 

`xclim` provides two layers of computations: the computation layer, responsible for the calculations and units checking, and the CF layer (referring to the Climate and Forecast conventions), responsible for input health checks and metadata formatting. Functions from the computation layer are found in `xclim.indices`, while objects from the CF layer are found in *realm* modules (`xclim.atmos`, `xclim.land`, ...).

To use xclim in a project, import both `xclim` and `xarray`. 

In [None]:
import xclim
import xarray as xr

Index calculations are performed by opening a netCDF file, accessing the variable of interest, and calling the index function, which returns a new `DataArray`.

For this example, we'll first open a demonstration dataset storing surface air temperature and compute the number of growing degree days (the sum of degrees above a certain threshold) at the monthly frequency. 

<div class="alert alert-info">

Calculations are performed on `DataArray` objects, not `Dataset` objects. 

</div>

In [None]:
# ds = xr.open_dataset("your_file.nc")
ds = xr.tutorial.open_dataset('air_temperature')
ds
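
As a minimal sketch of the `Dataset`/`DataArray` distinction, the snippet below builds a small synthetic `Dataset` and pulls out the single variable that would be handed to a calculation. The variable name `air` mirrors the tutorial file; the values and the name `toy_ds` are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a dataset read from disk (values are made up).
time = pd.date_range("2013-01-01", periods=4, freq="D")
toy_ds = xr.Dataset(
    {"air": ("time", np.array([271.0, 273.5, 280.0, 285.2]), {"units": "K"})},
    coords={"time": time},
)

# A Dataset is a container of variables; each variable is a DataArray.
tas = toy_ds["air"]  # equivalent to toy_ds.air

print(type(toy_ds).__name__, type(tas).__name__, tas.attrs["units"])
```

The `units` attribute travels with the `DataArray`, which is what allows units to be checked downstream.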

<div class="alert alert-warning">
With a few exceptions, most indicators operate on daily data only. Our toy data is 6-hourly data and needs to be resampled.
</div>

In [None]:
daily_tas = ds.air.resample(time='D').mean(keep_attrs=True)
gdd = xclim.indices.growing_degree_days(tas=daily_tas, thresh='10.0 degC', freq='MS')
gdd

The call to `xclim.indices.growing_degree_days` first checked that the input variable units were units of temperature, ran the computation, then set the output's units to degree days (`K days`).
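
To make the definition concrete, here is a rough sketch of what a growing degree day sum amounts to in plain xarray, on synthetic data. It is an assumption-laden illustration, not xclim's implementation: it does no units checking or conversion, and the names `thresh`, `excess` and `gdd_manual` are chosen for this example only.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily mean temperature in degC: 8.0 in January, 12.5 in February.
time = pd.date_range("2000-01-01", "2000-02-29", freq="D")
values = np.where(time.month == 1, 8.0, 12.5)
tas = xr.DataArray(values, dims="time", coords={"time": time}, attrs={"units": "degC"})

thresh = 10.0  # same unit as tas; this sketch does no unit handling

# Keep only the degrees above the threshold, then sum them per month.
excess = (tas - thresh).clip(min=0)
gdd_manual = excess.resample(time="MS").sum()

print(gdd_manual.values)
```

January never exceeds the threshold, so it contributes nothing; February contributes 2.5 degrees on each of its 29 days.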

The `growing_degree_days` *index* function **expects daily input**, which is why we resampled the data beforehand. The computation layer assumes that users have checked that the input data has the expected temporal frequency and no missing values. No such checks are performed, so the output could silently be wrong. If you're unsure about any of this, a safer bet is to use `Indicator` objects from the CF layer. Resampling is explained in [xarray's](http://xarray.pydata.org/en/stable/time-series.html#resampling-and-grouped-operations) and [pandas'](https://pandas.pydata.org/docs/user_guide/timeseries.html#resampling) documentation on the subject.
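
When working directly with the computation layer, a quick manual check along these lines can catch the most common problems. This is only a sketch on synthetic data: `looks_daily_and_complete` is a hypothetical helper written for this example, not part of xclim, and it relies on `xarray.infer_freq`, which needs at least three time steps.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic 6-hourly series covering three days (values are made up).
time = pd.date_range("2013-01-01", periods=12, freq="6h")
six_hourly = xr.DataArray(
    np.arange(12, dtype=float), dims="time", coords={"time": time}
)

# Same resampling step as in the tutorial: 6-hourly values to daily means.
daily = six_hourly.resample(time="D").mean()


def looks_daily_and_complete(da):
    """Hypothetical helper: True if `da` has a daily time step and no NaNs."""
    return xr.infer_freq(da.time) == "D" and not bool(da.isnull().any())


print(looks_daily_and_complete(six_hourly))  # 6-hourly input
print(looks_daily_and_complete(daily))       # after resampling
```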

Finally, like almost all indices, the function takes a `freq` argument that specifies the period over which the index is computed. Its values are pandas "offset aliases", the same strings used for resampling. Valid values are detailed in [pandas' documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases).
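
The effect of a few common aliases can be seen with a plain xarray resampling of synthetic data. The sketch below counts the days falling in each period for `MS` (month start), `QS-DEC` (quarters starting in December, March, June and September) and `YS` (year start); the dictionary `periods` is just a convenience for this example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# One value of 1.0 per day of 2001, so a sum per period counts its days.
time = pd.date_range("2001-01-01", "2001-12-31", freq="D")
days = xr.DataArray(np.ones(time.size), dims="time", coords={"time": time})

periods = {
    freq: days.resample(time=freq).sum() for freq in ("MS", "QS-DEC", "YS")
}

for freq, counts in periods.items():
    print(freq, counts.sizes["time"], "periods")
```

Note that `QS-DEC` yields five periods for a single calendar year, because January and February belong to the season that started the previous December.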

## Health checks and metadata attributes

Indicator instances from the CF layer are found in modules bearing the name of the computational realm in which their input variables are found: `xclim.atmos`, `xclim.land` and `xclim.seaIce`. These objects run sanity checks on the input variables and set the output's metadata according to the CF conventions where they apply. Among other things, they handle:

* Identifying periods where missing data significantly impacts the calculation and omitting calculations for those periods. The rules used for this are called "missing methods".
* Appending process history and maintaining the historical provenance of file metadata.
* Writing [Climate and Forecast Convention](http://cfconventions.org/) compliant metadata based on the variables and indices calculated.

Those modules are best used for producing NetCDF files that will be shared with users. See [Climate Indicators](../indicators.rst) for a list of available indicators.

If we run the `growing_degree_days` indicator over the non-resampled dataset, the health checks will detect that the input data is not daily. That is, running `xclim.atmos.growing_degree_days(ds.air, thresh='10.0 degC', freq='MS')` will fail with a `ValueError`:

In [None]:
gdd = xclim.atmos.growing_degree_days(ds.air, thresh='10.0 degC', freq='MS')

In [None]:
gdd = xclim.atmos.growing_degree_days(daily_tas, thresh='10.0 degC', freq='MS')
gdd

The missing method, which determines whether a period should be considered missing, can be controlled through the `check_missing` option, either globally or within a context. The main missing methods also have options that can be modified.

In [None]:
with xclim.set_options(check_missing='pct', missing_options={'pct': {'tolerance': 0.1}}):
    # Change the missing method to "pct" (percent) instead of the default "any".
    # Set the tolerance to 10%: periods with more than 10% of missing data
    #     in the input will be masked in the output.
    gdd = xclim.atmos.growing_degree_days(daily_tas, thresh='10.0 degC', freq='MS')
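
The difference between the "any" and "pct" methods can be illustrated conceptually with plain xarray. The sketch below is not how xclim implements its missing methods; it only mimics the idea on synthetic data, and the names `frac_missing` and `tolerance` are assumptions made for this example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily series for January to March 2001, all ones.
time = pd.date_range("2001-01-01", "2001-03-31", freq="D")
data = np.ones(time.size)
data[0] = np.nan       # 1 missing day in January (about 3% of the month)
data[31:41] = np.nan   # 10 missing days in February (about 36% of the month)
tas = xr.DataArray(data, dims="time", coords={"time": time})

# Fraction of missing days per month, and a monthly sum that skips NaNs.
frac_missing = tas.isnull().astype(float).resample(time="MS").mean()
monthly_sum = tas.resample(time="MS").sum()

tolerance = 0.1  # assumed tolerance for the "pct"-like rule

# "any"-like rule: mask a month as soon as one value is missing.
masked_any = monthly_sum.where(frac_missing == 0)
# "pct"-like rule: mask a month only if more than 10% of it is missing.
masked_pct = monthly_sum.where(frac_missing <= tolerance)

print(masked_any.values)
print(masked_pct.values)
```

With the stricter rule both January and February are masked, while the percentage rule keeps January, where only one day is missing.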

## Graphics

In [None]:
import matplotlib.pyplot
%matplotlib inline

# Summary statistics histogram
gdd.plot()

In [None]:
# Show time series at a given geographical coordinate
gdd.isel(lon=20, lat=10).plot()

In [None]:
# Show spatial pattern at a specific time period
gdd.sel(time='2013-07').plot()

For more examples, see [xarray's plotting documentation](https://xarray.pydata.org/en/stable/plotting.html).

To save the data as a new NetCDF, use `to_netcdf`.

In [None]:
gdd.to_netcdf('monthly_growing_degree_days_data.nc')

<div class="alert alert-info">
    
It's possible to save `Dataset` objects to other file formats. For more information, see [xarray's documentation](https://xarray.pydata.org/en/stable/generated/xarray.Dataset.html).

</div>
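
As one possible illustration of another format, a small one-dimensional result can be exported to CSV by going through pandas. This is a sketch under assumptions: `toy_gdd` holds made-up values, and an in-memory buffer stands in for a file path.

```python
import io

import numpy as np
import pandas as pd
import xarray as xr

# Made-up monthly values standing in for a computed index at one grid point.
time = pd.date_range("2013-01-01", periods=3, freq="MS")
toy_gdd = xr.DataArray(
    np.array([0.0, 12.5, 40.0]),
    dims="time",
    coords={"time": time},
    name="growing_degree_days",  # a name is required by to_dataframe()
    attrs={"units": "K days"},
)

buffer = io.StringIO()  # in-memory stand-in for a file path
toy_gdd.to_dataframe().to_csv(buffer)

# Read the table back to confirm that the values survive the round trip.
restored = pd.read_csv(
    io.StringIO(buffer.getvalue()), index_col="time", parse_dates=True
)
print(restored)
```

Attributes such as `units` are not written to the CSV, which is one reason NetCDF remains the better choice for data that will be shared.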