Subsetting utilities
================

`xclim` comes with some utilities to perform common tasks that are either not implemented in xarray, or that are implemented but do not have the generality needed for climate science work. Here we show examples of the [`xclim.subset`](../api.rst#module-xclim.subset) submodule.

In [None]:
import xarray as xr
xr.set_options(display_style='html')
import xclim as xc
from xclim import subset

# import plotting stuff
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('seaborn')
plt.rcParams['figure.figsize'] = (13, 5)

In [None]:
ds = xr.tutorial.open_dataset('air_temperature')
ds.coords

In [None]:
ds.air.isel(time=0).plot()  # Simple index-selection with xarray

## subset_bbox : using a latitude-longitude bounding box

In the previous example notebook, we used xarray's `.sel()` to cut a lat-lon subset of our data. xclim offers the same functionality, but handles more cases. For example, if we naively try xarray's method on our dataset:

In [None]:
ds.sel(lat=slice(45, 50), lon=slice(-60, -55)).coords

As you can see, `lat` and `lon` are empty. In this dataset, latitudes are stored in descending order and longitudes span [0, 360) rather than [-180, 180), so neither slice matches any coordinate value and xarray returns an empty selection. xclim understands these nuances:

In [None]:
subset.subset_bbox(ds, lat_bnds=[45, 50], lon_bnds=[-60, -55]).coords
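
For comparison, the two adjustments described above can be made by hand before calling `.sel()`. The sketch below is plain Python and is not how xclim implements `subset_bbox`; the helper names `to_0_360` and `ordered_bounds` are hypothetical, and the coordinate lists only mimic the layout of the tutorial dataset. It also assumes the box does not straddle the 0° meridian.

```python
def to_0_360(lon):
    """Map a longitude in degrees east from [-180, 180) to [0, 360)."""
    return lon % 360


def ordered_bounds(bounds, coord):
    """Order a (low, high) pair to match the direction of a 1-D coordinate.

    A label-based slice on a descending coordinate has to run from the
    high value to the low one, otherwise it selects nothing.
    """
    low, high = sorted(bounds)
    descending = coord[0] > coord[-1]
    return (high, low) if descending else (low, high)


# Hypothetical coordinates laid out like the tutorial dataset:
# latitudes descending, longitudes in degrees east on [0, 360).
lat = [75.0 - 2.5 * i for i in range(25)]   # 75.0 down to 15.0
lon = [200.0 + 2.5 * i for i in range(53)]  # 200.0 up to 330.0

lat_sl = ordered_bounds((45, 50), lat)
lon_sl = ordered_bounds((to_0_360(-60), to_0_360(-55)), lon)
print(lat_sl, lon_sl)
```

Passing these to xarray as `ds.sel(lat=slice(*lat_sl), lon=slice(*lon_sl))` should then give a non-empty selection, which is the bookkeeping `subset_bbox` spares you.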

### When lat and lon are 2-D
`subset_bbox` also handles cases where the lat-lon coordinates are not sorted 1-D vectors, as in the more complex dataset used below:

<div class="alert alert-warning">

Most `subset` methods expect the input dataset / dataarray to have `lat` and `lon` as variables. They may be able to understand your data if other common names are used (like `latitude`, or `lons`), but we recommend renaming the variables before using the tool (as in this example).

</div>

In [None]:
ds_roms = xr.tutorial.open_dataset('ROMS_example').rename(lon_rho='lon', lat_rho='lat')
salt = ds_roms.salt.isel(ocean_time=0, s_rho=0)

fig, (axEtaXi, axLatLon) = plt.subplots(1, 2)
salt.plot(cmap=plt.cm.gray_r, ax=axEtaXi, add_colorbar=False)
axLatLon.pcolormesh(salt.lon, salt.lat, salt)
axLatLon.set_xlabel(salt.lon.long_name)
axLatLon.set_ylabel(salt.lat.long_name)

In [None]:
salt_bb = subset.subset_bbox(salt, lat_bnds=[28, 30], lon_bnds=[-91, -88])

fig, (axEtaXi, axLatLon) = plt.subplots(1, 2)
salt_bb.plot(cmap=plt.cm.gray_r, ax=axEtaXi, add_colorbar=False)
axLatLon.pcolormesh(salt_bb.lon, salt_bb.lat, salt_bb)
axLatLon.set_xlabel(salt_bb.lon.long_name)
axLatLon.set_ylabel(salt_bb.lat.long_name)

### Add time subsetting

In [None]:
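lat_bnds, lon_bnds = [45, 50], [-60, -55]  # same bounding box as above
# Dates must fall within the dataset's time range (2013 to 2014)
ds2 = subset.subset_bbox(ds, lat_bnds=lat_bnds, lon_bnds=lon_bnds,
                         start_date='2013-02', end_date='2014-05')
ds2.air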

### Selecting a single grid point
`subset_gridpoint` can be used to select a single location. Like the other functions in this submodule, it mostly replicates the behavior of `xarray.DataArray.sel()`. In addition, it accepts a `tolerance` parameter (in meters): the function returns the grid point nearest to the given coordinates if that point lies within this distance, and raises an error otherwise.

In [None]:
lon_pt = -70.0
lat_pt = 50.0

ds3 = subset.subset_gridpoint(ds, lon=lon_pt, lat=lat_pt, tolerance=10000,
                              start_date='1993', end_date='2020')
ds3.air  # the tutorial dataset stores its data in the `air` variable
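
To make the role of `tolerance` concrete, here is a minimal sketch of a nearest-point lookup with a distance check, written in plain Python. It is not xclim's implementation: the haversine formula, the mean Earth radius, and the names `haversine_m` and `nearest_point` are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the valid range.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def nearest_point(points, lat, lon, tolerance=None):
    """Return the (lat, lon) pair in `points` closest to the target.

    Raises ValueError if `tolerance` (meters) is given and even the
    closest point is farther away than that.
    """
    best = min(points, key=lambda p: haversine_m(p[0], p[1], lat, lon))
    dist = haversine_m(best[0], best[1], lat, lon)
    if tolerance is not None and dist > tolerance:
        raise ValueError(
            f"nearest point is {dist:.0f} m away, beyond {tolerance} m"
        )
    return best


# Hypothetical grid points on a 2.5 degree grid.
grid = [(50.0, -70.0), (50.0, -67.5), (47.5, -70.0)]

# About 1.1 km from a grid point: within a 10 km tolerance.
print(nearest_point(grid, 50.01, -70.0, tolerance=10000))
```

A target such as `(51.0, -70.0)` lies roughly 111 km from the closest grid point, so the same call with `tolerance=10000` would raise a `ValueError` instead of silently returning a distant cell.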

#### Create a plot of subsetted data
xarray provides a very simple plotting interface to easily explore our data.

*(While not doing any computation, this operation needs to download the data first.)*

In [None]:
ds3.air.plot()