# Computation

---

## Learning Objectives 


- Do basic arithmetic with DataArrays and Datasets
- Perform aggregation (reduction) along one or multiple dimensions of a DataArray or Dataset
- Compute climatology and anomaly using xarray's "split-apply-combine" approach via `.groupby()`
- Perform weighted reductions along one or multiple dimensions of a DataArray or Dataset


## Prerequisites


| Concepts | Importance | Notes |
| --- | --- | --- |
| [Understanding of xarray core data structures](./01-xarray-fundamentals.ipynb) | Necessary | |
| [Familiarity with xarray indexing and subsetting](./02-indexing-and-subsetting.ipynb) | Necessary | |
| [Familiarity with NumPy](https://numpy.org/doc/stable/reference/arrays.indexing.html) | Helpful | |


- **Time to learn**: *20-30 minutes*



---

## Imports


In [None]:
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

Let's open the monthly sea surface temperature data from the CESM2 model:

In [None]:
ds = xr.open_dataset(
    "./data/tos_Omon_CESM2_historical_r11i1p1f1_gr_200001-201412.nc", engine="netcdf4"
)
ds

## Arithmetic Operations

Arithmetic operations on a single DataArray automatically vectorize (as in NumPy) over all array values. Let's convert the sea surface temperature from degrees Celsius to kelvin:

In [None]:
ds.tos + 273.15

Let's square all values in `tos`:

In [None]:
ds.tos ** 2

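The same element-wise behavior can be checked on a small array without the data file. The following is a minimal sketch on hypothetical toy values (the names `sst_c`, `sst_k`, and `diff` are made up for illustration):

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: sea surface temperature in degrees Celsius
sst_c = xr.DataArray(
    np.array([[10.0, 12.5], [15.0, 20.0]]),
    dims=("lat", "lon"),
    coords={"lat": [40.0, 50.0], "lon": [300.0, 310.0]},
    name="tos",
)

# Scalar arithmetic is applied to every element; dims and coords are kept
sst_k = sst_c + 273.15

# Arithmetic between two DataArrays matches values by coordinate label
diff = sst_k - sst_c
```

Here `diff` should hold 273.15 at every grid point, and both results keep the `lat` and `lon` dimensions of the input.
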
## Aggregation Methods 

A very common step in data analysis is to summarize the data by computing aggregations such as `sum()`, `mean()`, `median()`, `min()`, and `max()`. The reduced data provide insight into the nature of a large dataset. Let's explore some of these aggregation methods:


In [None]:
# Compute mean
ds.tos.mean()

Because we specified no `dim` argument, the mean was computed over all dimensions. To aggregate along a particular dimension, pass its name as `dim`. For example, to calculate the time mean at every location, reduce over the `time` dimension:

In [None]:
ds.tos.mean(dim='time').plot(size=7, robust=True);

In [None]:
# compute temporal min
ds.tos.min(dim=['time'])

In [None]:
# compute spatial sum
ds.tos.sum(dim=['lat', 'lon'])

In [None]:
# compute temporal median
ds.tos.median(dim='time')

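To see how the `dim` argument changes the shape of the result, here is a small sketch on a hypothetical 2 x 3 x 4 array (the names are illustrative and not part of the CESM2 dataset):

```python
import numpy as np
import xarray as xr

# Hypothetical toy array with dimensions (time, lat, lon)
data = xr.DataArray(
    np.arange(24.0).reshape(2, 3, 4),
    dims=("time", "lat", "lon"),
)

# Reducing over "time" leaves a (lat, lon) map
time_mean = data.mean(dim="time")

# Reducing over both spatial dimensions leaves a time series
spatial_sum = data.sum(dim=["lat", "lon"])

# With no dim argument, every dimension is reduced to a single value
total_mean = data.mean()
```

Only the dimensions named in `dim` disappear from the result; all others are preserved.
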
The following table summarizes some other built-in Xarray aggregations:

| Aggregation                | Description                              |
|----------------------------|------------------------------------------|
| ``count()``                | Total number of items                    |
| ``mean()``, ``median()``   | Mean and median                          |
| ``min()``, ``max()``       | Minimum and maximum                      |
| ``std()``, ``var()``       | Standard deviation and variance          |
| ``prod()``                 | Product of elements                      |
| ``sum()``                  | Sum of elements                          |
| ``argmin()``, ``argmax()`` | Index of the minimum and maximum value   |

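A few of the aggregations in the table behave differently when missing values are present. The sketch below uses a hypothetical four-element array with one `NaN`. It also uses `idxmax()`, which is not listed in the table and returns the coordinate label rather than the integer position:

```python
import numpy as np
import xarray as xr

# Hypothetical toy array with one missing value
da = xr.DataArray(
    [1.0, np.nan, 5.0, 3.0],
    dims="x",
    coords={"x": [10, 20, 30, 40]},
)

n_valid = da.count().item()           # number of non-missing items
mean_skip = da.mean().item()          # NaN is skipped by default for float data
mean_strict = da.mean(skipna=False).item()  # NaN propagates when skipna=False

pos = da.argmax(dim="x").item()       # integer position of the maximum
label = da.idxmax(dim="x").item()     # coordinate label of the maximum
```
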
## GroupBy: Split, Apply, Combine

Simple aggregations can give a useful summary of our dataset, but often we would prefer to aggregate conditionally on some coordinate labels or groups. Xarray provides the `groupby` operation, which enables the **split / apply / combine** workflow on xarray DataArrays and Datasets. The split-apply-combine operation is illustrated in this figure:

<img src="../images/xarray-split-apply-combine.png" width="70%" height="70%">


This makes clear what the `groupby` accomplishes:

- The split step involves breaking up and grouping an xarray Dataset or DataArray depending on the value of the specified group key.
- The apply step involves computing some function, usually an aggregate, transformation, or filtering, within the individual groups.
- The combine step merges the results of these operations into an output xarray Dataset or DataArray.


We are going to use `groupby` to remove the seasonal cycle ("climatology") from our dataset:


In [None]:
ds.tos.sel(lon=310, lat=50, method='nearest').plot();

### Split

Let's group the data by month, i.e. all Januaries in one group, all Februaries in another, and so on:


In [None]:
ds.tos.groupby(ds.time.dt.month)

<div class="admonition alert alert-info">

In the above example, we use the `.dt` [`DatetimeAccessor`](https://xarray.pydata.org/en/stable/generated/xarray.core.accessor_dt.DatetimeAccessor.html) to extract specific components of the dates/times in our time coordinate.
    
   </div>

In [None]:
ds.time.dt.year

In [None]:
ds.time.dt.month

Xarray also offers a more concise syntax when the variable you’re grouping on is already present in the dataset. This is identical to `ds.tos.groupby(ds.time.dt.month)`:

In [None]:
ds.tos.groupby('time.month')

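To look inside the split step, a GroupBy object can be iterated to get each label and its group. This is a minimal sketch on a hypothetical two-year monthly series (the `toy` data and variable names are made up for illustration):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical two-year monthly series (month-start timestamps)
time = pd.date_range("2000-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24.0), dims="time", coords={"time": time}, name="toy")

gb = da.groupby("time.month")

# One group per calendar month
n_groups = len(gb)

# Iterating yields (label, group) pairs; record how many time steps each holds
sizes = {int(label): group.sizes["time"] for label, group in gb}
```

With two years of data, each of the twelve groups should contain two time steps.
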
### Apply & Combine 

Now that we have groups defined, it’s time to “apply” a calculation to each group. These calculations can be either:

- aggregation: reduces the size of the group
- transformation: preserves the group’s full size

At the end of the apply step, xarray will automatically combine the aggregated / transformed groups back into a single object.

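The difference between the two kinds of calculation shows up in the shape of the combined result. The sketch below, on hypothetical toy data, uses `.mean()` as the aggregation and `.map()` with a small lambda as the transformation:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical two-year monthly series
time = pd.date_range("2000-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24.0), dims="time", coords={"time": time})

gb = da.groupby("time.month")

# Aggregation: each group collapses to one value, indexed by "month"
aggregated = gb.mean()

# Transformation: each group keeps its size, so the result spans all of "time"
transformed = gb.map(lambda group: group - group.mean())
```

`aggregated` has a new `month` dimension of length 12, while `transformed` keeps the original `time` dimension of length 24.
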


#### Compute climatology 


Let's calculate the climatology at every point in the dataset:


In [None]:
tos_clim = ds.tos.groupby('time.month').mean()
tos_clim

In [None]:
# Plot climatology at a specific point
tos_clim.sel(lon=310, lat=50, method='nearest').plot();

In [None]:
# Plot zonal mean climatology
tos_clim.mean(dim='lon').transpose().plot.contourf(levels=12, robust=True, cmap='turbo');

In [None]:
# Difference between January and December climatologies
(tos_clim.sel(month=1) - tos_clim.sel(month=12)).plot(size=6, robust=True);

#### Compute anomaly

Now let's combine the previous steps: compute the climatology and use xarray's `groupby` arithmetic to subtract it from our original data, leaving the anomaly.

In [None]:
gb = ds.tos.groupby('time.month')
tos_anom = gb - gb.mean(dim='time')
tos_anom

In [None]:
tos_anom.sel(lon=310, lat=50, method='nearest').plot();

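To check that this arithmetic does what we expect, here is a sketch on a synthetic series whose answer is known in advance. The series is an assumed seasonal cycle plus an offset that grows by one unit per year, so the anomaly should equal the offset minus its three-year mean. All names and numbers are hypothetical:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical three-year monthly series: seasonal cycle + yearly offset
time = pd.date_range("2000-01-01", periods=36, freq="MS")
month = time.month.values
offset = (time.year.values - 2000).astype(float)  # 0, 1, 2 for the three years
seasonal = 10.0 + 5.0 * np.cos(2 * np.pi * (month - 1) / 12)

da = xr.DataArray(seasonal + offset, dims="time", coords={"time": time})

# Climatology: the mean of each calendar month over all years
gb = da.groupby("time.month")
clim = gb.mean(dim="time")

# Anomaly: each value minus the climatology of its own calendar month
anom = gb - clim
```

The seasonal cycle cancels out, leaving anomalies of -1, 0, and 1 for the three years.
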
Let's look at how the global mean anomaly evolves in time. To average over space, we pass both the `lat` and `lon` dimensions in the `dim` argument to `mean()`:

In [None]:
unweighted_mean_global_anom = tos_anom.mean(dim=['lat', 'lon'])
unweighted_mean_global_anom.plot();

<div class="admonition alert alert-warning">
   

An operation that combines grid cells of different sizes is not scientifically valid unless each cell is weighted by its area. xarray has a convenient [`.weighted`](https://xarray.pydata.org/en/stable/user-guide/computation.html#weighted-array-reductions) method to accomplish this.

</div>


Let's first load the cell area data. This dataset contains the weights for the grid cells:

In [None]:
areacello = xr.open_dataset("data/areacello_Ofx_CESM2_historical_r11i1p1f1_gr.nc").areacello
areacello

Now let's calculate the area-weighted mean global anomaly:

In [None]:
weighted_mean_global_anom = tos_anom.weighted(areacello).mean(dim=['lat', 'lon'])

Let's plot both unweighted and weighted means:

In [None]:
unweighted_mean_global_anom.plot(size=7)
weighted_mean_global_anom.plot()
plt.legend(['unweighted', 'weighted']);

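The effect of weighting is easiest to see on a tiny grid. The sketch below does not use `areacello`. Instead it assumes that the cosine of latitude is a reasonable stand-in for cell area on a regular latitude-longitude grid, and all names and values are hypothetical. Note that xarray does not accept missing values in the weights, so weights that contain `NaN` would need something like `.fillna(0)` first.

```python
import numpy as np
import xarray as xr

# Hypothetical 3 x 4 grid: value 10 on the equator, 0 at 60S and 60N
lat = np.array([-60.0, 0.0, 60.0])
lon = np.array([0.0, 90.0, 180.0, 270.0])
field = xr.DataArray(
    np.array([0.0, 10.0, 0.0])[:, None] * np.ones((3, 4)),
    dims=("lat", "lon"),
    coords={"lat": lat, "lon": lon},
)

# Assumption: cos(latitude) approximates relative cell area on this grid
weights = np.cos(np.deg2rad(field.lat))
weights.name = "weights"

# Every row counts equally in the unweighted mean
unweighted = field.mean(dim=["lat", "lon"]).item()

# The weighted mean is sum(w * x) / sum(w)
weighted = field.weighted(weights).mean(dim=["lat", "lon"]).item()
```

The equatorial row carries twice the weight of each 60-degree row, so the weighted mean (about 5) is larger than the unweighted mean (about 3.33).
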
---

## Other high level computation functionality

- `resample`: [Groupby-like functionality specialized for time dimensions. Can be used for temporal upsampling and downsampling](https://xarray.pydata.org/en/stable/time-series.html#resampling-and-grouped-operations)
- `rolling`: [Useful for computing aggregations on moving windows of your dataset e.g. computing moving averages](https://xarray.pydata.org/en/stable/computation.html#rolling-window-operations)
- `coarsen`: [Generic functionality for downsampling data](https://xarray.pydata.org/en/stable/computation.html#coarsen-large-arrays)



In [None]:
# resample to annual (year-start) frequency; 'YS' replaces the deprecated 'AS' alias
r = ds.tos.resample(time='YS')
r

In [None]:
r.mean()

In [None]:
# Compute a 5-month moving average
m_avg = ds.tos.rolling(time=5, center=True).mean()
m_avg

In [None]:
lat = 50
lon = 310

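# Select the grid point nearest to (lat, lon) by coordinate value, as in earlier cells
m_avg.sel(lat=lat, lon=lon, method='nearest').plot(size=6)
ds.tos.sel(lat=lat, lon=lon, method='nearest').plot()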
plt.legend(['5-month moving average', 'monthly data']);

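The three methods can be compared side by side on a short synthetic series. This is a minimal sketch on hypothetical data (a ramp from 0 to 23 over two years), chosen so the results are easy to verify by hand:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical two-year monthly series with values 0, 1, ..., 23
time = pd.date_range("2000-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24.0), dims="time", coords={"time": time})

# resample: group by calendar year ("YS" = year start) and average
annual = da.resample(time="YS").mean()

# rolling: centered 5-month window; incomplete windows at the edges give NaN
smooth = da.rolling(time=5, center=True).mean()

# coarsen: average fixed blocks of 12 consecutive time steps
blocks = da.coarsen(time=12).mean()
```

`resample` and `coarsen` agree here because each block of 12 steps is exactly one calendar year. `resample` works from the timestamps, whereas `coarsen` only counts positions. The rolling mean keeps all 24 time steps, but the first two and last two are `NaN` because their windows are incomplete.
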
---

In [None]:
%load_ext watermark
%watermark --time --python --updated --iversion

## Resources and References

- `groupby`: [Useful for binning/grouping data and applying reductions and/or transformations on those groups](https://xarray.pydata.org/en/stable/groupby.html)
- `resample`: [Groupby-like functionality specialized for time dimensions. Can be used for temporal upsampling and downsampling](https://xarray.pydata.org/en/stable/time-series.html#resampling-and-grouped-operations)
- `rolling`: [Useful for computing aggregations on moving windows of your dataset e.g. computing moving averages](https://xarray.pydata.org/en/stable/computation.html#rolling-window-operations)
- `coarsen`: [Generic functionality for downsampling data](https://xarray.pydata.org/en/stable/computation.html#coarsen-large-arrays)

- `weighted`: [Useful for weighting data before applying reductions](https://xarray.pydata.org/en/stable/user-guide/computation.html#weighted-array-reductions)

- [More xarray tutorials and videos](https://xarray.pydata.org/en/stable/tutorials-and-videos.html)

<div class="admonition alert alert-success">
    <p class="title" style="font-weight:bold">Previous: <a href="./03-data-visualization.ipynb">Data Visualization</a></p>
         <p class="title" style="font-weight:bold">Next: <a href="./05-masking.ipynb">Masking Data</a></p>
</div>