# **Tutorial 7: Other Computational Tools in Xarray**

**Week 1, Day 1, Introduction to the Climate System**

**Content creators:** Sloane Garelick, Julia Kent

**Content reviewers:** Danika Gupta, Younkap Nina Duplex 

**Content editors:** Agustina Pesce

**Production editors:** TBD

**Our 2023 Sponsors:** TBD





### **Code and Data Sources**

Code and data for this tutorial are based on existing content from [Project Pythia](https://foundations.projectpythia.org/core/xarray/computation-masking.html).

## **Tutorial 7 Objectives**
Thus far, we've learned about various climate processes in the videos, and we've explored tools in Xarray that are useful for analyzing and interpreting climate data.

In this tutorial we'll continue using the SST data from CESM2 and we'll practice using some additional computational tools in Xarray to interpret climate data. Specifically, we will learn three tools that allow us to resample our data, which can help with data comparison and analysis.

- `resample`: Groupby-like functionality specifically for time dimensions. Can be used for temporal upsampling and downsampling. Additional information about resampling in Xarray can be found [here](https://xarray.pydata.org/en/stable/user-guide/time-series.html#resampling-and-grouped-operations).
- `rolling`: Useful for computing aggregations on moving windows of your dataset, e.g. computing moving averages. Additional information about rolling window operations in Xarray can be found [here](https://xarray.pydata.org/en/stable/user-guide/computation.html#rolling-window-operations).
- `coarsen`: Generic functionality for downsampling data. Additional information about coarsening in Xarray can be found [here](https://xarray.pydata.org/en/stable/user-guide/computation.html#coarsen-large-arrays).

## Imports


In [None]:
# !pip install matplotlib
# !pip install numpy
# !pip install xarray
# !pip install pythia_datasets
# !pip install pandas

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
from pythia_datasets import DATASETS
import pandas as pd

Let's load the same data that we used in the previous tutorial (monthly SST data from CESM2):

In [None]:
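filepath = DATASETS.fetch('CESM2_sst_data.nc')
#Open the file without decoding its time axis
ds = xr.open_dataset(filepath, decode_times=False)
#Replace the time axis with 180 evenly spaced timestamps from 2000-01-15 to 2014-12-15 (one per monthly step)
new_time = pd.date_range(start='2000-01-15', end='2014-12-15', periods=180)
ds = ds.assign(time=new_time)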

### Resampling data

For upsampling or downsampling temporal resolutions, we can use the `resample()` method in Xarray. For example, you can use this method to downsample a dataset from hourly to 6-hourly resolution.
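
As a minimal sketch of the hourly to 6-hourly case, the cell below builds a small synthetic hourly series (illustrative values, not the CESM2 data) and averages it into 6-hour bins. The variable names `hourly` and `six_hourly` are made up for this example.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic hourly series covering one day (values 0..23, for illustration only)
time = pd.date_range("2000-01-01", periods=24, freq="h")
hourly = xr.DataArray(np.arange(24.0), coords={"time": time}, dims="time")

# Downsample: take the mean of each 6-hour bin
six_hourly = hourly.resample(time="6h").mean()

print(six_hourly.sizes["time"])  # 4 bins in one day
print(six_hourly.values)         # bin means: 2.5, 8.5, 14.5, 20.5
```

Each output value is the average of six consecutive hourly values, and each bin is labeled with the timestamp at the start of the bin.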

Our original SST data has monthly resolution. Let's use `resample()` to downsample to annual frequency:

In [None]:
#Resample to an annual frequency ('YS' = year start; the older alias 'AS' is deprecated in recent pandas versions)
r = ds.tos.resample(time='YS')
r

In [None]:
#Calculate the global mean of the resampled data
annual_mean = r.mean()
annual_mean_global = annual_mean.mean(dim=['lat', 'lon'])
annual_mean_global.plot()

### Moving average

The `rolling()` method allows for a rolling window aggregation and is applied along one dimension using the name of the dimension as a key (e.g. time) and the window size as the value (e.g. 6).
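
To see what the window does at the edges of a series, here is a small sketch on a synthetic five-point array (not the SST data). With `center=True` and the default settings, positions that do not have a full window are returned as NaN; passing `min_periods=1` is one way to fill them from the values that are available.

```python
import numpy as np
import xarray as xr

# Tiny synthetic series along a "time" dimension
da = xr.DataArray(np.array([1.0, 2.0, 3.0, 4.0, 5.0]), dims="time")

# Centered 3-point moving average: the first and last points lack a full window
centered = da.rolling(time=3, center=True).mean()
print(centered.values)  # nan, 2, 3, 4, nan

# Allow partial windows at the edges by requiring only one valid value
relaxed = da.rolling(time=3, center=True, min_periods=1).mean()
print(relaxed.values)   # 1.5, 2, 3, 4, 4.5
```

The output keeps the same length as the input, which is the main difference from `resample()` and `coarsen()`.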

Let's use the `rolling()` method to compute a 6-month moving average of our SST data:

In [None]:
#Calculate the running mean
m_avg = ds.tos.rolling(time=6, center=True).mean()
m_avg

In [None]:
#Calculate the global average of the running mean
m_avg_global = m_avg.mean(dim=['lat','lon'])
m_avg_global.plot()

### Coarsening the data

The `coarsen()` method allows for block aggregation along multiple dimensions. For example, you could take a block mean for every 7 days along the time dimension and every 2 points along the x dimension.
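
The 7-day by 2-point example can be sketched on a small synthetic array (14 days by 4 points along a hypothetical `x` dimension, not the SST data). By default `coarsen()` expects each dimension length to be an exact multiple of its window; the second part of the sketch uses `boundary="trim"` to drop leftover points when it is not.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic daily field: 14 days by 4 points along "x" (values 0..55)
time = pd.date_range("2000-01-01", periods=14, freq="D")
da = xr.DataArray(
    np.arange(56.0).reshape(14, 4),
    coords={"time": time, "x": [0, 1, 2, 3]},
    dims=("time", "x"),
)

# Block mean over every 7 days and every 2 points along x
block_mean = da.coarsen(time=7, x=2).mean()
print(block_mean.shape)   # (2, 2)
print(block_mean.values)  # 12.5, 14.5 / 40.5, 42.5

# 10 days is not a multiple of 7: "trim" drops the 3 leftover days
trimmed = da.isel(time=slice(0, 10)).coarsen(time=7, boundary="trim").mean()
print(trimmed.shape)      # (1, 4)
```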

Let's use the `coarsen()` function to take a block mean for every 4 months and globally (i.e., 180 points along the latitude dimension and 360 points along the longitude dimension):

In [None]:
#Coarsen the data
coarse_data = ds.coarsen(time=4,lat=180,lon=360).mean()
coarse_data

In [None]:
coarse_data.tos.plot()

### Compare the resampling methods

Now that we've tried multiple resampling methods on different temporal resolutions, we can compare the resampled datasets to the original.
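
When overlaying the series, it helps to remember that each method places its output differently on the time axis. The sketch below uses a synthetic 24-month series (not the SST data) to show this: `resample()` with `'YS'` labels each bin at the start of the year, `coarsen()` by default labels each block with the mean of its time coordinates, and `rolling()` keeps the original timestamps but leaves NaN where the window is incomplete.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly series spanning two years (values 0..23)
time = pd.date_range("2000-01-01", periods=24, freq="MS")
monthly = xr.DataArray(np.arange(24.0), coords={"time": time}, dims="time")

annual = monthly.resample(time="YS").mean()           # 2 points, labeled at year start
coarse = monthly.coarsen(time=4).mean()               # 6 points, labeled mid-block
rolled = monthly.rolling(time=6, center=True).mean()  # 24 points, NaN at the edges

print(annual.time.values)          # start of 2000 and start of 2001
print(coarse.time.values[:2])      # timestamps inside each 4-month block
print(int(rolled.isnull().sum()))  # 5 edge values are NaN for a 6-point window
```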

In [None]:
original_global = ds.mean(dim=['lat', 'lon'])

In [None]:
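#Collapse the single remaining lat/lon block so the coarsened data is a 1-D time series
coarse_data_global = coarse_data.mean(dim=['lat', 'lon'])

#Plot all four time series on the same axes
original_global.tos.plot(size=6)
coarse_data_global.tos.plot()
m_avg_global.plot()
annual_mean_global.plot()

plt.legend(['original data (monthly)','coarsened (4 months)','moving average (6 months)', 'annually resampled (12 months)']);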

- What type of information can you obtain from each time series?
- In what scenarios would you use different temporal resolutions?
- What conclusions about the monthly, annual and decadal variability in global SST can you draw from this plot?