# Calculating Climatologies

## Overview
Calculating climatologies in Python is a common task in geoscience workflows. This notebook will cover:

- Working with [Xarray](https://docs.xarray.dev/en/stable/) and its [GroupBy](https://docs.xarray.dev/en/stable/user-guide/groupby.html) functionality
- A resource guide to point you to more detailed information depending on your use case

---

## Example Data

The dataset used in this notebook originates from the Community Earth System Model v2 (CESM2) and is retrieved from the [Pythia-datasets repository](https://github.com/ProjectPythia/pythia-datasets/tree/main).

The dataset contains 15 years of monthly mean sea surface temperatures (TOS), from January 2000 to December 2014.

In [None]:
from pythia_datasets import DATASETS
import xarray as xr

# Get data
filepath = DATASETS.fetch("CESM2_sst_data.nc")
ds = xr.open_dataset(filepath)

ds

## Calculating Anomalies

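You can use Xarray's [GroupBy](https://docs.xarray.dev/en/stable/user-guide/groupby.html) functionality to group the data by various timescales and other properties. An anomaly is then the difference between each value and the mean of the group it belongs to, for example the long-term mean of its calendar month. From the [Xarray User Guide](https://xarray.pydata.org/en/latest/user-guide/time-series.html#datetime-components):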
> xarray also supports a notion of “virtual” or “derived” coordinates for datetime components implemented by pandas, including `year`, `month`, `day`, `hour`, `minute`, `second`, `dayofyear`, `week`, `dayofweek`, `weekday` and `quarter`. For use as a derived coordinate, xarray adds `season` to the list of datetime components supported by pandas.
>
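
As a minimal sketch of how these derived coordinates behave, the example below builds a small synthetic monthly series (the names `time`, `da`, and `toy` are illustrative and not part of the CESM2 dataset) and groups it by calendar month. It assumes only `numpy`, `pandas`, and `xarray` are installed.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for real data: 2 years of monthly values 0, 1, ..., 23
time = pd.date_range("2000-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24.0), coords={"time": time}, dims="time", name="toy")

# Datetime components are available through the .dt accessor
months = da.time.dt.month    # 1..12, repeated for each year
seasons = da.time.dt.season  # "DJF", "MAM", "JJA" or "SON" for each time step

# The string "time.month" asks groupby to group on that derived coordinate
monthly_mean = da.groupby("time.month").mean()

print(months.values[:3])
print(seasons.values[:3])
# January 2000 holds 0 and January 2001 holds 12, so the January mean is 6.0
print(monthly_mean.sel(month=1).item())
```

The result of the grouped mean has a new `month` dimension of length 12 in place of `time`, which is the same shape of output you get for `tos_clim` in the cell below.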


In [None]:
from pythia_datasets import DATASETS
import xarray as xr
import matplotlib.pyplot as plt

# Get data
filepath = DATASETS.fetch("CESM2_sst_data.nc")
ds = xr.open_dataset(filepath)

# Calculate monthly anomaly
tos_monthly = ds.tos.groupby('time.month')
tos_clim = tos_monthly.mean()
tos_anom = tos_monthly - tos_clim

tos_anom

### Visualization

In [None]:
# Plot the first time slice of the calculated anomalies
tos_anom.isel(time=0).plot();

## Removing Annual Cycle

Removing the annual cycle, also known as [seasonal adjustment](https://en.wikipedia.org/wiki/Seasonal_adjustment) or deseasonalization, is often used to examine underlying trends in data with a repeating cycle.
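
To illustrate the idea on data where the answer is known, here is a small sketch using a synthetic series made of a linear trend plus a fixed annual cycle. The series and the names `trend`, `cycle`, and `series` are assumptions made for this example only; the steps mirror the grouped-mean subtraction used for the CESM2 data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic series (not the CESM2 data): 10 years of monthly values
time = pd.date_range("2000-01-01", periods=120, freq="MS")
n = np.arange(120)
trend = 0.01 * n                                 # slow upward drift
cycle = 2.0 * np.sin(2 * np.pi * (n % 12) / 12)  # identical cycle every year
series = xr.DataArray(trend + cycle, coords={"time": time}, dims="time", name="toy")

# Subtract each calendar month's long-term mean from every value in that month
grouped = series.groupby("time.month")
climatology = grouped.mean()
deseasonalized = grouped - climatology

# The spread shrinks because the repeating cycle is gone
print(series.std().item())
print(deseasonalized.std().item())
```

In this toy case the deseasonalized values still rise from year to year, while the repeating sine-shaped cycle no longer appears. By construction, the anomalies for each calendar month average to zero over the full record.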

In [None]:
from pythia_datasets import DATASETS
import xarray as xr
import matplotlib.pyplot as plt

# Get data
filepath = DATASETS.fetch("CESM2_sst_data.nc")
ds = xr.open_dataset(filepath)

# Remove the annual cycle, then average the anomalies over lat and lon
# (a simple unweighted mean; grid cells are not area-weighted here)
tos_monthly = ds.tos.groupby('time.month')
tos_clim = tos_monthly.mean()
tos_anom = tos_monthly - tos_clim
tos_anom_global = tos_anom.mean(dim=["lat", "lon"])

tos_anom_global

### Visualization

In [None]:
# Plot the global mean tos with the annual cycle removed
tos_anom_global.plot()
plt.title("Seasonally adjusted global mean TOS")
plt.ylabel("TOS anomaly (°C)");

## Calculating Long Term Means

In [None]:
from pythia_datasets import DATASETS
import xarray as xr
import matplotlib.pyplot as plt

# Get data
filepath = DATASETS.fetch("CESM2_sst_data.nc")
ds = xr.open_dataset(filepath)

# Calculate long term mean
tos_clim = ds.tos.groupby('time.month').mean()

tos_clim

### Visualization

In [None]:
# Plot an example location of the calculated long term means
tos_clim.sel(lon=310, lat=50, method="nearest").plot()
plt.ylabel("Mean TOS (°C)")
plt.xlabel("Month");

## Calculating Seasonal Means

From the [Xarray User Guide](https://xarray.pydata.org/en/latest/user-guide/time-series.html#datetime-components):
> The set of valid seasons consists of `'DJF'`, `'MAM'`, `'JJA'` and `'SON'`, labeled by the first letters of the corresponding months.

If you need to work with custom seasons, Xarray lets you [specify custom seasons via grouper and resampler objects](https://docs.xarray.dev/en/latest/user-guide/time-series.html#handling-seasons). Alternatively, the [GeoCAT-comp package](https://geocat-comp.readthedocs.io/en/stable/getting-started.html) provides [`geocat.comp.climatologies.month_to_season()`](https://geocat-comp.readthedocs.io/en/stable/user_api/generated/geocat.comp.climatologies.month_to_season.html), which can be used to create custom three-month seasonal means.
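
For a quick look at what a custom grouping does, one option is to build your own label for each time step and group on that. The sketch below does not use the grouper objects or the GeoCAT-comp function linked above; it is a plain label-mapping approach, and the season names (`NDJ`, `FMA`, `MJJ`, `ASO`), the `custom_season` coordinate name, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for real data: 2 years of monthly values 0, 1, ..., 23
time = pd.date_range("2000-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24.0), coords={"time": time}, dims="time", name="toy")

# Hypothetical custom seasons, shifted one month earlier than DJF/MAM/JJA/SON
custom_seasons = {
    "NDJ": [11, 12, 1],
    "FMA": [2, 3, 4],
    "MJJ": [5, 6, 7],
    "ASO": [8, 9, 10],
}
month_to_label = {m: label for label, months in custom_seasons.items() for m in months}

# One label per time step, stored as a named DataArray aligned with `da`
labels = xr.DataArray(
    [month_to_label[int(m)] for m in da.time.dt.month.values],
    coords={"time": time},
    dims="time",
    name="custom_season",
)

# Grouping by the label array gives one mean per custom season
custom_means = da.groupby(labels).mean()

# FMA collects the values 1, 2, 3, 13, 14, 15, whose mean is 8.0
print(custom_means.sel(custom_season="FMA").item())
```

Like `time.season`, this pools every matching month in the record into one group regardless of year, so January 2000 lands in the same `NDJ` group as November and December 2000.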

In [None]:
from pythia_datasets import DATASETS
import xarray as xr
import matplotlib.pyplot as plt

# Get data
filepath = DATASETS.fetch("CESM2_sst_data.nc")
ds = xr.open_dataset(filepath)

# Calculate seasonal means
tos_seasonal = ds.tos.groupby('time.season').mean()

tos_seasonal

### Visualization

In [None]:
# Plot the JJA time slice of the calculated seasonal means
tos_seasonal.sel(season="JJA").plot();

## Finding The Standard Deviations of Monthly Means

Calculate the standard deviation of the monthly means for each calendar month using the [`.std()`](https://docs.xarray.dev/en/latest/generated/xarray.DataArray.std.html) method.
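
One detail worth knowing is the `ddof` argument. By default, `.std()` uses `ddof=0` (the population standard deviation), which can differ noticeably from the sample standard deviation (`ddof=1`) when each group holds only a few years. The sketch below shows the difference on a small synthetic series; the data and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic series: 3 years of monthly data where every month in
# year k has the value k, so each calendar month sees the values 0, 1, 2
time = pd.date_range("2000-01-01", periods=36, freq="MS")
values = np.repeat([0.0, 1.0, 2.0], 12)
da = xr.DataArray(values, coords={"time": time}, dims="time", name="toy")

grouped = da.groupby("time.month")

# Default: population standard deviation (divide by N)
std_population = grouped.std()

# Sample standard deviation (divide by N - 1)
std_sample = grouped.std(ddof=1)

print(std_population.sel(month=1).item())  # sqrt(2/3), about 0.816
print(std_sample.sel(month=1).item())      # 1.0
```

With 15 years per calendar month, as in the dataset used here, the two estimates differ by only a few percent, but the choice is still worth stating when you report results.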

In [None]:
from pythia_datasets import DATASETS
import xarray as xr
import matplotlib.pyplot as plt

# Get data
filepath = DATASETS.fetch("CESM2_sst_data.nc")
ds = xr.open_dataset(filepath)

# Calculate the standard deviation from monthly mean data
std_mon = ds.tos.groupby('time.month').std()

std_mon

### Visualization

In [None]:
# Plot the January time slice of the calculated standard deviations
std_mon.sel(month=1).plot();

## Curated Resources

To learn more about calculating climatologies in Python, we suggest:

- The Xarray User Guide section on [GroupBy: Group and Bin Data](https://docs.xarray.dev/en/stable/user-guide/groupby.html) including notes regarding the use of Flox for improved performance
- The Xarray User Guide section on [Time Series Data](https://docs.xarray.dev/en/stable/user-guide/time-series.html)
- The Xarray User Guide section on [Dask Best Practices](https://docs.xarray.dev/en/stable/user-guide/dask.html#best-practices)
- This Climatematch Academy notebook on [Xarray Data Analysis and Climatology](https://comptools.climatematch.io/tutorials/W1D1_ClimateSystemOverview/student/W1D1_Tutorial5.html)
- This Project Pythia Foundations tutorial on [Computations and Masks with Xarray](https://foundations.projectpythia.org/core/xarray/computation-masking)
