# Climatology Tutorial

This tutorial uses the COAsT package and has two parts:

1) Climatology.make_climatology():
    Calculate a climatological mean of an input dataset at a desired output
    frequency. Output can be written straight to file.

2) Multiyear climatology:
    Calculate a climatological mean of an input dataset at a desired output
    frequency over multiple years. This also works with single-year datasets.

In [None]:
import warnings 
warnings.filterwarnings('ignore')

In [None]:
import calendar  # used later to build period labels such as "Mar-May"
import coast
import glob
import xarray as xr
import cftime
import pandas as pd

### Usage of coast.Climatology.make_climatology().

Calculates the mean over a given period of time. Different years are not distinguished (for example, with the
'month' frequency every January in the record is averaged together) unless the 'years' frequency is used.
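
To illustrate how years are pooled, here is a minimal sketch using plain xarray on synthetic data. It is not COAsT's implementation, and the `toy` dataset and its values are made up for the example; it only shows what grouping by `"time.month"` does to a two-year record.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic record: 24 monthly values (two years), purely illustrative.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
toy = xr.Dataset({"temperature": ("time", np.arange(24.0))}, coords={"time": time})

# Group by calendar month: both Januaries land in the same group, and so on.
monthly_clim = toy.groupby("time.month").mean()

# January pools the values 0.0 (year 1) and 12.0 (year 2).
january_mean = float(monthly_clim["temperature"].sel(month=1))
print(monthly_clim.sizes["month"], january_mean)
```

The result has one entry per calendar month, regardless of how many years the input spans.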

In [None]:
data_path = "/gws/nopw/j04/canari/shared/large-ensemble/priority/HIST2/1"
fn_nemo_dom = "/gws/nopw/j04/canari/users/dlrhodso/mesh_mask.nc"
config_t = "../config/example_nemo_grid_t.json"
infiles = glob.glob(f"{data_path}/OCN/yearly/*/*_votemper.nc")

In [None]:
fn_nemo_dat = xr.open_mfdataset(infiles)

In [None]:
fn_nemo_dat

In [None]:
nemo_data = coast.Gridded(fn_data=fn_nemo_dat,
                          fn_domain=fn_nemo_dom,
                          config=config_t,
                          ).dataset


In [None]:
nemo_data

Calculate the climatology for temperature as an example:

In [None]:
# Select specific data variables.
data = nemo_data[["temperature"]]

# Define frequency -- Any xarray time string: season, month, etc
climatology_frequency = "month"

In [None]:
# Calculate the climatology. With fn_out=None the result is only returned;
# pass a file path as fn_out to write it to file as well.
clim = coast.Climatology()
clim_mean = clim.make_climatology(data, climatology_frequency, fn_out=None)

Below is the structure of the returned dataset, which contains the temperature data averaged by month:

In [None]:
clim_mean  # display a summary of the data object

### Create multiyear averages for seasons

Calculates the mean over each specified period (here, a season) and groups the data by year and period.

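In [None]:
# Name of the time dimension; the cells below select, concatenate and average along it.
time_dim = "t_dim"
# Index the time dimension by the time coordinate so that .sel() can slice by date.
data[time_dim] = data['time']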

In [None]:
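# Create a list of seasons as (start_month, end_month) pairs, and the years present in the data.
# A pair with start_month > end_month, such as (12, 2), runs into the following year.
seasons = [(3, 5), (6, 9), (10, 11), (12, 2)]
data_years = list(data["time.year"].data)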


In [None]:
# Create a range of dates
date_ranges = []
for y in sorted(set(data_years)):
    y = int(y)
    for period in seasons:
        start = period[0]
        end = period[1]
        begin_date = cftime.Datetime360Day(y, start, 1, 0, 0, 0, 0, has_year_zero=True)
        if start > end:
            end_date = cftime.Datetime360Day(y + 1, end, 30, 0, 0, 0, 0, has_year_zero=True)
        else:
            end_date = cftime.Datetime360Day(y, end, 30, 0, 0, 0, 0, has_year_zero=True)
        date_ranges.append((begin_date, end_date))
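
The year wrap-around rule used in the loop above can be sketched in isolation. This is a simplified illustration with plain tuples instead of `cftime` objects; `season_bounds` is a hypothetical helper, and the default last day of 30 assumes a 360-day calendar as in the cell above.

```python
from typing import Tuple

DateTuple = Tuple[int, int, int]  # (year, month, day)


def season_bounds(year: int, start_month: int, end_month: int,
                  last_day: int = 30) -> Tuple[DateTuple, DateTuple]:
    """Hypothetical helper: first and last date of one season as plain tuples."""
    # A season such as (12, 2) starts in one year and ends in the next.
    end_year = year + 1 if start_month > end_month else year
    return (year, start_month, 1), (end_year, end_month, last_day)


# The year 1950 is an arbitrary example value.
spring = season_bounds(1950, 3, 5)
winter = season_bounds(1950, 12, 2)
print(spring)
print(winter)
```

A season that wraps past December is labelled by its start year, which matches how `year_index` is filled from `date_range[0].year` later in this tutorial.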

Separate the dataset by year-period and concatenate the pieces:

In [None]:
datasets = []
year_index = []
month_index = []
for date_range in date_ranges:
    # Select the time steps that fall inside this period.
    filtered = data.sel({time_dim: slice(date_range[0], date_range[1])})
    datasets.append(filtered)
    # Label every selected time step with its start year and period name, e.g. "Mar-May".
    n_steps = filtered.sizes[time_dim]
    year_index += [date_range[0].year] * n_steps
    month_label = f"{calendar.month_abbr[date_range[0].month]}-{calendar.month_abbr[date_range[1].month]}"
    month_index += [month_label] * n_steps

# Join the per-period selections back into one dataset along the time dimension.
filtered = xr.concat(datasets, dim=time_dim)
filtered = filtered.drop_vars(time_dim)

In [None]:
period_idx = pd.MultiIndex.from_arrays([year_index, month_index], names=("year", "period"))
filtered.coords["year_period"] = ('t_dim', period_idx)

In [None]:
# Average each variable within every (year, period) group.
clim_multiyear = xr.Dataset()
for var_name, da in filtered.data_vars.items():
    da_mean = da.groupby("year_period").mean(dim=time_dim, skipna=True)
    clim_multiyear[var_name] = da_mean

In [None]:
# Show the multiyear climatology
clim_multiyear
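
The (year, period) grouping used above can be tried on a small pandas Series. This is a sketch on made-up values rather than the xarray workflow itself; `period_label` is a hypothetical helper and the years and numbers are illustrative only.

```python
import calendar

import pandas as pd


def period_label(start_month: int, end_month: int) -> str:
    """Hypothetical helper: build a label such as 'Mar-May' from two month numbers."""
    return f"{calendar.month_abbr[start_month]}-{calendar.month_abbr[end_month]}"


# Six made-up time steps, each tagged with a start year and a period label.
values = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
year_index = [1950, 1950, 1950, 1950, 1951, 1951]
month_index = (
    [period_label(3, 5)] * 2 + [period_label(6, 9)] * 2 + [period_label(3, 5)] * 2
)

# Same construction as in the tutorial: one MultiIndex entry per time step.
period_idx = pd.MultiIndex.from_arrays([year_index, month_index], names=("year", "period"))
series = pd.Series(values, index=period_idx)

# One mean per (year, period) pair, so the same season in different years stays separate.
period_means = series.groupby(level=["year", "period"]).mean()
print(period_means)
```

Unlike grouping by month alone, the same season in two different years yields two separate means.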