# Climatology and Anomalies

It is very common in climate data analysis to look primarily at `anomalies` or departures from normal, where normal is defined by the `climatology` or seasonal cycle. 

We typically want to study aspects of the climate system other than the seasonal cycle, which is already well understood: it arises from differences in 
solar radiation associated with the tilt of the Earth's axis. 

In simple terms, no one is impressed if we can say it will be warm in the summer and cold in the winter, or that it will rain in the rainy season and be dry during the dry season.

Therefore, we typically perform our climate data analysis on anomalies by first calculating and removing the climatology.

### Let's read in some data and calculate a climatology....

#### Imports

In [None]:
import xarray as xr
import numpy as np
import cartopy.crs as ccrs
import matplotlib.pyplot as plt

### Read in monthly temperature data

In [None]:
path='/home/pdirmeye/classes/clim680_2022/GHCN_CAMS/'
file='air.mon.mean.nc'

In [None]:
ds = xr.open_dataset(path+file)
ds

In [None]:
ds.air

In [None]:
plt.pcolormesh(ds.lon,ds.lat,ds['air'][0,:,:],cmap='coolwarm')

### How to calculate the climatology for monthly averaged data

The climatology is the average value of a variable at a given location due to the seasonal cycle.

For *monthly* averaged data we typically calculate the climatology as the average value for a given month over all years.

Mathematically, let _T_ be temperature, then
_T(i,j)_ is the temperature at some point (_i_,_j_). 

If we have _N_ years of data, then we can calculate the climatology of the temperature at a point (_i_,_j_) for a given month (_m_) as: 

\begin{equation}
\overline{T_m(i,j)} = \frac{1}{N}\sum_{k=1}^NT_{m,k}(i,j)
\end{equation}
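
As a minimal sketch of the formula above, the climatology can be computed by hand with NumPy. The temperatures here are made-up values (not from the GHCN_CAMS file), and the sketch assumes the series starts in January and covers whole years, so it can be reshaped to (year, month) and averaged over the year axis.

```python
import numpy as np

n_years = 3  # N in the formula

# Hypothetical monthly temperatures (deg C): a fixed seasonal cycle plus a
# different offset each year, stored as one long time series.
seasonal_cycle = np.array([1, 3, 8, 14, 19, 24, 27, 26, 22, 15, 9, 3], dtype=float)
yearly_offset = np.array([-1.0, 0.0, 1.0])
series = (seasonal_cycle[np.newaxis, :] + yearly_offset[:, np.newaxis]).ravel()

# Reshape to (year k, month m), then average over the year axis:
# climatology[m] = (1/N) * sum over k of T[m, k]
by_year = series.reshape(n_years, 12)
climatology = by_year.mean(axis=0)

print(climatology)
```

Conceptually, this is the calculation that `groupby('time.month').mean()` performs at every grid point, without requiring the record to start in January or to contain only complete years.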

#### Pick a point with data (Washington DC 38.9072° N, 77.0369° W)

Note that because we select the single variable `air` by name before picking the grid cell, we get an `xarray.DataArray` rather than an `xarray.Dataset`.
We use `da` in the variable name to remind us of this.

In [None]:
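# Nearest grid cell to Washington DC: 360-77 converts 77˚ W to degrees east,
# and subtracting 273.15 converts Kelvin to ˚C
da_pt = ds['air'].sel(lat=39,lon=360-77,method='nearest')-273.15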

fig = plt.figure(figsize=(12,5))
plt.plot(da_pt.time,da_pt)
plt.title('Washington DC, Temperature (˚C)')

### We can calculate the climatology using `groupby`

We can use `groupby` to group over `time.month` and then apply the mean function to that grouping to get the average value for a given month over our entire grid.

This will take a little while - we are crunching through 72 years of data at 0.5˚ resolution. This is **big data**.

In [None]:
ds_climo = ds.groupby('time.month').mean(dim='time')
ds_climo

#### Look at our climatology
Plot the climatology along with the data for 1950 and data for 2019 for Washington DC

In [None]:
da_ptclimo = ds_climo['air'].sel(lat=39,lon=360-77,method='nearest') - 273.15
da_pt1950 = da_pt.sel(time=slice('1950-01-01','1950-12-01'))
da_pt2019 = da_pt.sel(time=slice('2019-01-01','2019-12-01'))

plt.plot(da_ptclimo,label='Climo')
plt.plot(da_pt1950,label='1950')
plt.plot(da_pt2019,label='2019')                    

plt.title('Washington DC, Temperature (˚C)')
plt.legend()

#### Calculate Anomalies by subtracting the climatology from the original data

\begin{equation}
T_{m,k}^{\prime}(i,j) = T_{m,k}(i,j) - \overline{T_m(i,j)}
\end{equation}

As with the climatology, this calculation crunches through the entire, very large dataset.

In [None]:
ds_anoms = ds.groupby('time.month')-ds_climo
ds_anoms
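
To see what the `groupby` subtraction does, here is a small sketch on a made-up three-year series (the names `da_demo`, `climo_demo`, and `anoms_demo` are illustrative only). Each value has the climatological mean of its own calendar month subtracted, so the anomalies for any given calendar month average to zero.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic monthly series: seasonal cycle plus a 0.5-degree step each year
time = pd.date_range('2000-01-01', periods=36, freq='MS')
month = time.month.to_numpy()
year_index = time.year.to_numpy() - 2000
values = 15.0 + 10.0 * np.cos(2 * np.pi * (month - 7) / 12) + 0.5 * year_index
da_demo = xr.DataArray(values, coords={'time': time}, dims='time', name='air')

# Climatology: one value per calendar month
climo_demo = da_demo.groupby('time.month').mean(dim='time')

# Anomalies: subtract the matching month's climatology from every time step
anoms_demo = da_demo.groupby('time.month') - climo_demo

# The seasonal cycle is gone; only the year-to-year offsets remain
print(anoms_demo.values.reshape(3, 12)[:, 0])
```

In this sketch the anomalies are -0.5, 0.0, and 0.5 for the three years, which are the yearly offsets relative to their mean.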

In [None]:
da_anomspt = ds_anoms['air'].sel(lat=39,lon=360-77,method='nearest')

fig = plt.figure(figsize=(12,5))
plt.plot(da_anomspt['time'],da_anomspt)
plt.title('Washington DC, Temperature Anomalies (K)')

Let's add a long-term running mean to our plot of anomalies so we can see trends more clearly. 
In fact, let's add two: a 12-month running mean, and a 10-year running mean...

In [None]:
da_smooth_1y = da_anomspt.rolling(time=12).mean()
da_smooth_10y = da_anomspt.rolling(time=120).mean()

fig = plt.figure(figsize=(12,5))
plt.plot(da_anomspt['time'],da_anomspt,label='Monthly',c='plum')
plt.plot(da_smooth_1y['time'],da_smooth_1y,label='1y Running Mean',c='teal')
plt.plot(da_smooth_10y['time'],da_smooth_10y,label='10y Running Mean',c='red')
plt.axhline(y=0,c='white')
plt.title('Washington DC, Temperature Anomalies (K)')
plt.legend()
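
By default `rolling` labels each window at its last time step and returns NaN until a full window is available, so the 10-year curve does not start until 119 months into the record. The NumPy sketch below mimics that behavior on made-up numbers; `trailing_running_mean` is a hypothetical helper, not part of xarray.

```python
import numpy as np

def trailing_running_mean(values, window):
    """Mean of the current value and the (window - 1) values before it.

    The first (window - 1) outputs are NaN because the window is incomplete.
    """
    values = np.asarray(values, dtype=float)
    out = np.full(values.shape, np.nan)
    if window <= values.size:
        kernel = np.ones(window) / window
        # 'valid' keeps only positions where the full window fits
        out[window - 1:] = np.convolve(values, kernel, mode='valid')
    return out

anoms_demo_np = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # made-up anomalies
smooth_demo = trailing_running_mean(anoms_demo_np, window=3)
print(smooth_demo)
```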

### Normalized Colorbars

Now that we have anomalies, we often wish to plot with a diverging colorbar centered at zero.

#### Plot with off center range and colorbar

In [None]:
clevs = np.arange(-5,11,1)
fig = plt.figure(figsize=(11,8.5))
ax = plt.axes(projection=ccrs.Robinson())
cs = ax.contourf(ds_anoms['lon'], ds_anoms['lat'][:-60], 
               ds_anoms['air'][-1,:-60,:],clevs,
               transform = ccrs.PlateCarree(),cmap='RdBu_r',
               extend='both')
ax.coastlines()
ax.gridlines()
cbar = plt.colorbar(cs,shrink=0.7,orientation='horizontal',
                    label='Surface Air Temperature Anomaly (˚C)')
plt.title(ds.attrs['title'],fontsize=16)
plt.figtext(0.5,0.28,'March 2020',ha='center',fontsize=20,fontweight='bold')

### Center the colorbar at zero

In [None]:
import matplotlib.colors as colors

In [None]:
fig = plt.figure(figsize=(11,8.5))

ax = plt.axes(projection=ccrs.Robinson())
divnorm = colors.CenteredNorm(vcenter=0)
# Passing norm=divnorm to contourf maps zero to the middle of the colormap
cs = ax.contourf(ds_anoms['lon'], ds_anoms['lat'][:-60], 
                 ds_anoms['air'][-1,:-60,:],clevs,
                 transform = ccrs.PlateCarree(),cmap='RdBu_r',
                 norm=divnorm,extend='both')
ax.coastlines()
ax.gridlines()
cbar = plt.colorbar(cs,shrink=0.7,orientation='horizontal',
                    label='Surface Air Temperature Anomaly (K)')
plt.title(ds.attrs['title'],fontsize=16)
plt.figtext(0.5,0.28,'March 2020',ha='center',fontsize=20,fontweight='bold')
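
Another way to center the colors, sketched here as an alternative rather than the method used above, is to build contour levels that are symmetric about zero. `symmetric_levels` is a hypothetical helper and the small `field` array is made up.

```python
import numpy as np

def symmetric_levels(data, step=1.0):
    """Contour levels from -L to +L in increments of `step`,
    where L is the largest absolute value rounded up to a multiple of `step`."""
    vmax = np.nanmax(np.abs(data))          # ignore missing values
    n = int(np.ceil(vmax / step))           # number of steps on each side of zero
    return np.arange(-n, n + 1) * step

field = np.array([[-3.2, 0.4], [np.nan, 7.6]])  # made-up anomalies
clevs_sym = symmetric_levels(field, step=1.0)
print(clevs_sym)
```

Passing `clevs_sym` in place of `clevs` to `contourf` with a diverging colormap such as `RdBu_r` should place zero at the middle of the color scale.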

### Calculating climatology for daily or higher frequency data

Calculating a climatology for daily or higher-frequency data is more complicated, and the choice of method is more controversial. It also raises some issues of handling large datasets, depending on how much data there is, which we will talk about in more detail in another class.

#### We can start by calculating the average over all years for each day

Data: Daily Precipitation from CPC over Continental US (CONUS)
https://kpegion.github.io/COLA-DATASETS-CATALOG/precip.V1.0.nc

In [None]:
path_daily = '/home/pdirmeye/classes/clim680_2022/CPC_precip_daily/'
files_daily = 'precip.V1.0.*.nc'

In [None]:
ds_daily = xr.open_mfdataset(path_daily+files_daily,
                             concat_dim='time',combine='nested')
ds_daily

In [None]:
ds_daily_climo = ds_daily.groupby('time.dayofyear').mean()
ds_daily_climo

In [None]:
daily_pt = ds_daily['precip'].sel(lat=39,lon=360-97,method='nearest')
daily_ptclimo = ds_daily_climo['precip'].sel(lat=39,lon=360-97,method='nearest')
daily_pt1948 = daily_pt.sel(time=slice('1948-01-01','1948-12-31'))

plt.plot(daily_pt1948)
plt.plot(daily_ptclimo)
plt.legend(['1948','Climo'])
plt.title('CPC Precipitation')

#### This version of climatology will be very noisy. 

This means it varies a lot from day to day. What we really want from a climatology is to identify the seasonal cycle, that is, the wet and dry parts of the year. Typically we would therefore smooth this daily climatology in some way, or try to identify a cyclical part of the data with a seasonal timescale. Here, I will demonstrate smoothing.

In [None]:
daily_climo_smooth = ds_daily_climo.rolling(dayofyear=30,center=True).mean()

Here I get an error telling me that my data is too large for the computer to smooth all at once, and it tells me what I can do about it. It is trying to `chunk` my data into pieces so that it can work on separate parts of it instead of all of it at the same time. It is using something called `dask` behind the scenes to handle my data in parallel. We will talk more about this next week; for now I will just do what it says.

In [None]:
ds_daily_climo.chunk?

In [None]:
ds_daily_climo = ds_daily_climo.chunk({'dayofyear':-1})
daily_climo_smooth = ds_daily_climo.rolling(dayofyear=30,center=True).mean()

In [None]:
ds_smoothpt = daily_climo_smooth['precip'].sel(lat=39,lon=360-97,method='nearest')

In [None]:
plt.plot(daily_pt1948)
plt.plot(daily_ptclimo)
plt.plot(ds_smoothpt)
plt.legend(['1948','Climo','Smooth'])
plt.title('CPC Precipitation')
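
A centered rolling mean leaves missing values at the start and end of the year, because the windows there are incomplete. Since the seasonal cycle is periodic, one option (an alternative sketched here, not the method used above) is to wrap the data around the year before averaging. `periodic_running_mean` is a hypothetical helper and the noisy series is synthetic.

```python
import numpy as np

def periodic_running_mean(values, window):
    """Roughly centered running mean that treats `values` as cyclic,
    so late December is averaged together with early January."""
    values = np.asarray(values, dtype=float)
    left = window // 2
    right = window - 1 - left
    # Pad each end with values wrapped around from the other end
    padded = np.pad(values, (left, right), mode='wrap')
    kernel = np.ones(window) / window
    # 'valid' returns exactly one output per original day
    return np.convolve(padded, kernel, mode='valid')

# Synthetic noisy daily climatology (366 days, arbitrary units)
rng = np.random.default_rng(0)
days = np.arange(366)
noisy_climo = 2.0 + np.sin(2 * np.pi * days / 366) + 0.5 * rng.standard_normal(366)

smooth_climo = periodic_running_mean(noisy_climo, window=30)
print(np.isnan(smooth_climo).sum())
```

The result has a value for every day of the year, and the annual mean is unchanged by the smoothing.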