### Calculating a Composite

What is a composite? It is the mean of a field conditioned on the value of another field.
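
As a minimal sketch of that definition, here is a composite computed on a handful of made-up numbers (not the course data). The names `index` and `field` are illustrative only: we average `field` over just the samples where `index` meets a condition.

```python
import numpy as np

# Synthetic (hypothetical) data: an index and a field sampled at the same times
index = np.array([1.5, -0.2, 0.3, -1.4, 1.1, -1.2])
field = np.array([4.0, 0.5, -0.5, -3.0, 2.0, -1.0])

# Composite = mean of the field over the samples that satisfy the condition
comp_high = field[index >= 1].mean()    # uses samples 0 and 4
comp_low = field[index <= -1].mean()    # uses samples 3 and 5

print(comp_high, comp_low)
```

The rest of this notebook does the same thing, with the Nino3.4 index as the condition and a precipitation map as the field.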

_Example:_

We have heard that ENSO changes atmospheric circulation and impacts precipitation and temperature globally.  We can explore this using composites. 

_Specific statement of the problem:_ 

What are the precipitation anomalies during El Nino vs. Neutral vs. La Nina and how are they different?

In [None]:
import xarray as xr
import matplotlib.pyplot as plt
import numpy as np

import cartopy.crs as ccrs
import cartopy.mpl.ticker as cticker
from cartopy.util import add_cyclic_point

#### ENSO

We have an index called Nino3.4 that quantifies ENSO.  When it is large and positive, we say there is an El Nino.  When it is large and negative, we say there is a La Nina.  In between, we say it is neutral.

Nino3.4 is calculated as the area-averaged SST anomalies in a particular region in the Tropical Pacific.  We calculated this previously using the NOAA OISST data by subsetting, grouping (`groupby`), and aggregating. I wrote this data to a file:

`/scratch/kpegion/nino34_1982-2019.oisstv2_anoms.nc`

In [None]:
file_nino34 = '/home/lortizur/clim680//nino34_1982-2019.oisstv2_anoms.nc'
ds_nino34 = xr.open_dataset(file_nino34)
ds_nino34

This data spans 1982-2019.

We can plot it as a time series.

In [None]:
plt.plot(ds_nino34['time'],ds_nino34['sst']) ;

### Defining El Nino, La Nina, and Neutral

Nino3.4 >= 1 -> El Nino

Nino3.4 <= -1 -> La Nina

Nino3.4 > -1 and Nino3.4 < 1 -> Neutral
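
These thresholds can also be written as a small labeling function. This is only a sketch: `classify_enso` and its `threshold` argument are hypothetical names, not part of the notebook, and the input values are made up to show that the boundaries at +1 and -1 are inclusive.

```python
import numpy as np

def classify_enso(nino34, threshold=1.0):
    """Label each index value as 'El Nino', 'La Nina', or 'Neutral'."""
    nino34 = np.asarray(nino34, dtype=float)
    # Conditions are checked in order; anything matching neither is Neutral.
    # Note that a NaN matches neither condition, so it would be labeled Neutral.
    conditions = [nino34 >= threshold, nino34 <= -threshold]
    return np.select(conditions, ['El Nino', 'La Nina'], default='Neutral')

phases = classify_enso([1.0, 0.99, -1.0, -0.5, 2.3])
print(phases)
```

The notebook itself uses `where` rather than labels, because keeping the index values (with NaN elsewhere) makes it easy to plot and count each phase.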

In [None]:
elnino = ds_nino34['sst'].where(ds_nino34['sst']>=1) # nans where false and unaltered where true if only first parameter given
lanina = ds_nino34['sst'].where(ds_nino34['sst']<=-1)
neutral = ds_nino34['sst'].where((ds_nino34['sst']>-1) & (ds_nino34['sst']<1))

In [None]:
plt.plot(ds_nino34['time'],elnino,'r^')
plt.plot(ds_nino34['time'],lanina,'bv')
plt.plot(ds_nino34['time'],neutral,'go') ;

#### How many months do we have with El Nino, La Nina, and Neutral?

In [None]:
print('El Nino: ',elnino.count(dim='time').values)
print('Neutral: ',neutral.count(dim='time').values)
print('La Nina: ',lanina.count(dim='time').values)

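# Store the counts in the same order as the composites plotted later
# (El Nino, La Nina, Neutral) so that counts[i] matches labels[i]
counts=[elnino.count(dim='time').values,
        lanina.count(dim='time').values,
        neutral.count(dim='time').values]
print(counts)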

#### Let's get a little fancier

In [None]:
plt.plot(ds_nino34['time'],ds_nino34['sst'],'k')
plt.fill_between(ds_nino34['time'].values,ds_nino34['sst'],color='lightgreen')
plt.fill_between(ds_nino34['time'].values,elnino,y2=1.0,color='red')
plt.fill_between(ds_nino34['time'].values,lanina,y2=-1.0,color='blue')
plt.axhline(0,color='black',linewidth=0.5)
plt.axhline(1,color='black',linewidth=0.5,linestyle='dotted')
plt.axhline(-1,color='black',linewidth=0.5,linestyle='dotted') ;

### Precipitation Data

We will use the Global Precipitation Climatology Project (GPCP) Monthly Precipitation Data located in:

`/shared/obs/gridded/GPCP/monthly/precip.mon.mean.nc`

In [None]:
file='/home/lortizur/clim680//GPCP_precip.mon.mean.nc'
ds_precip = xr.open_dataset(file)
ds_precip

This data goes from 1979-2020.  Let's select the same times as the nino34 data.

In [None]:
da_precip = ds_precip.precip.sel(time=slice(ds_nino34['time'][0],ds_nino34['time'][-1]))
da_precip

### We need to make anomalies of our precipitation

In [None]:
da_climo = da_precip.groupby('time.month').mean()
da_anoms = da_precip.groupby('time.month')-da_climo
da_anoms
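
To see what the `groupby('time.month')` step above is doing, here is a rough NumPy-only sketch on a synthetic series. The series, its seasonal cycle, and the yearly offsets are all invented for illustration. The climatology is the mean for each calendar month, and the anomaly is each value minus the climatology of its own month.

```python
import numpy as np

# Hypothetical monthly series: 3 years x 12 months
months = np.tile(np.arange(12), 3)                 # calendar month of each sample (0-11)
seasonal = 5.0 + 2.0 * np.sin(2 * np.pi * months / 12)
offset = np.repeat([-1.0, 0.0, 1.0], 12)           # made-up year-to-year signal
precip = seasonal + offset

# Climatology: mean over all years for each calendar month
climo = precip.reshape(3, 12).mean(axis=0)

# Anomaly: each value minus the climatology of its own calendar month
anoms = precip - climo[months]

# Removing the climatology leaves only the year-to-year signal
print(np.allclose(anoms, offset))
```

By construction, the anomalies average to zero for every calendar month, so the seasonal cycle cannot leak into the composites.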

### Now we can select the dates that match El Nino, La Nina, and Neutral

In [None]:
elnino_precip = da_anoms.sel(time=elnino.dropna(dim='time')['time']).mean(dim='time')
lanina_precip = da_anoms.sel(time=lanina.dropna(dim='time')['time']).mean(dim='time')
neutral_precip = da_anoms.sel(time=neutral.dropna(dim='time')['time']).mean(dim='time')

comp_precip = [elnino_precip,lanina_precip,neutral_precip]
comp_precip

In [None]:
labels=['El Nino','La Nina', 'Neutral']
clevs = np.arange(-2.0,2.1,0.25)

# Define the figure and each axis for the 3 rows and 1 column
fig, axs = plt.subplots(nrows=3,ncols=1,
                        subplot_kw={'projection': ccrs.PlateCarree()},
                        figsize=(8.5,11))

# axs is an array of `GeoAxes`.  With ncols=1 it is already 1-D,
# so flatten() simply keeps it as a 1-D array (just 3 rows)
axs = axs.flatten()

#Loop over all of the ENSO phases and plot
for i,enso in enumerate(comp_precip):

        # Select the composite for this ENSO phase
        data = comp_precip[i]

        # Add the cyclic point
        data,lons = add_cyclic_point(data,coord=comp_precip[i]['lon'])

        # Contour plot
        cs=axs[i].contourf(lons,comp_precip[i]['lat'],data,clevs,
                          transform = ccrs.PlateCarree(),
                          cmap='BrBG',extend='both')

        # Longitude labels
        axs[i].set_xticks(np.arange(-180,181,60), crs=ccrs.PlateCarree())
        lon_formatter = cticker.LongitudeFormatter()
        axs[i].xaxis.set_major_formatter(lon_formatter)

        # Latitude labels
        axs[i].set_yticks(np.arange(-90,91,30), crs=ccrs.PlateCarree())
        lat_formatter = cticker.LatitudeFormatter()
        axs[i].yaxis.set_major_formatter(lat_formatter)

        
        # Title each subplot with the name of the ENSO phase and its number of months
        axs[i].set_title(labels[i]+' ('+str(counts[i])+')')

        # Draw the coastlines for each subplot
        axs[i].coastlines()
        
# Adjust the location of the subplots 
# on the page to make room for the colorbar
fig.subplots_adjust(bottom=0.25, top=0.9, left=0.05, right=0.95,
                    wspace=0.1, hspace=0.5)

# Add a colorbar axis at the bottom of the graph
cbar_ax = fig.add_axes([0.25, 0.18, 0.5, 0.012])

# Draw the colorbar
cbar = fig.colorbar(cs,cax=cbar_ax,orientation='horizontal',label='mm/day')

# Add a big title at the top
plt.suptitle('Composite Precipitation Anomalies during ENSO') ;

### Checking and understanding our Composites
 
Pick some points and make a scatter plot with the Nino34 index

* High Composite Value (EQ, 120W)
* Medium Composite Value (30N, 90W)
* Low Composite Value (40N, 90W)

__High Composite__

In [None]:
pt = da_anoms.sel(lat=0,lon=360-120,method='nearest')
plt.scatter(pt,ds_nino34['sst'],s=np.abs(2*pt)+1)
plt.xlim([-5,15])
plt.ylim([-3,3])
plt.xlabel(r'Precip Anoms at (EQ,120W) [$mm\;day^{-1}$]')
plt.ylabel('Niño34 Index')

m,b = np.linalg.lstsq(np.vstack([pt.values, np.ones(len(pt.values))]).T,ds_nino34['sst'].values,rcond=None)[0]
plt.plot(pt, m*pt + b, 'r', label='Fitted line')

plt.axvline(0,color='darkturquoise',linewidth=0.5)
plt.axhline(1,color='black',linewidth=0.5,linestyle='dotted')
plt.axhline(0,color='black',linewidth=0.5)
plt.axhline(-1,color='black',linewidth=0.5,linestyle='dotted') ;

In this case, the composite identifies that high values of Nino34 are associated 
with high values of precipitation anomalies in this region, especially for positive values.  

The red line is the _linear regression_ of the Niño34 index on the precipitation anomalies.
We will learn more about linear regression later in the course.

Compositing does not make any assumptions about how our two datasets are related. 
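
A small synthetic illustration of why this matters, assuming a made-up field that responds to the square of the index (so it is high for both large positive and large negative values). A straight-line fit sees almost nothing, while the composites still pick up the signal. All names and numbers here are hypothetical.

```python
import numpy as np

# Evenly spaced stand-in for an ENSO-like index
index = np.linspace(-2.5, 2.5, 501)

# Symmetric, nonlinear response with its mean removed (an "anomaly")
field = index**2 - np.mean(index**2)

# A straight-line fit finds essentially no relationship
slope, intercept = np.polyfit(index, field, 1)

# Composites using the same thresholds as this notebook
comp_pos = field[index >= 1].mean()
comp_neg = field[index <= -1].mean()
comp_neu = field[np.abs(index) < 1].mean()

print(slope, comp_pos, comp_neg, comp_neu)
```

Here both the positive and negative composites are above zero and the neutral composite is below zero, even though the fitted slope is essentially zero.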

__Medium Composite__

In [None]:
pt=da_anoms.sel(lat=30,lon=360-90,method='nearest')
plt.scatter(pt,ds_nino34['sst'],s=np.abs(2*pt)+1)
plt.xlim([-5,15])
plt.ylim([-3,3])
plt.xlabel(r'Precip Anoms at (30N,90W) [$mm\;day^{-1}$]')
plt.ylabel('Niño34 Index')

m,b = np.linalg.lstsq(np.vstack([pt.values, np.ones(len(pt.values))]).T,ds_nino34['sst'].values,rcond=None)[0]
plt.plot(pt, m*pt + b, 'r', label='Fitted line')

plt.axvline(0,color='darkturquoise',linewidth=0.5)
plt.axhline(1,color='black',linewidth=0.5,linestyle='dotted')
plt.axhline(0,color='black',linewidth=0.5)
plt.axhline(-1,color='black',linewidth=0.5,linestyle='dotted') ;

__Low Composite__

In [None]:
pt=da_anoms.sel(lat=40,lon=360-90,method='nearest')
plt.scatter(pt,ds_nino34['sst'],s=np.abs(2*pt)+1)
plt.xlim([-5,15])
plt.ylim([-3,3])
plt.xlabel(r'Precip Anoms at (40N,90W) [$mm\;day^{-1}$]')
plt.ylabel('Niño34 Index')

m,b = np.linalg.lstsq(np.vstack([pt.values, np.ones(len(pt.values))]).T,ds_nino34['sst'].values,rcond=None)[0]
plt.plot(pt, m*pt + b, 'r', label='Fitted line')

plt.axvline(0,color='darkturquoise',linewidth=0.5)
plt.axhline(1,color='black',linewidth=0.5,linestyle='dotted')
plt.axhline(0,color='black',linewidth=0.5)
plt.axhline(-1,color='black',linewidth=0.5,linestyle='dotted') ;

You can see that the fit between the precipitation anomalies and the Niño34 index
becomes weaker as we progress from point to point. The line is flatter and the scatter appears more uniform and random.
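
One way to put a number on "weaker" is the Pearson correlation coefficient. The sketch below uses synthetic data only: `strong_pt` and `weak_pt` are hypothetical stand-ins for a grid point tied closely to the index and one that is mostly noise, not values from the GPCP data.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic index with the same length as 38 years of monthly data
nino34 = rng.normal(0.0, 1.0, size=456)

# Hypothetical anomalies: one point tied closely to the index, one mostly noise
strong_pt = 2.0 * nino34 + rng.normal(0.0, 0.5, size=456)
weak_pt = 0.1 * nino34 + rng.normal(0.0, 1.0, size=456)

# Pearson correlation between each point and the index
r_strong = np.corrcoef(strong_pt, nino34)[0, 1]
r_weak = np.corrcoef(weak_pt, nino34)[0, 1]

print(r_strong, r_weak)
```

With the real data, the same `np.corrcoef` call could be applied to `pt.values` and `ds_nino34['sst'].values` for each of the three points.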


### Interpreting a composite

* What kind of questions can a composite answer?
* Why should we be careful about comparing El Nino or La Nina with Neutral?
* How could we make this comparison better?
* In what situation might a composite be misleading?

### Summary & Key Points

* A composite is a mean made based on a certain condition.
* We can make composites using xarray's `where` method (`DataArray.where`).
* A composite does not make any assumptions about how two datasets are related.
* We need to be aware of outliers, variability, and sample size when making composites.
* There remains variability that is not considered in a composite.
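
As a rough sketch of the sample-size point, we could report the number of samples and a standard error alongside each composite mean. `composite_stats` is a hypothetical helper, not part of the notebook, and the standard error below assumes independent samples. Monthly ENSO values are correlated from month to month, so it likely understates the true uncertainty.

```python
import numpy as np

def composite_stats(values):
    """Return the mean, sample count, and standard error of a 1-D sample, ignoring NaNs."""
    values = np.asarray(values, dtype=float)
    values = values[~np.isnan(values)]           # drop months outside the composite
    n = values.size
    mean = values.mean()
    stderr = values.std(ddof=1) / np.sqrt(n)     # assumes independent samples
    return mean, n, stderr

mean, n, stderr = composite_stats([2.0, 4.0, np.nan, 6.0])
print(mean, n, stderr)
```

A composite built from only a few months, or one dominated by a single extreme month, will have a large standard error relative to its mean.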
