# Getting started with `xarray`

The El Nino dataset is provided as netCDF files, which can be read in Python using modules such as `xarray` and `netCDF4-python`. Here, we quickly go over how to use `xarray` to access these data, to get you started with the hackathon.

In [None]:
%matplotlib inline
import xarray as xr
import matplotlib.pyplot as plt
import matplotlib.patches as patches

As an example, we will work with the sea surface temperature (SST) anomaly data from the El Nino dataset.

NetCDF files can be read in `xarray` using `xarray.open_dataset`, which returns an `xarray.Dataset` object. These are similar to `pandas.DataFrame` objects as we shall see.

In [None]:
sst_dataset = xr.open_dataset('../datasets/elnino/cci_sst_anomalies_1981_2018.nc')
sst_dataset
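
To make the comparison with `pandas` concrete, here is a minimal sketch on a made-up dataset. The variable and coordinate names mirror the ones used in this notebook, but the values and grid are invented, so it runs without the hackathon files.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the real file: 3 months on a 2 x 2 grid.
# Names follow this notebook; the values are made up.
time = pd.date_range("1981-09-01", periods=3, freq="MS")
toy = xr.Dataset(
    data_vars={
        "sst_anomaly": (("time", "lat", "lon"), np.arange(12.0).reshape(3, 2, 2))
    },
    coords={"time": time, "lat": [-2.5, 2.5], "lon": [190.0, 195.0]},
)

# Flatten to a pandas DataFrame: the coordinates become a MultiIndex
# and each data variable becomes a column.
toy_df = toy.to_dataframe()
print(toy_df.head())
```

Going the other way, `xarray.Dataset.from_dataframe` rebuilds a dataset from a DataFrame with a suitable index.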

It is also possible to read multiple files at once using `xarray.open_mfdataset` (this requires `dask` to be installed). To see how this works, let's also take the sea surface salinity (SSS) data from the El Nino dataset and open it together with the sea surface temperature data using `open_mfdataset`.

In [None]:
# Open SST and SSS together into a single xarray Dataset
mfdataset = xr.open_mfdataset(['../datasets/elnino/cci_sss_anomalies_2010_2019.nc', '../datasets/elnino/cci_sst_anomalies_1981_2018.nc'])
mfdataset
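
The two files cover different periods (1981 to 2018 for SST, 2010 to 2019 for SSS). As a rough in-memory analogue of what combining them looks like, the sketch below merges two made-up datasets with an outer join on time. It assumes both fields share the same lat/lon grid, which may not match the real files exactly; `toy_field` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
import xarray as xr

def toy_field(name, start, periods):
    """Build a tiny single-variable Dataset on a shared 2 x 2 grid (made-up values)."""
    time = pd.date_range(start, periods=periods, freq="MS")
    data = np.ones((periods, 2, 2))
    return xr.Dataset(
        {name: (("time", "lat", "lon"), data)},
        coords={"time": time, "lat": [-2.5, 2.5], "lon": [190.0, 195.0]},
    )

sst = toy_field("sst_anomaly", "2009-11-01", periods=4)  # Nov 2009 to Feb 2010
sss = toy_field("sss_anomaly", "2010-01-01", periods=4)  # Jan 2010 to Apr 2010

# Outer join on time: the merged axis is the union of both time ranges,
# and each variable is NaN where its source had no data.
merged = xr.merge([sst, sss], join="outer")
n_missing_sss = int(merged["sss_anomaly"].isel(lat=0, lon=0).isnull().sum())
print(merged.sizes["time"], n_missing_sss)
```

The practical consequence is that a variable taken from the combined dataset can contain NaNs outside its own period, so `dropna("time", how="all")` is often useful before plotting.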

Two key properties of an xarray dataset are its *coordinates* and *data variables*. The data variables are the fields stored in the dataset, such as the SST and SSS anomalies, much like the columns of a pandas dataframe. The coordinates are the multi-dimensional indices at which we can access the values of these fields, such as latitude, longitude and time.

![xarray image](../images/xarray.png)
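
The same structure can be inspected programmatically. The sketch below uses a made-up dataset (names as in this notebook, values invented) and lists its data variables, coordinates and dimension sizes.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2010-01-01", periods=2, freq="MS")
shape = (2, 3, 4)  # (time, lat, lon)
toy = xr.Dataset(
    {
        "sst_anomaly": (("time", "lat", "lon"), np.zeros(shape)),
        "sss_anomaly": (("time", "lat", "lon"), np.ones(shape)),
    },
    coords={
        "time": time,
        "lat": [-5.0, 0.0, 5.0],
        "lon": [190.0, 200.0, 210.0, 220.0],
    },
)

print(list(toy.data_vars))  # names of the stored fields
print(list(toy.coords))     # names of the index axes
print(dict(toy.sizes))      # length of each dimension
```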

Accessing the data variables in a dataset is similar to how we access columns in a pandas dataframe.

In [None]:
sst_anomaly = mfdataset['sst_anomaly']
sst_anomaly

The result is an `xarray.DataArray`: a multidimensional array that is indexed by the `coordinates` of the dataset. To select values at particular coordinates, we use the `sel` or `isel` methods, which are analogous to `loc` and `iloc` in `pandas`. The only difference between them is that `sel` selects by coordinate label (for example a date), while `isel` selects by integer position.

In [None]:
# Access SST anomaly field at the first date it was measured
sst_anomaly.isel(time=0) # Equivalent to `sst_anomaly.sel(time='1981-09-01')`
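
The difference between the two methods is easiest to see on a small made-up array. This is a sketch with invented values; only `isel`, `sel`, `method="nearest"` and label slices are used.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("1981-09-01", periods=4, freq="MS")
da = xr.DataArray(
    np.arange(4 * 3, dtype=float).reshape(4, 3),
    dims=("time", "lon"),
    coords={"time": time, "lon": [190.0, 200.0, 210.0]},
    name="sst_anomaly",
)

# By integer position: first time step, second longitude
by_position = da.isel(time=0, lon=1)

# By coordinate label: the same point, addressed by its coordinate values
by_label = da.sel(time="1981-09-01", lon=200.0)

# Labels that are not on the grid can be snapped to the closest grid point
nearest = da.sel(lon=203.0, method="nearest")

# Label-based slices include both end points
window = da.sel(time=slice("1981-10", "1981-11"))
```

Note that, unlike Python's positional slices, label slices in `sel` include the end label, so `window` holds both October and November.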

Below, we plot the monthly mean SST anomaly field for `2015-10` and `2013-01`, when there was a strong El Nino event and no El Nino event respectively, as well as for `2016-01`. We also draw a box around the Nino 3.4 region ($5^\circ S$ to $5^\circ N$, $190^\circ E$ to $240^\circ E$) to highlight the difference.

In [None]:
def plot_sst(date, title):
    """
    Args:
        date: string of form 'yyyy-mm'
        title: string
    """
    fig = plt.figure(figsize=(10, 5))
    ax = fig.gca()
    
    # Plot monthly mean SST anomaly field
    sst_anomaly.sel(time=date).mean('time').plot(ax=ax, cmap='RdBu_r', vmin = -3, vmax = 3)
    
    # Put box around Nino 3.4 region
    rect = patches.Rectangle((190, -5), 50, 10, linewidth=2, edgecolor='lime', facecolor='none')
    ax.add_patch(rect)
    
    plt.title(title, fontsize=14)
    plt.xlabel('longitude', fontsize=12)
    plt.ylabel('latitude', fontsize=12)
    plt.show()
    
date_1, date_2 = '2015-10', '2013-01'
plot_sst(date_1, f'Sea Surface Temperature Anomaly ({date_1})')
plot_sst(date_2, f'Sea Surface Temperature Anomaly ({date_2})')
date_3 = '2016-01'
plot_sst(date_3, f'Sea Surface Temperature Anomaly ({date_3})')

We can clearly see a difference when there is an El Nino event vs when there is no El Nino event!

It is also possible to slice the field to restrict the area to just Nino 3.4 instead of the whole globe.

In [None]:
# Restrict area to Nino 3.4
nino34 = sst_anomaly.sel(lat=slice(-5,5), lon=slice(190,240))

# Plot SST anomaly in Nino 3.4 at the two dates
fig = plt.figure(figsize=(17, 1.5))
ax1 = plt.subplot(1, 2, 1)
ax2 = plt.subplot(1, 2, 2)
nino34.sel(time='2015-10').mean('time').plot(ax=ax1, cmap='RdBu_r', vmin = -3, vmax = 3)
nino34.sel(time='2013-01').mean('time').plot(ax=ax2, cmap='RdBu_r', vmin = -3, vmax = 3)
ax1.set_title('2015-10', fontsize=14)
ax1.set_xlabel('longitude', fontsize=12)
ax1.set_ylabel('latitude', fontsize=12)
ax2.set_title('2013-01', fontsize=14)
ax2.set_xlabel('longitude', fontsize=12)
ax2.set_ylabel('latitude', fontsize=12)
plt.show()
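
One thing to watch with label slices: `slice(-5, 5)` only works when the latitude coordinate is stored in ascending order. The sketch below shows a hypothetical helper, `nino34_box`, that picks the slice direction from the data. It assumes coordinates named `lat` and `lon`, with longitudes in the 0 to 360 convention as in this notebook; the grid is made up.

```python
import numpy as np
import xarray as xr

def nino34_box(field):
    """Select 5S-5N, 190E-240E whether lat is stored ascending or descending."""
    lat = field["lat"].values
    lat_slice = slice(-5, 5) if lat[0] < lat[-1] else slice(5, -5)
    return field.sel(lat=lat_slice, lon=slice(190, 240))

lat = np.arange(-10.0, 10.1, 2.5)    # -10, -7.5, ..., 10
lon = np.arange(180.0, 250.1, 10.0)  # 180, 190, ..., 250
field = xr.DataArray(
    np.zeros((lat.size, lon.size)),
    dims=("lat", "lon"),
    coords={"lat": lat, "lon": lon},
)
flipped = field.sortby("lat", ascending=False)  # same data, lat stored north to south

print(dict(nino34_box(field).sizes))
print(dict(nino34_box(flipped).sizes))

# A fixed slice(-5, 5) on the descending coordinate selects nothing along lat
naive = flipped.sel(lat=slice(-5, 5))
print(dict(naive.sizes))
```

If a selection unexpectedly comes back empty, checking the order of the `lat` values is a good first step.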

The Nino 3.4 index is defined as the 5-month rolling average of the SST anomaly averaged over the Nino 3.4 region, which we compute below. An El Nino (La Nina) event is characterised by the Nino 3.4 index staying above $+0.4^\circ C$ (below $-0.4^\circ C$) for a period of $6$ months or longer.

In [None]:
%matplotlib widget
# Compute Nino 3.4 index
nino34_timeseries = nino34.mean('lat').mean('lon')
nino34_index = nino34_timeseries.rolling(time=5).mean().dropna("time")

# Plot Nino 3.4 index timeseries
fig = plt.figure(figsize=(8, 4))
ax = fig.gca()
nino34_index.plot(ax=ax)

# Plot threshold lines
start_time, end_time = nino34_index.get_index('time')[0], nino34_index.get_index('time')[-1]
plt.hlines(0.4, start_time, end_time, colors = 'black', linestyles = 'dashed')
plt.hlines(0, start_time, end_time, colors = 'black')
plt.hlines(-0.4, start_time, end_time, colors = 'black', linestyles = 'dashed')
plt.xlabel('date', fontsize=12)
plt.ylabel('Niño-3.4 index', fontsize=12)
plt.show()
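
The event criterion can also be applied programmatically rather than read off the plot. The sketch below is one possible reading of the definition: it looks for runs of at least 6 consecutive months in which the index is strictly above $+0.4$ or strictly below $-0.4$. The helper `find_events` and the toy series are hypothetical; on the real data you would pass in `nino34_index`.

```python
import numpy as np
import pandas as pd
import xarray as xr

def find_events(index, threshold=0.4, min_months=6):
    """Return (start, end, kind) for each run of at least `min_months`
    consecutive months above +threshold or below -threshold."""
    times = pd.DatetimeIndex(index["time"].values)
    values = index.values
    # +1 above the threshold, -1 below minus the threshold, 0 otherwise
    state = np.where(values > threshold, 1, np.where(values < -threshold, -1, 0))
    events, start = [], 0
    for i in range(1, len(state) + 1):
        # Close the current run when the state changes or the series ends
        if i == len(state) or state[i] != state[start]:
            if state[start] != 0 and i - start >= min_months:
                kind = "El Nino" if state[start] == 1 else "La Nina"
                events.append((times[start], times[i - 1], kind))
            start = i
    return events

# Made-up monthly index: 7 warm months, a neutral gap, then 4 cold months
time = pd.date_range("2000-01-01", periods=16, freq="MS")
values = [0.0, 0.1] + [0.8] * 7 + [0.0, 0.0, 0.0] + [-0.9] * 4
toy_index = xr.DataArray(values, dims="time", coords={"time": time})

events = find_events(toy_index)
for start, end, kind in events:
    print(kind, start.strftime("%Y-%m"), "to", end.strftime("%Y-%m"))
```

On the toy series only the 7-month warm run qualifies; the 4-month cold run is too short to count as a La Nina event.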

In [None]:
sss_anomaly = mfdataset['sss_anomaly']
sss_anomaly

In [None]:
%matplotlib inline
def plot_sss(date, title):
    """
    Args:
        date: string of form 'yyyy-mm'
        title: string
    """
    fig = plt.figure(figsize=(10, 5))
    ax = fig.gca()
    
    # Plot monthly mean SSS anomaly field
    sss_anomaly.sel(time=date).mean('time').plot(ax=ax, cmap='RdBu_r', vmin = -3, vmax = 3)
    
    # Put box around Nino 3.4 region
    rect = patches.Rectangle((190, -5), 50, 10, linewidth=2, edgecolor='lime', facecolor='none')
    ax.add_patch(rect)
    
    plt.title(title, fontsize=14)
    plt.xlabel('longitude', fontsize=12)
    plt.ylabel('latitude', fontsize=12)
    plt.show()
    
date_1, date_2 = '2015-10', '2013-01'
plot_sss(date_1, f'Sea Surface Salinity Anomaly ({date_1})')
plot_sss(date_2, f'Sea Surface Salinity Anomaly ({date_2})')
date_3 = '2016-01'
plot_sss(date_3, f'Sea Surface Salinity Anomaly ({date_3})')

In [None]:
# Restrict area to Nino 3.4
nino34_sss = sss_anomaly.sel(lat=slice(-5,5), lon=slice(190,240))

# Plot SSS anomaly in Nino 3.4 at the two dates
fig = plt.figure(figsize=(17, 1.5))
ax1 = plt.subplot(1, 2, 1)
ax2 = plt.subplot(1, 2, 2)
nino34_sss.sel(time='2015-10').mean('time').plot(ax=ax1, cmap='RdBu_r', vmin = -3, vmax = 3)
nino34_sss.sel(time='2013-01').mean('time').plot(ax=ax2, cmap='RdBu_r', vmin = -3, vmax = 3)
ax1.set_title('2015-10', fontsize=14)
ax1.set_xlabel('longitude', fontsize=12)
ax1.set_ylabel('latitude', fontsize=12)
ax2.set_title('2013-01', fontsize=14)
ax2.set_xlabel('longitude', fontsize=12)
ax2.set_ylabel('latitude', fontsize=12)
plt.show()

In [None]:
%matplotlib widget
# Compute the area-mean SSS anomaly over Nino 3.4 (no rolling mean is applied here)
nino34_sss_timeseries = nino34_sss.mean('lat').mean('lon')
# nino34_sss_index = nino34_sss_timeseries.dropna("time").mean("time")

# Plot Nino 3.4 mean SSS anomaly timeseries
fig = plt.figure(figsize=(8, 4))
ax = fig.gca()
nino34_sss_timeseries.plot(ax=ax)

# Plot threshold lines
start_time, end_time = nino34_sss_timeseries.get_index('time')[0], nino34_sss_timeseries.get_index('time')[-1]
plt.hlines(0.4, start_time, end_time, colors = 'black', linestyles = 'dashed')
plt.hlines(0, start_time, end_time, colors = 'black')
plt.hlines(-0.4, start_time, end_time, colors = 'black', linestyles = 'dashed')
plt.xlabel('date', fontsize=12)
plt.ylabel('Niño-3.4 mean SSS anomaly', fontsize=12)
plt.show()