# Analysis of Gridded Ensemble Precipitation and Temperature Estimates over the Contiguous United States


## Learning Objectives

In this notebook, we'll work with a 100-member ensemble of gridded precipitation and temperature estimates. The data are stored in the [Zarr](https://zarr.readthedocs.io/en/stable/) format.


Link to the original dataset stored in netCDF format: https://www.earthsystemgrid.org/dataset/gridded_precip_and_temp.html



In [None]:
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
import numpy as np
import xarray as xr
import matplotlib.pyplot as plt
import dask
from dask.utils import format_bytes  # newer releases no longer expose this in distributed.utils
import hvplot.xarray  # registers the .hvplot accessor on xarray objects

## Create and Connect to Dask Distributed Cluster

In [None]:
from dask.distributed import Client
from ncar_jobqueue import NCARCluster

cluster = NCARCluster(memory="100GB", cores=36, processes=1,
                      walltime="00:15:00", project="NIOW0001", queue="dav")
# Scale adaptively between 1 and 10 dask workers
cluster.adapt(minimum=1, maximum=10, wait_count=60)
cluster

☝️ Don't forget to click the link above to view the scheduler dashboard!

In [None]:
# Connect client to the remote dask workers
client = Client(cluster)
client

## Open Dataset

We'll load the dataset using a package called [xarray](http://xarray.pydata.org/en/stable/). Under the hood, this dataset is stored using the [Zarr](https://zarr.readthedocs.io/en/stable/) format.

The dataset has dimensions of time, latitude, longitude, and ensemble member.

In [None]:
store = '/glade/scratch/abanihi/data/gmet_v1.zarr'
%time ds = xr.open_zarr(store, consolidated=True)

In [None]:
# Get dataset size
format_bytes(ds.nbytes)

In [None]:
# Print dataset
ds

In [None]:
ds.pcp.data

## Figure: Elevation and domain mask

A quick plot of the elevation field, masked to cells with positive elevation, gives us an idea of the spatial domain.

In [None]:
%%time
elevation = ds['elevation']
elevation = elevation.where(elevation > 0).load()
elevation.plot(figsize=(10, 6))
plt.title('Domain Elevation')

## Quantify the ensemble uncertainty for a single day

This dataset provides 100 equally likely realizations of the temperature and precipitation that could have occurred, given the station-observed weather. We can quantify the uncertainty that comes from observation and gridding errors by computing the standard deviation across ensemble members:

In [None]:
temp = ds['t_mean'].sel(time='1984-07-31')
temp_ens_mean = temp.mean('member_id')
temp_errors = temp - temp_ens_mean
temp_std_errors = temp_errors.std('member_id')

In [None]:
temp_std_errors.plot(robust=True, figsize=(10, 6))

As we can see, remote and topographically complex areas tend to have larger uncertainties in this dataset.
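
Because the ensemble mean is the same for every member at a given grid cell, subtracting it shifts the values without changing their spread. The standard deviation of the anomalies therefore equals the standard deviation of the members themselves. The sketch below checks this on synthetic data; the array shape, the random seed, and the names `members` and `anomaly_std` are illustrative assumptions, not part of the dataset.

```python
import numpy as np

# Synthetic stand-in for one day of temperature: (member, lat, lon).
rng = np.random.default_rng(0)
members = rng.normal(loc=20.0, scale=1.5, size=(100, 4, 5))

ens_mean = members.mean(axis=0)       # ensemble mean per grid cell
anomalies = members - ens_mean        # deviation of each member from that mean
anomaly_std = anomalies.std(axis=0)   # spread across members, per grid cell

# Removing the ensemble mean does not change the spread.
same = bool(np.allclose(anomaly_std, members.std(axis=0)))
print(anomaly_std.shape, same)
```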

## Intra-ensemble range

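We calculate the intra-ensemble range of the time-mean daily temperature: each member is averaged over time, and the range is the difference between the largest and smallest member means at each grid cell. This gives us a sense of the uncertainty.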

In [None]:
temp_mean = ds['t_mean'].mean(dim='time')
spread = (temp_mean.max(dim='member_id')
          - temp_mean.min(dim='member_id'))
spread
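
The order of operations matters here: the range of the time means is generally not the same as the time mean of the daily ranges. The following is a minimal sketch on a hypothetical toy array (three members, two time steps, one grid cell) that uses the same xarray calls as the cell above but runs eagerly on NumPy data.

```python
import numpy as np
import xarray as xr

# Hypothetical toy data: 3 members, 2 time steps, a single grid cell.
temp = xr.DataArray(
    np.array([[1.0, 3.0],
              [5.0, 1.0],
              [0.0, 2.0]]),
    dims=("member_id", "time"),
)

# Range of the time-mean temperature (member means are 2, 3 and 1).
temp_mean = temp.mean(dim="time")
range_of_means = float(temp_mean.max(dim="member_id")
                       - temp_mean.min(dim="member_id"))

# For comparison: time mean of the per-day range across members (5 and 2).
daily_range = temp.max(dim="member_id") - temp.min(dim="member_id")
mean_of_ranges = float(daily_range.mean(dim="time"))

print(range_of_means, mean_of_ranges)  # 2.0 3.5
```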

## Calling compute
The expressions above didn't actually compute anything; they only built the task graph. To run the computation, we call one of the `compute()`, `persist()`, or `load()` methods:

In [None]:
spread = spread.compute(retries=2)
spread
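
The same lazy behavior can be seen with a plain dask array, without a cluster. This is a small sketch on a made-up array of ones (the shape and chunk sizes are arbitrary assumptions); it uses dask's default local scheduler.

```python
import dask.array as da

# A small chunked array; nothing is computed yet.
x = da.ones((400, 400), chunks=(100, 100))
total = (x + 1).sum()             # still lazy: this only extends the task graph

is_lazy = isinstance(total, da.Array)
result = float(total.compute())   # runs the graph and returns a concrete value
print(is_lazy, result)            # True 320000.0
```

On xarray objects, `compute()` returns a new object backed by in-memory arrays, `load()` does the same in place, and `persist()` triggers the computation but keeps the result as dask arrays held in memory.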

## Figure: Intra-ensemble range


In [None]:
spread.attrs['units'] = 'degC'
spread.plot(robust=True, figsize=(10, 6))
plt.title('Intra-ensemble range in mean annual temperature')

## Average seasonal snowfall

We can compute a crude estimate of average seasonal snowfall by treating precipitation that falls on days with a mean temperature below zero as snow. Here, we'll look at the first 4 ensemble members and make some maps of the seasonal total snowfall in each ensemble member.

In [None]:
da_snow = ds['pcp'].where(ds['t_mean'] < 0.)\
                   .resample(time='QS-Mar').sum('time')


seasonal_snow = da_snow.isel(member_id=slice(0, 4))\
    .groupby('time.season').mean('time')\
    .load()

In [None]:
# properly sort the seasons
seasonal_snow = seasonal_snow.sel(season=['DJF', 'MAM', 'JJA', 'SON'])
seasonal_snow.attrs['units'] = 'mm/season'
seasonal_snow
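
To see what the mask, resample, and groupby chain produces, here is a minimal sketch on a hypothetical single-point record with 1 mm of precipitation every day and sub-freezing temperatures only in December through February. The data and the names `snow` and `seasonal` are assumptions for illustration; `'QS-DEC'` is used as the quarter anchor, which yields the same quarter boundaries (March, June, September, December) as `'QS-Mar'`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical record: two seasonal years, 1 mm of precipitation per day.
time = pd.date_range("2000-03-01", "2002-02-28", freq="D")
pcp = xr.DataArray(np.ones(time.size), coords={"time": time}, dims=("time",))

# Below freezing in Dec-Feb, above freezing otherwise.
is_winter = np.isin(time.month, [12, 1, 2])
t_mean = xr.DataArray(np.where(is_winter, -5.0, 10.0),
                      coords={"time": time}, dims=("time",))

# Keep precipitation only on sub-freezing days, then total it per season.
snow = pcp.where(t_mean < 0.0).resample(time="QS-DEC").sum("time")

# Average the seasonal totals over the years.
seasonal = snow.groupby("time.season").mean("time")
djf = float(seasonal.sel(season="DJF"))
jja = float(seasonal.sel(season="JJA"))
print(djf, jja)  # 90.0 0.0
```

Each winter here has 31 + 31 + 28 = 90 sub-freezing days, so the DJF average is 90 mm. Seasons with no sub-freezing days sum to zero rather than NaN, because the sum skips the masked values.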

## Figure: Average seasonal snowfall totals 

In [None]:
seasonal_snow.plot.pcolormesh(col='season', row='member_id',
                              cmap='Blues', robust=True)

## Extract a time series of annual maximum precipitation events over a region

In the previous two examples, we've mostly reduced the time and/or ensemble dimension. Here, we'll do a reduction operation on the spatial dimension to look at a time series of extreme precipitation events near Boulder, CO (40.0150° N, 105.2705° W).


In [None]:
buf = 0.25  # look at Boulder +/- 0.25 deg

ds_co = ds.sel(lon=slice(-105.2705-buf, -105.2705+buf),
               lat=slice(40.0150-buf, 40.0150+buf))

In [None]:
# 'YS' (year start) bins by calendar year; recent pandas releases deprecate the 'AS' alias
pcp_ann_max = ds_co['pcp'].resample(time='YS').max('time')

In [None]:
pcp_ann_max_ts = pcp_ann_max.max(('lat', 'lon')).load()
pcp_ann_max_ts
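
The spatial selection and the two maximum reductions can be sketched on a small synthetic grid. Everything in this example (the grid, the event dates and values, and the names `box` and `ann_max`) is a hypothetical stand-in for the real dataset, and the ensemble dimension is omitted for brevity.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical grid around Boulder, CO with two years of daily data.
time = pd.date_range("2000-01-01", "2001-12-31", freq="D")
lat = np.linspace(39.0, 41.0, 9)       # 0.25 degree spacing
lon = np.linspace(-106.5, -104.5, 9)
data = np.zeros((time.size, lat.size, lon.size))

# Two events inside the box and one larger event outside it.
data[time.get_loc(pd.Timestamp("2000-07-15")), 4, 5] = 50.0   # lat 40.00, lon -105.25
data[time.get_loc(pd.Timestamp("2001-09-12")), 5, 4] = 80.0   # lat 40.25, lon -105.50
data[time.get_loc(pd.Timestamp("2001-01-01")), 8, 8] = 999.0  # lat 41.00, lon -104.50

pcp = xr.DataArray(data, coords={"time": time, "lat": lat, "lon": lon},
                   dims=("time", "lat", "lon"))

buf = 0.25  # Boulder +/- 0.25 deg
box = pcp.sel(lon=slice(-105.2705 - buf, -105.2705 + buf),
              lat=slice(40.0150 - buf, 40.0150 + buf))

# Annual maximum in time, then the largest value anywhere in the box.
ann_max = box.resample(time="YS").max("time").max(("lat", "lon"))
print(ann_max.values.tolist())  # [50.0, 80.0]
```

The 999 mm event falls outside the box, so it does not appear in the result. Label-based slices like these assume the coordinate is stored in ascending order; for a descending latitude axis the slice bounds would need to be swapped.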

## Figure: Timeseries of maximum precipitation near Boulder, CO.

In [None]:
pcp_ann_max_ts.hvplot.line(x='time', title='Maximum precipitation near Boulder, CO',
                           legend=False)

In [None]:
# Gracefully destroy/close our cluster
client.close()
cluster.close()