In [1]:
import dask.array as da
import fsspec
import numpy as np
import pyproj
import pystac
import rioxarray
import stac2dcache
import xarray as xr

# Spring Index Models from Daymet

## 1. Introduction

### 1.1 Overview

In this notebook we calculate two spring onset indicators, namely **the day of first leaf appearance** and **the day of first bloom**, as 1-km gridded estimates over the conterminous United States (CONUS). As input data, we use variables from the [Daymet dataset](https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1840), which we have previously retrieved to the [SURF dCache storage](http://doc.grid.surfsara.nl/en/stable/Pages/Service/system_specifications/dcache_specs.html) in the form of a [SpatioTemporal Asset Catalog](https://stacspec.org/) (see [this notebook](./01-download-Daymet4.ipynb)). The same storage system is used for the output spring index products, which we save in [Zarr](https://zarr.readthedocs.io/en/stable/) format. This work is based on the publication [Izquierdo-Verdiguier et al., 2018](https://doi.org/10.1016/j.agrformet.2018.06.028).

### 1.2 The model

The first-leaf and first-bloom spring indices have been computed following the Extended Spring Index (SI-x) models from [Schwartz et al., 2013](https://doi.org/10.1002/joc.3625). Input data variables, taken from the Daymet dataset, are the daily minimum and maximum temperatures and the daylight duration. 

Using the SI-x models, the first-leaf and first-bloom dates are estimated for three reference plant species (*Lilac*, *Arnold Red*, and *Zabelii*), from which average leafing and blooming dates are derived. For more information, see the original publication [Izquierdo-Verdiguier et al., 2018](https://doi.org/10.1016/j.agrformet.2018.06.028).

### 1.3 Before running this notebook

The input and output datasets, as well as the corresponding metadata, are stored on the SURF dCache system, which we access via bearer-token authentication with a macaroon. The macaroon, generated using [this script](https://github.com/sara-nl/GridScripts/blob/master/get-macaroon), is stored together with other configuration parameters in a JSON fsspec configuration file (see also the [STAC2dCache tutorial](https://github.com/NLeSC-GO-common-infrastructure/stac2dcache/blob/main/notebooks/tutorial.ipynb) and the [fsspec documentation](https://filesystem-spec.readthedocs.io/en/latest/features.html#configuration) for more information):

```json
{
    "dcache": {
        "token": "<MACAROON_STRING_HERE>",
        "api_url": "https://dcacheview.grid.surfsara.nl:22880/api/v1",
        "webdav_url": "https://webdav.grid.surfsara.nl:2880",
        "block_size": 0,
        "request_kwargs": {
            "timeout": 3600
        }
    }
}
```
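
Before using the file, it can be worth checking that it parses as JSON and contains the expected `dcache` entry. The snippet below is a minimal sketch using only the standard library. The helper name `check_dcache_config` and the choice of required keys are assumptions made for this example; they are not part of fsspec or STAC2dCache.

```python
import json

# Example configuration text; the token value is a placeholder, not a real macaroon
CONFIG_TEXT = """
{
    "dcache": {
        "token": "<MACAROON_STRING_HERE>",
        "api_url": "https://dcacheview.grid.surfsara.nl:22880/api/v1",
        "webdav_url": "https://webdav.grid.surfsara.nl:2880",
        "block_size": 0,
        "request_kwargs": {"timeout": 3600}
    }
}
"""


def check_dcache_config(text):
    """Return the names of the keys missing from the 'dcache' entry (hypothetical helper)."""
    config = json.loads(text)  # raises json.JSONDecodeError on malformed input
    required = ("token", "api_url", "webdav_url")  # assumed minimal set of keys
    section = config.get("dcache", {})
    return [key for key in required if key not in section]


missing = check_dcache_config(CONFIG_TEXT)
print("missing keys:", missing)
```

An empty list means that all the keys checked here are present; it does not verify that the macaroon itself is valid.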

## 2. Calculating the Spring Indices

### 2.1 Overview

The calculation of the spring index events involves the following steps: 
* opening the input variables from the retrieved collection; 
* performing some preprocessing operations (selecting the spatial and temporal extents from the daily records, carrying out a few unit conversions);
* estimating the spring index dates on the 1-km grid on which input variables are provided;
* saving the output.

All the steps are run by looping over years and by using a [Dask](http://dask.org) cluster to parallelize operations over spatial regions and days of the year. 

### 2.2 Input parameters  

The following variables define the parameters for the spring index calculations: the range of years, the range of days of the year in which to look for the spring onset events, the boundaries of the area of interest, and the size of the chunks used to process the input data.

In [2]:
# Range of years to calculate spring index 
years = range(1980, 2022)

# Year day range for calculating growing degree hours
startdate = 1 
enddate = 300

# Bounding box expressed in lat/lon degrees
bbox_latlon = (-124.784, 24.743, -66.951, 49.346)

# Load input dataset using these chunk sizes
chunks = {"time": 5, "x": 1000, "y": 1000}
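
To get a feeling for these chunk sizes, the memory footprint of one chunk of a single input variable can be estimated as follows. This is a back-of-the-envelope sketch that assumes 4-byte (float32) values, which may not match the actual data type of the stored variables.

```python
# Chunk sizes used when loading the input dataset (same values as above)
chunks = {"time": 5, "x": 1000, "y": 1000}

# Assumption: each value takes 4 bytes (float32)
BYTES_PER_VALUE = 4

n_values = chunks["time"] * chunks["x"] * chunks["y"]
chunk_mib = n_values * BYTES_PER_VALUE / 2**20
print(f"{n_values} values per chunk, about {chunk_mib:.1f} MiB")
```

Note that the GDH calculation further down broadcasts each chunk against 24 hourly values, so intermediate arrays can be correspondingly larger than this estimate.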

We also set the dCache path to the STAC catalog where we have archived the Daymet dataset and the path where the output spring indices are stored:

In [3]:
# dCache project root path
root_urlpath = (
    "dcache://pnfs/grid.sara.nl/data/remotesensing/disk"
)

catalog_urlpath = f"{root_urlpath}/daymet-daily-v4/catalog.json"
output_urlpath = f"{root_urlpath}/spring-index-models.zarr"

### 2.3 The model

The SI-x model is encoded in the following few functions, which are used to calculate the first-leaf and first-bloom spring index dates. From the input variables extracted from Daymet, the growing degree hours (GDH) are first computed. A set of predictors is then calculated from the GDH, and these are in turn used to estimate the spring onset dates for the three reference plant species (and their mean).
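
To make the GDH definition concrete, here is a plain-Python sketch for a single grid cell and a single day. It mirrors the formulas of the `calculate_gdh` function defined below, but it is not the code used in the computation; the name `gdh_single_day` and the input values are illustrative.

```python
import math

BASE_TEMP_FAHRENHEIT = 32.0


def gdh_single_day(dayl, tmin, tmax):
    """
    Growing degree hours for one grid cell and one day.

    dayl is the day length in hours, tmin and tmax are in Fahrenheit.
    """
    ideal_dl = math.floor(dayl)
    dt = tmax - tmin
    # Temperature at sunset, taken from the daytime sine curve
    sunset = math.sin(math.pi / (dayl + 4) * dayl) * dt + tmin

    total = 0.0
    for hour in range(24):
        a = hour - ideal_dl
        if a <= 0:
            # Daytime: sine-shaped temperature curve starting from tmin
            temp = math.sin(hour * math.pi / (dayl + 4)) * dt + tmin
        else:
            # Night-time: logarithmic decay from the sunset temperature
            temp = -math.log(a) * (sunset - tmin) / math.log(24 - dayl) + sunset
        # Only degrees above the base temperature contribute
        total += max(temp - BASE_TEMP_FAHRENHEIT, 0.0)
    return total


# A constant 42 F day is 10 F above the base temperature for all 24 hours
print(gdh_single_day(dayl=12.0, tmin=42.0, tmax=42.0))
```

A day that stays below the base temperature contributes no growing degree hours at all, which is why GDH accumulate slowly early in the year.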

In [4]:
BASE_TEMP_FAHRENHEIT = 32.

LEAF_INDEX_COEFFS = xr.DataArray(
    data=np.array([
        [3.306, 13.878, 0.201, 0.153],
        [4.266, 20.899, 0.000, 0.248],
        [2.802, 21.433, 0.266, 0.000],
    ]),
    dims=("plant", "variable"),
    coords={"plant": ["lilac", "arnold red", "zabelli"]}
)

BLOOM_INDEX_COEFFS = xr.DataArray(
    data=np.array([
        [-23.934, 0.116],
        [-24.825, 0.127],
        [-11.368, 0.096],
    ]),
    dims=("plant", "variable"),
    coords={"plant": ["lilac", "arnold red", "zabelli"]}
)

LEAF_INDEX_LIMIT = 637
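
As an illustration of how these coefficients are used, the first-leaf predictor for a single plant is a linear combination of four predictors, and the event is flagged once this sum exceeds 999.5 (the threshold applied in the functions of the next cell). The sketch below uses the lilac coefficients from the table above; the predictor values are made up for the example.

```python
# Lilac row of LEAF_INDEX_COEFFS: weights for mds0, synop, dde2 and dd57
LILAC_LEAF_COEFFS = (3.306, 13.878, 0.201, 0.153)
THRESHOLD = 999.5

# Hypothetical predictor values for one grid cell on one day
mds0 = 100    # day of year minus one
synop = 40    # number of days so far with dde2 >= LEAF_INDEX_LIMIT
dde2 = 900.0  # GDH sum over the trailing three days
dd57 = 800.0  # GDH sum over days i-5 to i-7

predictors = (mds0, synop, dde2, dd57)
mdsum = sum(c * p for c, p in zip(LILAC_LEAF_COEFFS, predictors))
print(f"mdsum = {mdsum:.2f}, first leaf reached: {mdsum > THRESHOLD}")
```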

In [5]:
def calculate_gdh(dayl, tmin, tmax):
    """ 
    Calculate growing degree hours (GDH). 
    """
    
    ideal_dl = np.floor(dayl)

    # Calculate sunset temperature
    dt = tmax - tmin
    sunset = np.sin(np.pi/(dayl + 4)*dayl)*dt + tmin
    
    hours = xr.DataArray(
        da.arange(24), 
        dims=("hours",),
        name="hours",
    )
    
    a = hours - ideal_dl
    log1 = np.log(a, where=a>0)
    eq1 = np.sin(hours * np.pi/(dayl + 4))*dt + tmin
    eq2 = - log1*(sunset - tmin)/(np.log(24 - dayl)) + sunset
    t = xr.where(a<=0, eq1, eq2) - BASE_TEMP_FAHRENHEIT
    t = t.clip(min=0)
    return t.sum(dim="hours", skipna=False)


def calculate_predictors(gdh, day):
    """
    Calculate predictors to estimate first leaf and first bloom dates.
    """
    
    # Calculating dde2 - trailing 3 days GDH sum from day i to day i-2
    dde2 = gdh.rolling(time=3, center=False).sum()
    dde2[0,:,:] = gdh[0,:,:]*3
    dde2[1,:,:] = gdh[1,:,:] + gdh[0,:,:]*2

    # Calculating aggregate GDH 
    agdh = gdh.cumsum(axis=0, skipna=False)

    # Calculating dd57 - trailing 5-7 days GDH sum from day i-5 to i-7
    dd57 = gdh.rolling(time=8, center=False, min_periods=1).sum() \
        - gdh.rolling(time=5, center=False, min_periods=1).sum()

    # Calculating MDS0
    mds0 = day - 1

    return dde2, agdh, dd57, mds0


def calculate_first_bloom(mds0, agdh):
    """
    Calculate day of first bloom for each plant species from GDH.
    """
    
    # Prediction calculation for first bloom
    mdsum = BLOOM_INDEX_COEFFS[:,0]*mds0 \
        + BLOOM_INDEX_COEFFS[:,1]*agdh
    
    mdbool = mdsum>999.5  # Calculate all occurrences of first bloom

    # Vectorized approach to identifying first day of bloom
    outdate = mdbool.argmax(dim="time")
    outdate = outdate.where(mdbool.sum(dim="time")>0)
    
    outdate = add_plant_mean(outdate)
    return outdate


def calculate_first_leaf(mds0, dde2, dd57):
    """
    Calculate day of first leaf for each plant species from GDH.
    """ 
    
    # Calculating synop
    synflag = dde2>=LEAF_INDEX_LIMIT
    synop = synflag.cumsum(dim="time")
            
    # Prediction calculation for first leaf
    mdsum = LEAF_INDEX_COEFFS[:,0]*mds0 \
        + LEAF_INDEX_COEFFS[:,1]*synop \
        + LEAF_INDEX_COEFFS[:,2]*dde2 \
        + LEAF_INDEX_COEFFS[:,3]*dd57

    mdbool = mdsum>999.5  # Calculate all occurrences of first leaf

    # Vectorized approach to identifying first day of leaf
    outdate = mdbool.argmax(dim="time")
    outdate = outdate.where(mdbool.sum(dim="time")>0)
            
    # Arnold red's first leaf is one day after reaching mdsum limit
    day_shift = xr.DataArray(
        da.array([0, 1, 0]),
        dims=("plant",),
        coords={"plant": ["lilac", "arnold red", "zabelli"]}
    )
    outdate = outdate + day_shift
    
    outdate = add_plant_mean(outdate)
    return outdate


def add_plant_mean(outdate):
    """
    Average spring index date over plant species and add this as a new layer. 
    """
    
    mean = outdate.mean(dim="plant", skipna=False).round()
    mean = mean.expand_dims(plant=["mean"])
    return xr.concat([outdate, mean], dim="plant")
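
The first-event detection above relies on `argmax` returning the position of the first `True` value along an axis. Since `argmax` also returns 0 when no element is `True`, cells where the threshold is never reached need to be masked separately, which is the role of the `where(...)` calls. A minimal NumPy sketch of the same idea, with made-up values:

```python
import numpy as np

THRESHOLD = 999.5

# Hypothetical mdsum series: rows are days, columns are two grid cells
mdsum = np.array([
    [200.0, 100.0],
    [700.0, 300.0],
    [1100.0, 500.0],
    [1500.0, 800.0],
])

exceeded = mdsum > THRESHOLD

# Zero-based position of the first True along the time axis
first = exceeded.argmax(axis=0).astype(float)

# argmax gives 0 where the threshold is never reached: mark those as missing
first[~exceeded.any(axis=0)] = np.nan
print(first)
```

The first cell crosses the threshold at position 2 along the time axis, while the second cell never does and is therefore set to NaN rather than to a misleading 0.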

### 2.4 Open the input catalog 

The input variables (minimum temperature, maximum temperature and daylight duration) are extracted from the Daymet dataset, which we have downloaded earlier as a STAC catalog (see [this notebook](./01-download-Daymet4.ipynb)). In order to get access to the data we load the catalog:

In [6]:
catalog = pystac.Catalog.from_file(catalog_urlpath)
catalog

<Catalog id=daymet-daily-v4>

In addition to providing links to the data, the catalog provides all the dataset's metadata, which we use, for example, to convert the bounding box from latitude/longitude degrees to the dataset's coordinate reference system (CRS):

In [7]:
# Extract information about input CRS from metadata
_item = next(catalog.get_all_items())
proj_json = _item.properties["proj:projjson"]
crs_lcc = pyproj.CRS.from_json_dict(proj_json)

# Set up CRS converter
transformer = pyproj.Transformer.from_crs(
    crs_from="EPSG:4326", 
    crs_to=crs_lcc,
    always_xy=True,
)

# Calculate bbox in the dataset's CRS
bbox = transformer.transform_bounds(*bbox_latlon)

### 2.5 Connect to the cluster

Once we are ready to run the calculation, we set up a Dask cluster and create a client connection to it. This is most easily achieved via the Dask JupyterLab extension (look for the Dask logo on the left tab of the JupyterLab interface):

In [8]:
from dask.distributed import Client

client = Client("tcp://10.0.2.186:38717")
client

Connection method: Direct
Dashboard: /proxy/8787/status
Comm: tcp://10.0.2.186:38717
Workers: 15 (4 threads and 30.00 GiB of memory each)
Total threads: 60
Total memory: 450.00 GiB

Here we have created a cluster with 15 workers, and we wait for them to be available:

In [9]:
client.wait_for_workers(n_workers=15)

### 2.6 Run the model

Once the Dask cluster is reachable, we can start the computation. We define a few convenience functions to open the dataset using the Xarray library, preprocess the input variables and save the output products to the storage. Note that by setting the size of the data "chunks", we choose Dask arrays as the underlying data structure. All operations on the Xarray objects are therefore evaluated lazily: nothing is computed until the output is written to storage, which triggers the calculation of the spring index for a given year.

In [10]:
def open_dataset(urlpaths, **kwargs):
    """
    Open the remote files as a single dataset. 
    """
    
    ofs = fsspec.open_files(urlpaths, block_size=4*2**20)
    return xr.open_mfdataset(
        [of.open() for of in ofs],
        engine="h5netcdf", 
        decode_coords="all",
        drop_variables=("lat", "lon"),
        **kwargs
    )


def preprocess_dataset(ds, startdate, enddate, bbox):
    """
    Subset the input dataset and make necessary conversions.
    """
    
    # Select time range for GDH calculation
    ds = ds.isel(time=slice(startdate-1, enddate))
    
    # Spatial selection
    ds = ds.rio.clip_box(*bbox)
    
    # Convert temperatures to Fahrenheit
    tmax = ds["tmax"] * 1.8 + 32
    tmin = ds["tmin"] * 1.8 + 32

    # Convert daylength from seconds to hours
    dayl = ds["dayl"] / 3600

    # Extract day of year
    day = ds["yearday"]
    return tmax, tmin, dayl, day


def save_to_urlpath(first_leaf, first_bloom, urlpath, group, chunks):
    """
    Save output to urlpath in Zarr format. 
    """
    
    ds = xr.Dataset({
        "first-leaf": first_leaf,
        "first-bloom": first_bloom,
    })
    ds = ds.chunk(chunks)
    
    fs_map = fsspec.get_mapper(urlpath)
    ds.to_zarr(fs_map, group=group)
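
The index arithmetic and unit conversions in `preprocess_dataset` can be checked on plain numbers. The sketch below uses only the standard library; the helper names and sample values are illustrative and not part of the notebook's code.

```python
startdate, enddate = 1, 300

# Day-of-year labels for a 365-day year (1-based)
days = list(range(1, 366))

# Positional slicing is zero-based and excludes the stop index,
# so slice(startdate - 1, enddate) keeps days startdate..enddate
selected = days[slice(startdate - 1, enddate)]
print(selected[0], selected[-1], len(selected))


def celsius_to_fahrenheit(temp_c):
    """Same conversion as applied to tmin and tmax."""
    return temp_c * 1.8 + 32


def seconds_to_hours(seconds):
    """Same conversion as applied to the day length."""
    return seconds / 3600


print(celsius_to_fahrenheit(0.0))  # freezing point, the GDH base temperature
print(seconds_to_hours(43200))     # a 12-hour day
```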

In [11]:
for year in years:
    
    print(f"Running year {year} ...")
    
    item = catalog.get_item(f"na-{year}", recursive=True)
    hrefs = [
        item.assets[var].get_absolute_href() 
        for var in ("tmin", "tmax", "dayl")
    ]
    
    ds = open_dataset(hrefs, chunks=chunks)
    
    tmax, tmin, dayl, day = preprocess_dataset(ds, startdate, enddate, bbox)
        
    gdh = calculate_gdh(dayl, tmin, tmax)
    
    dde2, agdh, dd57, mds0 = calculate_predictors(gdh, day)
    
    first_leaf = calculate_first_leaf(mds0, dde2, dd57)
    first_bloom = calculate_first_bloom(mds0, agdh)
    
    save_to_urlpath(
        first_leaf,
        first_bloom,
        output_urlpath, 
        f"{year}",
        {"plant": 1, "x": chunks["x"], "y": chunks["y"]},
    )

Running year 1980 ...
Running year 1981 ...
Running year 1982 ...
Running year 1983 ...
Running year 1984 ...
Running year 1985 ...
Running year 1986 ...
Running year 1987 ...
Running year 1988 ...
Running year 1989 ...
Running year 1990 ...
Running year 1991 ...
Running year 1992 ...
Running year 1993 ...
Running year 1994 ...
Running year 1995 ...
Running year 1996 ...
Running year 1997 ...
Running year 1998 ...
Running year 1999 ...
Running year 2000 ...
Running year 2001 ...
Running year 2002 ...
Running year 2003 ...
Running year 2004 ...
Running year 2005 ...
Running year 2006 ...
Running year 2007 ...
Running year 2008 ...
Running year 2009 ...
Running year 2010 ...
Running year 2011 ...
Running year 2012 ...
Running year 2013 ...
Running year 2014 ...
Running year 2015 ...
Running year 2016 ...
Running year 2017 ...
Running year 2018 ...
Running year 2019 ...
Running year 2020 ...
Running year 2021 ...

When done, we shut down the cluster to release resources:

In [12]:
client.shutdown()