In [1]:
import dask.array as da
import fsspec
import numpy as np
import pyproj
import pystac
import rioxarray
import stac2dcache
import xarray as xr

# Spring Index Models from Daymet

## 1. Introduction

### 1.1 Overview

In this notebook we calculate two spring onset indicators, **the day of first leaf appearance** and **the day of first bloom**, as 1-km gridded estimates over the conterminous United States (CONUS). As input data, we use variables from the [Daymet dataset](https://daac.ornl.gov/cgi-bin/dsviewer.pl?ds_id=1840), which we have previously retrieved to the [SURF dCache storage](http://doc.grid.surfsara.nl/en/stable/Pages/Service/system_specifications/dcache_specs.html) in the form of a [SpatioTemporal Asset Catalog](https://stacspec.org/) (see [this notebook](./01-download-Daymet4.ipynb)). The same storage system is used for the output spring index products, which we save in [Zarr](https://zarr.readthedocs.io/en/stable/) format. This work is based on the publication [Izquierdo-Verdiguier et al., 2018](https://doi.org/10.1016/j.agrformet.2018.06.028).

### 1.2 The model

The first-leaf and first-bloom spring indices have been computed following the Extended Spring Index (SI-x) models from [Schwartz et al., 2013](https://doi.org/10.1002/joc.3625). Input data variables, taken from the Daymet dataset, are the daily minimum and maximum temperatures and the daylight duration. 

Using the SI-x models, the first-leaf and first-bloom dates are estimated for three reference plant species (*Lilac*, *Arnold Red*, and *Zabeli*), from which average leafing and blooming dates are derived. For more information, see the original publication [Izquierdo-Verdiguier et al., 2018](https://doi.org/10.1016/j.agrformet.2018.06.028).

### 1.3 Before running this notebook

The input and output datasets, as well as the corresponding metadata, are stored on the SURF dCache system, which we access via bearer-token authentication with a macaroon. The macaroon, generated using [this script](https://github.com/sara-nl/GridScripts/blob/master/get-macaroon), is stored together with other configuration parameters in a JSON fsspec configuration file (see also the [STAC2dCache tutorial](https://github.com/NLeSC-GO-common-infrastructure/stac2dcache/blob/main/notebooks/tutorial.ipynb) and the [fsspec documentation](https://filesystem-spec.readthedocs.io/en/latest/features.html#configuration) for more information):

```json
{
    "dcache": {
        "token": "<MACAROON_STRING_HERE>",
        "api_url": "https://dcacheview.grid.surfsara.nl:22880/api/v1",
        "webdav_url": "https://webdav.grid.surfsara.nl:2880",
        "block_size": 0,
        "request_kwargs": {
            "timeout": 3600
        }
    }
}
```
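
Before starting a long run, it can help to check that the configuration file parses and contains the expected entries. The following is a minimal sketch using only the standard library. The helper name `check_dcache_config` and the list of required keys are illustrative assumptions based on the example above; they are not part of fsspec or STAC2dCache.

```python
import json

# Keys taken from the example configuration above (an assumption, not an
# exhaustive list of what the dCache filesystem accepts).
REQUIRED_KEYS = ("token", "api_url", "webdav_url")


def check_dcache_config(text):
    """Parse a JSON fsspec configuration and return its 'dcache' section."""
    config = json.loads(text)   # raises json.JSONDecodeError on invalid JSON
    section = config["dcache"]  # raises KeyError if the section is missing
    missing = [key for key in REQUIRED_KEYS if key not in section]
    if missing:
        raise KeyError(f"missing entries in 'dcache' section: {missing}")
    return section


example = """
{
    "dcache": {
        "token": "<MACAROON_STRING_HERE>",
        "api_url": "https://dcacheview.grid.surfsara.nl:22880/api/v1",
        "webdav_url": "https://webdav.grid.surfsara.nl:2880",
        "block_size": 0,
        "request_kwargs": {"timeout": 3600}
    }
}
"""

section = check_dcache_config(example)
print(section["request_kwargs"]["timeout"])  # 3600
```

In practice the text would be read from the configuration file itself rather than from an inline string.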

## 2. Calculating the Spring Indices

### 2.1 Overview

The calculation of the spring index events involves the following steps: 
* opening the input variables from the retrieved collection; 
* performing some preprocessing operations (filtering the spatial and temporal extents from the daily records, carrying out a few unit conversions);
* estimating the spring index dates on the 1-km grid on which input variables are provided;
* saving the output.

All the steps are run by looping over years and by using a [Dask](http://dask.org) cluster to parallelize operations over spatial regions and days of the year. 

### 2.2 Input parameters  

The following variables define the parameters for the spring index calculations. These include the range of years, the range of days of the year in which to look for the spring onset events, and the boundaries of the area of interest.

In [2]:
# Range of years to calculate spring index 
years = range(1980, 2022)

# Year day range for calculating growing degree hours
startdate = 1 
enddate = 300

# Bounding box expressed in lat/lon degrees
bbox_latlon = (-124.784, 24.743, -66.951, 49.346)
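
Note that `range` excludes its end point, so the years above run from 1980 through 2021, while the day range is inclusive at both ends. The short check below spells out how many years and days these settings imply; the names `n_years`, `n_days`, `day_slice` and `selected` are introduced here for illustration only.

```python
years = range(1980, 2022)
startdate = 1
enddate = 300

n_years = len(years)              # range() excludes the stop value
n_days = enddate - startdate + 1  # both end points are included

# Zero-based positions of the selected days along a yearly time axis
# (the same slice is used later when subsetting the daily records)
day_slice = slice(startdate - 1, enddate)
selected = list(range(365))[day_slice]

print(years[0], years[-1], n_years)       # 1980 2021 42
print(n_days, selected[0], selected[-1])  # 300 0 299
```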

We also set the dCache path to the STAC catalog where we have archived the Daymet dataset, and the path where the output spring indices are stored:

In [3]:
# dCache project root path
root_urlpath = (
    "dcache://pnfs/grid.sara.nl/data/remotesensing/disk/"
)

catalog_urlpath = f"{root_urlpath}/daymet-daily-v4/catalog.json"
output_urlpath = f"{root_urlpath}/spring-index-models.zarr"

### 2.3 The model

The SI-x model is encoded in the following functions, which are used to calculate the first-leaf and first-bloom spring index dates. From the input variables extracted from Daymet, the growing degree hours (GDH) are first computed. A set of predictors is then calculated from the GDH, and these are in turn used to estimate the spring onset dates for the three reference plant species (and their mean).
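
To make the first step concrete, the following sketch evaluates the hourly temperature curve and the resulting GDH for a single location and day in plain Python. It follows the logic of the `calculate_gdh` function defined below, but the scalar helper `gdh_single_day` is a hypothetical name introduced here for illustration. Temperatures are assumed to be in degrees Fahrenheit and the day length in hours (shorter than 23 hours, so that the logarithm in the denominator is positive).

```python
import math

BASE_TEMP_FAHRENHEIT = 31.0


def gdh_single_day(dayl, tmin, tmax):
    """Growing degree hours for one day at one location (scalar sketch)."""
    dt = tmax - tmin
    # Value of the sine branch at hour == dayl, where the log branch starts
    const = math.sin(math.pi / (dayl + 4) * dayl) * dt
    total = 0.0
    for hour in range(24):
        night_hour = hour - math.floor(dayl)
        if night_hour > 0:
            # Hours beyond the (floored) day length: logarithmic decay
            excess = (1 - math.log(night_hour) / math.log(24 - dayl)) * const
        else:
            # Hours within the day length: sine-shaped curve
            excess = math.sin(hour * math.pi / (dayl + 4)) * dt
        temperature = excess + tmin
        # Only hours above the base temperature contribute
        total += max(temperature - BASE_TEMP_FAHRENHEIT, 0.0)
    return total


# A constant 41 F day is 10 F above the base temperature for 24 hours
print(gdh_single_day(12.0, 41.0, 41.0))  # 240.0
# A day that never exceeds the base temperature accumulates nothing
print(gdh_single_day(12.0, 20.0, 30.0))  # 0.0
```

The array version below obtains the same branching without a loop: it evaluates both expressions for all 24 hours and keeps the logarithmic one wherever it is finite.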

In [4]:
BASE_TEMP_FAHRENHEIT = 31.

HOURS = xr.DataArray(
    data=da.arange(24), 
    dims=("hours",),
)

DAYS = xr.DataArray(
    data=da.arange(startdate, enddate+1),
    dims=("time",),
)

LEAF_INDEX_COEFFS = xr.DataArray(
    data=da.from_array(
        [
            [3.306, 13.878, 0.201, 0.153],
            [4.266, 20.899, 0.000, 0.248],
            [2.802, 21.433, 0.266, 0.000],
        ],
        chunks=(1,-1)
    ),
    dims=("plant", "variable"),
    coords={"plant": ["lilac", "arnold red", "zabelli"]}
)

BLOOM_INDEX_COEFFS = xr.DataArray(
    data=da.from_array(
        [
            [-23.934, 0.116],
            [-24.825, 0.127],
            [-11.368, 0.096],
        ],
        chunks=(1,-1)
    ),
    dims=("plant", "variable"),
    coords={"plant": ["lilac", "arnold red", "zabelli"]}
)

LEAF_INDEX_LIMIT = 637

In [5]:
def calculate_gdh(dayl, tmin, tmax):
    """ 
    Calculate growing degree hours (GDH). 
    """
    
    dt = tmax - tmin
    const = np.sin(np.pi/(dayl + 4) * dayl) * dt
    
    eq1 = np.sin(HOURS * np.pi/(dayl + 4)) * dt 
    eq2 = (1 - np.log(HOURS - np.floor(dayl))/np.log(24 - dayl)) * const
    t = xr.where(~np.isfinite(eq2), eq1, eq2) + tmin - BASE_TEMP_FAHRENHEIT
    t = t.clip(min=0)
    return t.sum(dim="hours", skipna=False)


def calculate_leaf_predictors(gdh):
    """
    Calculate predictors for first leaf: DDE2, DD57, MDS0, and SYNOP.
    """
    
    # Pad GDH to solve issues with first days of the year
    gdh_padded = gdh.pad(time=(7,0), mode="edge")
    
    # Calculating dde2 - trailing 3 days GDH sum from day i-2 to i
    dde2 = gdh_padded.rolling(time=3, center=False).sum()
    dde2 = dde2.isel(time=slice(7, None))  # drop padded values 
    
    # Calculating dd57 - trailing 5-7 days GDH sum from day i-7 to i-5
    dd57 = gdh_padded.rolling(time=8, center=False).sum() \
        - gdh_padded.rolling(time=5, center=False).sum()
    dd57 = dd57.isel(time=slice(7, None))  # drop padded values
    
    # Calculating mds0
    mds0 = DAYS - 1
    
    # Calculating synop
    synflag = dde2>=LEAF_INDEX_LIMIT
    synop = synflag.cumsum(dim="time")

    return dde2, dd57, mds0, synop


def calculate_bloom_predictors(gdh, first_leaf):
    """
    Calculate predictors for first bloom index: MDS0 and AGDH.
    
    Note: these descriptors are computed from the first leaf date.
    """
    
    # Calculating MDS0
    mds0 = DAYS - first_leaf
    mask = mds0 > 0
    mds0 = xr.where(mask, np.floor(mds0), 0.)
    
    # Calculating aggregate GDH 
    agdh = gdh.where(mask, 0.).cumsum(dim="time")
    
    return mds0, agdh


def calculate_first_leaf(dde2, dd57, mds0, synop):
    """
    Calculate day of first leaf for each plant species from GDH.
    """ 
            
    # Prediction calculation for first leaf
    mdsum = LEAF_INDEX_COEFFS[:,0]*mds0 \
        + LEAF_INDEX_COEFFS[:,1]*synop \
        + LEAF_INDEX_COEFFS[:,2]*dde2 \
        + LEAF_INDEX_COEFFS[:,3]*dd57

    mdbool = mdsum>999.5  # Calculate all occurrences of first leaf

    # Vectorized approach to identifying first day of leaf
    outdate = mdbool.argmax(dim="time")
    outdate = outdate.where(mdbool.sum(dim="time")>0)
            
    # Arnold red's first leaf is one day after reaching mdsum limit
    day_shift = xr.DataArray(
        da.array([0, 1, 0]),
        dims=("plant",),
        coords={"plant": ["lilac", "arnold red", "zabelli"]}
    )
    outdate = outdate + day_shift
    return outdate


def calculate_first_bloom(mds0, agdh):
    """
    Calculate day of first bloom for each plant species from GDH.
    """
    
    # Prediction calculation for first bloom
    mdsum = BLOOM_INDEX_COEFFS[:,0]*mds0 \
        + BLOOM_INDEX_COEFFS[:,1]*agdh
    
    mdbool = mdsum>999.5  # Calculate all occurrences of first bloom

    # Vectorized approach to identifying first day of bloom
    outdate = mdbool.argmax(dim="time")
    outdate = outdate.where(mdbool.sum(dim="time")>0)
    return outdate


def add_mean_plant_layer(outdate):
    """
    Average the spring index date over plant species and add the mean
    as a new layer.
    """
    
    mean = outdate.mean(dim="plant", skipna=False).round()
    mean = mean.expand_dims(plant=["mean"])
    return xr.concat([outdate, mean], dim="plant")
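
For a single grid cell and a single species, the first-leaf procedure reduces to a short loop. The sketch below retraces the steps of `calculate_leaf_predictors` and `calculate_first_leaf` with plain Python lists, using the lilac coefficients from above. The function name `first_leaf_lilac`, the use of `None` in place of NaN for "threshold never reached", and the assumption that the series starts on day 1 of the year are choices made for this illustration only.

```python
LILAC_COEFFS = (3.306, 13.878, 0.201, 0.153)  # MDS0, SYNOP, DDE2, DD57
LEAF_INDEX_LIMIT = 637
THRESHOLD = 999.5


def first_leaf_lilac(gdh):
    """Zero-based index of the first day on which the lilac leaf index
    exceeds the threshold, or None if that never happens."""
    c_mds0, c_synop, c_dde2, c_dd57 = LILAC_COEFFS
    # Repeat the first value 7 times so that the trailing windows are
    # defined for the first days of the series (edge padding)
    padded = [gdh[0]] * 7 + list(gdh)
    synop = 0
    for day in range(len(gdh)):
        i = day + 7                        # position in the padded series
        dde2 = sum(padded[i - 2 : i + 1])  # GDH sum over days i-2 .. i
        dd57 = sum(padded[i - 7 : i - 4])  # GDH sum over days i-7 .. i-5
        mds0 = day                         # days since the series start
        if dde2 >= LEAF_INDEX_LIMIT:
            synop += 1                     # running count of high-GDH days
        index = (
            c_mds0 * mds0 + c_synop * synop + c_dde2 * dde2 + c_dd57 * dd57
        )
        if index > THRESHOLD:
            return day
    return None


print(first_leaf_lilac([0.0] * 300))     # None
print(first_leaf_lilac([1000.0] * 300))  # 0
```

With no heat accumulation at all, the day-count term alone stays below the threshold within 300 days, so no date is returned. The array version handles this case by masking the result of `argmax`, which would otherwise report index 0 when the threshold is never exceeded.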

### 2.4 Open the input catalog 

The input variables (minimum temperature, maximum temperature and day length) are extracted from the Daymet dataset, which we downloaded earlier as a STAC catalog (see [this notebook](./01-download-Daymet4.ipynb)). To access the data, we load the catalog:

In [6]:
catalog = pystac.Catalog.from_file(catalog_urlpath)
catalog

ID: daymet-daily-v4
Title: Daymet: Daily Surface Weather Data on a 1-km Grid for North America, Version 4
Description: This dataset provides Daymet Version 4 data as gridded estimates of daily weather parameters for North America, Hawaii, and Puerto Rico. Daymet variables include the following parameters: minimum temperature, maximum temperature, precipitation, shortwave radiation, vapor pressure, snow water equivalent, and day length. The dataset covers the period from January 1, 1980, to December 31 (or December 30 in leap years) of the most recent full calendar year for the Continental North America and Hawaii spatial regions. Data for Puerto Rico is available starting in 1950. Each subsequent year is processed individually at the close of a calendar year. Daymet variables are provided as individual files, by variable and year, at a 1 km x 1 km spatial resolution and a daily temporal resolution. Areas of Hawaii and Puerto Rico are available as files separate from the continental North America. Data are in a North America Lambert Conformal Conic projection and are distributed in a standardized Climate and Forecast (CF)-compliant netCDF file format.
type: Catalog
Self: dcache://pnfs/grid.sara.nl/data/remotesensing/disk//daymet-daily-v4/catalog.json
License: https://science.nasa.gov/earth-science/earth-science-data/data-information-policy
Children: region-na, ./region-pr/collection.json, ./region-hi/collection.json

ID: region-na
Description: Daymet dataset for Continental North America
Providers: ORNL DAAC (producer), https://doi.org/10.3334/ORNLDAAC/1840
type: Collection
stac_extensions: ['https://stac-extensions.github.io/scientific/v1.0.0/schema.json']
sci:doi: 10.3334/ORNLDAAC/1840
Self: dcache://pnfs/grid.sara.nl/data/remotesensing/disk/daymet-daily-v4/region-na/collection.json
Cite-as: https://doi.org/10.3334/ORNLDAAC/1840
Items: na-1980 through na-2021 (one item per year, e.g. ./na-1981/na-1981.json)

ID: na-1980
Bounding Box: [-178.1333, 14.0749, -53.0567, 82.9143]
Datetime: 1980-01-01 00:00:00+00:00
gsd: 1000
proj:epsg: None
proj:projjson: ProjectedCRS, Lambert Conic Conformal (2SP) on the WGS 84 ellipsoid; latitude of false origin 42.5, longitude of false origin -100, standard parallels 25 and 60 (degrees); easting and northing at false origin 0 (metres)
stac_extensions: ['https://stac-extensions.github.io/projection/v1.0.0/schema.json']
Self: dcache://pnfs/grid.sara.nl/data/remotesensing/disk/daymet-daily-v4/region-na/na-1980/na-1980.json
Assets (media type application/x-hdf5, role data):
* Day length (s/day): ./daymet_v4_daily_na_dayl_1980.nc
* Precipitation (mm): https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840//daymet_v4_daily_na_prcp_1980.nc
* Shortwave radiation (W/m2): https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840//daymet_v4_daily_na_srad_1980.nc
* Snow water equivalent (kg/m2): https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840//daymet_v4_daily_na_swe_1980.nc
* Maximum air temperature (degrees C): ./daymet_v4_daily_na_tmax_1980.nc
* Minimum air temperature (degrees C): ./daymet_v4_daily_na_tmin_1980.nc
* Water vapor pressure (Pa): https://thredds.daac.ornl.gov/thredds/fileServer/ornldaac/1840//daymet_v4_daily_na_vp_1980.nc

In addition to providing links to the data, the catalog provides the dataset's metadata, which we use, for example, to convert the bounding box from latitude/longitude degrees to the dataset's coordinate reference system (CRS):

In [7]:
# Extract information about input CRS from metadata
_item = next(catalog.get_all_items())
proj_json = _item.properties["proj:projjson"]
crs_lcc = pyproj.CRS.from_json_dict(proj_json)

# Set up CRS converter
transformer = pyproj.Transformer.from_crs(
    crs_from="EPSG:4326", 
    crs_to=crs_lcc,
    always_xy=True,
)

# Calculate bbox in the dataset's CRS
bbox = transformer.transform_bounds(*bbox_latlon)

### 2.5 Connect to the cluster

Once we are ready to run the calculation, we set up a Dask cluster and create a client connection. This is most easily achieved via the Dask JupyterLab extension (look for the Dask logo on the left tab of the JupyterLab interface):

In [8]:
from dask.distributed import Client

client = Client("tcp://10.0.2.109:44483")
client

Connection method: Direct
Dashboard: /proxy/8787/status
Comm: tcp://10.0.2.109:44483
Started: 11 minutes ago
Workers: 15 (8 on 10.0.2.186, 7 on 10.0.2.109), each with 4 threads and 30.00 GiB of memory
Total threads: 60
Total memory: 450.00 GiB


Here we have created a cluster with 15 workers, and we wait for all of them to become available:

In [9]:
client.wait_for_workers(n_workers=15)

### 2.6 Run the model

Once the Dask cluster is reachable, we can start the computation. We define a few convenience functions to open the dataset using the Xarray library, preprocess the input variables, and save the output products to storage. Note that by setting the size of the data "chunks" when reading the data, we choose Dask arrays as the underlying data structure. All operations on Xarray objects are then evaluated lazily: nothing is computed until the data are written to storage, which triggers the calculation of the spring indices for a given year.
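
The chunk sizes determine how much data each Dask task has to hold in memory. As a rough guide, the following sketch estimates the size of a single chunk for the two chunking schemes used in the loop below. The helper `chunk_mib` is hypothetical, and the 4-byte (single-precision) item size is an assumption: the actual data type depends on the input files and on the operations applied to them.

```python
from math import prod


def chunk_mib(chunks, itemsize=4):
    """Size in MiB of one chunk with the given per-dimension lengths."""
    return prod(chunks.values()) * itemsize / 2**20


# Chunking used when reading the input files
read_chunks = {"time": 5, "x": 1000, "y": 1000}
# Chunking of the GDH array after merging all 300 days into one chunk
gdh_chunks = {"time": 300, "x": 500, "y": 500}

print(f"{chunk_mib(read_chunks):.1f} MiB")  # 19.1 MiB
print(f"{chunk_mib(gdh_chunks):.1f} MiB")   # 286.1 MiB
```

Under this assumption, rechunking to a single chunk along the time axis is what makes the spatial chunks smaller: the rolling sums and cumulative sums need the whole time series of a pixel in one piece.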

In [10]:
def open_dataset(urlpaths, **kwargs):
    """
    Open the remote files as a single dataset. 
    """
    
    ofs = fsspec.open_files(urlpaths, block_size=4*2**20)
    return xr.open_mfdataset(
        [of.open() for of in ofs],
        engine="h5netcdf", 
        decode_coords="all",
        drop_variables=("lat", "lon"),
        **kwargs
    )


def preprocess_dataset(ds, startdate, enddate, bbox):
    """
    Subset the input dataset and make necessary conversions.
    """
    
    # Select time range for GDH calculation
    ds = ds.isel(time=slice(startdate-1, enddate))
    
    # Spatial selection
    ds = ds.rio.clip_box(*bbox)
    
    # Convert temperatures to Fahrenheit
    tmax = ds["tmax"] * 1.8 + 32
    tmin = ds["tmin"] * 1.8 + 32

    # Convert daylength from seconds to hours
    dayl = ds["dayl"] / 3600

    return tmax, tmin, dayl


def save_to_urlpath(first_leaf, first_bloom, urlpath, group):
    """
    Save output to urlpath in Zarr format. 
    """
    
    fs_map = fsspec.get_mapper(urlpath)
    ds = xr.Dataset({
        "first-leaf": first_leaf,
        "first-bloom": first_bloom,
    })
    ds.to_zarr(fs_map, group=group)

In [11]:
for year in years:
    
    print(f"Running year {year} ...")
    
    # Extract urlpaths to Daymet files
    item = catalog.get_item(f"na-{year}", recursive=True)
    hrefs = [
        item.assets[var].get_absolute_href() 
        for var in ("tmin", "tmax", "dayl")
    ]
    
    # Open files as a single dataset, using chunked Dask arrays
    ds = open_dataset(hrefs, chunks={"time": 5, "x": 1000, "y": 1000})
    
    # Extract temporal/spatial ranges, preprocess variables
    tmax, tmin, dayl = preprocess_dataset(ds, startdate, enddate, bbox)
    
    # Calculate GDH and rechunk to have single chunk along time axis
    gdh = calculate_gdh(dayl, tmin, tmax)
    gdh = gdh.chunk({"time": enddate-startdate+1, "x": 500, "y": 500})
    
    # First leaf index
    dde2, dd57, mds0, synop = calculate_leaf_predictors(gdh)
    first_leaf = calculate_first_leaf(dde2, dd57, mds0, synop)
    
    # First bloom index
    mds0, agdh = calculate_bloom_predictors(gdh, first_leaf)
    first_bloom = calculate_first_bloom(mds0, agdh)
    
    # Calculate means by averaging over plants
    first_leaf = add_mean_plant_layer(first_leaf)
    first_bloom = add_mean_plant_layer(first_bloom)
    
    # Rechunk and save to storage
    save_to_urlpath(
        first_leaf.chunk({"plant": 1, "x": 1000, "y": 1000}),
        first_bloom.chunk({"plant": 1, "x": 1000, "y": 1000}),
        output_urlpath, 
        f"{year}",
    )

Running year 1980 ...
Running year 1981 ...
Running year 1982 ...
Running year 1983 ...
Running year 1984 ...
Running year 1985 ...
Running year 1986 ...
Running year 1987 ...
Running year 1988 ...
Running year 1989 ...
Running year 1990 ...
Running year 1991 ...
Running year 1992 ...
Running year 1993 ...
Running year 1994 ...
Running year 1995 ...
Running year 1996 ...
Running year 1997 ...
Running year 1998 ...
Running year 1999 ...
Running year 2000 ...
Running year 2001 ...
Running year 2002 ...
Running year 2003 ...
Running year 2004 ...
Running year 2005 ...
Running year 2006 ...
Running year 2007 ...
Running year 2008 ...
Running year 2009 ...
Running year 2010 ...
Running year 2011 ...
Running year 2012 ...
Running year 2013 ...
Running year 2014 ...
Running year 2015 ...
Running year 2016 ...
Running year 2017 ...
Running year 2018 ...
Running year 2019 ...
Running year 2020 ...
Running year 2021 ...

When done, we shut down the cluster to release resources:

In [12]:
client.shutdown()

The calculation of the spring indices over the full set of years takes ~5 hours on 15 workers (60 threads).