Marcos Longo edited this page Feb 26, 2021 · 8 revisions

Synopsis

ED2 output is written in Hierarchical Data Format 5 (HDF5). For more information on this data format, please see the main HDF5 web page. HDF5 was chosen because it offers compression options, APIs for parallel I/O, the ability to embed metadata and descriptions in the file, and self-description of byte sizes and byte order. The current implementation of ED2 does not compress output files, because compression both slows down file writes and complicates the parallel writing of hyperslabs. This may change in the future.

Output files

File name convention

All output files in ED use the following convention: prefix-X-YYYY-MM-DD-HHNNSS-gGG.h5:

  • prefix: A unique identifier set by the user through variables NL%FFILOUT and NL%SFILOUT in ED2IN. The prefix may contain a full leading path.
  • X: A unique identifier for the type of output file (see table).
  • YYYY: Year.
  • MM: Month.
  • DD: Day. For output files generated at monthly or longer scales, this will be 00.
  • HHNNSS: Hour (UTC). For output files generated at daily or longer scales, this will be 000000.
  • gGG: The grid or polygon of interest. For single-polygon simulations, this will always be g01.
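
As an illustration of the convention above, the following is a minimal Python sketch that splits an output file name into its components. The function name `parse_ed2_name`, the `ED2OutputName` container, and the regular expression are illustrative assumptions and are not part of ED2; the pattern assumes a four-digit year and a two-digit grid number, exactly as written in the convention.

```python
import re
from typing import NamedTuple


class ED2OutputName(NamedTuple):
    """Components of an ED2 output file name (hypothetical helper type)."""
    prefix: str     # user-defined prefix, possibly with a leading path
    file_type: str  # the -X- tag, e.g. "S", "I", "D", "E", "Q", "Y", "T"
    year: int
    month: int      # 0 for yearly files
    day: int        # 0 for monthly or longer files
    hhnnss: str     # time of day (UTC), "000000" for daily or longer files
    grid: int       # grid or polygon number


# prefix-X-YYYY-MM-DD-HHNNSS-gGG.h5; the prefix itself may contain hyphens.
_NAME_RE = re.compile(
    r"^(?P<prefix>.+)-(?P<type>[A-Z])"
    r"-(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
    r"-(?P<time>\d{6})-g(?P<grid>\d{2})\.h5$"
)


def parse_ed2_name(path: str) -> ED2OutputName:
    """Split an ED2 output file name into its components."""
    match = _NAME_RE.match(path)
    if match is None:
        raise ValueError(f"Not an ED2-style output file name: {path!r}")
    return ED2OutputName(
        prefix=match["prefix"],
        file_type=match["type"],
        year=int(match["year"]),
        month=int(match["month"]),
        day=int(match["day"]),
        hhnnss=match["time"],
        grid=int(match["grid"]),
    )


info = parse_ed2_name("analy-E-2021-02-00-000000-g01.h5")
print(info)
```

A helper like this can be handy for sorting or filtering a directory of output files by type and date before opening any of them.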

File types

ED2 can generate multiple types of output files, which cover different time scales. These files are summarised in the table below:

| File type | File identifier (-X- tag) | ED2IN flag controlling output | Example of file name | Time span |
|---|---|---|---|---|
| State (history) | -S- | NL%ISOUTPUT | state-S-2021-02-09-120000-g01.h5 | Instantaneous |
| Sub-daily | -I- | NL%IFOUTPUT or NL%IOOUTPUT | analy-I-2021-02-09-120000-g01.h5 | If NL%FRQFAST=3600, average from 2021-02-09 11 UTC to 2021-02-09 12 UTC |
| Daily | -D- | NL%IDOUTPUT | analy-D-2021-02-09-000000-g01.h5 | Average from 2021-02-09 00 UTC to 2021-02-09 23:59:59 UTC |
| Monthly | -E- | NL%IMOUTPUT | analy-E-2021-02-00-000000-g01.h5 | Average from 2021-02-01 00 UTC to 2021-02-28 23:59:59 UTC |
| Monthly & Mean diel | -Q- | NL%IQOUTPUT | analy-Q-2021-02-00-000000-g01.h5 | Average from 2021-02-01 00 UTC to 2021-02-28 23:59:59 UTC |
| Yearly | -Y- | NL%IYOUTPUT | analy-Y-2021-00-00-000000-g01.h5 | Average from 2021-01-01 00 UTC to 2021-12-31 23:59:59 UTC |
| Tower | -T- | NL%ITOUTPUT | analy-T-2021-02-00-000000-g01.h5 | All sub-daily averages from 2021-02-01 00 UTC to 2021-02-28 23:59:59 UTC |

Additional remarks on output files

  • Do not use state (history) files for research. These files exist to allow resuming an interrupted simulation with binary reproducibility. Many variables are either instantaneous or their contents are only meaningful to the model itself, and names may often be misleading.

  • Sub-daily files can quickly become numerous. Consider alternative approaches to reduce unnecessary output files. For example, either use NL%IOOUTPUT to subset only the times of interest (e.g., when comparing ED2 results with satellite imagery), or use the NL%ITOUTPUT settings to generate fewer files, each containing multiple times.

  • There are trade-offs between detail and file size. In general, use coarser averages for longer simulations; otherwise the volume of information can become unmanageable. For example, sub-daily averages are typically used in short simulations (about 5-10 years), but are probably too much information if your goal is to look at forest dynamics over the past 500 years.

  • Trade-offs apply to both temporal resolution and level of detail. Polygon averages are always written to their respective output files, and some variables provide a summary by plant functional type and DBH size class. Site-, patch-, and cohort-level variables may be useful for short-term simulations (e.g., comparing with forest inventory data), but are likely excessive for long-term simulations. You can control whether site, patch, and cohort averages are written to each output file by adjusting variables NL%IADD_SITE_MEANS, NL%IADD_PATCH_MEANS and NL%IADD_COHORT_MEANS in ED2IN.

Variable names

Most variables in the averaged files used for analysis follow the structure XMEAN_VARIABLE-DESCRIPTION_ZZ, where the prefix XMEAN corresponds to the time scale of the averages, the middle part is a unique descriptive name, and the suffix ZZ indicates the level of aggregation (polygon-, site-, patch-, or cohort-level averages). The tables below describe the prefixes and suffixes:

| Prefix | Description | ED2IN flag controlling output |
|---|---|---|
| FMEAN | Sub-daily (e.g., hourly) averages | NL%IFOUTPUT |
| DMEAN | Daily averages | NL%IDOUTPUT |
| MMEAN | Monthly averages | NL%IMOUTPUT or NL%IQOUTPUT |
| MMSQU | Monthly averages of squares | NL%IMOUTPUT or NL%IQOUTPUT |
| QMEAN | Monthly averages by time of day | NL%IQOUTPUT |
| QMSQU | Monthly averages of squares by time of day | NL%IQOUTPUT |

| Suffix | Description | ED2IN flag controlling output |
|---|---|---|
| PY | Polygon-level averages | None. Always present in the output |
| SI | Site-level averages | NL%IADD_SITE_MEANS |
| PA | Patch-level averages | NL%IADD_PATCH_MEANS |
| CO | Cohort-level averages | NL%IADD_COHORT_MEANS |
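
The naming structure can be checked programmatically. The sketch below is a minimal Python example that splits a variable name into its prefix, description, and suffix, using only the prefixes and suffixes listed in the tables. The function name `split_variable_name` and the two lookup dictionaries are illustrative assumptions; variables that do not follow the standard structure simply return `None`.

```python
# Prefixes and suffixes as listed in the tables above.
TIME_PREFIXES = {
    "FMEAN": "sub-daily averages",
    "DMEAN": "daily averages",
    "MMEAN": "monthly averages",
    "MMSQU": "monthly averages of squares",
    "QMEAN": "monthly averages by time of day",
    "QMSQU": "monthly averages of squares by time of day",
}
LEVEL_SUFFIXES = {
    "PY": "polygon",
    "SI": "site",
    "PA": "patch",
    "CO": "cohort",
}


def split_variable_name(name):
    """Return (prefix, description, suffix), or None for non-standard names."""
    parts = name.split("_")
    if len(parts) < 3:
        return None
    prefix, suffix = parts[0], parts[-1]
    if prefix not in TIME_PREFIXES or suffix not in LEVEL_SUFFIXES:
        return None
    return prefix, "_".join(parts[1:-1]), suffix


for name in ("MMEAN_SENSIBLE_AC_PY", "MMEAN_LAI_CO", "PACO_ID"):
    print(name, "->", split_variable_name(name))
```

Returning `None` rather than raising an error keeps the helper usable when looping over every dataset in a file, since some variables do not follow this structure.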

A few noteworthy points:

  • HDF5 allows attaching metadata to variables. The metadata contain the description, units, and dimensions of variables, and most variables in ED2 have this information. To include the information, set NL%ATTACH_METADATA to 1. If you identify variables that do not show meaningful information and you know what the variable means, we encourage you to edit the source code (file ed_state_vars.f90) and submit a pull request.
  • Some variables do not follow the standard above and yet may be relevant for analyses of results. Most of them are not averages, but values that remain constant over the relevant time steps (for example, heartwood biomass does not change over the course of a month). We are still working towards standardising the suffixes for such variables to help identify their dimensions. In the meantime, a full list of variables can be found here.
  • Why does ED2 save the mean of squares? These variables are useful for computing the standard deviation. We don't store the standard deviation directly, because that would prevent users from combining multiple months (for example, if one runs the model for 10 years and wants the average and standard deviation of all simulated Januaries). You can aggregate the mean and the mean of squares of multiple months by simply taking the average (or using a weighted average where the weights are the number of days in each month). Once you have the mean (〈X〉) and mean of squares (〈X²〉) aggregated based on n observations, the standard deviation can be found as: σ = sqrt[ n/(n-1) × (〈X²〉 - 〈X〉²) ]
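
The aggregation of means and means of squares can be sketched in Python as follows. This is a minimal illustration, not ED2 code: the function names, the sample numbers, and the use of the n/(n-1) sample-size correction are assumptions, and n must be chosen by the user as the number of observations that the aggregated moments are taken to represent.

```python
import math


def weighted_mean(values, weights=None):
    """Average a list of values, optionally weighted (e.g. by days per month)."""
    if weights is None:
        weights = [1.0] * len(values)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def std_from_moments(mean, mean_of_squares, n):
    """Standard deviation from the mean and the mean of squares.

    Assumes the sample-size correction n / (n - 1); n must be greater than 1.
    """
    variance = (mean_of_squares - mean * mean) * n / (n - 1)
    # Round-off can make the variance marginally negative for constant data.
    return math.sqrt(max(variance, 0.0))


# Hypothetical monthly mean and mean of squares for three simulated Januaries.
mmean = [2.0, 2.5, 3.0]
mmsqu = [4.5, 6.8, 9.6]
days = [31, 31, 31]

agg_mean = weighted_mean(mmean, days)
agg_msqu = weighted_mean(mmsqu, days)
print(std_from_moments(agg_mean, agg_msqu, n=sum(days)))
```

Here n is set to the total number of days purely for illustration; a different choice may be more appropriate depending on the averaging time step.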

Structure of the stored variables in the HDF5 files

ED2 has a unique spatial hierarchy for its variables. Grids contain polygons, polygons contain sites, sites contain patches and patches contain cohorts. In the HDF5 output files, every single variable is written as a vector. When the model writes these variables to a file, it will append all the cohort data in the entire model state into a single, continuous block of vector data. This single vector contains data that originally came from numerous other vectors with different pointers. So given a single vector of something like cohort variables, how do we determine which patch, site, polygon and grid each cohort is a member of?

Here is a hypothetical example. Let's assume that we are looking at the output of a single-grid, single-polygon simulation. At the time of output, this simulation had two sites (henceforth sites A and B). Site A had 3 patches (henceforth patches A1, A2, and A3), and site B had 2 patches (henceforth patches B1 and B2). Patch A1 had 7 cohorts, patch A2 had 5 cohorts, patch A3 was bare ground (i.e., no cohorts), patch B1 had 4 cohorts, and patch B2 had 2 cohorts:

  • Simple polygon-level variables (e.g., MMEAN_SENSIBLE_AC_PY) will have length 1, as this is a single-polygon simulation.
  • Simple site-level variables (e.g., MMEAN_PCPG_SI) will have length 2.
  • Simple patch-level variables (e.g., MMEAN_RH_PA) will have length 5 (3+2).
  • Simple cohort-level variables (e.g., MMEAN_LAI_CO) will have length 18 (7+5+0+4+2).
  • For multi-dimensional variables, the last dimension is associated with the hierarchical level. For example, in a simulation with 12 soil layers, variable MMEAN_SOIL_WATER_PA will have dimensions (12,5). Note: Depending on the software used for reading HDF5 and post-processing, the dimensions may be swapped by the software. It is worth checking the order of dimensions before using new software.
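
Because the order of dimensions may differ between software packages, one defensive option is to locate the hierarchical dimension by its expected length rather than by its position. The Python sketch below illustrates the idea; `hierarchy_axis` is a hypothetical helper, and the shapes are taken from the example above (12 soil layers, 5 patches).

```python
def hierarchy_axis(shape, n_elements):
    """Return the axis whose length matches the number of hierarchy elements.

    Raises ValueError when no axis, or more than one axis, has that length,
    since the layout cannot then be inferred from the shape alone.
    """
    matches = [axis for axis, size in enumerate(shape) if size == n_elements]
    if len(matches) != 1:
        raise ValueError(
            f"Cannot identify the hierarchy axis in shape {shape} "
            f"for {n_elements} elements"
        )
    return matches[0]


n_patches = 5
print(hierarchy_axis((12, 5), n_patches))  # layout as described in the text
print(hierarchy_axis((5, 12), n_patches))  # layout with swapped dimensions
```

This check cannot resolve the ambiguous case in which two dimensions have the same length (e.g., 5 soil layers and 5 patches), so the helper raises an error instead of guessing.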

The figure below illustrates how variables would be stored in the example above:

To find the polygon, site, and patch associated with any given cohort in the HDF5 output dataset, every ED2 output file comes with a set of indexing variables that identify the polygon, site, and patch to which each vector element corresponds.

| Indexing variable | Description |
|---|---|
| PYSI_ID | Index of the first site of each polygon |
| PYSI_N | Number of sites for each polygon |
| SIPA_ID | Index of the first patch of each site |
| SIPA_N | Number of patches for each site |
| PACO_ID | Index of the first cohort of each patch |
| PACO_N | Number of cohorts for each patch |
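
The following minimal Python sketch shows how these indexing variables could be used to recover the hierarchy for the hypothetical two-site example above. The SIPA_N and PACO_N values come from that example. The PACO_ID values, the assumption that indices are 1-based (Fortran convention), and the helper names are illustrative assumptions; in practice, the values would be read from the output file.

```python
def parent_index(counts):
    """For each child element, return the 0-based position of its parent."""
    parents = []
    for parent, n_children in enumerate(counts):
        parents.extend([parent] * n_children)
    return parents


def child_slice(first_id, count):
    """Slice selecting the children of one parent (assumes 1-based first_id)."""
    start = first_id - 1
    return slice(start, start + count)


# Values from the example: sites A and B; patches A1, A2, A3, B1, B2.
sipa_n = [3, 2]
paco_n = [7, 5, 0, 4, 2]
# Assumed 1-based first-cohort indices. The entry for the empty patch (A3)
# is a placeholder; it is never used because that patch has zero cohorts.
paco_id = [1, 8, 13, 13, 17]

patch_of_cohort = parent_index(paco_n)  # one entry per cohort (18 in total)
site_of_patch = parent_index(sipa_n)    # one entry per patch (5 in total)
site_of_cohort = [site_of_patch[p] for p in patch_of_cohort]

# Stand-in for a cohort-level vector such as MMEAN_LAI_CO.
cohort_values = list(range(18))
patch_b1 = cohort_values[child_slice(paco_id[3], paco_n[3])]
print(patch_of_cohort)
print(site_of_cohort)
print(patch_b1)
```

Expanding the counts into a per-cohort parent index makes it straightforward to aggregate cohort-level variables by patch or site, while the slice approach is convenient when only a single patch is of interest.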

Changing the default contents of output files

There are instances in which the user may need to add or remove variables from the typical output. Examples of such cases include:

  • Reducing the size of output files by retaining only variables that will be used.
  • Adding new variables to one of the output files, because the default settings do not include a variable deemed necessary.

In these cases, you may consider changing the default contents of output files. If you are not familiar with editing the ED2 code, we recommend opening an issue first, to make sure the changes are really needed.