The HDF5 Handler

1. Introduction

The HDF5 handler was originally implemented in 2001 to map HDF5 to DAP by following the HDF5 data model and the DAP2 protocol. Over time, NASA GES DISC and other NASA Earth data centers showed strong interest in using visualization clients that follow the CF conventions to access HDF5 data via Hyrax. Funded by the NASA ACCESS program, The HDF Group and the Hyrax team worked together in 2007 to map HDF5 to DAP2 by following the CF conventions. This enables CF-friendly visualization clients to access HDF5 data via Hyrax seamlessly. This "CF behavior" of the handler has been so widely used that Hyrax source and RPM distributions have provided it since around 2008. By changing a BES key value in the configuration file, Hyrax service customers can still switch back to the "basic behavior" implemented in 1999.

Since the "CF behavior" was first added to the handler, the handler’s option that generates "CF behavior" Hyrax responses has been called the CF option. The original way of generating Hyrax responses has been called the default option because it provides the general mapping from HDF5 to DAP. This document follows these two historical terms to distinguish the "CF behavior" from the non-CF "basic behavior".

Later, DAP4 was released and DAP4 support was added to the CF option of the handler. Many NASA HDF5, HDF-EOS5 and netCDF-4 products have also been generated since then. These new products have prompted continuous improvement and enhancement of the CF option so that CF-friendly visualization clients, such as Panoply, can visualize these files via Hyrax seamlessly. DAP4 support has also been added to the default option. Therefore, the HDF5 handler can generate four different DAP outputs.

Section Highlights summarizes these options. Sections CF Option for DAP4, CF Option for DAP2, Default Option for DAP4 and Default Option for DAP2 provide detailed information on each of the four options that generate DAP outputs.

Readers should be aware that the CF conventions continue to evolve and the HDF5 handler is not continuously updated to follow the latest version. For example, since version 1.8, the CF conventions have included groups, but neither the CF option for DAP4 nor the CF option for DAP2 supports the group hierarchy. As funding permits, the HDF5 handler may be updated to support newer components of the CF conventions. Currently, the HDF5 handler tries to follow CF conventions version 1.7 so that CF-friendly visualization clients can access NASA HDF5 files seamlessly. Hereafter in this document, the CF option means that the HDF5 handler tries to follow CF conventions version 1.7 to map HDF5 to DAP2 or DAP4.

Important
In this document, the CF option means following CF conventions version 1.7.

The HDF5 handler provides BES keys that let Hyrax data service customers customize the results and achieve better performance. Section BES Keys describes the most useful BES keys and, in particular, lists the Default BES Key Values used in the Hyrax source and RPM distributions. Section Limitations lists the limitations of the handler in the current release. Miscellaneous Information is provided at the end.

2. Highlights

2.1. CF for DAP4

By definition, the CF option means the handler follows the CF conventions to translate HDF5 to DAP. The CF option is enabled in the Hyrax source and RPM distributions because most NASA data centers use it. As shown in section Default BES Key Values, the default value of the BES key H5.EnableCF is true.

Key features for the CF option:

  • Following the CF naming conventions, only alphanumeric characters and the underscore (“_”) are allowed in variable and attribute names. Any character not allowed by the CF naming conventions is changed to an underscore (“_”).

  • There is no group hierarchy: HDF5 groups are flattened. In general, for any non-HDF-EOS5 file, a variable’s group path is prefixed to the variable name, and the leading “/” of the resulting name is removed. For the HDF-EOS5 variable naming rule, see section CF Name for DAP4.

    An example:
    HDF5 variable velocity.u under group geo-location
    becomes geo_location_velocity_u in the DAP output.
  • The handler follows the CF conventions to translate the dimensions and coordinate variables for HDF-EOS5, netCDF-4 and some NASA HDF5 products.

  • HDF5 integer, floating-point and string datatypes are mapped one-to-one to DAP4. Other datatypes are elided.

  • DAP4 coverage is supported.
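The flattening rule for non-HDF-EOS5 files can be sketched in a few lines of Python. This is a minimal illustration, not the handler’s code: `cf_flatten_name` is a hypothetical helper, and it assumes the rule is exactly "drop the leading / of the full path, then replace every character other than an ASCII letter, digit or underscore with _". The HDF-EOS5 naming rule and name-clash handling are not modeled.

```python
import re


def cf_flatten_name(full_path: str) -> str:
    """Hypothetical helper: flatten an HDF5 full path into a CF-style DAP name.

    Assumed rule: drop the leading "/" and replace every character that is not
    an ASCII letter, digit or underscore (including the "/" group separators)
    with "_". The HDF-EOS5 naming rule is not modeled here.
    """
    name = full_path[1:] if full_path.startswith("/") else full_path
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


print(cf_flatten_name("/geo-location/velocity.u"))  # geo_location_velocity_u
print(cf_flatten_name("/HDFEOS INFORMATION"))  # HDFEOS_INFORMATION
```

The second call reproduces the HDFEOS_INFORMATION attribute container name that appears in the CF DMR example for the group "/HDFEOS INFORMATION".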

A DMR example can be found in section CF DMR Example for DAP4.

2.2. CF for DAP2

The naming conventions and dimension/coordinate handling are the same as in the DAP4 implementation. CF-friendly visualization clients such as Panoply can visualize the HDF5 data via DAP2 successfully. Screenshots of NASA HDF5 example files accessed via Hyrax can be found at https://hdfeos.org/zoo/hdf5_handler/index.php.

However, due to DAP2 limitations, HDF5 64-bit integer variables and attributes are elided, and signed 8-bit integers are mapped to signed 16-bit integers because DAP2 has no signed 8-bit integer type. The handler does not generate DAP2 Grids. Instead, it follows the netCDF data model and uses shared dimensions for variables.

DDS and DAS examples can be found in section CF DDS and DAS Examples for DAP2.

2.3. Default for DAP4

To use this option, H5.EnableCF must be set to false in h5.conf. Note that Hyrax also lets sites customize the configuration with site.conf. For more information, see the site.conf documentation in the Hyrax User’s Guide.

Important
To obtain DAP4 output from the default option, H5.EnableCF must be set to false in h5.conf or in site.conf.

This option tries to map HDF5 to DAP4 in a general way. Unlike the CF option, it is not tuned to support NASA data products. Instead of flattening the group hierarchy, it preserves the HDF5 group hierarchy by mapping HDF5 groups to DAP4 groups.

Moreover, when the BES key H5.DefaultHandleDimension is also set to true or is not present in the configuration file, the HDF5 handler seamlessly translates the dimension names of netCDF-4 or netCDF-4-like files to DAP4, even though the HDF5 data model does not support netCDF-4 shared dimensions. If the original netCDF-4 or netCDF-4-like files follow the CF conventions, the DAP4 output should also follow the CF conventions while keeping the HDF5 group hierarchy.

In addition to mapping integer, string and floating-point data to DAP4, the default option also maps the HDF5 compound datatype, object references and region references to DAP4. A DMR example can be found in section Default Option: DMR Example.

2.4. Default for DAP2

To use this option, H5.EnableCF must be set to false in h5.conf. The BES key H5.DefaultHandleDimension has no effect on this option.

Important
To obtain DAP2 output from the default option, H5.EnableCF must be set to false in h5.conf or in site.conf.

HDF5 signed 8-bit integers map to signed 16-bit integers. 64-bit integers are elided.

The HDF5 group hierarchy information is kept in a special DAS container, HDF5_ROOT_GROUP. The full path of an HDF5 variable is kept as an attribute. DDS and DAS examples can be found in section Default Option: DDS and DAS Examples.

3. CF Option for DAP4

3.1. CF Name for DAP4

In addition to the general naming conventions described in section CF for DAP4, variable names in an HDF-EOS5 multi-grid/multi-swath/multi-zonal-average file have the corresponding grid/swath/zonal-average name prefixed to the field name. In an HDF-EOS5 file with a single grid/swath/zonal-average, variable names are just the corresponding field names; the grid/swath/zonal-average name is ignored.

The original name and the full path of an HDF5 variable are preserved as DAP4 attributes (origname and fullnamepath in the examples below). A BES key can be used to turn these attributes on or off; see section BES Keys for more information. Furthermore, for HDF-EOS5 products, the original dimension names associated with a variable are also preserved in a DAP4 attribute (orig_dimname_list). This is because HDF-EOS5 defines its own dimension names, and those names may be changed in the DAP4 output to follow the CF conventions.

Although it rarely happens in NASA HDF5 products, applying the CF naming conventions can cause two or more DAP4 variables mapped from HDF5 to share the same name, which causes an error. To avoid this, the handler can resolve such name clashes by adding a suffix like “_1” to the duplicated variable name. Since clashes are rare and keeping track of name status may be expensive, a BES key (H5.EnableCheckNameClashing, off by default) lets Hyrax service customers turn this feature on or off.
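The suffix-based clash resolution can be illustrated with a short Python sketch. The text only states that a suffix like “_1” is added; the counter scheme below ("_1", "_2", … tried until the name is unused) is an assumption, and `resolve_name_clashes` is a hypothetical helper, not the handler’s implementation.

```python
def resolve_name_clashes(names):
    """Hypothetical sketch: make a list of flattened names unique.

    A name already in use gets "_1", "_2", ... appended (assumed scheme)
    until the candidate is unused, so the result never contains duplicates.
    """
    used = set()
    result = []
    for name in names:
        candidate = name
        counter = 1
        while candidate in used:
            candidate = f"{name}_{counter}"
            counter += 1
        used.add(candidate)
        result.append(candidate)
    return result


# "/a-b/c", "/a.b/c" and "/a_b/c" all flatten to "a_b_c" under the CF naming rule.
print(resolve_name_clashes(["a_b_c", "a_b_c", "a_b_c"]))  # ['a_b_c', 'a_b_c_1', 'a_b_c_2']
```

Keeping a set of names already emitted is what makes the check cost grow with the number of variables, which is consistent with the text’s reason for leaving the feature off by default.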

3.2. CF Datatypes for DAP4

The following table lists the mapping from HDF5 to DAP4 for the CF option.

Table 1. HDF5 Datatype to DAP4 for CF Option

HDF5 data type            DAP4 data name    Notes
------------------------  ----------------  -----
8-bit unsigned integer    Byte
8-bit signed integer      Int8
16-bit unsigned integer   UInt16
16-bit signed integer     Int16
32-bit unsigned integer   UInt32
32-bit signed integer     Int32
64-bit unsigned integer   UInt64
64-bit signed integer     Int64
32-bit floating point     Float32
64-bit floating point     Float64
String                    String
Other datatypes           Not supported

The handler elides the mapping of the following datatypes: HDF5 compound, object and region references, variable length (excluding variable-length string), enum, opaque, bitfield and time.

3.3. CF BES Keys for DAP4

The following two BES keys should be set to true to map HDF5 to DAP4. In the current release, the handler treats both keys as true even if they are not present in the configuration file. For detailed descriptions of these two keys, see sections Keys for Both CF and Default Options and Keys for CF Option.

H5.EnableCF=true
H5.EnableCFDMR=true

The following BES keys are also important, either for performance or for correctly representing the coordinate variables. Hyrax service customers should carefully read the descriptions of these keys before changing their values. Detailed descriptions can be found in sections Keys for Both CF and Default Options and Keys for CF Option. Because some settings may change as the software improves, Hyrax service customers are encouraged to check the latest README and the comments in the HDF5 handler configuration file h5.conf.in on GitHub regularly.

H5.EnableDropLongString=true
H5.EnableAddPathAttrs=true
H5.ForceFlattenNDCoorAttr=true
H5.EnableCoorattrAddPath=true
H5.MetaDataMemCacheEntries=1000
H5.EnableEOSGeoCacheFile=false

More BES keys and their descriptions can also be found at section Keys for CF Option.

3.4. CF DMR Example for DAP4

An h5ls header of an HDF-EOS5 grid file grid_1_2d.h5 is as follows:

/                        Group
/HDFEOS                  Group
/HDFEOS/ADDITIONAL       Group
/HDFEOS/ADDITIONAL/FILE_ATTRIBUTES Group
/HDFEOS/GRIDS            Group
/HDFEOS/GRIDS/GeoGrid    Group
/HDFEOS/GRIDS/GeoGrid/Data\ Fields   Group
/HDFEOS/GRIDS/GeoGrid/Data\ Fields/temperature Dataset {4, 8}
    Attribute: units scalar
        Type:      1-byte null-terminated ASCII string
        Data:  "K"
/HDFEOS\ INFORMATION     Group
    Attribute: HDFEOSVersion scalar
        Type:      32-byte null-terminated ASCII string
        Data:  "HDFEOS_5.1.13"
/HDFEOS\ INFORMATION/StructMetadata.0 Dataset {SCALAR}

The corresponding DMR is:

<?xml version="1.0" encoding="ISO-8859-1"?>
<Dataset xmlns="http://xml.opendap.org/ns/DAP/4.0#" dapVersion="4.0" dmrVersion="1.0" name="grid_1_2d.h5">
    <Dimension name="lon" size="8"/>
    <Dimension name="lat" size="4"/>
    <Float32 name="lon">
        <Dim name="/lon"/>
        <Attribute name="units" type="String">
            <Value>degrees_east</Value>
        </Attribute>
    </Float32>
    <Float32 name="lat">
        <Dim name="/lat"/>
        <Attribute name="units" type="String">
            <Value>degrees_north</Value>
        </Attribute>
    </Float32>
    <Float32 name="temperature">
        <Dim name="/lat"/>
        <Dim name="/lon"/>
        <Attribute name="units" type="String">
            <Value>K</Value>
        </Attribute>
        <Attribute name="origname" type="String">
            <Value>temperature</Value>
        </Attribute>
        <Attribute name="fullnamepath" type="String">
            <Value>/HDFEOS/GRIDS/GeoGrid/Data Fields/temperature</Value>
        </Attribute>
        <Attribute name="orig_dimname_list" type="String">
            <Value>YDim XDim</Value>
        </Attribute>
        <Map name="/lat"/>
        <Map name="/lon"/>
    </Float32>
    <String name="StructMetadata_0">
        <Attribute name="origname" type="String">
            <Value>StructMetadata.0</Value>
        </Attribute>
        <Attribute name="fullnamepath" type="String">
            <Value>/HDFEOS INFORMATION/StructMetadata.0</Value>
        </Attribute>
    </String>
    <Attribute name="HDFEOS" type="Container"/>
    <Attribute name="HDFEOS_ADDITIONAL" type="Container"/>
    <Attribute name="HDFEOS_ADDITIONAL_FILE_ATTRIBUTES" type="Container"/>
    <Attribute name="HDFEOS_GRIDS" type="Container"/>
    <Attribute name="HDFEOS_GRIDS_GeoGrid" type="Container"/>
    <Attribute name="HDFEOS_GRIDS_GeoGrid_Data_Fields" type="Container"/>
    <Attribute name="HDFEOS_INFORMATION" type="Container">
        <Attribute name="HDFEOSVersion" type="String">
            <Value>HDFEOS_5.1.13</Value>
        </Attribute>
        <Attribute name="fullnamepath" type="String">
            <Value>/HDFEOS INFORMATION</Value>
        </Attribute>
    </Attribute>
</Dataset>

Note: The CF option retrieves the values of the coordinate variables and adds them to DAP4 as the variables lat and lon. The variable name StructMetadata.0 becomes StructMetadata_0, and the group hierarchy is flattened. Since this is a single HDF-EOS5 grid, only the original variable name is kept. Also, one can find

<Map name="/lat"/>
<Map name="/lon"/>

under the variable temperature. These Map elements represent the DAP4 coverage. The original full path of the variable temperature is stored in its fullnamepath attribute:

<Attribute name="fullnamepath" type="String">
    <Value>/HDFEOS/GRIDS/GeoGrid/Data Fields/temperature</Value>
</Attribute>

HDF5 group information maps to attribute containers such as:

<Attribute name="HDFEOS" type="Container"/>
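A client can read both the coverage Map elements and the fullnamepath attribute directly from the DMR. The sketch below uses Python’s standard xml.etree.ElementTree on a trimmed copy of the DMR above (XML declaration and other variables omitted); `describe_variable` is a hypothetical helper written for this illustration, not part of Hyrax or libdap.

```python
import xml.etree.ElementTree as ET

# Namespace used by the DMR documents in this section.
NS = {"d": "http://xml.opendap.org/ns/DAP/4.0#"}

# Trimmed copy of the CF DMR example above.
DMR = """<Dataset xmlns="http://xml.opendap.org/ns/DAP/4.0#" dapVersion="4.0" dmrVersion="1.0" name="grid_1_2d.h5">
    <Dimension name="lon" size="8"/>
    <Dimension name="lat" size="4"/>
    <Float32 name="temperature">
        <Dim name="/lat"/>
        <Dim name="/lon"/>
        <Attribute name="fullnamepath" type="String">
            <Value>/HDFEOS/GRIDS/GeoGrid/Data Fields/temperature</Value>
        </Attribute>
        <Map name="/lat"/>
        <Map name="/lon"/>
    </Float32>
</Dataset>"""


def describe_variable(dmr_text, var_name):
    """Return (coverage map names, original HDF5 full path) of a top-level variable."""
    root = ET.fromstring(dmr_text)
    for var in root:
        # Skip <Dimension> declarations, which can share a name with a variable.
        if var.get("name") == var_name and not var.tag.endswith("}Dimension"):
            maps = [m.get("name") for m in var.findall("d:Map", NS)]
            path = var.findtext(
                "d:Attribute[@name='fullnamepath']/d:Value", namespaces=NS)
            return maps, path
    raise KeyError(var_name)


maps, path = describe_variable(DMR, "temperature")
print(maps)  # ['/lat', '/lon']
print(path)  # /HDFEOS/GRIDS/GeoGrid/Data Fields/temperature
```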

4. CF Option for DAP2

4.1. CF Name for DAP2

The same as the CF option for DAP4. See section CF Name for DAP4.

4.2. CF Datatype for DAP2

The following table lists the mapping from HDF5 to DAP2 for the CF option.

Table 2. HDF5 Datatype to DAP2 for CF Option

HDF5 data type            DAP2 data name    Notes
------------------------  ----------------  -----
8-bit unsigned integer    Byte
8-bit signed integer      Int16             DAP2 does not have an 8-bit signed integer type, so HDF5 8-bit signed integers map to DAP2 16-bit signed integers.
16-bit unsigned integer   UInt16
16-bit signed integer     Int16
32-bit unsigned integer   UInt32
32-bit signed integer     Int32
64-bit unsigned integer   Not supported     DAP2 does not support the 64-bit unsigned integer type.
64-bit signed integer     Not supported     DAP2 does not support the 64-bit signed integer type.
32-bit floating point     Float32
64-bit floating point     Float64
String                    String
Other datatypes           Not supported

The handler elides the mapping of the following datatypes: HDF5 compound, variable length (excluding variable-length string), object and region references, enum, opaque, bitfield and time.
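The CF-option datatype tables for DAP4 and DAP2 differ only where DAP2 lacks a type. The Python sketch below encodes both tables as lookup dictionaries to make that difference explicit. The short keys such as "int8" (8-bit signed integer) are hypothetical labels chosen for this illustration, and `None` stands for "elided".

```python
from typing import Optional

# CF option, DAP4 (Table 1). Keys are hypothetical labels for HDF5 datatypes.
CF_DAP4 = {
    "uint8": "Byte", "int8": "Int8",
    "uint16": "UInt16", "int16": "Int16",
    "uint32": "UInt32", "int32": "Int32",
    "uint64": "UInt64", "int64": "Int64",
    "float32": "Float32", "float64": "Float64",
    "string": "String",
}

# CF option, DAP2 (Table 2): signed 8-bit widens to Int16, 64-bit integers are elided.
CF_DAP2 = {**CF_DAP4, "int8": "Int16"}
del CF_DAP2["uint64"], CF_DAP2["int64"]


def cf_dap_type(hdf5_type: str, dap_version: int) -> Optional[str]:
    """Return the DAP type name, or None if the handler elides the datatype."""
    table = CF_DAP4 if dap_version == 4 else CF_DAP2
    return table.get(hdf5_type)


print(cf_dap_type("int8", 4), cf_dap_type("int8", 2))  # Int8 Int16
print(cf_dap_type("int64", 2))  # None
print(cf_dap_type("compound", 4))  # None
```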

4.3. CF BES Keys for DAP2

The BES key information is the same as that described in section CF BES Keys for DAP4, except that the BES key H5.EnableCFDMR has no effect on the DAP2 mapping.

4.4. CF DDS and DAS Examples for DAP2

The layout of the HDF5 file is the same as the layout described in section CF DMR Example for DAP4.

The DDS is:

Dataset {
    Float32 temperature[lat = 4][lon = 8];
    String StructMetadata_0;
    Float32 lon[lon = 8];
    Float32 lat[lat = 4];
} grid_1_2d.h5;

The DAS is:

Attributes {
    HDFEOS {
    }
    HDFEOS_ADDITIONAL {
    }
    HDFEOS_ADDITIONAL_FILE_ATTRIBUTES {
    }
    HDFEOS_GRIDS {
    }
    HDFEOS_GRIDS_GeoGrid {
    }
    HDFEOS_GRIDS_GeoGrid_Data_Fields {
    }
    HDFEOS_INFORMATION {
        String HDFEOSVersion "HDFEOS_5.1.13";
        String fullnamepath "/HDFEOS INFORMATION";
    }
    temperature {
        String units "K";
        String origname "temperature";
        String fullnamepath "/HDFEOS/GRIDS/GeoGrid/Data Fields/temperature";
        String orig_dimname_list "YDim XDim";
    }
    StructMetadata_0 {
        String origname "StructMetadata.0";
        String fullnamepath "/HDFEOS INFORMATION/StructMetadata.0";
    }
    lon {
        String units "degrees_east";
    }
    lat {
        String units "degrees_north";
    }
}

The DDS and DAS shown in this example are equivalent to the DMR output in section CF DMR Example for DAP4 except that the DMR includes the DAP4 coverage information. However, if there are signed 8-bit integer or 64-bit integer variables in the HDF5 file, DAP4 DMR will show the exact datatype while DAP2 maps the signed 8-bit integer to 16-bit integer and elides the mapping of 64-bit integers.

5. Default Option for DAP4

5.1. Default Option: DAP4 Name

A number of non-alphanumeric characters (e.g., space, #, +, -) used in HDF5 object names are not allowed in the names of DAP objects, object components or in URLs. Libdap escapes these characters by replacing them with "%" followed by the hexadecimal value of their ASCII code. For example, "Raster Image #1" becomes "Raster%20Image%20%231". These translations should be transparent to users of the server (but they will be visible in the DMR and in any applications which use a client that does not translate the identifiers back to their original form).
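The %-escaping described above can be reproduced with a small Python sketch. The exact set of characters libdap leaves unescaped is not stated here, so the sketch assumes only ASCII letters, digits and "_" are safe (the DDS example below, for instance, keeps "/" in /compound, so the real set is larger). `escape_name` is a hypothetical helper; decoding uses the standard urllib.parse.unquote.

```python
import string
from urllib.parse import unquote

# Assumption: only these characters are left as-is; libdap's real set may differ.
SAFE = set(string.ascii_letters + string.digits + "_")


def escape_name(name: str) -> str:
    """Replace each unsafe byte with "%" plus its two-digit uppercase hex code."""
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in SAFE else f"%{byte:02X}")
    return "".join(out)


print(escape_name("Raster Image #1"))  # Raster%20Image%20%231
print(escape_name("Temperature (F)"))  # Temperature%20%28F%29
print(unquote("Raster%20Image%20%231"))  # Raster Image #1
```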

5.2. Default Option: DAP4 Datatype

The following table lists the mapping from HDF5 to DAP4 for the default option.

Table 3. HDF5 Datatype to DAP4 for Default Option

HDF5 data type            DAP4 data name    Notes
------------------------  ----------------  -----
8-bit unsigned integer    Byte
8-bit signed integer      Int8
16-bit unsigned integer   UInt16
16-bit signed integer     Int16
32-bit unsigned integer   UInt32
32-bit signed integer     Int32
64-bit unsigned integer   UInt64
64-bit signed integer     Int64
32-bit floating point     Float32
64-bit floating point     Float64
String                    String
Object/region reference   URL
Compound                  Structure         An HDF5 compound variable can be mapped to DAP4 only if its base members (excluding object/region references) can be mapped to DAP4.
Other datatypes           Not supported

The handler elides the mapping of the following datatypes: HDF5 variable length (excluding variable-length string), enum, opaque, bitfield and time.

5.3. Default Option: DAP4 BES Keys

The H5.EnableCF key must be set to false to obtain the DAP4 output for the default option. With this setting, netCDF-4-like dimensions are kept by following the netCDF data model when H5.DefaultHandleDimension is true or not present (see section Default for DAP4).

H5.EnableCF=false

5.4. Default Option: DMR Example

The ncdump header of the netCDF-4 file nc4_group_atomic.h5:

netcdf nc4_group_atomic {
dimensions:
	dim1 = 2 ;
variables:
	int dim1(dim1) ;
	float d1(dim1) ;

group: g1 {
  dimensions:
  	dim2 = 3 ;
  variables:
  	int dim2(dim2) ;
  	float d2(dim1, dim2) ;
  } // group g1
}

The corresponding DMR:

<?xml version="1.0" encoding="ISO-8859-1"?>
<Dataset xmlns="http://xml.opendap.org/ns/DAP/4.0#" dapVersion="4.0" dmrVersion="1.0" name="nc4_group_atomic.h5">
    <Dimension name="dim1" size="2"/>
    <Int32 name="dim1">
        <Dim name="/dim1"/>
    </Int32>
    <Float32 name="d1">
        <Dim name="/dim1"/>
    </Float32>
    <Group name="g1">
        <Dimension name="dim2" size="3"/>
        <Int32 name="dim2">
            <Dim name="/g1/dim2"/>
        </Int32>
        <Float32 name="d2">
            <Dim name="/dim1"/>
            <Dim name="/g1/dim2"/>
        </Float32>
    </Group>
</Dataset>

Note: Both the dimension names and the dimension sizes in the original netCDF-4 files are kept as well as the group hierarchy.
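In the DMR above, d2 inside group g1 refers to /dim1 (defined in the root group) and /g1/dim2 (defined in g1). Under the netCDF-4 data model, a dimension is visible in the group that defines it and in all descendant groups, so a dimension name can be resolved by searching from the current group upward to the root. The Python sketch below shows that lookup; `resolve_dim` and the `dims_by_group` dictionary layout are hypothetical and are not taken from the handler’s source.

```python
def resolve_dim(group: str, dim_name: str, dims_by_group: dict) -> str:
    """Return the fully qualified name of dim_name as seen from group.

    Searches the current group first, then each ancestor up to "/",
    following netCDF-4 dimension scoping.
    """
    path = group.rstrip("/") or "/"
    while True:
        if dim_name in dims_by_group.get(path, {}):
            return path.rstrip("/") + "/" + dim_name
        if path == "/":
            raise KeyError(f"dimension {dim_name!r} is not visible from {group!r}")
        path = path.rsplit("/", 1)[0] or "/"  # move to the parent group


# Dimensions of nc4_group_atomic.h5, keyed by defining group.
DIMS = {"/": {"dim1": 2}, "/g1": {"dim2": 3}}
print([resolve_dim("/g1", d, DIMS) for d in ("dim1", "dim2")])  # ['/dim1', '/g1/dim2']
```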

6. Default Option for DAP2

6.1. Default Option: DAP2 Name

Same as section Default Option: DAP4 Name.

6.2. Default Option: DAP2 Datatype

Table 4. HDF5 Datatype to DAP2 for Default Option

HDF5 data type            DAP2 data name    Notes
------------------------  ----------------  -----
8-bit unsigned integer    Byte
8-bit signed integer      Int16             DAP2 does not have an 8-bit signed integer type, so it maps to the 16-bit signed integer type.
16-bit unsigned integer   UInt16
16-bit signed integer     Int16
32-bit unsigned integer   UInt32
32-bit signed integer     Int32
64-bit unsigned integer   Not supported     DAP2 does not support the 64-bit unsigned integer type.
64-bit signed integer     Not supported     DAP2 does not support the 64-bit signed integer type.
32-bit floating point     Float32
64-bit floating point     Float64
String                    String
Object/region reference   URL
Compound                  Structure         An HDF5 compound variable can be mapped to DAP2 only if its base members (excluding object/region references) can be mapped to DAP2.
Other datatypes           Not supported

The handler elides the mapping of the following datatypes: HDF5 variable length (excluding variable-length string), enum, opaque, bitfield and time.

6.3. Default Option: DAP2 BES Keys

The H5.EnableCF key must be set to false to obtain the DAP2 output for the default option. Note that netCDF-4-like dimensions are NOT handled according to the netCDF data model.

H5.EnableCF=false

6.4. Default Option: DDS and DAS Examples

The h5ls header of the HDF5 file d_group.h5:

/                        Group
/a                       Group
/a/b                     Group
/a/b/c                   Group

Since this file has no variables, the DDS is empty. The corresponding DAS is:

Attributes {
    HDF5_ROOT_GROUP {
        a {
            b {
                c {
                }
            }
        }
    }
    /a/ {
        String HDF5_OBJ_FULLPATH "/a/";
    }
    /a/b/ {
        String HDF5_OBJ_FULLPATH "/a/b/";
    }
    /a/b/c/ {
        String HDF5_OBJ_FULLPATH "/a/b/c/";
    }
}

The attribute container HDF5_ROOT_GROUP preserves the information of the group hierarchy.
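The nested HDF5_ROOT_GROUP container can be derived from the list of group full paths. The Python sketch below builds a dictionary tree from paths such as "/a/b/" and renders it in the same indented DAS layout as the example. `group_tree` and `render_das` are hypothetical helpers for illustration; the per-group HDF5_OBJ_FULLPATH containers are not generated.

```python
def group_tree(group_paths):
    """Nest HDF5 group paths such as "/a/b/" into a dictionary tree."""
    root = {}
    for path in group_paths:
        node = root
        for part in path.strip("/").split("/"):
            if part:  # skip the empty component produced by the root "/"
                node = node.setdefault(part, {})
    return root


def render_das(tree, name="HDF5_ROOT_GROUP", indent=1):
    """Render the tree as nested, empty DAS attribute containers."""
    pad = "    " * indent
    lines = [f"{pad}{name} {{"]
    for child, subtree in tree.items():
        lines.extend(render_das(subtree, child, indent + 1))
    lines.append(f"{pad}}}")
    return lines


tree = group_tree(["/", "/a/", "/a/b/", "/a/b/c/"])
print(tree)  # {'a': {'b': {'c': {}}}}
print("\n".join(render_das(tree)))
```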

Another example shows an HDF5 dataset with an HDF5 compound datatype. The h5dump header of the HDF5 file d_compound.h5 is:

HDF5 "d_compound.h5" {
GROUP "/" {
   DATASET "compound" {
      DATATYPE  H5T_COMPOUND {
         H5T_STD_I32BE "Serial number";
         H5T_STRING {
            STRSIZE H5T_VARIABLE;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         } "Location";
         H5T_IEEE_F64BE "Temperature (F)";
         H5T_IEEE_F64BE "Pressure (inHg)";
      }
      DATASPACE  SIMPLE { ( 4 ) / ( 4 ) }
      ATTRIBUTE "value" {
         DATATYPE  H5T_COMPOUND {
            H5T_STD_I32BE "Serial number";
            H5T_STRING {
               STRSIZE H5T_VARIABLE;
               STRPAD H5T_STR_NULLTERM;
               CSET H5T_CSET_ASCII;
               CTYPE H5T_C_S1;
            } "Location";
            H5T_IEEE_F64BE "Temperature (F)";
            H5T_IEEE_F64BE "Pressure (inHg)";
         }
         DATASPACE  SIMPLE { ( 4 ) / ( 4 ) }
      }
   }
}

The corresponding DDS is:

Dataset {
    Structure {
        Int32 Serial%20number;
        String Location;
        Float64 Temperature%20%28F%29;
        Float64 Pressure%20%28inHg%29;
    } /compound[4];
} d_compound.h5;

Note that the HDF5 compound variable array /compound maps to a DAP array of Structure. The special characters in the member names of the compound datatype are escaped as described in section Default Option: DAP4 Name.

7. BES Keys

To support easy access to NASA HDF5/HDF-EOS5/netCDF-4 files via Hyrax, various performance and other optimization tuning options are provided to Hyrax service customers via BES keys. This section describes the critical BES keys. For a comprehensive description of all BES keys, see the HDF5 handler configuration file h5.conf.in on GitHub.

7.1. Keys for Both CF and Default Options

H5.EnableCF
  • default=true

  • When this key is set to true or is not present in the configuration file, the handler handles the HDF5 file by following the CF conventions. The handler is especially tuned to handle NASA HDF5/netCDF-4/HDF-EOS5 data products. For the tested NASA products, see [NASA Products supported and tested by the CF option of the Handler]. The key benefit of this option is to allow OPeNDAP visualization clients to display remote data seamlessly. Please visit here for details.

  • When this key is set to false, the handler handles the HDF5 file by following a generic mapping from HDF5 to DAP. If the HDF5 file is a netCDF-4/HDF5 file or follows the netCDF data model and the DAP4 DMR response is requested, the handler can map HDF5 to DAP4 by following the netCDF data model.

H5.MetaDataMemCacheEntries
  • default=1000

  • Setting H5.MetaDataMemCacheEntries to a value greater than zero enables caching of DDS, DAS and DMR responses in memory. Our performance study shows that turning on this key makes DDS, DAS and DMR responses much faster.

  • The cache uses an LRU policy for purging old entries. It starts purging its objects after the number of entries exceeds the number defined by this key.

  • One can tune its behavior by changing this value and the H5.CachePurgeLevel value below. Note that this feature is on by default. The default value is 1000.

H5.CachePurgeLevel
  • default=0.2

  • This key determines how much of the in-memory cache is removed when it is purged. The default value is 0.2. With the default value, it configures the software to remove the oldest 20% of items from the cache.
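The interaction of H5.MetaDataMemCacheEntries and H5.CachePurgeLevel can be sketched as an LRU cache in Python. This is a hypothetical model, not the handler’s C++ code: `MetadataCache` is an illustrative class, and it assumes that when the entry count exceeds the limit, the oldest `purge_level` fraction of the current entries (at least one) is removed.

```python
from collections import OrderedDict


class MetadataCache:
    """Hypothetical LRU cache for DDS/DAS/DMR responses.

    max_entries mirrors H5.MetaDataMemCacheEntries and purge_level mirrors
    H5.CachePurgeLevel. Assumption: once the entry count exceeds max_entries,
    the oldest purge_level fraction of current entries (at least one) is dropped.
    """

    def __init__(self, max_entries=1000, purge_level=0.2):
        self.max_entries = max_entries
        self.purge_level = purge_level
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.max_entries:
            n_purge = max(1, int(len(self._items) * self.purge_level))
            for _ in range(n_purge):
                self._items.popitem(last=False)  # evict least recently used

    def __len__(self):
        return len(self._items)


cache = MetadataCache(max_entries=10, purge_level=0.2)
for i in range(10):
    cache.put(f"file{i}.h5", f"DDS of file{i}.h5")
cache.get("file0.h5")  # file0 becomes the most recently used entry
cache.put("file10.h5", "DDS of file10.h5")  # 11 > 10: purge int(11 * 0.2) = 2 entries
print(len(cache))  # 9
print(cache.get("file1.h5"), cache.get("file2.h5"))  # None None
```

A larger purge level frees more room per purge (fewer purges) at the cost of evicting more responses that may be requested again.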

7.2. Keys for CF Option

Note the following keys only take effect when H5.EnableCF is set to true. Unless specifically mentioned, these keys apply to both DAP2 and DAP4.

H5.EnableCFDMR
  • default=true

  • When this key is set to true, the DAP4 DMR is generated directly rather than via DDS and DAS. With this feature on, the HDF5 signed 8-bit integer is mapped to DAP4 signed 8-bit integer and the HDF5 64-bit integer is mapped to the corresponding DAP4 integer.

  • If this key is set to false, the DMR is generated from the DDS and DAS, and signed 8-bit integers are mapped to signed 16-bit integers.

  • Note: Starting from 1.16.5, this key is set to true by default.

H5.EnableCoorattrAddPath
  • default=true

  • When this key is set to true, the group path contained in the "coordinates" attribute value of some general HDF5 products (e.g., ICESat-2 ATL03) is added and flattened. This makes the coordinate variable names stored in the "coordinates" attribute consistent with the flattened variable names in the DAP output.

H5.ForceFlattenNDCoorAttr
  • default=true

  • If this key is set to true, the handler tries to flatten the coordinate variable paths stored in the "coordinates" attribute. Currently, this key only takes effect for HDF5 files that follow the netCDF-4 data model when 2-D latitude/longitude fields are present.

H5.EnableDropLongString
  • default=true

  • If this key is set to true, under the conditions described below, the long string variables or attributes will be elided.

  • We have found that netCDF-Java has a string size limit (currently 32767). If an individual element of an HDF5 string dataset is larger than this limit, visualization tools (Panoply etc.) that depend on netCDF-Java may fail to open the HDF5 file. So this key is set to true to skip HDF5 strings whose size is greater than 32767. Users should set this key to false if the long string information is necessary or if visualization clients are not used.

  • NOTE: In the following two cases, the long string is not dropped, because the latest netCDF-Java handles them:

    1) The size of an HDF5 string attribute exceeds 32767.
    2) The total size of an HDF5 string dataset exceeds 32767, but no
       individual string element exceeds 32767.
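The drop rule for H5.EnableDropLongString, including the two exceptions above, can be expressed as a short Python predicate. This is a hypothetical sketch: `should_drop_string` is illustrative, and it assumes string size is measured in bytes of the UTF-8 encoded element.

```python
NETCDF_JAVA_STRING_LIMIT = 32767  # netCDF-Java limit quoted in the text


def should_drop_string(values, is_attribute, enable_drop_long_string=True):
    """Return True if a string variable would be elided under H5.EnableDropLongString.

    Attributes are never dropped, and a dataset is dropped only if some
    individual element exceeds the limit; a large total size alone is not enough.
    Assumption: size is the byte length of the UTF-8 encoded element.
    """
    if not enable_drop_long_string or is_attribute:
        return False
    return any(len(v.encode("utf-8")) > NETCDF_JAVA_STRING_LIMIT for v in values)


print(should_drop_string(["x" * 32768], is_attribute=False))  # True
print(should_drop_string(["x" * 32767] * 3, is_attribute=False))  # False (case 2)
print(should_drop_string(["x" * 40000], is_attribute=True))  # False (case 1)
```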
H5.EnableAddPathAttrs
  • default=true

  • When this key is set to true, the original path of the HDF5 group or variable is kept as an attribute. Users can set this key to false if they do not need the absolute paths of object names.

H5.EnableFillValueCheck
  • default=true

  • When this key is set to true, the handler checks whether the _FillValue attribute has the correct datatype and whether its value is inside the valid data range.

  • We have found that, for some NASA HDF5 products, the datatype of the _FillValue attribute occasionally differs from the datatype of the corresponding variable. This violates the CF conventions, so the handler changes the _FillValue datatype to match the variable datatype. However, the original _FillValue may fall outside the range of the variable datatype, as the following example illustrates.

    • The variable and the _FillValue are as follows:

      • variable datatype: unsigned char

      • _FillValue attribute datatype: signed char

      • _FillValue value: -127

    • NOTE: The _FillValue (-127) is outside the data range of unsigned char, since an unsigned char cannot be negative.

    • If such a case occurs, we believe it is a data producer’s mistake and the Hyrax service should return an error. The Hyrax data service center should report the issue back to the data producer. However, this may affect only one or two variables, and the data center may not want to stop the Hyrax service. This BES key therefore gives the data center the option to continue the service, possibly using NcML to patch the wrong _FillValue until the data producer corrects it in a new release.

    • By default, this key is set to true. If the _FillValue is outside the range of the variable type, Hyrax generates an error and the service stops.

    • To skip the _FillValue check, set this key to false. The service then runs normally, but the _FillValue of some variables may be wrong, which can cause issues on the client side.
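The range check controlled by H5.EnableFillValueCheck can be sketched in Python for integer types. This is a hypothetical illustration, not the handler’s code: `check_fill_value`, `FillValueError` and the short type labels are invented for this example, and only integer types are covered.

```python
# Value ranges of integer variable types (hypothetical short labels).
INT_RANGES = {
    "int8": (-128, 127), "uint8": (0, 255),
    "int16": (-32768, 32767), "uint16": (0, 65535),
    "int32": (-2**31, 2**31 - 1), "uint32": (0, 2**32 - 1),
}


class FillValueError(ValueError):
    """Raised when a _FillValue cannot be represented in the variable's type."""


def check_fill_value(var_type, fill_value, enable_fill_value_check=True):
    """Return fill_value if it fits var_type; otherwise raise when the check is on."""
    low, high = INT_RANGES[var_type]
    if low <= fill_value <= high:
        return fill_value
    if enable_fill_value_check:
        raise FillValueError(
            f"_FillValue {fill_value} is outside the {var_type} range [{low}, {high}]")
    return fill_value  # check disabled: pass the (possibly wrong) value through


print(check_fill_value("uint8", 255))  # 255
print(check_fill_value("uint8", -127, enable_fill_value_check=False))  # -127
try:
    check_fill_value("uint8", -127)  # the unsigned char example above
except FillValueError as err:
    print(err)
```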

H5.EnableDAP4Coverage
  • default=true

  • If this key is set to true, the handler adds the DAP4 coverage information to the DMR. This key only takes effect for DAP4 responses.

H5.EnableCheckNameClashing
  • default=false

  • When this key is set to true, the handler checks whether there are name clashes among variables and attributes. If a name clash occurs, the handler tries to resolve it by generating unique names for the clashing objects. We have not seen any name clashes among variables and attributes in NASA HDF5 and HDF-EOS5 products. In fact, unlike HDF4, name clashes are very rare for HDF5. So to reduce performance overhead, this key is set to false by default. Users can set this key to true if it becomes necessary.

H5.NoZeroSizeFullnameAttr
  • default=false

  • When this key is set to true, the fullnamepath attribute will NOT be added if the HDF5 variable data storage size is 0. This is necessary to generate correct HDF5 dmr++ files.

H5.EscapeUTF8Attr
  • default=true

  • When this key is set to true, the attribute values that use UTF-8 character encoding are escaped in the same way as values that use the ASCII encoding. To enable UTF-8 in attribute values, set this key to false.

H5.EnableDiskMetaDataCache
  • default=false

  • If this key is set to true, the DAS is cached in a file. Starting from the second request, the handler reads the DAS from the cached file instead of building it with the HDF5 library. Note that this key only takes effect for DAP2 responses.

  • Since Hyrax 1.15, the MetaData Store (MDS) has provided a similar feature. By default, this key is set to false. Users are encouraged to check whether turning this key on improves performance before setting it to true.

H5.EnableEOSGeoCacheFile
  • default=false

  • When this key is set to true, HDF-EOS5 Geolocation data is cached to a file.

  • The latitude and longitude of an HDF-EOS5 grid are calculated on the fly from the projection parameters stored in the HDF-EOS5 file, so the same latitude and longitude are recalculated each time the grid is fetched. When H5.EnableEOSGeoCacheFile is set to true, the calculated latitude and longitude are cached to two flat binary files, and starting with the second fetch they are read from the cached files. Several associated keys must be set correctly when this key is set to true.

    • The associated keys are:

      • H5.Cache.latlon.path - The full path of an existing directory with read and write permissions for the generated latitude and longitude cache files.

      • H5.Cache.latlon.prefix - A prefix for the cache file names. BES requires this key.

      • H5.Cache.latlon.size - The size of the cache in megabytes; the value must be greater than 0.

      • Example:

        H5.EnableEOSGeoCacheFile=true
        H5.Cache.latlon.path=/tmp/latlon
        H5.Cache.latlon.prefix=l
        H5.Cache.latlon.size=2000

  • NOTE: When Hyrax serves HDF-EOS5 level 3 Grid products, turning on this feature may greatly improve data access performance, so Hyrax service customers serving HDF-EOS5 level 3 products should take advantage of it. The key is set to false by default because turning the feature on requires several additional BES keys to be configured, which takes effort from service operators.

H5.EnableDiskDataCache
  • default=false

  • If this key is set to true, variable data are written to a binary file on the server, and starting with the second fetch the data are read from the cached file. Several associated keys must be set correctly when this key is set to true. The associated keys are:

    • H5.DiskCacheDataPath - The full path of an existing directory with read and write permissions for the generated variable cache files.

    • H5.DiskCacheFilePrefix - A prefix for the cache file names. BES requires this key.

    • H5.DiskCacheSize - The size of the cache in megabytes; the value must be greater than 0.

      • Example:

        H5.EnableDiskDataCache=true
        H5.DiskCacheDataPath=/tmp
        H5.DiskCacheSize=100000

H5.DiskCacheComp
  • default=true

  • This key and its associated keys let users fine-tune which data are cached on disk.

  • NOTE: This key will take effect only when the H5.EnableDiskDataCache key is set to true.

  • Users may not want to cache all variables, either because disk space is limited or because caching brings little performance gain for some variables. This key and the associated keys below help mitigate these issues.

    • If this key is set to true, only compressed HDF5 variables are cached. Retrieving the cached data of a compressed variable requires no decompression, so performance may improve.

    • When H5.DiskCacheComp is set to true, the following keys further limit which compressed variables are cached to disk:

      • H5.DiskCacheFloatOnlyComp: If this key is set to true, only floating-point compressed variables are cached.

      • H5.DiskCacheCompThreshold: To use this key, set its value to a floating-point number greater than 1.

        • The handler compares the compression ratio of a variable with this number and caches the variable only when its compression ratio is smaller than this number (that is, when the variable is hard to compress). Hard-to-compress variables usually take longer to decompress, so the disk cache may greatly reduce processing time.

      • H5.DiskCacheCompVarSize: The value of this key is a variable size in kilobytes and must be a positive integer.

        • A variable is cached only if its uncompressed size is greater than this value. For example, if the value is 100, only variables larger than 100 KB are cached.
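  • Example: a sketch that enables the disk data cache and restricts it to hard-to-compress, floating-point variables larger than 100 KB. All values, including the cache file prefix h5cache and the threshold 2.0, are illustrative assumptions rather than recommended settings:

        H5.EnableDiskDataCache=true
        H5.DiskCacheDataPath=/tmp
        # Illustrative prefix; BES requires a prefix to be set
        H5.DiskCacheFilePrefix=h5cache
        H5.DiskCacheSize=100000
        # Cache only compressed variables ...
        H5.DiskCacheComp=true
        # ... that are floating-point,
        H5.DiskCacheFloatOnlyComp=true
        # ... whose compression ratio is below 2.0,
        H5.DiskCacheCompThreshold=2.0
        # ... and whose uncompressed size exceeds 100 KB
        H5.DiskCacheCompVarSize=100

    With these values, a variable is cached only when all four conditions hold; every other variable is not cached.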

7.3. Keys for Default Option

H5.DefaultHandleDimension
  • default=true

  • When this key is set to true, the handler follows the netCDF-4 data model to handle the HDF5 dimensions if possible.

  • Note: this key only takes effect for DAP4 responses.

Important
The BES keys listed in Keys for CF Option have no effect when the default option is used.
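A minimal sketch of selecting the default option, assuming the keys are set in the BES configuration that the handler reads (for example, a site.conf override). Setting H5.EnableCF to false turns the CF option off, and H5.DefaultHandleDimension then applies to DAP4 responses:

H5.EnableCF=false
H5.DefaultHandleDimension=true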

7.4. Default BES Key Values

This is the default setting of the BES keys in Hyrax 1.16.5: even without setting any BES key values, the handler generates DAP2 or DAP4 output as if these values were set. The default setting may change as the software improves, so check the HDF5 handler configuration file h5.conf.in on GitHub for the current values.

H5.EnableCF=true
H5.EnableCFDMR=true
H5.ForceFlattenNDCoorAttr=true
H5.EnableCoorattrAddPath=true
H5.EnableDAP4Coverage=true
H5.EnableAddPathAttrs=true
H5.EnableDropLongString=true
H5.EnableFillValueCheck=true

H5.EscapeUTF8Attr=true
H5.EnableCheckNameClashing=false
H5.NoZeroSizeFullnameAttr=false
H5.RmConventionAttrPath=true
H5.KeepVarLeadingUnderscore=false
H5.CheckIgnoreObj=false

H5.EnablePassFileID=false
H5.MetaDataMemCacheEntries=1000

H5.EnableDiskMetaDataCache=false
H5.EnableDiskDataCache=false
H5.DiskCacheComp=false

H5.DisableStructMetaAttr=true
H5.DisableECSMetaAttr=false
H5.EnableEOSGeoCacheFile=false

8. Limitations

Unless otherwise noted, the limitations listed below apply to both DAP2 and DAP4.

CF option:

  • The mappings of the following datatypes are not supported:

    • variable length (excluding variable-length string), time, enum, bitfield, opaque, compound, array, and reference.

  • The HDF5 files containing cyclic groups are not supported.

  • The handler does not map HDF5 soft links, external links, or comments.

  • For DAP2 only, the mapping of HDF5 64-bit integer objects is not supported, and the HDF5 8-bit signed integer datatype is mapped to the DAP2 16-bit signed integer datatype, because DAP2 has no 64-bit integer or signed 8-bit integer types.

Default option:

  • An HDF5 object name containing a period (“.”) is not supported.

  • The mappings of the following datatypes are not supported:

    • variable length (excluding variable-length string), time, enum, bitfield, and opaque.

  • The HDF5 files containing cyclic groups are not supported.

  • The handler supports the mapping of soft links, but not of external links or comments.

  • DAP4 coverages and DAP2 Grids are not supported.

  • For DAP2 only, the mapping of HDF5 64-bit integer objects is not supported; the HDF5 8-bit signed integer datatype is mapped to DAP2 16-bit signed integer datatype.

9. Miscellaneous Information

9.1. NASA Products Supported and Tested by the CF option of the Handler

  • HDF-EOS5 products

    • HIRDLS, MLS, TES, OMI, MOPITT, LANCE AMSR_2, VIIRS, MEaSURES GSSTF

  • netCDF-4/HDF5 products

    • TROP-OMI, AirMSPI, OMPS-NPP, Arctas-CAR, many MEaSURES, Ocean color, GHRSST, ICESAT-2 ATL/MABEL/GLAH

  • HDF5 products

    • SMAP, GPM, OCO2/ACOS/GOSAT, Aquarius

Note
The HDF5 handler should support any netCDF-4/HDF5 or HDF-EOS5 product. The list above includes only the products the handler has been explicitly tested with.

9.2. Supporting netCDF-4 Products

Unless they are served by a customized service such as the NASA-Compliant General Application Platform (NGAP), netCDF-4 files with file name suffixes such as .nc or .nc4 are served by Hyrax’s netCDF handler by default. Unlike the HDF5 handler, the netCDF handler supports only the netCDF classic data model: the group hierarchy is elided, and datatypes not supported by the classic model are also elided.

One way to serve these netCDF-4 files with the HDF5 handler is to change the file name suffix to .h5 or to append .h5 to the file name. For example:

change the file name of a netCDF-4 file: foo.nc -> foo.h5
Or add the file name suffix .h5 to a netCDF-4 file: foo2.nc4 -> foo2.nc4.h5

A second way is to use Hyrax’s site.conf feature to create a customized configuration so that these netCDF-4 files are served by the HDF5 handler. See the Hyrax documentation on how to use site.conf.

9.3. Elided Object Check

The handler provides a way for Hyrax service customers to check and list the objects in a served HDF5 file that are not mapped to DAP2. The check is valid for the DAP2 service when the CF option is on, although most of the checks are also valid for the corresponding DAP4 service. It is useful for a Hyrax data distributor who wants to find the HDF5 objects that Hyrax does not support before serving the data.

Warning
This feature has not been extensively tested, and we welcome feedback.

To use this feature, set the following two BES keys:

H5.EnableCF=true
H5.CheckIgnoreObj=true

Then check the DAS output; it lists the HDF5 objects and attributes that are elided when mapping HDF5 to DAP2.

Important
After checking the information on ignored HDF5 objects and attributes, make sure to set H5.CheckIgnoreObj back to false:

H5.CheckIgnoreObj=false

9.4. Variable Aggregation and Attribute Modification with NcML handler

One can modify HDF5 attributes and aggregate HDF5 variables via the NcML handler. More information and examples on how to use the NcML handler can be found at http://hdfeos.org/examples/ncml.php and https://hdfeos.org/zoo/hdf5_handler/ncml_opendap.php.

10. Further Reading

The web page includes pointers to the demo page to access NASA HDF5 products as well as other older but useful documents.