Permalink
Cannot retrieve contributors at this time
This file describes the HDF5 handler developed by The HDF Group and OPeNDAP, | |
Inc. under a grant from NASA. For information about building the HDF5 handler, | |
see the INSTALL. | |
What is the HDF5 handler? | |
--------------------- | |
Hierarchical Data Format Version 5 (HDF5) is a general-purpose library and | |
file format for storing, managing, archiving, and exchanging scientific data. | |
The HDF5 data model includes two primary types of objects, a number of | |
supporting object types, and metadata describing how HDF5 files and objects | |
are to be organized and accessed. The HDF5 file format is self-describing in | |
the sense that the structures of HDF5 objects are described within the file. | |
The HDF5 handler is a Hyrax Back-end Server(BES) module that maps HDF5 | |
objects into OPeNDAP's DAP2 data model. This allows users to access the data | |
in remote HDF5 files using the OPeNDAP clients. There can be many ways to | |
serve HDF5 data but this handler differentiates itself by putting the goal of | |
following CF conventions to support NASA HDF5/HDF-EOS5 products first. The | |
HDF5 handler team strives to achieve what is called "CF-compliant" status of | |
all NASA HDF5 data products. This means that OPeNDAP visualization client | |
users should feel easy in accessing and visualizing the remote NASA | |
HDF5/HDF-EOS5 data products if NASA data centers provide the Hyrax OPeNDAP | |
services. | |
Since some NASA HDF5/HDF-EOS5 products either do not follow or only partially | |
follow CF conventions, the handler developers tried to make them be | |
CF-compliant so that OPeNDAP client tools can visualize these products. | |
This can be realized by the developers' knowledge and experiences as well | |
as intensive discussions with developers of the corresponding NASA data | |
centers. The output of this handler has been checked carefully with OPeNDAP | |
client tools such as IDV, Panoply, GrADS, Ferret, NCL, MATLAB, and IDL. | |
A comprehensive list of the improvements since the 2.0.0 release is | |
available on the next section. | |
What's new for Hyrax 1.16.3 | |
Default option: | |
1. Significantly enhance the handling of netCDF-4 like HDF5 files for the DAP4(DMR) response. | |
1) Correctly handle the pure netCDF-4 dimensions. | |
2) Remove the pre-defined netCDF-4 attributes. | |
3) Update the testsuite that tests NASA DAP4 response. | |
CF option: | |
1. ICESat-2 ATL03 and ATL08 support: | |
Add the group path and make the variable names follow the CF naming | |
conventions for "coordinates" attributes of ATL03-like variables. | |
2. Ensure the unsupported objects are removed in the DMR response. | |
3. Add the support of the new GPM DPR level 3 version product. | |
What's new for Hyrax 1.16.2 | |
CF option: | |
1. Support the correct generation of DMRPP files. | |
1) Add BES keys to turn off the mapping of 64-bit integer to DMR and | |
not to generate the fullnamepath attribute when the storage size is zero. | |
2) Ensure that the fullnamepath and origname attributes are not generated | |
when the corresonding BES keys are turned off. | |
2. Enhance the support to make general two dimensional latitude/longitude | |
HDF5 files follow CF so that NASA OMPS-NPP products can be supported. | |
3. Enhance the support of netCDF-4 like files. | |
What's new for Hyrax 1.16.1 | |
Performance Improvement for both CF and default options: | |
The Hyrax data access will not build the DAS if not necessary. | |
CF option: | |
1. Correct the unlimited dimension name for an HDF5 file that has both unlimited | |
dimensions and group hierarchy.GES DISC GPM level 3 belongs to this category. | |
2. Remove the netCDF-4 reserved NAME attributes from Hyrax DAS output since these | |
attributes will cause the the failure of the fileout netCDF-4 module. | |
3. Add a BES key so that the CF global attribute "Conventions" will not prepend | |
the HDF-EOS group path. | |
What's new for Hyrax 1.16.0 | |
CF option: | |
Fix a small memory leaking when handling OCO2 Lite product. | |
What's new for Hyrax 1.15.3 | |
The handler is updated to successfully compile and test with both HDF5 1.8.21 and HDF5 1.10.4. | |
What's new for Hyrax 1.15.0 | |
CF option: | |
1. Add the support of the HDF-EOS5 Polar Stereographic(PS) and Lambert Azimuthal Equal Area(LAMAZ) grid projection files. | |
Both projection files can be found in NASA LANCE products. | |
2. Add the HDF-EOS5 grid latitude and longitude cache support. This is the same as what we did for the HDF-EOS2 grid. | |
3. Add the support for TROP-OMI, new OMI level 2 and OMPS-NPP product. | |
4. Removed the internal reserved netCDF-4 attributes for DAP output. | |
5. Make the behavior of the drop long string BES key consistent with the current limitation of netCDF Java. | |
What's new for Hyrax 1.14.0 | |
CF option: | |
1. Add the support of Hybrid HDF-EOS5 products. An example is the NASA ASDC AirMSPI product. | |
(1) The HDF-EOS5 specified group path will be removed before flattening the variable name by following | |
the CF. | |
(2) If a hybrid HDF-EOS5 product follows the netCDF-4 data model, the handler will also follow | |
the netCDF-4 data model to map the HDF5 objects to DAP2. | |
(3) For a product that contains the "coordinate" attribute in a variable, the attribute value will | |
be adjusted to reflect the removal of the HDF-EOS5 specified group path. | |
(4) For a grid product that has CF grid_mapping information, the related grid_mapping information is | |
also adjusted to reflect the removal of the HDF-EOS5 specified group. | |
2. Enhance the HDF-EOS5 parser to support new OMI level 2 products. | |
What's new for Hyrax 1.13.4 | |
---------------------------- | |
CF option: | |
1. Add the disk cache support for raw data and DAS. | |
This support aims to improve the performance to access HDF5 data via Hyrax. | |
Since this support may vary from case to case, by default we turn it off. | |
(1) Users, who want to use the disk cache for raw data, should read the description of the following | |
BES keys at h5.conf under /etc/bes/modules and change the BES key values to fit for your own use | |
cases. | |
H5.DiskCacheComp=false | |
H5.DiskCacheFloatOnlyComp=true | |
H5.DiskCacheCompThreshold=2.0 | |
H5.DiskCacheCompVarSize=100 | |
(2) Users, who want to use the disk cache for DAS, should also go to the h5.conf file and change the | |
BES key H5.EnableDiskMetaDataCache to true. The appropriate path that stores the DAS cache file | |
should also be set with the BES key H5.DiskMetaDataCachePath. | |
2. Add the HDF-EOS5 sinussoidal projection support. | |
For the HDF-EOS5 sinusoidal projection, the latitude and longitude are calculated and CF grid projection | |
information is added. | |
***************************************************** | |
********Special note about the version number******** | |
***************************************************** | |
Since Hyrax 1.13.2, Hyrax just treats each handler as a module. | |
So We stop assigning an individual version number for the HDF5 handler. | |
What's new for version 2.3.3(Released with Hyrax 1.13.2) | |
---------------------------- | |
- The retrieval of BES key values are moved to the the constructor of the hdf5 handler to improve the | |
performance. | |
- The DAP metadata responses: DDS,DAS and DMR can be cached in memory to improve the performance. | |
CF option: | |
- Add the memory cache support to store data values of coordinate variables and specific data variables. | |
- Update the support of the fillvalue and the addition of new coordinate variables in the new GPM products. | |
- Fix a bug to identify variables latitude and longitude for SMAP-like products. | |
Default option: | |
- Add the mapping of root attributes to DAP4. | |
Note: | |
1) The description of memory cache feature can be found in h5.conf.in under | |
https://github.com/OPENDAP/hdf5_handler. | |
2) Since Hyrax 1.13.2 is an emergency release, the handler version is not bumped. | |
What's new for version 2.3.3(Released with Hyrax 1.13.1) | |
---------------------------- | |
CF option: | |
- HDF5 Scalar dataset reading | |
HDF5 scalar datasets with all atomic datatypes are supported. In previous versions, only string scalar | |
dataset is supported. | |
- Unlimited dimension | |
Now the client that understands the unlimited dimension can correctly retrieve this information. | |
- 0-size attribute | |
0-size attribute will be ignored. This case was not considered in the previous versions. | |
- empty array reading | |
Update the way to check if the array index is valid. This will ensure that the empty array reading | |
doesn't fail. | |
- _FillValue checking | |
Both _FillValue range and datatype are checked. | |
Sometimes the data producers will provide the wrong value and the wrong datatype. | |
In the previous versions, | |
the handler only corrects the datatype if the _FillValue type is not the same as the variable type. | |
What's new for version 2.3.2(Released with Hyrax 1.13.0) | |
---------------------------- | |
CF option: | |
- By default, the leading underscore of a variable path is removed for all files. Although not recommended, | |
Users can change the BES key H5.KeepVarLeadingUnderscore at h5.conf.in to be true for | |
backward compatibility if necessary. | |
- Significantly improve the support of generic HDF5 files that have 2-D lat/lon. This improvement makes | |
some SMAP level 1, level 3 and level 4 products plot-able by CF tools such as Panoply. | |
- Add the general support of netCDF-4-like HDF5 files that have 2-D lat/lon. This improvement makes TOMS | |
MEaSURES product plot-able by CF tools such as Panoply. It will also support future potential products | |
that follow netCDF generic data model. | |
What's new for version 2.3.1(Released with Hyrax 1.12.2) | |
---------------------------- | |
There are no new features added in this release. We improve the code quality by fixing a potential resource | |
leaking issue and other misc. issues. | |
What's new for version 2.3.0(Released with Hyrax 1.12.1) | |
---------------------------- | |
Default option: | |
- Add the pure DAP4 support. | |
a) HDF5 group is mapped to DAP4 group. | |
b) HDF5 dimensions that follow netCDF-4 data model are mapped to DAP4 dimensions. | |
c) HDF5 signed 8-bit integer, signed and unsigned 64-bit integers are mapped to correspondng DAP4 | |
datatypes. | |
- Re-implement the data access of DAP structure mapped from an HDF5 compound datatype dataset. | |
a) The nested compound datatype(array or scalar) and the array type inside a compound datatype are | |
supported. | |
b) The base datatype inside an HDF5 compound datatype can contain compound, array datatype and integer, | |
float and string(including variable length string)datatypes. Other HDF5 datatypes are not supported. | |
c) The base datatype of an array datatype inside an HDF5 compound datatype can be compound datatype | |
and integer, float and string(including variable length string). | |
- Re-implment the data access of a DAP string(array or scalar)mapped from an HDF5 variable length string | |
dataset. | |
- New enforced limitations | |
these limitations were not clearly stated in the previous versions. | |
a) We don't support the mapping of HDF5 array datatype to DAP except when the array datatype is used | |
inside an HDF5 compound datatype. | |
b) We don't support the mapping of HDF5 compound datatype to DAP when an attribute datatype is an HDF5 | |
compound. Such an HDF5 attribute is ignored in the DAP DAS. | |
CF option: | |
- Add an option to generate ignored object information from HDF5 to DAP2 mapping. | |
- Add the support of new GPM level 3 products. | |
- Add the support of OCO-2 products. | |
- Add the support of netCDF-4 classic-like HDF5 files that have 2-D lat/lon. This effectively supports | |
the ASF SeaSat product. | |
- Add the support of generic HDF5 files that have 1-D or 2-D lat/lon. This also generally supports the | |
LP DAAC ASTER GED product. | |
- Fix a few bugs related to _FillValue and duplicate coordiate variables discovered when testing with | |
OMI,GPM and Aquarius products. | |
[Known Issues] | |
- We found that the file netCDF module still fails to generate a netCDF-4 file when a DAP string array | |
is mapped to netCDF-4. It can generate the netCDF-3 files correctly. This is not a bug inside the HDF5 | |
handler or libdap and BES. | |
The detailed description of this issue can be found in OPeNDAP's trac ticket | |
https://opendap.atlassian.net/browse/TRAC-2185. | |
- We also found that "Get as NetCDF 4" function may not work with hdf5_handler especially when | |
you download the entire data on big HDF5 files without subsetting. One reason is that | |
the current CentOS 6 uses the old NetCDF-4 and HDF5 RPM packages. | |
Please contact RedHat directly to speed up the release of new RPM packages through EPEL. | |
What's new for version 2.2.3(Hyrax 1.11.2, 1.11.1, 1.11.0,1.10.1 1.10.0) | |
---------------------------- | |
For the CF option: | |
- Implement an option not to pass HDF5 file ID from DDS/DAS service to data service | |
since NcML may not work when the file ID is passed. | |
- Add support for several NASA HDF5 products: | |
GES DISC GPM level 1, level 2, level 3 DPR, level 3 GPROF, and level 3 IMERGE products | |
GES DISC some netCDF-4 like MEaSUREs products | |
OBPG level 3m HDF5 and MOPITT level 3 proudcts | |
- Performanc tuning: Add a BES option not to generate StructMetadata for HDF-EOS5-like files. | |
- Correct the values for the predefined attribute orig_dimname_list. | |
- Read the description of BES keys in the file h5.conf.in to see if the default values need | |
to be changed for your service. | |
[Known Issues] | |
- We found that the file netCDF module still fails to generate a netCDF-4 file when a DAP string array | |
is mapped to netCDF-4. It can generate the netCDF-3 files correctly. This is not a bug inside the HDF5 | |
handler or libdap or BES. | |
The detailed description of this issue can be found in OPeNDAP's trac ticket | |
https://opendap.atlassian.net/browse/TRAC-2185 | |
- We also found that "Get as NetCDF 4" function may not work with hdf5_handler especially when | |
you download the entire data on big HDF5 files without subsetting. One reason is that | |
the current CentOS 6 uses the old NetCDF-4 and HDF5 RPM packages. | |
Please contact RedHat directly to speed up the release of new RPM packages through EPEL. | |
What's new for version 2.2.2(Hyrax 1.9.7) | |
---------------------------- | |
For the CF option: | |
- Improve file I/O by reducing the number of HDF5 file open/close requests. | |
- Error handling is greatly improved: resources are released properly when errors occur. | |
For both CF and default options: | |
- Some memory leaks detected by valgrind are fixed. | |
What's new for version 2.2.1 | |
---------------------------- | |
Internal code improvements. | |
What's new for version 2.2.0 | |
---------------------------- | |
This version supports dimension scale and ICESat/GLAS product. It also fixes | |
a few bugs. Please see ChangeLog for details about bug fixes. | |
What's new for version 2.1.1 | |
---------------------------- | |
This version fixes a few bugs. It handles the concatenation of metadata files | |
in a format like "coremetadata.0" and "coremeata.0.1." In previous versions, | |
it handled only "coremetadata.x" format. It fixes a bug to access GESDISC | |
BUV Ozone files. | |
What's new for version 2.1.0 | |
---------------------------- | |
This version improves the performance to read HDF5 variables. The previous | |
assumption was that NASA files usually don't have many variables. Thus, to save | |
time of opening APIs and better coding for error handling, the handler just | |
held the API IDs and released them at last. However, GES DISC recently has | |
produced a file with more than 1000 objects and want it to be served by | |
OPeNDAP. It took much longer than expected. A thorough investigation | |
revealed that the retrieval of HDF5 objects was the performance bottle neck. | |
The new version addresses this issue by closing the HDF5 object IDs gradually. | |
Also, it fixes a bug that one HDF5 object API is not closed and leaked the | |
system resources. | |
A new BES key H5.DisableStructMetaAttr is added so the handler can skip | |
parsing StructMetadata and generating the attribute in DAS output for HDF-EOS5 | |
files. | |
What's new for version 2.0.0 | |
---------------------------- | |
This version has significant changes in handling NASA HDF5/HDF-EOS5 data | |
products. As the new major version number change indicates, the CF support | |
part of the handler is completely re-engineered. | |
Since the main effort of this version of the handler is to support the easy | |
access of most NASA HDF5/HDF-EOS5 data products by following the CF conventions, | |
the CF option of the HDF5 handler is turned on by default. | |
The --enable-cf configuration option is replaced with the BES key called | |
"H5.EnableCF". You can enable or disable CF feature of the HDF5 handler | |
dynamically by modifying the /etc/bes/modules/h5.conf configuration file | |
first and then restarting the BES server using "besctl restart". | |
The "IgnoreUnknownTypes" BES key is removed because the same functionality is | |
implemented with the new "EnableCF" key. We added three more keys and they | |
are explained in the NOTES section of INSTALL file. | |
The handler is tested with HDF5 version 1.8.8. We believe that the handler | |
should work with HDF5 1.8.5 and versions after. To achieve better performance, | |
we strongly suggest users to use the latest HDF5 release. See REQUIREMENTS | |
section of INSTALL file on how to get the latest RPMs of HDF5. | |
Supported NASA HDF5/HDF-EOS5 data products in CF Option in the current release | |
------------------------------------------------------------------------------- | |
AURA OMI/HIRDLS/MLS/TES | |
MEaSUREs SeaWiFS | |
MEaSUREs Ozone | |
Aquarius | |
GOSAT/acos | |
SMAP(simulation) | |
Please see the Limitation section below for special notes about OMI L2G and | |
GOSAT/acos products. We plan to add new NASA HDF5 and HDF-EOS5 products in the | |
future release. | |
Supported HDF5 data types for both CF and default options | |
--------------------------------------------------------- | |
NASA data products do not use all HDF5 datatypes provided by the HDF5 | |
library. Not all HDF5 datatypes can be mapped to DAP2 datatypes, either. | |
Thus, the HDF5 handler team focused on the most common HDF5 datatypes. | |
Generally, non-supported data types are ignored. | |
unsigned char, char, | |
unsigned 16-bit integer, 16-bit integer, | |
unsigned 32-bit integer, 32-bit integer, | |
32-bit and 64-bit floating data, | |
HDF5 string. | |
Supported HDF5 data types for the default option only | |
----------------------------------------------------- | |
Compounds: Compound data types are mapped into DAP2 Structure. | |
References: Object or regional references are mapped into URLs. | |
Other mapping information | |
------------------------- | |
CF option: | |
Group path: An HDF5 dataset's full path information can be found in | |
"fullnamepath" attribute. | |
Default option: | |
Group path: An HDF5 dataset's full path information can be found in | |
in "HDF5_OBJ_FULLPATH" attribute. | |
Group structure: Group structure, the relation among groups, is | |
mapped into a special attribute called "HDF5_ROOT_GROUP". | |
Soft/hard Links: Links are mapped to attributes in DAS. | |
Comments: Comments are mapped into DAS attributes. | |
Implementation details in general | |
---------------------------------- | |
The implementation largely follows the design. Please read the following | |
design note for details at | |
http://hdfeos.org/software/hdf5_handler/doc/Reengineering-HDF5-OPeNDAP-handler.pdf | |
Here are a few highlights for the implementation. | |
o The implementation of the CF option is separated from that of the | |
default option. | |
o The HDF5 1.8 APIs are used to retrieve HDF5 object information for | |
both the CF and the default options. | |
o The CF option only: | |
- HDF5 products are categorized and are separately handled except | |
for the modules that can be shared. One such example is the | |
module that makes the object names follow the CF name conventions. | |
- Translating metadata to DAP2 is separated from retrieving the | |
raw data. | |
- The handler provides an option to handle object name clashing. | |
- BES keys are used to replace the #ifdef macro. This makes the | |
code much cleaner and easier to maintain. | |
- The DAP2 variable and attribute names strictly follow the object | |
name conventions in the section 3.2.3 of the design note. | |
Implementation Details for HDF-EOS5 in CF option | |
------------------------------------------------ | |
Swath: Based on the dimension information specified in the | |
StructMetadata file, fake dimension variables are generated with | |
integer values. | |
Zonal Average: The current version only supports the zonal average | |
file augmented by the HDF-EOS5 augmentation tool since only the | |
augmented zonal average files are found among NASA HDF-EOS5 zonal | |
average products. Dimension variables are constructed based on the | |
augmentation information stored in the file. For more information | |
about the augmentation, please refer to the BACKGROUND section of | |
HDF-EOS5 augmentation tool page at | |
http://hdfeos.org/software/aug_hdfeos5.php | |
Grid: The fake dimension handling is the same as Swath. | |
In addition, based on the projection parameters specified in the | |
StructMetadata file, 1-D latitude and longitude arrays are | |
automatically computed and added in the DAP2 output. | |
Metadata: If metadata (e.g., StructMetadata or CoreMetadata) is | |
split and stored into multiple attributes (e.g., StructMetadata.0, | |
StructMetadata.1, ..., StructMetadata.n), they are merged into one | |
string and then parsed so that it can be represented in structured | |
attribute format in DAS output. | |
Testing the HDF5 handler | |
------------------------ | |
The handler source package has more than 50 test files under data/ directory. | |
If you build the handler from the source, the 'make check' command will test | |
both CF and default options using the test HDF5 files. The full C source codes | |
for generating the test HDF5 files are also available under data/src directory | |
although they are not compiled during the building or testing the handler. | |
Limitations for the CF option | |
----------------------------- | |
o Generally the mappings of 64-bit integer, time, enum, bitfield, | |
opaque, compound, array, and reference types are not supported. | |
The mapping of one HDF5 64-bit integer variable into two DAP2 32-bit | |
integers in GOSAT/acos is based on the discussions with the data | |
producers. Except one dimensional variable length string array, | |
the mapping of the variable length datatype is not supported either. | |
The handler simply ignores these unsupported datatypes. | |
o HDF5 files containing cyclic groups are not supported. | |
If such files are encountered, the handler hangs with infinite loops. | |
o The handler ignores soft links, external links and comments. | |
A hardlink is handled as an HDF5 object. | |
o For the HDF5 datasets created with the scalar dataspace, the handler | |
can only support the string datatypes. It ignores the datasets created | |
with other datatypes. HDF5 allows the size of a dimension to be 0 | |
(zero) for a dataspace. The handler also ignores the datasets created | |
with such dataspace. The mapping of any HDF5 datasets with NULL | |
dataspace is also ignored. | |
o Currently, GOSAT/acos and OMI level 2G products cannot be visualized | |
by OPeNDAP visualization tools because of the limitations of the | |
current CF conventions and netCDF-Java visualization tools (IDV, | |
Panoply, etc.) | |
o We found object reference attributes in several NASA products. | |
Since these attributes are only used to generate the DAP2 dimensions | |
and coordinate variables, ignoring the mapping of these attributes | |
doesn't lose any essential information for OPeNDAP users. | |
o fileout_netcdf prints H5Fclose() internal error message on CentOS6 | |
with Hyrax-1.8.8 and hdf5-1.8.5.patch1-7.el6.x86_64.rpm: | |
HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0: | |
#000: ../../src/H5F.c line 1957 in H5Fclose(): invalid file identifier | |
major: Invalid arguments to routine | |
minor: Inappropriate type | |
However, users can still get HDF5 as either NetCDF-3 or NetCDF-4 | |
successfully. We strongly recommend you to use the latest HDF5 and | |
NetCDF RPMs. | |
Limitations in default option | |
----------------------------- | |
o No support for HDF5 files that have a '.' in a group/dataset | |
name. | |
o The mappings of HDF5 64-bit integer, time, enum, bitfield, and | |
opaque datatypes are not supported. | |
o Except for one dimensional HDF5 variable length string array, HDF5 | |
variable length datatype is not supported either. | |
o HDF5 external links are ignored. The mapping of HDF5 objects with | |
NULL dataspace is not supported. | |
Additional background on the HDF5 handler | |
----------------------------------------- | |
The HDF5 handler is one component of the Hyrax BES; the Hyrax BES | |
software is designed to allow any number of handlers to be | |
configured easily. See the BES Server README and INSTALL files for | |
information about configuration, including how to use this handler. | |
Installing the HDF5 handler in Hyrax | |
------------------------------------ | |
The Linux RPM package will install h5.conf file with all options true except | |
for the H5.EnableCheckNameClashing option. | |
A test HDF5 file is also installed, so after installing this handler, Hyrax | |
will have a data to serve providing an easy way to test your new | |
installation and to see how a working handler should look. To use | |
this, make sure that you first install the BES, and that dap-server | |
gets installed too. | |
Finally, every time you install or reinstall handlers, make sure to | |
restart the BES and OLFS. | |
Muqun Yang (myang6@hdfgroup.org) | |
Hyo-Kyung Lee (hyoklee@hdfgroup.org) |