This file describes the HDF5 handler developed by The HDF Group and OPeNDAP,
Inc. under a grant from NASA. For information about building the HDF5 handler,
see the INSTALL file.
What is the HDF5 handler?
---------------------
Hierarchical Data Format Version 5 (HDF5) is a general-purpose library and
file format for storing, managing, archiving, and exchanging scientific data.
The HDF5 data model includes two primary types of objects, a number of
supporting object types, and metadata describing how HDF5 files and objects
are to be organized and accessed. The HDF5 file format is self-describing in
the sense that the structures of HDF5 objects are described within the file.
The HDF5 handler is a Hyrax Back-End Server (BES) module that maps HDF5
objects into OPeNDAP's DAP2 data model. This allows users to access data in
remote HDF5 files with OPeNDAP clients. There are many ways to serve HDF5
data, but this handler differentiates itself by making CF-convention support
for NASA HDF5/HDF-EOS5 products its primary goal. The HDF5 handler team
strives to achieve "CF-compliant" status for all NASA HDF5 data products,
so that users of OPeNDAP visualization clients can easily access and
visualize remote NASA HDF5/HDF-EOS5 data products when NASA data centers
provide Hyrax OPeNDAP services.
Since some NASA HDF5/HDF-EOS5 products either do not follow or only partially
follow the CF conventions, the handler makes them CF-compliant so that
OPeNDAP client tools can visualize them. This is achieved through the
developers' knowledge and experience, as well as intensive discussions with
the developers at the corresponding NASA data centers. The output of this
handler has been checked carefully with OPeNDAP client tools such as IDV,
Panoply, GrADS, Ferret, NCL, MATLAB, and IDL.
A comprehensive list of the improvements since the 2.0.0 release is
given in the next section.
What's new for version 2.3.3 (Released with Hyrax 1.13.2)
----------------------------
- The retrieval of BES key values is moved to the constructor of the HDF5 handler to improve
performance.
- The DAP metadata responses (DDS, DAS, and DMR) can be cached in memory to improve performance.
CF option:
- Add memory-cache support to store the data values of coordinate variables and selected data variables.
- Update the handling of the fill value and the addition of new coordinate variables in the new GPM products.
- Fix a bug in identifying latitude and longitude variables for SMAP-like products.
Default option:
- Add the mapping of root attributes to DAP4.
Note:
1) The memory cache feature is described in h5.conf.in under https://github.com/OPENDAP/hdf5_handler.
2) Since Hyrax 1.13.2 is an emergency release, the handler version is not bumped.
What's new for version 2.3.3 (Released with Hyrax 1.13.1)
----------------------------
CF option:
- HDF5 scalar dataset reading
All atomic HDF5 scalar datasets are supported. In previous versions, only string scalar datasets were supported.
- Unlimited dimension
Clients that understand unlimited dimensions can now correctly retrieve this information.
- 0-size attribute
Zero-size attributes are ignored. This case was not handled in previous versions.
- Empty array reading
Update the array-index validity check so that reading an empty array does not fail.
- _FillValue checking
Both the _FillValue range and datatype are now checked.
Data producers sometimes provide the wrong value and the wrong datatype. In previous versions,
the handler only corrected the datatype when the _FillValue type differed from the variable type.
What's new for version 2.3.2 (Released with Hyrax 1.13.0)
----------------------------
CF option:
- By default, the leading underscore of a variable path is removed for all files. Although not recommended,
users can set the BES key H5.KeepVarLeadingUnderscore in h5.conf.in to true for backward compatibility if necessary.
- Significantly improve the support of generic HDF5 files that have 2-D lat/lon. This makes some SMAP level 1, level 3,
and level 4 products plottable by CF tools such as Panoply.
- Add general support for netCDF-4-like HDF5 files that have 2-D lat/lon. This makes the TOMS MEaSUREs product plottable
by CF tools such as Panoply, and it will also support potential future products that follow the generic netCDF data model.
What's new for version 2.3.1 (Released with Hyrax 1.12.2)
----------------------------
There are no new features in this release. We improved code quality by fixing
a potential resource leak and other miscellaneous issues.
What's new for version 2.3.0 (Released with Hyrax 1.12.1)
----------------------------
Default option:
- Add pure DAP4 support.
a) An HDF5 group is mapped to a DAP4 group.
b) HDF5 dimensions that follow the netCDF-4 data model are mapped to DAP4 dimensions.
c) HDF5 signed 8-bit integers and signed and unsigned 64-bit integers are mapped to the corresponding DAP4 datatypes.
- Re-implement the data access of a DAP structure mapped from an HDF5 compound datatype dataset.
a) Nested compound datatypes (array or scalar) and array types inside a compound datatype are supported.
b) The base datatype inside an HDF5 compound datatype can be a compound or array datatype, or an integer, float, or string (including variable-length string) datatype. Other HDF5 datatypes are not supported.
c) The base datatype of an array datatype inside an HDF5 compound datatype can be a compound datatype or an integer, float, or string (including variable-length string).
- Re-implement the data access of a DAP string (array or scalar) mapped from an HDF5 variable-length string dataset.
- Newly enforced limitations
These limitations were not clearly stated in previous versions.
a) The mapping of the HDF5 array datatype to DAP is not supported, except when the array datatype is used inside an HDF5 compound datatype.
b) The mapping of the HDF5 compound datatype to DAP is not supported when an attribute's datatype is an HDF5 compound. Such an HDF5 attribute is ignored in the DAP DAS.
CF option:
- Add an option to report objects that are ignored in the HDF5 to DAP2 mapping.
- Add support for new GPM level 3 products.
- Add support for OCO-2 products.
- Add support for netCDF-4 classic-like HDF5 files that have 2-D lat/lon. This effectively supports the ASF SeaSat product.
- Add support for generic HDF5 files that have 1-D or 2-D lat/lon. This also generally supports the LP DAAC ASTER GED product.
- Fix a few bugs related to _FillValue and duplicate coordinate variables discovered when testing with OMI, GPM, and Aquarius products.
[Known Issues]
- The fileout netCDF module still fails to generate a netCDF-4 file when a DAP string array
is mapped to netCDF-4. It generates netCDF-3 files correctly. This is not a bug in the HDF5 handler,
libdap, or the BES.
A detailed description of this issue can be found in OPeNDAP's trac ticket
http://scm.opendap.org/trac/ticket/2189 .
- The "Get as NetCDF 4" function may not work with the HDF5 handler, especially when
you download all of the data in a big HDF5 file without subsetting. One reason is that
the current CentOS 6 uses old netCDF-4 and HDF5 RPM packages.
Please contact Red Hat directly to speed up the release of new RPM packages through EPEL.
What's new for version 2.2.3 (Hyrax 1.11.2, 1.11.1, 1.11.0, 1.10.1, 1.10.0)
----------------------------
For the CF option:
- Implement an option not to pass the HDF5 file ID from the DDS/DAS service to the data service,
since NcML may not work when the file ID is passed.
- Add support for several NASA HDF5 products:
GES DISC GPM level 1, level 2, level 3 DPR, level 3 GPROF, and level 3 IMERG products;
some GES DISC netCDF-4-like MEaSUREs products;
OBPG level 3m HDF5 and MOPITT level 3 products.
- Performance tuning: add a BES option not to generate StructMetadata for HDF-EOS5-like files.
- Correct the values of the predefined attribute orig_dimname_list.
- Read the description of BES keys in the file h5.conf.in to see if the default values need
to be changed for your service.
[Known Issues]
- The known issues listed under version 2.3.0 above also apply to this release.
What's new for version 2.2.2 (Hyrax 1.9.7)
----------------------------
For the CF option:
- Improve file I/O by reducing the number of HDF5 file open/close
requests.
- Error handling is greatly improved: resources are released properly when
errors occur.
For both CF and default options:
- Some memory leaks detected by valgrind are fixed.
What's new for version 2.2.1
----------------------------
Internal code improvements.
What's new for version 2.2.0
----------------------------
This version supports dimension scales and the ICESat/GLAS product. It also
fixes a few bugs; please see the ChangeLog for details about the bug fixes.
What's new for version 2.1.1
----------------------------
This version fixes a few bugs. It handles the concatenation of metadata files
named in formats like "coremetadata.0" and "coremetadata.0.1"; previous
versions handled only the "coremetadata.x" format. It also fixes a bug in
accessing GES DISC BUV Ozone files.
What's new for version 2.1.0
----------------------------
This version improves the performance of reading HDF5 variables. The previous
assumption was that NASA files usually don't have many variables, so to save
the cost of repeated open calls and to simplify error handling, the handler
held the HDF5 object IDs and released them all at the end. However, GES DISC
recently produced a file with more than 1000 objects and wanted it served by
OPeNDAP, and serving it took much longer than expected. A thorough
investigation revealed that the retrieval of HDF5 objects was the performance
bottleneck. The new version addresses this issue by closing HDF5 object IDs
as soon as they are no longer needed.
It also fixes a bug in which one HDF5 object ID was not closed, leaking
system resources.
A new BES key H5.DisableStructMetaAttr is added so the handler can skip
parsing StructMetadata and generating the attribute in DAS output for HDF-EOS5
files.
What's new for version 2.0.0
----------------------------
This version has significant changes in handling NASA HDF5/HDF-EOS5 data
products. As the new major version number change indicates, the CF support
part of the handler is completely re-engineered.
Since the main effort of this version of the handler is to support the easy
access of most NASA HDF5/HDF-EOS5 data products by following the CF conventions,
the CF option of the HDF5 handler is turned on by default.
The --enable-cf configuration option is replaced with the BES key called
"H5.EnableCF". You can enable or disable the CF feature of the HDF5 handler
by modifying the /etc/bes/modules/h5.conf configuration file and then
restarting the BES server with "besctl restart".
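As an illustration, a minimal h5.conf fragment might look like the following. The key values shown here are hypothetical choices for a CF-oriented service, not recommended defaults; the key names and their meanings are documented in h5.conf.in and the INSTALL file.

```
# /etc/bes/modules/h5.conf (fragment)
H5.EnableCF = true
H5.EnableCheckNameClashing = true
H5.KeepVarLeadingUnderscore = false
H5.DisableStructMetaAttr = false
```

After editing the file, restart the BES with "besctl restart" so the new key values take effect.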
The "IgnoreUnknownTypes" BES key is removed because the same functionality is
provided by the new "H5.EnableCF" key. We added three more keys; they are
explained in the NOTES section of the INSTALL file.
The handler is tested with HDF5 version 1.8.8. We believe the handler should
work with HDF5 1.8.5 and later. For better performance, we strongly suggest
using the latest HDF5 release. See the REQUIREMENTS section of the INSTALL
file for how to get the latest HDF5 RPMs.
Supported NASA HDF5/HDF-EOS5 data products with the CF option in the current release
-------------------------------------------------------------------------------
AURA OMI/HIRDLS/MLS/TES
MEaSUREs SeaWiFS
MEaSUREs Ozone
Aquarius
GOSAT/acos
SMAP (simulation)
Please see the Limitations section below for special notes about the OMI L2G
and GOSAT/acos products. We plan to add support for more NASA HDF5 and
HDF-EOS5 products in future releases.
Supported HDF5 data types for both CF and default options
---------------------------------------------------------
NASA data products do not use all of the HDF5 datatypes provided by the HDF5
library, and not all HDF5 datatypes can be mapped to DAP2 datatypes. Thus,
the HDF5 handler team focused on the most common HDF5 datatypes; unsupported
datatypes are generally ignored. The supported datatypes are:
unsigned char, char,
unsigned 16-bit integer, 16-bit integer,
unsigned 32-bit integer, 32-bit integer,
32-bit and 64-bit floating point,
HDF5 string.
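As an illustrative sketch (not the handler's code), the list above implies an HDF5-to-DAP2 atomic type mapping along these lines. The treatment of signed char is an assumption, since DAP2's only 8-bit type, Byte, is unsigned:

```python
# Illustrative sketch of the HDF5 -> DAP2 atomic type mapping implied by
# the supported-type list above. This is not the handler's source code.
H5_TO_DAP2 = {
    "uint8":   "Byte",
    "int8":    "Int16",    # assumption: DAP2 has no signed 8-bit type
    "uint16":  "UInt16",
    "int16":   "Int16",
    "uint32":  "UInt32",
    "int32":   "Int32",
    "float32": "Float32",
    "float64": "Float64",
    "string":  "String",
}

def dap2_type(h5_type):
    """Return the DAP2 type name for a supported HDF5 atomic type.

    Unsupported datatypes return None, mirroring the handler's behavior
    of simply ignoring them.
    """
    return H5_TO_DAP2.get(h5_type)
```

For example, a 64-bit integer variable has no DAP2 counterpart, so the lookup returns None and the variable would be skipped.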
Supported HDF5 data types for the default option only
-----------------------------------------------------
Compounds: Compound datatypes are mapped to DAP2 Structures.
References: Object and region references are mapped to URLs.
Other mapping information
-------------------------
CF option:
Group path: An HDF5 dataset's full path can be found in the
"fullnamepath" attribute.
Default option:
Group path: An HDF5 dataset's full path can be found in the
"HDF5_OBJ_FULLPATH" attribute.
Group structure: The group structure (the relation among groups) is
mapped to a special attribute called "HDF5_ROOT_GROUP".
Soft/hard links: Links are mapped to attributes in the DAS.
Comments: Comments are mapped to DAS attributes.
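For example, under the CF option a DAS response might contain a fragment like the following; the variable name and path here are invented for illustration:

```
Attributes {
    temperature {
        String fullnamepath "/HDFEOS/GRIDS/SomeGrid/Data Fields/temperature";
    }
}
```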
Implementation details in general
----------------------------------
The implementation largely follows the design. For details, please read the
design note at
http://hdfeos.org/software/hdf5_handler/doc/Reengineering-HDF5-OPeNDAP-handler.pdf
Here are a few highlights for the implementation.
o The implementation of the CF option is separated from that of the
default option.
o The HDF5 1.8 APIs are used to retrieve HDF5 object information for
both the CF and the default options.
o The CF option only:
- HDF5 products are categorized and are separately handled except
for the modules that can be shared. One such example is the
module that makes the object names follow the CF name conventions.
- Translating metadata to DAP2 is separated from retrieving the
raw data.
- The handler provides an option to handle object name clashing.
- BES keys are used to replace the #ifdef macro. This makes the
code much cleaner and easier to maintain.
- The DAP2 variable and attribute names strictly follow the object
name conventions in section 3.2.3 of the design note.
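A minimal sketch of this kind of name normalization might look like the code below. This is an assumption about the general approach (DAP2/CF names are typically restricted to letters, digits, and underscores), not the handler's exact algorithm from section 3.2.3:

```python
import re

def cf_name(name):
    """Sketch of CF-style name normalization (hypothetical, not the
    handler's actual code): replace characters outside [A-Za-z0-9_]
    with '_', and prefix '_' when the name does not start with a
    letter or underscore."""
    safe = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if not re.match(r"[A-Za-z_]", safe):
        safe = "_" + safe
    return safe
```

A separate pass would then resolve any name clashes this substitution creates, which is what the H5.EnableCheckNameClashing option controls.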
Implementation Details for HDF-EOS5 in CF option
------------------------------------------------
Swath: Based on the dimension information specified in the
StructMetadata file, fake dimension variables are generated with
integer values.
Zonal Average: The current version only supports the zonal average
file augmented by the HDF-EOS5 augmentation tool since only the
augmented zonal average files are found among NASA HDF-EOS5 zonal
average products. Dimension variables are constructed based on the
augmentation information stored in the file. For more information
about the augmentation, please refer to the BACKGROUND section of
HDF-EOS5 augmentation tool page at
http://hdfeos.org/software/aug_hdfeos5.php
Grid: The fake dimension handling is the same as Swath.
In addition, based on the projection parameters specified in the
StructMetadata file, 1-D latitude and longitude arrays are
automatically computed and added in the DAP2 output.
Metadata: If metadata (e.g., StructMetadata or CoreMetadata) is
split and stored in multiple attributes (e.g., StructMetadata.0,
StructMetadata.1, ..., StructMetadata.n), the pieces are merged into
one string and then parsed so that the metadata can be represented in
structured attribute form in the DAS output.
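The merging step described above can be sketched as follows. This is an illustrative sketch for the simple "name.n" case, not the handler's actual code, and it assumes the attribute values are already available as strings:

```python
def merge_split_metadata(attrs):
    """Merge attributes named StructMetadata.0, StructMetadata.1, ...
    into one string, concatenated in numeric suffix order.

    `attrs` is a dict of attribute name -> string value. Attributes
    whose names do not start with "StructMetadata." are ignored.
    """
    pieces = {k: v for k, v in attrs.items()
              if k.startswith("StructMetadata.")}
    # Sort numerically so StructMetadata.10 comes after StructMetadata.2.
    ordered = sorted(pieces, key=lambda k: int(k.rsplit(".", 1)[1]))
    return "".join(pieces[k] for k in ordered)
```

The merged string would then be fed to the ODL/StructMetadata parser to produce the structured DAS attributes.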
Testing the HDF5 handler
------------------------
The handler source package has more than 50 test files under the data/
directory. If you build the handler from source, the 'make check' command
will test both the CF and default options using the test HDF5 files. The
full C source code for generating the test HDF5 files is also available
under the data/src directory, although it is not compiled when building or
testing the handler.
Limitations for the CF option
-----------------------------
o Generally, the mappings of 64-bit integer, time, enum, bitfield,
opaque, compound, array, and reference types are not supported.
The mapping of one HDF5 64-bit integer variable to two DAP2 32-bit
integers in GOSAT/acos is based on discussions with the data
producers. Except for one-dimensional variable-length string arrays,
the mapping of variable-length datatypes is not supported either.
The handler simply ignores these unsupported datatypes.
o HDF5 files containing cyclic groups are not supported.
If such a file is encountered, the handler hangs in an infinite loop.
o The handler ignores soft links, external links, and comments.
A hard link is handled as an HDF5 object.
o For HDF5 datasets created with a scalar dataspace, the handler
only supports string datatypes; it ignores scalar datasets of other
datatypes. HDF5 allows the size of a dimension of a dataspace to be
0 (zero); the handler also ignores datasets created with such a
dataspace. The mapping of any HDF5 dataset with a NULL dataspace is
likewise ignored.
o Currently, GOSAT/acos and OMI level 2G products cannot be visualized
by OPeNDAP visualization tools because of limitations in the
current CF conventions and the netCDF-Java visualization tools (IDV,
Panoply, etc.).
o We found object reference attributes in several NASA products.
Since these attributes are only used to generate the DAP2 dimensions
and coordinate variables, ignoring the mapping of these attributes
does not lose any essential information for OPeNDAP users.
o fileout_netcdf prints an H5Fclose() internal error message on CentOS 6
with Hyrax 1.8.8 and hdf5-1.8.5.patch1-7.el6.x86_64.rpm:
HDF5-DIAG: Error detected in HDF5 (1.8.5-patch1) thread 0:
#000: ../../src/H5F.c line 1957 in H5Fclose(): invalid file identifier
major: Invalid arguments to routine
minor: Inappropriate type
However, users can still get the HDF5 data as either netCDF-3 or netCDF-4
successfully. We strongly recommend using the latest HDF5 and netCDF RPMs.
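The cyclic-group limitation listed above (the handler hangs in an infinite loop) can be avoided in principle by tracking visited groups during traversal. The sketch below illustrates the idea on a hypothetical group hierarchy given as a dict; it is not the handler's code, and real HDF5 cycle detection would compare object addresses rather than names:

```python
def walk_groups(name, children, visited=None):
    """Depth-first walk of a group hierarchy given as a dict
    {group_name: [child group names]}. A visited set stops the walk
    from descending into a group twice, so cyclic links terminate."""
    if visited is None:
        visited = set()
    if name in visited:
        return []              # cycle detected: stop descending
    visited.add(name)
    names = [name]
    for child in children.get(name, []):
        names.extend(walk_groups(child, children, visited))
    return names
```

With the cyclic hierarchy {"/": ["a"], "a": ["/"]}, the walk visits each group exactly once instead of looping forever.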
Limitations in default option
-----------------------------
o No support for HDF5 files that have a '.' in a group/dataset
name.
o The mappings of HDF5 64-bit integer, time, enum, bitfield, and
opaque datatypes are not supported.
o Except for one-dimensional HDF5 variable-length string arrays, the
HDF5 variable-length datatype is not supported either.
o HDF5 external links are ignored. The mapping of HDF5 objects with
NULL dataspace is not supported.
Additional background on the HDF5 handler
-----------------------------------------
The HDF5 handler is one component of the Hyrax BES; the Hyrax BES
software is designed to allow any number of handlers to be
configured easily. See the BES Server README and INSTALL files for
information about configuration, including how to use this handler.
Installing the HDF5 handler in Hyrax
------------------------------------
The Linux RPM package will install the h5.conf file with all options set to
true except for the H5.EnableCheckNameClashing option.
A test HDF5 file is also installed, so after installing this handler, Hyrax
will have data to serve, providing an easy way to test your new
installation and to see how a working handler should look. To use
this, make sure that you first install the BES and that dap-server
is installed too.
Finally, every time you install or reinstall handlers, make sure to
restart the BES and OLFS.
Muqun Yang (myang6@hdfgroup.org)
Hyo-Kyung Lee (hyoklee@hdfgroup.org)