HDF Reader (#2934)
* Copy icebridge files over to hdf plugin

* WIP
Print out info from hard-coded autzen. Remove test points in test file. Iterate over dimensions and print size, offset, type and endianness

* Remove unused

* Compiles!

* Reads!

* Fix offset calculation

* Fix hard coded struct size

* Fix hard coded number of points

* Remove hard coded dataset name, add dataset selection from stage options

* Add case for non-compound datatypes

* Fix compound case

* Remove code for compound types

* Add ability to name dataset

* Cleanup

* Fix type bug

* Reads one chunk - wip

* WIP

* No longer segfaults - wip

* Reads an entire array

* Add JSON

* Read dataset and dims from JSON map

* WIP

* Remove unused

* Cleanup

* Fix key-value misordering

* Works for multiple dims

* Use proper loggers

* Add simple readers.hdf test based on autzen

* Make chunkSize a property of dimension, rather than file

* Clean up loop

* Fix indexing bug

* Init buffers in prepare() instead of read()

* Streams!!!

* Fix sign-compare warning

* Add json validation

* Check that all datasets are the same length

* Update test data file with metadata

* Add HDF Reader docs WIP

* Add check for null map

* Add HDF plugin to AZP

* Connor's changes

* Stuff WIP

* Review changes

* Feedback

* Address feedback

* Add more test cases

* Random chunk sizes, check GPSTime

* CMake feedback: Cleanup, remove LibXML2

* Get rid of compiler warnings

* Address doc feedback

* Cleanup

* Step 1

* Step 2

* Remove BufferInfo; return pointer to val instead of pointer to buffer

* Remove unused

* Get rid of m_buffers

* Style

* Get rid of all parallel vectors

* Don't open dataset on each point. Fixes 20x performance regression introduced by last commit

* Use range based for-loops

* Remove unused

* Move Handler class into hdf5 namespace

* Fix outdated error

* Separate initialization code into constructor

* Cleanup

* Make getValue a method of DimInfo instead of Handler

* Styling

* Remove unused

* Use fewer auto types

* Change shared_ptr to unique_ptr

* Fix macOS compiler warning

Co-authored-by: Howard Butler <howard@hobu.co>
Ryan Pals and hobu committed Feb 21, 2020
1 parent b199b90 commit 366bd3b
Showing 15 changed files with 821 additions and 0 deletions.
5 changes: 5 additions & 0 deletions cmake/options.cmake
@@ -27,6 +27,11 @@ option(BUILD_PLUGIN_ICEBRIDGE
add_feature_info("Icebridge plugin" BUILD_PLUGIN_ICEBRIDGE
    "read data in the Icebridge format")

option(BUILD_PLUGIN_HDF
    "Choose if HDF support should be built" FALSE)
add_feature_info("HDF plugin" BUILD_PLUGIN_HDF
    "read data in the HDF format")

option(BUILD_PLUGIN_MATLAB
    "Choose if Matlab support should be built" FALSE)
add_feature_info("Matlab plugin" BUILD_PLUGIN_MATLAB
112 changes: 112 additions & 0 deletions doc/stages/readers.hdf.rst
@@ -0,0 +1,112 @@
.. _readers.hdf:

readers.hdf
===============

The **HDF reader** reads data from files in the
`HDF5 format <https://www.hdfgroup.org/solutions/hdf5/>`_.
You must explicitly specify a mapping of HDF datasets to PDAL
dimensions using the ``dimensions`` parameter. All datasets must be
one-dimensional arrays of scalar values with the same length.
Compound types are not supported at this time.


.. plugin::

.. streamable::

Example
-------
This example reads from the Autzen HDF example file with all dimensions
properly mapped, then outputs a LAS file.

.. code-block:: json

    [
        {
            "type": "readers.hdf",
            "filename": "test/data/hdf/autzen.h5",
            "dimensions":
            {
                "X" : "autzen/X",
                "Y" : "autzen/Y",
                "Z" : "autzen/Z",
                "Red" : "autzen/Red",
                "Blue" : "autzen/Blue",
                "Green" : "autzen/Green",
                "Classification" : "autzen/Classification",
                "EdgeOfFlightLine" : "autzen/EdgeOfFlightLine",
                "GpsTime" : "autzen/GpsTime",
                "Intensity" : "autzen/Intensity",
                "NumberOfReturns" : "autzen/NumberOfReturns",
                "PointSourceId" : "autzen/PointSourceId",
                "ReturnNumber" : "autzen/ReturnNumber",
                "ScanAngleRank" : "autzen/ScanAngleRank",
                "ScanDirectionFlag" : "autzen/ScanDirectionFlag",
                "UserData" : "autzen/UserData"
            }
        },
        {
            "type" : "writers.las",
            "filename": "output.las",
            "scale_x": 1.0e-5,
            "scale_y": 1.0e-5,
            "scale_z": 1.0e-5,
            "offset_x": "auto",
            "offset_y": "auto",
            "offset_z": "auto"
        }
    ]

.. note::
    All dimensions must be simple numeric HDF datasets with
    equal lengths. Compound types, enum types, string types,
    etc. are not supported.
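
Whether a candidate dataset meets these constraints can be checked up
front with the HDF5 C++ API. The following is a minimal sketch, not part
of the reader itself; the file name and dataset path are placeholders.

.. code-block:: cpp

    #include <H5Cpp.h>
    #include <iostream>

    int main()
    {
        // Open the file read-only and inspect one candidate dataset.
        H5::H5File file("autzen.h5", H5F_ACC_RDONLY);
        H5::DataSet dset = file.openDataSet("autzen/X");

        // The reader accepts only integer and float element types...
        H5T_class_t cls = dset.getDataType().getClass();
        if (cls != H5T_INTEGER && cls != H5T_FLOAT)
            std::cerr << "unsupported element type\n";

        // ...stored as one-dimensional arrays.
        if (dset.getSpace().getSimpleExtentNdims() != 1)
            std::cerr << "dataset is not one-dimensional\n";
        return 0;
    }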


.. warning::
    The HDF reader does not set an SRS.


Common Use Cases
----------------

A possible use case for this driver is reading NASA's ICESAT2 data.
This example reads the X, Y, and Z coordinates from the ICESAT2
ATL03 format and converts them into a LAS file.

.. note::
    ICESAT2 data use EPSG:7912.

.. code-block:: json

    [
        {
            "type": "readers.hdf",
            "filename": "ATL03_20190906201911_10800413_002_01.h5",
            "dimensions":
            {
                "X" : "gt1l/heights/lon_ph",
                "Y" : "gt1l/heights/lat_ph",
                "Z" : "gt1l/heights/h_ph"
            }
        },
        {
            "type" : "writers.las",
            "filename": "output.las"
        }
    ]

`ICESAT2 Data products Documentation <https://icesat-2.gsfc.nasa.gov/science/data-products>`_

Options
-------

.. include:: reader_opts.rst

dimensions
    A JSON map with PDAL dimension names as the keys and HDF dataset
    paths as the values.

4 changes: 4 additions & 0 deletions plugins/CMakeLists.txt
@@ -23,6 +23,10 @@ if(BUILD_PLUGIN_ICEBRIDGE)
    add_subdirectory(icebridge)
endif()

if(BUILD_PLUGIN_HDF)
    add_subdirectory(hdf)
endif()

if(BUILD_PLUGIN_MATLAB)
    add_subdirectory(matlab)
endif()
29 changes: 29 additions & 0 deletions plugins/hdf/CMakeLists.txt
@@ -0,0 +1,29 @@
#
# HDF plugin CMake configuration
#

include (${PDAL_CMAKE_DIR}/hdf5.cmake)


if (NOT PDAL_HAVE_HDF5)
    message(FATAL_ERROR "HDF5 library is required to build HDF support.")
endif()

PDAL_ADD_PLUGIN(libname reader hdf
    FILES
        io/HdfReader.cpp
        io/Hdf5Handler.cpp
    LINK_WITH
        ${HDF5_LIBRARIES}
    INCLUDES
        ${ROOT_DIR}
        ${NLOHMANN_INCLUDE_DIR}
)
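
# PDAL_ADD_PLUGIN populates ${libname} with the generated plugin
# target's name; the test below links against it.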

if (WITH_TESTS)
    PDAL_ADD_TEST(pdal_io_hdf_reader_test
        FILES test/HdfReadertest.cpp
        LINK_WITH ${libname}
        INCLUDES
            ${NLOHMANN_INCLUDE_DIR})
endif()
199 changes: 199 additions & 0 deletions plugins/hdf/io/Hdf5Handler.cpp
@@ -0,0 +1,199 @@
/******************************************************************************
* Copyright (c) 2014, Connor Manning, connor@hobu.co
*
* All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following
* conditions are met:
*
* * Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in
* the documentation and/or other materials provided
* with the distribution.
* * Neither the name of Hobu, Inc. or Flaxen Geo Consulting nor the
* names of its contributors may be used to endorse or promote
* products derived from this software without specific prior
* written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
* "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
* COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
* OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
* AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
* OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
* OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
* OF SUCH DAMAGE.
****************************************************************************/

#include "Hdf5Handler.hpp"
#include <pdal/util/FileUtils.hpp>
#include <pdal/pdal_types.hpp>
#include <pdal/Dimension.hpp>

namespace pdal
{

using namespace hdf5;

void Handler::setLog(pdal::LogPtr log) {
    m_logger = log;
}


void Handler::initialize(
    const std::string& filename,
    const std::map<std::string, std::string>& map)
{
    try
    {
        m_h5File.reset(new H5::H5File(filename, H5F_ACC_RDONLY));
    }
    catch (const H5::FileIException&)
    {
        throw pdal_error("Could not open HDF5 file '" + filename + "'.");
    }

    // Create our vector of dimensions and associated data
    for (auto const& entry : map) {
        std::string const& dimName = entry.first;
        std::string const& datasetName = entry.second;
        m_dimInfos.emplace_back(dimName, datasetName, m_h5File.get());
    }

    // Check that all dimensions have equal lengths
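    // (m_dimInfos.at(0) will throw std::out_of_range if the map was
    // empty; the reader is expected to reject an empty map before
    // calling initialize().)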
    m_numPoints = m_dimInfos.at(0).getNumPoints();
    for (DimInfo& info : m_dimInfos) {
        if (m_numPoints != info.getNumPoints()) {
            throw pdal_error("All given datasets must have the same length");
        }
    }
}


void Handler::close()
{
    m_h5File->close();
}


uint8_t *DimInfo::getValue(pdal::point_count_t pointIndex) {
    if (pointIndex < chunkLowerBound || pointIndex >= chunkUpperBound) {
        // load new chunk
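        // (the requested point lies outside the cached window, so refill
        // m_buffer with the chunk-aligned slab that contains pointIndex)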
        H5::DataSpace dspace = m_dset.getSpace();

        chunkLowerBound = (pointIndex / m_chunkSize) * m_chunkSize;
        chunkUpperBound = std::min(chunkLowerBound + m_chunkSize, m_numPoints);

        hsize_t selectionSize = chunkUpperBound - chunkLowerBound;

        H5::DataSpace memspace(1, &selectionSize);
        dspace.selectHyperslab(H5S_SELECT_SET, &selectionSize, &chunkLowerBound);
        m_dset.read(m_buffer.data(),
            m_dset.getDataType(),
            memspace,
            dspace);
    }
    hsize_t pointOffsetWithinChunk = pointIndex - chunkLowerBound;
    return m_buffer.data() + pointOffsetWithinChunk * m_size;
}


hsize_t Handler::getNumPoints() const
{
    return m_numPoints;
}

DimInfo::DimInfo(
    const std::string& dimName,
    const std::string& datasetName,
    H5::H5File *file)
    : m_name(dimName)
    , m_dset(file->openDataSet(datasetName))
{
    // openDataSet() will throw if the dataset doesn't exist, with an
    // adequate error message.
    H5::DataSpace dspace = m_dset.getSpace();

    // Sanity check before we cast from signed to unsigned
    if (dspace.getSelectNpoints() < 0)
        throw pdal_error("Selection had a negative number of points. "
            "This should never happen, and it's probably a PDAL bug.");
    m_numPoints = (hsize_t) dspace.getSelectNpoints();

    // Check whether the dataset is 'chunked'
    H5::DSetCreatPropList plist = m_dset.getCreatePlist();
    if (plist.getLayout() == H5D_CHUNKED) {
        int dimensionality = plist.getChunk(1, &m_chunkSize); // modifies m_chunkSize
        if (dimensionality != 1)
            throw pdal_error("Only 1-dimensional arrays are supported.");
    } else {
        // If the dataset is not chunked, pick an arbitrary buffer size.
        m_chunkSize = 1024; // completely arbitrary number
    }

    // Populate fields based on the HDF type
    H5T_class_t vague_type = m_dset.getDataType().getClass();
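
    // PDAL encodes a scalar dimension type as its base type (signed,
    // unsigned, or floating) OR'd with its size in bytes, which is why
    // the HDF byte size can be combined directly below.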

    if (vague_type == H5T_INTEGER) {
        H5::IntType int_type = m_dset.getIntType();
        H5T_sign_t sign = int_type.getSign();
        m_size = int_type.getSize();
        if (sign == H5T_SGN_2)
            m_pdalType = Dimension::Type(
                unsigned(Dimension::BaseType::Signed) | int_type.getSize());
        else
            m_pdalType = Dimension::Type(
                unsigned(Dimension::BaseType::Unsigned) | int_type.getSize());
    }
    else if (vague_type == H5T_FLOAT) {
        H5::FloatType float_type = m_dset.getFloatType();
        m_size = float_type.getSize();
        m_pdalType = Dimension::Type(
            unsigned(Dimension::BaseType::Floating) | float_type.getSize());
    }
    else {
        throw pdal_error("Dataset '" + datasetName + "' has an "
            "unsupported type. Only integer and float types are supported.");
    }

    // Allocate the buffer that getValue() reads chunks into.
    m_buffer.resize(m_chunkSize * m_size);
}


std::vector<pdal::hdf5::DimInfo>& Handler::getDimensions() {
    return m_dimInfos;
}


void DimInfo::setId(Dimension::Id id) {
    m_pdalId = id;
}


Dimension::Id DimInfo::getId() {
    return m_pdalId;
}


Dimension::Type DimInfo::getPdalType() {
    return m_pdalType;
}


std::string DimInfo::getName() {
    return m_name;
}


hsize_t DimInfo::getNumPoints() {
    return m_numPoints;
}

} // namespace pdal
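
The reader stage that drives this handler (io/HdfReader.cpp) is part of
the commit but not shown above. A minimal sketch of how a consumer might
use the Handler API, assuming only what is visible in this file; the
paths and dimension names are placeholders:

    // Hypothetical consumer, modeled on what io/HdfReader.cpp would do.
    #include "Hdf5Handler.hpp"

    #include <map>
    #include <string>

    using namespace pdal;

    void readAllPoints()
    {
        hdf5::Handler handler;
        std::map<std::string, std::string> dims {
            { "X", "autzen/X" },
            { "Y", "autzen/Y" },
            { "Z", "autzen/Z" }
        };
        handler.initialize("test/data/hdf/autzen.h5", dims);

        // One getValue() call per dimension per point. Each DimInfo
        // caches one chunk at a time, so sequential access stays cheap.
        for (point_count_t i = 0; i < handler.getNumPoints(); ++i)
            for (hdf5::DimInfo& info : handler.getDimensions())
            {
                uint8_t *raw = info.getValue(i);
                // A real reader would copy these bytes into the point
                // field identified by getId() and getPdalType().
                (void)raw;
            }

        handler.close();
    }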
