Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TileDB support - https://www.tiledb.io #1402

Merged
merged 27 commits into from
Apr 12, 2019
Merged

Conversation

normanb
Copy link
Contributor

@normanb normanb commented Mar 29, 2019

What does this PR do?

This PR adds support for TileDB. https://www.tiledb.io

In particular it add simple copying of subdatasets from HDF5 and netCDF, by using the -SDS parameter.

e.g.

gdal_translate -OF TileDB -SDS DeepBlue-SeaWiFS-1.0_L3_20100101_v004-20130604T131317Z.h5 DeepBlue

The code also supports the GDAL API between regular rasters.

There are two items of particular interest;

Subdataset support in the TileDB driver is quite simple, I hack this by creating the XML tree on Create. This is currently overwritten by gdalinfo -stats, once we approve the metadata approach I will fix this bug.

Secondly, I copy a block of a dataset with multiple subdatasets. I would like to use the swath calculation code in rasterio, can this be public rather than a static internal function?

What are related issues/pull requests?

Tasklist

  • Add test case(s)
  • Review
  • Adjust for comments
  • All CI builds and checks have passed

@normanb
Copy link
Contributor Author

normanb commented Mar 29, 2019

Some of travis builds are failing and yet the tiledb driver isn't enabled in travis. Is this a known problem or should I look into it some more?

@rouault
Copy link
Member

rouault commented Mar 29, 2019

should I look into it some more?

There are some code analysis being done even for uncompiled code, like no tabs, cppcheck, etc

@normanb normanb mentioned this pull request Apr 1, 2019
@normanb
Copy link
Contributor Author

normanb commented Apr 3, 2019

I am not seeing errors due to tabs/cppcheck, they are related to pytest errors with the GTIFF driver.

@rouault
Copy link
Member

rouault commented Apr 4, 2019

I am not seeing errors due to tabs/cppcheck, they are related to pytest errors with the GTIFF driver.

You have to check all targets. They don't do all the same checks.

Checking for tabulation characters...
frmts/tiledb/tiledbdataset.cpp
FAIL: tabulations detected. Please remove them!

I let you check other ones

@rouault
Copy link
Member

rouault commented Apr 6, 2019

Very partial review (I didn't have a look at tiledbdataset.cpp)...

  • DeepBlue-SeaWiFS-1.0_L3_20100101_v004-20130604T131317Z.h5 is a bit big. We try to keep files in autotest at max a few hundred KB each
  • the mechanism to copy all subdatasets is a bit hacky (admitedly the subdataset concept in GDAL is a bit of a hack itself), in particular the use of bStrict in gdal_translate_lib.cpp. Probably that GDALDriver would need a dedicated method to copy all subdatasets at once. Or a dedicated creation option COPY_ALL_SUBDATASETS ?
  • please use X/MIT license for autotest files (the use of LPLv2 in a few existing ones is historical)
  • would it be possible to add compile & runtime support in one of the Travis-CI for the driver ?

@normanb
Copy link
Contributor Author

normanb commented Apr 6, 2019

Thanks for the quick look over. I use bStrict and -SDS to get past the initial gdal_translate check if the file has raster bands or subdatasets otherwise the command never reaches the driver. It is a hack for sure.

If I add a dedicated method to that driver then I lose the additional gdal tooling (translate and warp in particular).

I can find a small hdf file for testing, I was using a larger one for development :)

@rouault
Copy link
Member

rouault commented Apr 9, 2019

@normanb I've a branch based on yours at https://github.com/rouault/gdal/tree/tiledb_fixes_1 with a few fixes. I tried to issue a PR on TileDB-Inc/gdal:tiledb-dev but github doesn't show TileDB-Inc/gdal as a potential base (probably because not forked from github UI)

@normanb
Copy link
Contributor Author

normanb commented Apr 9, 2019

@rouault thanks for the changes, I have created a PR here TileDB-Inc#2. I have a couple of CI issues to fix and then I will merge your changes in.

@normanb
Copy link
Contributor Author

normanb commented Apr 9, 2019

@rouault I will hold off merging your changes until you are ready

@rouault
Copy link
Member

rouault commented Apr 9, 2019

pytest ../autotest/gcore/misc.py crashes, likely when trying to create datasets in invalid paths.
Can be reproduced simply in 2 code paths with

$ gdal_translate ../autotest/gcore/data/byte.tif /foo/tmp_tiledb -of tiledb 
Input file size is 20, 20
ERROR 1: [TileDB::IO] Error: Cannot create directory '/foo/tmp_tiledb'; No such file or directory
ERROR 1: [TileDB::StorageManager] Error: Cannot open array; Array does not exist
ERROR 1: [TileDB::Array] Error: Cannot get array schema; Array is not open
ERROR 1: [TileDB::Array] Error: Cannot get array schema; Array is not open
ERROR 1: [TileDB::Array] Error: Cannot get query_type; Array is not open
ERROR 1: Error: Cannot create query; Input array is not open
ERROR 1: Error: Invalid TileDB query object
ERROR 1: Error: Invalid TileDB array schema object
ERROR 1: Error: Invalid TileDB domain object
terminate called after throwing an instance of 'tiledb::TypeError'
  what():  Static type (UINT64) does not match expected type (INT32)
Abandon (core dumped)

or

$ gdal_translate ../autotest/gdrivers/data/DeepBlue-SeaWiFS-1.0_L3_20100101_v004-20130604T131317Z.h5 /foo/tmp_tiledb -of tiledb -sds
ERROR 1: [TileDB::IO] Error: Cannot create directory '/foo/tmp_tiledb'; No such file or directory
ERROR 1: [TileDB::StorageManager] Error: Cannot open array; Array does not exist
ERROR 1: [TileDB::Array] Error: Cannot get array schema; Array is not open
0ERROR 1: [TileDB::Array] Error: Cannot get array schema; Array is not open
ERROR 1: [TileDB::Array] Error: Cannot get query_type; Array is not open
ERROR 1: Error: Cannot create query; Input array is not open
ERROR 1: Error: Invalid TileDB query object
ERROR 1: Error: Invalid TileDB array schema object
terminate called after throwing an instance of 'tiledb::TypeError'
  what():  Static type (FLOAT32) does not match expected type ()
Abandon (core dumped)

I see that a custom error handler is installed, but I guess early exit should be done when tiledb::Array::create() fails

@rouault
Copy link
Member

rouault commented Apr 9, 2019

Another crashing situation

$ rm -rf tmp_tiledb && gdal_translate ../autotest/gdrivers/data/DeepBlue-SeaWiFS-1.0_L3_20100101_v004-20130604T131317Z.h5 tmp_tiledb -of tiledb -sds
0...10...20...30...40...50...60...70...80...90...100 - done.

$ gdal_translate ../autotest/gcore/data/byte.tif tmp_tiledb -of tiledb
Input file size is 20, 20
ERROR 1: [TileDB::StorageManager] Error: Cannot create array; Array 'file:///home/even/gdal/git/gdal_tmp/gdal/tmp_tiledb' already exists
terminate called after throwing an instance of 'tiledb::SchemaMismatch'
  what():  Subarray should have num_dims * 2 values: (low, high) for each dimension.
Abandon (core dumped)

@rouault
Copy link
Member

rouault commented Apr 9, 2019

I'm done for tonight

@normanb
Copy link
Contributor Author

normanb commented Apr 9, 2019

Thanks @rouault, your changes look great.

For the invalid path I propose using tiledb::VFS (tiledb virtual file system) to check the parent path (or remote) exists and if not exiting early.

The same with the second crash, if the path exists then the code will error out early.

You have changed some of this code to use unique pointers so I will merge that in first.

I will get to this tomorrow (central time).

@rouault
Copy link
Member

rouault commented Apr 10, 2019

Regarding the crashes, in addition to the fixes you suggest, shouldn't exceptions be caught if libtiledb can throw ?

@normanb
Copy link
Contributor Author

normanb commented Apr 10, 2019

this is tied to the error_handler, currently I have it using CPLError the default TileDB ErrorHandler throws a TileDBError. I can change this code.

@rouault
Copy link
Member

rouault commented Apr 10, 2019

this is tied to the error_handler, currently I have it using CPLError the default TileDB ErrorHandler throws a TileDBError. I can change this code.

ok, I think it might be more reliable to do put the content of every method that calls the tiledb API within try { } catch( ) { CPLErrror(...) }

@normanb
Copy link
Contributor Author

normanb commented Apr 11, 2019

lets hold off GDALCopyWholeRasterGetSwathSize() with this PR and it can be added as a performance enhancement later

@normanb
Copy link
Contributor Author

normanb commented Apr 11, 2019

I hosed my local install of gdal for testing invalid creation paths, I will check again tomorrow (central time)

@rouault
Copy link
Member

rouault commented Apr 11, 2019

new PR: TileDB-Inc#3

@normanb
Copy link
Contributor Author

normanb commented Apr 12, 2019

@rouault I ran the docker image travisci/ci-garnet:packer-1512502276-986baf0 and was able to build tiledb with GDAL as root without any problems. I will add this to my tasks for the osgeo code sprint, I have backed out my changes so we can move forward with merging this PR.

@rouault rouault merged commit 7601a63 into OSGeo:master Apr 12, 2019
@rouault
Copy link
Member

rouault commented Apr 12, 2019

Squashed and merged

@normanb
Copy link
Contributor Author

normanb commented Apr 12, 2019

Thanks @rouault for the review and code changes :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants