FlatGeobuf driver #1742

Merged
merged 14 commits into from
Sep 16, 2019

bjornharrtell
Contributor

@bjornharrtell bjornharrtell commented Jul 24, 2019

What does this PR do?

This PR implements initial support for FlatGeobuf (read/write) and is more or less feature complete. See https://github.com/bjornharrtell/flatgeobuf for more information about the format.

Tasklist

  • Await format spec 1.0
  • Rebase on GDAL 3.x master
  • Initial CRS read/write support
  • Option to opt out spatial index generation
  • Squash/cleanup commits
  • Remove unintentional modifications of ogr_geometry.h and ogrcsvdatasource.cpp
  • Remove conf.sh
  • Write docs
  • Handle Z and M dimensions
  • All CI builds and checks have passed

Known weaknesses

  • Writing logic requires at least as much RAM as the final file size. Improving this would take significant work: features would have to be sorted by Hilbert value using an external sorting algorithm.
  • More test coverage and oss-fuzz would not hurt
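
To make the first weakness more concrete, here is a rough sketch of sorting items by Hilbert value entirely in memory. The names `hilbertIndex`, `Item` and `sortByHilbert` are made up for this illustration and are not taken from the driver, and the integer grid is an assumption. An external sort would replace the single `std::sort` call with sorted chunks written to temporary files and merged afterwards.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not from the driver): maps cell (x, y) on an n-by-n
// grid, n a power of two, to its distance along the Hilbert curve.
inline uint64_t hilbertIndex(uint32_t n, uint32_t x, uint32_t y)
{
    uint64_t d = 0;
    for (uint32_t s = n / 2; s > 0; s /= 2)
    {
        const uint32_t rx = (x & s) ? 1u : 0u;
        const uint32_t ry = (y & s) ? 1u : 0u;
        d += static_cast<uint64_t>(s) * s * ((3u * rx) ^ ry);
        // Rotate/flip the quadrant so the next level sees a canonical orientation.
        if (ry == 0)
        {
            if (rx == 1)
            {
                x = n - 1 - x;
                y = n - 1 - y;
            }
            std::swap(x, y);
        }
    }
    return d;
}

// Hypothetical item type: a grid cell plus an identifier.
struct Item
{
    uint32_t x;
    uint32_t y;
    int id;
};

// In-memory sort by Hilbert value; every item has to fit in RAM at once.
inline void sortByHilbert(std::vector<Item>& items, uint32_t n)
{
    std::sort(items.begin(), items.end(),
              [n](const Item& a, const Item& b) {
                  return hilbertIndex(n, a.x, a.y) < hilbertIndex(n, b.x, b.y);
              });
}
```

Because the whole vector is sorted in one go, memory use grows with the number of items, which is the limitation described above.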

Member

@rouault rouault left a comment

Just a partial review, from skyscraper height...

gdal/ogr/ogrsf_frmts/GNUmakefile Outdated
gdal/ogr/ogrsf_frmts/flatgeobuf/packedrtree.h
gdal/ogr/ogrsf_frmts/flatgeobuf/ogrflatgeobuflayer.cpp Outdated
gdal/ogr/ogrsf_frmts/flatgeobuf/ogrflatgeobuflayer.cpp Outdated
gdal/ogr/ogrsf_frmts/flatgeobuf/ogrflatgeobuflayer.cpp Outdated
gdal/ogr/ogrsf_frmts/flatgeobuf/ogrflatgeobuflayer.cpp Outdated
gdal/ogr/ogrsf_frmts/flatgeobuf/ogrflatgeobuflayer.cpp Outdated
@rouault
Member

rouault commented Aug 4, 2019

It wouldn't hurt to test the robustness of the driver against corrupted/hostile files by running oss-fuzz locally. There are some instructions in fuzzers/README.TXT. You'll need to modify https://github.com/google/oss-fuzz/blob/master/projects/gdal/Dockerfile in your clone to point to the git URL of your GDAL fork.

@bjornharrtell
Contributor Author

Thanks for the review @rouault; fair enough that it's high-level for now.

@bjornharrtell
Contributor Author

Rebased and squashed. What remains is to put it through oss-fuzz and to find other, as yet unknown, shortcomings, but otherwise it is ready for further scrutiny. :)

@bjornharrtell
Contributor Author

I'll be at FOSS4G and might have time to hack on this there. If anyone is interested in joining in or discussing it, don't hesitate to get in touch. 🙂

Member

@rouault rouault left a comment

The CI errors should also be addressed

gdal/ogr/ogrsf_frmts/generic/makefile.vc
autotest/ogr/ogr_flatgeobuf.py
autotest/ogr/ogr_flatgeobuf.py
gdal/doc/source/drivers/vector/flatgeobuf.rst
gdal/ogr/ogrsf_frmts/flatgeobuf/ogr_flatgeobuf.h Outdated
gdal/ogr/ogrsf_frmts/flatgeobuf/ogr_flatgeobuf.h Outdated
gdal/ogr/ogrsf_frmts/flatgeobuf/ogrflatgeobufdataset.cpp Outdated
gdal/ogr/ogrsf_frmts/flatgeobuf/ogrflatgeobuflayer.cpp Outdated
@jratike80
Collaborator

You mean that not even empty geometries like "GEOMETRYCOLLECTION EMPTY" are supported? I know that they are nasty to handle, but they still tend to appear in real-world data. But if it's documented, then users can select, for example, GeoPackage for delivering such data.

@bjornharrtell
Contributor Author

@jratike80 I don't want to support geometry collections either :) However I have reconsidered null geoms and it is possible in the now stable spec. As for "empty" it should be equivalent to null as geometry type is given. I agree, limitations should be clearly documented.

@jratike80
Collaborator

We considered a few years ago how to express empty geometries in OpenJUMP and we ended up supporting all possible variations: POINT EMPTY, MULTIPOINT EMPTY, etc. I think the idea was to support something like "this feature does not have a geometry yet, but once it is digitized it will be a POINT". GEOMETRYCOLLECTION EMPTY was selected to mean just any empty geometry. I am pretty sure that this is off-topic in the context of your innovative flat buffer format and driver.

@bjornharrtell bjornharrtell force-pushed the flatgeobuf branch 2 times, most recently from d77a1b6 to 81e3c62 Compare August 29, 2019 15:23
bool m_hasM = false;
bool m_hasZ = false;
bool m_hasT = false;
bool m_hasTM = false;
Member

What does m_hasTM mean?

Contributor Author

TM stands for "time measurement" introduced by the discussion at flatgeobuf/flatgeobuf#6 (comment). Note that T is for geodetic decimal year time, also discussed on that same issue. May be a futile attempt to support something not yet standardized. :)

Member

OK, when compared to https://lists.osgeo.org/pipermail/proj/2019-August/008806.html, I guess T would be the "Epoch of Expression", and TM the "Epoch of Observation". In any case, adding a comment somewhere to explain the semantics wouldn't hurt.

Contributor Author

Seems right, yes. It's somewhat documented upstream at https://github.com/bjornharrtell/flatgeobuf/blob/master/src/fbs/feature.fbs but deserves something more public if this becomes useful.
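
As an illustration of the kind of comment asked for in this thread, the flags could be documented roughly as below. This is only a sketch: `DimensionFlags` and `extraDimensionCount` are hypothetical names, not the driver's header, and the wording of the T and TM comments simply repeats what was said above, including the tentative mapping to the PROJ mailing list terms.

```cpp
// Hypothetical illustration, not the driver's code: the four flags from the
// reviewed snippet with the semantics discussed in this thread as comments.
struct DimensionFlags
{
    bool hasZ = false;   // Z coordinate present
    bool hasM = false;   // M (measure) value present
    bool hasT = false;   // T: geodetic decimal year time
                         // (tentatively the "Epoch of Expression")
    bool hasTM = false;  // TM: "time measurement"
                         // (tentatively the "Epoch of Observation")
};

// Hypothetical helper: number of dimensions beyond X and Y that are enabled.
inline int extraDimensionCount(const DimensionFlags& f)
{
    return (f.hasZ ? 1 : 0) + (f.hasM ? 1 : 0) + (f.hasT ? 1 : 0) +
           (f.hasTM ? 1 : 0);
}
```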

@bjornharrtell bjornharrtell force-pushed the flatgeobuf branch 4 times, most recently from 8c65498 to be41b9e Compare September 4, 2019 20:54
@rouault
Member

rouault commented Sep 5, 2019

For ogr/ogr_basic_test.py::test_ogr_basic_10 - AssertionError: assert (0 !..., run test_ogrsf -all_drivers and see where/why it crashes

For
```
./flatbuffers/base.h:354:1: error: unknown attribute 'no_sanitize' ignored [-Werror,-Wattributes]
__supress_ubsan__("alignment")
^
./flatbuffers/base.h:238:50: note: expanded from macro '__supress_ubsan__'
#define __supress_ubsan__(type) __attribute__((no_sanitize(type)))
```
try something like
#if defined(__has_feature)
#  if __has_feature(alignment)
...

or test for a minimum gcc / clang version
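
A variant of the same idea, sketched with `__has_attribute` instead of `__has_feature`: the attribute is only emitted when the compiler reports support for it, and the macro expands to nothing otherwise. The macro name `EXAMPLE_SUPPRESS_UBSAN` and the function `readUint32` are hypothetical and exist only for this example. It also assumes a compiler that either provides `__has_attribute` (Clang and recent GCC do) or leaves it undefined.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical macro name; the guard pattern is the point, not the name.
// The check is nested because a compiler without __has_attribute cannot
// parse "__has_attribute(...)" inside a single #if expression.
#if defined(__has_attribute)
#  if __has_attribute(no_sanitize)
#    define EXAMPLE_SUPPRESS_UBSAN(type) __attribute__((no_sanitize(type)))
#  endif
#endif
#ifndef EXAMPLE_SUPPRESS_UBSAN
// Fallback: expand to nothing when the attribute is not available.
#  define EXAMPLE_SUPPRESS_UBSAN(type)
#endif

// Hypothetical function carrying the guarded attribute.
EXAMPLE_SUPPRESS_UBSAN("alignment")
inline uint32_t readUint32(const unsigned char* p)
{
    uint32_t v;
    std::memcpy(&v, p, sizeof(v));  // memcpy is safe for unaligned input
    return v;
}
```

With the fallback in place, a compiler that does not know `no_sanitize` never sees the attribute, so `-Werror,-Wattributes` has nothing to complain about.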

For https://ci.appveyor.com/project/OSGeo/gdal/builds/27188300/job/or3lqhsxi570t0nb, missing cast

For https://ci.appveyor.com/project/OSGeo/gdal/builds/27188300/job/43yd0vh885ws1f9u, OBJ = ogrflatgeobufdataset.obj ogrflatgeobufdataset.obj packedrtree.obj from makefile.vc is buggy. It references ogrflatgeobufdataset.obj twice, instead of referencing ogrflatgeobuflayer.obj

@bjornharrtell
Contributor Author

Thanks for the help @rouault. I have been able to fix all of them and a few more cases, but I still have one failing build job at https://travis-ci.com/OSGeo/gdal/jobs/232126400 (BUILD_NAME=trusty_clang DETAILS="optimized build, no libtool"), where again I have trouble interpreting what is actually wrong.

@rouault
Member

rouault commented Sep 10, 2019

The failure on gcore/vrt_read.py::test_vrt_shared_no_proxy_pool in trusty_clang is indeed a bit weird, and likely unrelated to your changes, so probably some 'random' issue that triggers due to the test suite being a bit different now.
Could you possibly try adding a ds = None assignment at the end of the test_vrt_no_explicit_dataAxisToSRSAxisMapping and test_vrt_explicit_dataAxisToSRSAxisMapping_1_2 steps that are just above test_vrt_shared_no_proxy_pool?

@rouault
Member

rouault commented Sep 13, 2019

You had me fooled me by pointing

ah sorry about that. It's not obvious at first sight which of google/oss-fuzz vs oss-fuzz/oss-fuzz is the official repo. But indeed oss-fuzz/oss-fuzz is indicated as a fork...

@rouault
Member

rouault commented Sep 13, 2019

test_ogrsf -all_drivers leaks

Direct leak of 40 byte(s) in 2 object(s) allocated from:
    #0 0x7f47e7e242f3 in __interceptor_malloc /tmp/final/llvm.src/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:88:3
    #1 0x7f47df45e66c in VSIMalloc /home/travis/build/OSGeo/gdal/gdal/port/cpl_vsisimple.cpp:561:12
    #2 0x7f47def57add in CPLMalloc /home/travis/build/OSGeo/gdal/gdal/port/cpl_conv.cpp:185:21
    #3 0x7f47def58119 in CPLStrdup /home/travis/build/OSGeo/gdal/gdal/port/cpl_conv.cpp:300:43
    #4 0x7f47e2b66ce9 in OGRGetXMLDateTime(OGRField const*) /home/travis/build/OSGeo/gdal/gdal/ogr/ogrutils.cpp:1374:18
    #5 0x7f47e10035cb in OGRFlatGeobufLayer::ICreateFeature(OGRFeature*) /home/travis/build/OSGeo/gdal/gdal/ogr/ogrsf_frmts/flatgeobuf/ogrflatgeobuflayer.cpp:830:35
    #6 0x7f47e1816516 in OGRLayer::CreateFeature(OGRFeature*) /home/travis/build/OSGeo/gdal/gdal/ogr/ogrsf_frmts/generic/ogrlayer.cpp:630:12
    #7 0x44f930 in TestCreateLayer(GDALDriver*, OGRwkbGeometryType) /home/travis/build/OSGeo/gdal/gdal/apps/test_ogrsf.cpp:745:16
    #8 0x411194 in TestCreate(GDALDriver*, int) /home/travis/build/OSGeo/gdal/gdal/apps/test_ogrsf.cpp:999:13
    #9 0x40df61 in ThreadFunctionInternal(ThreadContext*) /home/travis/build/OSGeo/gdal/gdal/apps/test_ogrsf.cpp:311:25
    #10 0x40dac9 in ThreadFunction(void*) /home/travis/build/OSGeo/gdal/gdal/apps/test_ogrsf.cpp:269:9
    #11 0x40d087 in main /home/travis/build/OSGeo/gdal/gdal/apps/test_ogrsf.cpp:220:9
    #12 0x7f47c9b1982f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
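
The trace shows a string allocated through CPLStrdup inside OGRGetXMLDateTime and never released by the caller. One way to make that kind of leak harder to write is to hand the returned pointer to an owning wrapper straight away. The sketch below uses plain `malloc`/`free` so that it is self-contained. `makeDateTimeString`, `FreeDeleter`, `OwnedCString` and `copyDateTime` are hypothetical names, and in GDAL code the deleter would presumably call CPLFree instead of `std::free`.

```cpp
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical stand-in for a C-style API that returns a heap-allocated
// string the caller must release.
inline char* makeDateTimeString()
{
    const char* text = "2019-09-13T00:00:00Z";
    char* copy = static_cast<char*>(std::malloc(std::strlen(text) + 1));
    if (copy != nullptr)
        std::strcpy(copy, text);
    return copy;
}

// Deleter matching the allocator used above (assumption: malloc/free).
struct FreeDeleter
{
    void operator()(char* p) const { std::free(p); }
};
using OwnedCString = std::unique_ptr<char, FreeDeleter>;

// The owned pointer is freed on every return path, including early exits.
inline std::string copyDateTime()
{
    OwnedCString owned(makeDateTimeString());
    return owned ? std::string(owned.get()) : std::string();
}
```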

auto org = crs->org();
auto code = crs->code();
auto wkt = crs->wkt();
m_poSRS->SetAuthority(nullptr, org->c_str(), code);
Member

this is likely useless. A SRS with just an authority node is invalid

Contributor Author

Hmm ok, the idea was to handle the case where org is something other than EPSG and no WKT is supplied and as such cannot be imported by the logic that follows.

gdal/ogr/ogrsf_frmts/flatgeobuf/ogrflatgeobuflayer.cpp Outdated
@rouault
Member

rouault commented Sep 14, 2019

The attached oss-fuzz generated file causes leaks
leak-e90271eb9e68e567d93c6736aff428c92b5a74c2.zip

[...]
    #14 0x79eee5 in OGRSpatialReference::importFromWkt(char const*) /src/gdal/gdal/ogr/ogrspatialreference.cpp:1728:12
    #15 0x14a9dc8 in OGRFlatGeobufLayer::OGRFlatGeobufLayer(FlatGeobuf::Header const*, unsigned char*, char const*, unsigned long) /src/gdal/gdal/ogr/ogrsf_frmts/flatgeobuf/ogrflatgeobuflayer.cpp:78:22
    #16 0x14a5ef0 in OGRFlatGeobufDataset::Open(GDALOpenInfo*) /src/gdal/gdal/ogr/ogrsf_frmts/flatgeobuf/ogrflatgeobufdataset.cpp:185:39
    #17 0x10fcef0 in GDALOpenEx /src/gdal/gdal/gcore/gdaldataset.cpp:3377:20
    #18 0x6a3181 in OGROpen /src/gdal/gdal/ogr/ogrsf_frmts/generic/ogrsfdriverregistrar.cpp:113:24
    #19 0x57588e in LLVMFuzzerTestOneInput /src/gdal/gdal/./fuzzers/ogr_fuzzer.cpp:106:26
    #20 0x47d311 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:553:15
    #21 0x47ca35 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool*) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:469:3
    #22 0x47ed37 in fuzzer::Fuzzer::MutateAndTestOne() /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:695:19
    #23 0x47faf5 in fuzzer::Fuzzer::Loop(std::Fuzzer::vector<fuzzer::SizedFile, fuzzer::fuzzer_allocator<fuzzer::SizedFile> >&) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:831:5
    #24 0x46d546 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:825:6
    #25 0x497112 in main /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerMain.cpp:19:10
    #26 0x7f2df70d682f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)

@bjornharrtell
Contributor Author

oss-fuzz sure is fun stuff :)

@bjornharrtell
Contributor Author

I'm getting a timeout in fuzzing when calculating the tree size. I thought it would go away with the bounds checking introduced in 2b8dedf, but it appears not to. The error is as follows:

==11== ERROR: libFuzzer: timeout after 87 seconds
    #0 0x54cc61 in __sanitizer_print_stack_trace /src/llvm/projects/compiler-rt/lib/asan/asan_stack.cpp:86:3
    #1 0x496c28 in fuzzer::PrintStackTrace() /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerUtil.cpp:205:5
    #2 0x47bd19 in fuzzer::Fuzzer::AlarmCallback() /src/llvm/projects/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:300:5
    #3 0x7f285f51a38f  (/lib/x86_64-linux-gnu/libpthread.so.0+0x1138f)
    #4 0x6d8e97 in FlatGeobuf::PackedRTree::size(unsigned long, unsigned short) /src/gdal/gdal/ogr/ogrsf_frmts/flatgeobuf/packedrtree.cpp:318:35

Is there any way to find out which arguments are causing the algorithm at https://github.com/bjornharrtell/gdal/blob/2b8dedf3dfb75e5b9385633695d440c844cb6f3f/gdal/ogr/ogrsf_frmts/flatgeobuf/packedrtree.cpp#L310-L324 to go into an infinite loop, or is it just that it tries so many permutations that it takes forever?

@rouault
Member

rouault commented Sep 14, 2019

Is there any way to find out which arguments are causing the algorithm at https://github.com/bjornharrtell/gdal/blob/2b8dedf3dfb75e5b9385633695d440c844cb6f3f/gdal/ogr/ogrsf_frmts/flatgeobuf/packedrtree.cpp#L310-L324 to go into an infinite loop, or is it just that it tries so many permutations that it takes forever?

It should normally generate a file that reproduces the issue in ./build/out/gdal/.
But looking at that code, I can see that if it is called with numItems = 0, then n will never be 1, hence the infinite loop
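
To spell out the failure mode: a loop of the form "divide n by the node size, rounding up, until n is 1" never terminates when it starts at 0, because the rounded-up division of 0 is still 0. The sketch below is modeled on that description rather than copied from packedrtree.cpp. `countTreeNodes` is a hypothetical name, and it returns a node count rather than a byte size. It rejects the degenerate inputs up front, and it does not guard against overflow for extremely large item counts.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical sketch, not the driver's code: total number of nodes in a
// packed tree with numItems leaves and nodeSize children per node.
inline uint64_t countTreeNodes(uint64_t numItems, uint16_t nodeSize)
{
    // nodeSize 1 would leave n unchanged on every pass, so it loops forever
    // for any numItems other than 1.
    if (nodeSize < 2)
        throw std::invalid_argument("nodeSize must be at least 2");
    // numItems 0 keeps n at 0 forever, which is the case found by the fuzzer.
    if (numItems == 0)
        throw std::invalid_argument("numItems must be greater than 0");

    uint64_t n = numItems;
    uint64_t numNodes = n;
    do
    {
        n = (n + nodeSize - 1) / nodeSize;  // nodes on the next level up
        numNodes += n;
    } while (n != 1);
    return numNodes;
}
```

With a node size of 16, 17 items give 17 leaves, 2 nodes on the level above and 1 root, so 20 nodes in total. A single item still gets a root above it, so the result is 2.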

@bjornharrtell
Contributor Author

Of course! 🤦

@bjornharrtell
Contributor Author

Not getting any errors from oss-fuzz anymore but the run never stops, left it running for more than 3 hours when the latest line it outputted was:

#50736769 REDUCE cov: 2154 ft: 2775 corp: 103/54Kb lim: 4096 exec/s: 4107 rss: 95Mb L: 537/3120 MS: 2 ChangeBit-EraseBytes-

Maybe it needs to be put through the real oss-fuzz now.

@rouault
Member

rouault commented Sep 16, 2019

@bjornharrtell OK, if you're happy with this, I can merge it.

@bjornharrtell
Contributor Author

@rouault I'm happy and hope it is in good enough shape for real world use. I will take care of it in the future to the best of my ability.

@rouault rouault merged commit 638a6f8 into OSGeo:master Sep 16, 2019
@jratike80
Collaborator

I guess that the document page that is referred to in the code is to come because I could not find it yet:
poDriver->SetMetadataItem(GDAL_DMD_HELPTOPIC, "drv_flatgeobuf.html");

@rouault
Member

rouault commented Sep 16, 2019

I guess that the document page that is referred to in the code...

Good observation. Up to now we provided URIs to the old static HTML pages. Probably here we should provide as URI "drivers/vector/flatgeobuf.html"

@bjornharrtell bjornharrtell deleted the flatgeobuf branch September 17, 2019 12:02