Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shapefile to FlatGeobuf conversion fails with mixed Polygons and MultiPolygons #2828

Closed
kylebarron opened this issue Aug 4, 2020 · 9 comments

Comments

@kylebarron
Copy link

Expected behavior and actual behavior.

cc @bjornharrtell

Converting a Shapefile with mixed Polygon/MultiPolygon data to FlatGeobuf fails with error:

ERROR 1: ICreateFeature: Mismatched geometry type
ERROR 1: Unable to write feature 2 from layer polygons.
ERROR 1: Terminating translation prematurely after failed
translation of layer polygons (use -skipfailures to skip errors)

Converting the same GeoJSON to FlatGeobuf works fine.

I believe this is because Shapefiles have header metadata describing their geometry type while GeoJSON does not. Shapefiles don't have a MultiPolygon type, rather a single Polygon type that I believe can hold both single Polygons and MultiPolygons (see spec). FlatGeobuf files have two different geometry types, one for Polygon, another for MultiPolygon (see here).

So I believe this happens because the Shapefile metadata tells the FlatGeobuf writer to set the header to Polygon from the start, while the GeoJSON reader needs to scan the entire file before determining the output file type.

Related issues: #2255

Steps to reproduce the problem.

Test data: polygons.zip. Includes four files: polygons.{shx,shp,dbf} (shapefile version) and polygons.json (GeoJSON version).

Try to convert Shapefile to FlatGeobuf:

> ogr2ogr -f Flatgeobuf polygons.fgb polygons.shp
ERROR 1: ICreateFeature: Mismatched geometry type
ERROR 1: Unable to write feature 2 from layer polygons.
ERROR 1: Terminating translation prematurely after failed
translation of layer polygons (use -skipfailures to skip errors)

Output file only has the first two Polygon geometries, and not the third MultiPolygon geometry:

> ogrinfo polygons.fgb polygons
...
OGRFeature(polygons):0
  FID (Integer64) = 1
  POLYGON ((0 0,0 4,4 4,4 0,0 0),(1 1,3 1,3 3,1 3,1 1))

OGRFeature(polygons):1
  FID (Integer64) = 0
  POLYGON ((0 0,0 1,1 1,1 0,0 0))

Convert GeoJSON to FlatGeobuf successfully:

> ogr2ogr -f Flatgeobuf polygons.fgb polygons.json

Output file has three geometries, where one is a MultiPolygon:

OGRFeature(polygons):0
  FID (Integer) = 2
  MULTIPOLYGON (((2 2,2 3,3 3,3 2,2 2)),((4 4,4 5,5 5,5 4,4 4)))

OGRFeature(polygons):1
  FID (Integer) = 1
  POLYGON ((0 0,0 4,4 4,4 0,0 0),(1 1,3 1,3 3,1 3,1 1))

OGRFeature(polygons):2
  FID (Integer) = 0
  POLYGON ((0 0,0 1,1 1,1 0,0 0))

Operating system

MacOS 10.15.5

GDAL version and provenance

GDAL/OGR 3.1.0 from conda-forge

> ogrinfo --version
GDAL 3.1.0, released 2020/05/03
@kylebarron
Copy link
Author

kylebarron commented Aug 4, 2020

One other thing to note which may or may not be a bug: the order of the geometries is flipped in the output compared to the input JSON. In the original JSON, the MultiPolygon is the third feature while in the output FlatGeobuf it's the first feature. (Or at least, converting from JSON to FlatGeobuf to JSON, the order of the features is not consistent).

@bjornharrtell
Copy link
Contributor

I'm not sure any of these things should be regarded bugs, even if they are correct observations. In the case of mixed poly/multiply from shape I think it is reasonable that you should force it to multi, or possibly force it to unknown (but I haven't tried that).

@kylebarron
Copy link
Author

In the case of mixed poly/multiply from shape I think it is reasonable that you should force it to multi

Yes I agree, it should be forced to MultiPolygon (that's how it is when converting from GeoJSON) but the point of this issue is that it doesn't coerce to MultiPolygon, and there doesn't appear to be a way to make it do so (other than Shapefile -> GeoJSON -> FlatGeobuf). I'd argue that this should happen by default (as it does with GeoJSON) rather than raising an error and not finishing file conversion.

@bjornharrtell
Copy link
Contributor

If I recall correctly you can use the -nlt option for ogr2ogr to coerce, or option PROMOTE_TO_MULTI.

@kylebarron
Copy link
Author

kylebarron commented Aug 4, 2020

I still feel that my point stands about this being a default for Shapefile... Given that for other formats like GeoJSON it converts with no issue, I'd argue users would be confused and surprised when the same data originating from Shapefile wouldn't work.

Re -nlt PROMOTE_TO_MULTI, with that option the conversion succeeds, but then every geometry is considered a MultiPolygon (I haven't looked yet how the flatbuffers themselves have changed). Compare:

> ogr2ogr -f Flatgeobuf -nlt PROMOTE_TO_MULTI polygons.fgb polygons.shp
> ogrinfo polygons.fgb polygons
...
OGRFeature(polygons):0
  FID (Integer64) = 2
  MULTIPOLYGON (((2 2,2 3,3 3,3 2,2 2)),((4 4,4 5,5 5,5 4,4 4)))

OGRFeature(polygons):1
  FID (Integer64) = 1
  MULTIPOLYGON (((0 0,0 4,4 4,4 0,0 0),(1 1,3 1,3 3,1 3,1 1)))

OGRFeature(polygons):2
  FID (Integer64) = 0
  MULTIPOLYGON (((0 0,0 1,1 1,1 0,0 0)))

to

> ogr2ogr -f Flatgeobuf polygons2.fgb polygons.json
> ogrinfo polygons2.fgb polygons
OGRFeature(polygons):0
  FID (Integer) = 2
  MULTIPOLYGON (((2 2,2 3,3 3,3 2,2 2)),((4 4,4 5,5 5,5 4,4 4)))

OGRFeature(polygons):1
  FID (Integer) = 1
  POLYGON ((0 0,0 4,4 4,4 0,0 0),(1 1,3 1,3 3,1 3,1 1))

OGRFeature(polygons):2
  FID (Integer) = 0
  POLYGON ((0 0,0 1,1 1,1 0,0 0))

@bjornharrtell
Copy link
Contributor

Problem is that if the shapefile does not contain mixed types, you wouldn't want it to be unknown/mixed at the target. And at the same time you probably want to avoid scanning the shape file content as preprocessing. @rouault do you have any opinion here?

@rouault
Copy link
Member

rouault commented Aug 4, 2020

do you have any opinion here?

Your analysis is right. The shapefile case is a bit a annoying one. There is no real distinct Polygon vs Multipolygon geometries in shapefiles. For historic reason, the driver will always report Polygon, even if there are some (or only) multipolygon geometries emitted. The only proper solution would probably be for the driver to report Multipolygon layer geometry type and upgrade polygons to multipolygons. I've never decided to go for it as I'd suspect it would break a lot of things (in GDAL autotest mainly, and probably in downstream users)
Was just thinking that a potential mitigation would be to have a concept of "secondary layer geometry type", that would be set to MultiPolygon, and ogr2ogr would use it to inform the output driver. But the idea is still a bit fuzzy and somewhat messy.

@jratike80
Copy link
Collaborator

jratike80 commented Aug 5, 2020

There are other drivers behaving in the same way: PostGIS, GeoPackage, SpatiaLite etc. What is a proper thing to do depends on many things. Using POLYGON if shapefile contains just polygons and casting to MULTIPOLYGON if both types appear may feel right. But then you want to edit the resulting POLYGON table and add a multipolygon (first municipality of your country that is split to consist of two non-contiguous areas) and it is not possible because PostGIS table was created with POLYGON constraint. Converting all areas into multipolygons is very often OK but there are some processing tools that do not accept multipolygons, even if they have only one part. Using generic GEOMETRY that accepts all kind of geometries is the default in Oracle and suits well for storing data also into PostGIS, SpatiaLite and others, but if the storing side is simple the usage of such tables with clients like QGIS is not so simple.

It may be possible to select a better default for GDAL but sooner or later users will face a situation when they need to take the control and use the -nlt switch.

@jratike80
Copy link
Collaborator

Not a bug and not specific to FlatGeoBuf. Behavior can be controlled with -nlt.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants