Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.8.13.post1 causes mask filter to act like a bounding box filter #905

Closed
nikmolnar opened this issue May 30, 2020 · 16 comments
Closed

1.8.13.post1 causes mask filter to act like a bounding box filter #905

nikmolnar opened this issue May 30, 2020 · 16 comments
Assignees

Comments

@nikmolnar
Copy link

Expected behavior and actual behavior.

I have a script that summarizes features in dataset A that intersect with features in dataset B. This code relies on .items(mask=geometry_B) to select the features of dataset A that exist within each feature of dataset B.

This code works fine in Fiona 1.8.13. But in 1.8.13.post1, it seems to filter by bounding box, rather than the actual feature geometry.

Steps to reproduce the problem.

This is a simplified version of the code in question. I am unable to share the data, but I could come up with other data to reproduce the problem, if it becomes necessary.

with fiona.open('dataset_b.gpkg') as dataset_b:
    for feature in dataset_b:
        with fiona.open('dataset_a.gpkg') as dataset_a:
            items = dataset_a.items(mask=feature["geometry"])

Operating system

macOS 10.15.3

Fiona and GDAL version and provenance

Fiona: 1.8.13.post1
GDAL: 2.4.2 (built from source)

@sgillies
Copy link
Member

@nikmolnar I think there must be something else going on. The fiona code in that area hasn't changed between those versions: https://github.com/Toblerity/Fiona/blame/4750ae99119c7bd2bc38d7842fd57c1c2db2cd2e/fiona/ogrext.pyx#L1233-L1236.

Now, the GDAL/OGR function that fiona calls, OGR_L_SetSpatialFilter, is format dependent. Is it possible that your application changed formats and that the new one doesn't support geometry filtering? GeoPackage files might have special indexing needs. I'm not very familiar with the format, I confess.

It's also noted in https://gdal.org/api/vector_c_api.html#_CPPv422OGR_L_SetSpatialFilter9OGRLayerH12OGRGeometryH that the test isn't 100% accurate and that there may be some false positives. It doesn't say that this is format dependent, but that is my take.

@nikmolnar
Copy link
Author

@sgillies l suspected format at first, too, since I had previously used shapefiles. However, with no code or data changes, I've observed this working correctly at version 1.8.13 and incorrectly at version 1.8.13.post1.

I will see if the same happens with shapefiles, and try to put together a minimally complete example to reproduce.

@nikmolnar
Copy link
Author

nikmolnar commented May 31, 2020

Ok, here is a minimum reproduction of the problem, using shapefiles.

import fiona


def main():
    with fiona.open('california/california.shp') as california_shp:
        feature = next(california_shp)

        with fiona.open('counties/counties.shp') as counties_shp:
            items = list(counties_shp.items(mask=feature['geometry']))

            print('{} counties intersecting California:'.format(len(items)))
            for county in sorted(item[1]['properties']['NAME'] for item in items):
                print(county)


if __name__ == '__main__':
    main()

Code + data are attached (California + southwest US counties).

Here is the output with Fiona 1.8.13 (note this picks up counties bordering CA, not only counties within it).

$ python intersect.py
69 counties intersecting California:
...snip...

Now with Fiona 1.8.13.post1, and no other changes. These aren't just bordering counties, this is anything withing California's bounding box (includes Lander, which is in the middle of Nevada).

$ python intersect.py
88 counties intersecting California:
...snip...
Lander
...snip...

fiona-intersect.zip

@rbuffat
Copy link
Contributor

rbuffat commented May 31, 2020

I can't reproduce it on Linux, but I can reproduce it on OSX builds on travis with Fiona 1.8.13.post1 wheels:

With Fiona 1.8.13.post1:

88 counties intersecting California:

https://travis-ci.org/github/rbuffat/debug_mask/jobs/693223111

With Fiona 1.8.13:

69 counties intersecting California:

https://travis-ci.org/github/rbuffat/debug_mask/jobs/693223114

All builds:
https://travis-ci.org/github/rbuffat/debug_mask/builds/693223107
Github repo:
https://github.com/rbuffat/debug_mask

@sgillies
Copy link
Member

sgillies commented May 31, 2020

The OS X wheels for 1.8.13 include GEOS 3.6.2 and GDAL 2.4.3. The wheels for 1.8.13.post1 include GEOS 3.8.0 and GDAL 2.4.4. No fiona code changed. I think we should look for changes in GDAL and GEOS. But as the GDAL/OGR API say that the filter is approximate, this might not be a regression from the GDAL project's perspective and may have been changed without a change log entry.

@rbuffat
Copy link
Contributor

rbuffat commented Jun 2, 2020

I extended Fiona's travis install scripts to compile GEOS from source and added the test case. But with this setup, the behavior can not be reproduced on OSX.

https://travis-ci.org/github/rbuffat/Fiona/builds/693742321
https://github.com/rbuffat/Fiona/tree/debug_mask_1_8_13

While testing the OSX wheels, I noticed the following tests succeed.

def test_mask():
    
    schema = {'geometry': 'Point', 'properties': OrderedDict([('position_i', 'int'), ('position_j', 'int')])}
    driver = 'ESRI Shapefile'
    records = [{'geometry': {'type': 'Point', 'coordinates': (float(i), float(j))}, 'properties': {'position_i': i, 'position_j': j}} for i in range(100) for j in range(100)]

    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "test.shp")
    
    with fiona.open(path, 'w',
                    driver=driver,
                    schema=schema) as c:
        c.writerecords(records)

    with fiona.open(path) as c:
        items = list(c.items(mask={'type': 'Polygon', 'coordinates': (((19.5, 9.5), (19.5, 19.5), (9.5, 19.5), (9.5, 9.5), (19.5, 9.5)),)}))
        assert len(items) == 100
        
    shutil.rmtree(tmpdir)


def test_mask_poly():
    
    schema = {'geometry': 'Polygon', 'properties': OrderedDict([('position_i', 'int'), ('position_j', 'int')])}
    driver = 'ESRI Shapefile'
    records = [{'geometry': {'type': 'Polygon', 'coordinates': (((float(i), float(j)), (float(i + 1), float(j)), (float(i + 1), float(j + 1)), (float(i), float(j + 1)), (float(i), float(j))),)}, 'properties': {'position_i': i, 'position_j': j}} for i in range(100) for j in range(100)]

    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "test.shp")
    
    with fiona.open(path, 'w',
                    driver=driver,
                    schema=schema,
                    crs=from_epsg(4326)) as c:
        c.writerecords(records)

    with fiona.open(path) as c:
        items = list(c.items(mask={'type': 'Polygon', 'coordinates': (((19.5, 10.5), (19.5, 19.5), (10.5, 19.5), (10.5, 10.5), (19.5, 10.5)),)}))
        assert len(items) == 100
    
    shutil.rmtree(tmpdir)

@nikmolnar
Copy link
Author

@rbuffat It looks like both those test are using rectangular polygons as the masks; so the bounding box and the shape would both cover the same features. Something like this may better expose the issue:

mask={'type': 'Polygon', 'coordinates': (((19.5, 10.5), (19.5, 19.5), (10.5, 10.5), (19.5, 10.5)),)}

@rbuffat
Copy link
Contributor

rbuffat commented Jun 11, 2020

You are right, adding a triangle as polygon triggers the issue. The issue is present for at least GeoJSON, GPKG as well as Shapefile.

Just for visualization, here is the result with the OSX wheel:
osx

And how it is returned otherwise:
correct

I could not reproduce it when goes, proj, gdal, and Fiona are compiled from source with GEOS 3.6.2 and 3.8.0: https://travis-ci.org/github/rbuffat/Fiona/builds/697293783 / https://github.com/rbuffat/Fiona/tree/debug_mask_1_8_13

As it returns the same result at least on some configurations on OSX, I assume it's something with the wheels. But I don't know what.

As @sgillies already mentioned, the result is still within the specification of GDALs OGR_L_SetSpatialFilter:

Currently this test is may be inaccurately implemented, but it is guaranteed that all features whose envelope (as returned by OGR_G_GetEnvelope()) overlaps the envelope of the spatial filter will be returned. This can result in more shapes being returned that should strictly be the case.

What we could do is:

  • Add a test to our CI for this case
  • Improve documentation that a mask can return inaccurate results
  • Ask the GDAL maintainers if they have an idea of how to approach this issue.

@rbuffat
Copy link
Contributor

rbuffat commented Jun 11, 2020

@sgillies Not sure if this is something, but the Linux wheels contain libgeos*so, but the OSX Wheels seems not to contain any geos libs in .dylibs.
osx_wheel
linux_wheel

@rbuffat
Copy link
Contributor

rbuffat commented Jun 13, 2020

I had a look into fiona-wheels, gdal is compiled with
--with-geos=${DEPS_PREFIX}/bin/geos-config \

DEPS_PREFIX is different depending on the OS:

    if [ -n "$IS_OSX" ]; then
        EXPAT_PREFIX=/usr
        DEPS_PREFIX=/gdal
    else
        EXPAT_PREFIX=$BUILD_PREFIX
        DEPS_PREFIX=$BUILD_PREFIX
    fi

GEOS is installed into $BUILD_PREFIX, GDAL is thus compiled without GEOS support on OSX:

GEOS support:              no

https://travis-ci.org/github/rbuffat/fiona-wheels/jobs/698042200

p.s. I had to modify .travis.yml to

language: generic

services: docker

dist: xenial

filter_secrets: false

(based on https://github.com/matthew-brett/multibuild/blob/devel/README.rst) and remove the exclude block from the matrix, otherwise, no travis jobs was generated (
Build config did not create any jobs.).

@sgillies
Copy link
Member

@rbuffat thanks for digging into this! I'll plan for new wheels ASAP.

@rbuffat
Copy link
Contributor

rbuffat commented Jun 15, 2020

@sgillies Rasterios wheels could also be affected: https://github.com/rasterio/rasterio-wheels/blob/master/config.sh#L186

@sgillies
Copy link
Member

@rbuffat @nikmolnar this issue turns out to be complicated. On OS X we have a conflict over libgeos between fiona and shapely that may not be avoidable.

In #383 (comment) I reported that I had resolved the conflict. This turns out not be true: the conflict went away because I had accidentally, as you reported in this issue, removed GEOS support from fiona's copy of libgdal.

Shapely 2.0 will stop using ctypes and dlopen to load libgeos, and I think that may eliminate the conflict entirely. Until then, I think the best we can do is continue to leave GEOS support out of fiona's copy of libgdal on OS X. This degrades the geometry filtering feature, but we haven't seen any other issues yet.

@sgillies
Copy link
Member

I've marked this as something that we won't fix. It only manifests with the OS X wheels on PyPI, and for those wheels we will need to build GDAL without GEOS support to avoid the conflict with shapely's libgeos. That's been completed in sgillies/fiona-wheels#18.

@nikmolnar
Copy link
Author

Thank you @sgillies and @rbuffat for all your work figuring this out!

@rbuffat
Copy link
Contributor

rbuffat commented Jun 18, 2020

Until new wheels are released, it should be possible to compile Fiona from source with:

brew install gdal
pip install  --upgrade --force-reinstall --no-deps --no-binary :all: Fiona rasterio shapely

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants