Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Next link missing WFS 2.0 getfeature response for specific bbox #6325

Closed
arbakker opened this issue May 12, 2021 · 6 comments · Fixed by PDOK/mapserver-docker#16
Closed

Next link missing WFS 2.0 getfeature response for specific bbox #6325

arbakker opened this issue May 12, 2021 · 6 comments · Fixed by PDOK/mapserver-docker#16
Assignees

Comments

@arbakker
Copy link

arbakker commented May 12, 2021

Hi, I am running into a problem with WFS 2.0 paged GetFeature request. The problem is that for a particular boundingbox a series of paged WFS 2.0 GetFeature requests fails to retrieve all the features. The geometry type of the layer is of MULTILINESTRING, the datasource is a GeoPackage file. I also sent an email to the mapserver-users email list regarding this issue.

The problem is that the second paged response is missing the next link. The missing next link does actually produces the expected features and contains a next link itself.

In the MapServer logs, when requesting the page with the missing next link, I see the following log output: msOGRFileNextShape: Returning MS_DONE (no more shapes). This log output also shows when requesting the last page of results, of a paged WFS request.

The interesting thing is that this behaviour does not occur when requesting a slightly bigger bbox.

image

The SQL query used by Mapserver when requesting the page with the missing next link is:

SELECT "beheer_leiding"."uri" AS "uri", "beheer_leiding"."geometry" AS "geometry" FROM "beheer_leiding" JOIN "rtree_beheer_leiding_geometry" ms_spat_idx ON "beheer_leiding".ROWID = ms_spat_idx.id AND ms_spat_idx.minx <= 81976.8 AND ms_spat_idx.maxx >= 80034.6 AND ms_spat_idx.miny <= 453965.8 AND ms_spat_idx.maxy >= 452005.1  ORDER BY "beheer_leiding".ROWID LIMIT 1001 OFFSET 1000

This produces 1001 features as expected (other paged requests with more features to come have the same result). So not sure why MapServer decides there are no more results left. The MS_DONE value is returned from here (7.6.2 release) in the MapServer source code.

The problem seems to be caused by an interaction between the data, the requested bbox and a suspected bug in MapServer. In particul in the code that determines if there are any features left in the result set. Btw when changing the datasource to a GeoJSON file this behaviour does not occur, but it does occur with a PostGIS datasource.

I created a repo with a minimal example to reproduce the issue. The repo contains the mapfile, docker-compose, test script, data and a README describing steps in more detail to reproduce the issue.

Tested with MapServer release 7.6.2.

Seems to be related to: #6181

@rouault
Copy link
Contributor

rouault commented May 12, 2021

My suspicion of https://lists.osgeo.org/pipermail/mapserver-users/2021-May/082119.html was true

So msQueryByRect() starts by issuing a msOGRFileWhichShapes() call equivalent to

ogr2ogr tmp.gpkg  data.gpkg -sql  "SELECT "beheer_leiding"."uri" AS "uri", "beheer_leiding"."geometry" AS "geometry" FROM "beheer_leiding" JOIN "rtree_beheer_leiding_geometry" ms_spat_idx ON "beheer_leiding".ROWID = ms_spat_idx.id AND ms_spat_idx.minx <= 81976.8 AND ms_spat_idx.maxx >= 80034.6 AND ms_spat_idx.miny <= 453965.8 AND ms_spat_idx.maxy >= 452005.1  ORDER BY "beheer_leiding".ROWID LIMIT 1001 OFFSET 1000" -nln subquery

This returns 1001 features.

But then msQueryByRect() iterate over each feature to an actual intersection test of the geometry with the BBOX (the previous step only tested the intersect of the bounding box of the geometry with the BBOX) .
This is equivalent to
ogr2ogr tmp2.gpkg tmp.gpkg -sql "SELECT subquery.* FROM subquery WHERE Intersects(subquery.geometry, ST_Envelope(MakeLine(MakePoint(80034.6,452005.1), MakePoint(81976.8,453965.8))))"
which then return 1000 points and eliminate feature :

$ ogrinfo tmp.gpkg -al -where "uri = 'lei35910-31764-1'" -al -q

Layer name: subquery
OGRFeature(subquery):896
  uri (String) = lei35910-31764-1
  MULTILINESTRING ((80039.65 451993.11,80003.83 452034.64))

So the fix is to include the Intersects() test in the initial query

@rouault rouault self-assigned this May 12, 2021
rouault added a commit to rouault/mapserver that referenced this issue May 12, 2021
rouault added a commit to rouault/mapserver that referenced this issue May 12, 2021
@jratike80
Copy link

jratike80 commented May 12, 2021

But is there a way to do true Intersects() from geopackage in any other way than reading the whole table if the resultset must contain exactly 1000 features? Isn't it awfully inefficient?

A realistic test should be performed for a layer of one million features or so.

rouault added a commit to rouault/mapserver that referenced this issue May 12, 2021
rouault added a commit to rouault/mapserver that referenced this issue May 12, 2021
@rouault
Copy link
Contributor

rouault commented May 12, 2021

But is there a way to do true Intersects() from geopackage in any other way than reading the whole table if the resultset must contain exactly 1000 features? Isn't it awfully inefficient?

The request that is issued combine the use of the RTree and Intersects(). I hope this will be efficient:

$ ./mapserv QUERY_STRING="MAP=debug.map&SERVICE=WFS&SERVICE=WFS&version=2.0.0&service=wfs&request=Getfeature&bbox=80034.6,452005.1,81976.8,453965.8&outputformat=gml3&srsname=EPSG%3A28992&typename=gwsw%3Abeheer_leiding&STARTINDEX=1000"  2>&1 |head -n 20
[...]
msOGRFileWhichShapes: SQL = SELECT "beheer_leiding"."uri" AS "uri",
"beheer_leiding"."geometry" AS "geometry" FROM "beheer_leiding"
JOIN "rtree_beheer_leiding_geometry" ms_spat_idx
ON "beheer_leiding".ROWID = ms_spat_idx.id AND ms_spat_idx.minx <= 81976.8
AND ms_spat_idx.maxx >= 80034.6 AND ms_spat_idx.miny <= 453965.8 AND ms_spat_idx.maxy >= 452005.1 
WHERE  Intersects("geometry", BuildMbr(80034.600000,452005.100000,81976.800000,453965.800000))
ORDER BY "beheer_leiding".ROWID LIMIT 1001 OFFSET 1000.
[...]

@rouault
Copy link
Contributor

rouault commented May 12, 2021

hope this will be efficient:

yep, demo

Create huge.gpkg ( ~ 90 MB) with

from osgeo import ogr
ds = ogr.GetDriverByName('GPKG').CreateDataSource('huge.gpkg')
lyr = ds.CreateLayer('test')
lyr.StartTransaction()
for y in range(1000):
    for x in range(1000):
        f = ogr.Feature(lyr.GetLayerDefn())
        f.SetGeometry(ogr.CreateGeometryFromWkt('POINT(%d %d)' % (x, y)))
        lyr.CreateFeature(f)
lyr.CommitTransaction()

then query only through the RTree:

$ time ogrinfo huge.gpkg -sql "select count(*) from test join rtree_test_geom idx ON test.ROWID=idx.id AND idx.minx <= 500.5 AND idx.maxx >= 498.5 AND idx.miny <= 500.5 AND idx.maxy >= 498.5  ORDER BY test.ROWID" -al  -q

Layer name: SELECT
OGRFeature(SELECT):0
  count(*) (Integer) = 4


real	0m0,075s
user	0m0,051s
sys	0m0,024s

Combined that with a Intersects() test:

$ time ogrinfo huge.gpkg -sql "select count(*) from test join rtree_test_geom idx ON test.ROWID=idx.id AND idx.minx <= 500.5 AND idx.maxx >= 498.5 AND idx.miny <= 500.5 AND idx.maxy >= 498.5 WHERE Intersects(geom, BuildMbr(498.5,498.5,500.5,500.5)) ORDER BY test.ROWID" -al  -q

Layer name: SELECT
OGRFeature(SELECT):0
  count(*) (Integer) = 4


real	0m0,076s
user	0m0,067s
sys	0m0,008s

1 ms slower.

vs do only a Intersects() test:

$ time ogrinfo huge.gpkg -sql "select count(*) from test WHERE Intersects(geom, BuildMbr(498.5,498.5,500.5,500.5)) ORDER BY test.ROWID" -al  -q

Layer name: SELECT
OGRFeature(SELECT):0
  count(*) (Integer) = 4


real	0m1,140s
user	0m1,102s
sys	0m0,036s

Takes more than 1 second

@jratike80
Copy link

jratike80 commented May 12, 2021

I try hard to think of something that makes is fail but I can't. So perhaps it is OK.

ORDER BY rowid is the default and it is changed into whatever is set with &SORTBY= if that is used in GetFeature, I guess.

rouault added a commit that referenced this issue May 12, 2021
WFS: fix paging with GPKG/Spatialite datasources and non-point geometries (fixes #6325)
@arbakker
Copy link
Author

arbakker commented May 12, 2021

Thanks for the quick fix all and @rouault in particular, much appreciated 👍

rouault added a commit that referenced this issue May 12, 2021
[Backport branch-7-6] WFS: fix paging with GPKG/Spatialite datasources and non-point geometries (fixes #6325)
RoelvandenBerg pushed a commit to PDOK/mapserver that referenced this issue Jun 2, 2021
RoelvandenBerg pushed a commit to PDOK/mapserver that referenced this issue Jun 2, 2021
RoelvandenBerg added a commit to PDOK/mapserver-docker that referenced this issue Jun 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants