Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WFS: Don't issue STARTINDEX if feature count is small #8146

Merged
merged 2 commits into from
Jul 29, 2023

Conversation

craigds
Copy link
Contributor

@craigds craigds commented Jul 24, 2023

What does this PR do?

WFS datasources with no primary key cannot be naturally ordered by the server, and so using the STARTINDEX argument causes a 400 error with the response:

Cannot do natural order without a primary key, please add it or
specify a manual sort over existing attributes

An example is here: https://geo.irceline.be/wfs?request=GetFeature&version=2.0.0&service=WFS&TYPENAMES=bc_24hmean&COUNT=1000000&STARTINDEX=0

This change avoids issuing STARTINDEX if:

  • the feature count is known by the time the first GetFeature request is issued
  • the feature count is less than the page size.

In other words, this change allows us to use small datasets that have no primary key, while still allowing paging in larger datasets (as long as they have a primary key)

What's not included

  1. In the gdal-dev post, @rouault suggested adding a new value to the existing OGR_WFS_PAGING_ALLOWED (COUNT_WITH_HITS) in addition to the existing boolean values. I attempted to implement that but quickly came up against infinite recursion, since this makes MakeGetFeatureURL call GetFeatureCount which in turn eventually calls MakeGetFeatureURL. I decided to skip that complication for this PR, which is still useful in most circumstances without it. The WIP change is here if you want to see it.
  2. Tests. I've now included a test

What are related issues/pull requests?

As discussed in gdal-dev a month ago

Tasklist

  • Add test case(s)
  • Add documentation
  • Updated Python API documentation (swig/include/python/docs/)
  • Review
  • Adjust for comments
  • All CI builds and checks have passed

Environment

Provide environment details, if relevant:

  • OS:
  • Compiler:

craigds added a commit to koordinates/gdal that referenced this pull request Jul 24, 2023
This adds the `OGR_WFS_PAGING_ALLOWED=CHECK_WITH_HITS` config option
value, which causes OGR to issue a GetFeatureCount() request before
deciding whether to paginate WFS layers.

It doesn't work at present because it can easily cause infinite
recursion: `MakeGetFeatureURL -> GetFeatureCount -> MakeGetFeatureURL`

Builds on PR OSGeo#8146. Not sure this will proceed though.
@coveralls
Copy link
Collaborator

coveralls commented Jul 24, 2023

Coverage Status

coverage: 66.995% (+0.002%) from 66.993% when pulling f890482 on koordinates:wfs-disable-paging-for-small-layers into 69cd93c on OSGeo:master.

craigds added a commit to koordinates/gdal that referenced this pull request Jul 25, 2023
Backports OSGeo#8146

Datasources with no primary key cannot be naturally ordered by the
server, and so using the STARTINDEX argument causes a 400 error with the
response:

> Cannot do natural order without a primary key, please add it or
> specify a manual sort over existing attributes

This change avoids issuing STARTINDEX if:
* the feature count is known by the time the first GetFeature request is
  issued
* the feature count is less than the page size.

In other words, this change allows us to use small datasets that have no
primary key, while still allowing paging in larger datasets (as long
as they have a primary key)
@rouault
Copy link
Member

rouault commented Jul 25, 2023

For tests, the best is to simulate a fake server by providing GetCapabilities, DescribeFeatureCount, GetFeature responses in temporary files like done in most of the tests of ogr_wfs.py.
test_ogr_wfs_vsimem_wfs110_multiple_layers_same_name_different_ns() or test_ogr_wfs_vsimem_wfs200_paging() could potentially serve as inspiration

craigds added a commit to koordinates/gdal that referenced this pull request Jul 26, 2023
Backports OSGeo#8146

Datasources with no primary key cannot be naturally ordered by the
server, and so using the STARTINDEX argument causes a 400 error with the
response:

> Cannot do natural order without a primary key, please add it or
> specify a manual sort over existing attributes

This change avoids issuing STARTINDEX if:
* the feature count is known by the time the first GetFeature request is
  issued
* the feature count is less than the page size.

In other words, this change allows us to use small datasets that have no
primary key, while still allowing paging in larger datasets (as long
as they have a primary key)
Datasources with no primary key cannot be naturally ordered by the
server, and so using the STARTINDEX argument causes a 400 error with the
response:

> Cannot do natural order without a primary key, please add it or
> specify a manual sort over existing attributes

This change avoids issuing STARTINDEX if:
* the feature count is known by the time the first GetFeature request is
  issued
* the feature count is less than the page size.

In other words, this change allows us to use small datasets that have no
primary key, while still allowing paging in larger datasets (as long
as they have a primary key)
@craigds craigds force-pushed the wfs-disable-paging-for-small-layers branch from 6575ecb to aeb4a11 Compare July 27, 2023 07:52
@craigds
Copy link
Contributor Author

craigds commented Jul 27, 2023

I have added a test that exercises the new behaviour.

@craigds
Copy link
Contributor Author

craigds commented Jul 27, 2023

The CI failures seem to be unrelated to the change I made here. Is there a known issue with CI?

@rouault
Copy link
Member

rouault commented Jul 29, 2023

Is there a known issue with CI?

for the Mac one, now, yes :-) Due to a version upgrade of a dependency in homebrew
The build-windows-minimum one must be a flake

@rouault rouault merged commit 48bf60a into OSGeo:master Jul 29, 2023
28 of 30 checks passed
@rouault rouault added this to the 3.8.0 milestone Jul 29, 2023
elpaso pushed a commit to elpaso/gdal that referenced this pull request Sep 4, 2023
Datasources with no primary key cannot be naturally ordered by the
server, and so using the STARTINDEX argument causes a 400 error with the
response:

> Cannot do natural order without a primary key, please add it or
> specify a manual sort over existing attributes

This change avoids issuing STARTINDEX if:
* the feature count is known by the time the first GetFeature request is
  issued
* the feature count is less than the page size.

In other words, this change allows us to use small datasets that have no
primary key, while still allowing paging in larger datasets (as long
as they have a primary key)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants