Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GPKG: disable by default multi-threaded ArrowArray interface. Make it… #9018

Merged
merged 1 commit into from
Jan 3, 2024

Conversation

rouault
Copy link
Member

@rouault rouault commented Jan 3, 2024

… opt-in with the OGR_GPKG_NUM_THREADS config option

Cf https://lists.osgeo.org/pipermail/gdal-dev/2024-January/058177.html

@jratike80
Copy link
Collaborator

Out of interest, does the "consecutive feature ID numbering" support cases with missing ID numbers when some rows have been deleted from the middle of a table?

@rouault
Copy link
Member Author

rouault commented Jan 3, 2024

does the "consecutive feature ID numbering" support cases with missing ID numbers when some rows have been deleted from the middle of a table?

no, consecutive feature ID numbering means no holes and N features with id = 1, 2, .... N where N is the total number of features.

@coveralls
Copy link
Collaborator

coveralls commented Jan 3, 2024

Coverage Status

coverage: 68.734% (-0.001%) from 68.735%
when pulling c4d5e5d on rouault:OGR_GPKG_NUM_THREADS
into 98c3491 on OSGeo:master.

@rouault rouault merged commit cdd114f into OSGeo:master Jan 3, 2024
30 checks passed
@jratike80
Copy link
Collaborator

jratike80 commented Jan 3, 2024

no, consecutive feature ID numbering means no holes and N features with id = 1, 2, .... N where N is the total number of features.

Is that tested so that user cannot activate numthreads>1 and ArrowArray interface if that is doomed to fail?
With a valid GeoPackage it should be enough to test that max(fid)=count(*). Or rather use the feature_count in the gpkg_ogr_contents for comparison.

@rouault
Copy link
Member Author

rouault commented Jan 3, 2024

Is that tested so that user cannot activate numthreads>1 and ArrowArray interface if that is doomed to fail?

yes, the optimization checks that max(fid) = count(*) ( actually using the gpkg_ogr_contents special table to get count(*) in a fast way). So it should be safe to set OGR_GPKG_NUM_THREADS unconditionally, in theory. I'm not sure why the user on the mailing list hit that issue. I've never hit myself.

@leofuhrmann
Copy link

leofuhrmann commented Jan 4, 2024

User from the mailing list here, here's a GeoPackage to hopefully (?) reproduce the problem. The original dataset contains geometries that were removed for the example, leaving only the fid column.

ogr2ogr test.gpkg example.gpkg
throws "ERROR 1: Worker thread task has not expected m_iStartShapeId value"

ogr2ogr test.gpkg example.gpkg -where 1
doesn't produce the error.

example.gpkg.zip

@rouault
Copy link
Member Author

rouault commented Jan 4, 2024

User from the mailing list here, here's a GeoPackage to hopefully (?) reproduce the problem

Thanks. I did reproduce. The issue was related to the number of features vs the batch size. As far as I can see, there should be no corruption and the error message could thus be ignored. fix in #9026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants