Using GDAL in forking multiprocessing environment causes SQLite failure #2221

Closed
TimoRoth opened this issue Feb 5, 2020 · 2 comments

TimoRoth commented Feb 5, 2020

Expected behavior and actual behavior.

See pyproj4/pyproj#426 for reference.

The error caused by this looks like this: https://travis-ci.org/OGGM/OGGM-Anaconda/jobs/580670196#L1406

Internally, on the C side, GDAL creates PROJ contexts but does not configure them to auto-close the backing SQLite database.
As a result, using GDAL from Python with multiprocessing produces SQLite errors that complain about database corruption. The cause is that SQLite explicitly does not support reusing an open database connection in a forked process.

PROJ has a mode specifically for this, in which it closes the database after every function call, but GDAL does not make use of it.
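To illustrate the "close after every call" idea, here is a minimal Python sketch using the standard `sqlite3` module. It is not PROJ's or GDAL's code: the class `AutoClosingLookup`, the helper `build_demo_db`, and the `crs` table are hypothetical names made up for the example. The point is only that the object keeps the database path rather than an open handle, so a later fork() has no open SQLite connection to inherit.

```python
import sqlite3
from contextlib import closing


class AutoClosingLookup:
    """Hypothetical stand-in for a PROJ-like context (not a real API)."""

    def __init__(self, db_path):
        # Keep only the path; no connection is held between calls.
        self.db_path = db_path

    def lookup(self, code):
        # Open, query and close inside the call, so no SQLite handle
        # stays open for a forked child process to inherit.
        with closing(sqlite3.connect(self.db_path)) as conn:
            row = conn.execute(
                "SELECT name FROM crs WHERE code = ?", (code,)
            ).fetchone()
        return row[0] if row else None


def build_demo_db(path):
    """Create a tiny throwaway database used only for this demonstration."""
    with closing(sqlite3.connect(path)) as conn:
        conn.execute("CREATE TABLE crs (code INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO crs VALUES (4326, 'WGS 84')")
        conn.commit()
```

The trade-off is that every call pays the cost of reopening the database, which is presumably why such a mode would be opt-in.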

Steps to reproduce the problem.

I'm not aware of a straightforward way to reproduce this, since it happens somewhere inside a rather complex chain of dependencies.

Operating system

Any Unix where Python multiprocessing uses forking mode by default

rouault (Member) commented Feb 5, 2020

Is the forking mode using just fork() or fork()+exec()? I suspect the former (fork() only), in which case the underlying file descriptors are shared by the parent and the child, which is quite annoying.
I guess we could make OSRGetProjTLSContext() take the process ID into account. Hmm...
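The idea of making a thread-local context aware of the process ID can be sketched in Python as follows. This is an illustration of the general technique only, not GDAL's implementation of OSRGetProjTLSContext(): the `Context` class, the `get_context` function, and the injectable `getpid` parameter are all hypothetical and exist just for the example.

```python
import os
import threading

_tls = threading.local()


class Context:
    """Hypothetical placeholder for an expensive per-thread resource."""

    def __init__(self, pid):
        # Remember which process created this context.
        self.owner_pid = pid


def get_context(getpid=os.getpid):
    """Return this thread's cached context, recreating it after a fork.

    `getpid` is injectable only so the behaviour can be exercised
    without actually forking.
    """
    pid = getpid()
    ctx = getattr(_tls, "ctx", None)
    if ctx is None or ctx.owner_pid != pid:
        # Either first use in this thread, or the cached context was
        # created by another process, meaning we run in a forked child.
        ctx = Context(pid)
        _tls.ctx = ctx
    return ctx
```

A real implementation would also have to decide what to do with the stale context inherited from the parent, since closing it in the child could touch file descriptors the parent is still using.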

TimoRoth (Author) commented Feb 5, 2020

I'm 99% sure Python multiprocessing's fork mode is just a plain fork, with no exec. The fork+exec mode is called spawn, and it has considerable overhead because of that.
PROJ has a mode to auto-close the SQLite database: https://github.com/OSGeo/PROJ/blob/master/src/iso19111/c_api.cpp#L250
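For readers hitting this from Python, one possible workaround until a fix is available is to avoid the plain fork start method. The sketch below uses only the standard `multiprocessing` API (`get_all_start_methods`, `get_context`); the helper name `non_fork_context` is made up for the example, and whether the extra start-up cost is acceptable depends on the workload.

```python
import multiprocessing as mp


def non_fork_context(preferred=("spawn", "forkserver")):
    """Return a multiprocessing context that does not use plain fork().

    With "spawn", each worker starts in a fresh interpreter and so does
    not inherit open database handles from the parent process.
    """
    available = mp.get_all_start_methods()
    for method in preferred:
        if method in available:
            return mp.get_context(method)
    raise RuntimeError("no non-fork start method available")
```

The returned context offers the same API as the `multiprocessing` module, so `non_fork_context().Pool(...)` can be used in place of `multiprocessing.Pool(...)`, provided the worker functions are importable from a module and process creation is guarded by `if __name__ == "__main__":`.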

@rouault rouault closed this as completed in 095bc42 Feb 6, 2020
rouault added a commit that referenced this issue Feb 6, 2020
Fix PROJ usage accross fork() calls (fixes #2221)
@rouault rouault added this to the 3.0.5 milestone Feb 7, 2020