Internally, on the C side, GDAL creates PROJ contexts but does not configure them to auto-close the backing SQLite database.
As a result, using GDAL from Python with multiprocessing produces SQLite errors complaining about database corruption. The cause is that SQLite explicitly does not support reusing an open database connection in a forked process.
PROJ has a mode specifically for this case, in which it closes the database after every function call, but GDAL does not make use of it.
Steps to reproduce the problem.
I'm not aware of a straightforward way to reproduce this, since it happens somewhere within a rather complex set of dependencies.
Operating system
Any Unix where Python multiprocessing uses forking mode by default
Is the forking mode using just fork(), or fork()+exec()? I suspect the former (fork() only), in which case the underlying file descriptors are shared by the parent and the child, which is quite annoying.
I guess we could make OSRGetProjTLSContext() take the process id into account. Hmm...
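To illustrate the idea of a process-id-aware context cache, here is a minimal Python sketch. It is not GDAL code: `PidAwareContextCache` and its `factory` argument are hypothetical names used only for illustration, and the real change would have to live in the C implementation behind OSRGetProjTLSContext(). The sketch keeps one context per thread and rebuilds it whenever the current process id differs from the one that created it, which is what happens in a forked child.

```python
import os
import threading


class PidAwareContextCache:
    """Per-thread cache that rebuilds its value after a fork (hypothetical helper)."""

    def __init__(self, factory):
        self._factory = factory          # assumption: callable that builds a fresh context
        self._local = threading.local()  # one slot per thread

    def get(self):
        pid = os.getpid()
        # Rebuild when this thread has no context yet, or when the cached
        # context was created under a different process id (forked child).
        if getattr(self._local, "pid", None) != pid:
            self._local.ctx = self._factory()
            self._local.pid = pid
        return self._local.ctx
```

The sketch simply drops the inherited context in the child. What should happen to the inherited database handle is left open here.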
I'm 99% sure Python multiprocessing forking mode is just a plain fork, no exec. The fork+exec mode is called spawn, and it has considerable overhead because of that.
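For reference, the start method can be chosen per context in Python, as in the sketch below. The helper name `make_spawn_context` is made up for this example. A spawned worker starts as a fresh interpreter rather than a fork() copy of the parent, so it does not inherit the parent's open database handle. Whether that is enough to avoid the error in a particular GDAL setup is an assumption, not something verified here.

```python
import multiprocessing as mp


def make_spawn_context():
    """Return a multiprocessing context that starts workers with 'spawn'."""
    # get_context() leaves the global default start method untouched.
    return mp.get_context("spawn")


if __name__ == "__main__":
    ctx = make_spawn_context()
    print("available start methods:", mp.get_all_start_methods())
    print("using:", ctx.get_start_method())
```

In calling code, `ctx.Pool(...)` would then replace `multiprocessing.Pool(...)`. With spawn, worker functions must be importable from a module, and startup is slower than with fork.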
proj has a mode to auto-close the sqlite database: https://github.com/OSGeo/PROJ/blob/master/src/iso19111/c_api.cpp#L250
Expected behavior and actual behavior.
See pyproj4/pyproj#426 for reference.
The resulting error looks like this: https://travis-ci.org/OGGM/OGGM-Anaconda/jobs/580670196#L1406