Experimental: multiprocessing version of raster retrieval #154
Codecov Report
@@            Coverage Diff             @@
##           master     #154      +/-   ##
==========================================
- Coverage   98.23%   98.16%   -0.08%
==========================================
  Files          43       43
  Lines        2094     2122      +28
  Branches      258      260       +2
==========================================
+ Hits         2057     2083      +26
- Misses         20       22       +2
  Partials       17       17
    try:
        executor = ProcessPoolExecutor(max_workers=3)
    except OSError:
        # fall back to serial evaluation
        executor = ThreadPoolExecutor(max_workers=1)
nice!
@dionhaefner did you run any benchmarks to see whether multiprocessing improves performance over multithreading? I wonder how it will perform within AWS Lambda.
On machines with more than 3 cores I saw about a 10% performance increase; on Travis (2 cores), about a 10% performance decrease. Lambda is a different story, since it doesn't support shared memory. The reason for this change has nothing to do with performance, though. It seems like GDAL + multithreading is fundamentally broken at the moment (OSGeo/gdal#1244), so we had to find something else until that is fixed.
Excellent, thanks for the explanation @dionhaefner
GDAL has some problems with multi-threaded access to rasters: OSGeo/gdal#1244
This is a multiprocessing version of tile retrieval. The process pool is created lazily, the first time it is needed, to keep startup time reasonable. Cache and database access are managed by the main process.
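
The description above can be sketched roughly as follows. This is only an illustration of the pattern, not the code from this PR: `TileReader`, `read_tile`, `default_executor`, and the `executor_factory` parameter are hypothetical names, and the worker function returns a placeholder string instead of reading a raster. The fallback to a single-threaded executor on `OSError` mirrors the snippet under review.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Dict, Optional, Tuple


def default_executor() -> Executor:
    """Try to create a process pool; fall back to serial evaluation."""
    try:
        return ProcessPoolExecutor(max_workers=3)
    except OSError:
        # same fallback as in the reviewed snippet: one worker thread
        return ThreadPoolExecutor(max_workers=1)


def read_tile(path: str, tile_index: int) -> str:
    """Hypothetical placeholder for the raster read that runs in a worker."""
    return f"{path}:{tile_index}"


class TileReader:
    """Hypothetical wrapper: lazy pool, cache kept in the main process."""

    def __init__(
        self, executor_factory: Callable[[], Executor] = default_executor
    ) -> None:
        self._executor_factory = executor_factory
        # the pool is not created here, only on first use
        self._executor: Optional[Executor] = None
        # the cache is only ever touched by the calling (main) process
        self._cache: Dict[Tuple[str, int], str] = {}

    def _get_executor(self) -> Executor:
        if self._executor is None:
            self._executor = self._executor_factory()
        return self._executor

    def get_tile(self, path: str, tile_index: int) -> str:
        key = (path, tile_index)
        if key not in self._cache:
            # only the read itself is handed to a worker
            future = self._get_executor().submit(read_tile, path, tile_index)
            self._cache[key] = future.result()
        return self._cache[key]
```

Constructing a `TileReader` is cheap because no worker processes are started until the first `get_tile` call. When the default process pool is used from a script on a platform that starts workers by re-importing the main module, the calling code should sit under an `if __name__ == "__main__":` guard.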