Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimental: multiprocessing version of raster retrieval #154

Merged
merged 12 commits into from Oct 30, 2019

Conversation

dionhaefner
Copy link
Collaborator

GDAL has some problems with multi-threaded access to rasters: OSGeo/gdal#1244

This is a multi-processing version of tile retrieval. A process pool is created the first time it is used (to keep startup time reasonable). Cache and database access is managed by the main process.

@codecov
Copy link

codecov bot commented Oct 30, 2019

Codecov Report

Merging #154 into master will decrease coverage by 0.07%.
The diff coverage is 95.45%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #154      +/-   ##
==========================================
- Coverage   98.23%   98.16%   -0.08%     
==========================================
  Files          43       43              
  Lines        2094     2122      +28     
  Branches      258      260       +2     
==========================================
+ Hits         2057     2083      +26     
- Misses         20       22       +2     
  Partials       17       17
Impacted Files Coverage Δ
terracotta/drivers/raster_base.py 94.69% <95.45%> (-0.22%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 08d7cdc...99b1030. Read the comment docs.

executor = ProcessPoolExecutor(max_workers=3)
except OSError:
# fall back to serial evaluation
executor = ThreadPoolExecutor(max_workers=1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@dionhaefner dionhaefner merged commit 03f42b1 into master Oct 30, 2019
@dionhaefner dionhaefner deleted the multiprocessing branch October 30, 2019 14:30
@vincentsarago
Copy link

@dionhaefner did you run any benchmarking to see any performance improvement using multiprocess instead of multithreading ? I wonder how it will perform within AWS Lambda ?

@dionhaefner
Copy link
Collaborator Author

On machines with more than 3 cores I saw about a 10% performance increase, on Travis (2 cores) about a 10% performance decrease.

Lambda is a different story, since it doesn't support shared memory (/dev/shm), so we can't use concurrent.futures.ProcessPoolExecutor there. As a workaround we just don't do concurrency on Lambda and load everything serially, which is only a 20% performance decrease (seems to be mostly CPU bound).

The reason for this change has nothing to do with performance though. It seems like GDAL + multithreading is fundamentally broken at the moment (OSGeo/gdal#1244), so we had to find something else until that is fixed.

@vincentsarago
Copy link

excellent thanks for the explanation @dionhaefner

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants