Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize vector tile processing #676

Merged
merged 2 commits into from Feb 11, 2019
Merged

Optimize vector tile processing #676

merged 2 commits into from Feb 11, 2019

Conversation

@lewfish
Copy link
Contributor

@lewfish lewfish commented Jan 31, 2019

Overview

This PR changes the way vector tiles are processed to be more efficient. The two main improvements are: caching the GeoJSON that is extracted from a tile (since many scenes may use the same tile, especially at low zoom levels) and doing the feature merge after irrelevant features are filtered out.

Testing

I tested this on the Vegas roads example using the following and found that the debug chips looked correct. I then timed the run by prefixing the invocation with time and also set debug=False in spacenet.vegas (to avoid timing the part where it's producing debug chips, which is not relevant to the changes in this PR). I repeated this 3 times on this branch, and 3 times on develop and found that this PR decreased the run time by ~25%. That's not the dramatic speedup I was hoping for, but maybe this example doesn't do the best job of utilizing the improvements.

rastervision run local -e spacenet.vegas \
    -a test True \
    -a use_remote_data False \
    -a root_uri /opt/data/spacenet/vegas/ \
    -a target roads \
    -a task_type semantic_segmentation \
    -a vector_tile_options "https://s3.amazonaws.com/osmesa-vectortiles/full/2018-08-02/{z}/{x}/{y}.mvt,14,__id" \
    -r chip

Closes #557

@lewfish lewfish added the review label Jan 31, 2019
@lewfish lewfish force-pushed the lf/opt-vt branch 3 times, most recently from 2a85f7b to e3116e5 Feb 5, 2019
@codecov
Copy link

@codecov codecov bot commented Feb 5, 2019

Codecov Report

Merging #676 into develop will increase coverage by 0.06%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #676      +/-   ##
===========================================
+ Coverage    71.21%   71.27%   +0.06%     
===========================================
  Files          171      171              
  Lines         8192     8210      +18     
===========================================
+ Hits          5834     5852      +18     
  Misses        2358     2358
Impacted Files Coverage Δ
rastervision/data/vector_source/vector_source.py 85.71% <100%> (ø) ⬆️
...vision/data/vector_source/geojson_vector_source.py 100% <100%> (ø) ⬆️
...on/data/vector_source/vector_tile_vector_source.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b532123...49103f1. Read the comment docs.

@lewfish lewfish changed the title WIP: Optimize vector tile processing Optimize vector tile processing Feb 5, 2019
@lewfish lewfish requested a review from jamesmcclain Feb 5, 2019
Copy link
Member

@jamesmcclain jamesmcclain left a comment

Looks okay, but the question that raised should be considered.


Merges features that are split across tiles and crops against the extentself.
"""
tile_feature_cache = {}
Copy link
Member

@jamesmcclain jamesmcclain Feb 6, 2019

Is there a replacement policy? Are there concerns about scalability (this is a question not a loaded question)?

Copy link
Contributor Author

@lewfish lewfish Feb 11, 2019

I'm not sure, but it's a better idea to set some limit on the size of the cache. Python 3 has a nice lru_cache callback that I'm using now. I'm not sure what a good limit size is for the cache, but 32 seems safe.

Copy link
Member

@jamesmcclain jamesmcclain Feb 11, 2019

I'm not sure, but it's a better idea to set some limit on the size of the cache. Python 3 has a nice lru_cache callback that I'm using now. I'm not sure what a good limit size is for the cache, but 32 seems safe.

That sounds like a fine plan.

* Extract GeoJSON from tiles and cache it
* Move class inference from VectorSource to subclasses so it
can be done at a point that increases efficiency of the pipeline.
@lewfish lewfish merged commit 326d618 into develop Feb 11, 2019
2 checks passed
@lewfish lewfish deleted the lf/opt-vt branch Feb 11, 2019
@lewfish lewfish removed the review label Feb 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants