Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize vector tile processing #676

Merged
merged 2 commits into from Feb 11, 2019

Conversation

Projects
None yet
2 participants
@lewfish
Copy link
Contributor

lewfish commented Jan 31, 2019

Overview

This PR changes the way vector tiles are processed to be more efficient. The two main improvements are: caching the GeoJSON that is extracted from a tile (since many scenes may use the same tile, especially at low zoom levels) and doing the feature merge after irrelevant features are filtered out.

Testing

I tested this on the Vegas roads example using the following and found that the debug chips looked correct. I then timed the run by prefixing the invocation with time and also set debug=False in spacenet.vegas (to avoid timing the part where it's producing debug chips, which is not relevant to the changes in this PR). I repeated this 3 times on this branch, and 3 times on develop and found that this PR decreased the run time by ~25%. That's not the dramatic speedup I was hoping for, but maybe this example doesn't do the best job of utilizing the improvements.

rastervision run local -e spacenet.vegas \
    -a test True \
    -a use_remote_data False \
    -a root_uri /opt/data/spacenet/vegas/ \
    -a target roads \
    -a task_type semantic_segmentation \
    -a vector_tile_options "https://s3.amazonaws.com/osmesa-vectortiles/full/2018-08-02/{z}/{x}/{y}.mvt,14,__id" \
    -r chip

Closes #557

@lewfish lewfish added the review label Jan 31, 2019

@lewfish lewfish force-pushed the lf/opt-vt branch 3 times, most recently from 2a85f7b to e3116e5 Jan 31, 2019

@codecov

This comment has been minimized.

Copy link

codecov bot commented Feb 5, 2019

Codecov Report

Merging #676 into develop will increase coverage by 0.06%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #676      +/-   ##
===========================================
+ Coverage    71.21%   71.27%   +0.06%     
===========================================
  Files          171      171              
  Lines         8192     8210      +18     
===========================================
+ Hits          5834     5852      +18     
  Misses        2358     2358
Impacted Files Coverage Δ
rastervision/data/vector_source/vector_source.py 85.71% <100%> (ø) ⬆️
...vision/data/vector_source/geojson_vector_source.py 100% <100%> (ø) ⬆️
...on/data/vector_source/vector_tile_vector_source.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b532123...49103f1. Read the comment docs.

@lewfish lewfish changed the title WIP: Optimize vector tile processing Optimize vector tile processing Feb 5, 2019

@lewfish lewfish requested a review from jamesmcclain Feb 5, 2019

@jamesmcclain
Copy link
Member

jamesmcclain left a comment

Looks okay, but the question that raised should be considered.


Merges features that are split across tiles and crops against the extentself.
"""
tile_feature_cache = {}

This comment has been minimized.

Copy link
@jamesmcclain

jamesmcclain Feb 6, 2019

Member

Is there a replacement policy? Are there concerns about scalability (this is a question not a loaded question)?

This comment has been minimized.

Copy link
@lewfish

lewfish Feb 11, 2019

Author Contributor

I'm not sure, but it's a better idea to set some limit on the size of the cache. Python 3 has a nice lru_cache callback that I'm using now. I'm not sure what a good limit size is for the cache, but 32 seems safe.

This comment has been minimized.

Copy link
@jamesmcclain

jamesmcclain Feb 11, 2019

Member

I'm not sure, but it's a better idea to set some limit on the size of the cache. Python 3 has a nice lru_cache callback that I'm using now. I'm not sure what a good limit size is for the cache, but 32 seems safe.

That sounds like a fine plan.

Optimize vector tile processing
* Extract GeoJSON from tiles and cache it
* Move class inference from VectorSource to subclasses so it
can be done at a point that increases efficiency of the pipeline.

@lewfish lewfish force-pushed the lf/opt-vt branch from e3116e5 to 49103f1 Feb 11, 2019

@lewfish lewfish merged commit 326d618 into develop Feb 11, 2019

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details

@lewfish lewfish deleted the lf/opt-vt branch Feb 11, 2019

@lewfish lewfish removed the review label Feb 11, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.