Skip to content

Latest commit

 

History

History
129 lines (96 loc) · 4.98 KB

README.md

File metadata and controls

129 lines (96 loc) · 4.98 KB

Insideout

Insideout is a suit of software dedicated to give you the best performances to accomplish one operation, query for the position within one or more polygons:

  • is this location in a building ?
  • in which city are we in ?
  • what timezone ?
  • anything where a closed polygon describes a geographical region

This is the opensourced part of a project including ready to serve docker images with embedded pre indexed datasets.

Strategy

Several strategies are available:

  • On disk index (more reads) data can be larger than memory
  • Inside Tree in memory (fast when a location is inside cover), data can be larger than memory only indexes are in memory
  • full s2 index, fastest but huge memory consumption, wait for start since indexation is made on start

These 3 strategies give you enough choices to perform better according to your data.

APIS

Two sets of API are provided:

  • one using gRPC
       service Inside {
           //  Stab returns features containing lat lng
           rpc Within(WithinRequest) returns (WithinResponse) {}
           // Get returns a feature by its internal ID and polygon index
           rpc Get(GetRequest) returns (Feature) {}
       }
  • one basic HTTP /api/within/{lat}/{lng}

Metrics are provided via Prometheus at http://host:httpMetricsPort/metrics.

A debug visual map is available at http://host:httpAPIPort/debug/.

Health status is provided via gRPC host:healthPort or via basic HTTP http://host:httpAPIPort/healthz.

Docker & Kubernetes

Main goal of insideout is to be used with container image with pre embedded indexes, ready to run.

There is a simple demo with preindexed Natural Earth data 110m countries:

docker run --rm -it -p 8080:8080 akhenakh/insideout-demo:latest 

Point your browser onto http://yourip:8080/debug/

Indexer

Tune your index parameters according to your data:
Small sparse buildings should be indexed differently than cities also use stopOnFirstFound if you know only one polygon is encircling a position.

Usage of ./cmd/indexer/indexer:
  -dbPath="inside.db": Database path
  -filePath="": FeatureCollection GeoJSON file to index
  -insideMaxCellsCover=24: Max s2 Cells count for inside cover
  -insideMaxLevelCover=16: Max s2 level for inside cover
  -insideMinLevelCover=10: Min s2 level for inside cover
  -logLevel="INFO": DEBUG|INFO|WARN|ERROR
  -outsideMaxCellsCover=16: Max s2 Cells count for outside cover
  -outsideMaxLevelCover=15: Max s2 level for outside cover
  -outsideMinLevelCover=10: Min s2 level for outside cover
  -warningCellsCover=1000: warning limit cover count

Insided

Usage of ./cmd/insided/insided:
  -cacheCount=200: Features count to cache, 0 to disable the cache
  -dbPath="inside.db": Database path
  -grpcPort=9200: gRPC API port
  -healthPort=6666: grpc health port
  -httpAPIPort=9201: http API port
  -httpMetricsPort=8088: http port
  -logLevel="INFO": DEBUG|INFO|WARN|ERROR
  -stopOnFirstFound=false: Stop in first feature found
  -strategy="db": Strategy to use: insidetree|shapeindex|db

K/V Engines

Different engines have been tested: bbolt, pogreb, badger 1.6, goleveldb.

For Insideout particular load (read only random reads), bbolt is the best performer.

Test with loadtester 10s fr-communes using db engines & insidetree when available:

 ./insided -stopOnFirstFound=true -strategy=db -cacheCount=0 -dbPath=../leveldbindexer/inside.db -dbEngine=leveldb
count 31083 rate mean 3108/s rate1 3110/s 99p 980665
Alloc = 13 MiB  TotalAlloc = 3686 MiB   Sys = 71 MiB    NumGC = 321

./insided -stopOnFirstFound=true -strategy=db -cacheCount=0 -dbPath=../bboltindexer/inside.db -dbEngine=bbolt
count 42190 rate mean 4219/s rate1 4211/s 99p 4760278
Alloc = 1 MiB   TotalAlloc = 3479 MiB   Sys = 71 MiB    NumGC = 1635

./insided -stopOnFirstFound=true -strategy=insidetree -cacheCount=0 -dbPath=../bboltindexer/inside.db -dbEngine=bbolt 
count 42135 rate mean 4214/s rate1 4206/s 99p 2259642
Alloc = 208 MiB TotalAlloc = 3638 MiB   Sys = 411 MiB   NumGC = 29

./insided -stopOnFirstFound=true -strategy=insidetree -cacheCount=0 -dbPath=../leveldbindexer/inside.db -dbEngine=leveldb
count 41021 rate mean 4102/s rate1 4091/s 99p 13443368
Alloc = 390 MiB TotalAlloc = 3441 MiB   Sys = 480 MiB   NumGC = 22

./insided -stopOnFirstFound=true -strategy=insidetree -cacheCount=0 -dbPath=../badgerindexer/inside.db -dbEngine=badger
count 38936 rate mean 3894/s rate1 3874/s 99p 2599252
Alloc = 554 MiB TotalAlloc = 3988 MiB   Sys = 680 MiB   NumGC = 15

./insided -stopOnFirstFound=true -strategy=insidetree -cacheCount=0 -dbPath=../progrebindexer/inside.db -dbEngine=progreb
count 44853 rate mean 4485/s rate1 4476/s 99p 2374910
Alloc = 286 MiB TotalAlloc = 3954 MiB   Sys = 479 MiB   NumGC = 32

Pogreb is faster but does not supports prefix range and consumes a bit more than bbolt.

bbolt is more capable for this load.

Prefiltering

zcat France.geojson.gz| jq '.features[] | select(.properties.admin_level==10)'