Convert files to COGs and create a mosaic at scale using AWS Lambda
This repo hosts the code for a serverless architecture enabling creation of Cloud Optimized GeoTIFFs and MosaicJSON at scale using a map-reduce-like model:
- Start job (distribute tasks)
- Run processing tasks in parallel (e.g. COG creation)
- Run a summary task (e.g. Create mosaic-json)
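The three steps above can be sketched in plain Python. This is a hypothetical illustration, not the repo's actual handlers: in the real stack each `process` call is an individual Lambda invocation, and the reduce step runs once every task has reported back.

```python
from concurrent.futures import ThreadPoolExecutor

def process(url):
    # Map step: e.g. convert one file to a COG (stubbed here).
    return {"url": url, "status": "done"}

def reduce_results(results):
    # Reduce step: e.g. assemble a mosaic-json from all task outputs.
    return {"count": len(results)}

def run_job(urls):
    # The thread pool stands in for many parallel Lambda invocations.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(process, urls))
    return reduce_results(results)
```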
Note: This work was inspired by the awesome ecs-watchbot.

Is this fully serverless? Not really. To run a map-reduce-like model we need a fast and reliable database to store the job status. We use AWS ElastiCache (Redis) for this, thus the stack is not fully serverless.
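The job-status bookkeeping Redis provides can be sketched as a shared counter that each task decrements when it finishes; the last task to finish triggers the reduce step. Below, a plain dict stands in for Redis (the real stack needs ElastiCache because many concurrent Lambdas must share this state), and `JobTracker`, `start`, and `task_done` are illustrative names, not part of this repo:

```python
class JobTracker:
    """Toy stand-in for the Redis counter pattern (SET / atomic DECR)."""

    def __init__(self):
        self.counters = {}

    def start(self, job_id, n_tasks):
        # Redis equivalent: SET job_id n_tasks
        self.counters[job_id] = n_tasks

    def task_done(self, job_id):
        # Redis equivalent: DECR job_id (atomic across workers).
        # Returns True for the last task, which should fire the reduce step.
        self.counters[job_id] -= 1
        return self.counters[job_id] == 0
```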
Requirements:
- serverless
- docker
- AWS account
- Install and configure serverless
# Install and Configure serverless (https://serverless.com/framework/docs/providers/aws/guide/credentials/)
$ npm install serverless -g
- Create VPC and Redis Database
$ cd services/redis
$ sls deploy --region us-east-1
- Create Lambda package
$ make build
- Create Bucket (optional)
We need a bucket to store the COGs and mosaic-json. The bucket must exist before deploying the Lambda stack.
$ aws s3api create-bucket --bucket my-bucket --region us-east-1
- Deploy the Watchbot Serverless stack
$ sls deploy --stage production --bucket my-bucket --region us-east-1
- Get a list of files you want to convert
$ aws s3 ls s3://spacenet-dataset/spacenet/SN5_roads/test_public/AOI_7_Moscow/PS-RGB/ --recursive | awk '{print "https://spacenet-dataset.s3.amazonaws.com/"$NF}' > list_moscow.txt
Note: we use the https://spacenet-dataset.s3.amazonaws.com prefix because we don't want to add an IAM role for this bucket.
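The `awk` one-liner above just rewrites S3 keys into public HTTPS URLs. The same transform in Python, for anyone scripting the list generation (the bucket URL matches the SpaceNet example; `to_https_urls` is an illustrative helper, not part of this repo):

```python
BUCKET_URL = "https://spacenet-dataset.s3.amazonaws.com"

def to_https_urls(keys):
    # Turn bare S3 keys into public HTTPS URLs, one per input key.
    return [f"{BUCKET_URL}/{key.lstrip('/')}" for key in keys]
```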
- Use scripts/create_job.py
$ pip install rio-cogeo
$ cd scripts/
$ cat ../list_moscow.txt | python -m create_job - \
-p webp \
--co blockxsize=256 \
--co blockysize=256 \
--op overview_level=6 \
--op overview_resampling=bilinear > test.json
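For orientation, a job document produced this way bundles the source URLs with the COG creation options passed on the command line. The fragment below is purely illustrative (the field names are invented for this sketch); the authoritative shape is whatever scripts/schema.json defines:

```json
{
  "sources": [
    "https://spacenet-dataset.s3.amazonaws.com/spacenet/SN5_roads/example.tif"
  ],
  "options": {
    "profile": "webp",
    "creation_options": {"blockxsize": 256, "blockysize": 256},
    "overview_level": 6,
    "overview_resampling": "bilinear"
  }
}
```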
- Validate JSON (Optional)
$ jsonschema -i test.json schema.json
- Upload the job file to S3 to start processing
$ aws s3 cp test.json s3://my-bucket/jobs/test.json
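Dropping the file under `jobs/` presumably kicks off the job via an S3 notification delivered to a Lambda. Extracting the bucket and key from that event looks like this; the event layout is AWS's standard S3 notification structure, while the handler name, sample key, and `get_uploaded_objects` helper are illustrative:

```python
def get_uploaded_objects(event):
    # Pull (bucket, key) pairs out of a standard S3 ObjectCreated event.
    return [
        (r["s3"]["bucket"]["name"], r["s3"]["object"]["key"])
        for r in event.get("Records", [])
    ]

# Minimal example payload in the shape S3 notifications use.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket"},
                "object": {"key": "jobs/my_job.json"}}}
    ]
}
```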