Cirrus is a STAC-based processing pipeline. As input, Cirrus takes a GeoJSON FeatureCollection with 1 or more STAC Items. This input is run through workflows that generate 1 or more STAC Items as output. These output Items are added to the Cirrus static STAC catalog, and are also broadcast via an SNS topic that can be subscribed to for triggering additional workflows, such as keeping a dynamic STAC catalog up to date (for example, STAC-server).
Cirrus workflows can be as simple as containing no processing at all, where the input is passed through and published, or more complex, where the STAC Items and underlying data are transformed before being published. The current state of each input (QUEUED, PROCESSING, COMPLETED, FAILED) is tracked during processing, which prevents inputs from being ingested more than once and allows a user to follow the state of any input through the pipeline.
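The state tracking described above can be illustrated with a minimal in-memory sketch. This is not the Cirrus implementation (which stores state in a database); the `StateTracker` class, its method names, and the rule that only FAILED inputs may be resubmitted are assumptions made for illustration.

```python
from enum import Enum


class State(Enum):
    """The four processing states named in the text."""
    QUEUED = "QUEUED"
    PROCESSING = "PROCESSING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"


class StateTracker:
    """Hypothetical in-memory stand-in for a state database."""

    def __init__(self):
        self._states = {}

    def submit(self, input_id):
        """Queue an input; return False if it is already being tracked."""
        current = self._states.get(input_id)
        # Assumption for this sketch: only FAILED inputs may be resubmitted.
        if current is not None and current is not State.FAILED:
            return False
        self._states[input_id] = State.QUEUED
        return True

    def set_state(self, input_id, state):
        """Record a state transition for a known input."""
        if input_id not in self._states:
            raise KeyError(f"unknown input: {input_id}")
        self._states[input_id] = state

    def get_state(self, input_id):
        """Return the current state, or None if the input is unknown."""
        return self._states.get(input_id)
```

Under these assumptions, calling `submit` twice with the same identifier queues the input only once, and `get_state` lets a caller follow an input from QUEUED through to COMPLETED or FAILED.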
As shown in this high-level overview of Cirrus, users input data to Cirrus through the use of feeders. Feeders are simply programs that get or generate some type of STAC metadata, combine it with processing parameters, and pass it into Cirrus in the format Cirrus expects.
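As a rough sketch of what a feeder does, the snippet below wraps one STAC Item and some processing parameters in a GeoJSON FeatureCollection. The `build_payload` helper, the `process` and `workflow` keys, and all Item values are assumptions for illustration; consult the Cirrus docs for the exact format it expects.

```python
import json


def build_payload(items, process_params):
    """Wrap STAC Items and processing parameters in a FeatureCollection.

    The "process" key is an assumption made for this sketch.
    """
    if not items:
        raise ValueError("at least one STAC Item is required")
    return {
        "type": "FeatureCollection",
        "features": list(items),
        "process": dict(process_params),
    }


# A minimal STAC Item with made-up values.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-scene-001",
    "geometry": {"type": "Point", "coordinates": [-105.0, 40.0]},
    "bbox": [-105.0, 40.0, -105.0, 40.0],
    "properties": {"datetime": "2020-01-01T00:00:00Z"},
    "links": [],
    "assets": {},
}

# "example-workflow" is a hypothetical workflow name.
payload = build_payload([item], {"workflow": "example-workflow"})
print(json.dumps(payload, indent=2))
```

How the payload is then handed to Cirrus is left out here, since that depends on how the instance is deployed.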
Because Cirrus output is published via SNS, a feeder can be configured to subscribe to that SNS topic so that workflows can be chained: the output of one workflow becomes the input to another, creating multiple levels of products, all with published STAC metadata and clear links showing data provenance.
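Chaining can be sketched as a feeder Lambda handler that receives SNS notifications and turns each published Item into the input for a follow-on workflow. This sketch assumes each SNS message body is a single STAC Item serialized as JSON; the `process` key, the `NEXT_WORKFLOW` name, and the helper functions are hypothetical. Only the `Records[].Sns.Message` layout comes from the standard SNS-to-Lambda event format.

```python
import json

# Hypothetical name of the workflow to run on the upstream output.
NEXT_WORKFLOW = "example-downstream-workflow"


def items_from_sns_event(event):
    """Extract STAC Items from the SNS records of a Lambda event."""
    items = []
    for record in event.get("Records", []):
        # SNS delivers the published message as a JSON string.
        items.append(json.loads(record["Sns"]["Message"]))
    return items


def handler(event, context=None):
    """Build one input payload per published Item.

    Submitting the payloads to Cirrus is omitted from this sketch.
    """
    return [
        {
            "type": "FeatureCollection",
            "features": [item],
            # Assumed key for processing parameters.
            "process": {"workflow": NEXT_WORKFLOW},
        }
        for item in items_from_sns_event(event)
    ]
```

Building one payload per Item keeps each downstream run independent, though a real feeder might batch Items instead.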
Cirrus is divided up into several repositories, all under the cirrus-geo organization on GitHub, with this repository (cirrus) the main one of interest to users.
Repository | Purpose |
---|---|
cirrus | Main Cirrus repo containing serverless config and deployment files, along with the standard set of Lambda functions |
cirrus-dashboard | A front-end interface to the Cirrus API |
cirrus-lib | A Python library of convenience functions for interacting with Cirrus, which keeps the Lambda functions lightweight |
cirrus-task-images | Dockerfiles and code for publishing Cirrus Docker images to Docker Hub that are used in Cirrus Batch tasks |
The cirrus repository is what users would clone, modify, and deploy. The cirrus-dashboard repo is for users to deploy if they want a web app for tracking stats of data through Cirrus. The pip-installable Python library cirrus-lib is used by all Cirrus Lambdas and tasks and is available to developers for writing their own tasks.
This repository, cirrus, contains all the files for deploying a Cirrus instance, including all the core Lambda functions, workflows (AWS Step Functions), State database (AWS DynamoDB), Compute Environments (AWS Batch), and API (API Gateway + Lambda).
Users may need to edit the deployment YAML files for their Cirrus instance, and may also wish to add new tasks, Lambda functions, and workflows.
Folder | Purpose |
---|---|
core | Core Lambda functions for validating and orchestrating workflows |
deploy | YAML files used for deployment with Serverless (referenced from serverless.yml) |
docs | Detailed documentation for this application, kept in a single place |
feeders | Feeder Lambda functions used to add data to Cirrus |
lambdas | Code for lambdas |
tasks | Lambda tasks |
test | All test files for the application, including test fixtures, are kept here |
workflows | Definitions for AWS Step Functions and schemas |
Documentation for deploying, using, and customizing Cirrus is contained within the docs directory:
- Understand the architecture of Cirrus and key concepts
- Deploy Cirrus to your own AWS account
- Use Cirrus to process input data and publish resulting STAC Items
- Customize Cirrus by adding tasks, workflows, and compute environments
Cirrus is an open-source pipeline for processing geospatial data in AWS. It was originally developed by Element 84 under a NASA Access project.