Persist and analyze metadata in a transient Amazon MWAA environment
This repository contains sample code for persisting and analyzing metadata in a transient Amazon MWAA environment. Storing this metadata in your data lake makes pipeline monitoring and analysis easier. Because the metadata is preserved when an environment is torn down, you can delete environments that are not in use and further reduce the cost of Amazon MWAA.
This blog provides a detailed overview and step-by-step instructions on how to export, persist, and analyze Airflow metadata.
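As a rough illustration of the export and persist steps, the sketch below serializes metadata records to CSV and builds a date-partitioned Amazon S3 object key. It is not the code shipped in this repository: the helper names, the `airflow-metadata` prefix, the `dt=` partition layout, and the use of the `dag_run` table as an example are all assumptions made for illustration.

```python
import csv
import io
from datetime import date


def metadata_s3_key(table, export_date, prefix="airflow-metadata"):
    """Build a Hive-style, date-partitioned object key (hypothetical layout)."""
    return f"{prefix}/{table}/dt={export_date.isoformat()}/{table}.csv"


def rows_to_csv(rows):
    """Serialize a list of dicts (one per metadata row) to CSV text with a header."""
    if not rows:
        return ""
    buffer = io.StringIO()
    # Column order follows the keys of the first record.
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{"dag_id": "example_dag", "state": "success"}]
    print(metadata_s3_key("dag_run", date.today()))
    print(rows_to_csv(sample))
```

The resulting text could then be written to a bucket with an S3 client, and a partitioned key layout of this kind lets Amazon Athena restrict a query to specific dates.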
The diagram below illustrates the solution architecture. Note that Amazon QuickSight is NOT included in the CloudFormation stack in this repository. It appears in the diagram only to show that the metadata can be visualized using a business intelligence tool.
To implement the solution, you will need the following:
- An AWS account
- Basic understanding of Apache Airflow, Amazon Athena, Amazon Simple Storage Service (Amazon S3), AWS Glue, Amazon Managed Workflows for Apache Airflow (Amazon MWAA), and AWS CloudFormation
The provisioning takes about 30 minutes to complete.
The CloudFormation template generates the following resources:
- VPC infrastructure that uses public routing over the Internet.
- Amazon S3 buckets required to support Amazon MWAA
- Amazon MWAA environment
- AWS Glue jobs that process data and help generate Airflow metadata
- AWS Lambda-backed custom resources that upload the sample data, AWS Glue scripts, and DAG configuration files to Amazon S3
- AWS Identity and Access Management (IAM) users, roles, and policies
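A stack that creates the resources listed above could also be launched programmatically. The following is a minimal sketch using boto3, not a procedure documented by this repository: the stack name and template path are placeholders you would supply, and `CAPABILITY_NAMED_IAM` is an assumption based on the template creating IAM users, roles, and policies.

```python
from pathlib import Path


def build_create_stack_args(stack_name, template_body):
    """Assemble keyword arguments for CloudFormation's create_stack call."""
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        # Assumption: the template creates IAM resources, so the caller must
        # explicitly acknowledge that capability.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def deploy(stack_name, template_path):
    """Create the stack and block until CloudFormation reports completion."""
    import boto3  # imported lazily so the helper above works without boto3

    cfn = boto3.client("cloudformation")
    body = Path(template_path).read_text()
    cfn.create_stack(**build_create_stack_args(stack_name, body))
    # The waiter polls until the stack reaches CREATE_COMPLETE or fails.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
```

Calling `deploy` with your own stack name and the path to the template would start provisioning. Templates too large to pass inline need to be uploaded to Amazon S3 and referenced with `TemplateURL` instead of `TemplateBody`.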
What this repo contains
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.