Setup to run Airflow in AWS ECS containers
- Docker
- Docker Compose
- AWS IAM User for the infrastructure deployment, with admin permissions
- awscli, installed by running:
pip install awscli
- terraform >= 0.13
- set up your IAM User credentials inside
~/.aws/credentials
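If you prefer, the awscli can write that file for you through its standard interactive prompt (it asks for the access key, secret key and default region):
aws configure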
- set up these env variables in your .zshrc or .bashrc, or in the terminal session you are going to use:
export AWS_ACCOUNT=your_account_id
export AWS_DEFAULT_REGION=us-east-1  # the default region; it must also be set in infrastructure/config.tf
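As a sanity check, and to avoid typing the account id by hand, you can derive AWS_ACCOUNT from the configured credentials using the standard sts get-caller-identity call:
aws sts get-caller-identity
export AWS_ACCOUNT=$(aws sts get-caller-identity --query Account --output text)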
- Generate a Fernet Key:
pip install cryptography
export AIRFLOW_FERNET_KEY=$(python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
More about that here
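If you want to confirm that the exported key is a valid Fernet key, a minimal check is to let the cryptography library parse it (it raises an error on an invalid key):
python -c "from cryptography.fernet import Fernet; import os; Fernet(os.environ['AIRFLOW_FERNET_KEY'].encode()); print('valid Fernet key')"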
- Start Airflow locally by simply running:
docker-compose up --build
If everything runs correctly, you can reach Airflow by navigating to localhost:8080. The current setup is based on Celery workers; you can monitor how many workers are currently active using Flower by visiting localhost:5555.
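Besides the UI, the Airflow webserver also exposes a /health endpoint, which is handy for a quick check from the command line:
curl http://localhost:8080/health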
To run Airflow in AWS we will use ECS (Elastic Container Service) with the following AWS components:
- AWS ECS Fargate: runs all Airflow services (Webserver, Flower, Workers and Scheduler);
- ElastiCache (Redis): broker for communication between the Airflow services;
- RDS for Postgres: metadata database for the Airflow services;
- EFS: persistent storage for the Airflow DAGs;
- ELB: Application Load Balancer for Airflow Webserver access;
- CloudWatch: logs for the container services and the Airflow task runs;
- IAM: service communication permissions for the ECS containers;
- ECR: Docker image repository for storing the Airflow images.
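Once the deployment described below has finished, you can check that the ECS services are up with the awscli; the cluster name airflow-dev is an assumption based on the deploy command used in this repo:
aws ecs list-services --cluster airflow-dev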
Run the following commands.
Export the system variables:
export AWS_ACCOUNT=xxxxxxxxxxxxx
export AWS_DEFAULT_REGION=us-east-1
Then build all the infrastructure and upload the Docker image:
bash scripts/deploy.sh airflow-dev
By default the infrastructure is deployed in us-east-1.
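scripts/deploy.sh wraps the whole process. As a rough, illustrative sketch of the manual equivalent (not the actual script; the build context and the ECR repository name airflow-dev are assumptions), this amounts to applying the Terraform in infrastructure/ and pushing the Docker image to ECR:
cd infrastructure
terraform init
terraform apply
cd ..
aws ecr get-login-password --region $AWS_DEFAULT_REGION | docker login --username AWS --password-stdin $AWS_ACCOUNT.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com
docker build -t airflow-dev .
docker tag airflow-dev:latest $AWS_ACCOUNT.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/airflow-dev:latest
docker push $AWS_ACCOUNT.dkr.ecr.$AWS_DEFAULT_REGION.amazonaws.com/airflow-dev:latest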
The file that starts all Airflow services is entrypoint.sh, located in the configs folder under the project root. It is parameterized according to the command passed in each ECS task definition (the command field).
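As an illustration of that pattern (a simplified sketch, not the actual configs/entrypoint.sh), such a script dispatches on the first argument it receives from the task definition's command:
#!/usr/bin/env bash
# Simplified sketch of a command-dispatching entrypoint; the real configs/entrypoint.sh may differ.
set -e
case "$1" in
  webserver) exec airflow webserver ;;
  scheduler) exec airflow scheduler ;;
  worker)    exec airflow celery worker ;;   # "airflow worker" on Airflow 1.10.x
  flower)    exec airflow celery flower ;;   # "airflow flower" on Airflow 1.10.x
  *)         exec "$@" ;;
esac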
If an error such as the following occurs when bringing up the Airflow containers:
ResourceInitializationError: failed to invoke EFS utils commands to set up EFS volumes: stderr: b'mount.nfs4...
You will need to mount the EFS on an EC2 instance and perform the following steps (a command sketch follows the list):
- mount the EFS on an EC2 instance in the same VPC;
- access the EFS and create the /data/airflow folder structure;
- give full and recursive permission on the root folder, something like chmod 777 -R /data;
- with this, the Airflow containers will be able to access the volume and create the necessary folders.
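A minimal sketch of those steps on an Amazon Linux EC2 instance, using amazon-efs-utils and a placeholder file system id, would be:
sudo yum install -y amazon-efs-utils
sudo mkdir -p /mnt/efs
sudo mount -t efs fs-xxxxxxxx:/ /mnt/efs   # fs-xxxxxxxx is a placeholder for your EFS id
sudo mkdir -p /mnt/efs/data/airflow
sudo chmod -R 777 /mnt/efs/data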
Next steps:
- refactor the Terraform following best practices;
- use SSM Parameter Store to keep passwords secret;
- automatically update the task definition when uploading a new Airflow version.