This code repository contains exercises that go with the Data Minded Academy workshop on "Orchestrating work with Apache Airflow".
Data transformation pipelines rarely run in isolation. There are typically boundary conditions at play, like "we first need the results from this API before we can upload the data to the database". Such workflow logic can be coded as part of your pipeline, but you risk creating an unmanageable mess that cannot resume from the point where an error occurred. Learn about Apache Airflow, one of the most popular tools for orchestrating work, which also offers a pleasant dashboard for following up on the daily progress of your tasks.
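To make the idea concrete, here is a minimal sketch of how such a dependency could be expressed as an Airflow DAG. It assumes the Airflow 2.x Python API; the DAG id, task ids and callables are illustrative placeholders, not part of the exercises.

```python
# A minimal sketch: "fetch results from the API before uploading to the database".
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def fetch_from_api():
    # Placeholder: call the API and store its results where the next task can read them.
    print("fetching results from the API")


def upload_to_database():
    # Placeholder: load the fetched results into the database.
    print("uploading data to the database")


with DAG(
    dag_id="api_to_database_example",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    fetch = PythonOperator(task_id="fetch_from_api", python_callable=fetch_from_api)
    upload = PythonOperator(task_id="upload_to_database", python_callable=upload_to_database)

    # The bitshift operator declares the dependency: upload only runs after fetch succeeds.
    fetch >> upload
```

If the fetch task fails, Airflow will not run the upload task, and a rerun can pick up from the failed task instead of starting the whole pipeline over.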
You can simply click the button below to start the exercise environment within Gitpod.
The Gitpod environment will set up an Airflow instance for you to use during the exercise session. The Airflow UI will load in a new browser window once the startup is complete. You can log in to the UI using "airflow" as both the username and the password.
The Gitpod environment you receive will contain three folders:
- exercises
- solutions
- mount
The folder named mount will contain three sub-folders: dags, logs and plugins. These map onto the corresponding folders inside the Airflow containers: they reflect Airflow's internal state and can be used to upload DAGs or plugins into Airflow, or to download log files.
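For example, saving the DAG sketch from the introduction as mount/dags/api_to_database_example.py (the filename is only illustrative) should make that DAG appear in the Airflow UI once the scheduler has rescanned the dags folder.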
When you need access to the containerized Airflow environment, run:
docker-compose run airflow-cli bash