Apache Airflow is a platform for programmatically authoring, scheduling, and monitoring workflows. It is mainly used to build data-pipeline systems.
It is widely used in industry because it gives you a clear way to structure and orchestrate code execution. Many other tools build on Airflow's concepts, so mastering it will help you pick up those as well; Kubeflow is a notable example.
We will use Docker to run Airflow. This is a convenient approach because it makes the setup portable: the same containers run on your local machine, on a server, or with a cloud provider.
Let's see how to do that. You will need:
- Docker installed and running
- Docker Compose installed
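You can quickly check that both prerequisites are in place before continuing. This is a small sketch, assuming Docker Compose v2 (the `docker compose` plugin); each command prints a version string when the tool is installed, and a warning otherwise:

```shell
# Print Docker's version, or warn if it is missing.
command -v docker >/dev/null 2>&1 && docker --version || echo "Docker is not installed"

# Print the Docker Compose v2 version, or warn if the plugin is unavailable.
docker compose version 2>/dev/null || echo "Docker Compose v2 is not available"
```

If the first command succeeds but Docker commands hang, also make sure the Docker daemon is actually running.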
Run the start script, which will:
- Build the Docker image for each task
- Start Airflow
```bash
bash ./scripts/start.sh
```
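The contents of `start.sh` are not shown here, but a minimal sketch of a script doing the two steps above might look like the following. The `tasks/*/` directory layout and the `airflow-task-` image prefix are assumptions for illustration, not the repository's actual names:

```shell
#!/usr/bin/env bash
# Sketch of a start script: build one image per task, then launch Airflow.
set -euo pipefail

# Build a Docker image for each task (hypothetical tasks/<name>/ layout,
# each directory containing its own Dockerfile).
for task_dir in tasks/*/; do
    task_name=$(basename "$task_dir")
    docker build -t "airflow-task-${task_name}" "$task_dir"
done

# Start Airflow and its supporting services in the background.
docker compose up -d
```

Running Compose with `-d` detaches the containers, so the script returns while Airflow keeps running; `docker compose logs -f` lets you follow the startup output afterwards.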