Apache Airflow is an open-source workflow management platform for data engineering pipelines. It started at Airbnb in October 2014 as a solution to manage the company's increasingly complex workflows. Creating Airflow allowed Airbnb to programmatically author and schedule their workflows and monitor them via the built-in Airflow user interface. From the beginning, the project was made open source, becoming an Apache Incubator project in March 2016 and a top-level Apache Software Foundation project in January 2019.
The development setup of this repository consists of multiple steps that need to be executed IN ORDER.
- Python Version: 3.9
- Shell: BASH (GitBASH in Windows)
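Before starting, you can confirm the prerequisites from your shell (the expected outputs below are illustrative):

```bash
# Verify the prerequisites before starting.
python --version   # expect Python 3.9.x
bash --version     # expect GNU bash (GitBASH on Windows)
```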
This is the part where we set up the Python development environment.
- `python -m venv env` to create a new virtual environment.
- `source env/bin/activate` to activate the virtual environment, or `source env/Scripts/activate` on Windows.
- `python -m pip install pip-tools` to install `pip-tools` for the next step.
- `pip-compile` to generate a platform-dependent list of dependencies.
- `python -m pip install -r requirements.txt` to install all the required dependencies.
- `chmod +x install_airflow.sh` to make the shell script executable.
- `sh install_airflow.sh` to install the pip version of `airflow` (a sketch of the script's likely contents follows this list).
  - Strictly speaking, this step is not necessary, but it makes code completion in an IDE possible and does not hamper the actual airflow DAGs.
  - If the script cannot connect for some reason, run the following command directly:
    `python -m pip install "apache-airflow[celery]==2.5.2" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.5.2/constraints-3.9.txt"`
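For reference, here is a minimal sketch of what `install_airflow.sh` presumably contains — check the actual script in this repository for the authoritative version. The version pin and constraint URL are taken from the fallback command above:

```bash
#!/usr/bin/env bash
# Sketch of install_airflow.sh (assumed contents; see the actual script in this repo).
# Installs the pip version of Airflow pinned against the official constraints file,
# mirroring the fallback command given above.
set -euo pipefail

AIRFLOW_VERSION="2.5.2"
PYTHON_VERSION="3.9"
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

python -m pip install "apache-airflow[celery]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
```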
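Putting the Python environment steps together, the whole sequence on Linux/macOS looks like this (on Windows/GitBASH, substitute `env/Scripts/activate` for `env/bin/activate`):

```bash
# Full Python environment setup, in order (Linux/macOS paths shown).
python -m venv env                          # create the virtual environment
source env/bin/activate                     # activate it (env/Scripts/activate on Windows)
python -m pip install pip-tools             # install pip-tools
pip-compile                                 # resolve a platform-dependent requirements.txt
python -m pip install -r requirements.txt   # install the resolved dependencies
chmod +x install_airflow.sh                 # make the install script executable
sh install_airflow.sh                       # install Airflow for IDE code completion
```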
This is the part where we install and set up Docker to run Airflow.
- Install `docker` and `docker-compose` in your system with the instructions given here. Alternatively, you can also install Docker Desktop if your system supports it.
  - If you are running a Windows machine, you will need to install `WSL/2` and select it as the backend when installing `docker`.
- `mkdir dags plugins logs` to create the mandatory directories if they were not already created.
- `echo -e "AIRFLOW_UID=$(id -u)" > .env` to set the `User ID` in the `.env` file.
- Follow the instructions provided here to set up the container to run `docker`. Do NOT change or redownload the `YML` file.
- `docker-compose up airflow-init` to initialize the airflow database(s).
- `docker-compose up -d` to run the airflow webserver and the miscellaneous services in `detached` mode.
- `docker-compose down` to stop the containers.
- `docker-compose down -v` to stop the containers and destroy all created volumes.
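As a recap, a typical bring-up/tear-down cycle looks like the following sketch. The health check on port 8080 is an assumption based on the stock Airflow `docker-compose.yaml`, which publishes the webserver on `localhost:8080` by default; adjust if this repository's compose file differs.

```bash
# One-time setup
mkdir -p dags plugins logs                # mandatory directories (idempotent)
echo -e "AIRFLOW_UID=$(id -u)" > .env     # record the host user ID for the containers

# Bring-up
docker-compose up airflow-init            # initialize the airflow database(s)
docker-compose up -d                      # start the webserver and services, detached

# Sanity check (assumes the stock compose file exposing the webserver on 8080)
docker-compose ps                         # all services should be running/healthy
curl -s http://localhost:8080/health      # webserver health endpoint

# Tear-down
docker-compose down                       # stop the containers
docker-compose down -v                    # stop the containers and destroy all volumes
```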