
🛠️ Setting Up Apache Airflow with Python – Step-by-Step with Explanations

This setup installs Apache Airflow using version-specific dependency constraints to ensure compatibility, starts a standalone instance, and prepares your first DAG folder.

⸻

📌 Step 1: Set Airflow home directory

export AIRFLOW_HOME=~/airflow

📝 What this does:
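Sets the AIRFLOW_HOME environment variable, which tells Airflow where to store its files, such as logs, configuration, and the default database. If the variable is unset, Airflow falls back to ~/airflow, so this line only makes the default explicit; change the path if you want the files elsewhere. The export applies only to the current shell session.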

⸻

📌 Step 2: Define Airflow and Python version

AIRFLOW_VERSION=3.0.2
PYTHON_VERSION="$(python -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"

📝 What this does:
	•	AIRFLOW_VERSION specifies the version of Airflow you want to install.
	•	PYTHON_VERSION reads the major.minor version of the python on your PATH (for example, 3.10). This is required because Airflow publishes a separate constraints file for each supported Python version.

⸻

📌 Step 3: Create the URL for version-specific dependency constraints

CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

📝 What this does:
Builds a URL pointing to Airflow’s official constraint file. This file pins the dependency versions that were tested with that specific Airflow release and Python version, which keeps the installation reproducible and avoids version conflicts.
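
For scripted setups, Steps 2 and 3 can also be expressed in Python. The following is a minimal sketch, not part of Airflow itself; the function name constraint_url is hypothetical, and the URL pattern is the one shown in Step 3.

```python
import sys


def constraint_url(airflow_version, python_version=None):
    """Build the constraints-file URL for an Airflow/Python version pair."""
    if python_version is None:
        # Same major.minor value that the shell one-liner in Step 2 produces.
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


# Explicit Python version, so the result does not depend on the interpreter.
print(constraint_url("3.0.2", "3.10"))
```

Calling constraint_url("3.0.2") without the second argument uses the version of the interpreter running the script, which should be the same interpreter you install Airflow into.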

⸻

📦 Step 4: Install Airflow using pip

pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"

📝 What this does:
Installs Apache Airflow with version-locked dependencies using the constraint file you just built. This prevents common installation issues caused by incompatible packages.
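
To see what "version-locked" means in practice, the sketch below reads pins out of constraint-style text. It assumes each meaningful line has the form name==version; the function name parse_constraints and the sample package names are hypothetical and not taken from a real constraints file.

```python
def parse_constraints(text):
    """Map package name -> pinned version from 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        # Drop comments (full-line or trailing) and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if "==" not in line:
            continue  # skip blank lines and anything that is not a pin
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


# Hypothetical excerpt for illustration only.
sample = """\
# header comment
alpha-lib==1.2.3
Beta_Lib==0.9.0  # trailing comment
"""
print(parse_constraints(sample))
```

pip applies such pins to every dependency it resolves, which is why the install in Step 4 does not pick up newer, untested releases.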

⸻

🚀 Step 5: Run Airflow in standalone mode

airflow standalone

📝 What this does:
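Starts Airflow in a simplified all-in-one mode intended for local testing, not production. It:
	•	initializes the database
	•	creates a default admin user
	•	starts the scheduler
	•	starts the web UI (by default at http://localhost:8080; in Airflow 3 it is served by the API server rather than a separate web server)

It prints the username and password you can use to log in. The command keeps running in the foreground, so use a second terminal for the remaining steps.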
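
Startup can take a little while. As a rough readiness check, the sketch below waits until something accepts TCP connections on the UI port. The helper name wait_for_port is hypothetical, and an open port only shows that a process is listening, not that Airflow is fully healthy.

```python
import socket
import time


def wait_for_port(host="localhost", port=8080, timeout=60.0, interval=1.0):
    """Return True once a TCP connection succeeds, False if timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Example (run in a second terminal while `airflow standalone` is starting):
# wait_for_port("localhost", 8080)
```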

⸻

📁 Step 6: Create a DAGs folder if it doesn’t exist

mkdir -p ~/airflow/dags

📝 What this does:
Creates a folder called dags inside your Airflow home (the path assumes AIRFLOW_HOME is ~/airflow, as in Step 1). This is where your DAG (Directed Acyclic Graph) Python files are stored; Airflow scans this folder and picks them up.
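
If you changed AIRFLOW_HOME in Step 1, the hard-coded path above no longer matches. The sketch below is a Python equivalent of mkdir -p that follows the variable instead. The helper name ensure_dags_folder is hypothetical, and it assumes the DAGs folder lives at the default location, dags under the Airflow home.

```python
import os
import tempfile
from pathlib import Path


def ensure_dags_folder(airflow_home=None):
    """Create <airflow_home>/dags if needed and return its path."""
    if airflow_home is None:
        # Same fallback as Step 1: ~/airflow when AIRFLOW_HOME is unset.
        airflow_home = os.environ.get("AIRFLOW_HOME", "~/airflow")
    dags = Path(airflow_home).expanduser() / "dags"
    # parents=True and exist_ok=True together behave like `mkdir -p`.
    dags.mkdir(parents=True, exist_ok=True)
    return dags


# Demonstration against a throwaway directory rather than the real home.
with tempfile.TemporaryDirectory() as tmp:
    print(ensure_dags_folder(tmp))
```

Calling ensure_dags_folder() with no argument targets the real Airflow home.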

⸻

📝 Step 7: Create your first DAG file

nano ~/airflow/dags/retail_ml_demo_dag.py

📝 What this does:
Opens the nano text editor (you can use any editor) to create your first DAG Python file. Example DAG contents:

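from datetime import datetime

from airflow import DAG
# In Airflow 3, PythonOperator lives in the standard provider package;
# the older airflow.operators.python import path is deprecated.
from airflow.providers.standard.operators.python import PythonOperator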

def print_hello():
    print("Hello from Retail ML DAG!")

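with DAG(
    dag_id='retail_ml_demo_dag',
    start_date=datetime(2024, 1, 1),
    schedule='@daily',  # Airflow 3 removed schedule_interval; use schedule
    catchup=False  # do not backfill runs between start_date and today
) as dag:
    task = PythonOperator(
        task_id='hello_task',
        python_callable=print_hello  # function executed when the task runs
    )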

🗂️ Save this as ~/airflow/dags/retail_ml_demo_dag.py. Once Airflow has parsed the file, which can take a few minutes, the DAG appears in your Airflow UI under DAGs.
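
A DAG file with a syntax error will not load. As a quick pre-check before saving a file into the DAGs folder, the sketch below parses the source with Python's ast module. The helper name check_dag_source is hypothetical; it only checks Python syntax and does not import Airflow or validate the DAG definition itself.

```python
import ast


def check_dag_source(source, filename="<dag>"):
    """Return None if source is valid Python, else a short error message."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


good = "def print_hello():\n    print('Hello from Retail ML DAG!')\n"
bad = "def print_hello(:\n    pass\n"

print(check_dag_source(good))  # valid source, so no error message
print(check_dag_source(bad, "retail_ml_demo_dag.py"))
```

To check a real file, read it first, for example with pathlib.Path(path).read_text(), and pass the text to the function.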

⸻

✅ Summary

Step	Purpose
Set AIRFLOW_HOME	Define storage location
Specify versions	Match Airflow to Python
Use constraints	Avoid package conflicts
Install Airflow	With pip and constraints
Run standalone	Launch Airflow locally
Create DAGs folder	Store your workflows
Write DAG	Create your first ML task

