This project provides a robust, containerized Apache Airflow environment, integrated with Keycloak for secure authentication and PostgreSQL as its metadata database. Designed for easy deployment and management, this stack is ideal for orchestrating data pipelines and workflows with enterprise-grade authentication.
- Apache Airflow: A platform to programmatically author, schedule, and monitor workflows.
- Keycloak Integration: Secure user authentication and authorization via Keycloak Identity and Access Management.
- PostgreSQL Backend: Reliable and scalable database for Airflow metadata.
- Dockerized Environment: All components are containerized using Docker for consistency and isolation.
- Docker Compose Orchestration: Easily manage and run the entire stack with a single command.
- Automated Setup Scripts: Scripts for initial Airflow setup and Keycloak client creation.
The stack is built with:

- Apache Airflow
- Keycloak
- PostgreSQL
- Docker
- Docker Compose
- Bash Scripting
The repository is laid out as follows:

```
.
├── docker-compose.yml            # Defines the services for the Airflow stack
├── .env                          # Environment variables for the stack (e.g., database credentials)
├── README.md                     # Project documentation (this file!)
├── airflow/                      # Airflow service configuration and DAGs
│   ├── Dockerfile                # Builds the custom Airflow image
│   ├── requirements.txt          # Python dependencies for Airflow and DAGs
│   ├── config/                   # Airflow configuration files
│   │   └── webserver_config.py   # Webserver customization, e.g., OAuth setup (see the sketch below)
│   ├── dags/                     # Your Airflow Directed Acyclic Graphs (DAGs)
│   │   └── example_dag.py        # Example DAG to get you started
│   ├── logs/                     # Airflow runtime logs
│   └── plugins/                  # Custom Airflow plugins, operators, hooks
├── configs/                      # Global configuration files for the stack
│   └── airflow.cfg               # Main Airflow configuration file
├── keycloak/                     # Keycloak service configuration
│   ├── Dockerfile                # Builds the custom Keycloak image
│   └── realm-export.json         # Keycloak realm configuration for initial setup
├── postgres/                     # PostgreSQL service configuration
│   └── init.sql                  # SQL script for initial database setup
└── scripts/                      # Helper scripts for setup and management
    ├── init_airflow.sh           # Initializes Airflow (e.g., database, admin user)
    └── create_keycloak_client.sh # Automates Keycloak client creation for Airflow
```
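The Keycloak integration is wired up in `airflow/config/webserver_config.py`. The repo's actual file may differ; below is a minimal sketch of what a Flask-AppBuilder OAuth configuration for Keycloak typically looks like. It assumes an Airflow 2.x image, a realm named `airflow`, a `keycloak` service reachable inside the Docker network, and the environment variable names shown — all assumptions to adjust against `realm-export.json` and your `.env`.

```python
# Minimal sketch of airflow/config/webserver_config.py -- NOT the repo's exact
# file. The realm name, client ID, env var names, and internal Keycloak URL
# below are assumptions; adjust them to your realm-export.json and .env.
import os

from flask_appbuilder.security.manager import AUTH_OAUTH

# Authenticate via an external OAuth provider instead of Airflow's own DB.
AUTH_TYPE = AUTH_OAUTH

# Auto-create an Airflow user on first successful Keycloak login
# (this is the first-login behavior mentioned in the Notes below).
AUTH_USER_REGISTRATION = True
AUTH_USER_REGISTRATION_ROLE = "Viewer"

KEYCLOAK_URL = os.getenv("KEYCLOAK_INTERNAL_URL", "http://keycloak:8080")
REALM = os.getenv("KEYCLOAK_REALM", "airflow")

OAUTH_PROVIDERS = [
    {
        "name": "keycloak",
        "icon": "fa-key",
        "token_key": "access_token",
        "remote_app": {
            "client_id": os.getenv("KEYCLOAK_CLIENT_ID", "airflow"),
            "client_secret": os.getenv("KEYCLOAK_CLIENT_SECRET", ""),
            # Lets authlib discover all endpoints from the realm's OIDC
            # metadata. Older (pre-Quarkus) Keycloak prefixes paths with /auth.
            "server_metadata_url": (
                f"{KEYCLOAK_URL}/realms/{REALM}/.well-known/openid-configuration"
            ),
            "client_kwargs": {"scope": "openid email profile"},
        },
    }
]
```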
Follow these steps to get your Airflow stack up and running:
1. Clone the repository:

   ```bash
   git clone https://github.com/AkashBhadana/Airflow-Stack.git
   cd Airflow-Stack
   ```

2. Set environment variables: Create a `.env` file in the root directory (if not already present) and populate it with the necessary values, such as database credentials and Keycloak client secrets. A `.env.example` is not currently provided, but adding one would be good practice.

3. Build and run the stack:

   ```bash
   docker-compose up --build -d
   ```

   The `-d` flag runs the containers in detached mode.

4. Access the Airflow UI: Once all services are up and running, open the Airflow UI in your web browser at http://localhost:8080.

5. Access the Keycloak admin console: Keycloak's admin console is available at http://localhost:8081. (You may need to refer to the Keycloak setup documentation for the initial admin credentials.)
Once the stack is running:

- Airflow: Develop and deploy your data pipelines by adding DAG files to the `airflow/dags` directory (a minimal sketch follows this list). Manage and monitor them via the Airflow UI.
- Keycloak: Use the Keycloak admin console to manage the users, roles, and clients used to authenticate into Airflow.
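As an illustration, here is a minimal DAG you could drop into `airflow/dags/`. The `dag_id`, schedule, and task are illustrative placeholders, not the contents of the repo's `example_dag.py`, and the sketch assumes Airflow 2.4 or newer (older 2.x versions use `schedule_interval` instead of `schedule`):

```python
# hello_stack.py -- a minimal, illustrative DAG (not the repo's example_dag.py).
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_stack",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # run once a day; use None for manual-only runs
    catchup=False,      # don't backfill runs from before deployment
) as dag:
    # A single task that echoes a message in the worker's shell.
    BashOperator(
        task_id="say_hello",
        bash_command="echo 'Hello from the Airflow stack'",
    )
```

Assuming `airflow/dags` is volume-mounted into the containers (as is typical for this layout), new files are picked up by the scheduler on its next DAG-folder scan and appear in the UI without rebuilding the image.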
- Ensure Docker and Docker Compose are installed on your system.
- The first user will be auto-created in Airflow on their initial OAuth login via Keycloak (see `AUTH_USER_REGISTRATION` in the webserver config sketch above).
- Remember to secure your `.env` file and any sensitive configurations.
Feel free to fork this repository, open issues, or submit pull requests to improve this Airflow stack.
Built with ❤️ for robust data orchestration.