This project implements a data engineering pipeline built on Apache Airflow and deployed to Google Cloud Platform (GCP), intended to be scalable and production-ready. Airflow schedules and automates the data processing workflows; Terraform provisions the GCP resources, Docker packages the Airflow image, and GitHub Actions handles CI/CD.
apache-airflow-on-gcp
├── dags
│   └── example_dag.py        # Example Directed Acyclic Graph (DAG) for Airflow
├── scripts
│   ├── setup_gcp.sh          # Script to set up GCP resources
│   └── deploy_airflow.sh     # Script to deploy Apache Airflow on GCP
├── terraform
│   ├── main.tf               # Main Terraform configuration for GCP resources
│   ├── variables.tf          # Variables for Terraform configuration
│   └── outputs.tf            # Output values from Terraform deployment
├── ci-cd
│   └── .github
│       └── workflows
│           └── ci-cd.yml     # CI/CD pipeline configuration using GitHub Actions
├── config
│   └── airflow.cfg           # Configuration file for Apache Airflow
├── Dockerfile                # Dockerfile for building Airflow image
├── docker-compose.yml        # Docker Compose configuration for local development
└── requirements.txt          # Python dependencies for the project
- Clone the repository:
    git clone https://github.com/yourusername/apache-airflow-on-gcp.git
    cd apache-airflow-on-gcp
- Set up GCP resources: Run the setup script to create the necessary IAM roles and enable the required APIs.
    chmod +x scripts/setup_gcp.sh
    ./scripts/setup_gcp.sh
- Deploy Apache Airflow: Use the deployment script to initialize and configure Airflow on GCP.
    chmod +x scripts/deploy_airflow.sh
    ./scripts/deploy_airflow.sh
- Build Docker Image: Build the Docker image for Apache Airflow.
    docker build -t apache-airflow-gcp .
- Run Locally (Optional): For local development, use Docker Compose.
    docker-compose up
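
Before running the steps above, it may help to confirm that the command-line tools they rely on are installed. The following is a minimal sketch, not part of the repository: git and docker appear in the commands above, while gcloud and terraform are assumptions based on the project layout, and the helper name missing_tools is hypothetical.

```python
import shutil

# git and docker come from the setup commands; gcloud and terraform are
# assumed from the repository layout and may not match what the scripts call.
REQUIRED_TOOLS = ["git", "docker", "gcloud", "terraform"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

Adjust the tool list to match whatever the setup and deployment scripts actually invoke.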
- Define your workflows in the dags directory by creating new Python files.
- Modify the airflow.cfg file in the config directory to adjust Airflow settings as needed.
- Use the CI/CD pipeline defined in .github/workflows/ci-cd.yml for automated testing and deployment.
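
One kind of check that automated testing could run against new files in the dags directory is a syntax check. The sketch below is an illustration under stated assumptions, not the contents of the project's CI/CD workflow: the function name check_dag_files is hypothetical, and the check only parses each file with the standard library. It does not import Airflow, so it will not catch missing dependencies or cycles in a DAG.

```python
import ast
from pathlib import Path


def check_dag_files(dags_dir="dags"):
    """Parse every .py file under dags_dir and collect syntax errors.

    Returns a dict mapping each failing file path to a short error message.
    An empty dict means every file parsed cleanly.
    """
    errors = {}
    for path in sorted(Path(dags_dir).rglob("*.py")):
        try:
            # ast.parse only checks syntax; the file is never executed.
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            errors[str(path)] = f"line {exc.lineno}: {exc.msg}"
    return errors
```

Calling check_dag_files("dags") from the repository root returns an empty dict when all DAG files parse, so a CI step could fail the build whenever the result is non-empty.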
Contributions are welcome! Please submit a pull request or open an issue for any enhancements or bug fixes.
This project is licensed under the MIT License. See the LICENSE file for details.