
Apache Airflow on GCP

Overview

This project runs Apache Airflow on Google Cloud Platform (GCP) Compute Engine to orchestrate data engineering workflows. It provisions the required GCP resources with Terraform and setup scripts, packages Airflow as a Docker image, and includes a GitHub Actions pipeline for automated testing and deployment.

Project Structure

apache-airflow-on-gcp-compute-engine
├── .github
│   └── workflows
│       └── ci-cd.yml           # CI/CD pipeline configuration using GitHub Actions
├── dags
│   └── example_dag.py          # Example Directed Acyclic Graph (DAG) for Airflow
├── scripts
│   ├── setup_gcp.sh            # Script to set up GCP resources (IAM roles, APIs)
│   └── deploy_airflow.sh       # Script to deploy Apache Airflow on GCP
├── terraform
│   ├── main.tf                 # Main Terraform configuration for GCP resources
│   ├── variables.tf            # Variables for the Terraform configuration
│   └── outputs.tf              # Output values from the Terraform deployment
├── config
│   └── airflow.cfg             # Configuration file for Apache Airflow
├── Dockerfile                  # Dockerfile for building the Airflow image
├── docker-compose.yml          # Docker Compose configuration for local development
└── requirements.txt            # Python dependencies for the project
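
The dags/example_dag.py entry above is the natural starting point. As a rough illustration of what such a file looks like, here is a minimal sketch of an Airflow 2.x DAG with two dependent tasks; the task names, schedule, and commands are illustrative assumptions, not taken from the repository.

    # dags/example_dag.py -- illustrative sketch, not the repository's actual file
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="example_dag",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",  # run once per day
        catchup=False,               # do not backfill runs before today
    ) as dag:
        extract = BashOperator(task_id="extract", bash_command="echo extracting")
        load = BashOperator(task_id="load", bash_command="echo loading")

        extract >> load  # load runs only after extract succeeds

Any .py file placed under dags/ that defines a DAG object like this is picked up automatically by the Airflow scheduler.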

Installation Instructions

  1. Clone the repository:

    git clone https://github.com/DoctorDatah/apache-airflow-on-gcp-compute-engine.git
    cd apache-airflow-on-gcp-compute-engine
  2. Set up GCP resources: Run the setup script to create necessary IAM roles and enable APIs.

    chmod +x scripts/setup_gcp.sh
    ./scripts/setup_gcp.sh
  3. Deploy Apache Airflow: Use the deployment script to initialize and configure Airflow on GCP.

    chmod +x scripts/deploy_airflow.sh
    ./scripts/deploy_airflow.sh
  4. Build the Docker image:

    docker build -t apache-airflow-gcp .
  5. Run locally (optional): For local development, start the stack with Docker Compose. A quick health check you can run afterwards is sketched after these steps.

    docker-compose up
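
Once the containers are up, one way to confirm the deployment is to poll the webserver's /health endpoint. The sketch below assumes the webserver is published on localhost:8080 (a common Docker Compose default for Airflow); adjust the URL to match your docker-compose.yml.

    # check_health.py -- hypothetical helper script, not part of the repository
    import requests

    # The Airflow webserver reports component status as JSON at /health.
    resp = requests.get("http://localhost:8080/health", timeout=10)
    resp.raise_for_status()

    # Expected shape: {"metadatabase": {"status": "healthy"}, "scheduler": {...}}
    for component, detail in resp.json().items():
        print(f"{component}: {detail.get('status', 'unknown')}")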

Usage Guidelines

  • Define your workflows by adding new Python files to the dags directory.
  • Adjust Airflow settings as needed in config/airflow.cfg.
  • Use the CI/CD pipeline defined in .github/workflows/ci-cd.yml for automated testing and deployment; a DAG parse check suited to that pipeline is sketched below.
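
For the testing stage of that pipeline, a common pattern is to fail the build if any DAG file fails to import. A minimal sketch, assuming apache-airflow is installed in the CI environment and DAGs live under dags/ (the test name and layout are illustrative):

    # tests/test_dag_integrity.py -- hypothetical CI check, not from the repository
    from airflow.models import DagBag

    def test_dags_import_cleanly():
        # Parse every file under dags/ without executing any tasks.
        dag_bag = DagBag(dag_folder="dags", include_examples=False)
        # import_errors maps a file path to the traceback that broke its import.
        assert not dag_bag.import_errors, f"DAG import failures: {dag_bag.import_errors}"

Run it with pytest; a clean pass means every workflow file at least parses before it reaches the scheduler.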

Contributing

Contributions are welcome! Please submit a pull request or open an issue for any enhancements or bug fixes.

License

This project is licensed under the MIT License. See the LICENSE file for details.
