End-to-end ETL pipeline using Airflow, S3, and Snowflake with interactive Tableau dashboards for data visualization.

MariaAtefGhaly/ETLVisualizer

Project Structure

```
.
├── dags/                        # [ORCHESTRATION LAYER]
│   ├── snowflake_etl_dag.py     # Main Airflow DAG defining the workflow
│   └── alerts/
│       └── slack_notifier.py    # Custom alerts for pipeline failures
├── data_quality/                # [VALIDATION LAYER - Great Expectations]
│   ├── check_points/            # Checkpoint configurations
│   ├── expectations/            # Defined data quality rules (JSON/YAML)
│   └── static_reports/          # HTML Data Docs (visual quality reports)
├── scripts/                     # [COMPUTE LAYER - Python Scripts]
│   ├── extract/
│   │   └── api_to_s3.py         # Logic to fetch API data and upload to S3
│   └── transform/
│       └── snowflake_logic.py   # Python-based transformations (if needed)
├── sql_models/                  # [DATA WAREHOUSE LAYER - Snowflake]
│   ├── L1_bronze_raw/           # DDL for raw landing tables
│   ├── L2_silver_cleaned/       # Views/tables for cleaning & deduplication
│   └── L3_gold_analytics/       # Final star schema (fact/dim) for Tableau
├── notebooks/                   # [RESEARCH & EDA LAYER]
│   └── api_exploration.ipynb    # Initial data testing & schema discovery
├── config/                      # [CONFIGURATION LAYER]
│   └── settings.yaml            # Connections, bucket names, and API keys
├── infrastructure/              # [DEVOPS LAYER]
│   ├── docker-compose.yaml      # To run Airflow/Postgres locally
│   └── requirements.txt         # Project dependencies (boto3, gx, etc.)
├── .env.example                 # Template for environment variables
├── .gitignore                   # Files to exclude from Git (logs, creds)
└── README.md                    # Project documentation
```

🚀 Project Milestones & Phases

Phase 1: Ingestion & Storage (Bronze Layer)

Goal: Fetch data from the source and persist it safely.

Tools: Python (Requests), AWS S3.

Action: Airflow triggers a script to pull data from the API and saves it as a "Raw" file in an S3 bucket.
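A minimal sketch of that extract step (the real logic lives in scripts/extract/api_to_s3.py; the endpoint, bucket, and key layout here are hypothetical placeholders, and the raw files are partitioned by dataset and run date as one convention among many):

```python
import json
from datetime import date


def build_raw_key(dataset: str, run_date: date) -> str:
    """Partition raw files by dataset and run date, e.g. raw/weather/2024-01-15.json."""
    return f"raw/{dataset}/{run_date.isoformat()}.json"


def extract_to_s3(api_url: str, bucket: str, dataset: str, run_date: date) -> str:
    """Pull JSON from the API and land it unmodified in the S3 bronze bucket."""
    import boto3      # deps listed in infrastructure/requirements.txt
    import requests

    resp = requests.get(api_url, timeout=30)
    resp.raise_for_status()  # fail the Airflow task on HTTP errors

    key = build_raw_key(dataset, run_date)
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(resp.json()).encode("utf-8"),
    )
    return key
```

Keeping the raw payload untouched in S3 means any downstream bug can be fixed by replaying the load, without re-hitting the API.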

Phase 2: Data Quality & Validation (The Filter)

Goal: Ensure data integrity before loading it into the warehouse.

Tools: Great Expectations (GX).

Action: GX checks for nulls, wrong data types, and out-of-range values. If any check fails, the pipeline stops and sends an alert.
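The actual rules live as JSON/YAML suites under data_quality/expectations/ and run through the GX checkpoints; as a plain-Python illustration of the three kinds of checks (not the GX API itself, and with made-up column names and ranges), the gate behaves roughly like:

```python
def run_quality_checks(rows: list[dict]) -> list[tuple[int, str]]:
    """Return (row_index, reason) for every failed check: nulls, types, ranges."""
    failures = []
    for i, row in enumerate(rows):
        if row.get("id") is None:                                  # null check
            failures.append((i, "id is null"))
        if not isinstance(row.get("temperature"), (int, float)):   # type check
            failures.append((i, "temperature is not numeric"))
        elif not -90 <= row["temperature"] <= 60:                  # range check
            failures.append((i, "temperature out of range"))
    return failures


rows = [
    {"id": 1, "temperature": 21.5},       # passes all checks
    {"id": None, "temperature": 500.0},   # null id + out-of-range value
]
problems = run_quality_checks(rows)
# In the pipeline, a non-empty result raises, which fails the Airflow task
# and triggers dags/alerts/slack_notifier.py.
```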

Phase 3: Warehousing & Transformation (Silver & Gold Layers)

Goal: Structured storage and business-logic application.

Tools: Snowflake (SQL).

Action: Use COPY INTO to move data from S3 to Snowflake.

Silver: Data is cast to the correct types and filtered.

Gold: Data is modeled into Fact and Dimension tables ready for analysis.
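A sketch of how the load task might render its COPY INTO statement (the table, stage, and file-format names below are placeholders; in this project the real ones would come from config/settings.yaml and the DDL under sql_models/):

```python
def render_copy_into(table: str, stage: str, file_format: str = "json_fmt") -> str:
    """Build the Snowflake COPY INTO statement that moves staged S3 files
    into a bronze landing table. ON_ERROR aborts so bad loads never land."""
    return (
        f"COPY INTO {table}\n"
        f"FROM @{stage}\n"
        f"FILE_FORMAT = (FORMAT_NAME = '{file_format}')\n"
        f"ON_ERROR = 'ABORT_STATEMENT';"
    )


sql = render_copy_into("L1_bronze_raw.api_events", "s3_raw_stage")
# The Airflow task would then execute `sql` via the Snowflake connector.
```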

Phase 4: Orchestration & Monitoring

Goal: Automate and schedule the entire flow.

Tools: Apache Airflow.

Action: Define the DAG to handle dependencies, retries, and scheduling.
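Airflow provides this natively through the DAG in dags/snowflake_etl_dag.py (its `retries` and `retry_delay` task arguments). As a framework-free sketch of what that retry behavior amounts to, with a deliberately flaky toy task:

```python
import time


def run_with_retries(task, retries: int = 3, retry_delay: float = 0.0):
    """Rerun a failing task up to `retries` extra times before giving up,
    mimicking Airflow's retries/retry_delay task arguments."""
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            attempt += 1
            if attempt > retries:
                raise  # exhausted retries: surface the failure (and alert)
            time.sleep(retry_delay)


# Toy task that fails twice, then succeeds on the third attempt.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = run_with_retries(flaky, retries=3)
```

Retries like this absorb transient API or network failures; anything that still fails after the last retry ends up in the Slack alert path.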

Phase 5: Visualization & Insights

Goal: Communicate data findings to stakeholders.

Tools: Tableau.

Action: Connect Tableau to the Gold Layer in Snowflake to build interactive dashboards.
