This project implements a data pipeline based on the Medallion Architecture, leveraging Microsoft Azure services including Azure Data Factory, Azure Databricks, and DBT (Data Build Tool). The pipeline handles the extraction, transformation, and loading (ETL) of data, enabling seamless data processing and analysis.
The Medallion Architecture is a data processing framework designed to ensure scalability, reliability and maintainability of data pipelines. Our implementation utilizes the following components:
- Azure Data Factory: Orchestrates and automates data movement and transformation workflows. It provides a visual interface for constructing, monitoring, and managing pipelines.
- Azure Databricks: A unified analytics platform that integrates with Azure services for big data processing. Databricks clusters enable scalable data processing using Apache Spark, and its notebooks facilitate collaborative development and execution of data transformation logic.
- DBT (Data Build Tool): A command-line tool that enables you to transform data in your warehouse more effectively. It is specifically designed for building transformation code that is modular, verifiable, and optimized for change.
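The bronze/silver/gold flow these components implement can be sketched in plain Python. This is a conceptual illustration only: in the actual pipeline the stages run as Databricks (Spark) jobs and DBT models orchestrated by Azure Data Factory, and all function names and sample records below are hypothetical.

```python
# Illustrative Medallion layers in plain Python. In the real pipeline these
# stages are Spark/DBT transformations; names and data here are hypothetical.

def bronze_ingest(raw_rows):
    """Bronze: land raw records as-is, tagging each with its source."""
    return [{**row, "_source": "sales_api"} for row in raw_rows]

def silver_clean(bronze_rows):
    """Silver: drop malformed records and normalize types."""
    cleaned = []
    for row in bronze_rows:
        if row.get("amount") is None:
            continue  # discard records missing a required field
        cleaned.append({**row, "amount": float(row["amount"])})
    return cleaned

def gold_aggregate(silver_rows):
    """Gold: business-level aggregate, e.g. revenue per region."""
    totals = {}
    for row in silver_rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

raw = [
    {"region": "EU", "amount": "10.5"},
    {"region": "US", "amount": None},  # malformed: filtered out at silver
    {"region": "EU", "amount": "4.5"},
]
gold = gold_aggregate(silver_clean(bronze_ingest(raw)))
print(gold)  # {'EU': 15.0}
```

Each layer only consumes the output of the previous one, which is what makes the architecture easy to test and extend stage by stage.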
- Modular Pipeline: The pipeline is modular, allowing easy addition or modification of data sources, transformations, and destinations.
- Scalability: Leveraging Azure services ensures the pipeline scales to handle large data volumes and varying workloads.
- Automated Workflow: Data movement, transformation, and orchestration are automated, reducing manual intervention and potential errors.
- Version Control: DBT enables version control of data transformation logic, promoting collaboration and ensuring reproducibility.
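The modularity described above can be sketched as a simple step registry: each stage is an independent, swappable function, so adding a source or transformation means registering one more step rather than rewriting the flow. This is a hypothetical sketch; in practice the equivalents are Azure Data Factory activities and DBT models.

```python
# Hypothetical sketch of a modular pipeline: stages are independent callables
# composed in order, mirroring how ADF activities / DBT models chain together.

from typing import Callable, List

class Pipeline:
    def __init__(self):
        self.steps: List[Callable] = []

    def add_step(self, fn: Callable) -> "Pipeline":
        self.steps.append(fn)
        return self  # chainable, so pipelines read top-to-bottom

    def run(self, data):
        for step in self.steps:
            data = step(data)
        return data

pipeline = (
    Pipeline()
    .add_step(lambda rows: [r for r in rows if r is not None])  # filter step
    .add_step(lambda rows: [r * 2 for r in rows])               # transform step
)
result = pipeline.run([1, None, 3])
print(result)  # [2, 6]
```

Swapping a transformation, or inserting a new one, touches a single `add_step` call and leaves the rest of the flow unchanged.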
To get started with the data pipeline, follow the steps in the Procedure.pdf file. Feel free to modify the data flow structure when creating your own pipeline.