A guide to designing and implementing ETL on Azure with the Medallion architecture, using Azure Databricks, Azure Data Factory, PySpark, Spark Streaming, Delta Live Tables, slowly changing dimensions (SCD), and dimensional data modelling.
This project involves:
- Data Architecture: Designing a modern data warehouse using the Medallion architecture (Bronze, Silver, and Gold layers).
- ETL Pipelines: Extracting, transforming, and loading data from source systems in Azure Databricks using PySpark and Azure Data Factory (ADF).
- Data Modeling: Developing fact and dimension tables optimized for analytical queries.
- Analytics & Reporting: Creating SQL-based reports and dashboards for actionable insights in Azure Synapse Analytics connected to Azure Databricks.
- Governance & ACID Compliance: Implementing Unity Catalog and Delta Lake on Azure Databricks.
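To make the Bronze, Silver, and Gold responsibilities concrete, here is a minimal pure-Python sketch of what each layer does. In the project itself these steps run as PySpark jobs over Delta tables on Azure Databricks; plain dicts are used here only to illustrate the layering, and all field names (`order_id`, `amount`, `country`) are hypothetical.

```python
# Bronze: data landed as-is from the source system, including
# duplicates and invalid records.
raw_events = [
    {"order_id": "1", "amount": "100.0", "country": "us"},
    {"order_id": "1", "amount": "100.0", "country": "us"},  # duplicate
    {"order_id": "2", "amount": "250.5", "country": "DE"},
    {"order_id": "3", "amount": None,    "country": "de"},  # bad record
]

def to_silver(rows):
    """Silver: validate, standardise types, and deduplicate."""
    seen, out = set(), []
    for r in rows:
        if r["amount"] is None:        # drop records failing validation
            continue
        if r["order_id"] in seen:      # deduplicate on the business key
            continue
        seen.add(r["order_id"])
        out.append({"order_id": r["order_id"],
                    "amount": float(r["amount"]),
                    "country": r["country"].upper()})
    return out

def to_gold(rows):
    """Gold: aggregate into an analytics-ready fact (revenue by country)."""
    totals = {}
    for r in rows:
        totals[r["country"]] = totals.get(r["country"], 0.0) + r["amount"]
    return totals

silver = to_silver(raw_events)
gold = to_gold(silver)
print(gold)  # {'US': 100.0, 'DE': 250.5}
```

The same shape maps directly onto Spark: `to_silver` becomes `dropDuplicates`/`filter`/`withColumn` transformations, and `to_gold` becomes a `groupBy().agg()` writing to a Gold Delta table.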
The architecture for this project follows the Medallion pattern with Bronze, Silver, and Gold layers. Useful resources and documentation:
- Free Azure Account: https://azure.microsoft.com/en-us/pricing/purchase-options/azure-account
- Azure Resource Group: https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/manage-resource-groups-portal
- Azure Data Factory: https://azure.microsoft.com/en-us/products/data-factory#Resources-6
- Delta Live Tables: https://www.databricks.com/discover/pages/getting-started-with-delta-live-tables
- Slowly changing dimensions: https://learn.microsoft.com/en-us/fabric/data-factory/slowly-changing-dimension-type-one
- Unity Catalog: https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/
- Azure Data Lake Storage: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-best-practices
- Azure Databricks: https://azure.microsoft.com/en-us/pricing/purchase-options/azure-account?icid=databricks
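Since slowly changing dimensions feature in this project, the sketch below shows the core SCD Type 2 idea in pure Python: when a tracked attribute changes, the current dimension row is expired and a new current row is inserted. On Azure Databricks this is normally implemented with a Delta Lake `MERGE`; the customer dimension layout here (`customer_id`, `city`, validity dates) is hypothetical.

```python
from datetime import date

def scd2_apply(dim_rows, update, today):
    """Apply one source update to a dimension using SCD Type 2 logic."""
    out, changed = [], False
    for row in dim_rows:
        if (row["customer_id"] == update["customer_id"]
                and row["is_current"]
                and row["city"] != update["city"]):
            # Expire the old version instead of overwriting it.
            out.append(dict(row, is_current=False, end_date=today))
            changed = True
        else:
            out.append(row)
    if changed:
        # Insert the new current version, preserving full history.
        out.append({"customer_id": update["customer_id"],
                    "city": update["city"],
                    "start_date": today, "end_date": None,
                    "is_current": True})
    return out

dim = [{"customer_id": 42, "city": "Oslo",
        "start_date": date(2023, 1, 1), "end_date": None,
        "is_current": True}]
dim = scd2_apply(dim, {"customer_id": 42, "city": "Bergen"},
                 date(2024, 6, 1))
# dim now holds two rows: the expired Oslo row and a current Bergen row
```

With SCD Type 1 (the Microsoft Fabric link above) the old value would simply be overwritten; Type 2, as here, keeps every historical version so fact rows can join to the dimension as it was at transaction time.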
The end-to-end workflow, from data ingestion through the Bronze, Silver, and Gold layers, is depicted below:

