Azure Databricks End-to-End Data Engineering Project

Sales Analytics Pipeline using Delta Lake, DLT, and Power BI

Overview

This project demonstrates a complete end-to-end data engineering and analytics solution implemented using Azure Databricks, ADLS Gen2, Delta Lake, Delta Live Tables (DLT), and Power BI. The goal of the project is to ingest, transform, and analyze sales datasets including Customers, Orders, and Products, and generate business insights such as sales performance and customer behavior.

The solution leverages the Medallion Architecture (Bronze, Silver, Gold) and Unity Catalog for centralized governance, access control, and data lineage tracking.

High-Level Architecture

The diagram in Databricks_arch.webp shows the overall architecture of the solution pipeline.

Project Workflow Summary

1. Provisioned an Azure resource group and created an ADLS Gen2 storage account, an Access Connector for Azure Databricks, and an Azure Databricks workspace.
2. Enabled the hierarchical namespace on the storage account and created a storage container with four logical zones:
   - source
   - bronze
   - silver
   - gold
3. Uploaded raw datasets to the source folder and configured a dynamic ingestion pipeline for external data pulls.
4. Created a Unity Catalog metastore and enabled Unity Catalog using the ADLS Gen2 path.
5. Configured external locations for the Bronze, Silver, and Gold layers using Access Connector credentials.
6. Developed data processing notebooks that implement incremental ingestion with Auto Loader, transformations in the Silver layer, and dimensional modeling in the Gold layer.
7. Built a Delta Live Tables (DLT) pipeline for automated streaming transformations with lineage tracking.
8. Created a Databricks Job to orchestrate notebook and DLT execution.
9. Connected Power BI directly to the Gold Delta tables for visualization and reporting.
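
The external-location step above could be scripted roughly as follows. This is a minimal sketch, not the contents of the repository's external_location.sql: the container name, storage account name, credential name, and the `<zone>_ext` naming scheme are all hypothetical, and it assumes the zones are folders inside a single container.

```python
# Zones that get a Unity Catalog external location (the source zone is read directly).
ZONES = ("bronze", "silver", "gold")


def abfss_url(container: str, account: str, zone: str) -> str:
    """Build the ADLS Gen2 URL for one zone folder inside the container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{zone}"


def external_location_sql(name: str, url: str, credential: str) -> str:
    """Render a CREATE EXTERNAL LOCATION statement for Unity Catalog."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` "
        f"URL '{url}' "
        f"WITH (STORAGE CREDENTIAL `{credential}`)"
    )


def all_location_statements(container: str, account: str, credential: str) -> list:
    """One statement per zone; location names like `bronze_ext` are an assumption."""
    return [
        external_location_sql(f"{zone}_ext", abfss_url(container, account, zone), credential)
        for zone in ZONES
    ]
```

In a notebook, each statement could then be run with `spark.sql(stmt)`, provided the storage credential backed by the Access Connector already exists.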

Databricks Job Workflow Diagram

The diagram in End-To-End_Pipeline.png shows the full end-to-end execution pipeline inside Databricks.
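
A job that chains notebooks and a DLT pipeline can be described as a Jobs API 2.1 style payload. The sketch below only illustrates the shape of such a definition: the job name, task keys, notebook paths, and the task order are hypothetical, and compute settings are omitted.

```python
def _with_deps(task: dict, depends_on) -> dict:
    """Attach upstream task keys only when there are any."""
    if depends_on:
        task["depends_on"] = [{"task_key": key} for key in depends_on]
    return task


def notebook_task(key: str, path: str, depends_on=()) -> dict:
    """A task that runs a workspace notebook."""
    return _with_deps({"task_key": key, "notebook_task": {"notebook_path": path}}, depends_on)


def pipeline_task(key: str, pipeline_id: str, depends_on=()) -> dict:
    """A task that triggers an update of an existing DLT pipeline."""
    return _with_deps({"task_key": key, "pipeline_task": {"pipeline_id": pipeline_id}}, depends_on)


def build_job(pipeline_id: str) -> dict:
    """Bronze notebook, then Silver notebook, then the Gold DLT pipeline (assumed order)."""
    return {
        "name": "sales_end_to_end",  # hypothetical job name
        "tasks": [
            notebook_task("bronze_ingest", "/notebooks/bronze"),
            notebook_task("silver_transform", "/notebooks/silver", ["bronze_ingest"]),
            pipeline_task("gold_dlt", pipeline_id, ["silver_transform"]),
        ],
    }
```

The resulting dictionary could be serialized to JSON and submitted when creating the job; the same structure can also be built through the Jobs UI.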

Delta Live Tables Gold Pipeline

The DLT pipeline performs streaming ingestion, transformation, and creation of business-ready tables.
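
As a rough illustration of what such a pipeline can look like in Python, the sketch below registers a staged streaming table with data-quality expectations and a Gold table that reads from it. Table names, the source table, and the expectation rules are hypothetical; the project's actual definitions are not shown in this README. The `dlt` module is passed in as an argument, which is an illustrative choice so the definitions can be exercised outside a pipeline run.

```python
# Hypothetical data-quality rules: expectation name -> SQL condition.
PRODUCT_RULES = {
    "valid_product_id": "product_id IS NOT NULL",
    "non_negative_price": "price >= 0",
}


def define_products_pipeline(dlt, spark, source_table: str = "silver.products"):
    """Register two tables on the given `dlt` module.

    Inside a DLT notebook this would be called as:
        import dlt
        define_products_pipeline(dlt, spark)
    """

    @dlt.table(name="products_stage", comment="Silver products that pass validation")
    @dlt.expect_all_or_drop(PRODUCT_RULES)  # rows failing any rule are dropped
    def products_stage():
        # Incremental read of the (assumed) Silver table.
        return spark.readStream.table(source_table)

    @dlt.table(name="gold_products", comment="Business-ready product table")
    def gold_products():
        # Stream from the staged table defined above.
        return dlt.read_stream("products_stage")
```

Because the tables are declared rather than executed directly, DLT can infer the dependency between `products_stage` and `gold_products` and record it as lineage.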

Medallion Architecture Implementation

Bronze Layer

- Raw incremental ingestion from the ADLS source zone using Databricks Auto Loader
- Stored in Delta format with minimal transformation
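
A minimal sketch of incremental Bronze ingestion with Auto Loader follows. The file format, all paths, and the choice to keep the inferred schema next to the checkpoint are assumptions; the README does not state them.

```python
def autoloader_options(schema_location: str, file_format: str = "parquet") -> dict:
    """Options for the `cloudFiles` source. The default file format is an assumption."""
    return {
        "cloudFiles.format": file_format,
        # Where Auto Loader persists the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }


def start_bronze_stream(spark, source_path: str, target_path: str,
                        checkpoint_path: str, file_format: str = "parquet"):
    """Load only files not yet processed and append them to a Bronze Delta path."""
    raw = (
        spark.readStream.format("cloudFiles")
        .options(**autoloader_options(checkpoint_path, file_format))
        .load(source_path)
    )
    return (
        raw.writeStream.format("delta")
        .option("checkpointLocation", checkpoint_path)  # tracks which files were ingested
        .trigger(availableNow=True)  # process the current backlog, then stop
        .start(target_path)
    )
```

The checkpoint is what makes the ingestion incremental: rerunning the function with the same `checkpoint_path` skips files that were already loaded.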

Silver Layer

- Cleansing, standardization, and joining of Customers, Orders, and Products datasets
- Handling of type conversions and missing field values
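
The type-conversion and missing-value handling could be expressed as Spark SQL expressions generated from a small rule set, as sketched below. Column names, target types, and default values are hypothetical because the dataset schema is not listed here, and the joins are left out.

```python
def silver_exprs(casts: dict, defaults: dict) -> list:
    """Build one SELECT expression per column: cast it, then fill nulls if a default is given.

    `casts` maps column name -> Spark SQL type; `defaults` maps column name -> fill value.
    Defaults are rendered with repr(), which suits numbers and simple strings only.
    """
    exprs = []
    for name, dtype in casts.items():
        expr = f"CAST(`{name}` AS {dtype})"
        if name in defaults:
            expr = f"COALESCE({expr}, {defaults[name]!r})"
        exprs.append(f"{expr} AS `{name}`")
    return exprs


def to_silver(bronze_df, casts: dict, defaults: dict, key: str):
    """Apply the expressions to a Bronze DataFrame and deduplicate on the business key.

    Columns that are not listed in `casts` are dropped by the select.
    """
    return bronze_df.selectExpr(*silver_exprs(casts, defaults)).dropDuplicates([key])
```

For example, `to_silver(orders_bronze, {"order_id": "STRING", "quantity": "INT"}, {"quantity": 0}, "order_id")` would cast both columns, replace null quantities with 0, and keep one row per order.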

Gold Layer

- Creation of Fact and Dimension tables for analytical consumption
- Optimized for dashboarding and reporting through Power BI
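
One way to derive a small star schema from the Silver tables is sketched below. All column names are hypothetical, and hashing the natural key with `xxhash64` is just one simple surrogate-key strategy chosen for illustration; the project may use a different one.

```python
def dimension_exprs(natural_key: str, surrogate_key: str, attributes: list) -> list:
    """SELECT expressions for a dimension: hashed surrogate key, natural key, attributes."""
    return (
        [f"xxhash64(`{natural_key}`) AS `{surrogate_key}`", f"`{natural_key}`"]
        + [f"`{attr}`" for attr in attributes]
    )


def build_dimension(silver_df, natural_key: str, surrogate_key: str, attributes: list):
    """One row per natural key, with a deterministic surrogate key."""
    exprs = dimension_exprs(natural_key, surrogate_key, attributes)
    return silver_df.selectExpr(*exprs).dropDuplicates([natural_key])


def build_fact(orders, dim_customers, dim_products):
    """Swap natural keys in orders for the dimensions' surrogate keys."""
    return (
        orders
        .join(dim_customers.select("customer_id", "customer_key"), "customer_id", "left")
        .join(dim_products.select("product_id", "product_key"), "product_id", "left")
        .select("order_id", "customer_key", "product_key",
                "order_date", "quantity", "total_amount")
    )
```

Keeping the fact table narrow, with keys and measures only, lets Power BI relate it to the dimension tables instead of scanning wide denormalized rows.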

Dataset Description

The Sales Analytics dataset used for this project contains:

- Customer information
- Product catalog
- Order transaction history

The dataset enables KPIs such as revenue trends, product contribution analysis, and customer purchase patterns.

Technologies Used

- Azure Databricks
- Delta Lake
- Delta Live Tables (DLT)
- Unity Catalog
- ADLS Gen2
- Power BI
- Python / PySpark / SQL

Future Enhancements

- Automating deployment using Azure DevOps CI/CD pipelines
- Adding slowly changing dimension (SCD Type 2) support
- Introducing streaming-based ingestion for real-time updates
- Expanding lineage visualization and governance details using Unity Catalog features

Conclusion

This project demonstrates a scalable and production-ready data engineering pipeline using Azure cloud components and Databricks. With automated ingestion, secure governed storage, transformation orchestration, and analytical dashboards, the solution addresses modern data analytics requirements and provides a solid foundation for enterprise-grade data platforms.
