# Customer Data Integration and Analytics Pipeline

## Overview

This project demonstrates the design and implementation of a modern data engineering pipeline that consolidates structured and semi-structured customer data into a centralized analytics warehouse for reporting and predictive analytics.

The solution integrates relational (MS SQL Server) and NoSQL (MongoDB) data sources, processes and transforms data using ETL pipelines, and loads it into a star-schema analytics warehouse.

The pipeline is orchestrated using both Azure Data Factory (ADF) and a prototype Apache Airflow workflow for comparison. Data loading is also demonstrated with SQL Server Integration Services (SSIS).

This project highlights best practices in data integration, transformation, and workflow management, with a focus on maintainability and scalability.
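
To make the "semi-structured to warehouse" step concrete, below is a minimal, illustrative Python sketch of how MongoDB event documents could be flattened into tabular rows and landed in a staging table. The database, collection, table names, and connection string are assumptions for illustration, not values taken from this repository.

```python
# Illustrative sketch only: flatten semi-structured MongoDB event documents into
# tabular rows for a staging table. Database, collection, table, and connection
# names are assumptions, not values from this repository.
import pandas as pd
from pymongo import MongoClient
from sqlalchemy import create_engine

# Assumed local MongoDB instance and an assumed "crm.customer_events" collection.
events = MongoClient("mongodb://localhost:27017")["crm"]["customer_events"]

# Pull the documents and flatten nested fields (e.g. event payloads) into columns.
docs = list(events.find({}, {"_id": 0}))
staging_df = pd.json_normalize(docs, sep="_")

# Land the flattened rows in a SQL Server staging table
# (assumes pyodbc and ODBC Driver 17 are installed; adjust the connection string).
engine = create_engine(
    "mssql+pyodbc://user:password@localhost/Staging?driver=ODBC+Driver+17+for+SQL+Server"
)
staging_df.to_sql("stg_customer_events", engine, if_exists="append", index=False)
```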

## 📐 Architecture

```mermaid
flowchart LR
    A["MS SQL Server: CRM Data"] --> B["ETL Pipeline (ADF/Airflow)"]
    C["MongoDB: Customer Events"] --> B
    B --> D["Staging"]
    D --> E["SQL Server: Analytics Warehouse"]
    E --> F["Reporting & Dashboards"]
```

- Sources:
  - MS SQL Server: Structured customer and transaction data
  - MongoDB: Semi-structured customer event logs
- Orchestration:
  - Azure Data Factory (production-level orchestration)
  - Apache Airflow (academic prototype for DAG-based orchestration; see the sketch after this list)
- Loading:
  - SSIS packages for efficient batch inserts
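
As a hedged illustration of the DAG-based orchestration mentioned above, here is a minimal Airflow 2.x sketch of a three-stage extract, transform, and load workflow. Task IDs, the schedule, and the callables are illustrative assumptions, not the repository's actual `dags/airflow_dag.py`.

```python
# Minimal Airflow 2.x sketch of an extract -> transform -> load DAG.
# Task ids, schedule, and callables are assumptions for illustration.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_sources():
    """Pull customer rows from SQL Server and event documents from MongoDB."""


def transform_to_staging():
    """Clean, conform, and flatten the extracted data into staging tables."""


def load_warehouse():
    """Load the staged rows into the star-schema analytics warehouse."""


with DAG(
    dag_id="customer_data_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_sources", python_callable=extract_sources)
    transform = PythonOperator(task_id="transform_to_staging", python_callable=transform_to_staging)
    load = PythonOperator(task_id="load_warehouse", python_callable=load_warehouse)

    extract >> transform >> load
```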

## 🎯 Objectives

✅ Integrate multiple heterogeneous data sources into a single warehouse
✅ Automate data cleaning, transformation, and enrichment
✅ Enable timely and reliable business reporting
✅ Demonstrate proficiency in modern data engineering tools

## 🛠️ Technologies Used

| Tool/Technology | Purpose |
| --- | --- |
| MS SQL Server | Relational database and analytics warehouse |
| MongoDB | NoSQL source for event-based data |
| Azure Data Factory (ADF) | Cloud-based ETL orchestration |
| SQL Server Integration Services (SSIS) | ETL for batch loading |
| Apache Airflow | Workflow orchestration (prototype) |
| Python | Data cleaning and Airflow DAG scripts |
| Jupyter Notebook | Exploratory data analysis |

## 📁 Repository Structure

```text
customer-data-integration-pipeline/
├── README.md
├── data/
│   ├── sample_sql_server_export.csv
│   ├── sample_mongodb_export.json
├── dags/
│   └── airflow_dag.py
├── ssis/
│   └── ssis_package.ispac
├── adf/
│   └── pipeline_definition.json
├── sql/
│   ├── warehouse_schema.sql
│   └── transformations.sql
├── notebooks/
│   └── exploratory_analysis.ipynb
├── docs/
│   └── architecture_diagram.png
└── LICENSE
```

## 🚀 Getting Started

### Prerequisites

- SQL Server instance (local or cloud)
- MongoDB instance (local or cloud)
- Azure account with Data Factory (optional for demonstration)
- Apache Airflow (optional, can be run locally using Docker)

### Steps

  1. Clone the repository.
  2. Create the analytics warehouse schema using `sql/warehouse_schema.sql`.
  3. Load the sample data from `/data` into SQL Server and MongoDB (see the loading sketch after these steps).
  4. Deploy the ADF pipeline from `adf/pipeline_definition.json`, or simulate the same steps with the Airflow DAG.
  5. Run the SSIS package in `/ssis` to load the staged data into the warehouse.
  6. Explore the results or extend with custom dashboards.
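
For step 3, the following is a hedged Python sketch of loading the bundled sample exports into local instances. The connection string, database, table, and collection names are assumptions, and the JSON export is assumed to be a plain JSON array; adjust for your environment and for newline-delimited exports.

```python
# Hypothetical helper for step 3: load the bundled sample exports into local
# SQL Server and MongoDB instances. Connection details and target names are
# assumptions; adjust them to your environment.
import json

import pandas as pd
from pymongo import MongoClient
from sqlalchemy import create_engine

# SQL Server: load the CSV export into a source table
# (assumes pyodbc and ODBC Driver 17 are installed).
engine = create_engine(
    "mssql+pyodbc://user:password@localhost/CRM?driver=ODBC+Driver+17+for+SQL+Server"
)
pd.read_csv("data/sample_sql_server_export.csv").to_sql(
    "customers", engine, if_exists="replace", index=False
)

# MongoDB: insert the JSON export as event documents
# (assumes the file is a JSON array; adjust for newline-delimited JSON).
with open("data/sample_mongodb_export.json") as f:
    events = json.load(f)
MongoClient("mongodb://localhost:27017")["crm"]["customer_events"].insert_many(events)
```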

## 📊 Outcomes

✅ Consolidated customer data from multiple systems into one analytics-ready database.
✅ Automated repeatable workflows for daily ETL operations.
✅ Improved data consistency, quality, and availability for reporting teams.

## 📄 License

This project is licensed under the MIT License. See the LICENSE file for details.
