This project demonstrates the end-to-end Data Warehouse pipeline using a multi-layered architecture β Bronze, Silver, and Gold layers. It integrates data from CRM and ERP systems, performs data cleaning, transformation, and analytics, and applies Exploratory Data Analysis (EDA) and Advanced Data Analysis (ADA).
-
Data is extracted directly from source systems:
CRM(Customer Relationship Management)ERP(Enterprise Resource Planning)
-
Data files included:
- CRM:
cust_info.xlsx,prd_info.xlsx,sales_details.xlsx - ERP:
CUST_AZ12.xlsx,LOC_A101.xlsx,PX_CAT_G1V2.xlsx
- CRM:
-
This layer stores raw, unprocessed data.
-
Cleaned and standardized data from Bronze Layer.
-
Key transformations performed:
- Removed duplicates and null values.
- Standardized column names and formats.
- Handled missing values.
- Created relationships between CRM and ERP datasets.
-
Tables include:
crm_sales_detailscrm_cust_infocrm_prd_infoerp_cust_az12erp_loc_a101erp_px_cat_g1v2
-
Contains final, analysis-ready dimension and fact tables.
-
Tables:
fact_salesdim_customersdim_products
-
Relationships established between these tables:
fact_saleslinks to bothdim_customersanddim_productsusing foreign keys.
Performed in this phase:
- Analyzed customer demographics (country, gender, marital status).
- Reviewed sales trends by product and location.
- Checked for outliers and missing values.
- Validated data consistency between CRM and ERP sources.
- Top-performing products.
- Sales distribution by country and category.
- Customer segmentation based on purchase behavior.
- SQL β for data transformation and analysis
- Excel / CSV β for source data
SQL_datawarehouse_project_1/ β βββ datasets/ β βββ source_crm/ β β βββ cust_info.xlsx β β βββ prd_info.xlsx β β βββ sales_details.xlsx β βββ source_erp/ β βββ CUST_AZ12.xlsx β βββ LOC_A101.xlsx β βββ PX_CAT_G1V2.xlsx β βββ bronze/ βββ silver/ βββ gold/ βββ README.md
β Summary
This project demonstrates a complete Data Warehouse ETL process, starting from raw data ingestion to analytical insights. It showcases skills in:
- SQL Data Modeling
- ETL and Data Cleaning
- EDA & ADA
- Building a structured multi-layer architecture