**Project Title:** End-to-End Data Engineering Pipeline using Microsoft Fabric with Medallion Architecture
By: Neha Karpe
INT-19

**Objective:**
To implement an automated ETL pipeline using Microsoft Fabric, following the Medallion Architecture (Bronze, Silver, Gold). This includes data ingestion, cleaning, transformation, aggregation, logging, and visualization using Power BI.




#### 1. **Ingesting Data into Bronze Layer**
- **Method Used:** Copy Data Activity in Fabric pipeline
- **Source:** On-premises CSV files (e.g., `region_dim.csv`, `product_dim.csv`, `sales_fact.csv`)
- **Access Method:** Connected to the on-premises data source through the Microsoft on-premises data gateway, which lets the cloud pipeline read local files securely and move them from the on-premises file system into the Fabric Lakehouse.
- **Destination:** Bronze Lakehouse (Files section)
- **Logging:**
  - Source filename
  - Ingestion timestamp
  - Row count
  - File format
  - Any ingestion error messages
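
The logged fields listed above can be captured in a small record per file. The following is a minimal sketch in plain Python, not the pipeline's actual logging code: the function name `build_ingestion_log` and the dictionary keys are assumptions made for illustration, and the file content is passed in as a string rather than read from the Lakehouse.

```python
import csv
import io
import os
from datetime import datetime, timezone


def build_ingestion_log(filename, text):
    """Return a metadata record for one ingested CSV file.

    `text` is the file content as a string; a real pipeline would read it
    from the Lakehouse Files section instead (assumption for this sketch).
    """
    log = {
        "source_filename": filename,
        "ingestion_timestamp": datetime.now(timezone.utc).isoformat(),
        "file_format": os.path.splitext(filename)[1].lstrip(".").lower(),
        "row_count": None,
        "error": None,
    }
    try:
        reader = csv.reader(io.StringIO(text))
        next(reader)                                # skip the header row
        log["row_count"] = sum(1 for _ in reader)   # count data rows only
    except StopIteration:
        log["error"] = "file is empty"              # no header row found
    return log


entry = build_ingestion_log("sales_fact.csv", "Region,Qty\nEast,3\nWest,5\n")
print(entry["row_count"], entry["file_format"])
```

Keeping the error message in the same record as the row count means a failed file still leaves a trace in the log.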

**Good Practice Followed:**
- Reviewed and verified the column names at ingestion to prevent transformation errors downstream.
- Logged metadata at the point of ingestion.
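
Column verification of this kind can be automated with a simple header check. This is a sketch under stated assumptions: the helper `missing_columns` and the expected column list are hypothetical (only `Qty` is named in this report), and the comparison deliberately ignores case and surrounding whitespace.

```python
def missing_columns(actual, expected):
    """Return expected column names absent from `actual`.

    Names are compared case-insensitively after trimming whitespace.
    """
    present = {name.strip().lower() for name in actual}
    return [name for name in expected if name.strip().lower() not in present]


# Hypothetical expected schema; only `Qty` appears in the report itself.
EXPECTED_SALES_COLUMNS = ["ProductID", "RegionID", "Qty"]

header = ["ProductID", " regionid", "Quantity"]
print(missing_columns(header, EXPECTED_SALES_COLUMNS))  # ['Qty']
```

Running the check before any rename makes a mismatch such as `Quantity` versus `Qty` visible at ingestion rather than in a later transformation.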

#### 2. **Bronze to Bronze Table Conversion**
- **Tool Used:** Fabric Notebook (PySpark)
- **Operation:**
  - Renamed columns for consistency.
  - Wrote data into Bronze Delta Tables.
  - Ensured schema correctness.

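```python
# Example: load the raw sales CSV, standardize a column name, and save it as a Delta table
sales_df = (
    spark.read.option("header", True)           # first row holds the column names
    .csv("/lakehouse/Files/sales_fact.csv")     # no inferSchema, so every column is read as string
    .withColumnRenamed("Qty", "TotalQuantity")  # rename for consistency
)
# Overwrite the Bronze Delta table on each run
sales_df.write.format("delta").mode("overwrite").saveAsTable("sales_fact")
```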

---

#### 3. **Transforming Data in Silver Layer**  
- **Tool Used:** `Notebook26102002#silvernb.ipynb`
- **Operations:**
  - Read Bronze Delta Tables
  - Applied transformations: data type casting, null filtering, schema alignment
  - Logged:
    - Data Quality issues (nulls, mismatches)
    - Columns changed or removed
    - Schema before and after transformation

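```python
# Example log
from datetime import datetime, timezone

silver_log = {
    "table": "sales_fact",
    "records_dropped": 12,
    "columns_converted": ["TotalCost: string -> double"],
    # Plain UTC timestamp; PySpark's current_timestamp() returns a Column expression, not a value
    "transformation_time": datetime.now(timezone.utc).isoformat(),
}
```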
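
A log like the one above could be filled in by the cleaning step itself rather than by hand. The sketch below uses plain Python dictionaries as a stand-in for the notebook's DataFrame operations. The function `clean_sales_rows` and the `ProductID` column are assumptions for illustration; `TotalCost` and the log keys come from the example above.

```python
from datetime import datetime, timezone


def clean_sales_rows(rows):
    """Cast TotalCost to float and drop rows where it is missing or malformed.

    Returns the cleaned rows plus a log dict describing what changed.
    """
    cleaned = []
    for row in rows:
        try:
            cost = float(row.get("TotalCost"))   # string -> double
        except (TypeError, ValueError):
            continue                             # null or non-numeric: drop the row
        cleaned.append({**row, "TotalCost": cost})

    log = {
        "table": "sales_fact",
        "records_dropped": len(rows) - len(cleaned),
        "columns_converted": ["TotalCost: string -> double"],
        "transformation_time": datetime.now(timezone.utc).isoformat(),
    }
    return cleaned, log


rows = [
    {"ProductID": "P1", "TotalCost": "19.5"},
    {"ProductID": "P2", "TotalCost": None},
    {"ProductID": "P3", "TotalCost": "n/a"},
]
cleaned, log = clean_sales_rows(rows)
print(len(cleaned), log["records_dropped"])  # 1 2
```

Deriving `records_dropped` from the row counts before and after cleaning keeps the log consistent with what the transformation actually did.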

**Best Practices Applied:**
- Maintained transformation logs
- Used modular functions with docstrings for reuse and readability
- Followed standard naming conventions


#### 4. **Aggregating and Loading Gold Layer (Star Schema)**
- **Tool Used:** `Notebook26102002#gold.ipynb`
- **Steps:**
  - Read from Silver
  - Created dimension tables (Product, Region)
  - Created fact tables with aggregations:
    - Total Sales by Region and Category
    - Top 5 Products by Sales Amount
    - Monthly Sales Trend
  - Logged aggregation steps, filters used, success/failure status
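
The three aggregations listed above share one pattern: group by some columns and sum the sales amount. The sketch below shows that pattern in plain Python as a stand-in for the notebook's Spark code. The helpers `total_by` and `top_n`, the column names, and the sample rows are all hypothetical and do not come from the project data.

```python
from collections import defaultdict


def total_by(rows, *keys):
    """Sum SalesAmount grouped by the given key columns."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["SalesAmount"]
    return dict(totals)


def top_n(totals, n=5):
    """Return the n largest (key, total) pairs, highest total first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical sample rows standing in for the Silver sales table.
rows = [
    {"Region": "East", "Category": "Toys", "Product": "Kite", "Month": "2012-01", "SalesAmount": 100.0},
    {"Region": "East", "Category": "Toys", "Product": "Ball", "Month": "2012-02", "SalesAmount": 50.0},
    {"Region": "West", "Category": "Books", "Product": "Atlas", "Month": "2012-01", "SalesAmount": 70.0},
]

by_region_category = total_by(rows, "Region", "Category")   # Total Sales by Region and Category
top_products = top_n(total_by(rows, "Product"), n=5)        # Top 5 Products by Sales Amount
monthly_trend = sorted(total_by(rows, "Month").items())     # Monthly Sales Trend, oldest first

print(by_region_category[("East", "Toys")])  # 150.0
print(top_products[0])                       # (('Kite',), 100.0)
```

Sorting the monthly totals by their `YYYY-MM` key gives chronological order, which is what a trend line needs.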



**Blockers:**
1. The Warehouse table load failed because the assigned warehouse was unavailable (documented and communicated).
2. Power BI could not be accessed through Fabric, so the audit logs for access to the Gold layer Lakehouse could not be completed.


#### 5. **Power BI Dashboard Creation**
- Used Gold layer tables
- KPIs:
  - Total Profit: 10.45M
  - Total Sales: 29.36M
  - Total Quantity: 54K
- Built the following visuals:
  - KPIs: Total Sales, Quantity, Profit
  - Bar chart: Total Sales by Region
  - Pie chart: Total Sales by Category
  - Donut chart: Top Products by Sales
  - Line chart: Monthly Sales Trend (2010–2014)
- Made the report interactive using slicers:
  - Region
  - Category
  - Year & Month

### ETL Terminology Used:
- **ETL (Extract, Transform, Load):** End-to-end pipeline covering raw ingestion to BI
- **Delta Tables:** Used for versioned storage and optimized queries
- **Star Schema:** Dimensional modeling in Gold layer
- **Data Quality Checks:** Tracked nulls, types, and integrity
- **Audit Logging:** Every stage included logs for traceability

### Final Thoughts & Key Learning
- Understood the significance of metadata logging and modular design
- Practiced column verification as a good habit in the initial data load
- Built modular, maintainable code using functions and comments
- Gained exposure to Medallion Architecture and Power BI
- Learned how to debug load issues and create visual stories from data



