
## Objective: Understand the layered architecture of modern data platforms.

Dataset: ecommerce_orders.csv

### Tasks:
1. Split this raw dataset logically into:

    -> Bronze Layer: Raw ingestion (entire dataset)

    -> Silver Layer: Only successful (status = delivered) orders
    
    -> Gold Layer: Aggregated metrics like total revenue per product

2. Create a markdown visual representation (table or diagram) of how data flows from
Bronze → Silver → Gold.

3. Write one paragraph answering:
“Why is the Lakehouse model better than traditional Data Warehousing for modern companies?”

In [0]:
import pandas as pd

# Load the raw CSV as-is; this unfiltered DataFrame is the Bronze layer
file_path = '/Workspace/Users/akashkchavan9900@gmail.com/data-engineering-projects/Datasets/ecommerce_orders.csv'
df = pd.read_csv(file_path)

## Task 1: 
Bronze Layer



In [0]:
df

Silver Layer

In [0]:
df1 = df[df['status'] == 'delivered']
df1
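
The filter above matches the string `delivered` exactly, so values such as `Delivered` or ` delivered ` would be dropped. Whether the real file contains such variants is not known; the sketch below uses made-up rows to show one way to normalize the column before filtering, should that turn out to be needed.

```python
import pandas as pd

# Hypothetical rows (not from ecommerce_orders.csv) with inconsistent status values
orders = pd.DataFrame({
    "product": ["laptop", "phone", "tablet"],
    "amount": [1000.0, 500.0, 300.0],
    "status": ["Delivered", " delivered ", "cancelled"],
})

# Strip surrounding whitespace and lowercase the status before comparing
normalized_status = orders["status"].str.strip().str.lower()
delivered = orders[normalized_status == "delivered"]
print(delivered)
```

The original `status` column is left untouched here; only the comparison uses the normalized values.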

Gold Layer

In [0]:
# Gold: total revenue per product, computed from delivered orders only
df2 = df1.groupby('product', as_index=False)['amount'].sum()
df2 = df2.rename(columns={'amount': 'total_revenue'})
print(df2)

## Task 2: 
#### Data Flow: Bronze → Silver → Gold

| Layer  | Description                                      | Data Criteria                            |
|--------|--------------------------------------------------|------------------------------------------|
| Bronze | Raw ingestion of all data                        | All records from the source              |
| Silver | Cleaned and filtered data                        | Only where `status = delivered`         |
| Gold   | Aggregated insights for analysis or reporting    | Revenue grouped by product              |
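
The flow in the table can also be sketched as three small functions, one per layer. This is a minimal illustration, not the notebook's actual pipeline: the sample rows and the `order_id` column are made up, the function names are hypothetical, and only the `status`, `product`, and `amount` columns are taken from the cells above.

```python
import pandas as pd

# Hypothetical sample rows standing in for ecommerce_orders.csv
sample = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "product": ["laptop", "phone", "laptop", "phone"],
    "amount": [1000.0, 500.0, 1200.0, 450.0],
    "status": ["delivered", "cancelled", "delivered", "delivered"],
})

def to_bronze(raw: pd.DataFrame) -> pd.DataFrame:
    # Bronze: keep every record exactly as ingested
    return raw.copy()

def to_silver(bronze: pd.DataFrame) -> pd.DataFrame:
    # Silver: keep only orders whose status is 'delivered'
    return bronze[bronze["status"] == "delivered"].reset_index(drop=True)

def to_gold(silver: pd.DataFrame) -> pd.DataFrame:
    # Gold: total revenue per product
    totals = silver.groupby("product", as_index=False)["amount"].sum()
    return totals.rename(columns={"amount": "total_revenue"})

bronze = to_bronze(sample)
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Each function takes the previous layer as input, so the Bronze data is never modified and the Silver and Gold layers can be rebuilt from it at any time.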


## Task 3: 
#### Why is the Lakehouse model better than traditional Data Warehousing for modern companies?
-> 
The Lakehouse model combines the strengths of data lakes and data warehouses. Traditional data warehouses work well for structured analytics but struggle with large volumes of semi-structured or unstructured data. Data lakes, on the other hand, offer flexibility and scalability but often lack the reliability that business-critical analytics needs. The Lakehouse addresses this by building on open formats such as Parquet and Delta Lake to provide a single storage layer for both structured and unstructured data, with support for ACID transactions, schema enforcement, and BI workloads. For modern companies, this means faster insights, less data duplication, and lower infrastructure costs on a single platform.