## **_Data Modeling Basics – Entities, Attributes & Relationships_**

### **_What is Data Modeling_**
- The process of visually designing how data is structured.
- Defines entities, attributes, and relationships.
- Ensures data is clean, connected, and query-ready.

Data modeling means designing data in a structured way so that it is easy to understand, use, and manage.
It is essentially a blueprint of how data will be stored in the database and how the tables relate to each other.

👉 Example: If you are working on an e-commerce app, you have to decide that:

- Customer data goes in its own table
- Order data goes in its own table
- Products go in their own table
- All of these tables are linked to each other through Primary Key – Foreign Key relationships.

**Types of Data Models (in short):**

- Conceptual Model → High-level; only entities and their relationships (e.g. Customers–Orders).
- Logical Model → More detailed; defines attributes (Customer_Name, Order_ID, etc.).
- Physical Model → The actual DB schema (SQL tables, data types, constraints).

**As a Data Engineer, you use data modeling so that:**

- Data stays clean and organized
- Queries and analytics run fast
- The design stays scalable

👉 In one line: Data Modeling = the map / blueprint of the database, describing where, how, and in what form data will be stored.

![](https://www.gooddata.com/img/blog/_2000xauto/ldm-for-e-commerce.png.webp)

### Assignment 1: Identify Entities and Relationships (E-Commerce Use Case)

### 📦 E-Commerce Platform Database Schema
**_1. 🧱 Entity List_**

**_The core entities (tables) in the system are:_**

- Users
- Products
- Categories
- Orders
- OrderItems
- Payments
- ShippingDetails
- Reviews
- Cart
- CartItems

**_2. 📋 Table-wise Attributes_**

**_Table- Users_**
| Column         | Key | Description        |
|----------------|-----|--------------------|
| user_id        | PK  | Unique user ID     |
| name           |     | User full name     |
| email          |     | User email address |
| password_hash  |     | Hashed password    |
| phone          |     | Phone number       |
| address        |     | Address            |
| created_at     |     | Account creation   |

**_Table- Products_**
| Column         | Key | Description              |
|----------------|-----|--------------------------|
| product_id     | PK  | Unique product ID        |
| name           |     | Product name             |
| description    |     | Product description      |
| price          |     | Product price            |
| stock_quantity |     | Quantity in stock        |
| category_id    | FK  | References Categories    |
| created_at     |     | Product creation date    |

**_Table- Categories_**
| Column       | Key | Description            |
|--------------|-----|------------------------|
| category_id  | PK  | Unique category ID     |
| name         |     | Category name          |
| description  |     | Category description   |

**_Table- Orders_**
| Column       | Key | Description               |
|--------------|-----|---------------------------|
| order_id     | PK  | Unique order ID           |
| user_id      | FK  | References Users          |
| order_date   |     | Date of order             |
| status       |     | Order status              |
| total_amount |     | Total order amount        |

**_Table- OrderItems_**
| Column        | Key | Description                  |
|---------------|-----|------------------------------|
| order_item_id | PK  | Unique order item ID         |
| order_id      | FK  | References Orders            |
| product_id    | FK  | References Products          |
| quantity      |     | Quantity of the product      |
| price         |     | Price of the product         |

**_Table- Payments_**
| Column         | Key | Description                  |
|----------------|-----|------------------------------|
| payment_id     | PK  | Unique payment ID            |
| order_id       | FK  | References Orders            |
| payment_date   |     | Date of payment              |
| amount         |     | Payment amount               |
| payment_method |     | Payment method               |
| payment_status |     | Payment status               |

**_Table- ShippingDetails_**
| Column          | Key | Description                  |
|-----------------|-----|------------------------------|
| shipping_id     | PK  | Unique shipping ID           |
| order_id        | FK  | References Orders            |
| shipping_address|     | Shipping address             |
| shipping_method |     | Method of shipping           |
| shipping_date   |     | Date of shipping             |
| delivery_date   |     | Expected delivery date       |
| tracking_number |     | Shipment tracking number     |

**_Table- Reviews_**
| Column      | Key | Description                  |
|-------------|-----|------------------------------|
| review_id   | PK  | Unique review ID             |
| user_id     | FK  | References Users             |
| product_id  | FK  | References Products          |
| rating      |     | Rating given by user         |
| comment     |     | Review comment               |
| review_date |     | Date of review               |

**_Table- Cart_**
| Column      | Key | Description                  |
|-------------|-----|------------------------------|
| cart_id     | PK  | Unique cart ID               |
| user_id     | FK  | References Users             |
| created_at  |     | Cart creation date           |

**_Table- CartItems_**
| Column       | Key | Description                  |
|--------------|-----|------------------------------|
| cart_item_id | PK  | Unique cart item ID          |
| cart_id      | FK  | References Cart              |
| product_id   | FK  | References Products          |
| quantity     |     | Quantity of product in cart  |

**_3. 🔗 Relationship Mapping_**
| Relationship              | Type         | Description                                |
|----------------------------|-------------|--------------------------------------------|
| Users → Orders             | One-to-Many | A user can place multiple orders           |
| Orders → OrderItems        | One-to-Many | Each order can contain multiple items      |
| Products → OrderItems      | One-to-Many | A product can appear in many order items   |
| Orders → Payments          | One-to-One  | Each order has one payment                 |
| Orders → ShippingDetails   | One-to-One  | Each order has one shipping detail         |
| Users → Reviews            | One-to-Many | A user can write multiple reviews          |
| Products → Reviews         | One-to-Many | A product can have multiple reviews        |
| Categories → Products      | One-to-Many | A category can contain multiple products   |
| Users → Cart               | One-to-One  | Each user has one cart                     |
| Cart → CartItems           | One-to-Many | A cart can contain multiple items          |
| Products → CartItems       | One-to-Many | A product can appear in multiple cart items|
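
The relationship types above map onto key constraints. The sketch below is only an illustration, using Python's built-in `sqlite3` module and a trimmed-down subset of the tables; the column types and sample rows are assumptions, since the tables above do not specify them. A plain foreign key gives one-to-many (Users → Orders), while a foreign key that is also `UNIQUE` gives one-to-one (Orders → Payments).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE Orders (
    order_id INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL REFERENCES Users(user_id)             -- one-to-many
);
CREATE TABLE Payments (
    payment_id INTEGER PRIMARY KEY,
    order_id   INTEGER NOT NULL UNIQUE REFERENCES Orders(order_id)  -- one-to-one
);
""")

conn.execute("INSERT INTO Users VALUES (1, 'Amit')")
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(10, 1), (11, 1)])  # two orders, one user
conn.execute("INSERT INTO Payments VALUES (100, 10)")


def try_insert(sql):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


second_payment_ok = try_insert("INSERT INTO Payments VALUES (101, 10)")  # same order again
orphan_order_ok = try_insert("INSERT INTO Orders VALUES (12, 999)")      # unknown user
orders_for_user = conn.execute(
    "SELECT COUNT(*) FROM Orders WHERE user_id = 1").fetchone()[0]

print(second_payment_ok, orphan_order_ok, orders_for_user)  # False False 2
```

If an order should be allowed several payments (for example, retries or partial payments), dropping the `UNIQUE` constraint turns Orders → Payments into one-to-many.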


**_4. 🔑 Keys_**

### Primary Keys
| Table             | Primary Key    |
|-------------------|----------------|
| Users             | user_id        |
| Products          | product_id     |
| Categories        | category_id    |
| Orders            | order_id       |
| OrderItems        | order_item_id  |
| Payments          | payment_id     |
| ShippingDetails   | shipping_id    |
| Reviews           | review_id      |
| Cart              | cart_id        |
| CartItems         | cart_item_id   |

### Foreign Keys
| Table           | Foreign Key     | References                          |
|-----------------|-----------------|--------------------------------------|
| Products        | category_id     | Categories(category_id)              |
| Orders          | user_id         | Users(user_id)                       |
| OrderItems      | order_id        | Orders(order_id)                     |
| OrderItems      | product_id      | Products(product_id)                 |
| Payments        | order_id        | Orders(order_id)                     |
| ShippingDetails | order_id        | Orders(order_id)                     |
| Reviews         | user_id         | Users(user_id)                       |
| Reviews         | product_id      | Products(product_id)                 |
| Cart            | user_id         | Users(user_id)                       |
| CartItems       | cart_id         | Cart(cart_id)                        |
| CartItems       | product_id      | Products(product_id)                 |

### 3 Levels of Data Modeling (Data Engineer perspective)

**_Conceptual Data Model_**

- A high-level design (the business view).
- Shows only entities (Customer, Order, Product) and relationships (Customer places Orders).
- Contains no technical details.
- 👉 Example: Customer ↔ Order ↔ Product

_**Logical Data Model**_

- More detailed: defines the attributes of each entity and the relationships between entities.
- Data types are optional, but keys (PK/FK) can be defined.
- 👉 Example: Customer(Customer_ID, Name, Email), Order(Order_ID, Order_Date, Customer_ID)

_**Physical Data Model**_

- The actual database implementation level.
- Defines tables, column data types, primary keys, foreign keys, constraints, and indexes.
- 👉 Example (SQL table):

**_Code:_**

```sql
CREATE TABLE Customers (
    Customer_ID INT PRIMARY KEY,
    Name VARCHAR(100),
    Email VARCHAR(100) UNIQUE
);
```
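
To see the physical level in action, the sketch below runs the `Customers` table through Python's built-in `sqlite3` module and adds an `Orders` table based on the logical-model example. The `Orders` data types, the `NOT NULL` constraints, the index name `idx_orders_customer`, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Customers (
    Customer_ID INT PRIMARY KEY,
    Name        VARCHAR(100),
    Email       VARCHAR(100) UNIQUE
);
CREATE TABLE Orders (
    Order_ID    INT PRIMARY KEY,
    Order_Date  DATE NOT NULL,
    Customer_ID INT NOT NULL REFERENCES Customers(Customer_ID)
);
-- An index is a purely physical choice: it speeds up lookups by customer.
CREATE INDEX idx_orders_customer ON Orders(Customer_ID);
""")

conn.execute("INSERT INTO Customers VALUES (1, 'Amit', 'amit@example.com')")
conn.execute("INSERT INTO Orders VALUES (1001, '2024-03-15', 1)")

# The UNIQUE constraint on Email rejects a second customer with the same address.
try:
    conn.execute("INSERT INTO Customers VALUES (2, 'Amit K', 'amit@example.com')")
    duplicate_email_rejected = False
except sqlite3.IntegrityError:
    duplicate_email_rejected = True

# Column 1 of each PRAGMA index_list row is the index name.
index_names = [row[1] for row in conn.execute("PRAGMA index_list('Orders')")]

print(duplicate_email_rejected)  # True
```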


_**👉 In one line:**_

- Conceptual = Business view (what data is needed)
- Logical = Detailed structure (what the data looks like)
- Physical = Actual DB schema (how the data is stored)

![](https://guides.visual-paradigm.com/wp-content/uploads/2023/09/img_6507e93556627.png)

# 📊 Dimensional Modeling

---

## 🔹 What Is Dimensional Modeling?
In the data warehousing / BI world, data that is stored for **analysis and reporting** is designed using **dimensional modeling**.  

It is a **technique that splits data into facts and dimensions**.  

- **Fact Table** → Holds the numbers / measures (sales amount, profit, quantity, revenue, etc.)  
- **Dimension Table** → Gives the facts their context (who the customer was, what the product was, where the location was, when it happened, etc.)  

---

## 🔹 Example
Suppose a **Sales Data** warehouse is being built.  

- **Fact Table: Sales_Fact**  
  - sales_amount  
  - quantity_sold  
  - discount  

- **Dimensions**:  
  - Customer_Dim (customer_id, name, age, city)  
  - Product_Dim (product_id, name, category, brand)  
  - Date_Dim (date_id, day, month, year, quarter)  
  - Store_Dim (store_id, store_name, location)  

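👉 Example Query:  
"Total Sales by City in 2024"  
- Take sales_amount **(from the fact table)**  
- Join with Customer_Dim and **group by city**; join with Date_Dim to keep only the year 2024.  
- This assumes Sales_Fact also stores the key of each dimension (customer_id, date_id, and so on) alongside the measures.  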
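
The "Total Sales by City in 2024" query can be sketched as follows, using Python's built-in `sqlite3` module. This is a minimal illustration, not a production design: the `customer_id` and `date_id` columns on `Sales_Fact`, the column types, and all sample rows are assumptions added so that the joins have something to work with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer_Dim (customer_id INTEGER PRIMARY KEY, name TEXT, age INTEGER, city TEXT);
CREATE TABLE Date_Dim     (date_id INTEGER PRIMARY KEY, day INTEGER, month INTEGER,
                           year INTEGER, quarter INTEGER);
CREATE TABLE Sales_Fact (
    customer_id   INTEGER REFERENCES Customer_Dim(customer_id),  -- assumed FK column
    date_id       INTEGER REFERENCES Date_Dim(date_id),          -- assumed FK column
    sales_amount  REAL,
    quantity_sold INTEGER,
    discount      REAL
);
INSERT INTO Customer_Dim VALUES (1, 'Amit', 30, 'Delhi'), (2, 'Neha', 27, 'Mumbai');
INSERT INTO Date_Dim VALUES (20230110, 10, 1, 2023, 1),
                            (20240215, 15, 2, 2024, 1),
                            (20240720, 20, 7, 2024, 3);
INSERT INTO Sales_Fact VALUES
    (1, 20230110, 100.0, 1, 0.0),   -- 2023 sale, excluded by the year filter
    (1, 20240215, 250.0, 2, 0.0),
    (2, 20240215, 400.0, 1, 0.0),
    (1, 20240720, 150.0, 1, 0.0);
""")

# Measures come from the fact table; city and year come from the dimensions.
sales_by_city_2024 = conn.execute("""
    SELECT c.city, SUM(f.sales_amount) AS total_sales
    FROM Sales_Fact f
    JOIN Customer_Dim c ON c.customer_id = f.customer_id
    JOIN Date_Dim d     ON d.date_id = f.date_id
    WHERE d.year = 2024
    GROUP BY c.city
    ORDER BY c.city
""").fetchall()

print(sales_by_city_2024)  # [('Delhi', 400.0), ('Mumbai', 400.0)]
```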

---

## 🔹 Benefits of Dimensional Modeling
1. ✅ **Simple and Fast Queries** – Reporting becomes easy.  
2. ✅ **Readable Structure** – Easy for business users to understand (customer, product, date).  
3. ✅ **Performance** – Aggregation and filtering are fast.  
4. ✅ **Flexibility** – New dimensions can be added easily.

# 🌐 Types of Dimensional Modeling Schemas

Dimensional modeling mainly uses **3 common schemas**:  

---

## 1. ⭐ Star Schema
- The **Fact Table** sits at the center  
- **Dimension Tables** connect directly around it  
- Gives simple and fast query performance  
- Usually the best fit for **reporting and BI tools**  

**Example (Sales Data):**  
- **Fact Table:** Sales_Fact (sales_amount, quantity, discount)  
- **Dimensions:** Customer_Dim, Product_Dim, Date_Dim, Store_Dim  

👉 Query: *"Total Sales by City"* → join Sales_Fact with Customer_Dim  

---

## 2. ❄️ Snowflake Schema
- A **normalized variant** of the star schema  
- Dimensions are broken down further (normalized) into sub-tables  
- Saves storage, but queries become somewhat more **complex and slower** because of the extra joins  
- Mostly used for **data storage optimization**  

**Example:**  
- Product_Dim is split into →  
  - Product Table (product_id, name, category_id)  
  - Category Table (category_id, category_name, brand_id)  
  - Brand Table (brand_id, brand_name)  

👉 Query: *"Total Sales by Brand"* → Fact → Product → Category → Brand  
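
The Fact → Product → Category → Brand path can be sketched with Python's built-in `sqlite3` module. The table layout follows the example above; the column types, the trimmed-down `Sales_Fact`, and the sample rows (such as `BrandA`) are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Brand    (brand_id INTEGER PRIMARY KEY, brand_name TEXT);
CREATE TABLE Category (category_id INTEGER PRIMARY KEY, category_name TEXT,
                       brand_id INTEGER REFERENCES Brand(brand_id));
CREATE TABLE Product  (product_id INTEGER PRIMARY KEY, name TEXT,
                       category_id INTEGER REFERENCES Category(category_id));
CREATE TABLE Sales_Fact (product_id INTEGER REFERENCES Product(product_id),
                         sales_amount REAL);
INSERT INTO Brand VALUES (1, 'BrandA'), (2, 'BrandB');
INSERT INTO Category VALUES (10, 'Phones', 1), (11, 'Laptops', 2);
INSERT INTO Product VALUES (100, 'Phone X', 10), (101, 'Phone Y', 10), (102, 'Laptop Z', 11);
INSERT INTO Sales_Fact VALUES (100, 300.0), (101, 200.0), (102, 900.0);
""")

# Three joins are needed to reach the brand name; a star schema would need one.
sales_by_brand = conn.execute("""
    SELECT b.brand_name, SUM(f.sales_amount) AS total_sales
    FROM Sales_Fact f
    JOIN Product p  ON p.product_id = f.product_id
    JOIN Category c ON c.category_id = p.category_id
    JOIN Brand b    ON b.brand_id = c.brand_id
    GROUP BY b.brand_name
    ORDER BY b.brand_name
""").fetchall()

print(sales_by_brand)  # [('BrandA', 500.0), ('BrandB', 900.0)]
```

The trade-off is visible in the query itself: each brand name is stored once, but every brand-level report pays for the extra joins.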

---

## 3. 🌌 Galaxy Schema (Fact Constellation)
- Used when **multiple fact tables** share the same dimensions  
- Used in large enterprise systems  
- Complex, but supports powerful reporting  

**Example:**  
- **Fact Tables:**  
  - Sales_Fact  
  - Returns_Fact  
- **Shared Dimensions:**  
  - Customer_Dim, Product_Dim, Date_Dim  

👉 Query: *"Compare Sales vs Returns by Product Category"*  
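
A possible way to write the "Sales vs Returns by Product Category" comparison is sketched below with Python's built-in `sqlite3` module. The column names `return_amount` and `returned`, the trimmed-down tables, and the sample rows are assumptions. Each fact table is aggregated on its own before the results are combined, because joining two fact tables row by row would multiply rows and inflate the totals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product_Dim  (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE Sales_Fact   (product_id INTEGER, sales_amount REAL);
CREATE TABLE Returns_Fact (product_id INTEGER, return_amount REAL);
INSERT INTO Product_Dim VALUES (1, 'Phone X', 'Phones'), (2, 'Laptop Z', 'Laptops');
INSERT INTO Sales_Fact VALUES (1, 300.0), (1, 200.0), (2, 900.0);
INSERT INTO Returns_Fact VALUES (1, 50.0);
""")

sales_vs_returns = conn.execute("""
    WITH s AS (   -- sales per category, via the shared Product_Dim
        SELECT p.category, SUM(f.sales_amount) AS sales
        FROM Sales_Fact f JOIN Product_Dim p ON p.product_id = f.product_id
        GROUP BY p.category
    ),
    r AS (        -- returns per category, via the same shared dimension
        SELECT p.category, SUM(f.return_amount) AS returned
        FROM Returns_Fact f JOIN Product_Dim p ON p.product_id = f.product_id
        GROUP BY p.category
    )
    SELECT s.category, s.sales, COALESCE(r.returned, 0.0) AS returned
    FROM s LEFT JOIN r ON r.category = s.category
    ORDER BY s.category
""").fetchall()

print(sales_vs_returns)  # [('Laptops', 900.0, 0.0), ('Phones', 500.0, 50.0)]
```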

---

# 🔑 Summary
- **Star Schema:** Simple, Fast, Easy to use  
- **Snowflake Schema:** Normalized, Storage efficient, Complex joins  
- **Galaxy Schema:** Multiple facts, Shared dimensions, Enterprise level  

⚡ In short:  
**Schema choice = Balance between Simplicity (Star), Storage (Snowflake), and Complexity (Galaxy)**


# 🕒 Slowly Changing Dimensions (SCD): Handling Change Over Time

**What Is an SCD?**  
Dimension tables hold descriptive information (Customer, Product, Location, etc.), but this information **changes over time**.  
Example: a customer's address changes, a product's price is updated, an employee's department changes.  
**Slowly Changing Dimensions (SCDs)** are the techniques used to handle these changes.


# 🕒 Types of Slowly Changing Dimensions (SCD) with Examples

---

## 🔹 Intro
SCD stands for **Slowly Changing Dimensions** → i.e. dimension table data that changes over time (such as a customer's address, a product's price, or an employee's department).  
There are different types, depending on whether the business needs to **preserve history or not**.  

---

## ⭐ Type 0 – Fixed Dimension
- **No change allowed**  
- Data is loaded once → it never changes afterwards  
- Example: Date Dimension (dates never change)

**Example Table:**
| Date\_ID | Date       | Month | Year |
| -------- | ---------- | ----- | ---- |
| 1        | 2025-01-01 | Jan   | 2025 |

---

## ⭐ Type 1 – Overwrite (No History)
- The old value is **overwritten** with the new value  
- History is not preserved  
- Used when only the latest information is needed  

**Example Table:**
| State         | Customer\_ID | Name | City   |
| ------------- | ------------ | ---- | ------ |
| Before update | 101          | Amit | Delhi  |
| After update  | 101          | Amit | Mumbai |

The table always holds a single row for customer 101; Delhi is overwritten by Mumbai.
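
A Type 1 change can be sketched as a plain in-place `UPDATE`. The example uses Python's built-in `sqlite3` module; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer_Dim (Customer_ID INTEGER PRIMARY KEY, Name TEXT, City TEXT)")
conn.execute("INSERT INTO Customer_Dim VALUES (101, 'Amit', 'Delhi')")

# Type 1: update in place; the old city is not kept anywhere.
conn.execute("UPDATE Customer_Dim SET City = ? WHERE Customer_ID = ?", ("Mumbai", 101))

rows_type1 = conn.execute("SELECT * FROM Customer_Dim").fetchall()
print(rows_type1)  # [(101, 'Amit', 'Mumbai')]
```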

---

## ⭐ Type 2 – Full History Track
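- The old record is **expired** and a new record is inserted  
- Extra columns are used: `Start_Date`, `End_Date`, `Is_Current`  
- The full history is preserved  
- Because `Customer_ID` repeats across versions, a separate surrogate key is usually used as the table's primary key  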

**Example Table:**
| Customer\_ID | Name | City   | Start\_Date | End\_Date  | Is\_Current |
| ------------ | ---- | ------ | ----------- | ---------- | ----------- |
| 101          | Amit | Delhi  | 2022-01-01  | 2023-05-01 | N           |
| 101          | Amit | Mumbai | 2023-05-02  | NULL       | Y           |
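
The expire-and-insert step behind the table above can be sketched like this, using Python's built-in `sqlite3` module. The helper `apply_scd2`, the surrogate key column `Customer_SK`, and the column types are hypothetical names and choices made for this illustration; real pipelines often do the same work with a `MERGE` statement or an ETL tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Customer_Dim (
    Customer_SK INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key (assumption)
    Customer_ID INTEGER, Name TEXT, City TEXT,
    Start_Date TEXT, End_Date TEXT, Is_Current TEXT
)""")
conn.execute(
    "INSERT INTO Customer_Dim (Customer_ID, Name, City, Start_Date, End_Date, Is_Current) "
    "VALUES (101, 'Amit', 'Delhi', '2022-01-01', NULL, 'Y')")


def apply_scd2(conn, customer_id, name, new_city, last_day_old, first_day_new):
    """Expire the current row, then insert the new current row (hypothetical helper)."""
    with conn:  # both statements commit together, or roll back together on error
        conn.execute(
            "UPDATE Customer_Dim SET End_Date = ?, Is_Current = 'N' "
            "WHERE Customer_ID = ? AND Is_Current = 'Y'",
            (last_day_old, customer_id))
        conn.execute(
            "INSERT INTO Customer_Dim "
            "(Customer_ID, Name, City, Start_Date, End_Date, Is_Current) "
            "VALUES (?, ?, ?, ?, NULL, 'Y')",
            (customer_id, name, new_city, first_day_new))


apply_scd2(conn, 101, "Amit", "Mumbai", "2023-05-01", "2023-05-02")

history = conn.execute(
    "SELECT Customer_ID, Name, City, Start_Date, End_Date, Is_Current "
    "FROM Customer_Dim ORDER BY Start_Date").fetchall()
current_city = conn.execute(
    "SELECT City FROM Customer_Dim WHERE Customer_ID = 101 AND Is_Current = 'Y'"
).fetchone()[0]

for row in history:
    print(row)
print(current_city)  # Mumbai
```

Filtering on `Is_Current = 'Y'` gives the latest version of each customer, while a date-range filter on `Start_Date` and `End_Date` gives the version that was valid at a past point in time.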

---

## ⭐ Type 3 – Limited History
- Only the **current + previous value** is preserved  
- Anything older than the previous value is lost  
- An extra column such as `Previous_Value` is used  

**Example Table:**
| Customer\_ID | Name | Current\_City | Previous\_City |
| ------------ | ---- | ------------- | -------------- |
| 101          | Amit | Mumbai        | Delhi          |
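
A Type 3 change shifts the current value into the "previous" column before overwriting it. The sketch below uses Python's built-in `sqlite3` module; the helper `move_city`, the column types, and the second move to Pune are assumptions added to show how older history drops off.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer_Dim (Customer_ID INTEGER PRIMARY KEY, Name TEXT, "
    "Current_City TEXT, Previous_City TEXT)")
conn.execute("INSERT INTO Customer_Dim VALUES (101, 'Amit', 'Delhi', NULL)")


def move_city(conn, customer_id, new_city):
    """Shift Current_City into Previous_City, then store the new city (hypothetical helper)."""
    # In an SQL UPDATE the right-hand sides see the row's old values,
    # so Previous_City receives the old Current_City.
    conn.execute(
        "UPDATE Customer_Dim SET Previous_City = Current_City, Current_City = ? "
        "WHERE Customer_ID = ?",
        (new_city, customer_id))


move_city(conn, 101, "Mumbai")
after_first = conn.execute("SELECT * FROM Customer_Dim").fetchall()

move_city(conn, 101, "Pune")  # Delhi is no longer stored anywhere
after_second = conn.execute("SELECT * FROM Customer_Dim").fetchall()

print(after_first)   # [(101, 'Amit', 'Mumbai', 'Delhi')]
print(after_second)  # [(101, 'Amit', 'Pune', 'Mumbai')]
```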

---

## 🔑 Summary Table
| Type  | Description                              | History Preserved? |
|-------|------------------------------------------|--------------------|
| SCD 0 | Fixed, no change allowed                 | ❌ No |
| SCD 1 | Overwrite old value with new value       | ❌ No |
| SCD 2 | Insert new row + track history           | ✅ Full |
| SCD 3 | Store only current + previous value      | ⚠️ Limited |

---

## ⚡ Real-Life Example
Consider a **Bank Customer Dimension**:  
- Only the latest address is needed → **SCD Type 1**  
- The full address history is needed → **SCD Type 2**  
- Only the current and previous address are needed → **SCD Type 3**  
- The data will never change (Date Dimension) → **SCD Type 0**

---

## ✅ Conclusion
- **SCD 0:** No change allowed  
- **SCD 1:** Overwrite → latest info only  
- **SCD 2:** Insert new row → full history track  
- **SCD 3:** Current + previous only  

👉 The business requirement decides which type to use.


# 🏢 Data Warehousing Explained

---

## 🔹 What is a Data Warehouse?
A Data Warehouse is a **centralized repository** where data from different sources (ERP, CRM, flat files, databases, APIs) is collected, cleaned, transformed, and stored.  
It is mainly used for **analysis, reporting, and BI (Business Intelligence)**, not for transactional work.  

👉 In simple words:  
**OLTP (Operational DB)** = For daily transactions  
**OLAP (Data Warehouse)** = For analytics & decision making  

---

## 🔹 Features of Data Warehouse
- ✅ **Subject-Oriented** → Organized by business area (Sales, Finance, Customer)  
- ✅ **Integrated** → Data from multiple sources is combined in one place  
- ✅ **Time-Variant** → Preserves historical data (e.g. 5–10 years)  
- ✅ **Non-Volatile** → Once loaded, data is not changed frequently  

---

## 🔹 Data Warehouse Architecture

1. **Data Sources**  
   - OLTP Systems (ERP, CRM, SQL Server, MySQL)  
   - Flat Files, APIs, Logs  

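2. **ETL / ELT Layer**  
   - ETL: Extract → Transform → Load; ELT: Extract → Load → Transform (the transformation runs inside the warehouse)  
   - Tools: Azure Data Factory (ADF), Informatica, Talend, Databricks  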

3. **Data Storage (DW)**  
   - Central repository (Snowflake, Synapse, BigQuery, Redshift)  

4. **Presentation Layer**  
   - BI & Analytics: Power BI, Tableau, Looker  

---

## 🔹 Data Flow Diagram

**Data Sources → ETL/ELT → Data Warehouse → BI/Analytics**

Example:  
CRM (Customer Data) + ERP (Sales Data) → ETL → Central DW → Dashboard → "Sales by Region"
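
The flow above can be sketched end to end in a few lines of Python. Everything here is a stand-in chosen for illustration: the in-memory lists play the role of CRM and ERP extracts, an in-memory SQLite database plays the role of the warehouse, and the function and column names are hypothetical.

```python
import sqlite3

# Extract: hypothetical rows as they might arrive from the source systems.
crm_customers = [
    {"customer_id": 1, "name": " Amit ", "region": "north"},
    {"customer_id": 2, "name": "Neha", "region": "West"},
]
erp_sales = [
    {"customer_id": 1, "amount": "250.00"},
    {"customer_id": 2, "amount": "400.00"},
    {"customer_id": 1, "amount": "150.00"},
]


def transform_customers(rows):
    """Clean the CRM rows: trim names and standardize region casing."""
    return [(r["customer_id"], r["name"].strip(), r["region"].strip().title())
            for r in rows]


def transform_sales(rows):
    """Convert ERP amounts from text to numbers."""
    return [(r["customer_id"], float(r["amount"])) for r in rows]


# Load: write the cleaned rows into the warehouse tables.
dw = sqlite3.connect(":memory:")
dw.executescript("""
CREATE TABLE Customer_Dim (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE Sales_Fact   (customer_id INTEGER, sales_amount REAL);
""")
dw.executemany("INSERT INTO Customer_Dim VALUES (?, ?, ?)",
               transform_customers(crm_customers))
dw.executemany("INSERT INTO Sales_Fact VALUES (?, ?)", transform_sales(erp_sales))

# Presentation: the query a "Sales by Region" dashboard might run.
sales_by_region = dw.execute("""
    SELECT c.region, SUM(f.sales_amount) AS total_sales
    FROM Sales_Fact f
    JOIN Customer_Dim c ON c.customer_id = f.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

print(sales_by_region)  # [('North', 400.0), ('West', 400.0)]
```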

---

## 🔹 Data Marts
- A **subset** of the Data Warehouse that serves a specific department or business unit.  
- Example:  
  - Sales Data Mart  
  - Finance Data Mart  
  - HR Data Mart  

---

## 🔹 Types of Data Warehouse
1. **Enterprise Data Warehouse (EDW)** → Centralized, enterprise-wide  
2. **Data Mart** → Department specific  
3. **Operational Data Store (ODS)** → For near real-time reporting  

---

## 🔑 Benefits of Data Warehouse
- 📊 Better Business Insights  
- ⚡ Fast Query Performance  
- 🏛️ Historical Data Analysis  
- 🔄 Single Source of Truth (SSOT)  
- 🤝 Better Decision Making  

---

## ✅ Real-Life Example
- Flipkart has data in separate systems: Orders, Payments, Customers, Inventory.  
- If the business needs "Top Selling Products in Last 3 Years" → this would be slow on a normal OLTP DB.  
- In a Data Warehouse all of this data sits in one place in an optimized form → the query runs fast.  
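
As a rough idea of what the "Top Selling Products in Last 3 Years" query could look like against a warehouse, here is a small sketch using Python's built-in `sqlite3` module. The table layout, the helper `top_selling`, and the sample rows are assumptions, and an in-memory SQLite database says nothing about real warehouse performance; the point is only the shape of the query.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
CREATE TABLE Product_Dim (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Sales_Fact  (product_id INTEGER, sale_date TEXT, quantity_sold INTEGER);
INSERT INTO Product_Dim VALUES (1, 'Phone X'), (2, 'Laptop Z'), (3, 'Headphones');
INSERT INTO Sales_Fact VALUES
    (1, '2019-06-01', 500),   -- older than the 3-year window used below
    (1, '2023-02-10', 40),
    (2, '2023-08-05', 25),
    (3, '2024-01-20', 90),
    (3, '2024-11-30', 30);
""")


def top_selling(dw, as_of, years=3, limit=2):
    """Rank products by units sold in the `years` before `as_of` (ISO date string)."""
    return dw.execute("""
        SELECT p.name, SUM(f.quantity_sold) AS units
        FROM Sales_Fact f
        JOIN Product_Dim p ON p.product_id = f.product_id
        WHERE f.sale_date >= date(?, ?)   -- e.g. date('2025-01-01', '-3 years')
        GROUP BY p.name
        ORDER BY units DESC
        LIMIT ?
    """, (as_of, f"-{years} years", limit)).fetchall()


top_products = top_selling(dw, as_of="2025-01-01")
print(top_products)  # [('Headphones', 120), ('Phone X', 40)]
```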

---

# 🎯 Conclusion
A Data Warehouse is an **analytics-focused database** that helps the business with **decision making**.  
- OLTP = Transactions  
- OLAP / DW = Insights & Analytics  
