## 1. Problem Identification (Unnormalized Data)

- Orders_raw_data: Sample rows include repeated customer and product data

    | Order ID | Customer | City | Product | Price | Qty |
    |:---------|:--------------|:-------------|:-------------|:--------|:----|
    | 1001     | John Smith    | New York     | Laptop       | 1200.00 | 1   |
    | 1001     | John Smith    | New York     | Mouse        | 25.00   | 2   |
    | 1002     | Jane Doe      | Los Angeles  | Keyboard     | 75.00   | 1   |
    | 1003     | John Smith    | New York     | Monitor      | 300.00  | 1   |
    | 1004     | Bob Johnson   | Chicago      | Laptop       | 1200.00 | 1   |
    | 1004     | Bob Johnson   | Chicago      | Mouse        | 25.00   | 1   |


* Repeated attributes
- What data is duplicated:
    - customer_name, customer_city: repeated on every row for the same customer
    - product_name, product_price: repeated on every row for the same product
- Why is this a problem?
    - Wasted storage, risk of inconsistent values, and updates that are hard to maintain.
- Examples from Dataset : 
    - **Update Anomalies :** 
        - Changing John Smith’s city requires updating three rows (orders 1001 and 1003)
        - Changing the Laptop price requires updating two rows (orders 1001 and 1004)
    - **Insert Anomalies :** 
        - Cannot insert a new customer without an order
        - Cannot insert a product unless it appears in an order
    - **Delete Anomalies :** 
        - Deleting order 1002 removes all information about Jane Doe, because it is her only order
        - Deleting order 1003 removes the Monitor price, because Monitor appears in no other order
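
The update and delete anomalies can be reproduced on the sample rows. The following is a minimal sketch using Python's built-in `sqlite3` module; the table name `orders_raw`, the column types, and the new city `Boston` are assumptions made for illustration and do not come from the dataset.

```python
import sqlite3

# Sample rows copied from the Orders_raw_data table.
RAW_ROWS = [
    (1001, "John Smith", "New York", "Laptop", 1200.00, 1),
    (1001, "John Smith", "New York", "Mouse", 25.00, 2),
    (1002, "Jane Doe", "Los Angeles", "Keyboard", 75.00, 1),
    (1003, "John Smith", "New York", "Monitor", 300.00, 1),
    (1004, "Bob Johnson", "Chicago", "Laptop", 1200.00, 1),
    (1004, "Bob Johnson", "Chicago", "Mouse", 25.00, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders_raw (
        order_id      INTEGER,
        customer_name TEXT,
        customer_city TEXT,
        product_name  TEXT,
        product_price REAL,
        quantity      INTEGER
    )
    """
)
conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?, ?, ?, ?)", RAW_ROWS)

# Update anomaly: one real-world change (John Smith moves) touches every one
# of his rows. 'Boston' is a made-up value used only for this demonstration.
rows_updated = conn.execute(
    "UPDATE orders_raw SET customer_city = 'Boston' "
    "WHERE customer_name = 'John Smith'"
).rowcount

# Delete anomaly: removing order 1002 also erases Jane Doe and the Keyboard price.
conn.execute("DELETE FROM orders_raw WHERE order_id = 1002")
jane_rows = conn.execute(
    "SELECT COUNT(*) FROM orders_raw WHERE customer_name = 'Jane Doe'"
).fetchone()[0]
keyboard_rows = conn.execute(
    "SELECT COUNT(*) FROM orders_raw WHERE product_name = 'Keyboard'"
).fetchone()[0]

print(rows_updated, jane_rows, keyboard_rows)  # prints: 3 0 0
```

The insert anomaly is not shown because the flat table would accept a customer row with empty order columns; the problem there is that such a row has no meaningful key.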
        
* Partial Dependencies

    - **Implied composite key:** (order_id, product_name)
        - customer_name and customer_city depend only on order_id.
        - product_price depends only on product_name.
        - Only quantity depends on the full composite key.

* Transitive Dependencies

    - order_id → customer_name → customer_city
        - Customer city depends on customer, not directly on order.
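
The partial and transitive dependencies listed above can be checked against the sample rows. The helper `holds` below is a hypothetical name introduced for this sketch. Note that sample data can only refute a functional dependency; a `True` result means the rows are consistent with it, not that it is guaranteed by the business rules.

```python
COLUMNS = (
    "order_id", "customer_name", "customer_city",
    "product_name", "product_price", "quantity",
)

# Sample rows copied from the Orders_raw_data table.
RAW_ROWS = [
    (1001, "John Smith", "New York", "Laptop", 1200.00, 1),
    (1001, "John Smith", "New York", "Mouse", 25.00, 2),
    (1002, "Jane Doe", "Los Angeles", "Keyboard", 75.00, 1),
    (1003, "John Smith", "New York", "Monitor", 300.00, 1),
    (1004, "Bob Johnson", "Chicago", "Laptop", 1200.00, 1),
    (1004, "Bob Johnson", "Chicago", "Mouse", 25.00, 1),
]


def holds(lhs, rhs, rows=RAW_ROWS):
    """Return True if equal values in the lhs columns always imply equal rhs values."""
    seen = {}
    for row in rows:
        record = dict(zip(COLUMNS, row))
        key = tuple(record[c] for c in lhs)
        value = tuple(record[c] for c in rhs)
        # setdefault stores the first value seen for this key; a later
        # mismatch means the dependency is violated.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Partial dependencies: each uses only one part of (order_id, product_name).
print(holds(["order_id"], ["customer_name", "customer_city"]))  # True
print(holds(["product_name"], ["product_price"]))               # True

# Transitive step: customer_name -> customer_city.
print(holds(["customer_name"], ["customer_city"]))              # True

# quantity needs the full composite key.
print(holds(["order_id"], ["quantity"]))                        # False
print(holds(["order_id", "product_name"], ["quantity"]))        # True
```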
    


## 2. First Normal Form (1NF)

- 1NF Rules :
    - Atomic values
    - No repeating groups
    - Each row represents one fact
- Result:
    The raw table already satisfies 1NF because it has:
    - One product per row
    - One quantity per row
    - No multi-valued fields

- 1NF Schema
    **orders_1nf**
    - order_id
    - customer_name
    - customer_city
    - product_name
    - product_price
    - quantity
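
As a sketch, the 1NF schema can be declared with the composite key (order_id, product_name) identified in section 1. The column types and `NOT NULL` constraints are assumptions; the source lists only column names.

```python
import sqlite3

# Sample rows copied from the Orders_raw_data table.
RAW_ROWS = [
    (1001, "John Smith", "New York", "Laptop", 1200.00, 1),
    (1001, "John Smith", "New York", "Mouse", 25.00, 2),
    (1002, "Jane Doe", "Los Angeles", "Keyboard", 75.00, 1),
    (1003, "John Smith", "New York", "Monitor", 300.00, 1),
    (1004, "Bob Johnson", "Chicago", "Laptop", 1200.00, 1),
    (1004, "Bob Johnson", "Chicago", "Mouse", 25.00, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders_1nf (
        order_id      INTEGER NOT NULL,
        customer_name TEXT    NOT NULL,
        customer_city TEXT    NOT NULL,
        product_name  TEXT    NOT NULL,
        product_price REAL    NOT NULL,
        quantity      INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_name)
    )
    """
)
conn.executemany("INSERT INTO orders_1nf VALUES (?, ?, ?, ?, ?, ?)", RAW_ROWS)
row_count = conn.execute("SELECT COUNT(*) FROM orders_1nf").fetchone()[0]

# The composite key allows one row per (order, product): a second Laptop row
# for order 1001 is rejected.
try:
    conn.execute(
        "INSERT INTO orders_1nf VALUES "
        "(1001, 'John Smith', 'New York', 'Laptop', 1200.00, 5)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(row_count, duplicate_rejected)  # prints: 6 True
```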

    

## 3. Second Normal Form (2NF)

* Why 2NF is Needed
    - The table has a composite key (order_id, product_name), and the customer and product attributes each depend on only one part of it (partial dependencies).

* Changes Made
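    - Move customer data into a customers table
    - Move product data into a products table
    - Keep one orders row per order, linked to its customer
    - Keep order items in their own table with the composite key (order_id, product_id)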

* 2NF requirements : 
    - Must be in 1NF
    - All non-key attributes fully depend on the entire primary key
    - No partial dependencies

* 2NF Schema 
    **customers**
    - customer_id (PK)
    - customer_name
    - customer_city

    **products**
    - product_id (PK)
    - product_name
    - product_price

    **orders**
    - order_id (PK)
    - customer_id (FK)

    **order_items**
    - order_id (PK, FK)
    - product_id (PK, FK)
    - quantity
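
The 2NF decomposition can be sketched as follows. The surrogate keys come from SQLite's automatic row IDs, and the loader assumes that a customer name identifies a single customer and a product name a single product, which holds for the sample rows but is not stated in the source. Column types and the `UNIQUE` constraint are likewise assumptions.

```python
import sqlite3

# Sample rows copied from the Orders_raw_data table.
RAW_ROWS = [
    (1001, "John Smith", "New York", "Laptop", 1200.00, 1),
    (1001, "John Smith", "New York", "Mouse", 25.00, 2),
    (1002, "Jane Doe", "Los Angeles", "Keyboard", 75.00, 1),
    (1003, "John Smith", "New York", "Monitor", 300.00, 1),
    (1004, "Bob Johnson", "Chicago", "Laptop", 1200.00, 1),
    (1004, "Bob Johnson", "Chicago", "Mouse", 25.00, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        customer_city TEXT NOT NULL
    );
    CREATE TABLE products (
        product_id    INTEGER PRIMARY KEY,
        product_name  TEXT NOT NULL UNIQUE,
        product_price REAL NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id)
    );
    CREATE TABLE order_items (
        order_id   INTEGER NOT NULL REFERENCES orders(order_id),
        product_id INTEGER NOT NULL REFERENCES products(product_id),
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
    """
)

customer_ids = {}  # customer_name -> customer_id
product_ids = {}   # product_name  -> product_id
for order_id, name, city, product, price, qty in RAW_ROWS:
    if name not in customer_ids:
        customer_ids[name] = conn.execute(
            "INSERT INTO customers (customer_name, customer_city) VALUES (?, ?)",
            (name, city),
        ).lastrowid
    if product not in product_ids:
        product_ids[product] = conn.execute(
            "INSERT INTO products (product_name, product_price) VALUES (?, ?)",
            (product, price),
        ).lastrowid
    # An order spans several raw rows; only the first one creates the header.
    conn.execute(
        "INSERT OR IGNORE INTO orders VALUES (?, ?)",
        (order_id, customer_ids[name]),
    )
    conn.execute(
        "INSERT INTO order_items VALUES (?, ?, ?)",
        (order_id, product_ids[product], qty),
    )

counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("customers", "products", "orders", "order_items")
}
print(counts)
# prints: {'customers': 3, 'products': 4, 'orders': 4, 'order_items': 6}
```

Six raw rows become three customers and four products, so each customer's city and each product's price is now stored once.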

    



## 4. Third Normal Form (3NF)

* Why 3NF is Needed 
    - Customer city is transitively dependent on customer.

* Changes Made
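    - Extract city into a reference table
    - Replace city name with foreign key
    - Add order_date to orders and unit_price to order_items (new columns that do not appear in the raw data)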

* 3NF Requirements:
    - Must be in 2NF
    - No transitive dependencies
    - All non-key attributes depend only on the primary key

* 3NF schema
    **customers**
    - customer_id (PK)
    - customer_name
    - customer_city_id (FK)

    **cities**
    - city_id (PK)
    - city_name

    **products**
    - product_id (PK)
    - product_name
    - product_price

    **orders**
    - order_id (PK)
    - customer_id (FK)
    - order_date

    **order_items**
    - order_id (PK, FK)
    - product_id (PK, FK)
    - quantity
    - unit_price
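
A minimal sketch of the 3NF schema, followed by a join that rebuilds the original flat rows to confirm that no information was lost. Several choices here are assumptions: the column types, leaving `order_date` as NULL (the raw data has no dates), and filling `unit_price` from the raw Price column.

```python
import sqlite3

# Sample rows copied from the Orders_raw_data table.
RAW_ROWS = [
    (1001, "John Smith", "New York", "Laptop", 1200.00, 1),
    (1001, "John Smith", "New York", "Mouse", 25.00, 2),
    (1002, "Jane Doe", "Los Angeles", "Keyboard", 75.00, 1),
    (1003, "John Smith", "New York", "Monitor", 300.00, 1),
    (1004, "Bob Johnson", "Chicago", "Laptop", 1200.00, 1),
    (1004, "Bob Johnson", "Chicago", "Mouse", 25.00, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE cities (
        city_id   INTEGER PRIMARY KEY,
        city_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE customers (
        customer_id      INTEGER PRIMARY KEY,
        customer_name    TEXT NOT NULL,
        customer_city_id INTEGER NOT NULL REFERENCES cities(city_id)
    );
    CREATE TABLE products (
        product_id    INTEGER PRIMARY KEY,
        product_name  TEXT NOT NULL UNIQUE,
        product_price REAL NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        order_date  TEXT  -- nullable: the raw data contains no dates
    );
    CREATE TABLE order_items (
        order_id   INTEGER NOT NULL REFERENCES orders(order_id),
        product_id INTEGER NOT NULL REFERENCES products(product_id),
        quantity   INTEGER NOT NULL,
        unit_price REAL NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
    """
)

city_ids, customer_ids, product_ids = {}, {}, {}
for order_id, name, city, product, price, qty in RAW_ROWS:
    if city not in city_ids:
        city_ids[city] = conn.execute(
            "INSERT INTO cities (city_name) VALUES (?)", (city,)
        ).lastrowid
    if name not in customer_ids:
        customer_ids[name] = conn.execute(
            "INSERT INTO customers (customer_name, customer_city_id) VALUES (?, ?)",
            (name, city_ids[city]),
        ).lastrowid
    if product not in product_ids:
        product_ids[product] = conn.execute(
            "INSERT INTO products (product_name, product_price) VALUES (?, ?)",
            (product, price),
        ).lastrowid
    conn.execute(
        "INSERT OR IGNORE INTO orders (order_id, customer_id) VALUES (?, ?)",
        (order_id, customer_ids[name]),
    )
    # Assumption: unit_price is the raw Price value for this order line.
    conn.execute(
        "INSERT INTO order_items VALUES (?, ?, ?, ?)",
        (order_id, product_ids[product], qty, price),
    )

# Rebuild the flat view by joining all five tables.
rebuilt = conn.execute(
    """
    SELECT o.order_id, c.customer_name, ci.city_name,
           p.product_name, oi.unit_price, oi.quantity
    FROM order_items AS oi
    JOIN orders    AS o  ON o.order_id    = oi.order_id
    JOIN customers AS c  ON c.customer_id = o.customer_id
    JOIN cities    AS ci ON ci.city_id    = c.customer_city_id
    JOIN products  AS p  ON p.product_id  = oi.product_id
    """
).fetchall()

print(sorted(rebuilt) == sorted(RAW_ROWS))  # prints: True
```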



    
 

## 5. Final Schema Review

* Tables and Purpose 

| Table | Purpose | 
|:------|:--------|
| cities | Stores each city name once |
| customers | Stores each customer's name and city reference |
| products | Stores the product catalog (name and price) |
| orders | Stores order headers (one row per order) |
| order_items | Stores the products and quantities in each order |

* Primary Keys
    - cities.city_id
    - customers.customer_id
    - products.product_id
    - orders.order_id
    - order_items (order_id, product_id)

* Foreign Keys
    - customers.customer_city_id → cities.city_id
    - orders.customer_id → customers.customer_id
    - order_items.order_id → orders.order_id
    - order_items.product_id → products.product_id

* Anomalies Resolved
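    - Update anomalies eliminated: a customer's city and a product's price are each stored in one row
    - Insert anomalies eliminated: customers and products can be added before any order exists
    - Delete anomalies eliminated: deleting an order does not remove customer or product data
    - Data duplication minimized: repeated customer, city, and product values are replaced by keys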
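
The resolved anomalies can be illustrated against the final schema. This is a sketch under assumptions: the surrogate key values, the column types, the city `Boston`, the customer `Alice Example`, and the product `Webcam` are all made up for the demonstration and are not part of the dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE cities (
        city_id   INTEGER PRIMARY KEY,
        city_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE customers (
        customer_id      INTEGER PRIMARY KEY,
        customer_name    TEXT NOT NULL,
        customer_city_id INTEGER NOT NULL REFERENCES cities(city_id)
    );
    CREATE TABLE products (
        product_id    INTEGER PRIMARY KEY,
        product_name  TEXT NOT NULL UNIQUE,
        product_price REAL NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        order_date  TEXT
    );
    CREATE TABLE order_items (
        order_id   INTEGER NOT NULL REFERENCES orders(order_id),
        product_id INTEGER NOT NULL REFERENCES products(product_id),
        quantity   INTEGER NOT NULL,
        unit_price REAL NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO cities VALUES (1, 'New York'), (2, 'Los Angeles'), (3, 'Chicago');
    INSERT INTO customers VALUES
        (1, 'John Smith', 1), (2, 'Jane Doe', 2), (3, 'Bob Johnson', 3);
    INSERT INTO products VALUES
        (1, 'Laptop', 1200.00), (2, 'Mouse', 25.00),
        (3, 'Keyboard', 75.00), (4, 'Monitor', 300.00);
    INSERT INTO orders (order_id, customer_id) VALUES
        (1001, 1), (1002, 2), (1003, 1), (1004, 3);
    INSERT INTO order_items VALUES
        (1001, 1, 1, 1200.00), (1001, 2, 2, 25.00), (1002, 3, 1, 75.00),
        (1003, 4, 1, 300.00), (1004, 1, 1, 1200.00), (1004, 2, 1, 25.00);
    """
)

# Update: John Smith moves to a made-up city; exactly one customers row changes.
boston_id = conn.execute(
    "INSERT INTO cities (city_name) VALUES ('Boston')"
).lastrowid
rows_updated = conn.execute(
    "UPDATE customers SET customer_city_id = ? WHERE customer_name = 'John Smith'",
    (boston_id,),
).rowcount

# Insert: a customer and a product can exist before any order references them.
conn.execute(
    "INSERT INTO customers (customer_name, customer_city_id) "
    "VALUES ('Alice Example', 3)"
)
conn.execute(
    "INSERT INTO products (product_name, product_price) VALUES ('Webcam', 50.00)"
)
customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

# Delete: removing order 1002 keeps Jane Doe and the Keyboard price.
conn.execute("DELETE FROM order_items WHERE order_id = 1002")
conn.execute("DELETE FROM orders WHERE order_id = 1002")
jane_exists = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE customer_name = 'Jane Doe'"
).fetchone()[0] == 1
keyboard_price = conn.execute(
    "SELECT product_price FROM products WHERE product_name = 'Keyboard'"
).fetchone()[0]

print(rows_updated, customer_count, jane_exists, keyboard_price)
# prints: 1 4 True 75.0
```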
