In [1]:
import pandas as pd

df = pd.read_csv("C:/Users/loydt/Downloads/ecommerce_data.csv")

**order_item_id — explained**

Attribute Details
- **Column Name:** order_item_id  
- **Data Type:** Object / String (or UUID if generated programmatically)  
- **Definition:** Unique identifier for each individual product line within an order  
- **Purpose:** Distinguishes individual items when an order contains multiple products  

Relationship to order_id
- **order_id** → identifies the entire order (one purchase transaction)  
- **order_item_id** → identifies each specific item within that order  

Example

| order_id | order_item_id | product_name | quantity |
|----------|---------------|--------------|----------|
| 10234    | 1             | Widget A     | 2        |
| 10234    | 2             | Widget B     | 1        |

Explanation:
- Order 10234 contains two items, each with its own `order_item_id`.  
- This allows metrics like revenue per product or quantity per order to be calculated correctly.
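As a minimal sketch of those two metrics, the cell below rebuilds the example table as a small DataFrame and aggregates it. The `item_price` values and the `line_revenue` column are made up for illustration; they are not taken from the dataset.

```python
import pandas as pd

# Toy data mirroring the example table above.
# The item_price values are hypothetical, chosen only to make the arithmetic easy.
items = pd.DataFrame({
    "order_id": [10234, 10234],
    "order_item_id": [1, 2],
    "product_name": ["Widget A", "Widget B"],
    "quantity": [2, 1],
    "item_price": [5.00, 3.00],
})

# Revenue contributed by each line item (hypothetical helper column)
items["line_revenue"] = items["item_price"] * items["quantity"]

# Revenue per product: sum line revenue across all line items of a product
revenue_per_product = items.groupby("product_name")["line_revenue"].sum()

# Quantity per order: sum quantities across all line items of an order
quantity_per_order = items.groupby("order_id")["quantity"].sum()

print(revenue_per_product)
print(quantity_per_order)
```

Because each row is one line item, both aggregations are plain sums over rows. No de-duplication is needed as long as `order_item_id` is unique.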


In [2]:
df.head()

Unnamed: 0,order_id,order_item_id,customer_id,product_id,campaign_id,campaign_type,channel_id,platform_id,date,customer_name,...,category,item_price,quantity,channel_name,platform_name,impressions,spend,clicks,orders,revenue
0,ORDR75370,ORDR0034097-01,CUST53355,PROD47691,CMP68315,Retention/Loyalty,CHNL_01,PLT2,2024-07-23,Alicia Ward,...,Greeting Cards,102.35,259,Social,Facebook Ads,9,81.26,45,12,26508.65
1,ORDR16384,ORDR0034097-02,CUST06741,PROD41517,CMP10227,Seasonal/Promotional,CHNL_02,PLT1,2024-05-30,Angela Moore,...,Calendars,92.83,219,Paid_Search,Google Ads,17,68.31,47,37,20329.77
2,ORDR83177,ORDR0034097-03,CUST25345,PROD33618,CMP02804,Platform/Channel,CHNL_01,PLT2,2025-10-05,Victoria Foley,...,Greeting Cards,62.44,84,Social,Facebook Ads,24,45.81,112,16,5244.96
3,ORDR90095,ORDR0034097-04,CUST31280,PROD20247,CMP68315,Retention/Loyalty,CHNL_02,PLT1,2025-07-21,Jennifer Garza,...,Greeting Cards,62.44,84,Paid_Search,Google Ads,10,103.67,102,41,5244.96
4,ORDR88055,ORDR0034097-05,CUST25302,PROD88867,CMP10227,Seasonal/Promotional,CHNL_01,PLT2,2025-05-04,Jason Hunt,...,Greeting Cards,102.35,259,Social,Facebook Ads,5,84.49,89,47,26508.65


In [None]:
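# Check whether any order_item_id occurs more than once, and show the
# product_id attached to each repeated ID.

# keep=False flags every occurrence of a repeated ID, not only the later ones
is_duplicated = df['order_item_id'].duplicated(keep=False)
duplicate_order_item_id = df.loc[is_duplicated, ['order_item_id', 'product_id']]
duplicate_order_item_id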

Unnamed: 0,order_item_id,product_id
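An empty result like the one above means no `order_item_id` is repeated. To show what the check looks like when duplicates do exist, here is a small sketch on made-up data. The `sample` frame and the `find_duplicate_ids` helper are hypothetical and not part of the notebook's dataset.

```python
import pandas as pd

def find_duplicate_ids(frame, id_col="order_item_id"):
    """Return every row whose id_col value appears more than once."""
    return frame[frame[id_col].duplicated(keep=False)]

# Made-up sample in which "A-02" appears twice
sample = pd.DataFrame({
    "order_item_id": ["A-01", "A-02", "A-02", "B-01"],
    "product_id": ["P1", "P2", "P2", "P3"],
})

# Quick yes/no answer: are all IDs distinct?
has_unique_ids = sample["order_item_id"].is_unique

# Full detail: the offending rows themselves
duplicates = find_duplicate_ids(sample)

print(has_unique_ids)
print(duplicates)
```

`Series.is_unique` works as a cheap guard, for example in an `assert` at the top of a pipeline. The filtered frame is more useful when the offending rows need to be inspected.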


**Handling Multiple `order_item_id`s per Product**

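**Observation**
The duplicate check returned no rows, so every `order_item_id` in this dataset is unique. A separate pattern to watch for is the same `product_id` appearing under multiple `order_item_id`s within a single order.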

**Why It Matters**
- Even when these rows are technically **correct**, treating each `order_item_id` as a distinct product can:
  - Lead to **over-counting products**, or  
  - **Inflate or distort revenue metrics**.
- Recognizing the pattern keeps dashboards and KPIs aligned with **true purchase behavior**.

**Recommended Awareness**
- Always check for **multiple `order_item_id`s linked to the same `product_id` within an order**.
- Always check for **duplicate `order_item_id`s**, since a repeated ID indicates a **data integrity or system error**.
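
The first check can be sketched as a group-by on `(order_id, product_id)`. The data below is made up for illustration, and `n_order_items` is a hypothetical column name. Consolidating repeated line items by summing `quantity` is one possible treatment, not necessarily the right one for every report.

```python
import pandas as pd

# Made-up sample: in order O1, product P1 is split across two line items
sample = pd.DataFrame({
    "order_id": ["O1", "O1", "O1", "O2"],
    "order_item_id": ["O1-01", "O1-02", "O1-03", "O2-01"],
    "product_id": ["P1", "P1", "P2", "P1"],
    "quantity": [2, 1, 4, 1],
})

# Count distinct line items for each (order, product) pair
items_per_product = (
    sample.groupby(["order_id", "product_id"])["order_item_id"]
    .nunique()
    .reset_index(name="n_order_items")
)

# Pairs where one product is spread over several line items
repeated = items_per_product[items_per_product["n_order_items"] > 1]

# One possible consolidation: a single row per product per order
consolidated = (
    sample.groupby(["order_id", "product_id"], as_index=False)["quantity"].sum()
)

print(repeated)
print(consolidated)
```

Counting rows of `consolidated` gives the number of distinct products per order, while counting rows of `sample` gives the number of line items. Keeping the two apart avoids the over-counting described above.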
