Issue:
When Orders Multiply: How Single Orders Get Assigned to Multiple Customers

Key Insight:
- Our sales reports assume each order belongs to one customer. 
- In reality, some orders are linked to multiple customers, creating hidden errors that can mislead business decisions.

---

1. Problem Definition — Reality Gap						
						
Business Belief:						
- Each order is associated with exactly one customer.						
- Dashboards and reports are trusted to show accurate revenue and customer behavior.						
- Multiple rows per order (for different items) are normal and expected.						
						
Actual Finding:						
- Some orders are linked to more than one customer, violating the single-customer-per-order rule.						
- Example:						
						
   - Order 10234 → Alice Smith, Bob Johnson						
   - Order 10456 → Carol Lee, Carol King						
						
- This differs from normal line-item rows: those repeat the same customer across several items, whereas here the customer itself changes within a single order.
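
To make the distinction concrete, here is a minimal sketch on made-up data. The frame `toy`, its order IDs, and the `summary` layout are illustrative assumptions, not rows from the dataset. It compares the number of rows per order (which may legitimately exceed one) with the number of distinct customers per order (which should always be one).

```python
import pandas as pd

# Hypothetical toy data, not taken from the real dataset:
# ORD-A has two line items for one customer (normal),
# ORD-B has two line items tied to two different customers (the defect).
toy = pd.DataFrame({
    "order_id": ["ORD-A", "ORD-A", "ORD-B", "ORD-B"],
    "order_item_id": ["A-01", "A-02", "B-01", "B-02"],
    "customer_name": ["Alice Smith", "Alice Smith", "Carol Lee", "Carol King"],
})

# Rows per order: more than one is expected when an order has several items.
rows_per_order = toy.groupby("order_id").size()

# Distinct customers per order: anything above one breaks the business rule.
customers_per_order = toy.groupby("order_id")["customer_name"].nunique()

summary = pd.DataFrame({
    "rows": rows_per_order,
    "distinct_customers": customers_per_order,
})
summary["violates_rule"] = summary["distinct_customers"] > 1
print(summary)
```

Both orders have two rows, but only ORD-B is flagged, because the rule concerns distinct customers rather than row counts.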

---

2. Why This Happens						
						
- No one clearly owns the check that each order belongs to exactly one customer.
- Mismatches can be introduced by manual entry, system joins, or data simulation processes.
- These issues are invisible in standard dashboards and can propagate silently through reports and analyses.						
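
As one possible illustration of the "system joins" cause, the sketch below shows how a join against a lookup table with a repeated key can silently attach two names to one order. The tables `orders` and `customers` are hypothetical. Whether this mechanism produced the mismatches in this dataset is not established; it is only one plausible route. The `validate` argument of `DataFrame.merge` is a way to make such a fan-out fail loudly instead.

```python
import pandas as pd

# Hypothetical tables: 'customers' accidentally holds two rows for CUST1.
orders = pd.DataFrame({
    "order_id": ["ORD-1", "ORD-2"],
    "customer_id": ["CUST1", "CUST2"],
})
customers = pd.DataFrame({
    "customer_id": ["CUST1", "CUST1", "CUST2"],
    "customer_name": ["Carol Lee", "Carol King", "Bob Johnson"],
})

# A plain merge silently fans ORD-1 out into two rows with two names.
fanned_out = orders.merge(customers, on="customer_id", how="left")
names_per_order = fanned_out.groupby("order_id")["customer_name"].nunique()

# validate="many_to_one" asks pandas to verify that the right-hand key is
# unique, raising MergeError when it is not.
try:
    orders.merge(customers, on="customer_id", how="left", validate="many_to_one")
    join_is_safe = True
except pd.errors.MergeError:
    join_is_safe = False

print(names_per_order)
print("join is safe:", join_is_safe)
```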

---

3. Investigation Approach

In [1]:
import pandas as pd

df = pd.read_csv("C:/Users/loydt/Downloads/ecommerce_data.csv")

In [2]:
df.head()

Unnamed: 0,order_id,order_item_id,customer_id,product_id,campaign_id,campaign_type,channel_id,platform_id,date,customer_name,...,category,item_price,quantity,channel_name,platform_name,impressions,spend,clicks,orders,revenue
0,ORDR75370,ORDR0034097-01,CUST53355,PROD47691,CMP68315,Retention/Loyalty,CHNL_01,PLT2,2024-07-23,Alicia Ward,...,Greeting Cards,102.35,259,Social,Facebook Ads,9,81.26,45,12,26508.65
1,ORDR16384,ORDR0034097-02,CUST06741,PROD41517,CMP10227,Seasonal/Promotional,CHNL_02,PLT1,2024-05-30,Angela Moore,...,Calendars,92.83,219,Paid_Search,Google Ads,17,68.31,47,37,20329.77
2,ORDR83177,ORDR0034097-03,CUST25345,PROD33618,CMP02804,Platform/Channel,CHNL_01,PLT2,2025-10-05,Victoria Foley,...,Greeting Cards,62.44,84,Social,Facebook Ads,24,45.81,112,16,5244.96
3,ORDR90095,ORDR0034097-04,CUST31280,PROD20247,CMP68315,Retention/Loyalty,CHNL_02,PLT1,2025-07-21,Jennifer Garza,...,Greeting Cards,62.44,84,Paid_Search,Google Ads,10,103.67,102,41,5244.96
4,ORDR88055,ORDR0034097-05,CUST25302,PROD88867,CMP10227,Seasonal/Promotional,CHNL_01,PLT2,2025-05-04,Jason Hunt,...,Greeting Cards,102.35,259,Social,Facebook Ads,5,84.49,89,47,26508.65


In [3]:
df[df['order_id'].duplicated()]['order_id']

104    ORDR03238
573    ORDR19629
695    ORDR47322
815    ORDR68798
Name: order_id, dtype: object

In [4]:
# duplicated() alone flags only the repeat occurrences of an order_id.
# keep=False returns every row of a repeated order_id, so all associated names are visible.
duplicate_order_numbers = df.loc[df['order_id'].duplicated(keep=False), ['order_id', 'customer_name']]
duplicate_order_numbers

Unnamed: 0,order_id,customer_name
101,ORDR03238,Rita Gray
104,ORDR03238,Michael Summers
158,ORDR19629,Melissa Clark
307,ORDR68798,Christopher Peterson
312,ORDR47322,Corey Santos
573,ORDR19629,Daniel Werner
695,ORDR47322,Adam Long
815,ORDR68798,Amy Lewis DVM


In [5]:
# Count the distinct customer names linked to each order_id

customer_order_transaction = df.groupby('order_id')['customer_name'].nunique()
customer_order_transaction


order_id
ORDR00529    1
ORDR00682    1
ORDR00815    1
ORDR01005    1
ORDR01017    1
            ..
ORDR99581    1
ORDR99586    1
ORDR99634    1
ORDR99778    1
ORDR99905    1
Name: customer_name, Length: 996, dtype: int64

In [6]:
# Return the order_ids linked to more than one customer

problematic_transactions = customer_order_transaction[customer_order_transaction > 1]
problematic_transactions

order_id
ORDR03238    2
ORDR19629    2
ORDR47322    2
ORDR68798    2
Name: customer_name, dtype: int64

In [7]:
# Keep just the flagged order_ids (the index of the Series)
transactions_in_question = problematic_transactions.index

In [9]:
# Return every row for the flagged order ids, with the associated customer names.
result = df[df['order_id'].isin(transactions_in_question)][['order_id', 'customer_name']]
result

Unnamed: 0,order_id,customer_name
101,ORDR03238,Rita Gray
104,ORDR03238,Michael Summers
158,ORDR19629,Melissa Clark
307,ORDR68798,Christopher Peterson
312,ORDR47322,Corey Santos
573,ORDR19629,Daniel Werner
695,ORDR47322,Adam Long
815,ORDR68798,Amy Lewis DVM
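
The cells above could be folded into a single reusable step. The following is a sketch under stated assumptions rather than the notebook's own method: the helper name `orders_with_multiple_customers` is hypothetical, and `sample` is a small stand-in built from a few rows shown in the outputs above. It uses `groupby(...).transform("nunique")` to broadcast the per-order distinct count back onto every row, so the filter needs no intermediate index.

```python
import pandas as pd

def orders_with_multiple_customers(frame, order_col="order_id",
                                   customer_col="customer_name"):
    """Return the rows whose order is linked to more than one distinct customer."""
    # transform("nunique") repeats each order's distinct-customer count on every
    # row of that order, which lets us filter rows directly.
    distinct = frame.groupby(order_col)[customer_col].transform("nunique")
    return frame.loc[distinct > 1, [order_col, customer_col]].sort_values(order_col)

# Stand-in sample: two flagged orders from the output above plus one
# single-customer order from df.head().
sample = pd.DataFrame({
    "order_id": ["ORDR03238", "ORDR03238", "ORDR19629", "ORDR19629", "ORDR75370"],
    "customer_name": ["Rita Gray", "Michael Summers", "Melissa Clark",
                      "Daniel Werner", "Alicia Ward"],
})

flagged = orders_with_multiple_customers(sample)
print(flagged)
```

The notebook keys the check on `customer_name`. Passing `customer_col="customer_id"` may be more robust if names can be misspelled or shared by different people, though that depends on how reliable the ID column is in this dataset.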


---

7. Impact on Financial and Operational Metrics

| Impact Area | Practical Effect on Analytics |
| --- | --- |
| Revenue Accuracy | Order-level revenue may be attributed to the wrong customer during aggregation |
| Downstream Analysis | Metrics such as revenue per customer, repeat orders, and average order value become inconsistent |
| Decision Support | Dashboards and reports built on this data become harder to trust |
| Operational Efficiency | Additional validation and cleanup steps are required in analysis workflows |
| Data Governance | Highlights the need for basic business-rule checks in the pipeline |
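
The revenue-accuracy row can be illustrated with a small, made-up example. The frame `items` and its revenue figures are hypothetical, and the "keep the first customer" rollup is just one common aggregation pattern, not necessarily the one used in the reports. It shows two reasonable rollups disagreeing on revenue per customer even though total revenue is unchanged.

```python
import pandas as pd

# Hypothetical line items: ORD-B is linked to two customers.
items = pd.DataFrame({
    "order_id": ["ORD-A", "ORD-B", "ORD-B"],
    "customer_name": ["Alice Smith", "Carol Lee", "Carol King"],
    "revenue": [100.0, 60.0, 40.0],
})

# Rollup 1: one row per order, keeping the first customer seen.
order_level = items.groupby("order_id").agg(
    customer_name=("customer_name", "first"),
    revenue=("revenue", "sum"),
)
by_first_customer = order_level.groupby("customer_name")["revenue"].sum()

# Rollup 2: sum line items per customer directly.
by_row_customer = items.groupby("customer_name")["revenue"].sum()

print(by_first_customer)
print(by_row_customer)
```

In the first rollup Carol King disappears and Carol Lee is credited with the whole order. In the second the order is split between them. The totals match, which is why the problem does not show up in top-line revenue.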

---

8. Intervention Framework

| Fix | Business Outcome |
| --- | --- |
| Enforce single-customer-per-order validation | Prevents future mismatches |
| Automated checks in Python | Early detection of errors |
| Behavioral KPIs (e.g., duplicates per week) | Early warning system |
| Forensic dashboards | Restore confidence in reports and decision-making |
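
The first three rows of the table could be sketched as follows. Both function names are hypothetical, the column names default to those seen in the dataset, and assigning each order to the week of its earliest `date` is an assumption made only for this illustration. The first function fails loudly when the rule is violated; the second counts violating orders per week as a simple KPI.

```python
import pandas as pd

def check_single_customer_per_order(frame, order_col="order_id",
                                    customer_col="customer_name"):
    """Raise ValueError if any order is linked to more than one distinct customer."""
    counts = frame.groupby(order_col)[customer_col].nunique()
    offenders = counts[counts > 1]
    if not offenders.empty:
        raise ValueError(
            f"{len(offenders)} order(s) linked to multiple customers: "
            f"{list(offenders.index)}"
        )

def weekly_violation_count(frame, date_col="date", order_col="order_id",
                           customer_col="customer_name"):
    """Count violating orders per week, keyed on each order's earliest date."""
    dated = frame.assign(**{date_col: pd.to_datetime(frame[date_col])})
    per_order = dated.groupby(order_col).agg(
        first_date=(date_col, "min"),
        n_customers=(customer_col, "nunique"),
    )
    violating = per_order[per_order["n_customers"] > 1]
    # Assumption: an order belongs to the week of its earliest row.
    return violating.groupby(violating["first_date"].dt.to_period("W")).size()

# Hypothetical sample: only ORD-B violates the rule.
sample = pd.DataFrame({
    "order_id": ["ORD-A", "ORD-B", "ORD-B", "ORD-C", "ORD-C"],
    "customer_name": ["Alice Smith", "Carol Lee", "Carol King",
                      "Bob Johnson", "Bob Johnson"],
    "date": ["2024-07-01", "2024-07-02", "2024-07-02", "2024-07-10", "2024-07-10"],
})

kpi = weekly_violation_count(sample)
try:
    check_single_customer_per_order(sample)
    passed = True
except ValueError as err:
    passed = False
    print(err)
print(kpi)
```

A check like this could run at load time, before any dashboard refresh, so a violation stops the pipeline instead of reaching a report.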