## Advanced Consistency Check with Hierarchical Data

**Description**: You have two datasets, `orders.csv` and `order_items.csv`. Perform a consistency check to ensure that each order in `orders.csv` has at least one corresponding item in `order_items.csv`.

In [2]:
import pandas as pd
from io import StringIO

# ---------------------------------------------
# Step 1: Create Simulated CSV Data in Code
# ---------------------------------------------

orders_csv = """
order_id,customer_name,order_date
1001,Alice,2024-06-01
1002,Bob,2024-06-02
1003,Charlie,2024-06-03
1004,Diana,2024-06-04
1005,Evan,2024-06-05
"""

order_items_csv = """
item_id,order_id,product_name,quantity
1,1001,Widget,2
2,1001,Gadget,1
3,1002,Widget,1
4,1003,Doodad,3
5,1005,Thingamajig,2
"""

# Read the CSV strings into DataFrames
orders_df = pd.read_csv(StringIO(orders_csv))
items_df = pd.read_csv(StringIO(order_items_csv))

# ---------------------------------------------
# Step 2: Identify Orders Without Items
# ---------------------------------------------

# Get unique order_ids from order_items.csv
order_ids_with_items = set(items_df['order_id'].unique())

# Find order_ids in orders.csv that are NOT in order_items.csv
orders_df['has_items'] = orders_df['order_id'].isin(order_ids_with_items)
missing_orders = orders_df[~orders_df['has_items']]

# ---------------------------------------------
# Step 3: Print Summary & Results
# ---------------------------------------------

total_orders = len(orders_df)
orders_missing_items = len(missing_orders)
orders_with_items = total_orders - orders_missing_items

print(f"Total Orders: {total_orders}")
print(f"Orders with Items: {orders_with_items}")
print(f"Orders WITHOUT Items: {orders_missing_items}")
print(f"Data Consistency Rate: {(orders_with_items / total_orders) * 100:.1f}%")

if not missing_orders.empty:
    print("\n🚨 Orders Without Items:")
    print(missing_orders[['order_id', 'customer_name', 'order_date']].to_string(index=False))

    # ---------------------------------------------
    # Step 4: Export Results for Manual Review
    # ---------------------------------------------
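    # Drop the helper flag so the export contains only the original order columns
    missing_orders.drop(columns='has_items').to_csv("orders_missing_items.csv", index=False)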
    print("\n⚠️ Missing orders saved to 'orders_missing_items.csv'")
else:
    print("\n✅ All orders have corresponding items.")


Total Orders: 5
Orders with Items: 4
Orders WITHOUT Items: 1
Data Consistency Rate: 80.0%

🚨 Orders Without Items:
 order_id customer_name order_date
     1004         Diana 2024-06-04

⚠️ Missing orders saved to 'orders_missing_items.csv'
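
The check above runs in one direction only: it finds orders with no items, but it would not notice items that point to an order missing from `orders.csv`. A minimal sketch of a two-way check follows, using an outer merge with `indicator=True`. It reuses the sample data from the cell above and adds one row that is not in the original data (item 6 for order 9999, a hypothetical orphan) so that the reverse case has something to report. The names `key_check`, `orders_without_items`, and `items_without_orders` are illustrative.

```python
import pandas as pd
from io import StringIO

# Same sample orders as in the cell above
orders_df = pd.read_csv(StringIO("""order_id,customer_name,order_date
1001,Alice,2024-06-01
1002,Bob,2024-06-02
1003,Charlie,2024-06-03
1004,Diana,2024-06-04
1005,Evan,2024-06-05
"""))

# Same sample items, plus one HYPOTHETICAL orphan row (order 9999) added for illustration
items_df = pd.read_csv(StringIO("""item_id,order_id,product_name,quantity
1,1001,Widget,2
2,1001,Gadget,1
3,1002,Widget,1
4,1003,Doodad,3
5,1005,Thingamajig,2
6,9999,Gizmo,1
"""))

# Compare distinct order_ids from both sides; '_merge' records where each key was found
order_keys = orders_df[['order_id']].drop_duplicates()
item_keys = items_df[['order_id']].drop_duplicates()
key_check = order_keys.merge(item_keys, on='order_id', how='outer', indicator=True)

# 'left_only'  -> order exists but has no items
# 'right_only' -> items reference an order that does not exist
orders_without_items = sorted(
    key_check.loc[key_check['_merge'] == 'left_only', 'order_id'].tolist()
)
items_without_orders = sorted(
    key_check.loc[key_check['_merge'] == 'right_only', 'order_id'].tolist()
)

print(f"Orders without items: {orders_without_items}")
print(f"Order IDs referenced by items but missing from orders: {items_without_orders}")
```

Merging only the distinct keys keeps the result at one row per `order_id`, so the counts are not inflated by orders that have several items.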
