## Advanced Consistency Check with Hierarchical Data

**Description**: You have two datasets, `orders.csv` and `order_items.csv`. Perform a consistency check to ensure that every order in `orders.csv` has at least one corresponding item in `order_items.csv`.
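Stated in set terms, let $O$ be the set of `order_id` values in `orders.csv` and $I$ the set of `order_id` values in `order_items.csv`. Assuming `order_id` is unique within `orders.csv`, the check comes down to two set differences and a ratio:

$$
\text{orders without items} = O \setminus I, \qquad \text{orphaned order IDs} = I \setminus O,
$$

$$
\text{consistency rate} = \frac{|O \cap I|}{|O|} \times 100\%.
$$

For the sample data below, $O \setminus I = \{\text{ORD003}\}$ and $I \setminus O = \varnothing$, so the rate is $\frac{4}{5} \times 100\% = 80\%$.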

```python
import pandas as pd
from io import StringIO

# Sample data for orders.csv (for the real file, use pd.read_csv("orders.csv"))
orders_data = """order_id,customer_id,order_date
ORD001,CUST001,2025-05-10
ORD002,CUST002,2025-05-11
ORD003,CUST001,2025-05-12
ORD004,CUST003,2025-05-13
ORD005,CUST002,2025-05-14
"""
orders_df = pd.read_csv(StringIO(orders_data))

# Sample data for order_items.csv (for the real file, use pd.read_csv("order_items.csv"))
order_items_data = """item_id,order_id,product_id,quantity,price
ITEM001,ORD001,PROD001,2,25.50
ITEM002,ORD001,PROD002,1,12.00
ITEM003,ORD002,PROD003,3,45.75
ITEM004,ORD004,PROD001,1,25.50
ITEM005,ORD004,PROD004,2,9.99
ITEM006,ORD005,PROD002,1,12.00
ITEM007,ORD005,PROD005,4,15.00
"""
order_items_df = pd.read_csv(StringIO(order_items_data))

# Orders in orders.csv that have no rows in order_items.csv
orders_without_items = orders_df[~orders_df['order_id'].isin(order_items_df['order_id'])]

# Reverse check (optional): items whose order_id is missing from orders.csv, i.e. orphaned items
items_without_orders = order_items_df[~order_items_df['order_id'].isin(orders_df['order_id'])]

print("Orders in 'orders.csv' without corresponding items in 'order_items.csv':")
if not orders_without_items.empty:
    print(orders_without_items)
else:
    print("All orders in 'orders.csv' have corresponding items.")

print("\nItems in 'order_items.csv' without corresponding orders in 'orders.csv' (orphaned items):")
if not items_without_orders.empty:
    print(items_without_orders)
else:
    print("All items in 'order_items.csv' have corresponding orders.")

# Consistency rate: percentage of orders that have at least one item
total_orders = len(orders_df)
orders_with_items_count = total_orders - len(orders_without_items)
consistency_rate = (orders_with_items_count / total_orders) * 100 if total_orders > 0 else 0.0

print(f"\nConsistency Rate (Orders with Corresponding Items): {consistency_rate:.2f}%")
```

```text
Orders in 'orders.csv' without corresponding items in 'order_items.csv':
  order_id customer_id  order_date
2   ORD003     CUST001  2025-05-12

Items in 'order_items.csv' without corresponding orders in 'orders.csv' (orphaned items):
All items in 'order_items.csv' have corresponding orders.

Consistency Rate (Orders with Corresponding Items): 80.00%
```
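As an alternative sketch (not the method used in the cell above), a single outer `merge` with `indicator=True` classifies every `order_id` in one pass as present in both files, only in `orders.csv` (`left_only`), or only in `order_items.csv` (`right_only`). It also reports how many item rows each order has, which is the natural per-parent view of this hierarchical data. The helper name `check_order_item_consistency`, the `items_per_order` report key, and the toy data (including an orphaned `ORD009` that does not appear in the notebook's sample) are hypothetical. The rate is computed over distinct order IDs, which matches the cell above only when `order_id` is unique in `orders.csv`.

```python
import pandas as pd
from io import StringIO


def check_order_item_consistency(orders: pd.DataFrame, items: pd.DataFrame, key: str = "order_id") -> dict:
    """Hypothetical helper: classify every order ID with one outer merge."""
    # Reduce each side to its distinct keys so the merge has one row per key
    left = orders[[key]].drop_duplicates()
    right = items[[key]].drop_duplicates()
    # indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'
    merged = left.merge(right, on=key, how="outer", indicator=True)

    orders_without_items = merged.loc[merged["_merge"] == "left_only", key].tolist()
    orphaned_item_orders = merged.loc[merged["_merge"] == "right_only", key].tolist()

    # Consistency rate over distinct order IDs (guard against an empty orders table)
    total = len(left)
    with_items = int((merged["_merge"] == "both").sum())
    rate = 100.0 * with_items / total if total else 0.0

    # Item rows per order; orders with no items get 0, orphaned keys are dropped
    counts = items.groupby(key).size().reindex(left[key], fill_value=0)

    return {
        "orders_without_items": orders_without_items,
        "orphaned_item_orders": orphaned_item_orders,
        "consistency_rate": rate,
        "items_per_order": counts.to_dict(),
    }


# Toy data (hypothetical): ORD003 has no items, ORD009 is an orphaned item's order
orders = pd.read_csv(StringIO("order_id\nORD001\nORD002\nORD003\n"))
items = pd.read_csv(StringIO(
    "item_id,order_id\nITEM001,ORD001\nITEM002,ORD001\nITEM003,ORD002\nITEM004,ORD009\n"
))

report = check_order_item_consistency(orders, items)
print(report["orders_without_items"])         # → ['ORD003']
print(report["orphaned_item_orders"])         # → ['ORD009']
print(f"{report['consistency_rate']:.2f}%")   # → 66.67%
```

The merge-based version trades the two `isin` filters for one join, so both directions of the check come from the same `_merge` column; the `isin` approach in the notebook is equally valid and returns the full offending rows rather than just their IDs.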
