In [8]:
import pandas as pd

# 1. Loading
df = pd.read_excel("customer.xlsx")

# 2. Cleaning & Renaming
df = df.rename(columns={
    'InvoiceNo': 'order_id', 'CustomerID': 'customer_id',
    'InvoiceDate': 'date', 'Description': 'product',
    'Quantity': 'quantity', 'UnitPrice': 'price', 'Country': 'country'
})

# --- DAY 5 SPECIAL: DATA INTEGRITY ---
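initial_count = df.shape[0]
# Keep only real sales: positive quantity (drops returns) and positive price
df = df[(df['quantity'] > 0) & (df['price'] > 0)]
# Drop rows with no product description; copy so later column assignments are safe
df = df.dropna(subset=['product']).copy()
cleaned_count = initial_count - df.shape[0]  # number of rows removed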

print(f"🧹 Cleaned up {cleaned_count} rows (returns, zero prices, or missing products).")

# 3. Revenue & Pareto Analysis
df['revenue'] = df['quantity'] * df['price']
product_revenue = df.groupby('product').agg(
    total_revenue=('revenue', 'sum'), 
    orders_count=('order_id', 'nunique')
).sort_values(by='total_revenue', ascending=False)


total_rev_sum = product_revenue['total_revenue'].sum()
product_revenue['revenue_share'] = product_revenue['total_revenue'] / total_rev_sum
product_revenue['cumulative_share'] = product_revenue['revenue_share'].cumsum()

# Core products: top sellers whose cumulative revenue share stays within 80%
core_products = product_revenue[product_revenue['cumulative_share'] <= 0.8]

# --- OUTPUT ---
print("-" * 30)
print(f"Core products count: {core_products.shape[0]}")
print(f"Core products revenue share: {core_products['revenue_share'].sum():.2%}")
print("-" * 30)
print("\nTop 5 Core Products:\n", core_products.head(5))



🧹 Cleaned up 11805 rows (returns, zero prices, or missing products).
------------------------------
Core products count: 828
Core products revenue share: 79.99%
------------------------------

Top 5 Core Products:
                                     total_revenue  orders_count  \
product                                                           
DOTCOM POSTAGE                          206248.77           706   
REGENCY CAKESTAND 3 TIER                174484.74          1988   
PAPER CRAFT , LITTLE BIRDIE             168469.60             1   
WHITE HANGING HEART T-LIGHT HOLDER      106292.77          2256   
PARTY BUNTING                            99504.33          1685   

                                    revenue_share  cumulative_share  
product                                                              
DOTCOM POSTAGE                           0.019336          0.019336  
REGENCY CAKESTAND 3 TIER                 0.016358          0.035694  
PAPER CRAFT , LITTLE BIRDIE     

📈 Business Analysis

What happened:
A small portion of products generates a disproportionately large share of total revenue: the 828 core products account for 79.99% of it.
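
To put a number on "a small portion", one option is to compare the core product count with the size of the whole catalog. The sketch below is only an illustration on made-up data: the helper name `core_catalog_share` and the toy revenue figures are assumptions, not part of the notebook. It applies the same `cumulative_share <= 0.8` rule as the cell above.

```python
import pandas as pd

def core_catalog_share(total_revenue: pd.Series, threshold: float = 0.8) -> dict:
    """Summarize how much of the catalog sits within the cumulative revenue threshold."""
    # Rank products from highest to lowest revenue
    ranked = total_revenue.sort_values(ascending=False)
    # Running share of total revenue, in ranked order
    cumulative_share = (ranked / ranked.sum()).cumsum()
    # Same rule as the notebook: keep products whose running share is <= threshold
    core = ranked[cumulative_share <= threshold]
    return {
        "core_count": len(core),
        "catalog_count": len(ranked),
        "catalog_share": len(core) / len(ranked),
        "revenue_share": core.sum() / ranked.sum(),
    }

# Hypothetical toy data, not taken from customer.xlsx
toy = pd.Series({"A": 60.0, "B": 15.0, "C": 12.0, "D": 8.0, "E": 5.0})
summary = core_catalog_share(toy)
print(f"{summary['core_count']} of {summary['catalog_count']} products "
      f"({summary['catalog_share']:.0%} of the catalog) "
      f"bring in {summary['revenue_share']:.0%} of revenue")
```

On the real data, passing `product_revenue['total_revenue']` would show what fraction of the catalog the 828 core products represent.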

Why it matters:
These products drive cash flow, determine advertising efficiency, and heavily influence inventory management.

What to do:
Focus marketing efforts on core products, place upsell offers alongside them, and maintain tighter control over the rest of the assortment.
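
To act on this split, it can help to tag every product with its segment so that marketing and inventory reports can filter on it. The following is a minimal sketch under stated assumptions: the function `label_segments`, the `segment` column, and the "core"/"tail" labels are hypothetical names introduced here, and the data is a toy example rather than the real `product_revenue` table.

```python
import pandas as pd

def label_segments(product_revenue: pd.DataFrame, threshold: float = 0.8) -> pd.DataFrame:
    """Return a copy with a 'segment' column: 'core' within the threshold, else 'tail'."""
    out = product_revenue.sort_values("total_revenue", ascending=False).copy()
    share = out["total_revenue"] / out["total_revenue"].sum()
    out["cumulative_share"] = share.cumsum()
    # Default every product to 'tail', then mark the ones within the threshold as 'core'
    out["segment"] = "tail"
    out.loc[out["cumulative_share"] <= threshold, "segment"] = "core"
    return out

# Hypothetical toy data, not taken from customer.xlsx
toy = pd.DataFrame(
    {"total_revenue": [60.0, 15.0, 12.0, 8.0, 5.0]},
    index=pd.Index(["A", "B", "C", "D", "E"], name="product"),
)
labeled = label_segments(toy)
print(labeled[["total_revenue", "segment"]])
```

The "tail" rows are the part of the assortment that the recommendation above suggests keeping under tighter control.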