import pandas as pd

# Load the raw transactions
df = pd.read_excel("customer.xlsx")

# Rename columns to snake_case
df = df.rename(columns={
    'InvoiceNo': 'order_id', 'CustomerID': 'customer_id',
    'InvoiceDate': 'date', 'Description': 'product',
    'Quantity': 'quantity', 'UnitPrice': 'price', 'Country': 'country'
})

# Clean: keep rows with a positive quantity, a positive price and a product description
df = df[(df['quantity'] > 0) & (df['price'] > 0)]
df = df.dropna(subset=['product']).copy()  # explicit copy before adding a column

# Aggregate by product
df['revenue'] = df['quantity'] * df['price']
product_metrics = df.groupby('product').agg(
    total_revenue=('revenue', 'sum'),
    orders_count=('order_id', 'nunique'),  # distinct invoices containing the product
    total_quantity=('quantity', 'sum'),
)

# Top 5 products by each metric
top_by_revenue = product_metrics.sort_values(by='total_revenue', ascending=False).head(5)
top_by_orders = product_metrics.sort_values(by='orders_count', ascending=False).head(5)
top_by_quantity = product_metrics.sort_values(by='total_quantity', ascending=False).head(5)

# Absolute leaders: index label (product) of the maximum in each column
best_revenue_product = product_metrics['total_revenue'].idxmax()
best_orders_product = product_metrics['orders_count'].idxmax()
best_quantity_product = product_metrics['total_quantity'].idxmax()

# ----- OUTPUT -----
print("Top 5 products by revenue:\n", top_by_revenue)
print("\nTop 5 products by number of orders:\n", top_by_orders)
print("\nTop 5 products by quantity sold:\n", top_by_quantity)
print("\n-------- Absolute leaders --------")
print(f"Highest revenue product: {best_revenue_product}")
print(f"Most ordered product: {best_orders_product}")
print(f"Most units sold: {best_quantity_product}")



Top 5 products by revenue:
                                    total_revenue  orders_count  total_quantity
product
DOTCOM POSTAGE                          206248.77           706             706
REGENCY CAKESTAND 3 TIER                174484.74          1988           13879
PAPER CRAFT , LITTLE BIRDIE             168469.60             1           80995
WHITE HANGING HEART T-LIGHT HOLDER      106292.77          2256           37891
PARTY BUNTING                            99504.33          1685           18295

Top 5 products by number of orders:
                                     total_revenue  orders_count  \
product                   

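The analysis below quotes 11,805 removed rows, but the cell above never prints that count. Here is a minimal sketch for reporting what each cleaning rule drops. It runs on a small made-up frame, not on customer.xlsx. The helper name `cleaning_report` is hypothetical. The "C"-prefixed `order_id` check assumes cancellations are marked that way in this export, so verify that against your data.

```python
import pandas as pd


def cleaning_report(raw: pd.DataFrame) -> dict:
    """Count the rows each cleaning rule would drop from the renamed frame."""
    # Same keep-conditions as the cleaning step (NaN quantity/price also fail them)
    keep_quantity = raw["quantity"] > 0
    keep_price = raw["price"] > 0
    keep_product = raw["product"].notna()
    kept = keep_quantity & keep_price & keep_product
    # Assumption: cancelled invoices carry a leading "C" in order_id
    cancelled = raw["order_id"].astype(str).str.startswith("C")
    return {
        "rows_before": len(raw),
        "quantity_not_positive": int((~keep_quantity).sum()),
        "price_not_positive": int((~keep_price).sum()),
        "missing_product": int((~keep_product).sum()),
        "cancellation_rows": int(cancelled.sum()),
        "rows_dropped": int((~kept).sum()),
        "rows_after": int(kept.sum()),
    }


# Made-up rows for illustration only
toy = pd.DataFrame({
    "order_id": ["536365", "536366", "C536379", "536380", "536381"],
    "product": ["MUG", "MUG", "MUG", None, "PLATE"],
    "quantity": [6, 2, -2, 4, 3],
    "price": [2.55, 2.55, 2.55, 0.0, 1.25],
})
print(cleaning_report(toy))
```

In the notebook, call `cleaning_report(df)` after the rename and before the filter line, because the cell overwrites `df`. The rules overlap, so the per-rule counts need not add up to `rows_dropped`. A non-zero `cancellation_rows` is worth checking. The `quantity > 0` filter removes the cancellation lines themselves but keeps the original orders they reverse.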
📈 Business Analysis
What happened: After the cleaning step removed 11,805 rows (non-positive quantity or price, or no product description), the data reveals three distinct product profiles:

The Whale: Paper Craft, Little Birdie (massive volume: 80,995 units in a single order).

The Service Leader: Dotcom Postage (highest revenue at 206,248.77, yet a shipping charge rather than a physical product).

The Traffic Driver: White Hanging Heart T-Light Holder (the most frequently ordered product, with a low average revenue of about 2.81 per unit).
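The three profiles rest on ratios that the output table implies but does not print. This is a minimal sketch that derives them from the five rows of the "Top 5 products by revenue" output, with the values copied from that table. In the notebook, the same three ratio lines would run directly on `product_metrics`.

```python
import pandas as pd

# Figures copied from the "Top 5 products by revenue" output above
metrics = pd.DataFrame(
    {
        "total_revenue": [206248.77, 174484.74, 168469.60, 106292.77, 99504.33],
        "orders_count": [706, 1988, 1, 2256, 1685],
        "total_quantity": [706, 13879, 80995, 37891, 18295],
    },
    index=pd.Index(
        [
            "DOTCOM POSTAGE",
            "REGENCY CAKESTAND 3 TIER",
            "PAPER CRAFT , LITTLE BIRDIE",
            "WHITE HANGING HEART T-LIGHT HOLDER",
            "PARTY BUNTING",
        ],
        name="product",
    ),
)

# Average revenue per unit sold, per order, and average units per order
metrics["revenue_per_unit"] = (metrics["total_revenue"] / metrics["total_quantity"]).round(2)
metrics["revenue_per_order"] = (metrics["total_revenue"] / metrics["orders_count"]).round(2)
metrics["units_per_order"] = (metrics["total_quantity"] / metrics["orders_count"]).round(1)
print(metrics[["revenue_per_unit", "revenue_per_order", "units_per_order"]])
```

On these rows, the T-light holder earns about 2.81 per unit against about 12.57 for the cakestand. The Little Birdie line averages 80,995 units per order. These are the gaps that separate the three profiles.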

Why it matters:

B2B Risk: One of the top three revenue products owes all of its revenue to a single transaction, so that revenue will not recur unless the same buyer orders again. We are not just a retailer; we are an accidental wholesaler.

Logistics Heavy: Postage earning more revenue than any single physical product suggests that shipping is either a heavy cost passed on to customers or a major profit center; revenue alone cannot tell which.

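Operational Friction: The most frequently ordered products generate the most handling work (packing, shipping) yet earn less per order: about 47 per order for White Hanging Heart versus about 88 for Regency Cakestand. The data contains no costs, so the lower contribution to gross margin is an inference, not a measurement.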
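The B2B risk can be checked for every product, not just Little Birdie, by measuring how much of each product's revenue comes from its single largest order. This is a minimal sketch on made-up data. `largest_order_share` is a hypothetical helper that expects the cleaned `df` with `product`, `order_id` and `revenue` columns.

```python
import pandas as pd


def largest_order_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of each product's revenue that comes from its biggest single order.

    A value of 1.0 means all of the product's revenue sits in one order.
    """
    # Revenue per (product, order) pair, then max and total per product
    per_order = df.groupby(["product", "order_id"])["revenue"].sum()
    by_product = per_order.groupby(level="product")
    return (by_product.max() / by_product.sum()).sort_values(ascending=False)


# Made-up rows for illustration only (not from customer.xlsx)
toy = pd.DataFrame({
    "order_id": ["A1", "A2", "A3", "B1", "B1", "A1", "B2"],
    "product": ["HOLDER", "HOLDER", "HOLDER", "BIRDIE", "BIRDIE", "MUG", "MUG"],
    "revenue": [10.0, 20.0, 10.0, 500.0, 500.0, 5.0, 15.0],
})
print(largest_order_share(toy))
```

After cleaning, every row has positive revenue. A share of 1.0 therefore means the product appeared in exactly one order. In the real data, combining a share threshold with a minimum total revenue would surface other Little-Birdie-style concentration risks.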

What to do:

B2B Strategy: Identify the buyer of the "Little Birdie" and create a VIP wholesale contract.

Upselling: Target the customers behind the 2,256 "T-Light Holder" orders with the higher-priced "Regency Cakestand" to increase average order value (AOV).

Shipping Audit: Review the "Postage" revenue. Is it high because of inefficiency, or because of strong international demand?
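The three actions translate into simple queries on the cleaned `df`. This is a minimal sketch on made-up data. `buyers_of`, `upsell_targets` and `postage_by_country` are hypothetical helpers, and the product labels passed to them must match the `product` strings exactly (for example "DOTCOM POSTAGE" from the output above).

```python
import pandas as pd


def buyers_of(df: pd.DataFrame, product: str) -> pd.Series:
    """Units of `product` bought per customer, largest buyers first."""
    rows = df[df["product"] == product]
    return rows.groupby("customer_id")["quantity"].sum().sort_values(ascending=False)


def upsell_targets(df: pd.DataFrame, bought: str, not_bought: str) -> list:
    """Customers who bought `bought` but never `not_bought` (rows without an ID are skipped)."""
    has_a = set(df.loc[df["product"] == bought, "customer_id"].dropna())
    has_b = set(df.loc[df["product"] == not_bought, "customer_id"].dropna())
    return sorted(has_a - has_b)


def postage_by_country(df: pd.DataFrame, postage_products: list) -> pd.DataFrame:
    """Postage revenue per country and its share of that country's total revenue."""
    total = df.groupby("country")["revenue"].sum()
    postage = df[df["product"].isin(postage_products)].groupby("country")["revenue"].sum()
    # Countries with no postage lines get 0 instead of NaN after alignment
    out = pd.DataFrame({"postage_revenue": postage, "total_revenue": total}).fillna(0.0)
    out["postage_share"] = out["postage_revenue"] / out["total_revenue"]
    return out.sort_values("postage_revenue", ascending=False)


# Made-up rows for illustration only
toy = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3, 3],
    "product": ["T-LIGHT", "CAKESTAND", "T-LIGHT", "POSTAGE", "BIRDIE", "T-LIGHT"],
    "quantity": [6, 1, 12, 1, 500, 4],
    "revenue": [17.7, 12.75, 35.4, 18.0, 1040.0, 11.8],
    "country": ["UK", "UK", "France", "France", "UK", "UK"],
})
print(buyers_of(toy, "BIRDIE"))
print(upsell_targets(toy, "T-LIGHT", "CAKESTAND"))
print(postage_by_country(toy, ["POSTAGE"]))
```

`orders_count` counts invoices, not customers. The number of customers to target is `len(upsell_targets(df, "WHITE HANGING HEART T-LIGHT HOLDER", "REGENCY CAKESTAND 3 TIER"))`, which may differ from 2,256.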