# EDA: Product Revenue & Boxes Shipped Comparison

**User Story 2:** As a Regional Sales Director, I want to compare revenue and boxes shipped by product (e.g., 50% Dark Bites, Smooth Silky Caramel) in order to identify best-selling products and adjust inventory or promotions.

This notebook demonstrates that the Chocolate Sales dataset can support this user story by visualizing product-level revenue and shipment volume.

In [8]:
import pandas as pd
import plotly.express as px

In [9]:
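# Load the raw sales data (path is relative to the notebook's location)
df = pd.read_csv("../data/raw/Chocolate_Sales.csv")
# Strip "$" and thousands separators so Amount can be parsed as a float
df["Amount"] = df["Amount"].str.replace(r"[\$,]", "", regex=True).astype(float)
# Parse dates as day-first (DD/MM/YYYY)
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")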
df.head()

| | Sales Person | Country | Product | Date | Amount | Boxes Shipped |
|---|---|---|---|---|---|---|
| 0 | Jehu Rudeforth | UK | Mint Chip Choco | 2022-01-04 | 5320.0 | 180 |
| 1 | Van Tuxwell | India | 85% Dark Bars | 2022-08-01 | 7896.0 | 94 |
| 2 | Gigi Bohling | India | Peanut Butter Cubes | 2022-07-07 | 4501.0 | 91 |
| 3 | Jan Morforth | Australia | Peanut Butter Cubes | 2022-04-27 | 12726.0 | 342 |
| 4 | Jehu Rudeforth | UK | Peanut Butter Cubes | 2022-02-24 | 13685.0 | 184 |
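The cleaning steps above can be checked in isolation. The sketch below applies the same two transformations to a few hypothetical raw strings; the `"$5,320"` and `"04/01/2022"` input styles are assumptions inferred from the regex and the date format used in the cell, not values copied from the CSV.

```python
import pandas as pd

# Hypothetical raw values (assumed to mimic the unparsed CSV columns).
raw_amount = pd.Series(["$5,320", "$7,896", "$12,726"])
raw_date = pd.Series(["04/01/2022", "01/08/2022"])

# Same regex as the notebook: drop "$" and "," before casting to float.
amount = raw_amount.str.replace(r"[\$,]", "", regex=True).astype(float)

# Same format string as the notebook: day first, then month, then year.
date = pd.to_datetime(raw_date, format="%d/%m/%Y")

print(amount.tolist())
print(date.dt.strftime("%Y-%m-%d").tolist())
```

Being explicit about `format` matters here: a string such as `04/01/2022` would be read as April 1 under a month-first convention, which would silently shift any date-based analysis.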


## Visualization 1: Total Revenue by Product

This bar chart ranks all products by total revenue, allowing the director to quickly spot which products are the top revenue drivers and which may need promotional support or discontinuation.

In [10]:
# Aggregate revenue and boxes shipped per product; reused by both charts
product_stats = (
    df.groupby("Product")
    .agg(Total_Revenue=("Amount", "sum"), Total_Boxes=("Boxes Shipped", "sum"))
    .reset_index()
)

chocolate_colors = ["#3B1F0B", "#5C3317", "#7B3F00", "#8B4513",
                    "#A0522D", "#B5651D", "#C68E17", "#D2A679",
                    "#DEB887", "#F5DEB3"]

fig = px.bar(
    product_stats.sort_values("Total_Revenue", ascending=True),
    x="Total_Revenue",
    y="Product",
    orientation="h",
    title="Total Revenue by Product",
    labels={"Total_Revenue": "Total Revenue (USD)", "Product": ""},
    hover_data=["Product", "Total_Revenue"],
    color="Total_Revenue",
    color_continuous_scale=chocolate_colors,
    width=700,
    height=500,
)
fig.update_layout(coloraxis_showscale=False)
fig.show()
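The ranking shown in the bar chart can also be expressed as a table, which makes it easier to say how much of total revenue the top products account for. The following is a minimal sketch: the `product_stats` frame is rebuilt here from hypothetical placeholder numbers (not results from the dataset), and the `Revenue_Share_Pct` and `Cumulative_Share_Pct` column names are introduced only for illustration.

```python
import pandas as pd

# Hypothetical per-product totals standing in for product_stats (not real results).
product_stats = pd.DataFrame({
    "Product": ["Product A", "Product B", "Product C", "Product D"],
    "Total_Revenue": [40000.0, 30000.0, 20000.0, 10000.0],
    "Total_Boxes": [1000, 1500, 400, 600],
})

# Rank products from highest to lowest revenue.
ranked = (
    product_stats.sort_values("Total_Revenue", ascending=False)
    .reset_index(drop=True)
)

# Share of total revenue per product, plus the running total down the ranking.
ranked["Revenue_Share_Pct"] = 100 * ranked["Total_Revenue"] / ranked["Total_Revenue"].sum()
ranked["Cumulative_Share_Pct"] = ranked["Revenue_Share_Pct"].cumsum()

print(ranked[["Product", "Total_Revenue", "Revenue_Share_Pct", "Cumulative_Share_Pct"]])
```

In the notebook itself, the same three lines of ranking logic could be applied directly to the `product_stats` frame computed from the CSV.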

## Visualization 2: Revenue vs. Boxes Shipped by Product

This scatter plot compares total revenue against total boxes shipped for each product. Products in the upper-right are high-volume, high-revenue stars. Products with high boxes but low revenue may have pricing issues, while those with high revenue but low boxes earn more per box and are likely premium-priced items (the dataset has no cost data, so margin itself cannot be confirmed). This directly supports the director's decisions on inventory allocation and promotional strategy.

In [11]:
fig = px.scatter(
    product_stats,
    x="Total_Boxes",
    y="Total_Revenue",
    text="Product",
    title="Revenue vs. Boxes Shipped by Product",
    labels={"Total_Boxes": "Total Boxes Shipped", "Total_Revenue": "Total Revenue (USD)"},
    hover_data=["Product", "Total_Revenue", "Total_Boxes"],
    color="Total_Revenue",
    color_continuous_scale=chocolate_colors,
    width=700,
    height=500,
)
fig.update_traces(textposition="middle right", marker=dict(size=12))
fig.update_layout(coloraxis_showscale=False)
fig.show()
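The quadrant reading of the scatter plot can be made explicit in code. The sketch below is one possible approach, not the only one: it uses hypothetical placeholder totals (not dataset results), splits products at the median of each axis (an assumed threshold), and adds an illustrative `Revenue_Per_Box` column as a rough proxy for price per box.

```python
import pandas as pd

# Hypothetical per-product totals standing in for product_stats (not real results).
product_stats = pd.DataFrame({
    "Product": ["Product A", "Product B", "Product C", "Product D"],
    "Total_Revenue": [40000.0, 15000.0, 30000.0, 10000.0],
    "Total_Boxes": [1000, 1500, 400, 600],
})

# Average revenue earned per box shipped.
product_stats["Revenue_Per_Box"] = (
    product_stats["Total_Revenue"] / product_stats["Total_Boxes"]
)

# Assumed thresholds: the median of each axis splits the plot into four quadrants.
high_boxes = (product_stats["Total_Boxes"] > product_stats["Total_Boxes"].median()).tolist()
high_rev = (product_stats["Total_Revenue"] > product_stats["Total_Revenue"].median()).tolist()

# Map each (high boxes?, high revenue?) pair to a descriptive label.
labels = {
    (True, True): "high volume, high revenue",
    (True, False): "high volume, low revenue",
    (False, True): "low volume, high revenue",
    (False, False): "low volume, low revenue",
}
product_stats["Quadrant"] = [labels[(b, r)] for b, r in zip(high_boxes, high_rev)]

print(product_stats[["Product", "Revenue_Per_Box", "Quadrant"]])
```

Medians are used rather than means so that one very large product does not pull the threshold and push most other products into the "low" quadrants.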

## Summary

These visualizations confirm that the dataset supports User Story 2. The bar chart enables direct ranking of products by revenue, while the scatter plot reveals the relationship between shipment volume and revenue for each product; both are essential for a Regional Sales Director to identify best-sellers and adjust inventory or promotions accordingly. The scatter plot shows a clear positive correlation between boxes shipped and revenue, though there are some outliers.
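The strength of the relationship between boxes shipped and revenue can be quantified rather than read off the plot. The following is a minimal sketch using hypothetical placeholder totals (not dataset results): it computes Pearson's r with `Series.corr`, and a rank-based (Spearman-style) coefficient by correlating the ranks, which is less sensitive to outliers.

```python
import pandas as pd

# Hypothetical per-product totals standing in for product_stats (not real results).
product_stats = pd.DataFrame({
    "Product": ["Product A", "Product B", "Product C", "Product D", "Product E"],
    "Total_Boxes": [400, 600, 1000, 1500, 2000],
    "Total_Revenue": [12000.0, 15000.0, 31000.0, 38000.0, 70000.0],
})

boxes = product_stats["Total_Boxes"]
revenue = product_stats["Total_Revenue"]

# Pearson correlation (the default method of Series.corr): linear association.
pearson_r = boxes.corr(revenue)

# Spearman-style correlation: Pearson correlation computed on the ranks,
# which measures monotonic association and dampens the effect of outliers.
spearman_rho = boxes.rank().corr(revenue.rank())

print(f"Pearson r: {pearson_r:.3f}")
print(f"Spearman rho: {spearman_rho:.3f}")
```

With only one point per product, either coefficient rests on a small sample, so it is best treated as a descriptive summary of the chart rather than a statistical test.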