In [None]:
import pandas as pd
import plotly.express as px
df = pd.read_csv("../data/02_DS_product_perf.csv")

### Exploratory Data Analysis

#### Comparison between the sales rank and profit margin rank of the products

In [3]:
df

Unnamed: 0,product_name,total_sales,total_profit,profit_margin,sales_rank,margin_rank
0,Wonka Bar - Triple Dazzle Caramel,28485.0,18610.2,0.653333,1,6
1,Wonka Bar -Scrumdiddlyumptious,27874.8,19357.5,0.694444,2,4
2,Wonka Bar - Milk Chocolate,26867.75,17443.37,0.649231,3,7
3,Wonka Bar - Fudge Mallows,24890.4,16593.6,0.666667,4,5
4,Wonka Bar - Nutty Crunch Surprise,23574.95,16819.95,0.713467,5,3
5,Lickable Wallpaper,7860.0,3930.0,0.5,6,11
6,Kazookles,1205.75,92.75,0.076923,7,15
7,Wonka Gum,597.5,310.7,0.52,8,10
8,Everlasting Gobstopper,130.0,104.0,0.8,9,1
9,Fizzy Lifting Drinks,78.75,47.25,0.6,10,9


- A high sales rank does not guarantee a high profit margin rank, and vice versa.

- For example, Triple Dazzle Caramel ranks first in sales but only sixth in profit margin.

- Similarly, Everlasting Gobstopper ranks first in profit margin but only ninth in sales.
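
The mismatch between the two rankings can be made explicit by subtracting one rank from the other. The following is a minimal sketch on four rows copied from the table above; the `rank_gap` column name is an assumption introduced here for illustration and is not part of the original dataset.

```python
import pandas as pd

# Four rows copied from the table above; ranks are relative to the full dataset.
sample = pd.DataFrame(
    {
        "product_name": [
            "Wonka Bar - Triple Dazzle Caramel",
            "Wonka Bar - Nutty Crunch Surprise",
            "Kazookles",
            "Everlasting Gobstopper",
        ],
        "sales_rank": [1, 5, 7, 9],
        "margin_rank": [6, 3, 15, 1],
    }
)

# Hypothetical helper column:
#   positive gap -> the product ranks better on sales than on margin
#   negative gap -> the product ranks better on margin than on sales
sample["rank_gap"] = sample["margin_rank"] - sample["sales_rank"]

# List the products with the largest disagreement between the two ranks first.
order = sample["rank_gap"].abs().sort_values(ascending=False).index
print(sample.loc[order, ["product_name", "sales_rank", "margin_rank", "rank_gap"]])
```

On the full frame the same subtraction would be `df["margin_rank"] - df["sales_rank"]`.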

In [4]:
df[["total_sales", "total_profit", "profit_margin"]].describe()

Unnamed: 0,total_sales,total_profit,profit_margin
count,15.0,15.0,15.0
mean,9452.242,6229.52,0.573886
std,12561.389876,8523.273337,0.181065
min,12.0,4.8,0.076923
25%,69.0,40.365,0.483333
50%,597.5,104.0,0.623116
75%,24232.675,16706.775,0.680556
max,28485.0,19357.5,0.8


- Mean total sales (9,452.24) are roughly 16 times the median (597.50). This right skew means that a small number of products account for most of the revenue.

- Total profit shows the same pattern: the mean (6,229.52) is about 60 times the median (104.00), so most of the profit also comes from a small number of products.
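
One way to quantify this concentration is the share of total sales contributed by the best-selling products. The sketch below uses only figures already shown above: the `total_sales` values for sales ranks 1 to 5, and the `count` and `mean` from `describe()`. The helper name `top_n_share` is hypothetical.

```python
import pandas as pd

# total_sales for the five best-selling products (sales_rank 1-5), copied from the table above.
top5_sales = pd.Series([28485.0, 27874.8, 26867.75, 24890.4, 23574.95])

# describe() reports count = 15 and mean = 9452.242, so the column total is count * mean.
overall_total = 15 * 9452.242

top5_share = top5_sales.sum() / overall_total
print(f"Top 5 products' share of total sales: {top5_share:.1%}")


def top_n_share(values: pd.Series, n: int) -> float:
    """Return the fraction of the column total contributed by the n largest values."""
    return values.nlargest(n).sum() / values.sum()
```

With the full frame loaded, `top_n_share(df["total_sales"], 5)` should give the same figure directly, and the same helper can be applied to `df["total_profit"]`.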

### Visualisations

In [None]:
# Creating a product group with Wonka bars together and others in a separate group
df["product_group"] = df["product_name"].apply(
    lambda x: "Wonka Bars" if x.startswith("Wonka Bar") else "Other Products"
)

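# Short label for the plot: strip the "Wonka Bar" prefix (with or without a space after the hyphen)
df["product_label"] = (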
    df["product_name"]
    .str.replace("Wonka Bar - ", "", regex=False)
    .str.replace("Wonka Bar -", "", regex=False)
)

In [16]:
# Median ranks, used below to split the plot into four quadrants
x_mid = df["sales_rank"].median()
y_mid = df["margin_rank"].median()

fig = px.scatter(
    df,
    x="sales_rank",
    y="margin_rank",
    color="product_group",
    text="product_label",
    title="Sales Rank vs Profit Margin Rank by Product",
    labels={
        "sales_rank": "Sales Rank (1 = Highest Sales)",
        "margin_rank": "Profit Margin Rank (1 = Highest Margin)",
        "product_group": "Product Type"
    }
)

fig.update_traces(textposition="top center")

fig.update_layout(
    width=800,
    height=600,
    xaxis=dict(
        autorange="reversed",
        scaleanchor="y",
        scaleratio=1
    ),
    yaxis=dict(autorange="reversed"),
    margin=dict(l=60, r=40, t=80, b=60)
)

# Diagonal reference line: products on it have equal sales and margin ranks
fig.add_shape(
    type="line",
    x0=1,
    y0=1,
    x1=df["sales_rank"].max(),
    y1=df["margin_rank"].max(),
    line=dict(dash="dash")
)

# Vertical reference line at the median sales rank
fig.add_shape(
    type="line",
    x0=x_mid,
    x1=x_mid,
    y0=df["margin_rank"].min(),
    y1=df["margin_rank"].max(),
    line=dict(dash="dot", color="grey")
)

# Horizontal reference line at the median margin rank
fig.add_shape(
    type="line",
    x0=df["sales_rank"].min(),
    x1=df["sales_rank"].max(),
    y0=y_mid,
    y1=y_mid,
    line=dict(dash="dot", color="grey")
)

fig.show()

- Wonka Bar – Triple Dazzle Caramel (Sales Rank **1**, Margin Rank **6**)
- Wonka Bar – Milk Chocolate (Sales Rank **3**, Margin Rank **7**)
- Wonka Bar – Fudge Mallows (Sales Rank **4**, Margin Rank **5**)

These products generate large revenue, but their profit margins are only mid-range (margin ranks 5 to 7 of 15) rather than among the best.

- Everlasting Gobstopper (Sales Rank **9**, Margin Rank **1**, margin 80%)
- Hair Toffee (Sales Rank **11**, Margin Rank **2**, margin ≈78%)

These products contribute less revenue, but they keep a much larger share of each unit of revenue as profit.

- Some products, such as Kazookles, Fun Dip, and Nerds, are weak on both dimensions: low sales and very low profit margins.

These products neither drive revenue nor profitability.
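
The quadrant reading used above can also be expressed as a label per product. This is a minimal sketch on four rows copied from the first table; the `quadrant` column name is hypothetical, and the midpoints are set to 8 on the assumption that the ranks run from 1 to 15 without ties. In the notebook they would be the `x_mid` and `y_mid` values computed from the medians.

```python
import pandas as pd

# Four rows copied from the first table, one per quadrant.
sample = pd.DataFrame(
    {
        "product_name": [
            "Wonka Bar - Nutty Crunch Surprise",
            "Lickable Wallpaper",
            "Everlasting Gobstopper",
            "Fizzy Lifting Drinks",
        ],
        "sales_rank": [5, 6, 9, 10],
        "margin_rank": [3, 11, 1, 9],
    }
)

# Assumed midpoints: the median of ranks 1..15 with no ties.
# In the notebook, use df["sales_rank"].median() and df["margin_rank"].median().
x_mid = 8
y_mid = 8

# A lower rank number is better, so "high" means a rank at or above the median position.
strong_sales = sample["sales_rank"] <= x_mid
strong_margin = sample["margin_rank"] <= y_mid

# Hypothetical label column combining both dimensions.
sample["quadrant"] = (
    strong_sales.map({True: "high sales", False: "low sales"})
    + ", "
    + strong_margin.map({True: "high margin", False: "low margin"})
)

print(sample[["product_name", "quadrant"]])
```

Because the labels depend only on position relative to the median rank, a product just inside the upper half by rank can still have small absolute sales, so the labels are best read alongside `total_sales` and `profit_margin`.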