## Monthly Business Review

This notebook prepares a Monthly Business Review for leadership. It analyzes revenue trends, product performance, cancellations, and regional demand. Revenue, product, and regional metrics are computed from delivered orders only; the cancellation section compares them against canceled orders.

Revenue shows a clear growth trend with seasonal variation, while average order value remains relatively stable (interquartile range of roughly 132 to 144 across months).

In [1]:
import pandas as pd

fact = pd.read_parquet("/E2E Data Analysis Project/data/processed/fact_order_items.parquet")
fact.head()


Unnamed: 0,order_id,order_item_id,product_id,seller_id,price,freight_value,customer_id,order_status,order_purchase_timestamp,product_category_name,customer_unique_id,customer_state,payment_value_total,payment_installments_max,payment_type_nunique,payment_type_primary,purchase_month,item_revenue,item_total_value
0,00010242fe8c5a6d1ba2dd792cb16214,1,4244733e06e7ecb4970a6e2683c13e61,48436dade18ac8b2bce089ec2a041202,58.9,13.29,3ce436f183e68e07877b285a838db11a,delivered,2017-09-13 08:59:02,cool_stuff,871766c5855e863f6eccc05f988b23cb,RJ,72.19,2.0,1.0,credit_card,2017-09,58.9,72.19
1,00018f77f2f0320c557190d7a144bdd3,1,e5f2d52b802189ee658865ca93d83a8f,dd7ddc04e1b6c2c614352b383efe2d36,239.9,19.93,f6dd3ec061db4e3987629fe6b26e5cce,delivered,2017-04-26 10:53:06,pet_shop,eb28e67c4c0b83846050ddfb8a35d051,SP,259.83,3.0,1.0,credit_card,2017-04,239.9,259.83
2,000229ec398224ef6ca0657da4fc703e,1,c777355d18b72b67abbeef9df44fd0fd,5b51032eddd242adc84c38acab88f23d,199.0,17.87,6489ae5e4333f3693df5ad4372dab6d3,delivered,2018-01-14 14:33:31,moveis_decoracao,3818d81c6709e39d06b2738a8d3a2474,MG,216.87,5.0,1.0,credit_card,2018-01,199.0,216.87
3,00024acbcdf0a6daa1e931b038114c75,1,7634da152a4610f1595efa32f14722fc,9d7a1d34a5052409006425275ba1c2b4,12.99,12.79,d4eb9395c8c0431ee92fce09860c5a06,delivered,2018-08-08 10:00:35,perfumaria,af861d436cfc08b2c2ddefd0ba074622,SP,25.78,2.0,1.0,credit_card,2018-08,12.99,25.78
4,00042b26cf59d7ce69dfabb4e55b4fd9,1,ac6c3623068f30de03045865e4e10089,df560393f3a51e74553ab94004ba5c87,199.9,18.14,58dbd0b2d70206bf40e62cd34e84d795,delivered,2017-02-04 13:57:51,ferramentas_jardim,64b576fb70d441e8f1b2d7d446e483c5,SP,218.04,3.0,1.0,credit_card,2017-02,199.9,218.04


In [2]:
fact_delivered = fact[fact["order_status"] == "delivered"].copy()


In [3]:
monthly_kpis = (
    fact_delivered
    .groupby("purchase_month")
    .agg(
        revenue=("item_revenue", "sum"),
        orders=("order_id", "nunique"),
        items_sold=("order_item_id", "count"),
    )
    .reset_index()
)

monthly_kpis["aov"] = monthly_kpis["revenue"] / monthly_kpis["orders"]
monthly_kpis["mom_revenue_growth"] = monthly_kpis["revenue"].pct_change()

monthly_kpis.head()


Unnamed: 0,purchase_month,revenue,orders,items_sold,aov,mom_revenue_growth
0,2016-09,134.97,1,3,134.97,
1,2016-10,40325.11,265,313,152.170226,297.770912
2,2016-12,10.9,1,1,10.9,-0.99973
3,2017-01,111798.36,750,913,149.06448,10255.730275
4,2017-02,234223.4,1653,1858,141.695947,1.095052


In [4]:
monthly_kpis.describe()


Unnamed: 0,revenue,orders,items_sold,aov,mom_revenue_growth
count,23.0,23.0,23.0,23.0,22.0
mean,574847.743913,4194.695652,4791.173913,133.034398,479.786062
std,337263.532856,2481.181398,2845.456468,27.595875,2184.407859
min,10.9,1.0,1.0,10.9,-0.99973
25%,349934.265,2424.5,2733.0,132.165182,-0.04715
50%,607399.67,4193.0,4797.0,137.99725,0.081128
75%,862015.66,6453.0,7330.0,143.98583,0.39569
max,987765.37,7289.0,8475.0,152.170226,10255.730275


In [5]:
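# Drop months with fewer than 100 delivered orders (2016-09 and 2016-12 have one order each),
# since their revenue and growth figures distort the summary statistics.
monthly_kpis_clean = monthly_kpis[monthly_kpis["orders"] >= 100].copy()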
monthly_kpis_clean


Unnamed: 0,purchase_month,revenue,orders,items_sold,aov,mom_revenue_growth
1,2016-10,40325.11,265,313,152.170226,297.770912
3,2017-01,111798.36,750,913,149.06448,10255.730275
4,2017-02,234223.4,1653,1858,141.695947,1.095052
5,2017-03,359198.85,2546,2897,141.083602,0.533574
6,2017-04,340669.68,2303,2569,147.924307,-0.051585
7,2017-05,489338.25,3546,4004,137.99725,0.436401
8,2017-06,421923.37,3135,3489,134.584807,-0.137767
9,2017-07,481604.52,3872,4416,124.381333,0.14145
10,2017-08,554699.7,4193,4797,132.291844,0.151774
11,2017-09,607399.67,4150,4737,146.361366,0.095006


In [6]:
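# Recompute growth on the filtered months. pct_change() compares each row with the
# previous retained row, so the 2017-01 value is measured against 2016-10, not 2016-12.
monthly_kpis_clean["mom_revenue_growth"] = (
    monthly_kpis_clean["revenue"].pct_change()
)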


In [7]:
monthly_kpis_clean.head()


Unnamed: 0,purchase_month,revenue,orders,items_sold,aov,mom_revenue_growth
1,2016-10,40325.11,265,313,152.170226,
3,2017-01,111798.36,750,913,149.06448,1.772425
4,2017-02,234223.4,1653,1858,141.695947,1.095052
5,2017-03,359198.85,2546,2897,141.083602,0.533574
6,2017-04,340669.68,2303,2569,147.924307,-0.051585
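
The growth column is only a true month-over-month figure when consecutive rows are consecutive calendar months. Here 2016-11 is absent from the grouped output and the `orders >= 100` filter removes 2016-09 and 2016-12, so the 1.77 value for 2017-01 is growth relative to 2016-10. The following is a minimal sketch of one way to keep only genuine month-over-month comparisons, assuming `purchase_month` is a `YYYY-MM` string as in the output above. The helper name `true_mom_growth` and the toy values are illustrative and not part of the original notebook.

```python
import pandas as pd

def true_mom_growth(df, month_col="purchase_month", value_col="revenue"):
    """Growth versus the previous calendar month; NaN when that month is absent."""
    months = pd.PeriodIndex(df[month_col].astype(str).tolist(), freq="M")
    series = pd.Series(df[value_col].to_numpy(), index=months)
    # Reindex onto a gap-free monthly range so missing months become NaN rows.
    full_range = pd.period_range(months.min(), months.max(), freq="M")
    dense = series.reindex(full_range)
    growth = dense / dense.shift(1) - 1
    # Return growth only for the months that were present in the input.
    return growth.loc[months]

# Toy data with a gap between 2016-10 and 2017-01 (illustrative values).
toy = pd.DataFrame({
    "purchase_month": ["2016-10", "2017-01", "2017-02"],
    "revenue": [100.0, 200.0, 300.0],
})
toy_growth = true_mom_growth(toy)
print(toy_growth)
```

With this approach the 2017-01 row has no growth value because 2016-12 is missing, and only 2017-02 is compared with its immediate predecessor.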


In [9]:
category_kpis = (
    fact_delivered
    .groupby("product_category_name")
    .agg(
        revenue=("item_revenue", "sum"),
        orders=("order_id", "nunique"),
        items_sold=("order_item_id", "count"),
    )
    .reset_index()
    .sort_values("revenue", ascending=False)
)

category_kpis["revenue_share"] = (
    category_kpis["revenue"] / category_kpis["revenue"].sum()
)

category_kpis.head(5)


Unnamed: 0,product_category_name,revenue,orders,items_sold,revenue_share
11,beleza_saude,1233131.72,8647,9465,0.094487
66,relogios_presentes,1166176.98,5495,5859,0.089357
13,cama_mesa_banho,1023434.76,9272,10953,0.078419
32,esporte_lazer,954852.55,7530,8431,0.073164
44,informatica_acessorios,888724.61,6530,7644,0.068097


A small number of product categories contribute a disproportionate share of revenue: the top five alone account for about 40% of delivered revenue.
Prioritizing these categories in marketing and inventory planning targets the largest part of the revenue base.

In [10]:
category_kpis_sorted = category_kpis.sort_values("revenue", ascending=False).copy()
category_kpis_sorted["cumulative_revenue_share"] = category_kpis_sorted["revenue_share"].cumsum()

category_kpis_sorted.head(10)


Unnamed: 0,product_category_name,revenue,orders,items_sold,revenue_share,cumulative_revenue_share
11,beleza_saude,1233131.72,8647,9465,0.094487,0.094487
66,relogios_presentes,1166176.98,5495,5859,0.089357,0.183844
13,cama_mesa_banho,1023434.76,9272,10953,0.078419,0.262264
32,esporte_lazer,954852.55,7530,8431,0.073164,0.335428
44,informatica_acessorios,888724.61,6530,7644,0.068097,0.403526
54,moveis_decoracao,711927.69,6307,8160,0.054551,0.458076
72,utilidades_domesticas,615628.69,5743,6795,0.047172,0.505248
26,cool_stuff,610204.1,3559,3718,0.046756,0.552004
8,automotivo,578966.65,3810,4140,0.044363,0.596367
12,brinquedos,471286.48,3804,4030,0.036112,0.632479


In [11]:
(category_kpis_sorted["cumulative_revenue_share"] <= 0.8).sum()


16

Revenue is concentrated in a limited set of product categories.
The 16 highest-revenue categories together stay at or just below 80% of delivered revenue, and the top 10 already account for about 63%.
Strategic focus on these categories can maximize return on marketing and operational investments.
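
Note that `(cumulative_revenue_share <= 0.8).sum()` counts the categories whose running share has not yet exceeded 80%. The number of categories needed to reach 80% can be one higher. A minimal sketch of that second count follows; the helper name `categories_to_reach` and the toy revenues are hypothetical.

```python
import numpy as np
import pandas as pd

def categories_to_reach(revenue, threshold=0.8):
    """Smallest number of top categories whose combined share is >= threshold."""
    shares = revenue.sort_values(ascending=False) / revenue.sum()
    cumulative = shares.cumsum().to_numpy()
    # Index of the first running share that reaches the threshold, turned into a count.
    position = int(np.searchsorted(cumulative, threshold, side="left"))
    # Guard against floating-point totals that fall just short of the threshold.
    return min(position + 1, len(cumulative))

# Toy revenues (illustrative): shares are 60%, 25%, 10%, 5%.
toy_revenue = pd.Series([60.0, 25.0, 10.0, 5.0], index=["a", "b", "c", "d"])
toy_cumulative = (toy_revenue / toy_revenue.sum()).cumsum()

below_or_at = int((toy_cumulative <= 0.8).sum())   # counts only category "a"
needed = categories_to_reach(toy_revenue)          # "a" and "b" are needed
print(below_or_at, needed)
```

Which of the two counts to report depends on whether the statement is "stays within 80%" or "reaches 80%"; stating the choice explicitly avoids an off-by-one in the summary.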

### Cancellations and lost revenue

In [12]:
fact_cancelled = fact[fact["order_status"] == "canceled"].copy()


In [13]:
cancelled_kpis = (
    fact_cancelled
    .groupby("purchase_month")
    .agg(
        cancelled_revenue=("item_revenue", "sum"),
        cancelled_orders=("order_id", "nunique"),
    )
    .reset_index()
)
cancelled_kpis.head()


Unnamed: 0,purchase_month,cancelled_revenue,cancelled_orders
0,2016-09,59.5,1
1,2016-10,2992.67,12
2,2017-01,214.6,2
3,2017-02,2343.67,15
4,2017-03,6002.98,24


In [14]:
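revenue_compare = (
    monthly_kpis_clean
    .merge(cancelled_kpis, on="purchase_month", how="left")
)

# Fill only the cancellation columns: a month without cancellations has 0 cancelled
# revenue, but the undefined first-month growth must stay NaN instead of becoming 0.
cancel_cols = ["cancelled_revenue", "cancelled_orders"]
revenue_compare[cancel_cols] = revenue_compare[cancel_cols].fillna(0)

# Ratio of cancelled revenue to delivered revenue in the same month.
revenue_compare["cancelled_revenue_share"] = (
    revenue_compare["cancelled_revenue"] / revenue_compare["revenue"]
)

revenue_compare.head()


Unnamed: 0,purchase_month,revenue,orders,items_sold,aov,mom_revenue_growth,cancelled_revenue,cancelled_orders,cancelled_revenue_share
0,2016-10,40325.11,265,313,152.170226,,2992.67,12,0.074214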
1,2017-01,111798.36,750,913,149.06448,1.772425,214.6,2,0.00192
2,2017-02,234223.4,1653,1858,141.695947,1.095052,2343.67,15,0.010006
3,2017-03,359198.85,2546,2897,141.083602,0.533574,6002.98,24,0.016712
4,2017-04,340669.68,2303,2569,147.924307,-0.051585,6084.25,14,0.01786


Cancelled revenue is small but non-negligible relative to delivered revenue: in the months shown it ranges from about 0.2% to 7.4%, and stays below 2% from 2017-01 onward.
Reducing cancellations even marginally could lead to meaningful revenue uplift without additional acquisition costs.
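
The ratio above divides cancelled revenue by delivered revenue, so it is not strictly a share of potential sales. A minimal sketch of two complementary rates follows, assuming that "potential" means delivered plus canceled only (other order statuses are ignored here). The helper name `add_cancellation_rates` and the new column names are illustrative; the sample rows are copied from the output above.

```python
import pandas as pd

def add_cancellation_rates(df):
    """Add order-level and revenue-level cancellation rates to a monthly table."""
    out = df.copy()
    # Assumption: potential volume = delivered + canceled (other statuses excluded).
    potential_orders = out["orders"] + out["cancelled_orders"]
    potential_revenue = out["revenue"] + out["cancelled_revenue"]
    out["order_cancellation_rate"] = out["cancelled_orders"] / potential_orders
    out["cancelled_share_of_potential"] = out["cancelled_revenue"] / potential_revenue
    return out

# Two rows taken from the revenue_compare output above.
sample = pd.DataFrame({
    "purchase_month": ["2016-10", "2017-01"],
    "revenue": [40325.11, 111798.36],
    "orders": [265, 750],
    "cancelled_revenue": [2992.67, 214.60],
    "cancelled_orders": [12, 2],
})
rates = add_cancellation_rates(sample)
print(rates[["purchase_month", "order_cancellation_rate", "cancelled_share_of_potential"]])
```

Comparing the two rates per month indicates whether cancellations skew toward higher-value or lower-value orders.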

### Regions

In [15]:
state_kpis = (
    fact_delivered
    .groupby("customer_state")
    .agg(
        revenue=("item_revenue", "sum"),
        orders=("order_id", "nunique"),
    )
    .reset_index()
    .sort_values("revenue", ascending=False)
)

state_kpis.head(10)


Unnamed: 0,customer_state,revenue,orders
25,SP,5067633.16,40501
18,RJ,1759651.13,12350
10,MG,1552481.83,11354
22,RS,728897.47,5345
17,PR,666063.51,4923
23,SC,507012.13,3546
4,BA,493584.14,3256
6,DF,296498.41,2080
8,GO,282836.7,1957
7,ES,268643.45,1995


In [17]:
state_kpis["revenue_share"] = state_kpis["revenue"] / state_kpis["revenue"].sum()
state_kpis["cumulative_revenue_share"] = state_kpis["revenue_share"].cumsum()

state_kpis.head()


Unnamed: 0,customer_state,revenue,orders,revenue_share,cumulative_revenue_share
25,SP,5067633.16,40501,0.383287,0.383287
18,RJ,1759651.13,12350,0.13309,0.516378
10,MG,1552481.83,11354,0.117421,0.633799
22,RS,728897.47,5345,0.05513,0.688928
17,PR,666063.51,4923,0.050377,0.739306


Revenue is geographically concentrated: SP alone accounts for about 38% of delivered revenue, the top three states (SP, RJ, MG) for about 63%, and the top five for about 74%.
These regions should be prioritized for logistics optimization and customer experience improvements.
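
To see whether a state's weight comes from order volume or from order size, revenue per order can be added to the same table. A minimal sketch follows, using the top three rows from the output above; the helper name `add_revenue_per_order` and the column name `revenue_per_order` are illustrative additions.

```python
import pandas as pd

def add_revenue_per_order(df):
    """Add average delivered revenue per order for each state."""
    out = df.copy()
    out["revenue_per_order"] = out["revenue"] / out["orders"]
    return out

# Top three states, values copied from the state_kpis output above.
state_sample = pd.DataFrame({
    "customer_state": ["SP", "RJ", "MG"],
    "revenue": [5067633.16, 1759651.13, 1552481.83],
    "orders": [40501, 12350, 11354],
})
state_sample = add_revenue_per_order(state_sample)
per_order = state_sample.set_index("customer_state")["revenue_per_order"]
print(per_order.round(2))
```

In these three rows SP has the lowest revenue per order, which suggests its lead is driven by order volume rather than larger baskets.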