# Business Analytics

In this notebook we will explore the business's growth and several KPIs that help us understand the business better.

In general, we will look at changes over time to identify acquisition, retention, and offering trends.


In [12]:
import pandas as pd
import plotly.express as px

In [13]:
date_cols = ['order_purchase_timestamp', 'order_approved_at', 'order_delivered_carrier_date',
             'order_delivered_customer_date', 'order_estimated_delivery_date']
training_set = pd.read_csv('../data/train_df.csv', parse_dates=date_cols)

In [None]:
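# First off, let's count the number of orders per month
date_col = 'order_purchase_timestamp'
# Note: .size() counts rows; if training_set has one row per order item,
# ['order_id'].nunique() would count distinct orders instead
orders_per_month = training_set.resample('ME', on=date_col).size().reset_index(name='order_count')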

# let's plot the number of orders per month
fig = px.bar(orders_per_month, x=date_col, y='order_count',color='order_count', title='Number of Orders per Month')
fig.add_trace(px.line(orders_per_month, x=date_col, y='order_count').data[0])
fig.update_layout(hovermode="x unified")
fig.show()

We can see that the number of orders has been growing over time, reaching a peak at the end of 2017.  
2018 started off quite strong, not dropping below 6K orders per month.
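"Growing over time" can be quantified with a month-over-month growth rate and a rolling mean. The sketch below is a minimal illustration, assuming a frame shaped like `orders_per_month` above; the numbers are hypothetical, and the `mom_growth` and `rolling_3m` columns are names introduced here, not part of the original notebook.

```python
import pandas as pd

# Hypothetical monthly counts standing in for `orders_per_month`
# (illustrative numbers, not taken from the dataset).
orders_per_month = pd.DataFrame({
    "order_purchase_timestamp": pd.to_datetime(
        ["2017-01-31", "2017-02-28", "2017-03-31", "2017-04-30"]
    ),
    "order_count": [1000, 1200, 1500, 1500],
})

# Month-over-month growth: (this month - previous month) / previous month.
orders_per_month["mom_growth"] = orders_per_month["order_count"].pct_change()

# A 3-month rolling mean smooths out single-month spikes.
orders_per_month["rolling_3m"] = orders_per_month["order_count"].rolling(3).mean()

print(orders_per_month)
```

The first `mom_growth` value is `NaN` because there is no previous month to compare against.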

In [51]:
# Next, let's compute the total sales (sum of prices) per month
sales_per_month = training_set.resample('ME', on=date_col)['price'].sum().reset_index(name='sales_total')

# let's plot the total sales (order prices) per month
fig = px.bar(sales_per_month, x=date_col, y='sales_total',color='sales_total', title='Total Sales per Month')
fig.add_trace(px.line(sales_per_month, x=date_col, y='sales_total').data[0])
fig.update_layout(hovermode="x unified")
fig.show()
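Sales and order counts can be combined into an average order value (AOV) per month, which separates "more orders" from "bigger orders". This is a minimal sketch on hypothetical item-level rows, assuming `training_set` may hold several rows per order; it therefore counts distinct `order_id`s rather than rows, and the `avg_order_value` column is a name introduced here.

```python
import pandas as pd

# Hypothetical item-level rows standing in for `training_set`:
# one row per order item, so an order can span several rows.
items = pd.DataFrame({
    "order_id": ["a", "a", "b", "c", "d"],
    "order_purchase_timestamp": pd.to_datetime(
        ["2017-01-05", "2017-01-05", "2017-01-20", "2017-02-03", "2017-02-25"]
    ),
    "price": [10.0, 30.0, 20.0, 50.0, 70.0],
})

date_col = "order_purchase_timestamp"
# Group rows by calendar month.
month = items[date_col].dt.to_period("M")
monthly = items.groupby(month).agg(
    sales_total=("price", "sum"),
    order_count=("order_id", "nunique"),  # distinct orders, not rows
)
# Average order value = monthly revenue / number of distinct orders.
monthly["avg_order_value"] = monthly["sales_total"] / monthly["order_count"]
print(monthly)
```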


In [16]:
customers_per_month = training_set.resample('ME', on=date_col)['customer_unique_id'].nunique().reset_index(name='customer_count')

# let's plot the number of customers per month
fig = px.bar(customers_per_month, x=date_col, y='customer_count',color='customer_count', title='Number of Customers per Month')
fig.add_trace(px.line(customers_per_month, x=date_col, y='customer_count').data[0])
fig.update_layout(hovermode="x unified")
fig.show()

The plot above shows the number of unique customers per month.  
We can clearly see strong growth in active customers over time.  

We should analyze new vs. returning customers to better understand the growth trend.
Let's start by looking at how many orders each customer makes; this will help us understand customer lifetime value.
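A new vs. returning split can be sketched by tagging each row with the month of the customer's first purchase: a customer is "new" in that month and "returning" in any later month. This is a minimal sketch on hypothetical data, assuming the `customer_unique_id` and `order_purchase_timestamp` columns used above; the `month`, `is_new`, and `customer_type` columns are introduced here for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for `training_set` (illustrative only).
df = pd.DataFrame({
    "customer_unique_id": ["u1", "u2", "u1", "u3", "u2", "u1"],
    "order_id": ["o1", "o2", "o3", "o4", "o5", "o6"],
    "order_purchase_timestamp": pd.to_datetime(
        ["2017-01-10", "2017-01-15", "2017-02-02", "2017-02-20", "2017-03-01", "2017-03-05"]
    ),
})

# Each customer's first purchase timestamp, broadcast back to every row.
first_purchase = df.groupby("customer_unique_id")["order_purchase_timestamp"].transform("min")

df["month"] = df["order_purchase_timestamp"].dt.to_period("M")
# A row is "new" if it falls in the customer's first purchase month.
df["is_new"] = first_purchase.dt.to_period("M") == df["month"]
df["customer_type"] = df["is_new"].map({True: "new", False: "returning"})

# Unique customers per month, split into new and returning.
split = (
    df.groupby(["month", "customer_type"])["customer_unique_id"]
    .nunique()
    .unstack(fill_value=0)
)
print(split)
```

Because every row in a customer's first month is tagged "new", a customer is never counted as both new and returning in the same month.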


In [None]:
# For each customer, count how many distinct orders they made (the value is repeated on every row of that customer)
training_set['customer_order_count'] = training_set.groupby('customer_unique_id')['order_id'].transform('nunique')
customers_orders_tally = pd.DataFrame(training_set['customer_order_count'].value_counts())

# add a column with the percentage of customers that made that many orders
customers_orders_tally['percentage'] = customers_orders_tally['count'] / customers_orders_tally['count'].sum()
customers_orders_tally['cumsum'] = customers_orders_tally.percentage.cumsum()
customers_orders_tally

| customer_order_count | count | percentage | cumsum |
|---|---|---|---|
| 1 | 84797 | 0.934505 | 0.934505 |
| 2 | 5115 | 0.05637 | 0.990875 |
| 3 | 609 | 0.006711 | 0.997587 |
| 4 | 91 | 0.001003 | 0.998589 |
| 5 | 49 | 0.00054 | 0.999129 |
| 6 | 44 | 0.000485 | 0.999614 |
| 9 | 14 | 0.000154 | 0.999769 |
| 12 | 12 | 0.000132 | 0.999901 |
| 7 | 9 | 9.9e-05 | 1.0 |


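Well, there is essentially **no significant customer retention**:
 -  Only ~6.5% of rows belong to customers who made more than 1 order. 
 -  Only ~0.9% of rows belong to customers who made more than 2 orders. 

Note that `value_counts` is applied to the row-level `customer_order_count` column, so these shares are computed over rows of `training_set` rather than over unique customers. Customers with several orders contribute several rows each, so the share of repeat *customers* is likely even lower.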
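A customer-level version of the tally reduces the data to one value per customer before counting. The sketch below uses hypothetical rows and contrasts the customer-level repeat rate with the row-level share computed the same way as the cell above; `orders_per_customer`, `dist`, `repeat_rate`, and `row_share` are names introduced here.

```python
import pandas as pd

# Hypothetical item-level rows standing in for `training_set`.
df = pd.DataFrame({
    "customer_unique_id": ["u1", "u1", "u1", "u2", "u3", "u3", "u4"],
    "order_id":           ["o1", "o1", "o2", "o3", "o4", "o5", "o6"],
})

# One value per customer: the number of distinct orders.
orders_per_customer = df.groupby("customer_unique_id")["order_id"].nunique()

# Share of customers by order count, ordered by the count itself
# (so a cumulative sum reads as "share with at most k orders").
dist = orders_per_customer.value_counts(normalize=True).sort_index()
repeat_rate = (orders_per_customer > 1).mean()

# Row-level share, computed like the cell above, for comparison.
row_level = df.groupby("customer_unique_id")["order_id"].transform("nunique")
row_share = (row_level > 1).mean()

print(dist)
print(f"Repeat customers: {repeat_rate:.1%} vs. rows of repeat customers: {row_share:.1%}")
```

In this toy example half the customers are repeat buyers, but their rows make up 5 of 7 rows, which is how a row-level tally can overstate retention.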

In [None]:
monthly_sellers = training_set.resample('ME', on=date_col)['seller_id'].nunique().reset_index(name='seller_count')

# Initialize a set to keep track of all unique sellers seen so far
seen_sellers = set()
new_sellers = []

# Loop through each month to identify new sellers
for index, row in monthly_sellers.iterrows():
    current_month_sellers = training_set[training_set[date_col].dt.to_period('M') == row[date_col].to_period('M')]['seller_id'].unique()
    new_sellers_this_month = set(current_month_sellers) - seen_sellers
    new_sellers.append(len(new_sellers_this_month))
    seen_sellers.update(current_month_sellers)

monthly_sellers['new_seller_count'] = new_sellers

# Let's plot the number of new sellers per month
fig = px.bar(monthly_sellers, x=date_col, y='new_seller_count', color='new_seller_count', title='Number of New Sellers per Month')
fig.add_trace(px.line(monthly_sellers, x=date_col, y='new_seller_count').data[0])
fig.update_layout(hovermode="x unified")
fig.show()


The growth in new sellers looks more seasonal than linear.  
While this graph is hard to interpret on its own, the platform appears to be attracting new sellers at a lower rate than it attracts customers, which could mean that the platform is becoming more competitive (demand is growing while supply is not keeping pace).
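The per-month loop above filters the whole table once per month. A one-pass alternative takes each entity's first purchase month and counts entities per month; applying it to both sellers and customers makes the sellers-versus-customers comparison explicit. This is a minimal sketch on hypothetical data: `new_per_month` is a helper name introduced here, and unlike the `resample`-based loop it only lists months in which at least one newcomer appears.

```python
import pandas as pd


def new_per_month(df: pd.DataFrame, id_col: str, date_col: str) -> pd.Series:
    """Hypothetical helper: count entities by the month they first appear."""
    first_seen = df.groupby(id_col)[date_col].min().dt.to_period("M")
    return first_seen.value_counts().sort_index()


# Hypothetical rows standing in for `training_set`.
df = pd.DataFrame({
    "seller_id":          ["s1", "s1", "s2", "s1", "s2", "s3"],
    "customer_unique_id": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "order_purchase_timestamp": pd.to_datetime(
        ["2017-01-03", "2017-01-09", "2017-01-20", "2017-02-11", "2017-02-14", "2017-02-28"]
    ),
})

date_col = "order_purchase_timestamp"
new_sellers = new_per_month(df, "seller_id", date_col)
new_customers = new_per_month(df, "customer_unique_id", date_col)

# Align both series on month; months with no newcomers become 0.
comparison = pd.concat(
    {"new_sellers": new_sellers, "new_customers": new_customers}, axis=1
).fillna(0).astype(int)
# New customers per new seller (inf if a month has no new sellers).
comparison["customers_per_new_seller"] = comparison["new_customers"] / comparison["new_sellers"]
print(comparison)
```

A rising `customers_per_new_seller` ratio would support the reading that demand is growing faster than supply.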

