# PF Sales Projections: Q3-Q4 '23
#### Adrafül Labs

**Notes:**

Sales will be projected using a combination of:
- customer retention observed in historical sales
- average order value (AOV) from historical sales
- growth driven partly by projected marketing spend, which can be found in `expenses.ipynb`
- application of (user-defined level of) random noise on smoothed sales projections to model inventory stress

Charts will be produced using the plotly charting library.

All future projections are estimates.

_______

# Historical Sales Analysis

We start by reading in all of the local data.

In [19]:
# imports
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import math

In [20]:
# read historical sales data
sales = pd.read_csv('../../data/imports/shopify/sales_2023-05-01_2023-08-01.csv')

# read historical shipping data
shipping_df = pd.read_excel('../../data/imports/excel/pirateship_shipping_export.xlsx')

# get average order value: mean of the daily gross_sales / orders ratio after 2023-05-17
recent_sales = sales[sales['day'] > '2023-05-17']
aov = (recent_sales['gross_sales'] / recent_sales['orders']).mean()
print(f'AOV: {aov}')

# define retention metrics
TARGET_RETENTION = 0.4
TARGET_DAYS_TO_REORDER = 14

# read in marketing spend df
adspend_df = pd.read_excel('../../data/imports/excel/marketing_projected_expenses.xlsx')

AOV: 38.44194795783551


_______

#### Projections:

I will first define the formulas behind our sales projections. These formulas reference variables that were defined in the previous section. Sales will also be grown with marketing spend data, which is read from a spreadsheet in the data directory.

**Formulas:**

Sales at time $t$ is equal to the sum of order value from new customers and order value from repeat customers.

$sales_t$ = $AOV$($new\_customers_t$ + $repeat\_customers_t$), where
- $AOV$: Average Order Value on All Orders (defined above)
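The AOV above is computed as the mean of each day's `gross_sales / orders` ratio, which weights every day equally regardless of order volume. The sketch below contrasts that with an order-weighted AOV on a made-up three-day sample (the numbers and the `toy_sales` name are hypothetical). Which definition suits the projection is a modelling choice.

```python
import pandas as pd

# Hypothetical three-day sample; column names mirror the Shopify export read above.
toy_sales = pd.DataFrame({
    "day": ["2023-05-18", "2023-05-19", "2023-05-20"],
    "gross_sales": [100.0, 300.0, 50.0],
    "orders": [2, 10, 1],
})

# Mean of daily averages (the approach used in this notebook): each day counts equally.
aov_daily_mean = (toy_sales["gross_sales"] / toy_sales["orders"]).mean()

# Order-weighted average: total revenue divided by total orders.
aov_weighted = toy_sales["gross_sales"].sum() / toy_sales["orders"].sum()

print(f"mean of daily AOVs: {aov_daily_mean:.2f}")
print(f"order-weighted AOV: {aov_weighted:.2f}")
```

The two figures diverge whenever high-volume days have a different basket size from low-volume days.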

The number of new customers at time $t$ is equal to a conversion rate times total ad spend less retargeting ad spend at time $t$.

$new\_customers_t$ = $CR$($cumulative\_ad\_spend_t$ - $retargeting\_ad\_spend_t$), where
- $CR$: Number of new customers per $1 ad spend

The number of repeat customers at time $t$ is equal to the number of repeat customers from the previous timestep plus new repeat customers. New repeat customers are approximated as an aggregate retention rate times the number of new customers acquired $ADRO$ days before $t$, where $ADRO$ is the average number of days between repeat orders.

$repeat\_customers_t$ = $RR$($new\_customers_{t - ADRO}$) + $repeat\_customers_{t-1}$, where
- $RR$: Aggregate retention rate
- $ADRO$: Average number of days between repeat orders
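As a rough illustration, the three formulas can be evaluated step by step on a toy series. The function name `project_sales` and all input values are hypothetical, and the sketch assumes there are no new customers before the start of the series; it is a literal reading of the formulas, not the notebook's implementation.

```python
def project_sales(total_spend, retargeting_spend, cr, rr, adro, aov):
    """Return per-step (new, repeat, sales) lists following the three formulas."""
    new, repeat, sales = [], [], []
    for t in range(len(total_spend)):
        # new_customers_t = CR * (ad spend - retargeting ad spend)
        new.append(cr * (total_spend[t] - retargeting_spend[t]))
        # Cohort acquired ADRO steps earlier; assumed empty before the series starts.
        lagged_new = new[t - adro] if t >= adro else 0.0
        prev_repeat = repeat[t - 1] if t > 0 else 0.0
        # repeat_customers_t = RR * new_customers_{t-ADRO} + repeat_customers_{t-1}
        repeat.append(rr * lagged_new + prev_repeat)
        # sales_t = AOV * (new_customers_t + repeat_customers_t)
        sales.append(aov * (new[t] + repeat[t]))
    return new, repeat, sales


# Toy inputs: four steps of constant spend, a two-step reorder lag.
new, repeat, sales = project_sales(
    total_spend=[100.0] * 4,
    retargeting_spend=[20.0] * 4,
    cr=0.5, rr=0.25, adro=2, aov=40.0,
)
print(new)     # [40.0, 40.0, 40.0, 40.0]
print(repeat)  # [0.0, 0.0, 10.0, 20.0]
print(sales)   # [1600.0, 1600.0, 2000.0, 2400.0]
```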

The minuscule sample size, poor attribution tracking, and higher purchase propensity of our early adopters have left us with an abnormally high ROAS. Additionally, repeated sellouts have clouded our ability to accurately gauge retention.

Our ROAS currently sits around 50x. The industry average for F&B / supplement ecommerce seems to be 2-3x.

Our retention rate has a wide MoM range, from 15% to 59%, due to our inability to remain in stock.

To address this uncertainty, we make two assumptions. First, ROAS decays exponentially from a conservative estimate of current levels toward our estimate of a terminal ROAS by EOY. Second, the retention rate is based on the average retention rate from periods when we remained in stock (computations for this can be found in `data/imports/customer_retention.xlsx`).
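To get a feel for how quickly an exponential decay of this shape approaches the terminal value, the sketch below computes the half-life of the gap between starting and terminal ROAS. The constants copy the values set in the next cell; the names `roas_at` and `half_life_days` are illustrative only.

```python
import math

ROAS_START = 25.0     # starting ROAS (ROAS1 in the next cell)
ROAS_TERMINAL = 5.0   # terminal ROAS (ROAS2 in the next cell)
DECAY_SPEED = 0.05    # per-day decay constant (decay_speed in the next cell)


def roas_at(day):
    """ROAS after `day` days: terminal value plus an exponentially shrinking gap."""
    return ROAS_TERMINAL + (ROAS_START - ROAS_TERMINAL) * math.exp(-DECAY_SPEED * day)


# The gap to the terminal value halves every ln(2) / decay_speed days.
half_life_days = math.log(2) / DECAY_SPEED

print(f"half-life of the gap: {half_life_days:.1f} days")
for day in (0, 14, 30, 60, 150):
    print(f"day {day:3d}: ROAS = {roas_at(day):.2f}")
```

With a decay constant of 0.05 per day the gap halves roughly every 14 days, so most of the decline happens in the first two months of the projection window.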

In [21]:
# Our advertising conversion rate will follow an exponential decay from our current conversion rate to our target conversion rate

ROAS1 = 25  # <-- conservative decrease on our actual ROAS
ROAS2 = 5

NCR1 = ROAS1 / aov
NCR2 = ROAS2 / aov

roas_decay = []
decay_speed = 0.05
for i in range(len(adspend_df['date'])):
    decay_factor = math.exp(-decay_speed * i)
    current_value = ROAS2 + (ROAS1 - ROAS2) * decay_factor
    roas_decay.append(current_value)
    
conversion_rates = [i / aov for i in roas_decay]

# Linearly grow retention rates from 0.2 to 0.4 over the date period.
RET1 = 0.2
RET2 = 0.4

retention_rates = []
for index in adspend_df.index:
    if index == 0:
        retention_rates.append(RET1)
    else:
        retention_rates.append(RET1 + (index / (len(adspend_df.index) - 1)) * (RET2 - RET1))

In [57]:
# Plot the ROAS decay curve
cr_fig = go.Figure(data=go.Scatter(x=adspend_df['date'], y=roas_decay, name='ROAS'))
cr_fig.update_layout(title='ROAS Decay', xaxis_title='Date', yaxis_title='ROAS')
cr_fig.update_traces(line_color='#052c38')
cr_fig.update_layout(plot_bgcolor='#ece9f1')
# change the font to courier
cr_fig.update_layout(font_family='Courier New')
# enlarge and recolor the title
cr_fig.update_layout(title_font=dict(size=25, family='Courier New', color='#052c38'))
cr_fig.show()

In [58]:
# Plot retention rates
rr_fig = go.Figure(data=go.Scatter(x=adspend_df['date'], y=retention_rates, name='Retention Rates'))
rr_fig.update_traces(line_color='#052c38')
rr_fig.update_layout(plot_bgcolor='#ece9f1')
rr_fig.update_layout(title='Retention Rates over Time', xaxis_title='Date', yaxis_title='Retention Rate')
rr_fig.show()

We will grow new customers as a function of retargeting and new marketing spend. Retargeting ad spend goes to customers who have visited the site but not yet converted.

We will assume a constant percent of total ad spend on retention to maintain our linear retention rate improvement.
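The linear retention ramp referred to here (from `RET1` to `RET2` in the loop above) can also be written with `numpy.linspace`, which includes both endpoints. The five-day horizon below is a hypothetical stand-in for `len(adspend_df)`.

```python
import numpy as np

RET_START, RET_END = 0.2, 0.4   # same endpoints as RET1 and RET2 above
n_days = 5                      # hypothetical horizon; the notebook uses len(adspend_df)

# Evenly spaced values from RET_START to RET_END, both endpoints included.
retention_schedule = np.linspace(RET_START, RET_END, num=n_days)
print(retention_schedule)
```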

In [56]:
adspend_fig = go.Figure(data=go.Scatter(x=adspend_df['date'], y=adspend_df['total_marketing_spend'], name='Total Marketing Spend'))
adspend_fig.add_trace(go.Scatter(x=adspend_df['date'], y=adspend_df['retargeting_marketing_spend'], name='Retargeting Spend'))
adspend_fig.add_trace(go.Scatter(x=adspend_df['date'], y=adspend_df['retention_marketing_spend'], name='Retention Spend'))
adspend_fig.add_trace(go.Scatter(x=adspend_df['date'], y=adspend_df['new_marketing_spend'], name='New Spend'))
adspend_fig.update_layout(title_text='Daily Marketing Spend')
adspend_fig.update_layout(plot_bgcolor='#ece9f1')
adspend_fig.show()

In [44]:
# Aggregate daily marketing spend into weekly totals (consecutive 7-row blocks).
weekly_rows = []
for i in range(0, len(adspend_df), 7):
    weekly_rows.append({
        'week': adspend_df['date'][i],
        'acq_spend': adspend_df['new_marketing_spend'][i:i+7].sum(),
        'retargeting_spend': adspend_df['retargeting_marketing_spend'][i:i+7].sum(),
        'retention_spend': adspend_df['retention_marketing_spend'][i:i+7].sum(),
    })
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows.
wkly_df = pd.DataFrame(weekly_rows, columns=['week', 'acq_spend', 'retargeting_spend', 'retention_spend'])
wkly_df.head()



| | week | acq_spend | retargeting_spend | retention_spend |
|---|---|---|---|---|
| 0 | 2023-08-01 | 105.0 | 0.0 | 0.0 |
| 1 | 2023-08-08 | 105.0 | 0.0 | 0.0 |
| 2 | 2023-08-15 | 105.0 | 0.0 | 0.0 |
| 3 | 2023-08-22 | 105.0 | 0.0 | 0.0 |
| 4 | 2023-08-29 | 1645.0 | 300.0 | 100.0 |


In [45]:
wkly_df.to_csv('../../data/wklymktg.csv')
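The weekly roll-up above can also be expressed without an explicit loop by grouping rows into consecutive 7-row blocks. This is a sketch on a hypothetical 14-day table (`toy_spend`); the column names mirror `adspend_df`, and it assumes one row per day with no gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical 14-day spend table; column names mirror adspend_df.
toy_spend = pd.DataFrame({
    "date": pd.date_range("2023-08-01", periods=14, freq="D"),
    "new_marketing_spend": [15.0] * 14,
    "retargeting_marketing_spend": [0.0] * 7 + [10.0] * 7,
})

# Label each row with its 7-row block (0, 0, ..., 1, 1, ...) counted from the first row.
block = np.arange(len(toy_spend)) // 7

weekly = toy_spend.groupby(block).agg(
    week=("date", "first"),                                   # first date in each block
    acq_spend=("new_marketing_spend", "sum"),
    retargeting_spend=("retargeting_marketing_spend", "sum"),
).reset_index(drop=True)

print(weekly)
```

Grouping by row position keeps the weeks anchored to the first date in the data, matching the loop's behaviour, whereas calendar-based resampling would anchor them to a fixed weekday.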

To forecast sales, we will grow the new customer base and repeat customer base separately, using the formulas defined above. We will set the first entry of the returning customers dataframe to the number of repeat customers we currently have.

In [25]:
# grow new customer base

new_customer_rows = []
new_customers_rolling_sum = [0]
for row in adspend_df.itertuples():
    # expected new customers: new plus retention spend times that day's conversion rate
    expected_new = (row.new_marketing_spend + row.retention_marketing_spend) * conversion_rates[row.Index]
    new_customer_rows.append({'date': row.date, 'count': int(expected_new)})
    new_customers_rolling_sum.append(new_customers_rolling_sum[-1] + expected_new)
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows.
new_customers_df = pd.DataFrame(new_customer_rows, columns=['date', 'count'])



In [26]:
# grow returning customer base

returning_customers_df = pd.DataFrame(columns=['date', 'count'])
returning_customers_df['date'] = new_customers_df['date']
returning_customers_df['count'] = 0

for row in returning_customers_df.itertuples():
    # new repeat customers come from the cohort acquired TARGET_DAYS_TO_REORDER (14) days earlier;
    # the first 14 days fall back to the day-0 cohort
    indx = max(0, row.Index - TARGET_DAYS_TO_REORDER)
    new_count = int(retention_rates[row.Index] * new_customers_df.loc[indx, 'count'])
    returning_customers_df.loc[row.Index, 'count'] = new_count

# seed the first day with the repeat customers we already have
returning_customers_df.loc[0, 'count'] = 140

In [59]:
new_customers_sum = np.cumsum(new_customers_df['count'])
returning_customers_sum = np.cumsum(returning_customers_df['count'])

# Plot customer base growth
# new_customers_rolling_sum starts with a leading 0, so drop it to align with the dates
customer_base_fig = go.Figure(data=go.Scatter(x=new_customers_df['date'], y=new_customers_rolling_sum[1:], name='New Customer Base'))
customer_base_fig.add_trace(go.Scatter(x=new_customers_df['date'], y=returning_customers_sum, name='Returning Customer Base'))
customer_base_fig.update_layout(title='Customer Base Growth', xaxis_title='Date', yaxis_title='Customers')
customer_base_fig.update_layout(plot_bgcolor='#ece9f1')
customer_base_fig.show()

**Addition of Random Noise & Shocks**


Here we introduce random noise to the number of new and retained customers on any given day. Random noise and traffic volatility are helpful in modeling inventory stress.

The user can modify the relative volatility of sales traffic by changing the `noisiness` argument of the function below. We assume a higher volatility for new customers, using 0.25 for new customers and 0.10 for returning customers.
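Because the noise is random, each run of the notebook produces a different projection. If reproducible runs are wanted, one option is to pass a seed to a NumPy `Generator`. The variant below is a sketch under that assumption; `add_seeded_noise` and `toy_counts` are hypothetical names, and the noise model (normal, scaled by the mean count, clipped at zero) follows the function in the next cell.

```python
import numpy as np
import pandas as pd


def add_seeded_noise(dataframe, noisiness, seed):
    """Add zero-mean normal noise to 'count', reproducibly for a given seed."""
    rng = np.random.default_rng(seed)
    # Standard deviation is a fraction (noisiness) of the mean daily count.
    std_dev = dataframe["count"].mean() * noisiness
    noisy = dataframe.copy()
    noise = rng.normal(loc=0.0, scale=std_dev, size=len(noisy))
    # Counts cannot go negative, so clip at zero.
    noisy["count"] = (noisy["count"] + noise).clip(lower=0)
    return noisy


# Hypothetical daily counts, for illustration only.
toy_counts = pd.DataFrame({"count": [10, 12, 8, 11]})
first_run = add_seeded_noise(toy_counts, 0.25, seed=42)
second_run = add_seeded_noise(toy_counts, 0.25, seed=42)
print(first_run["count"].equals(second_run["count"]))  # True: same seed, same noise
```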

In [28]:
def add_noise(dataframe, noisiness):
    # 'noisiness' is the desired noisiness level (e.g., 0.1 for 10% noisiness)
    
    df_with_noise = dataframe.copy()

    # Calculate the standard deviation based on the desired noisiness
    std_dev = dataframe['count'].mean() * noisiness

    # Generate random noise using a normal distribution with mean 0 and standard deviation 'std_dev'
    noise = np.random.normal(loc=0, scale=std_dev, size=len(dataframe))

    df_with_noise['count'] += noise

    # ensure the count values are all at least 0
    df_with_noise['count'] = df_with_noise['count'].clip(lower=0)

    return df_with_noise

enriched_new_customers_df = add_noise(new_customers_df, .25)
enriched_returning_customers_df = add_noise(returning_customers_df, .1)


In [29]:
# Plot original vs. noisy new customer counts

noise_fig = go.Figure(data=go.Scatter(x=new_customers_df['date'], y=new_customers_df['count'], name="Original"))
noise_fig.add_trace(go.Scatter(x=new_customers_df['date'], y=enriched_new_customers_df['count'], name="Added Noise"))
noise_fig.update_layout(title="Addition of Random Noise to Customer Acquisition", xaxis_title="Date", yaxis_title="Customer Count")
noise_fig.update_layout(plot_bgcolor='#ece9f1')
noise_fig.show()

Each data point shows the expected number of new customers on any given day. The numbers increase sharply due to MoM changes in marketing budget, and they slope downwards due to our exponentially decreasing ROAS assumption.

**Building Sales**

The number of new customers at time $t$ is equal to a conversion rate times total ad spend less retargeting ad spend at time $t$.

$new\_customers_t$ = $CR$($cumulative\_ad\_spend_t$ - $retargeting\_ad\_spend_t$), where
- $CR$: Number of new customers per $1 ad spend

In [30]:
new_customer_sales = pd.DataFrame(columns=['date', 'gross_sales'])
new_customer_sales['date'] = enriched_new_customers_df['date']
new_customer_sales['gross_sales'] = enriched_new_customers_df['count'] * aov

The number of repeat customers at time $t$ is equal to the number of repeat customers from the previous timestep plus new repeat customers. New repeat customers are approximated as an aggregate retention rate times the number of new customers acquired $ADRO$ days before $t$, where $ADRO$ is the average number of days between repeat orders.

$repeat\_customers_t$ = $RR$($new\_customers_{t - ADRO}$) + $repeat\_customers_{t-1}$

In [31]:
returning_customer_sales = pd.DataFrame(columns=['date', 'gross_sales'])
returning_customer_sales['date'] = enriched_returning_customers_df['date']
returning_customer_sales['gross_sales'] = 0.0

# 150 is the assumed daily repeat sales from the existing customer base
for row in returning_customer_sales.itertuples():
    if row.Index > 14:
        # runoff from the returning-customer count 14 days earlier
        runoff_sales = aov * enriched_returning_customers_df.loc[row.Index - 14, 'count']
        if row.Index < 59:
            daily_sales = 150 + runoff_sales
        else:  # Allowing returning customers to catch up before dropping off existing base...
            daily_sales = runoff_sales
    else:  # use starting customers
        daily_sales = 150
    # named daily_sales so the `sales` DataFrame read earlier is not overwritten
    returning_customer_sales.loc[row.Index, 'gross_sales'] = daily_sales

In [32]:
cum_returning_sales = np.cumsum(returning_customer_sales['gross_sales'])
cum_new_sales = np.cumsum(new_customer_sales['gross_sales'])
cum_total_sales = cum_returning_sales + cum_new_sales

In [52]:
# Plot Daily new Customer sales as bars
new_sales_fig = go.Figure(data=[go.Bar(x=new_customer_sales['date'], y=new_customer_sales['gross_sales'])])
new_sales_fig.update_layout(title='Daily New Customer Sales', xaxis_title='Date', yaxis_title='Gross Sales')
new_sales_fig.update_layout(plot_bgcolor='#ece9f1')
new_sales_fig.show()

In [53]:
# Plot Daily Returning Customer sales as bars
ret_sales_fig = go.Figure(data=[go.Bar(x=returning_customer_sales['date'], y=returning_customer_sales['gross_sales'])])
ret_sales_fig.update_layout(title='Daily Returning Customer Sales', xaxis_title='Date', yaxis_title='Gross Sales')
ret_sales_fig.update_layout(plot_bgcolor='#ece9f1')
ret_sales_fig.show()

Solving for returning customer sales is more complicated than new customer sales. Here, we start off by assuming a continuation of the average repeat orders per day we already receive. We then iteratively add the runoff from new customers. The steep incline in late September is due to a drastic increase in marketing spend at the beginning of the month; because of the time it takes for a new customer to re-purchase, the effect takes approximately two weeks to show up in returning customer sales.
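The baseline-plus-runoff logic described here can be sketched in vectorised form with `Series.shift`. The function name, its parameters, and the toy inputs are hypothetical; the defaults (14-day lag, 150 per day baseline, baseline dropped from index 59) copy the constants used in the loop above.

```python
import numpy as np
import pandas as pd


def returning_sales(repeat_counts, aov, lag=14, baseline=150.0, baseline_end=59):
    """Daily returning-customer sales: existing-base baseline plus lagged runoff."""
    idx = np.arange(len(repeat_counts))
    # Returning-customer counts from `lag` days earlier (0 where no history exists).
    lagged = repeat_counts.shift(lag, fill_value=0).to_numpy()
    # Runoff only starts once the index is strictly past the lag.
    runoff = np.where(idx > lag, aov * lagged, 0.0)
    # The existing-base baseline applies until `baseline_end`, then drops off.
    base = np.where(idx < baseline_end, baseline, 0.0)
    return pd.Series(runoff + base, index=repeat_counts.index)


# Toy inputs with a short lag so the three phases are visible in six rows.
toy_counts = pd.Series([1, 2, 3, 4, 5, 6])
toy_result = returning_sales(toy_counts, aov=10.0, lag=2, baseline=5.0, baseline_end=4)
print(toy_result.tolist())  # [5.0, 5.0, 5.0, 25.0, 30.0, 40.0]
```

The first three values are baseline only, the fourth is baseline plus runoff, and the last two are runoff alone after the baseline drops off.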

In [50]:
# Plot sales breakdown

sales_fig = go.Figure(data=go.Scatter(x=returning_customer_sales['date'], y=cum_returning_sales, name="Returning Customer Sales"))
sales_fig.add_trace(go.Scatter(x=new_customer_sales['date'], y=cum_new_sales, name="New Customer Sales"))
sales_fig.add_trace(go.Scatter(x=new_customer_sales['date'], y=cum_total_sales, name="Total Sales"))
sales_fig.update_layout(plot_bgcolor='#ece9f1')
sales_fig.update_layout(title="Sales Breakdown", xaxis_title="Date", yaxis_title="Sales")
sales_fig.show()

We project between \$550,000 and \$600,000 in sales by EOY. We achieve this through consistently increasing ad spend on retargeting and new customers, an exponentially decaying ROAS from a conservative assumption of current levels (we use 25x, per `ROAS1` above; actual top-line ROAS is >50x), and maintenance of an average 14-day repeat purchase period and 40% customer retention.

Given our strong record of organic growth, we believe these sales to be a close approximation of what is actually possible during the coming months.

_________

In [36]:
# Exporting data

combined_daily_sales_df = new_customer_sales.copy()
# rename gross_sales column
combined_daily_sales_df = combined_daily_sales_df.rename(columns={'gross_sales': 'new_gross_sales'})
combined_daily_sales_df['returning_gross_sales'] = returning_customer_sales['gross_sales']
combined_daily_sales_df['total_gross_sales'] = combined_daily_sales_df['new_gross_sales'] + combined_daily_sales_df['returning_gross_sales']

In [37]:
# export
combined_daily_sales_df.to_excel('../../data/outputs/projected_daily_sales.xlsx')

In [41]:
# Aggregate daily projected sales into weekly totals (consecutive 7-row blocks).
weekly_rows = []
for i in range(0, len(combined_daily_sales_df), 7):
    weekly_rows.append({
        'week': combined_daily_sales_df['date'][i],
        'new_sales': combined_daily_sales_df['new_gross_sales'][i:i+7].sum(),
        'returning_sales': combined_daily_sales_df['returning_gross_sales'][i:i+7].sum(),
    })
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows.
wkly_df = pd.DataFrame(weekly_rows, columns=['week', 'new_sales', 'returning_sales'])
wkly_df.head()



| | week | new_sales | returning_sales |
|---|---|---|---|
| 0 | 2023-08-01 | 3162.320593 | 1050.0 |
| 1 | 2023-08-08 | 2621.214234 | 1050.0 |
| 2 | 2023-08-15 | 4026.488152 | 1170.106074 |
| 3 | 2023-08-22 | 3857.003121 | 1497.572654 |
| 4 | 2023-08-29 | 16445.499432 | 1585.434831 |


In [43]:
wkly_df.to_csv('../../data/wklysales.csv')