## Designing an Experiment

In [66]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

#### Background
Omnibay, a fast-growing e-commerce platform selling everything from books to electronics, began noticing unusual trends - electronics were driving high traffic but also the most cart abandonments and support issues. Customers were engaging heavily with electronics but often leaving without completing purchases, prompting internal discussions about whether a dedicated electronics website might better serve that audience. Recognizing the need for data-driven decision-making, the leadership team decided to bring in a data analyst to investigate customer behavior, shopping patterns, and potential impacts on the overall business.

Omnibay wants to run an experiment to see if we can convince more people to purchase our electronics if we use a dedicated technology website. We’ve designed a new modern website, but aren’t sure if this strategy will sell enough units to benefit from establishing a business relationship with a new provider.

Before running this experiment, of course, we need to know the sample size that will be required to detect the difference we are hoping for. There are three things we need to know before we can determine that number.

- The Baseline Conversion Rate
- The Minimum Detectable Effect (desired lift)
- The Statistical Significance Threshold


In order to get our baseline, first we need to know how many customers visit the site as well as how many visitors ultimately end up buying an electronic device in a typical month.

In [67]:
# Load visitor information
all_visitors = pd.read_csv("electronics_customers.csv")
paying_visitors = pd.read_csv("electronics_purchasing_customers.csv")

# Transform to datetime format
all_visitors['Purchase Date'] = pd.to_datetime(all_visitors['Purchase Date'])
paying_visitors['Purchase Date'] = pd.to_datetime(paying_visitors['Purchase Date'])

# Focus on the month of August 2023
start = pd.Timestamp('2023-08-01')
end = pd.Timestamp('2023-08-31')

# Filter dataset based on month
august_visitors = all_visitors[(all_visitors['Purchase Date'] >= start) & (all_visitors['Purchase Date'] <= end)]
august_paying = paying_visitors[(paying_visitors['Purchase Date'] >= start) & (paying_visitors['Purchase Date'] <= end)]

# Count the rows in each filtered DataFrame
total_visitor_count = len(august_visitors)
paying_visitor_count = len(august_paying)

print(f"Total Visitors: {total_visitor_count}")
print(f"Paying Visitors: {paying_visitor_count}")



Total Visitors: 1462
Paying Visitors: 442
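
One detail worth checking in the filter above: `pd.Timestamp('2023-08-31')` is midnight at the start of 31 August, so if `Purchase Date` carries a time of day, rows later on the 31st would fall outside `<= end`. Whether the real CSVs include times is not shown here, so the sketch below uses a small made-up frame (not Omnibay data) purely to illustrate the difference between the closed range and a half-open range ending at 1 September.

```python
import pandas as pd

# Hypothetical timestamps, invented for illustration only
toy = pd.DataFrame({
    "Purchase Date": pd.to_datetime([
        "2023-07-31 23:59",
        "2023-08-01 00:00",
        "2023-08-31 00:00",
        "2023-08-31 15:30",  # same day as `end`, but after midnight
        "2023-09-01 00:00",
    ])
})

start = pd.Timestamp("2023-08-01")
end = pd.Timestamp("2023-08-31")            # midnight at the START of 31 August
next_month = pd.Timestamp("2023-09-01")

# Closed range, as in the cell above
closed_range = toy[(toy["Purchase Date"] >= start) & (toy["Purchase Date"] <= end)]

# Half-open range: everything from 1 August up to, but not including, 1 September
half_open = toy[(toy["Purchase Date"] >= start) & (toy["Purchase Date"] < next_month)]

print(f"Closed range rows:    {len(closed_range)}")
print(f"Half-open range rows: {len(half_open)}")
```

If the column holds dates only, both filters return the same rows and the counts above are unaffected.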


Now to get the baseline: Divide the number of purchasing visitors by the number of total visitors. 

In [68]:
baseline_conversion_rate = paying_visitor_count / total_visitor_count 
print(f"Baseline conversion rate: {baseline_conversion_rate:.4f} ({baseline_conversion_rate*100:.2f}%)")


Baseline conversion rate: 0.3023 (30.23%)


Omnibay would like to know for sure that, with this change, we’ll be pulling in at least $100k more every month, to cover the cost of implementing and managing the new website. We need to figure out how many more paying visitors we need. First we’ll have to investigate the average revenue generated from a given sale to determine how many purchases it would take to reach $100k in additional revenue using our historical data. 

In [69]:
# Calculate the average payment made in August
average_payment = np.mean(august_paying['Total Purchase Amount'])
print(f"Average Payment: {average_payment}")

# Calculate required additional customers
additional_customers_needed = np.ceil(100000 / average_payment)
print(f"Additional customers needed: {additional_customers_needed}")

Average Payment: 2729.425339366516
Additional customers needed: 37.0


Now we need to find the additional share of monthly visitors who must make a purchase in order to make this change worthwhile.

In [70]:
# Calculate required absolute percentage point increase
absolute_percentage_point_increase = additional_customers_needed / total_visitor_count
print(f"Absolute percentage point increase needed: {absolute_percentage_point_increase:.4f} ({absolute_percentage_point_increase*100:.2f} pp)")


Absolute percentage point increase needed: 0.0253 (2.53 pp)


In [71]:
# Calculate the new conversion rate
new_conversion_rate = baseline_conversion_rate + absolute_percentage_point_increase
print(f"Target conversion rate: {new_conversion_rate:.4f} ({new_conversion_rate*100:.2f}%)")

Target conversion rate: 0.3276 (32.76%)


In order to find our minimum detectable effect/desired lift, we need to express `absolute_percentage_point_increase` as a percent of `baseline_conversion_rate`.

In [72]:
# Calculate relative percentage increase (MDE)
relative_increase = (absolute_percentage_point_increase / baseline_conversion_rate) * 100
print(f"Relative increase (MDE): {relative_increase:.2f}%")

Relative increase (MDE): 8.37%
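
The chain from revenue target to minimum detectable effect can be collected into one place. The sketch below is only a restatement of the cells above: the function name `required_lift` and its dictionary keys are made up for illustration, and the inputs are the August figures printed earlier.

```python
import math

def required_lift(total_visitors, paying_visitors, avg_payment, revenue_target):
    """Translate a monthly revenue target into a conversion-rate lift."""
    baseline = paying_visitors / total_visitors
    # Round up: a fraction of a customer cannot close the revenue gap
    extra_customers = math.ceil(revenue_target / avg_payment)
    absolute_increase = extra_customers / total_visitors
    return {
        "baseline": baseline,
        "extra_customers": extra_customers,
        "absolute_increase": absolute_increase,
        "target_rate": baseline + absolute_increase,
        "relative_increase": absolute_increase / baseline,
    }

# Values taken from the outputs printed above
result = required_lift(
    total_visitors=1462,
    paying_visitors=442,
    avg_payment=2729.425339366516,
    revenue_target=100_000,
)

for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Note that the relative increase simplifies to extra customers divided by current paying customers (37 / 442), so it does not depend on the total visitor count.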


Omnibay would like to be fairly certain of the results of the experiment. We will decide on a significance threshold of 5%, and the calculation below also assumes a statistical power of 80%.


Lastly, we need to calculate how many people need to be shown the new assets before we can check if the results are a significant improvement.


In [73]:
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Inputs
p1 = baseline_conversion_rate                # Baseline conversion rate (30%)
p2 = new_conversion_rate                     # Variant conversion rate
alpha = 0.05                                 # Significance level
power = 0.8                                  # Desired power (1 - beta)

# Calculate effect size
effect_size = proportion_effectsize(p1, p2)

# Initialize power analysis object
analysis = NormalIndPower()

# Calculate sample size per group
sample_size = analysis.solve_power(effect_size=effect_size, alpha=alpha, power=power, alternative='two-sided')
print(f"Required sample size per group: {int(round(sample_size))}")
print(f"Total sample size needed: {int(np.ceil(sample_size * 2))}")

Required sample size per group: 5286
Total sample size needed: 10573


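We would need to show each version of the website to 5286 different visitors to have an 80% chance of detecting the target lift at the 5% significance level. Given that we only receive around 2000 visitors per month, collecting the 10573 visitors needed in total would mean running the test for at least 6 months (at August's 1462 visitors it would take a little over 7 months).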
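
The figure from `solve_power` can be cross-checked by hand. The sketch below assumes the usual normal approximation for two equal-sized groups, n per group ≈ 2 * ((z_alpha/2 + z_power) / h)^2, where h is Cohen's effect size for proportions (the arcsine transform that `proportion_effectsize` also uses). The helper names are illustrative, and the approximation ignores the negligible contribution of the opposite tail, so it should land close to the statsmodels result rather than reproduce its internals.

```python
import math
from statistics import NormalDist

def cohens_h(p1, p2):
    """Effect size for two proportions via the arcsine transform."""
    return 2 * math.asin(math.sqrt(p2)) - 2 * math.asin(math.sqrt(p1))

def sample_size_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate per-group n for a two-sided test with equal group sizes."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    h = cohens_h(p1, p2)
    return 2 * ((z_alpha + z_power) / h) ** 2

# Baseline and target rates from the August figures above
p1 = 442 / 1462
p2 = p1 + 37 / 1462

n_per_group = sample_size_per_group(p1, p2)
print(f"Approximate sample size per group: {n_per_group:.0f}")
```

The formula makes the cost of a small lift visible: halving the effect size h quadruples the required sample.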

Note: Determining the sample size is generally not needed when you are live testing different versions of a website. It is only important to determine test and control group sizes in advance in a controlled research setting.

## Evaluating A/B Test

We conducted an A/B test using the new website and will now evaluate the result. The data was simulated in `Data_Manupulation.ipynb`, where control visitors were 30% likely to purchase an electronic device on the old website. The treatment, the new website, drives a relative increase of 10% in the purchase rate.

We are interested in whether visitors in one group are more likely to make a purchase than visitors in the other.

In [98]:
# Load the dataset
data = pd.read_csv("electronics_ab_test_simulated_10pct_lift.csv")

data['treated'] = data['AB_Group'].map({'A':0, 'B':1})

In [100]:
# print count and mean purchase rate of each group
print(data.groupby(['AB_Group'])['Purchased'].agg(['count','mean']).round(2))

          count  mean
AB_Group             
A         31302  0.30
B         31328  0.33


### Evaluate experiment findings using linear regression


In [101]:
import statsmodels.formula.api as smf 

formula = 'Purchased ~ treated'
model = smf.ols(formula,data).fit()

print(model.summary())

                            OLS Regression Results                            
Dep. Variable:              Purchased   R-squared:                       0.001
Model:                            OLS   Adj. R-squared:                  0.001
Method:                 Least Squares   F-statistic:                     76.77
Date:                Mon, 09 Jun 2025   Prob (F-statistic):           1.97e-18
Time:                        23:55:16   Log-Likelihood:                -40920.
No. Observations:               62630   AIC:                         8.184e+04
Df Residuals:                   62628   BIC:                         8.186e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.3008      0.003    114.436      0.0

### Extract key values from the regression summary

In [103]:
print("The estimated absolute impact is: {0:.2f}% \
      \nThe estimated relative impact is {1:.2f}% \
      \nThe t-statistic is {2:.1f} \
      \nThe p-value is {3:.2f}%".format(
        100*model.params['treated'],
        100*model.params['treated']/model.params['Intercept'],
        model.tvalues['treated'],
        100*model.pvalues['treated']
))

The estimated absolute impact is: 3.26%       
The estimated relative impact is 10.83%       
The t-statistic is 8.8       
The p-value is 0.00%


### Replicate using a two-sample t-test comparing means

In [106]:
from scipy.stats import ttest_ind 

treated_users = data[data['treated']==1]['Purchased']
control_users = data[data['treated']==0]['Purchased']

t_stat, p_value = ttest_ind(treated_users, control_users)

print(f"t-statistic: {round(t_stat,1)}")
print(f"p-value: {100*round(p_value,3)}%")


t-statistic: 8.8
p-value: 0.0%


### Conclusion

We simulated data for the experiment where the true effect of the treatment increased purchase rates by a relative 10% from a baseline purchase rate of 30%. Naturally there will be sampling error, as we only observe a sample of users in the experiment. In this case, our estimated treatment effect was a relative increase of 10.8%, and the result was highly statistically significant (p < 0.01).