# A/B test

<span style="color:orange">**The data here is fictitious right now. When the data gathering stage is over, I will plug in the actual numbers and see which hypothesis is correct.**</span>

Running the code from src continuously for 8-9 months on my Raspberry Pi should produce enough data for some statistical tests. The most obvious choice for the first one is an A/B test.

Now, let's define the design of our experiment:

1st variant
- Message option - 50%
- No message option - 50%
- Baseline conversion rate (message) - 15%
- Target for no message option - 20%
- MDE (lift) - 33%
- Significance threshold - 0.05
- Minimum sample size - 1840

2nd variant
- Message option - 50%
- No message option - 50%
- Baseline conversion rate (message) - 20%
- Target for no message option - 25%
- MDE (lift) - 25%
- Significance threshold - 0.05
- Minimum sample size - 2180

*Since we are currently in the data gathering stage, we cannot know for sure how many invites will eventually be sent or what the conversion rates will be, so the figures above are only an approximation of how the design would look. At the end of data gathering, I will plug all the actual numbers in here.
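As a rough cross-check of the minimum sample sizes listed above, here is a minimal sketch of the usual two-proportion sample size formula. It assumes a two-sided test and 80% power, which the designs above do not state explicitly, and the function name `min_total_sample_size` is made up for illustration. The online calculators may use a slightly different formula, so small differences are expected.

```python
from math import ceil
from statistics import NormalDist


def min_total_sample_size(baseline_rate, relative_lift, alpha=0.05, power=0.80):
    """Approximate total sample size (both groups combined) for comparing two proportions.

    Assumes a two-sided z-test with equal group sizes; power=0.80 is an assumption.
    """
    target_rate = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance threshold
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    variance_sum = baseline_rate * (1 - baseline_rate) + target_rate * (1 - target_rate)
    per_group = (z_alpha + z_beta) ** 2 * variance_sum / (target_rate - baseline_rate) ** 2
    return 2 * ceil(per_group)


size_variant1 = min_total_sample_size(baseline_rate=0.15, relative_lift=0.33)
size_variant2 = min_total_sample_size(baseline_rate=0.20, relative_lift=0.25)
print(size_variant1, size_variant2)
```

Under these assumptions the results should land close to the minimum sample sizes quoted for the two variants.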

For good enough power in both cases, the sample size should be 2470. That is 19 connection requests per day, Monday through Friday, for half a year (19 × 5 × 26 = 2470).
- Power for 1st design - 87-88%
- Power for 2nd design - 85-86%
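The power figures above can also be approximated analytically, without simulation. The sketch below uses the normal approximation for the difference of two proportions; the function name `approximate_power` is hypothetical, and because this version applies no continuity correction, it may come out a point or two above what a chi-square test reports.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(baseline_rate, target_rate, total_sample_size, alpha=0.05):
    """Normal-approximation power of a two-sided test with two equal groups."""
    n_per_group = total_sample_size // 2
    # Standard error of the difference between the two observed proportions
    standard_error = sqrt(
        baseline_rate * (1 - baseline_rate) / n_per_group
        + target_rate * (1 - target_rate) / n_per_group
    )
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(target_rate - baseline_rate) / standard_error
    # Probability of crossing the critical value (the negligible opposite tail is ignored)
    return NormalDist().cdf(z_effect - z_alpha)


power_design1 = approximate_power(0.15, 0.15 * 1.33, 2470)
power_design2 = approximate_power(0.20, 0.20 * 1.25, 2470)
print(f"Design 1: {power_design1:.1%}, design 2: {power_design2:.1%}")
```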

Another consideration is that if I add non-premium profiles with low activity and low connection counts, the conversion rate would probably be even lower than 15%, but 2470 still seems alright.

Out of the 2470 profiles, 74 (3%) should be premium. This is stratified sampling: those 74 profiles are chosen at random from the premium profiles.
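One way the stratified draw could look is sketched below. The candidate pools, their sizes, and the function name `stratified_sample` are all hypothetical placeholders; only the total of 2470 and the 3% premium share come from the text above.

```python
import random

TOTAL_SAMPLE_SIZE = 2470
PREMIUM_SHARE = 0.03

# Hypothetical candidate pools; real profile identifiers would go here
premium_pool = [f"premium_{i}" for i in range(500)]
regular_pool = [f"regular_{i}" for i in range(10_000)]


def stratified_sample(premium_pool, regular_pool, total, premium_share, seed=None):
    """Draw a fixed share of premium profiles and fill the rest with regular ones."""
    rng = random.Random(seed)
    n_premium = round(total * premium_share)
    n_regular = total - n_premium
    # Sampling without replacement inside each stratum
    premium = rng.sample(premium_pool, n_premium)
    regular = rng.sample(regular_pool, n_regular)
    return premium, regular


premium_sample, regular_sample = stratified_sample(
    premium_pool, regular_pool, TOTAL_SAMPLE_SIZE, PREMIUM_SHARE, seed=42
)
print(len(premium_sample), len(regular_sample))
```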

Sample size calculators and A/B test guides are here:
- https://www.codecademy.com/paths/data-science-inf/tracks/dsinf-statistics-fundamentals-part-ii/modules/dsinf-experimental-design/lessons/experimental-design/exercises/review
- https://www.codecademy.com/paths/data-science-inf/tracks/dsinf-statistics-fundamentals-part-ii/modules/dsinf-experimental-design/lessons/a-b-test-sample-size-calculator/exercises/review

In [1]:
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
#np.random.seed(42)

Variables for the designs below

In [2]:
baseline_rate1 = .15
mde1 = .33
target_rate1 = (1 + mde1) * baseline_rate1

baseline_rate2 = .20
mde2 = .25
target_rate2 = (1 + mde2) * baseline_rate2

significance_threshold = .05

sample_size = 2470

Function for simulating datasets (I am testing with this for now; later the real values from the experiment will be inserted here)

In [3]:
def simulate_datasets(baseline_rate, target_rate, sample_size):
    # Split the total sample evenly between the two groups
    group_size = sample_size // 2
    # Draw accepted/not accepted outcomes for each group at its assumed conversion rate
    sample_with_message = np.random.choice(['yes', 'no'], size=group_size, p=[baseline_rate, 1 - baseline_rate])
    sample_with_no_message = np.random.choice(['yes', 'no'], size=group_size, p=[target_rate, 1 - target_rate])
    group = ['message'] * group_size + ['no_message'] * group_size
    outcome = list(sample_with_message) + list(sample_with_no_message)
    # One row per connection request: which group it belonged to and whether it was accepted
    simulated_data = pd.DataFrame({"Connection_request": group, "Accepted": outcome})
    return simulated_data

dataset1 = simulate_datasets(baseline_rate1, target_rate1, sample_size)
dataset2 = simulate_datasets(baseline_rate2, target_rate2, sample_size)

Now we can estimate the power by simulation

In [4]:
def simulate_and_find_power(baseline_rate, target_rate, significance_threshold, sample_size, n_simulations=1000):
    # One flag per simulated experiment: 1 is significant, 0 is not significant
    results = np.zeros(n_simulations, dtype=np.int8)

    for i in range(n_simulations):
        # Draw a fresh dataset with the same design as the planned experiment
        simulated_data = simulate_datasets(baseline_rate, target_rate, sample_size)

        # Chi-square test on the 2x2 table of group vs. outcome
        ab_contingency = pd.crosstab(np.array(simulated_data.Connection_request), np.array(simulated_data.Accepted))
        chi2, pval, dof, expected = chi2_contingency(ab_contingency)
        results[i] = 1 if pval < significance_threshold else 0

    # Power is the proportion of simulated experiments with a significant result
    proportion = np.count_nonzero(results) / n_simulations
    return proportion

power1 = simulate_and_find_power(baseline_rate=baseline_rate1, target_rate=target_rate1, significance_threshold=significance_threshold, sample_size=sample_size)
power2 = simulate_and_find_power(baseline_rate=baseline_rate2, target_rate=target_rate2, significance_threshold=significance_threshold, sample_size=sample_size)
print(f"Power1 is {round(power1*100, 2)}%\nPower2 is {round(power2*100, 2)}%")

Power1 is 89.3%
Power2 is 83.0%
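Because the power is estimated from 1000 simulated experiments, the printed values carry some simulation noise and will change from run to run. A minimal sketch of that noise, treating each estimate as a binomial proportion (the helper name `simulation_standard_error` is made up for illustration):

```python
from math import sqrt


def simulation_standard_error(power_estimate, n_simulations=1000):
    """Standard error of a proportion estimated from n_simulations independent runs."""
    return sqrt(power_estimate * (1 - power_estimate) / n_simulations)


# Estimates taken from the output printed above
for label, estimate in [("Power1", 0.893), ("Power2", 0.830)]:
    se = simulation_standard_error(estimate)
    low, high = estimate - 1.96 * se, estimate + 1.96 * se
    print(f"{label}: {estimate:.1%} +/- {1.96 * se:.1%} (approx. 95% interval {low:.1%} to {high:.1%})")
```

With 1000 runs the standard error is roughly one percentage point, so differences of a point or two between runs are to be expected. Increasing the number of simulations narrows the interval.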


The A/B test itself is below

In [5]:
def do_test(dataset):
    ab_contingency = pd.crosstab(np.array(dataset.Connection_request), np.array(dataset.Accepted))
    chi2, pval, dof, expected = chi2_contingency(ab_contingency)
    print(f"Chi2 statistic is {chi2}")
    print(f"p-value is {pval}")
    print(f"Degrees of freedom: {dof}")
    print(f"Expected values: \n{expected}")
#double check prints

In [6]:
do_test(dataset1)

Chi2 statistic is 10.967834708061252
p-value is 0.000927070219498056
Degrees of freedom: 1
Expected values: 
[[1032.  203.]
 [1032.  203.]]


In [7]:
do_test(dataset2)

Chi2 statistic is 14.910922942396384
p-value is 0.0001127085032368192
Degrees of freedom: 1
Expected values: 
[[968. 267.]
 [968. 267.]]
