# Frequentist AB testing

Calculators:
- https://www.surveymonkey.com/mp/ab-testing-significance-calculator/
- https://abtestguide.com/calc/
- https://abtestguide.com/abtestsize/

Interesting reading:
- https://towardsdatascience.com/the-art-of-a-b-testing-5a10c9bb70a4
- https://www.invespcro.com/blog/calculating-sample-size-for-an-ab-test/
- https://www.richrelevance.com/blog/2013/08/26/bayesian-ab-testing-with-a-log-normal-model/
- https://portal.pixelfederation.com/cs/blog/article/ab-testing-methodology-change
- http://varianceexplained.org/r/bayesian-ab-testing/
- https://towardsdatascience.com/hypothesis-testing-in-machine-learning-using-python-a0dc89e169ce

In [1]:
import sys
sys.path.append("./tools")
from ab_testing import *
from scipy.stats import norm

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [2]:
z_test_sample_size(0.5, 0.1)
z_test_sample_size(0.5, 0.2)
z_test_sample_size(0.5, 0.1, 0.99)
z_test_sample_size(0.1, 0.2)
z_test_sample_size(0.05, 0.1)
z_test_sample_size(0.02, 0.15)

[1561.927067135466, 1561.927067135466]

[384.5951069831055, 384.5951069831055]

[2324.1146665611577, 2324.1146665611577]

[3838.102190096708, 3838.102190096708]

[31230.69246297497, 31230.69246297497]

[36690.02436708896, 36690.02436708896]
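The values printed by `z_test_sample_size` match the textbook sample-size formula for a two-sided two-proportion Z-test with unpooled variance, n = (z_{1-α/2} + z_{1-β})² · [p₁(1-p₁) + p₂(1-p₂)] / (p₂ - p₁)², assuming the arguments are the baseline conversion rate, the relative minimum detectable effect and (optionally) the confidence level, with 80% power by default. That reading is inferred from the numbers, not from the `ab_testing` source, and `sample_size_per_variant` is a hypothetical name used only for this sketch.

```python
from scipy.stats import norm


def sample_size_per_variant(baseline_rate, relative_mde, confidence=0.95, power=0.8):
    """Hypothetical helper: sessions needed per variant so that a two-sided
    two-proportion Z-test (unpooled variance) detects the given uplift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)        # conversion rate we want to detect
    z_alpha = norm.ppf(1 - (1 - confidence) / 2)   # two-sided critical value
    z_beta = norm.ppf(power)                       # quantile for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2


print(round(sample_size_per_variant(0.5, 0.1)))        # 1562
print(round(sample_size_per_variant(0.5, 0.1, 0.99)))  # 2324
```

For example, a 50% baseline with a 10% relative uplift (50% to 55%) needs about 1,562 sessions per variant at 95% confidence and about 2,324 at 99%, the same figures as In [2].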

In [3]:
z_test_sample_size_alternative(0.5, 0.1)
z_test_sample_size_alternative(0.5, 0.2)
z_test_sample_size_alternative(0.5, 0.1, 0.99)
z_test_sample_size_alternative(0.1, 0.2)
z_test_sample_size_alternative(0.05, 0.1)
z_test_sample_size_alternative(0.02, 0.15)

782.5259552102923

193.84728588314755

1165.0151714801314

1855.3855181282368

15328.285996263996

17827.30199088534

In [4]:
2324*0.5
2324*(0.5*1.1)

1162.0

1278.2

In [5]:
AB_z_test(
    totals_A = 2324
    ,successes_A = 1162
    ,totals_B = 2324
    ,successes_B = 1279
    ,confidance = 0.99 # confidence level (0.95 if not set)
    #,test_type = 'two-tailed' #('two-tailed' if not set)
)

winning: B (significant)
conversion rate A: 50.0%
conversion rate B: 55.034%
uplift: 10.069%
standard error A: 0.01037
standard error B: 0.01032
Z-score: 3.44101
p-value: 0.00058
power: 0.80653


In [6]:
# import numpy as np
# from scipy.stats import chi2_contingency, fisher_exact

# obs = np.array([[700, 15000 - 700], [780, 15000 - 780]])  # rows: A, B; columns: converted, not converted

# g, p, dof, expctd = chi2_contingency(obs, correction = False)
# p
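`chi2_contingency` expects counts for each outcome (conversions and non-conversions), not totals. For a 2×2 table without Yates' continuity correction, Pearson's chi-square statistic equals the square of the two-proportion Z-score computed with a *pooled* standard error, so both tests give the same p-value. The sketch below checks this on the 700 vs 780 conversions (15,000 sessions each) analysed in In [13]. `AB_z_test`'s Z-scores appear to match an unpooled standard error instead, so its p-value can differ slightly from this one.

```python
import numpy as np
from scipy.stats import chi2_contingency, norm

# Rows: variants A and B; columns: converted, did not convert
obs = np.array([[700, 15000 - 700],
                [780, 15000 - 780]])

# Pearson chi-square test without Yates' continuity correction
chi2, p_chi2, dof, expected = chi2_contingency(obs, correction=False)

# Two-proportion Z-test with a pooled standard error
n_a, n_b = obs.sum(axis=1)
p_a, p_b = obs[:, 0] / obs.sum(axis=1)
p_pool = obs[:, 0].sum() / obs.sum()
se_pool = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se_pool
p_z = 2 * norm.sf(abs(z))

print(chi2, z ** 2)   # identical up to floating-point error
print(p_chi2, p_z)    # identical up to floating-point error
```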

Let's say we are testing a new feature on our website (e.g. an online shop).
We split traffic 50/50 between two variants: __A__ for the current version and __B__ for the new version. After 7 days of testing we get these results:

- number of sessions A = 15,000
- number of sessions B = 15,000
- number of conversions A = 700
- number of conversions B = 780

Variant B has the higher conversion rate (5.2% vs 4.7%).
To check whether this difference is statistically significant, we can run a two-proportion Z-test of the __null hypothesis__ that the two variants have the same conversion rate.
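`AB_z_test` comes from the local `ab_testing` module, which isn't shown here. Judging by the standard errors it reports, it appears to compute each variant's standard error as √(p(1-p)/n), combine them without pooling, and read a two-tailed p-value from the standard normal distribution. The sketch below follows that approach; `two_proportion_z_test` is a hypothetical name, not the tool's code.

```python
from math import sqrt

from scipy.stats import norm


def two_proportion_z_test(totals_a, successes_a, totals_b, successes_b):
    """Hypothetical helper: two-tailed Z-test for a difference in conversion
    rates, using unpooled standard errors."""
    p_a = successes_a / totals_a
    p_b = successes_b / totals_b
    se_a = sqrt(p_a * (1 - p_a) / totals_a)   # standard error of variant A's rate
    se_b = sqrt(p_b * (1 - p_b) / totals_b)   # standard error of variant B's rate
    z = (p_b - p_a) / sqrt(se_a ** 2 + se_b ** 2)
    p_value = 2 * norm.sf(abs(z))             # probability mass in both tails
    return z, p_value


z, p_value = two_proportion_z_test(10000, 100, 10000, 130)
print(round(z, 4), round(p_value, 4))  # 1.9898 0.0466
```

On the In [7] counts this gives the same Z-score and p-value as the tool's output.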

In [7]:
AB_z_test(
    totals_A = 10000
    ,successes_A = 100
    ,totals_B = 10000
    ,successes_B = 130
    #,confidance = 0.95 # confidence level (0.95 if not set)
    #,test_type = 'two-tailed' #('two-tailed' if not set)
)

winning: B (significant)
conversion rate A: 1.0%
conversion rate B: 1.3%
uplift: 30.0%
standard error A: 0.00099
standard error B: 0.00113
Z-score: 1.98981
p-value: 0.04661
power: 0.51191


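Since the p-value (0.047) is below 0.05, we __reject__ the null hypothesis at the 95% confidence level: if both variants truly had the same conversion rate, a difference at least this large (in either direction) would occur less than 5% of the time. This does not mean we are 95% sure that the uplift is real. Note that the cell above uses separate illustrative counts (10,000 sessions per variant, 100 vs 130 conversions).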
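The `power` line in the `AB_z_test` output appears to be *observed* (post-hoc) power: the chance of reaching significance if the true effect were exactly the observed one, Φ(|z| - z_{1-α/2}). This formula reproduces 0.51191 (In [7]), 0.56866 (In [13]) and 0.80653 (In [5], 99% confidence), but the tool's source isn't shown, so treat it as an inference. Observed power is a direct function of the Z-score (a result right at the significance threshold always gives about 50%), so power is better fixed before the test through the sample size, as in In [2]. `observed_power` is a hypothetical name.

```python
from scipy.stats import norm


def observed_power(z, confidence=0.95):
    """Hypothetical helper: power of a two-tailed Z-test if the true effect
    equalled the observed one (the opposite rejection tail is ignored)."""
    z_crit = norm.ppf(1 - (1 - confidence) / 2)
    return norm.cdf(abs(z) - z_crit)


print(round(observed_power(1.98981), 3))        # 0.512, as in In [7]
print(round(observed_power(3.44101, 0.99), 3))  # 0.807, as in In [5]
```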

In [8]:
import numpy as np
import pandas as pd
import datetime

In [9]:
# Simulation settings per variant and traffic source: sessions ('impressions'),
# converting sessions ('orders') and `mean`, the mean of log(order value)
params = [
    {'variant' : 'A', 'source': 'mobile', 'impressions': 5000, 'orders': 200, 'mean': 2},
    {'variant' : 'B', 'source': 'mobile', 'impressions': 5000, 'orders': 215, 'mean': 2.3},
    {'variant' : 'A', 'source': 'desktop', 'impressions': 10000, 'orders': 500, 'mean': 2.7},
    {'variant' : 'B', 'source': 'desktop', 'impressions': 10000, 'orders': 565, 'mean': 2.9},
]

# Scratch draws; neither is used below (`orders` is redefined in the next cell)
dates_1 = np.array([datetime.date(2019, 5, np.random.randint(1,7+1)) for i in range(500)])
orders = np.random.lognormal(mean=2, sigma=1.0, size=500)

In [10]:
data = []

for p in params:
    # Random day between 1 and 7 May 2019 for every session
    dates = np.array([datetime.date(2019, 5, np.random.randint(1,7+1)) for i in range(p['impressions'])])
    # Order values: log-normal for converting sessions, 0 for all others
    orders = np.append(
        np.random.lognormal(mean=p['mean'], sigma=1.0, size=p['orders']),
        np.zeros(p['impressions'] - p['orders'])
    )
    # One row per session
    for i in range(p['impressions']):
        data.append({
            'date': dates[i],
            'variant': p['variant'],
            'source': p['source'],
            'conversion': 1 if orders[i] > 0 else 0,
            'revenue': orders[i]
        })

In [11]:
df = pd.DataFrame(data).sample(frac=1).reset_index(drop=True)
df.head(10)

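| | conversion | date | revenue | source | variant |
|---|---|---|---|---|---|
| 0 | 0 | 2019-05-03 | 0.0 | desktop | B |
| 1 | 0 | 2019-05-03 | 0.0 | desktop | A |
| 2 | 0 | 2019-05-01 | 0.0 | desktop | A |
| 3 | 0 | 2019-05-01 | 0.0 | desktop | A |
| 4 | 0 | 2019-05-02 | 0.0 | mobile | B |
| 5 | 0 | 2019-05-03 | 0.0 | mobile | A |
| 6 | 0 | 2019-05-06 | 0.0 | mobile | B |
| 7 | 0 | 2019-05-01 | 0.0 | desktop | A |
| 8 | 0 | 2019-05-01 | 0.0 | mobile | A |
| 9 | 0 | 2019-05-01 | 0.0 | mobile | B |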


In [12]:
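# Sum only the numeric columns; recent pandas versions no longer drop text/date columns silently
df.groupby('variant')[['conversion', 'revenue']].sum()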

| variant | conversion | revenue |
|---|---|---|
| A | 700 | 14331.342913 |
| B | 780 | 21711.876206 |
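In [13] types the session and conversion counts in by hand; they can also be read straight off `df`. The sketch below uses pandas named aggregation; `counts_by_variant` is a hypothetical helper and the toy frame is made-up data for illustration only.

```python
import pandas as pd


def counts_by_variant(df):
    """Hypothetical helper: sessions, conversions and revenue per variant."""
    return df.groupby('variant').agg(
        sessions=('conversion', 'size'),      # one row per session
        conversions=('conversion', 'sum'),
        revenue=('revenue', 'sum'),
    )


# Tiny illustrative frame (made-up rows, not the simulated data above)
toy = pd.DataFrame({
    'variant':    ['A', 'A', 'B', 'B', 'B'],
    'conversion': [1, 0, 1, 1, 0],
    'revenue':    [10.0, 0.0, 5.0, 7.5, 0.0],
})
print(counts_by_variant(toy))
```

On the simulated `df` this should give 15,000 sessions per variant with 700 and 780 conversions, i.e. the values passed to `AB_z_test` in In [13].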


In [13]:
AB_z_test(
    totals_A = 15000
    ,successes_A = 700
    ,totals_B = 15000
    ,successes_B = 780
    #,confidance = 0.95 # confidence level (0.95 if not set)
    #,test_type = 'two-tailed' #('two-tailed' if not set)
)

winning: B (significant)
conversion rate A: 4.667%
conversion rate B: 5.2%
uplift: 11.429%
standard error A: 0.00172
standard error B: 0.00181
Z-score: 2.13294
p-value: 0.03293
power: 0.56866


In [14]:
pbb_conversion([80000,80000,80000], [1600,1700,1650])

[0.0237, 0.7895, 0.1868]

In [15]:
pbb_conversion([80000,80000,80000], [1600,1700,1650])

pbb_revenue(
    [4212694,4213358,4333878],
    [639,716,760],
    [2280,2585,2691],
    [8265,9466,9680]
)

[0.0215, 0.7948, 0.1837]

[0.002, 0.7052, 0.2928]
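`pbb_conversion` (also from `ab_testing`, not shown) returns a "probability to be best" for each variant, and the two identical calls above give slightly different results, which points to Monte Carlo sampling. A common way to compute this, sketched below, is to draw from a Beta posterior for each variant's conversion rate and count how often each variant has the highest draw. The uniform Beta(1, 1) prior, the number of draws and the name `prob_to_be_best` are assumptions, and this is a Bayesian quantity rather than a frequentist test.

```python
import numpy as np


def prob_to_be_best(totals, successes, n_samples=200_000, seed=0):
    """Hypothetical helper: Monte Carlo probability that each variant has the
    highest conversion rate, assuming independent Beta(1 + successes,
    1 + failures) posteriors (uniform prior)."""
    rng = np.random.default_rng(seed)
    totals = np.asarray(totals)
    successes = np.asarray(successes)
    # One column of posterior draws per variant
    draws = rng.beta(1 + successes, 1 + totals - successes,
                     size=(n_samples, len(totals)))
    # Share of draws in which each variant has the highest rate
    winners = draws.argmax(axis=1)
    return np.bincount(winners, minlength=len(totals)) / n_samples


print(prob_to_be_best([80000, 80000, 80000], [1600, 1700, 1650]))
```

With these inputs the result should land close to the roughly 0.02 / 0.79 / 0.19 split above. Fixing `seed` makes reruns identical, and more draws shrink the run-to-run variation.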