## Introductory information
#### Experiment context:

Company "N" conducted an experiment.  
When the company's website was opened, one group of users was shown the standard pop-up window with a questionnaire, and the other group was shown a new version of the pop-up window. In both cases, users could answer "yes" or "no" to the question in the questionnaire or close the pop-up window without answering.

Did the new version of the questionnaire affect the conversion?

#### Comment:
Data was taken from this [source](https://www.kaggle.com/datasets/osuolaleemmanuel/ad-ab-testing).


## Examination of the data collected during the experiment

#### Description of columns:
- __auction_id__ - unique user identifier assigned when a user is shown a pop-up window with a questionnaire;
- __experiment__ - user group ("control" - control, "exposed" - experimental);
- __date__ - calendar date of the record;
- __hour__ - hour of the day of the record;
- __device_make__ - the device on which the site was opened;
- __platform_os__ - device OS identifier;
- __browser__ - the browser used by the user;
- __yes__ - the user answered yes to the questionnaire question (0 - no, 1 - yes);
- __no__ - the user answered the questionnaire question negatively (0 - no, 1 - yes).

In [3]:
import pandas as pd
import numpy as np
from scipy import stats

In [4]:
# loading of dataframe 
df = pd.read_csv(r"data\ab_data_1.csv")
df

Unnamed: 0,auction_id,experiment,date,hour,device_make,platform_os,browser,yes,no
0,0008ef63-77a7-448b-bd1e-075f42c55e39,exposed,2020-07-10,8,Generic Smartphone,6,Chrome Mobile,0,0
1,000eabc5-17ce-4137-8efe-44734d914446,exposed,2020-07-07,10,Generic Smartphone,6,Chrome Mobile,0,0
2,0016d14a-ae18-4a02-a204-6ba53b52f2ed,exposed,2020-07-05,2,E5823,6,Chrome Mobile WebView,0,1
3,00187412-2932-4542-a8ef-3633901c98d9,control,2020-07-03,15,Samsung SM-A705FN,6,Facebook,0,0
4,001a7785-d3fe-4e11-a344-c8735acacc2c,control,2020-07-03,15,Generic Smartphone,6,Chrome Mobile,0,0
...,...,...,...,...,...,...,...,...,...
8072,ffea24ec-cec1-43fb-b1d1-8f93828c2be2,exposed,2020-07-05,7,Generic Smartphone,6,Chrome Mobile,0,0
8073,ffea3210-2c3e-426f-a77d-0aa72e73b20f,control,2020-07-03,15,Generic Smartphone,6,Chrome Mobile,0,0
8074,ffeaa0f1-1d72-4ba9-afb4-314b3b00a7c7,control,2020-07-04,9,Generic Smartphone,6,Chrome Mobile,0,0
8075,ffeeed62-3f7c-4a6e-8ba7-95d303d40969,exposed,2020-07-05,15,Samsung SM-A515F,6,Samsung Internet,0,0


In [5]:
# experiment time 
df['date'].nunique()

# number of demonstrations on different days of the week 
df['date'] = pd.to_datetime(df['date'])
df['date'].dt.day_name().value_counts()

## the experiment lasted 8 days
## the A-B test ran for more than a week, so differences in user behavior across days of the week are taken into account

Friday       2908
Thursday     1208
Wednesday    1198
Saturday      903
Sunday        890
Monday        490
Tuesday       480
Name: date, dtype: int64
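
Friday accounts for far more impressions than any other weekday. One contributing factor may be that an 8-day window always contains one weekday twice. A minimal sketch, assuming the experiment ran on consecutive days from 2020-07-03 to 2020-07-10 (the earliest and latest dates visible in the sample rows above; the full range is an assumption, not confirmed by the output):

```python
from collections import Counter
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# assumed experiment window: 8 consecutive days starting 2020-07-03
start = date(2020, 7, 3)
days = [start + timedelta(days=i) for i in range(8)]

# how many times each weekday occurs inside the window
weekday_counts = Counter(WEEKDAYS[d.weekday()] for d in days)
print(weekday_counts)
```

Under this assumption Friday is the only weekday that occurs twice, so raw per-weekday counts are not directly comparable without normalizing by the number of calendar days.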

In [6]:
# which browsers were used by the participants of the experiment

df['browser'].value_counts()

## infrequently used browsers are merged into one "OTHER" group
valid_br = df['browser'].value_counts().index[:5]
df['browser'] = df['browser'].where(df['browser'].isin(valid_br), 'OTHER')

df['browser'].value_counts(normalize=True)

## more than half of the users used "Chrome Mobile" during the experiment

Chrome Mobile            0.563823
Chrome Mobile WebView    0.184351
Samsung Internet         0.102018
Facebook                 0.094590
Mobile Safari            0.041723
OTHER                    0.013495
Name: browser, dtype: float64

In [7]:
# adding a "pass" column: 1 if the user answered the questionnaire ("yes" or "no"), 0 if the pop-up was closed without an answer
df['pass'] = ((df['yes'] != 0) | (df['no'] != 0)).astype(int)

# dividing the dataframe into experimental and control groups
df_control = df[df['experiment']=='control']
df_exposed = df[df['experiment']=='exposed']

In [8]:
# experimental group
print('experimental group:', df_exposed['pass'].value_counts(normalize=True), sep='\n', end='\n\n')

# control group
print('control group:', df_control['pass'].value_counts(normalize=True), sep='\n')

## the conversion is 16.4% in the experimental group and 14.4% in the control group
## i.e. the absolute effect size is about 2 percentage points

experimental group:
0    0.835996
1    0.164004
Name: pass, dtype: float64

control group:
0    0.856055
1    0.143945
Name: pass, dtype: float64
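
The effect can be expressed both as an absolute difference (in percentage points) and relative to the control conversion. A short worked calculation, using only the two shares printed above (the variable names are illustrative):

```python
# shares of users with pass == 1, copied from the output above
p_exposed = 0.164004
p_control = 0.143945

abs_lift_pp = (p_exposed - p_control) * 100       # percentage points
rel_lift_pct = (p_exposed / p_control - 1) * 100  # percent, relative to control

print(f"absolute lift: {abs_lift_pp:.2f} percentage points")
print(f"relative lift: {rel_lift_pct:.1f}%")
```

The absolute lift is about 2 percentage points, which corresponds to a relative increase of roughly 14% over the control conversion.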


## Testing the hypothesis about the impact of the new version of the questionnaire on the conversion

H<sub>0</sub>: The proportion of people who completed the survey is the same in both groups  
H<sub>A</sub>: The proportion of people who completed the survey is higher in the experimental group than in the control group

In [9]:
## to test the hypothesis, we use the z-test for the difference between two proportions in independent samples
from statsmodels.stats.proportion import proportions_ztest

# control sample size 
control_len = len(df_control)
# number of people who completed the survey in the control sample 
control_pass = len(df_control[df_control['pass']==1])

# experimental sample size 
exposed_len = len(df_exposed)
# number of people who completed the survey in the experimental sample 
exposed_pass = len(df_exposed[df_exposed['pass']==1])

# significance level
alpha = 0.05 

In [10]:
# testing the hypothesis of equality of proportions (two-sided alternative)
_, p_val = proportions_ztest(count=(exposed_pass, control_pass), nobs=(exposed_len, control_len))
print(f"p-value: {p_val:.3}")
## the p-value is below the significance level defined above, so the null hypothesis is rejected
## the proportions of users who answered the question differ between the two groups

p-value: 0.0125


In [11]:
## testing the same null hypothesis against the one-sided alternative:
## the proportion of users who completed the survey is higher in the experimental group than in the control group
_, p_val =  proportions_ztest(count=(exposed_pass, control_pass), nobs=(exposed_len, control_len), alternative='larger')
print(f"p-value: {p_val:.3}") 
## p-value is less than the previously determined significance level, the null hypothesis is rejected in favor of the alternative

p-value: 0.00625
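
Both p-values above can be reproduced by hand from the pooled two-proportion z statistic. The sketch below uses only the standard library. The counts (657 of 4006 in the experimental group, 586 of 4071 in the control group) are assumptions back-calculated from the printed shares and the group sizes reported in this notebook, and `two_proportion_ztest` is a hypothetical helper, not part of `statsmodels`:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(count_1, nobs_1, count_2, nobs_2):
    """Pooled z statistic for H0: p_1 == p_2."""
    p_1, p_2 = count_1 / nobs_1, count_2 / nobs_2
    p_pool = (count_1 + count_2) / (nobs_1 + nobs_2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / nobs_1 + 1 / nobs_2))
    return (p_1 - p_2) / se

# assumed counts: exposed first, control second
z = two_proportion_ztest(657, 4006, 586, 4071)

norm = NormalDist()
p_two_sided = 2 * (1 - norm.cdf(abs(z)))  # H_A: proportions differ
p_larger = 1 - norm.cdf(z)                # H_A: exposed > control

print(f"z = {z:.3f}, two-sided p = {p_two_sided:.3}, one-sided p = {p_larger:.3}")
```

Because the normal distribution is symmetric and the observed difference is in the direction of the alternative, the one-sided p-value is half of the two-sided one.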


In [12]:
# calculation of the power of the experiment 

# the "get_power" function wraps the "zt_ind_solve_power" function from the "statsmodels" library
from statsmodels.stats.power import zt_ind_solve_power

def get_power(l_1, l_2, g_1, g_2, alpha, alternative='two-sided'):
    # l_1, l_2 - group sizes; g_1, g_2 - number of users who answered in each group
    
    # share of users who answered the question from the questionnaire
    p_1 = g_1 / l_1
    p_2 = g_2 / l_2 
    
    # variance of a Bernoulli variable in each group
    var_1 = p_1 * (1-p_1)
    var_2 = p_2 * (1-p_2)
    
    # standardized effect size: difference in shares divided by the size-weighted average of the group standard deviations
    st_ef = (p_2 - p_1) / ((l_1*np.sqrt(var_1) + l_2*np.sqrt(var_2)) / (l_1 + l_2))

    # note: statsmodels documents "nobs1" as the size of sample 1 and "ratio" as nobs2 / nobs1;
    # this call passes the combined size of both groups as "nobs1" and l_1 / l_2 as "ratio"
    return zt_ind_solve_power(effect_size=st_ef, # standardized effect size
                       nobs1=l_1 + l_2,          # total number of observations
                       alpha=alpha,              # significance level
                       power=None,               # power (None, because it needs to be found)
                       ratio=l_1 / l_2,          # sample size ratio
                       alternative=alternative)  # alternative
        
get_power(len(df_control), 
          len(df_exposed), 
          df_control['pass'].value_counts()[1], 
          df_exposed['pass'].value_counts()[1],
          alpha=0.05, alternative='larger')    
## at a significance level of 0.05, the power of the experiment computed this way is 0.97

0.9715829560005882
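
The `statsmodels` documentation describes `nobs1` as the number of observations in sample 1 and `ratio` as `nobs2 / nobs1`. The sketch below is not the notebook's calculation: it is a standard-library approximation of the same one-sided normal power formula in which each group enters with its own size. The counts (586 of 4071 in the control group, 657 of 4006 in the experimental group) are assumptions back-calculated from the printed shares and group sizes, and `power_two_sample` is a hypothetical helper:

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(p_1, p_2, n_1, n_2, alpha=0.05):
    """One-sided ('larger') power of a two-sample z-test, normal approximation."""
    # same standardization as get_power: size-weighted average of the group SDs
    sd = (n_1 * sqrt(p_1 * (1 - p_1)) + n_2 * sqrt(p_2 * (1 - p_2))) / (n_1 + n_2)
    effect = (p_2 - p_1) / sd
    # each group contributes its own size: 1 / n_eff = 1 / n_1 + 1 / n_2
    n_eff = n_1 * n_2 / (n_1 + n_2)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(effect * sqrt(n_eff) - z_crit)

# assumed counts: control first, exposed second
power = power_two_sample(586 / 4071, 657 / 4006, 4071, 4006)
print(f"power: {power:.3f}")
```

With these inputs the per-group convention gives a power of roughly 0.80, noticeably lower than the value printed above. With `statsmodels`, the corresponding call would presumably be `zt_ind_solve_power(effect_size=st_ef, nobs1=l_1, ratio=l_2 / l_1, alpha=alpha, alternative=alternative)`.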

Checking the result of the power calculation.  
Using the obtained power, we compute the required sample size and compare it with the actual group sizes, which are already known. The sample size needed to achieve given type I and type II error rates is calculated here with the formula:  

$$n = \left(\frac{Z_{1-\alpha} \cdot \sqrt{p_0 \cdot (1 - p_0)} + Z_{1-\beta} \cdot \sqrt{p_a \cdot (1 - p_a)}}{p_a - p_0}\right)^2$$

where:  
__n__ - number of observations;  
__α__ - probability of a type I error;  
__β__ - probability of a type II error;  
__Z<sub>1-α</sub>,  Z<sub>1-β</sub>__ - quantiles of the standard normal distribution;  
__p<sub>0</sub>__,  __p<sub>a</sub>__ - ratio of answers to pop-up impressions in the control group and in the experimental group;  
__(p<sub>a</sub> - p<sub>0</sub>)__ - the size of the effect we want to detect.

In [13]:
# function for calculating the required sample size to achieve the required values of errors of the first and second kind

def get_size(p0, pa, alpha, beta):
    za = stats.norm.ppf(1 - alpha)
    zb = stats.norm.ppf(1 - beta)    
    n = za * np.sqrt(p0*(1 - p0)) + zb * np.sqrt(pa*(1-pa))
    n = n / (pa - p0) 
    return int(np.ceil(n*n))

alpha = 0.05
beta = 0.03 # 1 - 0.97 (the power obtained above)

get_size(df_control['pass'].value_counts(normalize=True)[1],
         df_exposed['pass'].value_counts(normalize=True)[1], 
         alpha, beta)
## the required size of one group is 4033 (assuming the two groups are of equal size); the actual group sizes are 4071 (control) and 4006 (exposed)
## the check gives a sample size close to the actual one, i.e. it is consistent with the power calculated above

4033

## The influence of the browser used on the conversion during the experiment
Let's test the hypothesis about the equality of conversions in the control and experimental groups, taking into account the browser in which users opened the site.

In [14]:
# summary table of the number of questionnaire impressions per browser in each group

browser_tab = df_exposed.groupby(['browser']).agg({'pass':'count'}).merge(
    df_control.groupby(['browser']).agg({'pass':'count'}), 
    left_index=True, right_index=True,  suffixes=('_exposed', '_control'))

browser_tab

browser,pass_exposed,pass_control
Chrome Mobile,2144,2410
Chrome Mobile WebView,1197,292
Facebook,203,561
Mobile Safari,91,246
OTHER,39,70
Samsung Internet,332,492


In [15]:
# summary table of the number of users who skipped (pass = 0) and answered (pass = 1) the questionnaire, per browser
browser_tab_dit = df_exposed.groupby(['browser']).agg({'pass':'value_counts'}).merge(
    df_control.groupby(['browser']).agg({'pass':'value_counts'}), 
    left_index=True, right_index=True,  suffixes=('_exposed', '_control'))

browser_tab_dit

browser,pass,pass_exposed,pass_control
Chrome Mobile,0,1773,2086
Chrome Mobile,1,371,324
Chrome Mobile WebView,0,1017,245
Chrome Mobile WebView,1,180,47
Facebook,0,159,449
Facebook,1,44,112
Mobile Safari,0,87,236
Mobile Safari,1,4,10
OTHER,0,38,65
OTHER,1,1,5


In [None]:
# testing the hypothesis of equality of conversions in the control and experimental groups for each browser

In [16]:
_, p_val =  proportions_ztest(count=(browser_tab_dit.loc['Chrome Mobile', 'pass_exposed'][1],browser_tab_dit.loc['Chrome Mobile', 'pass_control'][1]), 
                              nobs=(browser_tab.loc['Chrome Mobile', 'pass_exposed'], browser_tab.loc['Chrome Mobile', 'pass_control']), alternative='larger')
print(f"p-value: {p_val:.3}") 
print('power:', get_power(browser_tab.loc['Chrome Mobile', 'pass_control'],
          browser_tab.loc['Chrome Mobile', 'pass_exposed'],
          browser_tab_dit.loc['Chrome Mobile', 'pass_control'][1],
          browser_tab_dit.loc['Chrome Mobile', 'pass_exposed'][1],
          alpha=0.05, alternative='larger'))

p-value: 0.00015
power: 0.9998633200807792


In [17]:
_, p_val =  proportions_ztest(count=(browser_tab_dit.loc['Chrome Mobile WebView', 'pass_exposed'][1],browser_tab_dit.loc['Chrome Mobile WebView', 'pass_control'][1]), 
                              nobs=(browser_tab.loc['Chrome Mobile WebView', 'pass_exposed'], browser_tab.loc['Chrome Mobile WebView', 'pass_control']), alternative='smaller')
print(f"p-value: {p_val:.3}") 
print('power:', get_power(browser_tab.loc['Chrome Mobile WebView', 'pass_control'],
          browser_tab.loc['Chrome Mobile WebView', 'pass_exposed'],
          browser_tab_dit.loc['Chrome Mobile WebView', 'pass_control'][1],
          browser_tab_dit.loc['Chrome Mobile WebView', 'pass_exposed'][1],
          alpha=0.05, alternative='smaller'))

p-value: 0.326
power: 0.12679098371579473


In [18]:
_, p_val =  proportions_ztest(count=(browser_tab_dit.loc['Facebook', 'pass_exposed'][1],browser_tab_dit.loc['Facebook', 'pass_control'][1]), 
                              nobs=(browser_tab.loc['Facebook', 'pass_exposed'], browser_tab.loc['Facebook', 'pass_control']), alternative='larger')
print(f"p-value: {p_val:.3}") 
print('power:', get_power(browser_tab.loc['Facebook', 'pass_control'],
          browser_tab.loc['Facebook', 'pass_exposed'],
          browser_tab_dit.loc['Facebook', 'pass_control'][1],
          browser_tab_dit.loc['Facebook', 'pass_exposed'][1],
          alpha=0.05, alternative='larger'))

p-value: 0.302
power: 0.26123956220629674


In [19]:
_, p_val =  proportions_ztest(count=(browser_tab_dit.loc['Mobile Safari', 'pass_exposed'][1],browser_tab_dit.loc['Mobile Safari', 'pass_control'][1]), 
                              nobs=(browser_tab.loc['Mobile Safari', 'pass_exposed'], browser_tab.loc['Mobile Safari', 'pass_control']), alternative='larger')
print(f"p-value: {p_val:.3}") 
print('power:', get_power(browser_tab.loc['Mobile Safari', 'pass_control'],
          browser_tab.loc['Mobile Safari', 'pass_exposed'],
          browser_tab_dit.loc['Mobile Safari', 'pass_control'][1],
          browser_tab_dit.loc['Mobile Safari', 'pass_exposed'][1],
          alpha=0.05, alternative='larger')) 

p-value: 0.446
power: 0.08302908202596099


In [20]:
_, p_val =  proportions_ztest(count=(browser_tab_dit.loc['Samsung Internet', 'pass_exposed'][1],browser_tab_dit.loc['Samsung Internet', 'pass_control'][1]), 
                              nobs=(browser_tab.loc['Samsung Internet', 'pass_exposed'], browser_tab.loc['Samsung Internet', 'pass_control']), alternative='smaller')
print(f"p-value: {p_val:.3}") 
print('power:', get_power(browser_tab.loc['Samsung Internet', 'pass_control'],
          browser_tab.loc['Samsung Internet', 'pass_exposed'],
          browser_tab_dit.loc['Samsung Internet', 'pass_control'][1],
          browser_tab_dit.loc['Samsung Internet', 'pass_exposed'][1],
          alpha=0.05, alternative='smaller'))

p-value: 0.395
power: 0.10993357712883428


In [21]:
_, p_val =  proportions_ztest(count=(browser_tab_dit.loc['OTHER', 'pass_exposed'][1],browser_tab_dit.loc['OTHER', 'pass_control'][1]), 
                              nobs=(browser_tab.loc['OTHER', 'pass_exposed'], browser_tab.loc['OTHER', 'pass_control']), alternative='smaller')
print(f"p-value: {p_val:.3}") 
print('power:', get_power(browser_tab.loc['OTHER', 'pass_control'],
          browser_tab.loc['OTHER', 'pass_exposed'],
          browser_tab_dit.loc['OTHER', 'pass_control'][1],
          browser_tab_dit.loc['OTHER', 'pass_exposed'][1],
          alpha=0.05, alternative='smaller'))

p-value: 0.158
power: 0.5323486996315873


In [None]:
## a statistically significant difference in conversions is observed only for "Chrome Mobile"
## for the other browsers, the hypothesis of no difference in conversions is not rejected, but the power of these tests is low
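
The per-browser cells above repeat the same test with different inputs, so they can be expressed as one loop. The sketch below uses only the standard library and the counts copied from the two summary tables. "Samsung Internet" is left out because its rows are not visible in the second table, the alternative for each browser mirrors the one used in the corresponding cell, and `one_sided_p` is a hypothetical helper:

```python
from math import sqrt
from statistics import NormalDist

# (answered_exposed, shown_exposed, answered_control, shown_control, alternative)
browsers = {
    "Chrome Mobile":         (371, 2144, 324, 2410, "larger"),
    "Chrome Mobile WebView": (180, 1197, 47, 292, "smaller"),
    "Facebook":              (44, 203, 112, 561, "larger"),
    "Mobile Safari":         (4, 91, 10, 246, "larger"),
    "OTHER":                 (1, 39, 5, 70, "smaller"),
}

def one_sided_p(c_e, n_e, c_c, n_c, alternative):
    """One-sided p-value of the pooled two-proportion z-test (exposed vs control)."""
    p_pool = (c_e + c_c) / (n_e + n_c)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_e + 1 / n_c))
    z = (c_e / n_e - c_c / n_c) / se
    cdf = NormalDist().cdf(z)
    return 1 - cdf if alternative == "larger" else cdf

p_values = {name: one_sided_p(*row) for name, row in browsers.items()}
for name, p in p_values.items():
    print(f"{name:<22} p-value: {p:.3}")
```

The loop reproduces the p-values printed in the individual cells and keeps the inputs for every browser in one place, which makes copy-paste errors between cells less likely.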

## Summary 

#### Research results:
- The new version of the pop-up window has a positive effect on the survey conversion rate. The one-sided z-test for two proportions confirms that the difference is statistically significant at the 0.05 level (p-value 0.00625).
- Under the current experimental conditions, the conversion was 14.4% in the control group and 16.4% in the experimental group.
- Broken down by browser, a statistically significant difference between the control and experimental groups is observed only among users who viewed the site in "Chrome Mobile". For users of other browsers, the power of the corresponding tests is low.

 
#### Recommendations:
- The product (website) owner needs to decide whether a conversion change of this size is significant for the business.
- If the change is considered significant, it is recommended to run an A-A test to verify the results of the A-B test.
- To be more confident that there are no statistically significant differences between the control and experimental groups among users of browsers other than "Chrome Mobile", the number of users participating in the experiment needs to be increased.

 
#### Comment:
All conclusions rest on the assumption that the users participating in the experiment belong to the same cohort, were randomly selected, and did not participate in other experiments that could affect the outcome of this one.