## Marketing company
The purpose of the test is to evaluate the effect of the new functionality. There are two groups, control and test: the control group saw the old version of the site, and the test group saw the new version with the changes.
The data is provided in the file `ab_test_data.csv`.

#### Import libraries

In [44]:
import pandas as pd
import numpy as np
import plotly.figure_factory as ff

#### Read data

In [45]:
data = pd.read_csv('ab_test_data.csv')
print(data.head())

   group         country  impressions  clicks   revenue
0      1          France           20       5  0.296400
1      1          France            6       4  0.228307
2      0         Germany           12       3  0.206884
3      1  United Kingdom           25       4  0.150042
4      0       Australia           16       1  0.045251


### Bootstrap
We define a bootstrap function that resamples each group with replacement and, for every iteration, returns the relative uplift of the test group over the control group for each metric.

In [57]:
data_grouped = data.groupby('group').agg({'impressions': 'sum', 'clicks': 'sum', 'revenue': 'sum'})
data_grouped

| group | impressions | clicks | revenue     |
|-------|-------------|--------|-------------|
| 0     | 390467      | 90336  | 4523.304348 |
| 1     | 390298      | 90177  | 4507.60924  |
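As a quick sanity check, the group totals above already give the observed (non-bootstrapped) uplifts for the two ratio metrics. The sketch below hard-codes those totals; the helper names `ctr` and `revenue_per_click` are illustrative and not part of the notebook. ARPU is left out because the group sizes are not shown in this output.

```python
# Group totals copied from the aggregated output above (0 = control, 1 = test).
totals = {
    0: {"impressions": 390467, "clicks": 90336, "revenue": 4523.304348},
    1: {"impressions": 390298, "clicks": 90177, "revenue": 4507.60924},
}


def ctr(t):
    """Click-through rate: clicks divided by impressions."""
    return t["clicks"] / t["impressions"]


def revenue_per_click(t):
    """Revenue divided by clicks (the notebook labels this metric 'cpc')."""
    return t["revenue"] / t["clicks"]


# Relative uplift of test over control, the same formula the bootstrap uses.
observed_ctr_uplift = ctr(totals[1]) / ctr(totals[0]) - 1
observed_cpc_uplift = revenue_per_click(totals[1]) / revenue_per_click(totals[0]) - 1

print(f"ctr : {observed_ctr_uplift * 100:.2f}, cpc : {observed_cpc_uplift * 100:.2f}")
```

The means of the bootstrap distributions should land close to these observed values.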


In [46]:
def bootstrap(data, n):
    """Resample both groups n times and return the relative uplifts (test vs. control)."""
    # split the groups once instead of re-filtering on every iteration
    control_data = data[data['group'] == 0]
    test_data = data[data['group'] == 1]
    len_control = len(control_data)
    len_test = len(test_data)
    diffs = []
    for _ in range(n):
        # resample each group with replacement, keeping its original size
        control = control_data.sample(n=len_control, replace=True)
        test = test_data.sample(n=len_test, replace=True)

        # calculate sample sums
        c_impressions = control['impressions'].sum()
        c_clicks = control['clicks'].sum()
        c_revenue = control['revenue'].sum()

        t_impressions = test['impressions'].sum()
        t_clicks = test['clicks'].sum()
        t_revenue = test['revenue'].sum()

        # CTR: clicks / impressions
        c_ctr = c_clicks / c_impressions
        t_ctr = t_clicks / t_impressions
        uplift_ctr = t_ctr / c_ctr - 1

        # ARPU: revenue / number of rows in the group
        c_arpu = c_revenue / len_control
        t_arpu = t_revenue / len_test
        uplift_arpu = t_arpu / c_arpu - 1

        # CPC: revenue / clicks
        c_cpc = c_revenue / c_clicks
        t_cpc = t_revenue / t_clicks
        uplift_cpc = t_cpc / c_cpc - 1

        # append the relative uplifts for this iteration
        diffs.append({
            'ctr': uplift_ctr,
            'arpu': uplift_arpu,
            'cpc': uplift_cpc
        })

    return diffs

#### Run the test

In [47]:
diffs = bootstrap(data, 10000)
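Because every resample draws new random rows, two runs of the bootstrap give slightly different uplifts and p-values. One way to make a run repeatable is to drive the resampling from a seeded generator. The following is a minimal NumPy-only sketch for the CTR metric, not the notebook's own function: `bootstrap_ctr_uplift`, its `seed` parameter, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def bootstrap_ctr_uplift(c_clicks, c_impr, t_clicks, t_impr, n_iter, seed):
    """Bootstrap the relative CTR uplift with a seeded generator (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    uplifts = np.empty(n_iter)
    for i in range(n_iter):
        # Resample row indices with replacement, keeping each group's size.
        c_idx = rng.integers(0, len(c_clicks), size=len(c_clicks))
        t_idx = rng.integers(0, len(t_clicks), size=len(t_clicks))
        c_ctr = c_clicks[c_idx].sum() / c_impr[c_idx].sum()
        t_ctr = t_clicks[t_idx].sum() / t_impr[t_idx].sum()
        uplifts[i] = t_ctr / c_ctr - 1
    return uplifts


# Synthetic per-row data, made up for illustration (not from ab_test_data.csv).
data_rng = np.random.default_rng(0)
c_impr = data_rng.integers(5, 30, size=500)
t_impr = data_rng.integers(5, 30, size=500)
c_clicks = data_rng.binomial(c_impr, 0.23)
t_clicks = data_rng.binomial(t_impr, 0.23)

run_a = bootstrap_ctr_uplift(c_clicks, c_impr, t_clicks, t_impr, n_iter=200, seed=42)
run_b = bootstrap_ctr_uplift(c_clicks, c_impr, t_clicks, t_impr, n_iter=200, seed=42)
print(np.array_equal(run_a, run_b))  # same seed, same resamples
```

With pandas, a similar effect can be obtained by passing `random_state` to `DataFrame.sample`.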

#### Calculate uplifts function

In [48]:
def calculate_uplifts(diffs):
    ctr_uplifts = np.array([diff['ctr'] for diff in diffs])
    arpu_uplifts = np.array([diff['arpu'] for diff in diffs])
    cpc_uplifts = np.array([diff['cpc'] for diff in diffs])
    return ctr_uplifts, arpu_uplifts, cpc_uplifts

#### Calculate uplifts

In [49]:
# calculate uplifts with the helper defined above
ctr_uplifts, arpu_uplifts, cpc_uplifts = calculate_uplifts(diffs)


#### Function for plotting data
The function builds three separate graphs, each of which visualizes the distribution of uplift for the corresponding metric (CTR, ARPU, CPC).

In [50]:
def make_plot(uplift_arpu, uplift_ctr, uplift_cpc):
    # dashed red vertical line at zero uplift, spanning the full plot height
    zero_line = dict(type='line', x0=0, x1=0, y0=0, y1=1, xref='x', yref='paper', line=dict(color='red', dash='dash'))
    # plot the arrays passed in as arguments (not the global arrays), in the order CTR, ARPU, CPC
    for uplifts, label in [(uplift_ctr, 'CTR Uplifts'), (uplift_arpu, 'ARPU Uplifts'), (uplift_cpc, 'CPC Uplifts')]:
        fig = ff.create_distplot([uplifts], [label], show_hist=False, show_rug=False)
        fig.update_layout(width=800, shapes=[zero_line])
        fig.show()

#### Calculate p-value
This function performs a two-tailed test and returns the p-value: twice the smaller of the two shares of bootstrap uplifts that fall below and above zero.

In [51]:
# calculate p-value
def calculate_p_value(uplifts):
    return 2 * min(sum(uplifts < 0), sum(uplifts > 0)) / len(uplifts)
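To make the formula concrete, here is a small worked example on made-up uplift values (the arrays `toy` and `all_positive` are illustrative only). The function is repeated so the snippet runs on its own.

```python
import numpy as np


def calculate_p_value(uplifts):
    # Same formula as above: twice the smaller tail share around zero.
    return 2 * min(sum(uplifts < 0), sum(uplifts > 0)) / len(uplifts)


# One of ten toy uplifts is negative: p = 2 * min(1, 9) / 10
toy = np.array([-0.02, 0.01, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10])
p_toy = calculate_p_value(toy)

# If every resample lands on the same side of zero, the estimate is exactly 0
all_positive = np.array([0.01, 0.02, 0.03, 0.04])
p_zero = calculate_p_value(all_positive)

print(p_toy, p_zero)
```

The estimate moves in steps of `2 / len(uplifts)`, so with 10000 iterations the smallest non-zero value is 0.0002, and a printed `0.0` only means the p-value is below that resolution.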

We will display the graphs and p-values for all countries combined.

In [52]:
make_plot(arpu_uplifts, ctr_uplifts, cpc_uplifts)

p_value_ctr = calculate_p_value(ctr_uplifts)
print(p_value_ctr, p_value_ctr < 0.05)

p_value_arpu = calculate_p_value(arpu_uplifts)
print(p_value_arpu, p_value_arpu < 0.05)

p_value_cpc = calculate_p_value(cpc_uplifts)
print(p_value_cpc, p_value_cpc < 0.05)


0.7828 False
0.392 False
0.5268 False


Now let's compute the mean uplift for each metric, in percent.

In [53]:

print(f'ctr : {np.mean(ctr_uplifts) * 100:.2f}, arpu : {np.mean(arpu_uplifts) * 100:.2f}, cpc : {np.mean(cpc_uplifts) * 100:.2f}')

ctr : -0.13, arpu : -0.71, cpc : -0.17
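The mean uplift is a point estimate. The same bootstrap distribution can also give a percentile interval, which shows how wide the plausible range is. The sketch below is an assumption-labelled addition: `percentile_interval` is a hypothetical helper, and `fake_uplifts` is synthetic data standing in for an array such as `ctr_uplifts`.

```python
import numpy as np


def percentile_interval(uplifts, alpha=0.05):
    """Return the (lower, upper) percentile bootstrap interval at level 1 - alpha."""
    lower, upper = np.percentile(uplifts, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(lower), float(upper)


# Synthetic stand-in for a bootstrap uplift array (not real results).
rng = np.random.default_rng(1)
fake_uplifts = rng.normal(loc=-0.001, scale=0.005, size=10000)

low, high = percentile_interval(fake_uplifts)
print(f"95% interval: [{low * 100:.2f}%, {high * 100:.2f}%]")
print("contains zero:", low < 0 < high)
```

An interval that contains zero tells the same story as a p-value above 0.05 from `calculate_p_value`.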


### Result for all countries

| Metric       | p-value   | Uplift, % |
|--------------|-----------|-----------|
| CTR          | 0.7534    | -0.13    |
| ARPU         | 0.3876    | -0.7     |
| CPC          | 0.521     | -0.17    |



The overall result is not statistically significant, and the uplifts are slightly negative. The bootstrap is not seeded, so the p-values differ slightly from run to run. It is worth checking whether the result holds for each country separately.
To do this, we will reuse our code and examine each country in turn.

In [54]:
data = pd.DataFrame(data)

countries = ['United Kingdom', 'Germany', 'France', 'Australia']

for country in countries:
    print(country)
    
    data_country = data[data['country'] == country]
    diffs_country = bootstrap(data_country, 10000)
    
    diffs_country = pd.DataFrame(diffs_country)

    ctr_uplifts_country = diffs_country['ctr'].values
    arpu_uplifts_country = diffs_country['arpu'].values
    cpc_uplifts_country = diffs_country['cpc'].values

    print(f'ctr : {np.mean(ctr_uplifts_country) * 100:.2f}, arpu : {np.mean(arpu_uplifts_country) * 100:.2f}, cpc : {np.mean(cpc_uplifts_country) * 100:.2f}')
    
    make_plot(arpu_uplifts_country, ctr_uplifts_country, cpc_uplifts_country)
    
    p_value_ctr = calculate_p_value(ctr_uplifts_country)
    print('p value ctr', p_value_ctr, p_value_ctr < 0.05)
    
    p_value_arpu = calculate_p_value(arpu_uplifts_country)
    print('p value arpu', p_value_arpu, p_value_arpu < 0.05)
    
    p_value_cpc = calculate_p_value(cpc_uplifts_country)
    print('p value cpc', p_value_cpc, p_value_cpc < 0.05)


United Kingdom
ctr : 5.60, arpu : 5.10, cpc : -0.37


p value ctr 0.0 True
p value arpu 0.0 True
p value cpc 0.4214 False
Germany
ctr : 0.11, arpu : -1.44, cpc : -0.51


p value ctr 0.8932 False
p value arpu 0.3946 False
p value cpc 0.3668 False
France
ctr : -8.61, arpu : -8.45, cpc : -0.12


p value ctr 0.0 True
p value arpu 0.0 True
p value cpc 0.8002 False
Australia
ctr : 5.56, arpu : 6.62, cpc : 1.19


p value ctr 0.0002 True
p value arpu 0.0012 True
p value cpc 0.078 False


### Results for each country
The results change once the data is split by country. In the table below, statistically significant positive changes are highlighted in green and significant negative changes in red. Rows where the test was not statistically significant remain gray.



| Dimension  | Metric_name | users | control | test    | uplift  | p_value |
|------------|-------------|-------|---------|---------|---------|---------|
| Total      | CTR         | 50000 | 0.231354 | 0.231047 | <span style="color:gray;">-0.13%</span>  | <span style="color:gray;">0.7614</span>  |
| Total      | ARPU        | 50000 | 0.181258 | 0.179980 | <span style="color:gray;">-0.70%</span>  | <span style="color:gray;">0.3934</span>  |
| Total      | CPC         | 50000 | 0.050072 | 0.049986 | <span style="color:gray;">-0.17%</span>  | <span style="color:gray;">0.5402</span>  |
| UK         | CTR         | 16407 | 0.199120 | 0.210292 | <span style="color:green;">5.61%</span>   | <span style="color:green;">0.0</span>     |
| UK         | ARPU        | 16407 | 0.198527 | 0.208624 | <span style="color:green;">5.08%</span>   | <span style="color:green;">0.0002</span>  |
| UK         | CPC         | 16407 | 0.049999 | 0.049811 | <span style="color:gray;">-0.37%</span>  | <span style="color:gray;">0.4176</span>  |
| Australia  | CTR         | 8582  | 0.218041 | 0.230156 | <span style="color:green;">5.55%</span>   | <span style="color:green;">0.0002</span>  |
| Australia  | ARPU        | 8582  | 0.109125 | 0.116312 | <span style="color:green;">6.58%</span>   | <span style="color:green;">0.003</span>   |
| Australia  | CPC         | 8582  | 0.049678 | 0.050270 | <span style="color:gray;">1.19%</span>   | <span style="color:gray;">0.077</span>   |
| France     | CTR         | 14088 | 0.256540 | 0.234449 | <span style="color:red;">-8.61%</span>  | <span style="color:red;">0.0</span>     |
| France     | ARPU        | 14088 | 0.206224 | 0.188777 | <span style="color:red;">-8.45%</span>  | <span style="color:red;">0.0</span>     |
| France     | CPC         | 14088 | 0.050163 | 0.050102 | <span style="color:gray;">-0.12%</span>  | <span style="color:gray;">0.7998</span>  |
| Germany    | CTR         | 10923 | 0.273747 | 0.274048 | <span style="color:gray;">0.10%</span>   | <span style="color:gray;">0.907</span>   |
| Germany    | ARPU        | 10923 | 0.178948 | 0.176339 | <span style="color:gray;">-1.45%</span>  | <span style="color:gray;">0.3896</span>  |
| Germany    | CPC         | 10923 | 0.050250 | 0.049992 | <span style="color:gray;">-0.51%</span>  | <span style="color:gray;">0.375</span>   |
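The colour rule used in the table can be written down as a small function. This is a sketch of the rule as described in the text, with a hypothetical helper name `highlight_color` and the same 0.05 threshold the notebook uses in its `p_value < 0.05` checks. The sample rows are copied from the table.

```python
def highlight_color(uplift, p_value, alpha=0.05):
    """Gray if not significant; otherwise green for a positive uplift, red for a negative one."""
    if p_value >= alpha:
        return "gray"
    return "green" if uplift > 0 else "red"


# (dimension, metric, uplift, p_value) rows taken from the table above
rows = [
    ("UK", "CTR", 0.0561, 0.0),
    ("France", "CTR", -0.0861, 0.0),
    ("Australia", "CPC", 0.0119, 0.077),
    ("Germany", "CTR", 0.0010, 0.907),
]

colors = [highlight_color(uplift, p) for _, _, uplift, p in rows]
for (dimension, metric, _, _), color in zip(rows, colors):
    print(dimension, metric, color)
```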






Let's analyze the test.

At the overall level, no metric showed a statistically significant change.

In the United Kingdom, the changes had a statistically significant positive effect on CTR and ARPU.

In France, the changes had a statistically significant negative effect on CTR and ARPU.

In Germany, none of the changes are statistically significant.

In Australia, the changes had a statistically significant positive effect on CTR and ARPU.

CPC did not change significantly in any country.

The changes can be rolled out in the United Kingdom and Australia.

The changes should not be implemented in France.

For Germany, it is worth continuing the test.
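The recommendations above follow a simple rule, which can be sketched in code. The function `rollout_decision` and its wording are assumptions made for illustration, not part of the original analysis. The inputs are the per-country uplifts and p-values from the results table.

```python
def rollout_decision(results, alpha=0.05):
    """results maps metric name -> (uplift, p_value). Hypothetical helper."""
    # Keep only the metrics whose change is statistically significant.
    significant = {m: u for m, (u, p) in results.items() if p < alpha}
    if any(u < 0 for u in significant.values()):
        return "do not roll out"
    if any(u > 0 for u in significant.values()):
        return "roll out"
    return "continue the test"


# Uplifts and p-values copied from the per-country results table
by_country = {
    "United Kingdom": {"ctr": (0.0561, 0.0), "arpu": (0.0508, 0.0002), "cpc": (-0.0037, 0.4176)},
    "Australia": {"ctr": (0.0555, 0.0002), "arpu": (0.0658, 0.003), "cpc": (0.0119, 0.077)},
    "France": {"ctr": (-0.0861, 0.0), "arpu": (-0.0845, 0.0), "cpc": (-0.0012, 0.7998)},
    "Germany": {"ctr": (0.0010, 0.907), "arpu": (-0.0145, 0.3896), "cpc": (-0.0051, 0.375)},
}

decisions = {country: rollout_decision(r) for country, r in by_country.items()}
for country, decision in decisions.items():
    print(f"{country}: {decision}")
```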